China DOS Union

-- Unite DOS · Advance DOS · Grow DOS --

Union site: www.cn-dos.net · Forum site: www.cn-dos.net/forum
DOS stands for freedom, openness and progress. Let us work hard, learn from the openness and GNU spirit of FreeDOS and Linux, and together build and grow a free GNU GPL world!

China DOS Union Forum
Board: Other Operating Systems General Discussion
Thread: Discussion of the txt-to-htm conversion problem is welcome here
Original Poster · Posted 2007-04-06 11:09 · Suining, Sichuan, China (China Telecom)
Intermediate member · Credits 278 · Posts 103 · Joined 2006-10-21 · UID 67562 · Male
I like to collect text documents. I wanted to write a batch script that converts txt to htm, so I could turn my large collection into htm files and then compile them into CHM e-books. Unexpectedly, I got stuck on the txt-to-htm conversion right at the start. I was going to post in the batch-processing board, but since this is not really a DOS problem, I posted here instead. I hope the moderator won't say I posted in the wrong place.

One method: some e-book compilers can compile txt files directly, which keeps the text's original layout. Its fatal weakness is that full-text search is not supported, which is quite bad for a large CHM. Its second weakness is that long lines do not wrap automatically when browsing, so you have to drag the horizontal scroll bar to finish reading them.

There are also very few small tools for this on the Internet. There is one on the Huajun download site, which simply places the text between <pre></pre>. The htm it produces keeps the original layout of the txt, and after compiling into CHM it supports full-text search, but again long lines do not wrap automatically when browsing. That is very inconvenient.
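The <pre> approach described above can be sketched in Python as follows. This is not the Huajun tool's code; the function name and page template are illustrative assumptions, and escaping &, < and > is an added step that any real converter needs so the text is not parsed as markup:

```python
import html


def txt_to_pre_html(text: str, title: str = "") -> str:
    """Wrap plain text in <pre>...</pre>, escaping &, < and >.

    Spaces and line breaks are kept verbatim, so the file barely grows,
    but long lines will not wrap in the browser.
    """
    body = html.escape(text, quote=False)  # only &, <, > matter inside <pre>
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n"
        "<body><pre>" + body + "</pre></body></html>\n"
    )


print(txt_to_pre_html("a < b\n  indented", "demo"))
```

Apart from the fixed page template and the escaped characters, the output is the same size as the input, which is why this method keeps files small.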

I actually used a clumsy method: first replace every space in the text with &nbsp;, then add <br> at the start of each line, then wrap the whole text in <p></p>, and finally add the opening and closing HTML code and change the extension to htm. Converting the text to htm this way and then compiling it into CHM keeps the original layout, wraps long lines automatically, and supports full-text search. The one drawback is that the converted htm file is nearly twice the size of the original txt file (caused by the space replacement, which is the price of keeping the original layout). That is also not ideal for a large amount of text. Since my knowledge of web pages is limited, I hope those who are more proficient can suggest the best way to convert text to htm.
The ideal method would be simple, easy to implement, and free of the shortcomings above.
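A minimal Python sketch of the &nbsp; / <br> / <p> method described in this post. The function name is hypothetical, and HTML-escaping is an extra step the post does not mention; it has to run before the &nbsp; substitution so the "&" in "&nbsp;" is not escaped itself:

```python
import html


def txt_to_nbsp_html(text: str) -> str:
    """Every space -> &nbsp;, <br> at the start of each line, all inside <p>."""
    converted = []
    for line in text.splitlines():
        # Added step (assumption): escape &, < and > first.
        line = html.escape(line, quote=False)
        # Preserve every space; this is what inflates the file size.
        line = line.replace(" ", "&nbsp;")
        converted.append("<br>" + line)
    return "<html><body><p>" + "\n".join(converted) + "</p></body></html>\n"
```

Each space grows from 1 byte to 6 ("&nbsp;"), so text with many spaces can approach twice its original size, matching the post's observation.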

[ Last edited by zzhh612 on 2007-4-6 at 11:50 AM ]
Floor 2 · Posted 2007-04-06 12:23 · Wuhan, Hubei, China (China Telecom)
Moderator · Credits 11,386 · Posts 4,938 · Joined 2006-07-23 · UID 59080

First, a thumbs-up for the original poster's idea!

You can search the forum; there have been earlier discussions about txt2html.

Also, you could post the batch code you wrote so everyone can help you improve it step by step.
Floor 3 · Posted 2007-04-06 21:16 · Suzhou, Jiangsu, China (China Telecom)
Silver member · Credits 2,227 · Posts 790 · Joined 2005-01-27 · UID 35703 · Male
Replace every two half-width (Western) spaces with one full-width Chinese space, so the width and size stay the same. The whole file will then come out somewhat smaller than with the "spaces to &nbsp;" approach.
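A small Python sketch of this substitution, assuming the files are saved in GBK/GB2312, where the full-width space U+3000 is stored as two bytes (0xA1 0xA1) and so takes exactly the room of two ASCII spaces. In UTF-8 it would take three bytes instead. The function name is hypothetical:

```python
def halfwidth_pairs_to_fullwidth(text: str) -> str:
    """Replace each pair of ASCII spaces with one U+3000 ideographic space."""
    # str.replace scans left to right without overlap, so a run of five
    # spaces becomes two U+3000 characters followed by one ordinary space.
    return text.replace("  ", "\u3000")


original = "a     b"
converted = halfwidth_pairs_to_fullwidth(original)
print(len(original.encode("gbk")), len(converted.encode("gbk")))
```

Browsers only collapse ASCII whitespace, so the U+3000 characters survive in HTML without any &nbsp; entities.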
Floor 4 · Posted 2007-04-06 22:42 · Chengdu, Sichuan, China (CERNET)
Intermediate member · Credits 346 · Posts 103 · Joined 2004-04-06 · UID 21852 · Male
txt2html.vbs

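'txt2html.vbs: read plain text from stdin, write HTML to stdout
'By est, electronicstar@126.com
'Copyright CN-DOS
Set oSh = CreateObject("WScript.Shell")
'StdIn/StdOut only work under cscript.exe; if started by wscript.exe
'(e.g. double-clicked), re-run this script under cscript and exit.
'The script path is quoted so paths containing spaces still work.
If LCase(Right(WScript.FullName, 11)) = "wscript.exe" Then
    oSh.Run "cmd /c cscript //NoLogo //e:vbscript """ & WScript.ScriptFullName & """"
    WScript.Quit
End If

'Build an in-memory HTML document containing an empty target <div>
Set objDOM = CreateObject("htmlfile")
objDOM.write "<html><body><div id='targetDiv'></div></body></html>"
Set targetDiv = objDOM.getElementById("targetDiv")
'Assigning innerText lets the HTML engine escape special characters
'and turn line breaks into <br> tags
targetDiv.innerText = WScript.StdIn.ReadAll
'Print the resulting markup
WScript.Echo objDOM.documentElement.outerHTML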


Usage:

cscript //nologo txt2html.vbs < D:\test.txt > D:\test.htm
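The script above relies on the Windows htmlfile (MSHTML) object. A rough, cross-platform Python approximation of what the innerText assignment does (escape the special characters, then turn each line break into <br>) might look like the sketch below. It is an assumption-based sketch with a hypothetical function name; MSHTML's real output (tag case, treatment of runs of spaces) may differ:

```python
import html


def txt_to_div_html(text: str) -> str:
    """Rough analogue of the innerText trick: escape text, line breaks -> <br>."""
    escaped = html.escape(text, quote=False)
    # Normalise Windows CRLF and lone CR to LF before splitting into lines.
    lines = escaped.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return (
        "<html><body><div id='targetDiv'>"
        + "<br>".join(lines)
        + "</div></body></html>"
    )
```

To mirror the script's stdin/stdout usage, a small wrapper could call sys.stdout.write(txt_to_div_html(sys.stdin.read())).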
Floor 5 · Posted 2007-04-07 09:56 · Suining, Sichuan, China (China Telecom)
Intermediate member · Credits 278 · Posts 103 · Joined 2006-10-21 · UID 67562 · Male
The method above is similar to mine, except that it uses a DIV tag instead of P. Its drawback is that when the text contains many spaces, the converted document is much larger than the original. I tried it: a 40 KB text file became 57 KB after conversion to htm. Although the difference is only 17 KB, it adds up over a large number of files. Using the PRE tag keeps the size almost unchanged, but long lines do not wrap automatically. Can we have both? Of course, this is not a strict requirement; I just want to discuss it.
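One possible compromise, not proposed by either poster and offered only as a hedged sketch: convert to &nbsp; only the spaces a browser would otherwise collapse (leading indentation and the extra spaces in a run), and leave single spaces as ordinary spaces so long lines can still wrap. The function name is hypothetical and tabs are not handled:

```python
import html
import re


def txt_to_wrapping_html(text: str) -> str:
    """Keep spacing but allow wrapping: only spaces that would be lost
    (leading spaces and the extras in a run) become &nbsp;."""
    out = []
    for line in text.splitlines():
        line = html.escape(line, quote=False)
        # Leading indentation: every space must survive, so all become &nbsp;.
        line = re.sub(r"^ +", lambda m: "&nbsp;" * len(m.group()), line)
        # Interior runs of 2+ spaces: n-1 &nbsp; plus one normal space,
        # which keeps the width and still gives the browser a break point.
        line = re.sub(r" {2,}", lambda m: "&nbsp;" * (len(m.group()) - 1) + " ", line)
        out.append(line)
    return "<html><body><p>" + "<br>\n".join(out) + "</p></body></html>\n"
```

For prose with mostly single spaces the output stays close to the original size; only heavily indented or column-aligned text pays the &nbsp; cost.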
Floor 6 · Posted 2007-04-07 13:20 · Chengdu, Sichuan, China (CERNET)
Intermediate member · Credits 346 · Posts 103 · Joined 2004-04-06 · UID 21852 · Male

Quote: "But for a large number of files, it's quite noticeable"

Then just zip them.
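A minimal Python sketch of "just zip them", using the standard zipfile module with Deflate compression; the function name and folder layout are assumptions. Repetitive markup such as long runs of &nbsp; compresses very well, and compiled CHM files are themselves LZX-compressed internally, so the overhead also shrinks there:

```python
import zipfile
from pathlib import Path


def zip_htm_files(folder: str, archive: str) -> int:
    """Pack every *.htm file under `folder` into a Deflate-compressed zip.

    Returns the number of files added.
    """
    root = Path(folder)
    count = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*.htm")):
            # Store paths relative to the folder, not absolute paths.
            zf.write(path, arcname=path.relative_to(root).as_posix())
            count += 1
    return count
```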

I'm too lazy to write the <pre> version today; I'll come back tomorrow.