China DOS Union

-- Unite DOS · Advance DOS · Grow DOS --

Union site: www.cn-dos.net Forum site: www.cn-dos.net/forum
DOS stands for freedom, openness and progress. Let us work hard, learn from the openness and GNU spirit of FreeDOS and Linux, and together build and grow a free GNU GPL world!

Original Poster Posted 2006-11-27 11:48 · Ningbo, Zhejiang, China (Dr. Peng Broadband)
[Share] Super Powerful Command-Line Tool for Converting Web Pages to Text
Software Name: HtoX32c
This is the command-line version of HtoX32. It is highly customizable and converts very well. I have tried many html2txt-style programs, and this is the only one whose output satisfies me. It was written by a Japanese author, so the original interface is in Japanese. I localized it into Chinese, drawing on the Chinese-localized version of HtoX32 and my own experience using it. I did this a long time ago, not recently, and have only just dug it out of the bottom of the drawer. I don't understand Japanese, and I had to pad the strings to fill out the bytes, so the localization quality is very poor. I'm a little embarrassed to post something like this. When using it, be sure to add the /IP parameter so that no encoding conversion is performed; otherwise the output will be garbled and unreadable. I hope this tool makes organizing web page material a little easier for everyone.
Attachments
HtoX32c.zip (63.34 KiB, Credits to download 1 pts, Downloads: 562)
  ☆ Start\Run (WIN+R) ☆
%ComSpec% /cset,=何奈无── 。何奈可无是原,事奈无做人奈无&for,/l,%i,in,(22,-1,0)do,@call,set/p= %,:~%i,1%<nul&ping/n 1 127.1>nul

Floor 2 Posted 2006-11-27 12:24 · Beijing, China (China Unicom)
Thanks to the moderator for such a good tool. Downloaded and saved~ :)
    Redtek, someone forever wandering the Internet……

Floor 3 Posted 2006-11-27 12:43 · Wuhan, Hubei, China (China Telecom)

  Well, it really is good, and there are so many parameters available. Thanks to moderator "Chen Feng" for the handy little tool, heh heh~
Floor 4 Posted 2006-11-27 23:02 · Lanzhou, Gansu, China (China Telecom)
Why is the output garbled after conversion? I still prefer wget + sed, and I find that the HTML tags are sometimes really useful.
Floor 5 Posted 2006-11-27 23:24 · Ningbo, Zhejiang, China (Dr. Peng Broadband)
RE vkill
The garbled output was already covered in the first post: you must add the /IP parameter.
HtoX32c and tools such as sed and awk have different strengths. The former suits converting a whole article; the latter suit extracting pieces of information. If an HTML tag is split across several lines, processing it with sed and the like becomes more troublesome.
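
To illustrate the multi-line tag problem, here is a minimal sketch in Python (not a tool anyone in this thread used, and not how HtoX32c works internally). It assumes the page is already available as a string and uses the standard library's html.parser, which reads the markup as a stream rather than line by line. The names TextExtractor and html_to_text are made up for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style blocks.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML string with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# An <a> tag broken across three lines, which a line-by-line pattern would miss.
sample = '<p>See the <a\n  href="http://www.cn-dos.net"\n  title="forum">forum</a> page.</p>'
print(html_to_text(sample))
```

Because the parser does not care where the line breaks fall, the split tag is handled like any other. A line-oriented sed expression would first need the lines joined.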

Floor 6 Posted 2006-11-27 23:32 · Lanzhou, Gansu, China (China Telecom)
Originally posted by Wú Nài Hé at 2006-11-27 23:24:
RE vkill
The garbled output was already covered in the first post: you must add the /IP parameter.
HtoX32c and tools such as sed and awk have different strengths. The former suits converting a whole article; the latter suit extracting pieces of information...

HTML tags split across multiple lines are indeed a problem, and really hard to handle with sed~ Hehe~ Each approach has its own strengths.
Floor 7 Posted 2006-11-28 09:06 · Chengdu, Sichuan, China (CERNET)

If HTML tags are split into multiple lines, handling them with sed and others will be a bit more complicated.

Personally, I think it is better to let IE do this kind of grunt work, so that the result matches what users see in IE.

htm2txt.vbs

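' Load the document named by the first argument through the IE engine (MSHTML).
Set oDOM = WScript.GetObject(WScript.Arguments(0))

' Wait until the document has finished loading.
Do Until oDOM.readyState = "complete"
    WScript.Sleep 200
Loop

WScript.Echo oDOM.body.innerText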


Usage examples:
Convert web pages in .chm to txt
cscript //NoLogo //e:vbscript htm2txt.vbs ms-its:C:\WINDOWS\Help\ntcmds.chm::/ntcmds.htm > "%UserProfile%\Desktop\Nt Command Line.txt"

Convert URL to txt
cscript //NoLogo //e:vbscript htm2txt.vbs http://www.Google.com > "%UserProfile%\Desktop\Google Homepage.txt"

Convert html file to txt
cscript //NoLogo //e:vbscript htm2txt.vbs D:\test.htm > D:\test.txt
Note: D:\test.htm here must be given as a full absolute path.


C:\>BLOG http://initiative.yo2.cn/
C:\>hh.exe ntcmds.chm::/ntcmds.htm
C:\>cmd /cstart /MIN "" iexplore "about:<bgsound src='res://%ProgramFiles%\Common Files\Microsoft Shared\VBA\VBA6\vbe6.dll/10/5432'>"
Floor 8 Posted 2006-11-28 10:37 · Wuhan, Hubei, China (China Telecom)

  Not bad. Extra points for you, brother~
Floor 9 Posted 2006-11-28 12:04 · Ganzhou, Jiangxi, China (China Telecom)
Why did I get an error with the electronixtar script!! The cmd display is in the attached file!!!!
Floor 10 Posted 2006-11-28 12:48 · Wuhan, Hubei, China (China Telecom)
Originally posted by lotus516 at 2006-11-28 12:04:
Why did I get an error with electronixtar's script!! The cmd display is in the attached file!!!!


  Carefully check your file name and path, as well as whether the file exists and if there are spaces in the path, and you will know the answer.
Floor 11 Posted 2006-11-28 23:39 · Chengdu, Sichuan, China (CERNET)

Why did I get an error with electronixtar's script!! The cmd display is in the attached file!!!!


  Carefully check your file name and path, as well as whether the file exists and if there are spaces in the path, and you will know the answer.

I forgot to mention: relative paths are not supported.

Floor 12 Posted 2006-11-29 01:35 · Ganzhou, Jiangxi, China (China Telecom)
Originally posted by lxmxn at 2006-11-28 12:48:


  Check your file name, path carefully, as well as whether the file exists and if there are spaces in the path, then you will know the answer.

That's strange. My path has no spaces, it is an absolute path, and the file exists, but it still fails!!! See the attachment again!!!! I took screenshots of the error and the path!!!
Floor 13 Posted 2006-11-30 01:33 · Ningbo, Zhejiang, China (Dr. Peng Broadband)
RE electronixtar

Thank you, brother, for the VBS script. I also know the power of VBS, but I can only hope to study it in the future. Why is VBS so slow at processing characters?

Floor 14 Posted 2006-11-30 07:06 · Chengdu, Sichuan, China (CERNET)

Why is VBS so slow at processing characters?

VBS itself is not slow; what is slow is IE loading the page. Those few lines do their parsing by calling mshtml.dll, the IE engine.

Floor 15 Posted 2006-11-30 07:15 · Chengdu, Sichuan, China (CERNET)
re lotus516

This is strange, my path has no spaces, it's an absolute path, and the file exists, but it's still wrong!!! Still see the attachment!!!! I took screenshots of the error and the path!!!


Try a URL of the form file://E:/E-books/1/0001.htm. Judging from the icon of your htm file, I guess you have changed the .htm file association, which may affect how the code behaves.
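
For anyone who wants to build such a URL automatically, here is a rough sketch in Python (an outside illustration, not part of htm2txt.vbs). It assumes a plain drive-letter path and rejects relative paths, since those have no file: form. The helper name to_file_uri is hypothetical. It produces the three-slash form file:///E:/..., which is the usual spelling; whether the script accepts that form, the two-slash form above, or both has not been verified here.

```python
import ntpath
from urllib.parse import quote


def to_file_uri(path):
    """Turn an absolute Windows drive-letter path into a file: URI."""
    drive, rest = ntpath.splitdrive(path)
    is_drive_letter = len(drive) == 2 and drive[1] == ":"
    if not is_drive_letter or not rest.startswith(("\\", "/")):
        # Covers relative paths such as "test.htm" or "D:test.htm",
        # and UNC paths, which this sketch does not try to handle.
        raise ValueError("expected an absolute drive-letter path: " + path)
    # Use forward slashes and percent-encode characters such as spaces.
    return "file:///" + drive + quote(rest.replace("\\", "/"))


print(to_file_uri(r"E:\E-books\1\0001.htm"))
print(to_file_uri(r"C:\My Docs\a.htm"))
```

The second call shows why this can help with paths that contain spaces: the space is encoded as %20, so the argument no longer contains one.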

Attachments
cmd.JPG
