无奈何
Honorary Moderator
Points: 1338
Posts: 356
Registered: 2005-7-15
[Original post]
[Share] A powerful command-line tool for converting web pages to text
Software: HtoX32c
This is the command-line version of HtoX32. It is highly customizable, and its conversion results are excellent. I have tried many html2txt-type programs, and this is the only one whose output has satisfied me. It was written by a Japanese developer, so its interface is in Japanese. I translated it into Chinese, working from the Chinese edition of HtoX32 and my own experience with it. I didn't do that recently, though: I did it long ago and just dug it out from the bottom of the box. I don't know Japanese, and the translated strings had to fit into the original byte lengths, so the localization is quite poor. I'm a little embarrassed to post something like this. When you use it, be sure to add the /IP parameter so that no encoding conversion is done; otherwise nobody will be able to read the output. I hope this tool makes organizing saved web pages a little easier for everyone.
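The post does not say which conversion HtoX32c performs when /IP is omitted. One plausible cause of the unreadable output, offered here purely as an assumption, is that the tool reinterprets Chinese (GBK) bytes as Japanese (Shift_JIS) text. This small Python sketch reproduces that kind of mismatch; `decode_as` is a hypothetical helper, not part of HtoX32c.

```python
# Sketch: decoding bytes with the wrong codec produces garbled text (mojibake).
# Assumption (not confirmed in this thread): without /IP, HtoX32c treats the
# page as Japanese Shift_JIS while the page is actually Chinese GBK.


def decode_as(text: str, actual: str, assumed: str) -> str:
    """Encode `text` with the page's real encoding, then decode the bytes
    with the encoding a converter *assumes*. Undecodable bytes become U+FFFD."""
    raw = text.encode(actual)
    return raw.decode(assumed, errors="replace")


if __name__ == "__main__":
    original = "命令行下网页转文本"
    print(decode_as(original, "gbk", "gbk"))        # same codec: text survives
    print(decode_as(original, "gbk", "shift_jis"))  # wrong codec: garbled
```

Skipping the conversion step (which is what /IP is described as doing) corresponds to the first call: the bytes are read back with the encoding they were written in.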
Attachment
1: HtoX32c.zip (2006-11-27 11:48, 63.34 KiB; downloading costs 1 point; downloaded 560 times)

☆Start\Run (WIN+R)☆
%ComSpec% /cset,=何奈无── 。何奈可无是原,事奈无做人奈无&for,/l,%i,in,(22,-1,0)do,@call,set/p= %,:~%i,1%<nul&ping/n 1 127.1>nul
2006-11-27 11:48
redtek
Gold Member
Points: 2902
Posts: 1147
Registered: 2006-9-21
[Post #2]
Thanks to the moderator for such a good tool. Downloaded and saved~ :)

Redtek, someone forever wandering the Internet...
_.,-*~'`^`'~*-,.__.,-*~'`^`'~*-,._,_.,-*~'`^`'~*-,._,_.,-*~'`^`'~*-,._
2006-11-27 12:24
lxmxn
Moderator
Points: 11386
Posts: 4938
Registered: 2006-7-23
[Post #3]
Yes, it's quite good, and it has so many parameters to play with. Thanks to the moderator for this long-shelved ("dust-covered") little tool, hehe~
2006-11-27 12:43
vkill
Gold Member
Points: 4103
Posts: 1744
Registered: 2006-1-20, from Linze, Gansu
[Post #4]
Why is the output garbled after conversion? I still prefer wget + sed; I find the HTML tags are sometimes genuinely useful.
2006-11-27 23:02
无奈何
Honorary Moderator
Points: 1338
Posts: 356
Registered: 2005-7-15
[Post #5]
RE vkill
The garbled output was already covered in the original post: you must add the /IP parameter.
HtoX32c and tools such as sed and awk each have their own focus: the former suits converting a whole article, the latter suits extracting specific pieces of information. If HTML tags are split across multiple lines, processing them with sed and the like becomes somewhat more troublesome.
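To make the multi-line tag problem concrete, here is a minimal Python sketch (an illustration, not sed itself). Handling one line at a time, as sed does by default, misses a tag that opens on one line and closes on the next; applying the same regular expression to the whole document removes it. A regex remains only a rough tag stripper: it does not handle comments, scripts, or a `>` inside attribute values.

```python
import re

# A tag is "<", then anything except ">", then ">". [^>] also matches "\n",
# so a single match can span several lines when it sees the whole document.
TAG = re.compile(r"<[^>]*>")


def strip_per_line(html: str) -> str:
    """Line-oriented, like sed's default mode: each line is handled alone."""
    return "\n".join(TAG.sub("", line) for line in html.splitlines())


def strip_whole(html: str) -> str:
    """Whole-document: a tag broken across lines is still one match."""
    return TAG.sub("", html)


if __name__ == "__main__":
    sample = 'Hello <a\nhref="x">world</a>!'
    print(repr(strip_per_line(sample)))  # 'Hello <a\nhref="x">world!'
    print(repr(strip_whole(sample)))     # 'Hello world!'
```

The per-line version leaves the broken `<a ... >` tag behind, which is exactly the case where line-based sed scripts need extra work.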
2006-11-27 23:24
vkill
Gold Member
Points: 4103
Posts: 1744
Registered: 2006-1-20, from Linze, Gansu
[Post #6]
Originally posted by 无奈何 at 2006-11-27 23:24:
RE vkill
The garbled output was already covered in the original post: you must add the /IP parameter.
HtoX32c and tools such as sed and awk each have their own focus: the former suits converting a whole article, the latter suits extracting specific pieces of information...
HTML tags split across multiple lines really are a problem; they're a pain to handle with sed~ hehe~ So yes, each has its own strengths.
2006-11-27 23:32
electronixtar
Platinum Member
Points: 7493
Posts: 2672
Registered: 2005-9-2
[Post #7]
Quoting 无奈何: If HTML tags are split across multiple lines, processing them with sed and the like becomes somewhat more troublesome.
Personally, I think it's better to let IE do this kind of grunt work: that guarantees the result matches what the user sees in IE.
htm2txt.vbs
' htm2txt.vbs: print the text of a page as parsed by IE's engine (mshtml.dll)
Option Explicit
Dim oDOM
Set oDOM = WScript.GetObject(WScript.Arguments(0))  ' absolute path, URL, or ms-its: address
Do Until oDOM.readyState = "complete"  ' poll until the page has finished loading
    WScript.Sleep 200
Loop
WScript.Echo oDOM.body.innerText
Usage examples:
Convert a page inside a .chm file to txt:
cscript //NoLogo //e:vbscript htm2txt.vbs ms-its:C:\WINDOWS\Help\ntcmds.chm::/ntcmds.htm > "%UserProfile%\Desktop\Nt Command Line.txt"
Convert a URL to txt:
cscript //NoLogo //e:vbscript htm2txt.vbs http://www.Google.com > "%UserProfile%\Desktop\Google Homepage.txt"
Convert an HTML file to txt:
cscript //NoLogo //e:vbscript htm2txt.vbs D:\test.htm > D:\test.txt
Note: D:\test.htm must be given as a complete, absolute path.
[Last edited by electronixtar on 2006-11-28 at 11:38 PM]
This post was rated +7 points in total. Raters: lxmxn, +3, 2006-11-28 10:36; sonicandy, +2, 2007-9-8 09:49; mkd, +2, 2008-2-8 21:12.
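For readers without IE or Windows Script Host, a rough cross-platform approximation of `innerText` can be built on Python's standard `html.parser`. This is a swapped-in technique, not electronixtar's method: it applies no CSS or layout, so it will not match IE's rendering exactly, and the sets of skipped and line-breaking tags below are assumptions chosen for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are never displayed.
SKIP_TAGS = {"script", "style"}
# Tags treated as line breaks (an assumed, incomplete list).
BREAK_TAGS = {"br", "p", "div", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: "&amp;" arrives as "&"
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse whitespace inside each line and drop blank lines.
    lines = (" ".join(line.split()) for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    page = ("<html><body><h1>Title</h1><p>Hello <b>world</b></p>"
            "<script>var x = 1;</script></body></html>")
    print(html_to_text(page))
```

Because the parser sees the whole document, tags split across lines are handled naturally. Reading a local file still requires knowing its encoding (for example `open(path, encoding="gbk")` for a GBK page).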

C:\>BLOG http://initiative.yo2.cn/
C:\>hh.exe ntcmds.chm::/ntcmds.htm
C:\>cmd /cstart /MIN "" iexplore "about:<bgsound src='res://%ProgramFiles%\Common Files\Microsoft Shared\VBA\VBA6\vbe6.dll/10/5432'>"
2006-11-28 09:06
lxmxn
Moderator
Points: 11386
Posts: 4938
Registered: 2006-7-23
[Post #8]
Nice work, I'll give you some points, bro~
2006-11-28 10:37
lotus516
Senior User
Forum Robber
Points: 551
Posts: 246
Registered: 2006-9-21
[Post #9]
Why does electronixtar's script give me an error!! See the attachment for the cmd output!!!!
2006-11-28 12:04
lxmxn
Moderator
Points: 11386
Posts: 4938
Registered: 2006-7-23
[Post #10]
Originally posted by lotus516 at 2006-11-28 12:04:
Why does electronixtar's script give me an error!! See the attachment for the cmd output!!!!
Check your file name and path carefully: whether the file exists and whether the path contains spaces. Then you'll know the answer.
2006-11-28 12:48
electronixtar
Platinum Member
Points: 7493
Posts: 2672
Registered: 2005-9-2
[Post #11]
Quoting lotus516: Why does electronixtar's script give me an error!! See the attachment for the cmd output!!!!
Quoting lxmxn: Check your file name and path carefully: whether the file exists and whether the path contains spaces. Then you'll know the answer.
Forgot to mention: relative paths are not supported.
2006-11-28 23:39
lotus516
Senior User
Forum Robber
Points: 551
Posts: 246
Registered: 2006-9-21
[Post #12]
Originally posted by lxmxn at 2006-11-28 12:48:
Check your file name and path carefully: whether the file exists and whether the path contains spaces. Then you'll know the answer.
That's strange. My path has no spaces, it's an absolute path, and the file exists, but it still fails!!! See the attachment again!!!! I took screenshots of the error and of the path!!!
2006-11-29 01:35
无奈何
Honorary Moderator
Points: 1338
Posts: 356
Registered: 2005-7-15
[Post #13]
RE electronixtar
Thanks for the VBS script, brother. I know how powerful VBS is, but I can only hope to learn it some day. Why is VBS so slow at processing text, though?
2006-11-30 01:33
electronixtar
Platinum Member
Points: 7493
Posts: 2672
Registered: 2005-9-2
[Post #14]
Quoting 无奈何: Why is VBS so slow at processing text?
It isn't VBS that's slow; it's IE that is slow to load the page. Those few lines hand the parsing to IE's engine, mshtml.dll.
[Last edited by electronixtar on 2006-11-30 at 07:19 AM]
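This matches the structure of htm2txt.vbs: the `Do Until oDOM.readyState = "complete"` loop just sleeps in 200 ms steps until IE finishes loading, so the wait, not the string handling, dominates the run time. That loop also has no upper bound, so a page that never reaches "complete" would hang the script. Below is a Python sketch of the same poll-and-sleep pattern with a timeout added; `wait_until` is a hypothetical helper, and the 200 ms default simply mirrors the VBS code.

```python
import time
from typing import Callable


def wait_until(ready: Callable[[], bool], interval: float = 0.2,
               timeout: float = 30.0) -> float:
    """Poll `ready()` every `interval` seconds, like the VBS
    Do Until ... WScript.Sleep 200 ... Loop, but give up after `timeout`.
    Returns the seconds spent waiting; raises TimeoutError on timeout."""
    start = time.monotonic()
    while not ready():
        if time.monotonic() - start > timeout:
            raise TimeoutError(f"not ready after {timeout} s")
        time.sleep(interval)
    return time.monotonic() - start


if __name__ == "__main__":
    # Simulated page whose readyState becomes "complete" on the fourth check.
    states = iter(["loading", "loading", "interactive", "complete"])
    waited = wait_until(lambda: next(states) == "complete")
    print(f"waited {waited:.2f} s (three 200 ms sleeps plus overhead)")
```

Almost all of the measured time is spent in `time.sleep`, which is the point electronixtar makes about the VBS version.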
2006-11-30 07:06
electronixtar
Platinum Member
Points: 7493
Posts: 2672
Registered: 2005-9-2
[Post #15]
re lotus516
Quoting lotus516: That's strange. My path has no spaces, it's an absolute path, and the file exists, but it still fails!!! See the attachment again!!!! I took screenshots of the error and of the path!!!
You could try a form like file://E:/电子书/1/0001.htm. Judging by the icon of your .htm file, I suspect you have changed the .htm file association, which may affect how the code behaves.
[Last edited by electronixtar on 2006-11-30 at 07:18 AM]
Attachment
1: cmd.JPG (2006-11-30 07:16, 32.45 KiB; downloading costs 1 point; downloaded 12 times)
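Since htm2txt.vbs only accepts absolute paths and electronixtar suggests the file:// form, one way to sidestep both relative-path and space problems is to resolve the path before calling the script. Here is a Python sketch using `os.path.abspath` and `pathlib`; `to_file_uri` is a hypothetical helper. Note that `as_uri()` produces the standard three-slash form (`file:///E:/...`) rather than the `file://E:/...` form above, and it percent-encodes spaces and non-ASCII characters (as UTF-8 on Windows). The thread did not test whether IE/mshtml accepts every such URI.

```python
import os
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Make `path` absolute (relative to the current directory) and return
    it as a file:// URI; spaces and non-ASCII characters are percent-encoded."""
    return Path(os.path.abspath(path)).as_uri()


if __name__ == "__main__":
    print(to_file_uri("test page.htm"))
    # e.g. file:///D:/work/test%20page.htm when run from D:\work on Windows
```

The resulting string could then be passed as the argument in the `cscript //NoLogo //e:vbscript htm2txt.vbs ...` commands from Post #7.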
2006-11-30 07:15