namejm
Honorary Moderator
       batch fan
Points: 5226
Posts: 1737
Registered: 2006-3-10, from Chengdu
Status: offline
[Original post]:
Classified Index of Classic Posts in the Batch Processing Room [20080921] (Read Before Asking for Help)
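The following is a classified index of classic posts in the Batch Processing Room. Post records were extracted up to 2008-09-21 9:00 am, with 1 new record added (previous update: 07-06-08). (Moderators, please update this regularly to help newcomers.)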
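While compiling these posts, I (namejm) found that some classic posts had titles that were too vague, and I corrected them one by one. I did not have the energy to notify each author individually, so I apologize to them here.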
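Notes:
1. Each record consists of post date + post ID + post title + printable-version link. For why the ID and the printable version are included, see this post: DOS Union Forum Q&A Room Featured Post Index (2005.08.10)
2. Because I am only familiar with DOS batch scripting, this index covers batch posts only; the many posts on other scripting languages such as VBS are not included.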
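I have attached my (namejm's) tutorial and scripts for building the index, so that you can build your own. Here is an excerpt from the attachment:
Without further ado, let's look at what building this post index involves: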
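First, we need the records for every post in the Batch Processing Room. These can be obtained from the Union DOS interface as follows:
① Open http://www.cn-dos.net/forum/cmdprmt.php to enter the Union DOS interface;
② At the prompt, type cd d:\23 to switch to the DOS view of the Batch Processing Room;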
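In this interface, typing dir /? lists in detail what can be queried. For this index, the key pieces of information are the post title and ID: the title gives a rough idea of a post's content and value (BTW, this is why I have always been strict about post titles in the Batch Processing Room), and the ID lets you open the post in a web browser. If you want to know how old a post is, you also need the posting date; if you want to judge a post's value by its reply count and view count, you need those two fields as well. I used the command dir /a /c /-p /vdinr, which returns each post's posting time, ID, reply count, view count, and title.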
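③ Select the entire page (Ctrl+A), copy it, and save it as a text file. Do not use the Save As feature of the IE browser, because the saved post records end up misaligned and cannot be processed by the batch script in the later steps.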
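For readers who want to see what "processing the saved text" might look like, here is a minimal sketch in Python rather than batch (the author's own scripts are .cmd files in the attachment and are not reproduced here). The record layout is an assumption: one post per line, with date, ID, reply count, view count, and title separated by whitespace, in the order the post lists them. The names PostRecord, parse_record, and parse_listing are made up for illustration.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed layout (not confirmed by the post): date, ID, replies, views, title,
# separated by whitespace, e.g. "2006-10-20 12345 12 3456 Some title".
RECORD_RE = re.compile(
    r"^\s*(\d{4}-\d{1,2}-\d{1,2})\s+(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s*$"
)


class PostRecord(NamedTuple):
    date: str
    post_id: int
    replies: int
    views: int
    title: str


def parse_record(line: str) -> Optional[PostRecord]:
    """Return a PostRecord, or None for lines that are not post records."""
    match = RECORD_RE.match(line)
    if match is None:
        return None
    date, post_id, replies, views, title = match.groups()
    return PostRecord(date, int(post_id), int(replies), int(views), title)


def parse_listing(text: str) -> List[PostRecord]:
    """Parse every recognizable record in the copied page text."""
    records = []
    for line in text.splitlines():
        record = parse_record(line)
        if record is not None:
            records.append(record)
    return records
```

Lines that do not match the assumed layout (prompts, blank lines, page chrome) are simply skipped, so the pattern needs adjusting if the real listing differs.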
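Next, mark the classic posts in the listing so they can be extracted in bulk. This is a huge amount of manual work. Back then I skimmed every single record, not skipping even posts with 5 or fewer replies, for one simple reason: I did not want to miss any valuable record. Sometimes, to decide whether a post deserved a mark, I had to open it in the browser by hand, and the tedium was unlike anything I had experienced. Of course, you can reduce the workload by filtering out records whose reply count is below some threshold; at the very least, posts with 0 replies can be ignored. A batch script can do this filtering, but before processing the text with batch you must first deal with the special characters in it, as described in the character-replacement step below. I used "√" as the marker. You can pick another symbol, but avoid characters that findstr handles poorly, such as quotation marks, periods, and asterisks.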
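The filtering and marking described above could be sketched as follows, again in Python instead of the author's batch code. Two things are assumed here and not stated in the post: that the reply count is the third whitespace-separated field of a record, and that the "√" marker is placed at the very start of a marked line. The function names are hypothetical.

```python
MARK = "√"  # the marker used in the post; its position in the line is assumed


def drop_low_reply(lines, min_replies=1, reply_field=2):
    """Drop records whose reply count is below min_replies.

    Assumes whitespace-separated fields with the reply count at index
    reply_field (date, ID, replies, views, title). Lines that do not look
    like records are kept so a human can still review them.
    """
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) > reply_field and fields[reply_field].isdigit():
            if int(fields[reply_field]) < min_replies:
                continue
        kept.append(line)
    return kept


def extract_marked(lines, mark=MARK):
    """Return the marked lines with the leading marker stripped."""
    return [line[len(mark):].strip() for line in lines if line.startswith(mark)]
```

With the default min_replies=1 only 0-reply posts are removed, which matches the minimum filtering the post suggests; raising the threshold trades review effort against the risk of missing a good post.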
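After that, sort the marked records by category into a new file. I recommend EditPlus 2 for this, because its Ctrl+R shortcut selects the whole line under the cursor, which makes extracting entire records quick. This step means pressing Ctrl+R, Ctrl+C, and Ctrl+V over and over, so it is bound to be dull manual work.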
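After that, replace the characters in the classic post records that are special in CMD, so that the batch script can process them correctly. Examples are ", <, <<, >, >>, |, ||, & and so on. My approach is to convert them all to their full-width equivalents. For the replacement I recommend 文本整理器 (a text cleanup tool), which handles large files far faster than the Notepad that ships with Windows.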
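The author did this replacement with a GUI tool; as an illustration only, the same idea can be sketched in a few lines of Python. The sketch covers just the characters named above and relies on the fact that each full-width form sits exactly 0xFEE0 code points above its ASCII counterpart. The name neutralize is made up for this example.

```python
# Characters named in the post; extend this string if your data needs more.
CMD_SPECIAL = '"<>|&'

# Full-width forms U+FF01..U+FF5E mirror ASCII 0x21..0x7E at a fixed offset.
FULLWIDTH_OFFSET = 0xFEE0

TO_FULLWIDTH = {ord(ch): chr(ord(ch) + FULLWIDTH_OFFSET) for ch in CMD_SPECIAL}


def neutralize(text: str) -> str:
    """Replace CMD-special ASCII characters with their full-width forms."""
    return text.translate(TO_FULLWIDTH)
```

Because the mapping works per character, doubled operators such as >> and || are covered without separate rules.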
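Finally comes the most technically demanding part: splitting the new file of classic posts, already grouped by category inside it, into one file per category. Each file holds lines of the form post date + post ID + post title + post path + printable-version path, one record per line, ideally with the post title in bold and in color. For the actual code, see 经典帖子索引整理器.cmd (the classic post index organizer script).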
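The real implementation is the .cmd script in the attachment, which is not shown in this post. Purely to illustrate the splitting step, here is a rough Python sketch under several assumptions of mine: categories are introduced by a line like [Category name], each record is "date ID title", the thread and printable URL patterns are guesses at the forum's scheme, and the bold/colored title is written as BBCode. Category names are used directly as file names, so they must not contain characters the file system rejects.

```python
import os

BASE = "http://www.cn-dos.net/forum/"           # forum root, from the post
THREAD_URL = BASE + "viewthread.php?tid={id}"    # assumed URL pattern
PRINT_URL = THREAD_URL + "&action=printable"     # assumed URL pattern


def split_by_category(lines):
    """Group record lines under the most recent '[Category]' header line."""
    groups = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups.setdefault(current, [])
        elif current is not None:
            groups[current].append(line)
    return groups


def format_record(line):
    """Turn 'date id title' into one index line with a bold, colored title."""
    date, post_id, title = line.split(None, 2)
    return "{d} {i} [b][color=blue]{t}[/color][/b] {u} {p}".format(
        d=date, i=post_id, t=title,
        u=THREAD_URL.format(id=post_id),
        p=PRINT_URL.format(id=post_id),
    )


def write_category_files(groups, out_dir):
    """Write one text file per category, one formatted record per line."""
    paths = []
    for name, records in groups.items():
        path = os.path.join(out_dir, name + ".txt")
        with open(path, "w", encoding="utf-8") as handle:
            for record in records:
                handle.write(format_record(record) + "\n")
        paths.append(path)
    return paths
```

Keeping the grouping, formatting, and file writing in separate functions makes it easy to swap in the real header convention or URL scheme once you have checked them against the forum.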
The CHM e-book version (updated every six months; current version 070614) can be downloaded here.
[ Last edited by HAT on 2008-11-9 at 18:26 ]
This post has received +157 points in ratings.
Attachment
1: 经典帖子整理教程(附原始数据及脚本).rar (classic post organizing tutorial, with raw data and scripts) (2007-6-8 13:29, 110.21 KiB, downloads: 9019)

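Every tool has its weaknesses and its strengths; learning CMD well is non-negotiable.
Think through a problem in all its complexity; solve it as simply as possible.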