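Popular RSS
As blogs of every kind have taken off, RSS, one of their signature features, has been embraced by more and more people.
RSS changes the traditional way of browsing, in which people move from one site to the next, page by page, looking for content that interests them.
With customizable, aggregated content, RSS frees readers from hopping between websites and lets them get information faster and more efficiently.
These advantages keep growing the RSS audience, and many major websites, seeing the trend, have added RSS feeds to their pages for subscription.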
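The Trouble with RSS
But RSS inevitably draws away some of a site's page views, and many site administrators mind that,
so a considerable number of sites are still holding back and do not offer RSS content for now. That is bad news for people who have experienced the benefits of RSS and come to depend on it.
If asking gets us nowhere, we will have to fend for ourselves: a few very simple command-line tools are enough to generate RSS automatically.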
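Choosing the Tools
Server-side scripting such as PHP and ASP is heavy machinery: too complex, and it needs a dedicated environment, so it is out. Languages such as C++ have an even higher barrier to entry, so they are out too.
That leaves command-line scripts: easy to learn and use, no special runtime, and light on resources. They are the ones.
CMD alone would struggle with this job, though, so let's bring in a couple of helpers:
CURL: for anything involving uploading or downloading over the network, it is the tool to reach for
ICONV: web pages come in many encodings, and it converts them all to one
With these two on board, a CMD script can do things it could never manage alone.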
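Design Approach
No software can handle every situation, and batch scripts least of all: a script usually works for only one website, or one kind of website.
That makes a clear approach especially important. A good approach can be reused again and again, and adapting it to another site takes little work.
Below, we use the CCF精品技术论坛 (CCF Premium Tech Forum) as an example and try to generate an RSS feed of its new posts.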
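Fetching the Page
To process the content we first need the page, and fetching it is exactly what CURL is for. This one simple command downloads the first page of CCF's 『软件使用』 (Software Usage) subforum: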
curl -b cookie@et8.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -e "http://bbs.et8.net/" -o tmp1.txt http://bbs.et8.net/bbs/forumdisplay.php?f=17^&page=1^&pp=200^&daysprune=-1
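cookie@et8.txt is the cookie file IE saved for this site. For forums and other sites that require logging in, -b sends the stored login cookies, so the request is treated as coming from a logged-in user.
-A sends an Internet Explorer 6 User-Agent string, and -e sets the Referer header to the forum's home page.
Because & separates commands in CMD, each & in the unquoted URL is escaped as ^&.
The -o tmp1.txt option saves the downloaded page to tmp1.txt.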
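A minimal sketch of the same request, assuming you would rather quote the URL than escape every &, and want the script to stop when the download fails. The url variable is a name introduced here for illustration; -f, -s and -S are standard curl options (fail on HTTP errors instead of saving the error page, stay quiet, but still print errors).

```batch
@echo off
rem Sketch: same request, URL in double quotes so no caret escapes are needed.
rem The variable name url is hypothetical.
set "url=http://bbs.et8.net/bbs/forumdisplay.php?f=17&page=1&pp=200&daysprune=-1"
rem -f makes curl fail on HTTP errors such as 404 instead of saving the error page.
curl -f -s -S -b cookie@et8.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -e "http://bbs.et8.net/" -o tmp1.txt "%url%"
if errorlevel 1 (
  echo Download failed with curl exit code %errorlevel%
  exit /b 1
)
```

Inside double quotes CMD treats & as an ordinary character, which is why the quoted form needs no ^ escapes.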
Handling the Encoding
The command line does not handle Unicode well, so we use ICONV to convert the text to the ANSI code page (GB2312 here):
iconv -c -f UTF-8 -t GB2312 tmp1.txt >temp.txt
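This converts the freshly downloaded tmp1.txt from UTF-8 to GB2312 and saves the result as temp.txt. With -c, characters that GB2312 cannot represent are silently dropped instead of stopping the conversion.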
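A small variation, offered as a sketch rather than the author's command: converting to CP936 (GBK) instead of GB2312. GBK contains all of GB2312 and is itself a subset of GB18030, the encoding rss2.xml later declares, so the declaration stays accurate while fewer characters are lost to -c. This assumes an iconv build that accepts the name CP936; the stand-in tmp1.txt is hypothetical and only lets the lines run on their own.

```batch
@echo off
rem Sketch: GBK, code page 936, instead of GB2312. Both are subsets of
rem GB18030, so the encoding declared in rss2.xml stays accurate, but GBK
rem covers far more characters, so -c silently drops fewer of them.
if not exist tmp1.txt (
  rem Hypothetical stand-in input so the sketch runs on its own
  > tmp1.txt echo sample page
)
iconv -c -f UTF-8 -t CP936 tmp1.txt > temp.txt
```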
The Last Post Seen
Since we are subscribing to new posts, we need to know the last post listed on the previous run. Suppose we saved that post's ID in last.txt:
for /f "delims=" %%z in (last.txt) do (set last=%%z)
set lastcheck=!last!
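The ID is kept in two variables, last and lastcheck, read with delayed expansion (!last!, !lastcheck!): last stays fixed as this run's threshold, while lastcheck is raised to the newest ID found.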
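On the very first run last.txt does not exist yet, so last is left undefined and the later GTR comparison has no number to compare against. A hedged, first-run-safe variant of the two lines above, assuming that starting from ID 0 (every listed thread counts as new) is acceptable:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Start from 0 when last.txt does not exist yet, as on the very first run,
rem so that the later GTR comparison always has a number on both sides.
set "last=0"
if exist last.txt (
  rem Default delimiters are space and tab, so tokens=1 also drops a
  rem trailing space that may follow the number in the file.
  for /f "usebackq tokens=1" %%z in ("last.txt") do set "last=%%z"
)
set "lastcheck=!last!"
echo last=!last! lastcheck=!lastcheck!
```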
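Finding New Posts
With the ID of the last post seen, we scan the IDs of the first 200 threads in temp.txt (the page was requested with pp=200) and pick out the new ones by comparison.
We also record the newest ID so it can be written to last.txt for the next run.
For RSS, a title and a link are not enough; we also want some text as a summary. So we use CURL and ICONV again
to fetch each new thread's page and build a complete RSS item, creating the XML by echoing lines into a file.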
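rem Split each thread link at the equals and less-than signs: token 2 should be
rem the text a href, and token 5 starts with the 6-digit thread ID, two more characters, then the title.
for /f "tokens=2-5 delims==<" %%i in ('findstr "showthread.php?s" temp.txt') do (
set rest=%%l
set post=!rest:~0,6!
set title=!rest:~8!
set flag=0
set first=1
if "%%i"=="a href" if /i !post! GTR !last! if NOT "!post!"=="lastpo" (
if !post! GTR !lastcheck! set lastcheck=!post!
echo.
(echo.^<item^>
echo.^<title^>!title!^</title^>
echo.^<link^>http://bbs.et8.net/bbs/showthread.php?t=!post!^</link^>
echo.^<description^>
echo.^<^^![CDATA[
rem The summary goes here: fetch the thread page with curl, convert it with iconv,
rem and echo the first post between the message markers, switched by flag and first.
echo.]]^>
echo.^</description^>
echo.^<pubDate^>^</pubDate^>
echo.^<author^>^</author^>
echo.^</item^>)>>tmp.xml
)
)
>last.txt echo !lastcheck!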
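To see how the for /f line splits a thread link, the sketch below runs the same tokens and delims options over two hypothetical lines written in the shape the script expects: a thread link and a last-post link. The real markup of the forum page may differ; sample.txt, rest and kept are names made up for this demonstration.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Two hypothetical lines: a thread link and a last-post link.
> sample.txt echo ^<td^>^<a href="showthread.php?s=&amp;t=123456"^>Some title^</a^>^</td^>
>> sample.txt echo ^<td^>^<a href="showthread.php?s=&amp;goto=lastpost&amp;t=123456"^>^</a^>^</td^>
set "kept="
for /f "tokens=2-5 delims==<" %%i in ('findstr "showthread.php?s" sample.txt') do (
  set "rest=%%l"
  rem First 6 characters: thread ID. From offset 8 on: the title.
  set "post=!rest:~0,6!"
  set "title=!rest:~8!"
  echo [%%i] post=[!post!] title=[!title!]
  rem Same filter as the script, minus the comparison with last
  if "%%i"=="a href" if not "!post!"=="lastpo" set "kept=!kept! !post!"
)
echo kept:!kept!
```

The thread link yields post=[123456] and title=[Some title]. The last-post link yields post=[lastpo], because its fifth token is lastpost&amp;t, which is why the script has to exclude lastpo explicitly.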
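Looking at the page source, the text of the first post starts with <!-- message --> and ends with <!-- / message -->.
The two delayed-expansion variables flag and first therefore serve as a two-stage switch that controls which lines of text are written out.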
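As a hedged illustration of pulling out the text between the two markers, here is a different technique from the author's flag/first switch: find the line numbers of the markers with findstr /n, then print only the lines between them. thread.txt and summary.txt are hypothetical file names and the sample page is made up; delayed expansion is turned off so that the ! in the markers survives.

```batch
@echo off
setlocal DisableDelayedExpansion
rem Hypothetical sample page standing in for a converted thread page.
> thread.txt echo ^<div^>header^</div^>
>> thread.txt echo ^<!-- message --^>
>> thread.txt echo First line of the post
>> thread.txt echo Second line, with ^& and ^<b^>markup^</b^>
>> thread.txt echo ^<!-- / message --^>
>> thread.txt echo ^<div^>signature^</div^>

rem Line numbers of the first start marker and the first end marker.
set "start="
set "stop="
for /f "delims=:" %%n in ('findstr /n /c:"<!-- message -->" thread.txt') do if not defined start set "start=%%n"
for /f "delims=:" %%n in ('findstr /n /c:"<!-- / message -->" thread.txt') do if not defined stop set "stop=%%n"
if not defined start exit /b 1
if not defined stop exit /b 1

rem findstr /n numbers every line, empty ones included; tokens=1* splits
rem the number from the text at the first colon. Keep lines strictly
rem between the markers.
(
  for /f "tokens=1* delims=:" %%a in ('findstr /n "^" thread.txt') do (
    if %%a GTR %start% if %%a LSS %stop% echo(%%b
  )
) > summary.txt
type summary.txt
```

summary.txt ends up holding only the two lines of the post, with the header and signature left out. One limitation of this approach: a line whose text begins with a colon loses that leading colon.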
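Formatting the XML
Finally, after the channel header is written to rss2.xml, a for loop formats the contents of tmp.xml so that the result conforms to RSS 2.0.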
echo.
(echo.^<?xml version="1.0" encoding="GB18030"?^>
echo.^<rss version="2.0"^>
echo.^<channel^>
echo.^<title^>CCF精品技术论坛^</title^>
echo.^<link^>http://bbs.et8.net/^</link^>
echo.^<description^>CCF ClassiClubForm forum^</description^>
echo.^<language^>zh-cn^</language^>) >rss2.xml
for /f "delims=" %%x in (tmp.xml) do (
if %%x==^<
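Whatever the for loop does with each line, an RSS 2.0 file has to end by closing the channel and rss elements. A minimal sketch, not the author's loop: it rebuilds rss2.xml from the same header, copies the items in tmp.xml unchanged, and closes both elements. It assumes tmp.xml already holds complete item elements. As with the original, the .cmd file has to be saved in the ANSI code page for the Chinese channel title to match the declared encoding.

```batch
@echo off
rem Sketch, not the author's loop: header, then the items, then closing tags.
(
echo ^<?xml version="1.0" encoding="GB18030"?^>
echo ^<rss version="2.0"^>
echo ^<channel^>
echo ^<title^>CCF精品技术论坛^</title^>
echo ^<link^>http://bbs.et8.net/^</link^>
echo ^<description^>CCF ClassiClubForm forum^</description^>
echo ^<language^>zh-cn^</language^>
) > rss2.xml
rem Items are copied as they are; tmp.xml is written by the item loop.
if exist tmp.xml type tmp.xml >> rss2.xml
rem A well-formed RSS 2.0 document must close channel and rss.
(
echo ^</channel^>
echo ^</rss^>
) >> rss2.xml
```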
The Complete Script
@echo off
setlocal ENABLEDELAYEDEXPANSION
curl -b cookie@et8.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -e "http://bbs.et8.net/" -o tmp1.txt http://bbs.et8.net/bbs/forumdisplay.php?f=17^&page=1^&pp=200^&daysprune=-1
iconv -c -f UTF-8 -t GB2312 tmp1.txt >temp.txt
for /f "delims=" %%z in (last.txt) do (set last=%%z)
set lastcheck=!last!
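rem Split each thread link at the equals and less-than signs: token 2 should be
rem the text a href, and token 5 starts with the 6-digit thread ID, two more characters, then the title.
for /f "tokens=2-5 delims==<" %%i in ('findstr "showthread.php?s" temp.txt') do (
set rest=%%l
set post=!rest:~0,6!
set title=!rest:~8!
set flag=0
set first=1
if "%%i"=="a href" if /i !post! GTR !last! if NOT "!post!"=="lastpo" (
if !post! GTR !lastcheck! set lastcheck=!post!
echo.
(echo.^<item^>
echo.^<title^>!title!^</title^>
echo.^<link^>http://bbs.et8.net/bbs/showthread.php?t=!post!^</link^>
echo.^<description^>
echo.^<^^![CDATA[
rem The summary goes here: fetch the thread page with curl, convert it with iconv,
rem and echo the first post between the message markers, switched by flag and first.
echo.]]^>
echo.^</description^>
echo.^<pubDate^>^</pubDate^>
echo.^<author^>^</author^>
echo.^</item^>)>>tmp.xml
)
)
>last.txt echo !lastcheck!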
endlocal
echo.
(echo.^<?xml version="1.0" encoding="GB18030"?^>
echo.^<rss version="2.0"^>
echo.^<channel^>
echo.^<title^>CCF精品技术论坛^</title^>
echo.^<link^>http://bbs.et8.net/^</link^>
echo.^<description^>CCF ClassiClubForm forum^</description^>
echo.^<language^>zh-cn^</language^>) >rss2.xml
for /f "delims=" %%x in (tmp.xml) do (
if %%x==^<
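Save it as et8-rss.cmd, and you can run it by double-clicking.
Using the RSS Feed
Before each reading session, just run et8-rss.cmd once to refresh the feed with the new posts.
The generated rss2.xml can be added to an RSS reader, for example the Sage extension for Firefox: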

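This article is an original work by chenke_ikari, first published on 豆腐的简陋小屋 (Tofu's Humble Hut).
This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China License.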

Last edited by ikari on 2006-8-2 at 13:40