### Popular RSS
With the growing popularity of blogs, RSS, one of their signature features, has been embraced by more and more people.
RSS has changed the traditional way of browsing, in which people hunt for interesting content page by page, from one website to the next. It frees readers from jumping between sites and lets them get information faster through customizable, aggregated content. These advantages keep expanding the RSS readership, and major websites have noticed the trend and added RSS feeds to their pages one after another.
### Regrettable RSS
However, RSS inevitably takes away some of a website's page views, which is a headache for many site administrators. As a result, a considerable number of websites remain cautious and do not offer RSS for now. That is bad news for those who have tasted the benefits of RSS and come to depend on it. Since asking won't get us a feed, the only way left is to be self-sufficient: we can generate the RSS automatically with a few simple command-line tools.
### Tool Selection
Server-side scripting languages such as PHP and ASP are powerful, but they are complicated and need a specially configured environment, so they are ruled out. High-level languages such as C++ have a higher barrier to entry, so they are ruled out too. Command-line scripts, by contrast, are easy to learn and use, need no special runtime, and consume few resources, so that is what we will use. CMD alone would still struggle with this job, so we bring in two helpers:
CURL: the go-to tool for anything involving network uploads and downloads.
ICONV: web pages come in many encodings, and this is the tool we rely on to unify them.
With these two, the CMD script is like a tiger that has grown wings and can do what it otherwise couldn't.
### Design Idea
No software fits every situation, and that is even more true of batch scripts: a script usually handles only one website, or one type of website. A clear design is therefore essential. With a good one, the script can be reused and adapted to other websites with very little work. The following uses the CCF Boutique Technology Forum as an example and generates an RSS feed of its new posts.
#### Obtaining the Page
To process content we first have to fetch the page, which is CURL's job. The following single command fetches the first page of CCF's "Software Usage" sub-forum.
curl -b cookie@et8.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -e "http://bbs.et8.net/" -o tmp1.txt http://bbs.et8.net/bbs/forumdisplay.php?f=17^&page=1^&pp=200^&daysprune=-1
In the command, -b sends the cookies stored in cookie@et8.txt, the IE cookie file, which is how sites such as forums that require a login receive the username and password. -A sets the User-Agent string and -e sets the Referer. The -o tmp1.txt option saves the fetched content to tmp1.txt.
#### Encoding Processing
Because the command line handles Unicode poorly, we use ICONV to convert the text to ANSI:
iconv -c -f UTF-8 -t GB2312 tmp1.txt >temp.txt
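This converts the freshly fetched tmp1.txt from UTF-8 to GB2312 and saves it as temp.txt. The -c option makes ICONV silently drop any character that cannot be converted instead of stopping with an error.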
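To make the effect of the conversion easier to see, here is a minimal Python sketch of the same step. It is only an illustration of what the iconv call does, not part of the CMD script. The function name utf8_to_gb2312 and the sample string are made up for this example.

```python
def utf8_to_gb2312(raw: bytes) -> bytes:
    """Re-encode UTF-8 bytes as GB2312, dropping anything that cannot be converted."""
    # errors="ignore" on both steps plays the role of iconv's -c option:
    # malformed input bytes and characters GB2312 cannot represent are discarded.
    text = raw.decode("utf-8", errors="ignore")
    return text.encode("gb2312", errors="ignore")


if __name__ == "__main__":
    # Hypothetical sample: the emoji has no GB2312 equivalent and is dropped.
    sample = "论坛 RSS \U0001F600".encode("utf-8")
    converted = utf8_to_gb2312(sample)
    print(converted.decode("gb2312"))
```

Anything outside the GB2312 repertoire simply disappears from the output, so a feed built this way can lose rare characters.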
#### Last Visited Post
Since we want a feed of new posts, we need to know which post was the last one listed on the previous run. Suppose its ID is stored in last.txt:
for /f "delims=" %%z in (last.txt) do (set last=%%z)
set lastcheck=!last!
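Two delayed-expansion environment variables, last and lastcheck, hold the ID that was read. last stays fixed as the baseline for comparison, while lastcheck is updated as newer posts are found.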
#### Finding New Posts
Knowing the ID of the post last visited, we can scan temp.txt for the IDs of the first 200 posts on the page and pick out the new ones by comparison. Along the way we record the newest ID and write it to last.txt for the next run. A title and a link are not enough for RSS. We also want the post text as a summary, so we call CURL and ICONV again to fetch each new post's page, and build the XML by echoing to a file.
for /f "tokens=2-5 delims==<" %%i in ('findstr "showthread.php?s" temp.txt') do (
set tmp=%%l
set post=!tmp:~0,6!
set title=!tmp:~8!
set flag=0
set first=1
if "%%i"=="a href" if /i !post! GTR !last! if NOT "!post!"=="lastpo" (
if !post! GTR !lastcheck! set lastcheck=!post!
echo.
(echo.^<item^>
echo.^<title^>!title!^</title^>
echo.^<link^>http://bbs.et8.net/bbs/showthread.php?t=!post!^</link^>
echo.^<description^>
echo.^<
]^>
echo.^</description^>
echo.^<pubDate^>^</pubDate^>
echo.^<author^>^</author^>
echo.^</item^>)>>tmp.xml
)
)
@echo !lastcheck! >last.txt
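The comparison logic of the loop above can be sketched in Python as follows. This is a rough illustration under assumptions: the link pattern THREAD_LINK is a guess at what the forum markup looks like, and the function name find_new_posts is made up for this example. Only the idea is taken from the script: keep posts whose ID is greater than the saved one, and remember the largest ID seen.

```python
import re

# Hypothetical link shape; the real forum markup may differ.
THREAD_LINK = re.compile(r'showthread\.php\?[^"]*?\bt=(\d+)"[^>]*>([^<]+)</a>')


def find_new_posts(html: str, last_id: int):
    """Return (new_posts, newest_id); new_posts holds (id, title) pairs with id > last_id."""
    new_posts = []
    newest_id = last_id  # plays the role of lastcheck in the batch script
    for match in THREAD_LINK.finditer(html):
        post_id = int(match.group(1))
        title = match.group(2).strip()
        if post_id > last_id:  # same test as "!post! GTR !last!"
            new_posts.append((post_id, title))
            newest_id = max(newest_id, post_id)
    return new_posts, newest_id
```

The returned newest_id is the value that would be written back to last.txt, so the next run only reports posts that appeared after it.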
By analyzing the page, we find that the body of the main post is delimited by the start and end markers <!-- message --> and <!-- / message -->. Two delayed-expansion variables, flag and first, therefore act as a pair of switches that control when the text is output.
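The double-switch idea can be sketched in Python like this. It is an interpretation, not the original code: here flag is assumed to mean "currently inside the message body" and first is assumed to mean "the main post has not been read yet", so that replies further down the page are skipped. The markers are assumed to sit on lines of their own.

```python
START, END = "<!-- message -->", "<!-- / message -->"


def extract_first_message(lines):
    """Collect the lines between the first START/END pair, excluding the markers."""
    flag = False   # True while we are inside the message body
    first = True   # True until the first message has been fully read
    body = []
    for line in lines:
        if first and START in line:
            flag = True          # switch output on at the start marker
            continue
        if flag and END in line:
            flag = False         # switch output off at the end marker
            first = False        # and ignore every later message on the page
            continue
        if flag:
            body.append(line.rstrip("\n"))
    return body
```

Given a page containing several posts, only the text of the first one is returned, which is what the feed needs as its summary.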
#### XML Formatting
Finally, a for statement formats tmp.xml so that it conforms to RSS 2.0.
echo.
(echo.^<?xml version="1.0" encoding="GB18030"?^>
echo.^<rss version="2.0"^>
echo.^<channel^>
echo.^<title^>CCF精品技术论坛^</title^>
echo.^<link^>http://bbs.et8.net/^</link^>
echo.^<description^>CCF ClassiClubForm forum^</description^>
echo.^<language^>zh-cn^</language^>) >rss2.xml
for /f "delims=" %%x in (tmp.xml) do (
if %%x==^<
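For reference, the shape of the RSS 2.0 document this step aims for can be sketched in Python. This is an illustration only. The function name build_rss and the dictionary keys are made up for this example, and the standard xml.etree.ElementTree module takes care of the escaping that the batch script has to do by hand.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_desc, items):
    """Build a minimal RSS 2.0 document and return it as a string."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the three required channel elements.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_desc
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
        ET.SubElement(node, "description").text = item["description"]
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical item, using the link format from the script above.
    print(build_rss(
        "CCF", "http://bbs.et8.net/", "New posts",
        [{"title": "Example post",
          "link": "http://bbs.et8.net/bbs/showthread.php?t=123456",
          "description": "Post text goes here"}],
    ))
```

Every item sits inside the single channel element, and the channel's own title, link and description come before the items.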
#### Complete Script
@echo off
setlocal ENABLEDELAYEDEXPANSION
curl -b cookie@et8.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -e "http://bbs.et8.net/" -o tmp1.txt http://bbs.et8.net/bbs/forumdisplay.php?f=17^&page=1^&pp=200^&daysprune=-1
iconv -c -f UTF-8 -t GB2312 tmp1.txt >temp.txt
for /f "delims=" %%z in (last.txt) do (set last=%%z)
set lastcheck=!last!
for /f "tokens=2-5 delims==<" %%i in ('findstr "showthread.php?s" temp.txt') do (
set tmp=%%l
set post=!tmp:~0,6!
set title=!tmp:~8!
set flag=0
set first=1
if "%%i"=="a href" if /i !post! GTR !last! if NOT "!post!"=="lastpo" (
if !post! GTR !lastcheck! set lastcheck=!post!
echo.
(echo.^<item^>
echo.^<title^>!title!^</title^>
echo.^<link^>http://bbs.et8.net/bbs/showthread.php?t=!post!^</link^>
echo.^<description^>
echo.^<
]^>
echo.^</description^>
echo.^<pubDate^>^</pubDate^>
echo.^<author^>^</author^>
echo.^</item^>)>>tmp.xml
)
)
@echo !lastcheck! >last.txt
endlocal
echo.
(echo.^<?xml version="1.0" encoding="GB18030"?^>
echo.^<rss version="2.0"^>
echo.^<channel^>
echo.^<title^>CCF精品技术论坛^</title^>
echo.^<link^>http://bbs.et8.net/^</link^>
echo.^<description^>CCF ClassiClubForm forum^</description^>
echo.^<language^>zh-cn^</language^>) >rss2.xml
for /f "delims=" %%x in (tmp.xml) do (
if %%x==^<
Save it as et8-rss.cmd and double-click it to run.
### RSS Usage
Whenever you want to read, run et8-rss.cmd once to refresh the feed with the new posts. The generated rss2.xml can then be added to an RSS reader, for example the Sage extension for Firefox.

This article is an original work by chenke_ikari, first published on Dofu's Simple Hut.
This article is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China License.

[ Last edited by ikari on 2006-8-2 at 13:40 ]
