China DOS Union

[Original] Generate RSS content of a specified website using command-line scripts
Original Poster · Posted 2006-08-01 23:22 · Chengdu, Sichuan, China (China Telecom)
### Popular RSS

As blogs of all kinds have grown more popular, RSS, one of their standard features, has been adopted and embraced by more and more people.

RSS has changed the traditional way of browsing, in which people hunt for interesting content page by page and site by site. It frees readers from jumping between websites and lets them get information faster through customizable, aggregated feeds. These advantages keep expanding the RSS readership, and major websites have followed the trend by adding RSS feeds to their pages for subscription.

### Regrettable RSS

However, RSS inevitably takes away some of a website's page views, which is a headache for many site administrators. As a result, a considerable number of websites remain cautious and do not offer RSS for now. That is bad news for those who have experienced the benefits of RSS and come to depend on it. If asking does not get you a feed, the only option is to make one yourself: a few simple command-line tools are enough to generate RSS automatically.

### Tool Selection

Server-side scripting languages such as PHP and ASP are powerful, but they are complicated and need a specially configured environment, so they are ruled out. High-level languages such as C++ have a steeper learning curve, so they are ruled out as well. Command-line scripts, by contrast, are easy to learn and use, need no special runtime, and consume few resources, so that is the choice. CMD alone is not quite enough for this job, though, so two helpers are needed:

CURL: the go-to tool for anything involving network uploads and downloads.
ICONV: web pages come in many encodings, and this tool converts them to a single one.

With these two helpers, a CMD script is like a tiger that has grown wings and can do what it otherwise could not.

### Design Idea

No software fits every situation, and that is even more true of batch scripts: a script usually handles only one website, or one type of website. A clear design is therefore essential. With a sound design, the script can be reused and adapted to other websites with very little work. The following uses the CCF Boutique Technology Forum (CCF精品技术论坛) as an example and generates an RSS feed of its new posts.

#### Obtaining the Page

To process the content we first have to fetch the page, which is CURL's job. The following single command fetches the first page of CCF's "Software Usage" sub-forum.


curl -b cookie@et8.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -e "http://bbs.et8.net/" -o tmp1.txt http://bbs.et8.net/bbs/forumdisplay.php?f=17^&page=1^&pp=200^&daysprune=-1


cookie@et8.txt is the IE cookie file, passed with -b. It is useful for sites such as forums that require a login, because it carries the login information. The -A and -e options set the User-Agent and Referer headers, and -o tmp1.txt saves the fetched content to tmp1.txt.
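As a variation on the command above, the URL can be kept in a quoted variable so that each & no longer needs a ^ escape, and the script can stop if the download fails. This is only a sketch: the variable name URL, the -f and -s options, and the error message are additions that do not appear in the original post.

```bat
@echo off
rem Sketch only: URL is a hypothetical variable name, not part of the original script.
rem Quoting the whole assignment keeps cmd from treating each ampersand as a command separator.
set "URL=http://bbs.et8.net/bbs/forumdisplay.php?f=17&page=1&pp=200&daysprune=-1"

rem -f makes curl exit with an error on HTTP failures, -s hides the progress meter.
curl -f -s -b cookie@et8.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -e "http://bbs.et8.net/" -o tmp1.txt "%URL%"

rem errorlevel 1 means curl returned a nonzero exit code.
if errorlevel 1 (
    echo Download failed, leaving the previous feed untouched.
    exit /b 1
)
```

Stopping early matters because the later steps would otherwise parse a stale or empty tmp1.txt.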

#### Encoding Processing

Because the command line handles Unicode poorly, ICONV is used to convert the text to an ANSI code page:


iconv -c -f UTF-8 -t GB2312 tmp1.txt >temp.txt


This converts the freshly downloaded tmp1.txt from UTF-8 to GB2312 and saves the result as temp.txt. The -c option drops any characters that cannot be converted instead of aborting.

#### Last Visited Post

Because the feed should contain only new posts, the script needs to know the last post that was listed on the previous run. Suppose the ID of that post is saved in last.txt:


for /f "delims=" %%z in (last.txt) do (set last=%%z)
set lastcheck=!last!


Two variables, last and lastcheck, hold that ID. The !var! syntax relies on delayed expansion, which the complete script enables with setlocal ENABLEDELAYEDEXPANSION.
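The snippet above assumes last.txt already exists. A minimal sketch of one way to handle the very first run is to seed the file with 0 so that every listed post counts as new. The seed value and the final echo are assumptions added for illustration, not part of the original script.

```bat
@echo off
setlocal ENABLEDELAYEDEXPANSION

rem Assumption: a missing last.txt means this is the first run, so start from ID 0.
rem The redirection is written first so that "0" is not parsed as a handle number.
if not exist last.txt >last.txt echo 0

rem Read the saved ID, exactly as in the original snippet.
for /f "delims=" %%z in (last.txt) do set last=%%z
set lastcheck=!last!

echo Last seen post ID: !last!
endlocal
```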

#### Finding New Posts

With the ID of the last visited post known, the script searches temp.txt for the IDs of the first 200 posts on the forum's first page and picks out the new ones by comparison. It also records the newest ID so that it can be written to last.txt for the next run. A title and a link are not enough for RSS: the post text is needed as a summary, so CURL and ICONV are used again to fetch the page of each new post. The XML itself is produced by echoing lines into a file.


for /f "tokens=2-5 delims==<" %%i in ('findstr "showthread.php?s" temp.txt') do (
set tmp=%%l
set post=!tmp:~0,6!
set title=!tmp:~8!
set flag=0
set first=1
if "%%i"=="a href" if /i !post! GTR !last! if NOT "!post!"=="lastpo" (
if !post! GTR !lastcheck! set lastcheck=!post!
echo.
(echo.^<item^>
echo.^<title^>!title!^</title^>
echo.^<link^>http://bbs.et8.net/bbs/showthread.php?t=!post!^</link^>
echo.^<description^>
echo.^<
rem [several lines are missing here in the archived post]
]^>
echo.^</description^>
echo.^<pubDate^>^</pubDate^>
echo.^<author^>^</author^>
echo.^</item^>)>>tmp.xml
)
)
@echo !lastcheck! >last.txt


Analyzing the page shows that the body of the main post is delimited by <!-- message --> and <!-- / message -->. Two delayed-expansion variables, flag and first, therefore act as a pair of switches that control when the text is output.
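The part of the script that used flag and first did not survive in the archived post, so it cannot be reproduced here. As a rough illustration of the same idea, the sketch below prints the lines between the first pair of markers. It is not the author's method: it swaps the two switches for findstr line numbers, and post.txt, first_line and last_line are hypothetical names for a downloaded thread page and the marker positions.

```bat
@echo off
rem Delayed expansion is turned off so that "!" in the HTML is left alone.
setlocal DISABLEDELAYEDEXPANSION

rem post.txt is a hypothetical file holding one downloaded, converted thread page.
set "first_line="
set "last_line="

rem findstr /n prefixes each match with "number:", so the first token is the line number.
rem "if not defined" keeps only the first match, which is the main post.
for /f "delims=:" %%n in ('findstr /n /c:"<!-- message -->" post.txt') do if not defined first_line set "first_line=%%n"
for /f "delims=:" %%n in ('findstr /n /c:"<!-- / message -->" post.txt') do if not defined last_line set "last_line=%%n"

rem Give up if either marker is missing.
if not defined first_line exit /b 1
if not defined last_line exit /b 1

rem Number every line, then print only those strictly between the two markers.
rem Caveat: a line that itself begins with a colon loses its leading colons.
for /f "tokens=1* delims=:" %%n in ('findstr /n "^" post.txt') do (
    if %%n GTR %first_line% if %%n LSS %last_line% echo.%%o
)
endlocal
```

Redirecting the output of the last loop into a file would give the text for the description element.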

#### XML Formatting

Finally, a for loop formats tmp.xml so that it conforms to RSS 2.0.
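An RSS 2.0 file consists of a channel header, the items, and the closing channel and rss tags. The sketch below covers only the last two parts, under the assumption that rss2.xml already contains the header and tmp.xml the new items. It copies the items unchanged, so it is a simplified stand-in and not the author's formatting loop; deleting tmp.xml afterwards is also an assumption.

```bat
@echo off
rem Assumption: rss2.xml already holds the channel header and tmp.xml the new items.
rem tmp.xml is absent when there were no new posts, hence the existence check.
if exist tmp.xml type tmp.xml >>rss2.xml

rem Close the channel and the feed.
(echo.^</channel^>
echo.^</rss^>) >>rss2.xml

rem Assumption: tmp.xml is scratch space. Items are appended to it on every run,
rem so removing it keeps old items from being repeated next time.
if exist tmp.xml del tmp.xml
```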


echo.
(echo.^<?xml version="1.0" encoding="GB18030"?^>
echo.^<rss version="2.0"^>
echo.^<channel^>
echo.^<title^>CCF精品技术论坛^</title^>
echo.^<link^>http://bbs.et8.net/^</link^>
echo.^<description^>CCF ClassiClubForm forum^</description^>
echo.^<language^>zh-cn^</language^>) >rss2.xml

for /f "delims=" %%x in (tmp.xml) do (
if %%x==^<
rem [the rest of this loop is missing from the archived post]

#### Complete Script


@echo off

setlocal ENABLEDELAYEDEXPANSION

curl -b cookie@et8.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" -e "http://bbs.et8.net/" -o tmp1.txt http://bbs.et8.net/bbs/forumdisplay.php?f=17^&page=1^&pp=200^&daysprune=-1
iconv -c -f UTF-8 -t GB2312 tmp1.txt >temp.txt
for /f "delims=" %%z in (last.txt) do (set last=%%z)
set lastcheck=!last!

for /f "tokens=2-5 delims==<" %%i in ('findstr "showthread.php?s" temp.txt') do (
set tmp=%%l
set post=!tmp:~0,6!
set title=!tmp:~8!
set flag=0
set first=1
if "%%i"=="a href" if /i !post! GTR !last! if NOT "!post!"=="lastpo" (
if !post! GTR !lastcheck! set lastcheck=!post!
echo.
(echo.^<item^>
echo.^<title^>!title!^</title^>
echo.^<link^>http://bbs.et8.net/bbs/showthread.php?t=!post!^</link^>
echo.^<description^>
echo.^<
rem [several lines are missing here in the archived post]
]^>
echo.^</description^>
echo.^<pubDate^>^</pubDate^>
echo.^<author^>^</author^>
echo.^</item^>)>>tmp.xml
)
)

@echo !lastcheck! >last.txt
endlocal

echo.
(echo.^<?xml version="1.0" encoding="GB18030"?^>
echo.^<rss version="2.0"^>
echo.^<channel^>
echo.^<title^>CCF精品技术论坛^</title^>
echo.^<link^>http://bbs.et8.net/^</link^>
echo.^<description^>CCF ClassiClubForm forum^</description^>
echo.^<language^>zh-cn^</language^>) >rss2.xml

for /f "delims=" %%x in (tmp.xml) do (
if %%x==^<
rem [the rest of this loop is missing from the archived post]

Save it as et8-rss.cmd and you can double-click to run it.

### RSS Usage

Whenever you want to read, run et8-rss.cmd once to refresh the feed with the new posts. The generated rss2.xml can be added to an RSS reader, for example the Sage extension for Firefox.



This is an original article by chenke_ikari, first published on Dofu's Simple Hut.
It is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5 China License.


[ Last edited by ikari on 2006-8-2 at 13:40 ]
Floor 2 · Posted 2006-08-02 12:47 · Chengdu, Sichuan, China (Education Network)
Good post! I recommend marking it as a digest thread.
Floor 3 · Posted 2016-08-16 19:59 · Guangzhou, Guangdong, China (China Telecom)
Bump