China DOS Union

-- Unite DOS · Advance DOS · Grow DOS --

Union site: www.cn-dos.net Forum site: www.cn-dos.net/forum
DOS stands for freedom, openness and progress. Let us work hard, learn from the openness and GNU spirit of FreeDOS and Linux, and together build and grow a free GNU GPL world!

[Original] Batch Intelligent Download of Web Page Images [For Discussion] (Views 3,980 · Replies 23)
Original Poster Posted 2008-03-14 17:00 · Ma'anshan, Anhui, China (China Telecom)
Getting online at school is never convenient, so I rarely get the chance to dig into something. I'm posting this to start a discussion, so everyone can talk about batch downloading and "intelligent" downloading together.

Requirement:

I saw some nice pictures on 6dzone.com:
http://www.6dzone.com/photo/wyxc/userphoto.asp?pid=8029
http://www.6dzone.com/photo/wyxc/userphoto.asp?pid=10369
But saving the pictures is really tedious. My connection is extremely slow to begin with, and I have to open one web page after another, then right-click each picture and choose "Save as". So I started thinking about batch downloading.


Analysis:

The most widely used batch picture downloader at the moment seems to be GlobalFetch. It downloads a lot of pictures, but 99% of them are not the ones I want.

Thunder and FlashGet also have batch download functions, but the download addresses must share a common pattern, so they are only suitable for files with addresses like ***001.jpg, ***002.jpg, ***003.jpg...

I haven't found any software that meets my requirements yet, so it is better to write a program that simulates what the user does. After all, the data is all transferred over HTTP, and the HTML code has regularities of its own. These repetitive operations can simply be left to a program.

Every language has its strengths. cmd batch is really convenient and efficient for this kind of job: write it and run it immediately. Doing the same in a high-level language is a lot more trouble. "Why use an ox cleaver to kill a chicken"!


Required command-line tools

curl: a powerful command-line browser and download tool
http://www.cn-dos.net/forum/viewthread.php?tid=20453&fpage=1&highlight=curl

wget: a powerful command-line download tool
http://baike.baidu.com/view/1312507.htm

sed: a powerful command-line stream editor
http://www.cn-dos.net/forum/viewthread.php?tid=24210&fpage=1&highlight=sed%20%2B%20wget%20%2B%20%E6%AD%A3%E5%88%99

To use sed, grep, awk and similar tools well, you need to master regular expressions

Regular expressions
http://www.cn-dos.net/forum/viewthread.php?tid=24206&fpage=1&highlight=sed%20%2B%20wget%20%2B%20%E6%AD%A3%E5%88%99


To solve a problem, you first need a concrete setting; after all, no single approach covers every problem. This one is only for downloading albums from 6dzone. I personally like the Maxthon browser: first add the album URLs you like to the favorites, then export the favorites to a bookmark.htm file.

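Use sed to parse bookmark.htm and pull out the URLs we want.
First keep only the lines that contain album URLs:
sed "/photo/!d" bookmark.htm

Or
sed -n "/photo/p" bookmark.htm

Then take the URL between the quotes. The first substitution removes everything up to and including the first quote; the second removes the closing quote and everything after it. Combined, it is: (the ASCII code of " is 34, or 22 in hexadecimal, which is written \x22 in the regular expression)
sed "/photo/!d;s/^[^\x22]*\x22//;s/\x22.*//" bookmark.htm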
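For anyone following along in a POSIX shell rather than cmd, here is a minimal sketch of the same extraction. It assumes the exported file uses the usual <A HREF="..."> layout, with the URL as the first quoted string on the line. The function name extract_album_urls is made up for illustration. Inside single quotes sh lets you write " directly, so \x22 is not needed there.

```shell
#!/bin/sh
# Print the first double-quoted string of every input line that contains "photo".
# Assumption: the album URL is the first quoted attribute on the line.
extract_album_urls() {
    sed -e '/photo/!d' \
        -e 's/^[^"]*"//' \
        -e 's/".*//'
}
```

Usage would be something like: extract_album_urls < bookmark.htm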



In addition, 6dzone only shows the large pictures to logged-in registered users, so curl has to simulate the user login and save the cookies.

First, analyze the web page source and its form. I recommend the View page plugin; IE's HttpWatch and Firefox's Tamper Data are also very good plugins!

curl generally submits forms in one of two ways, GET or POST, depending on the form's method attribute. You also need to find the form's action and the name of each field to be submitted.

The following is the HTML code of the cn-dos forum login page
----------------------------------------------------------------------------------------------------------
<FORM action=logging.php?action=login method=post><INPUT type=hidden value=28c5c8a4 name=formhash> <INPUT type=hidden value=http://www.cn-dos.net/forum/viewthread.php?tid=22634&amp;fpage=1&amp;highlight=sed%20%2B%20wget%20%2B%20%E6%AD%A3%E5%88%99&amp;sid=FHJYXn name=referer>
<TABLE cellSpacing=0 cellPadding=0 width="99%" align=center border=0>
<TBODY>
<TR>
<TD bgColor=#dde3ec>
<TABLE cellSpacing=1 cellPadding=4 width="100%" border=0>
<TBODY>
<TR class=header>
<TD colSpan=2>Member Login</TD></TR>
<TR>
<TD bgColor=#f8f9fc>Invisible Login:</TD>
<TD class=smalltxt bgColor=#ffffff><SELECT name=loginmode> <OPTION value="" selected>- Use Default -</OPTION> <OPTION value=normal>Normal Mode</OPTION> <OPTION value=invisible>Invisible Mode</OPTION></SELECT> </TD></TR>
<TR>
<TD bgColor=#f8f9fc>Interface Style:</TD>
<TD bgColor=#ffffff><SELECT name=styleid><OPTION value="" selected>- Use Default -</OPTION> <OPTION value=1>Default Style</OPTION></SELECT> </TD></TR>
<TR>
<TD bgColor=#f8f9fc>Cookie Validity Period:</TD>
<TD class=smalltxt bgColor=#ffffff><INPUT type=radio value=31536000 name=cookietime> One Year &nbsp; <INPUT style="BACKGROUND: #ffffcc" type=radio CHECKED value=31536000 name=cookietime> One Month &nbsp; <INPUT type=radio value=86400 name=cookietime> One Day &nbsp; <INPUT type=radio value=0 name=cookietime> Browser Process &nbsp; &nbsp; <A href="faq.php?page=usermaint#2" target=_blank></A></TD></TR>
<TR>
<TD bgColor=#ffffff colSpan=2 height=1></TD></TR>
<TR>
<TD align=middle colSpan=2><FONT color=red>Note:</FONT> For the first time <B>logging in</B> to the converted PHP forum for old users, please repair the password first. For details, please see <A href="http://www.cn-dos.net/forum/announcement.php?id=2#2">Forum Announcement</A>.</TD></TR>
<TR>
<TR>
<TD width="21%" bgColor=#f8f9fc>Username (Required):</TD>
<TD bgColor=#ffffff><INPUT style="BACKGROUND: #ffffcc" tabIndex=1 maxLength=40 size=25 name=username> &nbsp;<SPAN class=smalltxt><A href="register.php">Register Now</A></SPAN></TD></TR>
<TR>
<TD bgColor=#f8f9fc>Password (Required):</TD>
<TD bgColor=#ffffff><INPUT style="BACKGROUND: #ffffcc" tabIndex=2 type=password size=25 value="" name=password> &nbsp;<SPAN class=smalltxt><A href="member.php?action=lostpasswd">Forgot Password</A></SPAN></TD></TR>
----------------------------------------------------------------------------------------------------------

Use curl to log in to the cn-dos forum
curl -d "username=ngd&password=cndos" http://www.cn-dos.net/forum/logging.php?action=login
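Before composing the -d string, it helps to list which field names the form actually declares. The sketch below pulls every name=... attribute out of saved page source. It assumes attribute values are unquoted or double-quoted, as in the dump above; the function name list_form_fields and the file name login.htm are hypothetical.

```shell
#!/bin/sh
# List the distinct field names declared in saved HTML form source.
# Assumption: attributes are separated by spaces, e.g. name=username or name="username".
list_form_fields() {
    # Put every attribute on its own line, keep the name=... ones, de-duplicate.
    tr ' >' '\n\n' |
        sed -n 's/^[Nn][Aa][Mm][Ee]="\{0,1\}\([A-Za-z0-9_]*\).*/\1/p' |
        sort -u
}
```

Usage would be something like: list_form_fields < login.htm. Whether the server insists on hidden fields such as formhash is something to check per site.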


By the way, to open the page and log in directly from a browser, you can use a URL like the following
http://www.cn-dos.net/forum/logging.php?action=login&username=ngd&password=cndos&loginsubmit=.


Save the cookies set at login to 6dzonecookie.txt (the -c option)
curl -c 6dzonecookie.txt


Send the saved cookies with later requests (the -b option)
curl -b 6dzonecookie.txt


Pretend to be an IE browser
curl -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)"


Combined, it is

curl -c 6dzonecookie.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)" -d "username=dddddd6&pwd=cndos" http://www.6dzone.com/user/f_login.asp>nul
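As a rough sketch of how the pieces fit together in a POSIX shell: log in once with -c, then reuse the cookie file with -b on every later request. The function names site_login and site_fetch are invented for this example, and the field names username and pwd are simply the ones used in this post, not verified against the site.

```shell
#!/bin/sh
# Sketch: log in once, then reuse the saved cookies for later requests.
COOKIE_JAR=6dzonecookie.txt
USER_AGENT='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)'

# POST the login form; -c writes the cookies the server sets into the jar.
# $1 = user name, $2 = password (field names as used in this thread)
site_login() {
    curl -s -c "$COOKIE_JAR" -A "$USER_AGENT" \
        -d "username=$1&pwd=$2" \
        'http://www.6dzone.com/user/f_login.asp'
}

# GET a page, sending the cookies saved by site_login with -b.
# $1 = URL to fetch
site_fetch() {
    curl -s -b "$COOKIE_JAR" -A "$USER_AGENT" "$1"
}
```

Usage would be something like: site_login dddddd6 cndos >/dev/null, followed by site_fetch on each picture page URL.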


The simplest usage of wget
wget http://www.cn-dos.net/forum/images/default/logo.gif


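My connection is unstable and extremely slow, so I add a few more options: up to 8 retries (-t 8), a 3-second wait between retrievals (-w 3), a 30-second timeout (-T 30), resuming partial downloads (-c), and skipping files that are not newer than the local copy (-N)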
wget -t 8 -w 3 -T 30 -c -N
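For readability, the same options can be spelled out with GNU wget's long names. This is only a sketch; the wrapper name download_list is made up, and picurl.list is the list file used later in this post.

```shell
#!/bin/sh
# The same options as "wget -t 8 -w 3 -T 30 -c -N -i FILE", in long form.
# $1 = file with one URL per line
download_list() {
    wget --tries=8 --wait=3 --timeout=30 --continue --timestamping \
         --input-file="$1"
}
```

Usage would be something like: download_list picurl.list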


There's not much to say about the rest. The key part is the use of sed; after that, it only takes a nested for loop.

The entire code:

@echo off
rem code by 拟谷盗 for download 6dzone photo.

rem Log in and save the session cookies.
curl -c 6dzonecookie.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)" -d "username=dddddd6&pwd=cndos" http://www.6dzone.com/user/f_login.asp>nul

rem Outer loop: each bookmarked album URL. Inner loop: each picture page of that album.
for /f "delims=" %%a in ('sed "/photo/!d;s/^[^\x22]*\x22//;s/\x22.*//" bookmark.htm') do (
    for /f "usebackq delims=" %%b in (`curl "%%a" ^| sed "/pic_id/!d;s/^[^\x22]*\x22//;s/\x22.*//;s/photo.asp/pic.asp/g;s/\/photo/http:\/\/www.6dzone.com&/g"`) do (
        rem Fetch the picture page with the login cookies and extract the .jpg address.
        curl -b 6dzonecookie.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)" "%%b" | sed "/http:\/\/.*jpg/!d;s/.*http/http/g;s/jpg.*/jpg/g" >>picurl.list
        wget -t 8 -w 3 -T 30 -c -N -i picurl.list
        del picurl.list
    )
)
del 6dzonecookie.txt
exit /b
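The two sed stages inside the nested loop can be tried in isolation on saved page source. The sketch below restates them for a POSIX shell. The markup it expects (one quoted, site-relative href containing pic_id per line, and one absolute .jpg address per line) is a guess based on the expressions in the script, not something checked against 6dzone; the function names are invented.

```shell
#!/bin/sh
# Stage 1: album page HTML -> absolute URLs of the per-picture pages.
# Assumption: each picture link is a quoted href such as
#   href="/photo/wyxc/photo.asp?pic_id=1"
album_to_pic_pages() {
    sed -e '/pic_id/!d' \
        -e 's/^[^"]*"//' -e 's/".*//' \
        -e 's/photo\.asp/pic.asp/g' \
        -e 's|^/photo|http://www.6dzone.com&|'
}

# Stage 2: picture page HTML -> the .jpg address on each matching line.
pic_page_to_jpg() {
    sed -e '/http:\/\/.*jpg/!d' -e 's/.*http/http/' -e 's/jpg.*/jpg/'
}
```

In stage 1 the & in the replacement stands for the matched text, so /photo is kept and the host name is put in front of it.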

Looking it over, it's written rather messily. I'll leave it like this for now and tidy it up later.

The code is also not concise enough. Could any expert help improve it? That would make downloading web page pictures much more convenient in the future!

Download: curl + wget + sed + bat file
http://upload.cn-dos.net/img/095.rar
Extract it and run the bat file to download the pictures

[ Last edited by ngd on 2008-3-15 at 02:05 PM ]
Recent ratings for this post (2 in total):
Rater      Score  Time
plp626     +4     2008-03-14 18:21
junchen2   +4     2008-03-15 02:22
FLOSS
Floor 2 Posted 2008-03-14 18:21 · Xi'an, Shaanxi, China (China Telecom)
I'm learning wget and curl, and posts like this from the OP make very useful examples. I've added points to encourage the OP to keep contributing.

[ Last edited by plp626 on 2008-3-14 at 06:41 PM ]
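There is always a higher mountain and always someone more capable; keep a low profile and work hard.

Enter the net disk (various tools) ~~ Space ~~ cmd learning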
Floor 3 Posted 2008-03-14 19:51 · Xi'an, Shaanxi, China (China Telecom)
A question for the OP (I want to write a batch file to download songs in bulk):

The URLs of Baidu's MP3 songs seem to be encrypted. There was an earlier thread in the forum where vkill posted a batch file for downloading Baidu songs, but it never worked in my tests, and I didn't understand it.
For example, search Baidu for "不得不爱" ("Must Love").
The URL of the third song result in the page source is:
http://www.yantaiblog.net/UploadFiles/2007-7/aGhmbGtmbGsx.mp3
whereas the actual link address is:
http://www.yantaiblog.net/UploadFiles/2007-7/76386053.mp3
Searching the whole page source with Ctrl+F, I can't find the string 76386053 anywhere.

Please advise: how can I get this link address directly?
Thank you very much!
Floor 4 Posted 2008-03-14 23:30 · Ma'anshan, Anhui, China (China Telecom)
At first I thought the XX website was using VirtualWall anti-hotlinking technology, but it turns out to be Baidu's own encryption algorithm. Finding the actual address directly from the search results doesn't seem possible.

So I still have to follow a link from the search results and crawl one level further. That link is very long and full of symbols, and curl didn't work on it for me. Instead I use wget "url" -O file to save a temporary file, and then look for the actual mp3 address in that file.

wget -O tmp.htm "http://220.181.38.82/m?ct=134217728&tn=baidusg,不得不爱 &word=mp3,http://www.yantaiblog.net/UploadFiles/2007-7/aGhmbGtmbGsx.mp3,,&si=%B2%BB%B5%C3%B2%BB%B0%AE;;%C5%CB%E7%E2%B0%D8;;51913;;51913&lm=16777216"


sed "/www.*mp3/!d;s/.*www/http:\/\/www/;s/mp3.*/mp3/" tmp.htm

You can give it a try. It's a bit of a hassle, but it works!
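To see what the sed line above does without hitting the network, here is a small sketch that can be fed a saved page. The function name extract_mp3_url is made up. Note that .*www is greedy, so on a line holding several addresses only the text after the last "www" survives.

```shell
#!/bin/sh
# Reduce every line that contains "www...mp3" to a bare http://www...mp3 address.
extract_mp3_url() {
    sed -e '/www.*mp3/!d' \
        -e 's/.*www/http:\/\/www/' \
        -e 's/mp3.*/mp3/'
}
```

Usage would be something like: extract_mp3_url < tmp.htm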



Looking forward to seeing your program soon, and wishing you all the best!
Floor 5 Posted 2008-03-14 23:45 · Fuzhou, Fujian, China (China Unicom)
Used well, these tools really are extraordinarily powerful
Floor 6 Posted 2008-03-14 23:47 · Xi'an, Shaanxi, China (China Telecom)
Thanks!
My basics are still very weak, but with the ideas you've provided, brother, I'm quite confident about this. I'm not familiar with wget yet,
so I'll keep studying. I believe practice makes perfect...
Floor 7 Posted 2008-03-15 00:16 · Ma'anshan, Anhui, China (China Telecom)
o(∩_∩)o... Let's make progress together.
It's rare to meet like-minded friends. What we are all after is freedom, openness, and sharing!
I get the feeling that many users register on the forum just to ask questions and don't seem to have the habit of searching first ~_~

Also, I seem to remember you were counting how many students there are on the forum.
I'm reporting in here.

Education in China always makes me indignant! In college I never know what I should be doing. For all the courses I've taken, I can't see how they will be used in future work. For me, if I don't know what studying something will achieve, there's no point in studying it. Browsing a forum like this is still the better choice!
Floor 8 Posted 2008-03-15 03:09 · Xi'an, Shaanxi, China (China Telecom)
o(∩_∩)o... True friends are hard to find; being able to chat together is fate.

Only now, about to graduate, do I really appreciate how happy student life is... but sighing is useless.
When you first enter college you may not know what to do, and by the time you do know, there's no time left. So learn as many useful things as you can.

There is no turning back in life. I hope that you, brother, leave giant footprints during your college years ^_^

oooO.............
(....)... Oooo...
.\..(.....(.....)...
..\_)..... )../....
.......... (_/.....

[ Last edited by plp626 on 2008-3-15 at 03:11 AM ]
Floor 9 Posted 2008-03-15 11:49 · Haidian District, Beijing, China (China Unicom)
Just search the page for "Song source:"; the URL that follows it is the one you want.
Floor 10 Posted 2008-03-15 11:54 · Nanchang, Jiangxi, China (China Telecom)
Floor 11 Posted 2008-03-15 12:01 · Haidian District, Beijing, China (China Unicom)
Got it.
Just use
http://mp3.baidu.com/u?u= followed by the encrypted address
and you can download it directly with curl/wget.

Next, the decryption method from CSDN (since Baidu can decrypt it by itself, there is no need to write our own):
#!/bin/sh
# Author Liang
# Modified at June 14 2004
#
rm mp3.list html.list link.list mp3topsong.html
wget http://list.mp3.baidu.com/topso/mp3topsong.html
cat mp3topsong.html | tr \" \\n | grep htm$ >html.list

CC=1
for VAL in `cat html.list`
do
wget http://list.mp3.baidu.com/topso/$VAL -O $CC.html

cat $CC.html | tr \" \\n | grep mp3\$ | grep http | head -1 >> mp3.list
echo -ne "$CC " >> link.list
cat $CC.html | tr \" \\n | grep mp3\$ | grep http | head -1 >> link.list


CC=`expr $CC + 1`

done

CC=1
for VAL in `cat mp3.list`
do

echo $CC
wget $VAL -O $CC.mp3
echo $VAL

CC=`expr $CC + 1`

done

more mp3topsong.html | sed "s/target=_blank/\\n/g" | grep ^\> | grep href | cut -f1 -d\< | cut -f2 -d\> |grep ^ > name.list

CC=1
while
do
NAME=`grep ^$CC' ' name.list | gawk '{print $2}'`
LINK=`grep ^$CC' ' link.list | gawk '{print $2}'`

echo $NAME $LINK
CC=`expr $CC + 1`
done
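The core trick in the script above is turning every double quote into a newline, so that each attribute value ends up on its own line and plain grep can pick out the links. A minimal sketch of just that step, runnable on any saved page; the function name first_mp3_link is invented for illustration:

```shell
#!/bin/sh
# Split HTML on double quotes so each attribute value sits on its own line,
# then keep the first value that ends in "mp3" and contains "http".
first_mp3_link() {
    tr '"' '\n' | grep 'mp3$' | grep http | head -n 1
}
```

Usage would be something like: first_mp3_link < 1.html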
Floor 12 Posted 2008-03-15 21:01 · Ma'anshan, Anhui, China (China Telecom)

Quote: Directly search and locate "Song source:", and the URL follows it

Speechless.

Quote: Got it. Directly use http://mp3.baidu.com/u?u= followed by the encrypted address. You can download it directly with curl/wget

Has the poster above actually tried it and succeeded!?

Quote: Then the decryption method from CSDN (since Baidu can decrypt it by itself, there's no need to write our own.)

It would be best if you could write out the decryption algorithm.

I got a bit excited when I saw this bash code, and had planned to "translate" it into a cmd batch file to get a feel for the differences between the two shells.
But then the poster above let everyone down again. The 3rd line is wrong (not counting comment lines and blank lines); after all, it is code from 2004 and no longer works. Lines 3~17 handle the mp3 download, but there is no decryption algorithm anywhere in them. ~_~!!
Has the poster above actually read the code!?

[ Last edited by ngd on 2008-3-16 at 11:29 AM ]
Floor 13 Posted 2008-03-15 21:12 · Ma'anshan, Anhui, China (China Telecom)
Originally posted by plp626 at 2008-3-15 03:09:
o(∩_∩)o... There are few bosom friends, and it's fate to chat together.

When it's time to know what to do, time is not...

Congratulations on finding your sense of direction in study and life o(∩_∩)o...
Man's most precious thing is life. Life is given only once to each person. In one's lifetime, he should not regret having wasted his youth, nor be ashamed of having done nothing.

[ Last edited by ngd on 2008-3-15 at 09:15 PM ]
Floor 14 Posted 2008-03-16 19:00 · Haidian District, Beijing, China (China Unicom)
Got it. Directly use http://mp3.baidu.com/u?u= followed by the encrypted address. You can download it directly with curl/wget.
Tried it, and it worked.
Floor 15 Posted 2008-03-17 00:33 · Ma'anshan, Anhui, China (China Telecom)
Originally posted by knoppix7 at 2008-3-16 19:00:
Got it. Directly use the http://mp3.baidu.com/u?u= encrypted address.
You can download it directly with curl/wget.
Tried and succeeded.

Successfully downloaded error.html