Getting online at school is inconvenient, so I rarely get the chance to dig into something. I'm posting what I worked out to start a discussion, so everyone can talk about batch downloading and smarter downloading together.
Requirement:
I saw some nice pictures on 6dzone.com,
http://www.6dzone.com/photo/wyxc/userphoto.asp?pid=8029
http://www.6dzone.com/photo/wyxc/userphoto.asp?pid=10369
But saving the pictures is a real chore. My connection is extremely slow to begin with, and I have to open one web page after another, right-click each picture, and choose "Save as". So I started looking into batch downloading.
Analysis:
The most widely used batch picture downloader at the moment seems to be GlobalFetch. It downloads a lot of pictures, but 99% of them are not the ones I want.
Thunder and FlashGet also have batch download features, but they need download addresses that follow a common pattern, so they only suit files with addresses like ***001.jpg, ***002.jpg, ***003.jpg...
I haven't found software that meets my requirements, so it is better to write a program that simulates what the user does. After all, everything travels over HTTP, and the HTML code has regularities that can be exploited. The repetitive work can simply be left to the program.
Every language has its strengths. For this kind of job, cmd batch is convenient and efficient: write it and run it right away. Doing the same in a high-level language is far more trouble. "Why use a butcher's knife to kill a chicken"!
Required command-line tools:
curl: a powerful command-line HTTP client and download tool
http://www.cn-dos.net/forum/viewthread.php?tid=20453&fpage=1&highlight=curl
wget: a powerful command-line download tool
http://baike.baidu.com/view/1312507.htm
sed: a powerful command-line stream editor
http://www.cn-dos.net/forum/viewthread.php?tid=24210&fpage=1&highlight=sed%20%2B%20wget%20%2B%20%E6%AD%A3%E5%88%99
To use sed, grep, awk and similar tools well, you need to master regular expressions.
Regular expressions:
http://www.cn-dos.net/forum/viewthread.php?tid=24206&fpage=1&highlight=sed%20%2B%20wget%20%2B%20%E6%AD%A3%E5%88%99
To solve a problem you first have to fix the environment, since no single approach covers every case. This one only targets album downloads from 6dzone. I like to use the Maxthon browser: first add the album URLs I like to the favorites, then export the favorites to a bookmark.htm file.
Use sed to parse bookmark.htm and extract the URLs we want.
Find the lines that contain the URLs:
sed "/photo/!d" bookmark.htm
Or:
sed -n "/photo/p" bookmark.htm
Then pull out the URL between the quotes. Combined, it is (the ASCII value of " is 34, which is written \x22 in the regular expression):
sed "/photo/!d;s/[^\x22]*\x22//;s/\x22.*//" bookmark.htm
The first substitution deletes everything up to and including the first quote, and the second deletes the closing quote and everything after it.
In addition, 6dzone only shows the large pictures to registered users who are logged in, so curl has to simulate the login and save the cookies.
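As a quick check of the URL-extraction step described above, here is a minimal sketch in a Unix shell. The sample bookmark lines are made up for illustration (a real Maxthon export may be laid out differently), and because single quotes protect the expression in sh, the quote character is written literally instead of as \x22.

```shell
# Hypothetical lines from an exported bookmark file (layout assumed).
bookmarks='<DT><A HREF="http://www.6dzone.com/photo/wyxc/userphoto.asp?pid=8029" ADD_DATE="0">album one</A>
<DT><A HREF="http://www.example.com/news/index.html" ADD_DATE="0">not an album</A>
<DT><A HREF="http://www.6dzone.com/photo/wyxc/userphoto.asp?pid=10369" ADD_DATE="0">album two</A>'

# Keep only lines mentioning "photo", drop everything through the first
# quote, then drop the closing quote and the rest of the line.
extract_album_urls() {
    sed -e '/photo/!d' -e 's/[^"]*"//' -e 's/".*//'
}

album_urls=$(printf '%s\n' "$bookmarks" | extract_album_urls)
printf '%s\n' "$album_urls"
```

Only the two album addresses are printed; the unrelated bookmark is dropped by the /photo/ filter.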
First, analyze the page source and its form. The View page plugin is recommended for this; HttpWatch for IE and TamperData for Firefox are also very good plugins!
curl generally submits forms in one of two ways, GET or POST, depending on the form's method attribute. You also need to find the form's action and the name of each field to submit.
The following is the HTML code of the cn-dos forum login page:
----------------------------------------------------------------------------------------------------------
<FORM action=logging.php?action=login method=post><INPUT type=hidden value=28c5c8a4 name=formhash> <INPUT type=hidden value=http://www.cn-dos.net/forum/viewthread.php?tid=22634&fpage=1&highlight=sed%20%2B%20wget%20%2B%20%E6%AD%A3%E5%88%99&sid=FHJYXn name=referer>
<TABLE cellSpacing=0 cellPadding=0 width="99%" align=center border=0>
<TBODY>
<TR>
<TD bgColor=#dde3ec>
<TABLE cellSpacing=1 cellPadding=4 width="100%" border=0>
<TBODY>
<TR class=header>
<TD colSpan=2>Member Login</TD></TR>
<TR>
<TD bgColor=#f8f9fc>Invisible Login:</TD>
<TD class=smalltxt bgColor=#ffffff><SELECT name=loginmode> <OPTION value="" selected>- Use Default -</OPTION> <OPTION value=normal>Normal Mode</OPTION> <OPTION value=invisible>Invisible Mode</OPTION></SELECT> </TD></TR>
<TR>
<TD bgColor=#f8f9fc>Interface Style:</TD>
<TD bgColor=#ffffff><SELECT name=styleid><OPTION value="" selected>- Use Default -</OPTION> <OPTION value=1>Default Style</OPTION></SELECT> </TD></TR>
<TR>
<TD bgColor=#f8f9fc>Cookie Validity Period:</TD>
<TD class=smalltxt bgColor=#ffffff><INPUT type=radio value=31536000 name=cookietime> One Year <INPUT style="BACKGROUND: #ffffcc" type=radio CHECKED value=31536000 name=cookietime> One Month <INPUT type=radio value=86400 name=cookietime> One Day <INPUT type=radio value=0 name=cookietime> Browser Process <A href="faq.php?page=usermaint#2" target=_blank></A></TD></TR>
<TR>
<TD bgColor=#ffffff colSpan=2 height=1></TD></TR>
<TR>
<TD align=middle colSpan=2><FONT color=red>Note:</FONT> For the first time <B>logging in</B> to the converted PHP forum for old users, please repair the password first. For details, please see <A href="http://www.cn-dos.net/forum/announcement.php?id=2#2">Forum Announcement</A>.</TD></TR>
<TR>
<TR>
<TD width="21%" bgColor=#f8f9fc>Username (Required):</TD>
<TD bgColor=#ffffff><INPUT style="BACKGROUND: #ffffcc" tabIndex=1 maxLength=40 size=25 name=username> <SPAN class=smalltxt><A href="register.php">Register Now</A></SPAN></TD></TR>
<TR>
<TD bgColor=#f8f9fc>Password (Required):</TD>
<TD bgColor=#ffffff><INPUT style="BACKGROUND: #ffffcc" tabIndex=2 type=password size=25 value="" name=password> <SPAN class=smalltxt><A href="member.php?action=lostpasswd">Forgot Password</A></SPAN></TD></TR>
----------------------------------------------------------------------------------------------------------
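The three things to read off a form like the one above (method, action, and field names) can also be pulled out mechanically. This is a minimal sketch in a Unix shell, run against a shortened copy of that form. It assumes unquoted attributes as in this listing, and a grep that supports -o, which GNU and BSD grep both do.

```shell
# Shortened copy of the login form shown above.
form='<FORM action=logging.php?action=login method=post><INPUT type=hidden value=28c5c8a4 name=formhash>
<SELECT name=loginmode> <OPTION value="" selected>- Use Default -</OPTION></SELECT>
<INPUT tabIndex=1 maxLength=40 size=25 name=username>
<INPUT tabIndex=2 type=password size=25 value="" name=password>'

# Submission method and target of the form.
form_method=$(printf '%s\n' "$form" | sed -n 's/.*<FORM[^>]* method=\([a-z]*\).*/\1/p')
form_action=$(printf '%s\n' "$form" | sed -n 's/.*<FORM action=\([^ >]*\).*/\1/p')

# Every field name, one per line, in document order.
field_names=$(printf '%s\n' "$form" | grep -o 'name=[A-Za-z]*' | sed 's/name=//')

printf 'method=%s action=%s\n' "$form_method" "$form_action"
printf '%s\n' "$field_names"
```

The method tells you whether curl needs -d (POST) or a query string (GET), and the field names are the keys that go into the submitted data.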
Use curl to log in to the cn-dos forum:
curl -d "username=ngd&password=cndos" http://www.cn-dos.net/forum/logging.php?action=login
By the way, to open the page and log in directly from the browser, the following URL can be used:
http://www.cn-dos.net/forum/logging.php?action=login&username=ngd&password=cndos&loginsubmit=.
Save the login cookie to 6dzonecookie.txt:
curl -c 6dzonecookie.txt
Use the cookie file:
curl -b 6dzonecookie.txt
Pretend to be an IE browser:
curl -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)"
Combined, it is:
curl -c 6dzonecookie.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)" -d "username=dddddd6&pwd=cndos" http://www.6dzone.com/user/f_login.asp>nul
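The simplest usage of wget:
wget http://www.cn-dos.net/forum/images/default/logo.gif
My network is not very stable and the connection is extremely slow, so add a few more parameters:
wget -t 8 -w 3 -T 30 -c -N
Here -t 8 retries up to 8 times, -w 3 waits 3 seconds between retrievals, -T 30 sets a 30-second timeout, -c resumes partial downloads, and -N skips files that are not newer than the local copy.
There is not much to say about the rest; it is mainly about using sed. Just wrap everything in nested for loops.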
The entire code:
@echo off
rem code by 拟谷盗 for downloading 6dzone photos.
rem Log in once and save the session cookie.
curl -c 6dzonecookie.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)" -d "username=dddddd6&pwd=cndos" http://www.6dzone.com/user/f_login.asp >nul
rem Outer loop: one album URL per matching line of bookmark.htm.
for /f "delims=" %%a in ('sed "/photo/!d;s/[^\x22]*\x22//;s/\x22.*//" bookmark.htm') do (
  rem Inner loop: turn each pic_id link on the album page into a full pic.asp URL.
  for /f "usebackq delims=" %%b in (`curl "%%a" ^| sed "/pic_id/!d;s/[^\x22]*\x22//;s/\x22.*//;s/photo.asp/pic.asp/g;s/\/photo/http:\/\/www.6dzone.com&/g"`) do (
    rem Fetch the picture page with the cookie and keep only the .jpg address.
    curl -b 6dzonecookie.txt -A "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.01)" "%%b" | sed "/http:\/\/.*jpg/!d;s/.*http/http/g;s/jpg.*/jpg/g" >>picurl.list
    wget -t 8 -w 3 -T 30 -c -N -i picurl.list
    del picurl.list
  )
)
del 6dzonecookie.txt
exit /b
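To see what the two sed steps inside the inner loop do, the sketch below runs them in a Unix shell on made-up input. Both sample lines are assumptions for illustration, not captured from 6dzone: an album page is assumed to link to each picture with a relative photo.asp?pic_id=... address, and a picture page is assumed to embed the .jpg address in an img tag. The quote is written literally because sh single quotes make \x22 unnecessary.

```shell
# Assumed link as it might appear on an album page (hypothetical).
album_line='<a href="/photo/wyxc/photo.asp?pic_id=4711"><img src="thumb.gif"></a>'

# Step 1: isolate the quoted address, switch photo.asp to pic.asp,
# and prefix the site so the relative path becomes a full URL.
to_picture_page() {
    sed -e '/pic_id/!d' -e 's/[^"]*"//' -e 's/".*//' \
        -e 's/photo.asp/pic.asp/g' \
        -e 's/\/photo/http:\/\/www.6dzone.com&/g'
}

# Assumed line from the picture page (hypothetical host and path).
picture_line='<td><img src="http://img.example.com/2008/4711.jpg" border=0></td>'

# Step 2: keep lines holding an http://...jpg address and trim both sides.
to_jpg_url() {
    sed -e '/http:\/\/.*jpg/!d' -e 's/.*http/http/g' -e 's/jpg.*/jpg/g'
}

page_url=$(printf '%s\n' "$album_line" | to_picture_page)
jpg_url=$(printf '%s\n' "$picture_line" | to_jpg_url)
printf '%s\n%s\n' "$page_url" "$jpg_url"
```

Because `.*http` is greedy, a line carrying several addresses keeps only the last one, which is fine as long as each picture page holds a single large image.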
Looking it over, it's written rather messily. It will do for now; I'll tidy it up later.
The code is also not concise enough. Could any expert help improve it? Then downloading pictures from web pages will be convenient in the future!


curl+wget+sed+bat file download
http://upload.cn-dos.net/img/095.rar
Extract it and run the bat file to download pictures
[ Last edited by ngd on 2008-3-15 at 02:05 PM ]