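Visiting a user's homepage leaves a trace in that user's list of recent visitors. If a program crawls every user's profile page, your footprint will show up on a great many of them.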
This time I am again using curl, mainly because Fiddler can save a captured session as a curl script, so the HTTP headers do not have to be built by hand.
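From my own trial run, a single curl process sent 70,000 requests in 14 hours, which works out to 70,000 / 840 minutes ≈ 83 requests per minute, or roughly 1.4 per second. That figure was measured on a good, stable network connection.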
Chocolatey is very slow, so this time I am introducing another new toy, Batch-CN. Introduction page:
http://www.bathome.net/thread-32322-1-1.html
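Batch-CN is a manager for third-party command-line software on Windows, developed and maintained in China. For Windows batch scripting it handles downloading and installing a wide range of third-party command-line tools, managing them, and viewing their usage instructions. It is similar to apt-get on Linux and to Chocolatey, a comparable package manager on Windows.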
After installing Batch-CN, a single command downloads and installs curl, and it does so quickly:
gt curl
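Capture traffic with Fiddler while visiting a user's homepage, save the session as a curl script, and open the resulting xxx.bat in Notepad. It contains this single command: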
curl -k -i --raw -o 0.dat "https://www.shiyanlou.com/user/1" -H "Host: www.shiyanlou.com" -H "Connection: keep-alive" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" -H "DNT: 1" -H "Referer: https://www.shiyanlou.com/courses/1" -H "Accept-Encoding: gzip, deflate, sdch" -H "Accept-Language: zh-CN,zh;q=0.8,en;q=0.6" -H "Cookie: remember_token=71302|adf71304sdfe82ce1f6c15484f53b0797c60f02e; session=c8eb7ggg-2e8f-4c9c-ab4b-9802a4ea898b.Jz0GE_43ZH_cJjK_7h--gtNw22k"
With a small modification, using set /a n=%n%+1 to increment the user ID on each pass, the script can enumerate every user ID:
@echo off
rem Start at 0: the counter is incremented before each request, so the first ID fetched is 1
set n=0
:g
set /a n=%n%+1
echo %n%
curl -k -i --raw -o 0.dat "https://www.shiyanlou.com/user/%n%" -H "Host: www.shiyanlou.com" -H "Connection: keep-alive" -H "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" -H "Upgrade-Insecure-Requests: 1" -H "User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36" -H "DNT: 1" -H "Referer: https://www.shiyanlou.com/courses/1" -H "Accept-Encoding: gzip, deflate, sdch" -H "Accept-Language: zh-CN,zh;q=0.8,en;q=0.6" -H "Cookie: remember_token=71302|gggg30497a982ce1f6c15484f53b0797c60f02e; session=gggg7ac5-2e8f-4c9c-ab4b-9802a4ea898b.Jz0GE_43ZH_cJjK_7h--gtNw22k"
rem No exit condition: the loop runs until the window is closed or Ctrl+C is pressed
goto g
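Running several copies of the batch file at once sends requests from multiple processes in parallel. Eventually, though, this will most likely get both the account and the IP address banned.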