|
740011611
初级用户
 
积分 96
发帖 83
注册 2009-6-4 来自 信阳
状态 离线
|
『楼 主』:
怎样把此论坛上所有网页下载下来,且能脱机使用?
怎样把此论坛上所有网页下载下来,且能脱机使用?
[ Last edited by 740011611 on 2010-3-19 at 18:33 ]
How can I download all the web pages of this forum so that they can be used offline?
[ Last edited by 740011611 on 2010-3-19 at 18:33 ]
|
|
2010-3-10 22:37 |
|
|
radem
高级用户
    CMD感染者
积分 691
发帖 383
注册 2008-5-23
状态 离线
|
|
2010-3-11 01:33 |
|
|
000000000000000
初级用户
 
积分 49
发帖 42
注册 2009-11-26
状态 离线
|
|
2010-3-11 03:10 |
|
|
rs369007
初级用户
 
积分 147
发帖 131
注册 2008-9-22
状态 离线
|
『第 4 楼』:
百度搜索 全站下载 。。。
Search Baidu for "whole-site download"...
|

freedom! |
|
2010-3-11 21:40 |
|
|
740011611
初级用户
 
积分 96
发帖 83
注册 2009-6-4 来自 信阳
状态 离线
|
『第 5 楼』:
谢谢各位,是不太容易实现!只能手工操作了。
Thanks, everyone. So it isn't that easy to do! I'll just have to do it by hand.
|
|
2010-3-12 16:25 |
|
|
onlyu2000
新手上路

积分 15
发帖 14
注册 2008-9-2
状态 离线
|
|
2010-3-12 22:15 |
|
|
740011611
初级用户
 
积分 96
发帖 83
注册 2009-6-4 来自 信阳
状态 离线
|
|
2010-3-14 19:37 |
|
|
740011611
初级用户
 
积分 96
发帖 83
注册 2009-6-4 来自 信阳
状态 离线
|
|
2010-3-16 12:51 |
|
|
HAT
版主
       
积分 9023
发帖 5017
注册 2007-5-31
状态 离线
|
『第 9 楼』:
www.google.cn
注:
并不是希望楼主通过google搜索就能马上掌握wget或者curl的用法,而是希望楼主能搜索到“整站下载”之类的N年前就到处可见的工具。
[ Last edited by HAT on 2010-3-26 at 17:17 ]
www.google.cn
Note:
The point is not that the OP should master wget or curl right away through a Google search, but that a search would turn up the kind of "whole-site download" tools that have been around everywhere for years.
[ Last edited by HAT on 2010-3-26 at 17:17 ]
|

 |
|
2010-3-16 13:42 |
|
|
523066680
银牌会员
     SuperCleaner
积分 2362
发帖 1133
注册 2008-2-2
状态 离线
|
『第 10 楼』:
见过有人做过下载网志所有文章的脚本
是通过下载主页面,然后分析里面的链接
(像下一篇文章,下一页什么的,相关链接提取出来),再进一步下载,循环渐进的。
[ Last edited by 523066680 on 2010-3-16 at 13:44 ]
I've seen someone write a script that downloads every article on a blog: it fetches the main page, parses out the links in it (next article, next page, and other related links), then downloads those and repeats, working through the site step by step.
[ Last edited by 523066680 on 2010-3-16 at 13:44 ]
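A minimal sketch of that approach in batch, assuming GNU wget and sed are on the PATH (the same tools used in the scripts later in this thread). The board number fid=1 and the file names are placeholders, and the sed expression only picks up one link per HTML line, so this is a starting point rather than a complete crawler:
@echo off
:: Sketch: fetch one board's listing page, extract the viewthread.php?tid=N
:: links with sed, then download each thread's printable version.
:: fid=1 and the file names below are placeholders, not real values.
wget -q -O list.html "http://www.cn-dos.net/forum/forumdisplay.php?fid=1"
for /f %%a in ('sed -n "s/.*viewthread\.php?tid=\([0-9]*\).*/\1/p" list.html') do (
    if not exist "thread_%%a.html" wget -q -O "thread_%%a.html" "http://www.cn-dos.net/forum/viewthread.php?tid=%%a&action=printable"
)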
|

综合型编程论坛
我的作品索引 |
|
2010-3-16 13:43 |
|
|
740011611
初级用户
 
积分 96
发帖 83
注册 2009-6-4 来自 信阳
状态 离线
|
『第 11 楼』:
斑竹,搜索整站下载软件吗?不知道哪种好用?我就想把此论坛的网页下载下来就行了。斑竹给个方法?????急用啊!
[ Last edited by 740011611 on 2010-3-19 at 18:34 ]
Moderator, you mean search for whole-site download software? I don't know which one works well. All I want is to download this forum's pages. Moderator, could you give me a method????? It's urgent!
[ Last edited by 740011611 on 2010-3-19 at 18:34 ]
|
|
2010-3-16 21:33 |
|
|
plp626
银牌会员
     钻石会员
积分 2278
发帖 1020
注册 2007-11-19
状态 离线
|
『第 12 楼』:
我网盘里(就我签名那个)有个绿色的整站下载工具,在C盘soft目录那个pro.rar,比较好用(非命令行的)。里面自带了一个注册机。注册之后可以无限制下载。
点击文件,新建工程向导,然后选择第二个——复制一个网站,包含该网站的目录结构那个选项,便可以镜像一个网站。
=====================================================
我插一句,你们尽管向我开炮:
网上下载整站的软件很多,但对于一个新手来说要使用好它,需要一个过程,谁都会用搜索,但是要掌握一个专业软件,必得懂一些专业知识,网上那些wget、curl,应该都是比较强大的工具,试问论坛里几个高手说他掌握的好了?若动不动就是给一个新手说google,好像这个问题太简单得不行似的,试问我们论坛就是让人学习怎么搜索吗?
联盟对于整站下载,或者wget,curl用法的讨论帖确实比较少的,仅有的帖子也停留在很浅的深度。
======================================================
Originally posted by 740011611 at 2010-3-14 07:37 PM:
不知道用Wget可以实现下载全站的目标不?
wget -r -p -np -E -k http://www.cn-dos.net/forum/
发现有很多网页被重复下载了,怎么办呀?
我用wget下全站时用一个参数
wget -m www.cn-dos.net/forum
你可以试试
[ Last edited by plp626 on 2010-3-17 at 09:22 ]
There is a portable ("green") whole-site download tool in my network drive (the one in my signature): the pro.rar in the soft directory on drive C. It is fairly easy to use (not command-line) and comes with a keygen; after registering you can download without restrictions.
Click File, then the New Project Wizard, and select the second option - "Copy a website, including the site's directory structure" - and it will mirror a site.
=====================================================
Let me interject; feel free to open fire on me:
There is plenty of whole-site download software online, but it takes a newcomer a while to learn to use it well. Anyone can run a search, but mastering a professional tool requires some background knowledge. wget and curl are both fairly powerful tools - how many of the experts on this forum would say they have really mastered them? If the stock answer to a newcomer is "Google it", as if the question were too simple to bother with, is this forum really just here to teach people how to search?
This forum (DOS 联盟) actually has very few threads discussing whole-site downloading or wget/curl usage, and the few that exist stay at a very shallow level.
======================================================
Originally posted by 740011611 at 2010-3-14 07:37 PM:
I don't know if Wget can achieve the goal of downloading the entire site?
wget -r -p -np -E -k http://www.cn-dos.net/forum/
I found that many web pages are downloaded repeatedly, what should I do?
When I use wget to download the entire site, I use a parameter
wget -m www.cn-dos.net/forum
You can give it a try
[ Last edited by plp626 on 2010-3-17 at 09:22 ]
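For reference, a hedged variant of that mirror command; all switches below are standard wget options, though whether they avoid every duplicate page on this particular board is not guaranteed:
:: Mirror the forum for offline browsing; assumes GNU wget on the PATH.
:: -m = mirror (-r -N -l inf), -np = don't ascend above /forum/,
:: -k = rewrite links for local viewing, -E = save pages with an .html
:: extension, -w 1 = wait one second between requests to go easy on the server.
wget -m -np -k -E -w 1 http://www.cn-dos.net/forum/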
|

山外有山,人外有人;低调做人,努力做事。
进入网盘(各种工具)~~ 空间~~cmd学习 |
|
2010-3-16 21:50 |
|
|
740011611
初级用户
 
积分 96
发帖 83
注册 2009-6-4 来自 信阳
状态 离线
|
『第 13 楼』:
真的很感谢你们无私的帮助以及不厌其烦的解释!让我们这些新手有一个很好的学习环境。祝DOS联盟越办越好!
回到正题:我用wget -m http://www.cn-dos.net/forum 下载的大都是重复的文件(注意不是网页格式的!)这里不好截图,就没上传给你们看!
最后还得用for /f "delims=" %i in ('dir /a /b') do ren "%i" "%i.html" 重命名。
但是还有很多无用的网页,难道还得一个个删除吗?有没有更好的办法?
看到一个帖子是这样的:
@echo off
::下载精华索引帖子
cd.>essence.txt
wget -q -O essence.txt "http://www.cn-dos.net/forum/viewthread.php?tid=27667"
::这里设置文件的保存目录
set downdir=cn-dos论坛精华帖子
md %downdir%
echo\正在下载中...请稍后...
::批量下载帖子到新建的目录中
for /f %%a in ('sed -n "s/.*viewthread\.php?tid=\([0-9]*\).*/\1/p" essence.txt') do (
wget -q -O %downdir%\%%a.html "http://www.cn-dos.net/forum/viewthread.php?tid=%%a&action=printable"
)
::下载完毕,打开目录
echo 下载完毕
ping -n 3 127.0.0.1 >nul&&start %downdir%
这个下载的就很好,不知道能用到这个地方不?麻烦高手们看看。
I really appreciate your selfless help and tireless explanations! They have created a very good learning environment for us novices. Wish the DOS Union better and better!
Back to the topic: most of what I downloaded with wget -m http://www.cn-dos.net/forum turned out to be duplicate files (and note they are not in web-page format!). It's not convenient to post a screenshot here, so I haven't uploaded one for you to see.
In the end I still had to rename them with for /f "delims=" %i in ('dir /a /b') do ren "%i" "%i.html".
But there are still a lot of useless pages - do I really have to delete them one by one? Is there a better way?
Saw a post like this:
@echo off
::Download the essence index post
cd.>essence.txt
wget -q -O essence.txt "http://www.cn-dos.net/forum/viewthread.php?tid=27667"
::Set the file save directory here
set downdir=cn-dos forum essence posts
md %downdir%
echo\is downloading... Please wait...
::Batch download posts to the newly created directory
for /f %%a in ('sed -n "s/.*viewthread\.php?tid=\([0-9]*\).*/\1/p" essence.txt') do (
wget -q -O %downdir%\%%a.html "http://www.cn-dos.net/forum/viewthread.php?tid=%%a&action=printable"
)
::Download completed, open the directory
echo Download completed
ping -n 3 127.0.0.1 >nul&&start %downdir%
This one downloads nicely. I don't know whether it can be applied here - could the experts please take a look?
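One small tweak to the renaming step, assuming the downloaded files sit in the current directory: only rename files that do not already end in .html, so running the line twice does not produce names like 12345.html.html. Like the original, this is a command-prompt one-liner (inside a batch file %i becomes %%i):
for /f "delims=" %i in ('dir /a-d /b ^| findstr /v /i /r "\.html$"') do ren "%i" "%i.html"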
|
|
2010-3-19 20:56 |
|
|
Vista2008
版主
       
积分 707
发帖 287
注册 2010-1-13 来自 尖竹汶府
状态 离线
|
『第 14 楼』:
你搜索一下“网络快捕”,可惜下贴有点慢,都是HTML格式的。
Try searching for "网络快捕"; unfortunately downloading posts with it is a bit slow, and everything it saves is in HTML format.
|
|
2010-3-19 20:59 |
|
|
qinchun36
高级用户
    据说是李先生
积分 609
发帖 400
注册 2008-4-23
状态 离线
|
『第 15 楼』:
你如果平时注意一点就会发现帖子的地址都是这样的(后面的几个参数不重要)
http://www.cn-dos.net/forum/viewthread.php?tid=帖子编号
直接输入就能进到一个帖子,而且很幸运,这个编号好像是按照自然数的样子递增的,现在差不多到了 50574 ,如果加上一个参数 &action=printable 那么就会出来一个可打印版本,即只包含基本帖子内容的页面,这就是你13楼说的那个,你大概选个数开始,循环下载就行了。
但是极不推荐这种方法,因为无法获取帖子名字,无法知道它是哪个版块的,也不知道哪些数是没有的(有些帖子已经被删除)
If you pay a little attention you will notice that every thread's address looks like this (the trailing parameters don't matter):
http://www.cn-dos.net/forum/viewthread.php?tid=<thread number>
Typing that in directly takes you to a thread, and luckily the numbers seem to increase like natural numbers - they are up to about 50574 now. If you add the parameter &action=printable you get a printable version, i.e. a page containing only the basic post content, which is what you described in post #13. Just pick a number to start from and download in a loop.
But I strongly advise against this method, because you can't get the thread titles, you can't tell which board a thread belongs to, and you don't know which numbers are missing (some threads have been deleted).
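A minimal sketch of that loop in batch, assuming GNU wget is on the PATH as elsewhere in this thread; the start and end numbers are placeholders, and IDs of deleted threads will simply produce small error pages that can be cleaned up afterwards:
@echo off
:: Placeholder range - pick your own start; ~50574 is the rough upper bound
:: mentioned above. Deleted tids just yield an error page instead of a thread.
set /a tid=50000
:loop
if %tid% gtr 50574 goto :eof
wget -q -O "%tid%.html" "http://www.cn-dos.net/forum/viewthread.php?tid=%tid%&action=printable"
set /a tid+=1
goto loop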
|

┏━━━━━━┓
┃据说是李先生┃
┠──────┨
┃*ntRSS┃
┗━━━━━━┛ |
|
2010-3-20 14:59 |
|