China DOS Union

-- Unite DOS · Advance DOS · Grow DOS --

Union site: www.cn-dos.net Forum site: www.cn-dos.net/forum
DOS stands for freedom, openness and progress. Let us work hard, learn from the openness and GNU spirit of FreeDOS and Linux, and together build and grow a free GNU GPL world!

[Help] How to extract the IPs I need from a txt file? [Thank you everyone]
Original Poster Posted 2007-03-09 02:41 · China, Liaoning, Dandong (China Unicom)
To extract each IP:port entry and its class attribute from document1.txt and save them to a new file, document2.txt, you can use a language such as Python. Here is a simple example using regular expressions:

```python
import re

# Read the content from document1.txt
with open('document1.txt', 'r', encoding='utf-8') as file:
    content = file.read()

# Match each <a ...> element, capturing the class attribute and the IP:port text.
# The triple-quoted raw string lets the pattern contain both ' and " characters.
pattern = r'''<a title="[^"]*" onMouseOver="s\('[^']*'\)" onMouseOut="d\(\)" class="(\w+)">(\d+\.\d+\.\d+\.\d+:\d+)</a>'''
matches = re.findall(pattern, content)

# Write the results to document2.txt, one per line, as IP:port----class
with open('document2.txt', 'w', encoding='utf-8') as file:
    for cls, ip_port in matches:
        file.write(f"{ip_port}----{cls}\n")
```

This code uses a regular expression to find the IP:port entries and their class attributes, then writes them to document2.txt, one per line, as IP:port----class. Entries that use a hostname instead of a numeric IP are skipped. Run the script from the directory that contains document1.txt.
Floor 2 Posted 2007-03-09 02:50 · China, Hebei, Langfang, Sanhe (China Mobile)
This kind of content is better handled with sed. Perhaps a member who knows sed well can help.
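Where three walk together, one of them is bound to be my teacher. Only after learning does one know one's shortcomings; only after teaching does one know one's difficulties, and then one can strengthen oneself.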
Floor 3 Posted 2007-03-09 03:22 · China, Gansu, Zhangye (China Telecom)
sed -r "s/>[^<>]*<\/a>/&\n/g" 1.txt|sed -r "s/.*class=\x22([A-Z]{1})\x22>([^<>]*)<\/a>$/\2--\1/;/--([A-Z]{1})$/!d"|more>2.txt

This is tailored to the sample only; I did not try to handle other variations.
Floor 4 Posted 2007-03-09 03:23 · China, Shaanxi, Xi'an (China Telecom)
『Poster』: How to extract the content I want from a txt file?

</center><il><li><a title="AU" onMouseOver="s('AU')" onMouseOut="d()" class="D">144.140.22.190:80</a></li><li><a title="KR" onMouseOver="s('KR')" onMouseOut="d()" class="D">125.248.244.131:8080</a></li><li><a title="IN" onMouseOver="s('IN')" onMouseOut="d()" class="D">202.53.13.10:8080</a></li><li><a title="PH" onMouseOver="s('PH')" onMouseOut="d()" class="B">125.212.37.150:8080</a></li><li><a title="CN" onMouseOver="s('CN')" onMouseOut="d()" class="B">220.181.31.44:3128</a></li><li><a title="CO" onMouseOver="s('CO')" onMouseOut="d()" class="D">200-91-243-90-host.ifx.net.co:3128</a></li><li><a title="US" onMouseOver="s('US')" onMouseOut="d()" class="D">216.133.248.229:80</a></li><li><a title="US" onMouseOver="s('US')" onMouseOut="d()" class="D">216.133.248.228:80</a></li><li><a title="US" onMouseOver="s('US')" onMouseOut="d()" class="D">216.133.248.226:80</a></li><li><a title="JP" onMouseOver="s('JP')" onMouseOut="d()" class="D">neptun.ium.ne.jp:8094</a></li><li><a title="TR" onMouseOver="s('TR')" onMouseOut="d()" class="D">195.175.37.71:8080</a></li><li><a title="NL" onMouseOver="s('NL')" onMouseOut="d()" class="D">213.227.149.165:3128</a></li><li><a title="US" onMouseOver="s('US')" onMouseOut="d()" class="D">216.133.248.227:80</a></li></il><center>

The content above is the file 1.txt (in the actual file it is all on a single line, not wrapped as it appears here). How can I extract the following from 1.txt
144.140.22.190:80
xxx.xxx.xxx.xxx:xx
xxx.xxx.xxx.xxx:xx
...
...
...
xxx.xxx.xxx.xxx:xx
into a newly created file, 2.txt?

Ideally, also extract the type given by the class attribute, like this:
144.140.22.190:80----D
xxx.xxx.xxx.xxx:xx
xxx.xxx.xxx.xxx:xx
220.181.31.44:3128----B
...
...
xxx.xxx.xxx.xxx:xx

Thanks in advance!

I saw this post a moment ago (on March 8th!) and then could not find where it had gone.


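' Read the whole of 1.txt (a single long line)
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objText = objFSO.OpenTextFile("D:\Desktop\1.txt", 1)
inputstr = objText.ReadAll
objText.Close

' Put each <a ...>...</a> element on its own line
Dim tep, i, outstr
tep = Split(Replace(inputstr, "</a>", vbCrLf), vbCrLf)
outstr = Empty
For i = 0 To UBound(tep)
    ' Only pieces that still contain class= hold a record
    If InStr(tep(i), "class=") > 0 Then
        ' Keep the 3 characters ending at the last ">" (for example D">) plus the address after it
        outstr = outstr & Right(tep(i), Len(tep(i)) - InStrRev(tep(i), ">") + 3) & vbCrLf
    End If
Next
' Turn D">address into D--address
outstr = Replace(outstr, """>", "--")

' Write the result to 2.txt, creating the file if it does not exist
Set objText = objFSO.OpenTextFile("D:\Desktop\2.txt", 2, True)
objText.Write outstr
objText.Close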


Result:

D--144.140.22.190:80
D--125.248.244.131:8080
D--202.53.13.10:8080
B--125.212.37.150:8080
B--220.181.31.44:3128
D--200-91-243-90-host.ifx.net.co:3128
D--216.133.248.229:80
D--216.133.248.228:80
D--216.133.248.226:80
D--neptun.ium.ne.jp:8094
D--195.175.37.71:8080
D--213.227.149.165:3128
D--216.133.248.227:80
Floor 5 Posted 2007-03-09 03:41 · China, Hebei, Langfang, Sanhe (China Mobile)
In the recycle bin. I don't know how it happened.
Floor 6 Posted 2007-03-09 03:51 · China, Guangdong (China Telecom)
The original thread title was too vague to convey what the post was about, so the thread was moved to the recycle bin. Now that the original poster has edited the title to meet the forum guidelines, the thread has been moved back and the related threads have been merged.
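A foot can be too short and an inch can be long enough; mastering CMD is non-negotiable.
Think about a problem in all its complexity; solve it as simply as possible.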
Floor 7 Posted 2007-03-09 06:11 · China, Liaoning, Dandong (China Unicom)
Is the code on floor 4 VBS? Floor 3's sed command does not seem to work for me yet. Can this not be done with a bat file? I was busy with work this afternoon and had no time to visit the forum. Thank you all for your help!
Floor 8 Posted 2007-03-09 06:59 · China, Guangdong (China Telecom)
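The following code can extract the IP-format records, but it cannot extract the type attribute:

@echo off
setlocal enabledelayedexpansion

cls
rem Read 1.txt line by line, strip the double quotes, then split each line
for /f "delims=" %%i in (1.txt) do (
    set "str=%%i"
    set "str=!str:"=!"
    call :pickup "!str!"
)
pause
exit

:pickup
rem Peel off one piece between angle brackets per call and print it if it starts like n.n.n.n:port
for /f "tokens=1* delims=<>" %%i in (%1) do (
    echo %%i|findstr /r "^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*:[0-9]">nul&&echo %%i
    set "str=%%j"
    if defined str call :pickup "!str!"
)
goto :eof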
Floor 9 Posted 2007-03-09 08:09 · China, Liaoning, Dandong (China Unicom)
It works really well for the XXX.XXX.XXX.XXX:XX format.
I have saved it.
Thank you, namejm!
Floor 10 Posted 2007-03-09 12:00 · China, Guangdong, Qingyuan (China Unicom)
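Building on namejm's code, this extracts both the IP-format records and the type attribute:

@echo off
setlocal enabledelayedexpansion

rem Read test.txt line by line, strip the double quotes, then split each line
for /f "delims=" %%i in (test.txt) do (
    set "str=%%i"
    set "str=!str:"=!"
    call :pickup "!str!"
)
pause>nul
rem Stop here so the script does not fall through into the pickup routine
goto :eof

:pickup
rem Take one piece per call, split at the left angle bracket; pieces containing class hold a record
for /f "tokens=1* delims=<" %%i in (%1) do (
    echo "%%i"|findstr "class">nul && (
        for /f "tokens=1,2 delims=>" %%a in ("%%i") do (
            rem The class letter is the last character of the first part
            set class=%%a
            set class=!class:~-1!
            set IP=%%b
            echo !class!--!IP!
        )
    )
    set "str=%%j"
    if defined str call :pickup "!str!"
)

goto :eof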

Result:

D--144.140.22.190:80
D--125.248.244.131:8080
D--202.53.13.10:8080
B--125.212.37.150:8080
B--220.181.31.44:3128
D--200-91-243-90-host.ifx.net.co:3128
D--216.133.248.229:80
D--216.133.248.228:80
D--216.133.248.226:80
D--neptun.ium.ne.jp:8094
D--195.175.37.71:8080
D--213.227.149.165:3128
D--216.133.248.227:80
Floor 11 Posted 2007-03-09 15:14 · United States, North Dakota State University
Someone asked this question before, but in that case the data was spread over multiple lines.

[ Last edited by clonecd on 2007-3-9 at 04:32 PM ]
Floor 12 Posted 2007-03-09 16:30 · United States, North Dakota State University
@sed "s/class/\n&/g;s/<\/a>/&\n/g" 1.txt|sed "/:/!d;s/.*\x22\(.*\)\x22>\([0-9.]\+:[0-9]\+\)<.*/\2----\1/;/----/!d">2.txt
The command above leaves out 200-91-243-90-host.ifx.net.co:3128----D.
If you need that line as well, use the following instead:
sed "s/class/\n&/g;s/<\/a>/&\n/g" 1.txt|sed "/:/!d;s/.*\x22\(.*\)\x22>\([^<]\+\)<.*/\2----\1/">2.txt

[ Last edited by clonecd on 2007-3-9 at 04:40 PM ]
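For readers without sed, the difference between the two commands can be sketched in Python. This is an illustration under assumptions, not a port of the commands above: the function name `extract`, its `numeric_only` flag, and the shortened sample string are all made up here. It breaks the single line into one record per `</a>`, then accepts either only dotted numeric addresses or any `host:port` text.

```python
import re

def extract(html, numeric_only=True):
    """Return 'address:port----class' strings from the one-line list markup."""
    # Address pattern: digits and dots only, or anything up to the next "<".
    addr = r'[0-9.]+:[0-9]+' if numeric_only else r'[^<]+'
    record = re.compile(r'class="([^"]*)">(' + addr + r')$')
    results = []
    # Splitting on </a> plays the part of the first sed stage (one record per piece).
    for chunk in html.split('</a>'):
        m = record.search(chunk)
        if m:
            results.append(f'{m.group(2)}----{m.group(1)}')
    return results

# Hypothetical two-entry sample in the same shape as 1.txt
sample = ('<li><a title="CN" class="B">220.181.31.44:3128</a></li>'
          '<li><a title="JP" class="D">neptun.ium.ne.jp:8094</a></li>')
print(extract(sample))                      # numeric addresses only
print(extract(sample, numeric_only=False))  # hostnames included
```

With `numeric_only=True` the hostname entry is dropped; with `numeric_only=False` both entries are returned.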
Floor 13 Posted 2007-03-09 23:18 · China, Liaoning, Dandong (China Unicom)
The one above doesn't seem to work well. Is there a general method that can extract all IP addresses in the following formats from any document, whether or not they are on the same line?
XXX.XXX.XXX.XXX:XX
XXX.XXX.XXX.XX:XXXX
XX.XXX.XX.XXX:XXX
XX.XXX.XXX.XX:XXXX
Or in this format:
XXX.XX.XXX.XXX--XX
XXX.XXX.XXX.XXX--XXXX
XX.XXX.XX.XXX--XXX
XX.XXX.XXX.XX--XX

[ Last edited by 3391617 on 2007-3-9 at 10:23 AM ]
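One general approach, sketched in Python rather than batch or sed, is a regular expression that finds dotted numeric addresses followed by `:port` or `--port` anywhere in a text, regardless of line breaks. The function name `extract_endpoints` and the sample string are illustrative assumptions, not something from this thread, and the pattern does not check that each number is at most 255.

```python
import re

# Four groups of 1-3 digits separated by dots, then ":" or "--", then a port.
# The lookarounds keep the pattern from matching inside a longer run of digits or dots.
ENDPOINT = re.compile(r'(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(:|--)(\d{1,5})(?!\d)')

def extract_endpoints(text):
    """Return every address/port pair in text, keeping its original separator."""
    return [ip + sep + port for ip, sep, port in ENDPOINT.findall(text)]

# Hypothetical sample mixing both separators, a line break, and a hostname
sample = 'class="D">144.140.22.190:80</a> and\n10.0.0.1--8080, host.example.com:3128'
print(extract_endpoints(sample))
```

Because the pattern is applied to the whole text at once, it makes no difference whether the addresses share a line or sit on separate lines.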
Floor 14 Posted 2007-03-09 23:43 · China, Anhui, Ma'anshan (China Telecom)

if defined str call :pickup "!str!"


Can anyone help explain this?

I found this in the output of if /?:

If command extensions are enabled, IF will change as follows:

IF string1 compare-op string2 command
IF CMDEXTVERSION number command
IF DEFINED variable command


The DEFINED conditional works just like EXIST, except that it takes an environment variable name and returns true if the environment variable is defined.


So if the environment variable is not defined, the command that follows is not executed?

[ Last edited by xycoordinate on 2007-3-9 at 10:51 AM ]
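The recursion that line drives can be sketched in Python. This illustrates the control flow only and is not a translation of the batch file; the function name `pickup` mirrors the label, and the sample line is made up. Each call takes the first piece before an angle bracket, keeps the rest, and calls itself again only while something is left. That is the role `if defined str` plays, since `set "str="` with an empty value leaves the variable undefined in batch.

```python
import re

def pickup(text, found):
    """Peel one piece off the front of text, then recurse on the remainder."""
    # Like for /f "tokens=1* delims=<>": skip leading delimiters, take the
    # first piece, and keep everything after the next delimiter as the rest.
    parts = re.split(r'[<>]', text.lstrip('<>'), maxsplit=1)
    piece = parts[0]
    rest = parts[1] if len(parts) > 1 else ''
    if re.match(r'\d+\.\d+\.\d+\.\d+:\d+', piece):
        found.append(piece)
    # Same role as: if defined str call :pickup "!str!"
    # An empty remainder corresponds to an undefined variable, so recursion stops.
    if rest:
        pickup(rest, found)
    return found

# Hypothetical shortened line with the double quotes already removed
line = '<li><a class=D>144.140.22.190:80</a></li><li><a class=B>220.181.31.44:3128</a></li>'
print(pickup(line, []))
```

Without the check on the remainder, the last call would recurse on an empty string, just as the batch routine would be called with nothing left to split.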
Floor 15 Posted 2007-03-09 23:56 · United States, Maine
Originally posted by 3391617 at 2007-3-9 23:18:
The above one doesn't seem to work well~
Is there any way to extract all content in the following formats from any document:
XXX.XXX.XXX.XXX:XX
XXX.XXX.XXX.XX:XXXX
XX.XXX.XX.XXX:XXX
XX.XXX.XXX.XX:XXXX
or format: ...


The code on floor 12 was written for the requirements you gave on floor 1. The additional requirements from floor 13 can certainly be handled with sed as well; you could learn some sed and work it out yourself.