China DOS Union

-- Unite DOS · Advance DOS · Grow DOS --

Union site: www.cn-dos.net Forum site: www.cn-dos.net/forum
DOS stands for freedom, openness and progress. Let us work hard, learn from the openness and GNU spirit of FreeDOS and Linux, and together build and grow a free GNU GPL world!

How to split a file by markers using batch processing
Original Poster Posted 2005-12-05 13:02 · Jinhua, Zhejiang, China (China Telecom) · Junior user · UID 46522
There is a file AA.TXT as follows:

FGAFFFFSFF
GGFDGG
R*01
FSFF
HHH
R*02
FSDGF
HDFH
GHGH
JJHGJR
R*03

....

kljljfljfkj
fmlfk
R*n

How can the file be split at each R* marker line, with each piece named after the characters that follow the *? That is:

01.txt contains:
FGAFFFFSFF
GGFDGG

02.txt contains:
FSFF
HHH

03.txt contains:
FSDGF
HDFH
GHGH
JJHGJR

.....

n.txt contains:
kljljfljfkj
fmlfk


In practice, the AA.TXT to be split may be very large, so the efficiency of the split has to be considered.

Thanks in advance to all the veterans here.
Floor 2 Posted 2005-12-05 14:29 · Taiyuan, Shanxi, China (China Mobile Tietong) · Veteran member · UID 19
Re yhsean:

Splitting files with batch alone is unlikely to be very efficient, and pulling in third-party tools would defeat the point of using batch. Below is an example for cmd, meant to be typed directly at the command line. It is the simplest possible prototype; lines containing special symbols, such as non-marker lines that contain *, may not be output correctly.

for /f "tokens=1,2 delims=*" %f in (aa.txt) do @if "%f"=="R" (ren temp %g.txt) else (echo %f>>temp)
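
The one-pass idea behind this prototype (collect lines until a marker line appears, then name the collected piece after that marker) can also be sketched in Python. This is only an illustration under assumptions: the function name split_by_marker, the out_dir parameter and the UTF-8 encoding are made up for the example and do not come from the thread. Unlike for /f, it keeps blank lines, and it does not write out any lines that follow the last marker.

```python
import os
import tempfile


def split_by_marker(src_path, out_dir, marker="R*"):
    """Split src_path at lines starting with `marker`; return the file names written."""
    written = []
    buffered = []  # lines collected since the previous marker line
    with open(src_path, encoding="utf-8") as src:
        for raw in src:
            line = raw.rstrip("\n")
            if line.startswith(marker):
                # Marker line: the text after the marker names the output file.
                name = line[len(marker):].strip() + ".txt"
                with open(os.path.join(out_dir, name), "w", encoding="utf-8") as out:
                    out.writelines(text + "\n" for text in buffered)
                written.append(name)
                buffered = []
            else:
                buffered.append(line)
    # Lines after the last marker are not written anywhere in this sketch.
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        sample = os.path.join(work, "AA.TXT")
        with open(sample, "w", encoding="utf-8") as f:
            f.write("FGAFFFFSFF\nGGFDGG\nR*01\nFSFF\nHHH\nR*02\n")
        print(split_by_marker(sample, work))  # prints ['01.txt', '02.txt']
```

The input is read exactly once, and only the lines of the current piece are held in memory.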
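※ Batchinger to Bat Fans: please visit "批处理编程的异类" (An Oddity in Batch Programming); everyone is welcome to exchange and share batch programming experience!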
Floor 3 Posted 2005-12-05 18:39 · Jinhua, Zhejiang, China (China Telecom) · Junior user · UID 46522
Thank you for the reply, moderator.

I had also used the command above to do the split, but on large files it runs very slowly, so it is of little use in real work.

I have only just joined the forum, and the moderator has been very helpful. Thanks again.

I posted to see whether there is a more concise and efficient way. Could DEBUG be used for this?
Floor 4 Posted 2005-12-05 18:55 · Jinhua, Zhejiang, China (China Telecom) · Junior user · UID 46522
Another question, moderator: if a third-party tool is used, how is it called from a batch file? (The tool has to do its job fully automatically, with no manual intervention.)

Apart from the marker lines, the file contains no other *.
Floor 5 Posted 2005-12-05 19:02 · Linfen, Shanxi, China (China Mobile Tietong) · Veteran member · UID 19
Re yhsean:

DEBUG could probably do it, but the result would be more complex and less efficient, and DEBUG itself is not well suited to handling large files.

Third-party tools called from batch need to be chosen with care. Tools that accept every operation as command-line parameters come first; tools that need interaction while running are a second choice, and for those a complete response file has to be designed carefully.

Your problem should be easy to solve with sed, but I have no time right now. If nobody else has posted an implementation by noon tomorrow, I will look at writing one.
Floor 6 Posted 2005-12-06 08:54 · Jinhua, Zhejiang, China (China Telecom) · Junior user · UID 46522
Last night I downloaded SSED for Win32 and looked up how to use it online, but with my limited knowledge I have not found a solution. I really need this for work, so I am looking forward to teacher willsort's answer. Many thanks. Also, among the many sed versions, is SSED the better one?
Floor 7 Posted 2005-12-06 11:29 · Jinzhou, Liaoning, China (China Mobile Tietong) · Honorary moderator · UID 40733
to yhsean

SED is a line-based stream editor and not suitable for text splitting. You can use csplit to complete this splitting task.

The following is a scheme for you:

1. First execute the following statement to split the text.

csplit -z -b %02d.txt -f file AA.txt "/^R\*/+1" {*}


2. Generate a file renaming reference file.

findstr /r "^R\*" file*.txt>rename.txt


3. Delete the last line of the file and rename the file.

sed -i.bak "$d" file*.txt
sed "s/^\(.*\):R\*\(.*\)/ren \1 \2.txt/" rename.txt>rname.bat
rname.bat


Notes:
1. The third-party tools mentioned above can be found in the GNU utilities for Win32 toolkit.
2. Older versions of sed do not support the -i option, so please use GNU sed. Non-GNU sed implementations differ somewhat in their expression syntax, and I cannot guarantee correct results with them.
3. You can save the commands above as a batch file and run it.
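
The renaming step relies on findstr prefixing each match with its file name when several files are searched, so rename.txt holds lines such as file00.txt:R*01. As a rough illustration of the substitution that step 3 is meant to perform, here is a small Python sketch. The names RENAME_LINE and rename_command are invented for this example, and Python's re syntax differs from sed's in that groups and + need no backslashes.

```python
import re

# "file00.txt:R*01" -> "ren file00.txt 01.txt"; \* matches the literal asterisk.
RENAME_LINE = re.compile(r"^(.*):R\*(.+)$")


def rename_command(findstr_line):
    """Build the ren command for one line of rename.txt, or None if it does not match."""
    match = RENAME_LINE.match(findstr_line.strip())
    if match is None:
        return None
    return "ren {} {}.txt".format(match.group(1), match.group(2))


if __name__ == "__main__":
    for line in ["file00.txt:R*01", "file01.txt:R*02"]:
        print(rename_command(line))
```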

If you have any questions, please post them.
  ☆Start\Run (WIN+R)☆
%ComSpec% /cset,=何奈无── 。何奈可无是原,事奈无做人奈无&for,/l,%i,in,(22,-1,0)do,@call,set/p= %,:~%i,1%<nul&ping/n 1 127.1>nul

Floor 8 Posted 2005-12-06 23:38 · Pujiang County, Jinhua, Zhejiang, China (China Telecom) · Junior user · UID 46522
Thanks for the solution. I am sure Mr. Wunaihe is a professional programmer; I am just a user. To finish the project I studied sed for a long time without finding a solution, and it is hard for a user to learn many tools. Since sed seems to be so widely used, I sincerely ask whether some kind person could provide a solution that uses sed.

In the real application I need an application program to generate this script dynamically, so concise code makes it easier to customize the program that generates it. Thanks again, Mr. Wunaihe.
Floor 9 Posted 2005-12-07 13:17 · Pujiang County, Jinhua, Zhejiang, China (China Telecom) · Junior user · UID 46522
I finally found a solution.
Could teacher willsort please check whether the code can be made more concise, and whether the algorithm is reasonable?

Because * has a special meaning in sed, I changed the marker character from * to |.


::net send * ***
echo off
set file=AA.txt
set/a m=1
set/a n=1
set/a L=0

for /f %%a in ('SED -n "/R|/p" ./%file%') do (set/a L+=1)
if %L%==1 (copy %file% %m%.ok&goto end)

SED "1,/R|/!d" ./%file% >%m%.ok
SED "1,/R|/d" ./%file% >%n%.tmp

:loop
set/a m+=1
set/a p=-1*%n%
set/a L=0

for /f %%a in ('SED -n "/R|/p" ./%n%.tmp') do (set/a L+=1)
if %L%==1 (ren %n%.tmp %m%.ok&goto end)

SED "1,/R|/!d" ./%n%.tmp >%m%.ok
SED "1,/R|/d" ./%n%.tmp >%p%.tmp
set/a n*=-1
goto loop
:end
DEL *.TMP
for %%a in (*.ok) do (
for /f "tokens=2 delims=|" %%b in ('SED -n "/R|/p" %%a') do (ren %%a %%b)
)
::net send * ***


The net send lines are only there to measure the processing time from the difference between the two message timestamps.

[ Last edited by yhsean on 2005-12-7 at 14:25 ]
Floor 10 Posted 2005-12-07 20:27 · Linfen, Shanxi, China (China Mobile Tietong) · Veteran member · UID 19
Re 无奈何 (Wunaihe):

csplit is indeed a good tool for pattern-based text splitting, and it is much simpler to use than sed. In terms of efficiency, though, I do not think the two differ much, because both are built on line-by-line pattern matching. And in your scheme, besides the full scan in the first step, the findstr command that finds the file names and the sed command that deletes the last line each amount to another full scan of the text. Three full scans will clearly reduce efficiency.

In theory this job should need only one scan, as in the for /f approach I gave first. I have not managed a one-scan sed solution yet; the main difficulty is naming the files. It now seems to me that sed is not really a good fit here. As far as I know awk can do it, but I am still very unfamiliar with awk.

Re yhsean:

I have not studied your code closely, but I understand the general idea: split off one file, name it, then continue splitting the rest. This actually scans the text even more times, so efficiency is even harder to guarantee.
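
A rough way to see the cost of repeated scanning is to count how many lines each approach reads. The Python sketch below is a back-of-the-envelope model, not a measurement: it assumes that every sed call reads its whole input, that the split-one-then-continue script makes about three sed passes over the remaining text per round, and the function names and sample sizes are invented for illustration.

```python
def lines_read_one_pass(segment_sizes):
    """A single scan reads every line exactly once."""
    return sum(segment_sizes)


def lines_read_peel_off(segment_sizes, passes_per_round=3):
    """Each round rescans what is left, then removes the first segment."""
    remaining = sum(segment_sizes)
    total = 0
    for size in segment_sizes:
        total += passes_per_round * remaining
        remaining -= size
    return total


if __name__ == "__main__":
    sizes = [1000] * 100  # 100 parts of 1000 lines each (made-up numbers)
    print(lines_read_one_pass(sizes))  # 100000
    print(lines_read_peel_off(sizes))  # 15150000
```

In this model the work grows roughly with the square of the number of parts, which is the concern raised about repeated scans.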

You mentioned changing the marker pattern. Does that mean you can freely customize the format of aa.txt, or even have more control over how aa.txt is produced? If so, we can keep refining the pattern so that it is easier for the tools to match.
Floor 11 Posted 2005-12-07 23:44 · Minhang District, Shanghai, China (China Telecom) · Intermediate user · UID 41945
Since the moderator is unfamiliar with awk, how can he be sure that it can do the job?
Floor 12 Posted 2005-12-08 13:01 · Jinhua, Zhejiang, China (China Telecom) · Junior user · UID 46522
Thank you, moderator. aa.txt is generated by an application program, and the content of the last line can be customized.

In addition, the number of parts to split off is not fixed (it is generated dynamically by the software).

Another issue is that, in practice, the last line of each split file also has to be moved to a specified position nearer the front of that file (for example, after line 2).

I sincerely hope for the moderator's help. I have been struggling with sed for two days, and I do not know the other tools at all.

(Even though the sed approach scans the whole file many times, it is still more than ten times faster than the for /f one.)

[ Last edited by yhsean on 2005-12-8 at 13:11 ]
Floor 13 Posted 2005-12-08 13:16 · Jinzhou, Liaoning, China (China Mobile Tietong) · Honorary moderator · UID 40733
to tigerpower
This is a matter of perspective. I would guess that when brother willsort says he is unfamiliar with awk, he means its syntax specifically, not that he knows nothing about it.

to yhsean
I am just a command-line enthusiast. On learning and using tools, I personally think it pays to try many of them. A task that is hard with one tool may be easy with another. More importantly, it broadens your thinking, and you pick up more knowledge and skills along the way.

=====
I have always been afraid of awk. I tried to learn it several times, backed off each time, and never got started. This time, skimming some articles with this thread's problem in mind, I found it really interesting and not as hard to pick up as I had thought. Because it was targeted reading, I learned just enough to complete this task. I do not know whether the script is well written; I still know too little about awk.
Run gawk -f AA.awk AA.txt at the command line.
AA.awk is as follows:

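BEGIN { FS = "\n" }
{
    if ($1 ~ /^R\*/) {
        # Marker line: the text after "R*" names the output file.
        name = substr($1, 3) ".txt"
        temp = substr(temp, 2)      # drop the leading newline
        print temp > name
        close(name)                 # avoid keeping many files open
        temp = ""
    }
    else {
        # Ordinary line: append it to the buffer for the current part.
        temp = (temp "\n" $1)
    }
}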

Floor 14 Posted 2005-12-08 21:37 · Jinhua, Zhejiang, China (China Telecom) · Junior user · UID 46522
Could Wunaihe (无奈何) or the moderator please explain how to use sed to move the last line of a file up to a specified line nearer the front?

The improved code is as follows:

@echo off
set file=aa.txt
set/a n=1
set/a L=0
:loop
sed -n "1,/R|/p" .\%file%|sed -n "$p" >tmp.tmp
for /f "tokens=3 delims=| " %%a in (tmp.tmp) do (set Tim=%%a)
for /f "tokens=3 delims=| " %%a in (tmp.tmp) do (set Nam=%%a.fanuc1)

sed -n "1,/R|/p" .\%file% >%Nam%
::Judge whether the processed file is only a part (or whether it is the last part)
for /f %%a in ('sed -n "/R|/p" .\%file%') do (set/a L+=1)
if %L%==1 goto end
set/a L=0
::Take out the part after the first part; output the temporary file
sed -n "1,/R|/!p" .\%file% >%n%.tmp

::Change the processed file
set file=%n%.tmp
::Reverse the temporary file to provide the next output
set/a n*=-1
goto loop
:end
del *.tmp

[ Last edited by yhsean on 2005-12-8 at 21:40 ]
Floor 15 Posted 2005-12-09 00:36 · Jinzhou, Liaoning, China (China Mobile Tietong) · Honorary moderator · UID 40733
to yhsean

It would help to learn a little about regular expressions. The special character * loses its special meaning when it is escaped with a backslash, like this: \*.
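
As a small illustration of the escaping rule, the same idea can be tried in Python, whose re module also treats \* as a literal asterisk. The names MARKER, UNESCAPED and marker_name are invented for this sketch, and Python's regex dialect is not identical to sed's.

```python
import re

MARKER = re.compile(r"^R\*(.*)$")    # escaped: R followed by a literal *
UNESCAPED = re.compile(r"^R*(.*)$")  # unescaped: zero or more R characters


def marker_name(line):
    """Return the text after 'R*' for a marker line, or None for other lines."""
    match = MARKER.match(line)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(marker_name("R*01"))                  # 01
    print(marker_name("FSFF"))                  # None
    print(UNESCAPED.match("FSFF") is not None)  # True: matches any line
```

With the asterisk escaped, only real marker lines match, so there is no need to change the marker character in the data.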

Moving the last line of a file to just before a given line, as you asked, can be done as follows. Suppose it is to go before the third line.


sed "$w temp.txt" aa.txt| sed -e "2rtemp.txt" -e "$d"


For the same task, if execution speed matters, you can use the following command instead.


sed -e "3,${${p;x;D;p};$!H;d}" aa.txt
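
For comparison, the same rearrangement (take the last line and place it before line 3) is easy to express on a list of lines. This Python sketch is only illustrative: the function name move_last_line_before and its position parameter are assumptions made for the example, and it holds all lines in memory.

```python
def move_last_line_before(lines, position=3):
    """Return a new list with the last line moved so it becomes line `position` (1-based)."""
    if len(lines) < position:
        return list(lines)  # too short: there is no such line to move in front of
    *body, last = lines
    index = position - 1
    return body[:index] + [last] + body[index:]


if __name__ == "__main__":
    part = ["FSDGF", "HDFH", "GHGH", "JJHGJR", "R*03"]
    print(move_last_line_before(part))
    # ['FSDGF', 'HDFH', 'R*03', 'GHGH', 'JJHGJR']
```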