|
yhsean
Junior user
Points 90
Posts 26
Registered 2005-12-5
Status: offline
|
『Original Poster』:
How to split a file at marker lines with a batch script
I have a file AA.TXT with the following content:
FGAFFFFSFF
GGFDGG
R*01
FSFF
HHH
R*02
FSDGF
HDFH
GHGH
JJHGJR
R*03
....
kljljfljfkj
fmlfk
R*n
How can I split the file at the R* marker lines, naming each piece after the characters that follow the *?
That is, 01.txt should contain:
01.txt
FGAFFFFSFF
GGFDGG
02.txt should contain:
02.txt
FSFF
HHH
03.txt should contain:
03.txt
FSDGF
HDFH
GHGH
JJHGJR
.....
n.txt should contain:
n.txt
kljljfljfkj
fmlfk
In practice AA.TXT may be very large, so the efficiency of the split must be taken into account.
Thanks in advance to all the veterans here.
[Last edited by yhsean on 2005-12-8 at 13:33]
|
|
2005-12-5 13:02 |
|
|
willsort
Veteran member
Batchinger
Points 4432
Posts 1512
Registered 2002-10-18
Status: offline
|
『Post 2』:
Re yhsean:
Splitting a file with a batch script is unlikely to be very efficient, and bringing in third-party tools would defeat the purpose of using batch. Below is a one-liner for cmd, to be typed directly at the command line. It is the simplest possible prototype, so lines containing special characters may not be output correctly, for example non-marker lines that contain *.
for /f "tokens=1,2 delims=*" %f in (aa.txt) do @if "%f"=="R" (ren temp %g.txt) else (echo %f>>temp)
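For readers more at home in a POSIX shell, the accumulate-then-rename logic of the one-liner above can be sketched as follows. This is only an illustration under assumptions: the sample data and the scratch directory are made up, and the loop is a shell analogue of the idea, not the cmd command itself. Unlike for /f, which skips blank lines, this loop copies every line.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (illustrative only).
cd "$(mktemp -d)" || exit 1
printf '%s\n' FGAFFFFSFF GGFDGG 'R*01' FSFF HHH 'R*02' > AA.TXT

# Ordinary lines are appended to "temp"; a marker line R*<name>
# renames the collected "temp" to <name>.txt.
while IFS= read -r line; do
  case "$line" in
    'R*'*) mv temp "${line#'R*'}.txt" ;;      # strip the literal "R*" prefix
    *)     printf '%s\n' "$line" >> temp ;;
  esac
done < AA.TXT
```

As in the original one-liner, the input is read exactly once, but every ordinary line costs a separate append to temp, which is one plausible reason the approach slows down on large files.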
|

※ Batchinger to Bat Fans: please visit 批处理编程的异类 ("The Oddity of Batch Programming"); you are welcome to exchange and share batch programming experience! |
|
2005-12-5 14:29 |
|
|
yhsean
『Post 3』:
Thanks for the reply, moderator.
I had also been using the command above to do the split, but on a large file it runs very slowly, so it is of limited use in real work.
I am new to this forum and the moderator seems very helpful; thank you again.
I posted in the hope of finding a simpler and more efficient way. I wonder whether DEBUG could do it.
|
|
2005-12-5 18:39 |
|
|
yhsean
『Post 4』:
One more question for the moderator: if a third-party tool is used, how is it called from a batch file? (While it does its work the tool must be fully automatic, with no manual intervention.)
Apart from the marker lines, the file contains no other * characters.
|
|
2005-12-5 18:55 |
|
|
willsort
『Post 5』:
Re yhsean:
DEBUG could probably do it, but it would be more complicated and even less efficient, and debug itself does not handle large files well.
Third-party tools called from a batch file need to be chosen with some care. The first choice is a tool that can do everything through command-line arguments. Only after that do we consider tools that need interaction while running, in which case a complete response file has to be designed carefully.
Your problem should be simple to solve with sed, but I am out of time for now. If nobody else has posted an implementation by noon tomorrow, I will consider writing one.
|

|
2005-12-5 19:02 |
|
|
yhsean
『Post 6』:
Last night I downloaded SSED FOR WIN32 and studied its usage online, but with my limited experience I could not work out a solution. I need this badly for work, so I am looking forward to willsort's answer and would be most grateful.
Also, among the many versions of SED, is SSED one of the better ones?
|
|
2005-12-6 08:54 |
|
|
无奈何
Honorary moderator
Points 1338
Posts 356
Registered 2005-7-15
Status: offline
|
『Post 7』:
to yhsean
SED is a line-based stream editor and is not well suited to splitting text. csplit can do this job.
Here is one approach:
1. First run the following command to split the text.
csplit -z -b %02d.txt -f file AA.txt "/^R.*+/+1" {*}
2. Build a lookup file for renaming.
findstr /r "^R.**" file*.txt>rename.txt
3. Delete the last line of each file, then rename the files.
sed -i.bak "$d" file*.txt
sed "s/^\(.*\):R*\(\+\)/ren \1 \2.txt/" rename.txt>rname.bat
rname.bat
Notes:
1. The third-party tools mentioned above can be found in the GNU utilities for Win32 package.
2. Very old versions of SED do not support the -i option, so please use GNU SED. Non-GNU SED differs somewhat in its expressions, and I cannot guarantee correct results with it.
3. You can save the commands above as a batch file and run that.
If you run into problems, reply in this thread.
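The same split-then-rename idea can be sketched in a POSIX shell with GNU csplit and sed. This is a hedged illustration, not the poster's exact commands: the sample data, the piece prefix and the .tmp suffix are assumptions, and the marker is matched with an explicitly escaped `^R\*`. When a command like this is placed in a cmd batch file, the percent sign in the suffix format has to be doubled (%%02d), because %0 would otherwise expand to the script name.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (illustrative only).
cd "$(mktemp -d)" || exit 1
printf '%s\n' FGAFFFFSFF GGFDGG 'R*01' FSFF HHH 'R*02' > AA.txt

# 1. Cut after every marker line. The regexp matches a line starting with a
#    literal "R*"; the +1 offset keeps the marker as the last line of its piece.
#    -z drops the empty piece left over after the final marker, -s is quiet.
csplit -s -z -f piece -b '%02d.tmp' AA.txt '/^R\*/+1' '{*}'

# 2. For each piece, take the name from its last line, then write the piece
#    without that line to <name>.txt.
for f in piece*.tmp; do
  name=$(sed -n '$s/^R\*//p' "$f")   # text after "R*" on the last line
  [ -n "$name" ] || continue         # no trailing marker: leave the piece alone
  sed '$d' "$f" > "$name.txt" && rm "$f"
done
```

Reading the name from each piece avoids the separate lookup file, although every piece is still read twice after the initial csplit pass.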
|

☆Start\Run (WIN+R)☆
%ComSpec% /cset,=何奈无── 。何奈可无是原,事奈无做人奈无&for,/l,%i,in,(22,-1,0)do,@call,set/p= %,:~%i,1%<nul&ping/n 1 127.1>nul
|
|
2005-12-6 11:29 |
|
|
yhsean
『Post 8』:
Thank you for the answer. I assume Mr. 无奈何 is a professional programmer; I am only an end user. To get this done I studied SED for a long time without finding a solution. For an end user, learning a large number of tools is hard, and since SED seems to be so widely used (just my impression), I would sincerely ask someone to post a solution using SED.
In actual use an application will generate this script dynamically, so concise code makes it easier to customize the application to produce the script. Thanks again, Mr. 无奈何.
|
|
2005-12-6 23:38 |
|
|
yhsean
『Post 9』:
I finally found a solution.
Could willsort please check whether the code can be made more concise and whether the algorithm is reasonable?
Because * has a special meaning in SED, I changed the * to |.
::net send * ***
echo off
set file=AA.txt
set/a m=1
set/a n=1
set/a L=0
for /f %%a in ('SED -n "/R|/p" ./%file%') do (set/a L+=1)
if %L%==1 (copy %file% %m%.ok&goto end)
SED "1,/R|/!d" ./%file% >%m%.ok
SED "1,/R|/d" ./%file% >%n%.tmp
:loop
set/a m+=1
set/a p=-1*%n%
set/a L=0
for /f %%a in ('SED -n "/R|/p" ./%n%.tmp') do (set/a L+=1)
if %L%==1 (ren %n%.tmp %m%.ok&goto end)
SED "1,/R|/!d" ./%n%.tmp >%m%.ok
SED "1,/R|/d" ./%n%.tmp >%p%.tmp
set/a n*=-1
goto loop
:end
DEL *.TMP
for %%a in (*.ok) do (
for /f "tokens=2 delims=|" %%b in ('SED -n "/R|/p" %%a') do (ren %%a %%b)
)
::net send * ***
net send is only there to measure the processing time from the difference between the message timestamps.
[Last edited by yhsean on 2005-12-7 at 14:25]
|
|
2005-12-7 13:17 |
|
|
willsort
『Post 10』:
Re 无奈何:
csplit is indeed a good tool for pattern-based text splitting, and much simpler to use than sed. In terms of efficiency, though, I doubt the two differ much, since both are built around line-by-line pattern matching. And in your approach, besides the full scan in step one, the findstr command that looks up the file names and the sed command that deletes the last line each amount to another full scan. Three full scans will clearly cost efficiency.
In theory this job should be doable in a single scan, as in the for /f approach I mentioned first. I have not yet managed a single-scan sed version; the main difficulty is naming the files. On reflection, sed is really not a good fit for this. As far as I know awk should be able to do it, but I am still very unfamiliar with awk.
Re yhsean:
I have not studied your code closely, but I understand the general idea: split off one file, name it, then continue splitting. That actually scans the text even more times, so efficiency is even harder to guarantee.
You mentioned changing the match pattern. Does that mean you are free to customize the format of aa.txt, or even have more freedom in how aa.txt is handled? If so, we can keep refining the pattern to make it easier for the tools to match.
|

|
2005-12-7 20:27 |
|
|
tigerpower
Intermediate user
Senior Brother
Points 377
Posts 99
Registered 2005-8-26
Status: offline
|
『Post 11』:
If the moderator is so unfamiliar with awk, how can he be sure it can do the job? :D
|
|
2005-12-7 23:44 |
|
|
yhsean
『Post 12』:
Thank you, moderator. aa.txt is produced by an application, and the content of the last line can be customized.
Also, the file to be split has an indeterminate number of parts (generated dynamically by the software).
Another requirement is that, in practice, the last line of each piece must be moved to a given position near the top of that piece (for example, after line 2).
I sincerely hope for the moderator's help. I have been grinding away at SED for two days and know nothing at all about the other tools.
(That said, even though the SED version scans the whole file many times, it is still more than ten times faster than the FOR /F one.)
[Last edited by yhsean on 2005-12-8 at 13:11]
|
|
2005-12-8 13:01 |
|
|
无奈何
『Post 13』:
to tigerpower
It is a matter of breadth of experience. My guess is that when willsort says he is unfamiliar with AWK, he means its syntax, not that he knows nothing about it.
to yhsean
I am just a command-line enthusiast. When it comes to learning and using tools, I personally think it pays to get to know several. Some tasks are hard with one tool but easy with another, and more importantly you broaden your thinking and pick up more knowledge and techniques along the way.
=====
I have always been intimidated by AWK. I tried to learn it several times, backed off each time, and never got started. This time I skimmed a few articles with this thread's problem in mind and found that it is really quite interesting, and not as hard to pick up as I had imagined. Because the reading was so targeted, I learned just enough to finish this task. I do not know whether the statements are well written; I know too little about AWK.
Run gawk -f AA.awk AA.txt at the command line.
AA.awk is as follows:
# One field per line, so $1 is the whole line.
BEGIN { FS="\n"}
{
    if ($1~/^R.*/) {
        # Marker line: the name is its last two characters, e.g. 01.
        name = substr($1,length($1)-1)
        # Drop the leading newline of the collected text.
        temp = substr(temp,2)
        print temp>name".txt"
        temp = ""
    }
    else {
        # Ordinary line: append it to the collected text.
        temp = (temp "\n" $1)
    }
}
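A slightly different single-pass variant is sketched below for comparison. It is not the poster's script: it assumes every marker line starts with the literal characters R*, takes everything after them as the file name (so names need not be exactly two characters long), and closes each output file once it is written. The sample data and scratch directory are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (illustrative only).
cd "$(mktemp -d)" || exit 1
printf '%s\n' FGAFFFFSFF GGFDGG 'R*01' FSFF HHH 'R*02' > AA.txt

awk '
  /^R\*/ {                      # marker line: "R*" followed by the name
    out = substr($0, 3) ".txt"  # everything after the two characters "R*"
    printf "%s", buf > out      # write the buffered chunk in one go
    close(out)                  # keep the number of open files small
    buf = ""
    next
  }
  { buf = buf $0 "\n" }         # ordinary line: append to the current chunk
' AA.txt
```

Because the name only becomes known at the marker line, the chunk has to be buffered in memory until then; the input itself is still read only once.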
|

|
|
2005-12-8 13:16 |
|
|
yhsean
『Post 14』:
Could 无奈何 or the moderator tell me how to use SED to move the last line of a file up to a given line near the top?
The improved code is as follows:
@echo off
set file=aa.txt
set/a n=1
set/a L=0
:loop
sed -n "1,/R|/p" .\%file%|sed -n "$p" >tmp.tmp
for /f "tokens=3 delims=| " %%a in (tmp.tmp) do (set Tim=%%a)
for /f "tokens=3 delims=| " %%a in (tmp.tmp) do (set Nam=%%a.fanuc1)
sed -n "1,/R|/p" .\%file% >%Nam%
::Check whether the file being processed has only one part left (that is, whether this is the last part)
for /f %%a in ('sed -n "/R|/p" .\%file%') do (set/a L+=1)
if %L%==1 goto end
set/a L=0
::Take everything after the first part and write it to a temporary file
sed -n "1,/R|/!p" .\%file% >%n%.tmp
::Switch the file being processed
set file=%n%.tmp
::Flip the sign of the temp-file number so the next pass writes to the other file
set/a n*=-1
goto loop
:end
del *.tmp
[Last edited by yhsean on 2005-12-8 at 21:40]
|
|
2005-12-8 21:37 |
|
|
无奈何
『Post 15』:
to yhsean
You should pick up some knowledge of regular expressions. A special character such as * can be stripped of its special meaning by escaping it: \*.
Moving the last line of a file to before a given line, as you asked, can be done as follows.
Suppose it is to be moved to before the third line.
sed "$w temp.txt" aa.txt| sed -e "2rtemp.txt" -e "$d"
As before, if execution efficiency is what you are after, you can use the following command.
sed -e "3,${${p;x;D;p};$!H;d}" aa.txt
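The first, two-step command can also be run as two separate steps instead of a pipeline, sketched here in a POSIX shell with GNU sed. The sample file and the output name moved.txt are assumptions for the illustration. Both sides of a pipeline run concurrently, so on a large input the second sed may reach line 2 before temp.txt has been written; running the steps one after the other sidesteps that, at the cost of reading aa.txt twice.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (illustrative only).
cd "$(mktemp -d)" || exit 1
printf '%s\n' line1 line2 line3 line4 LAST > aa.txt

# Step 1: save the last line on its own.
sed -n '$p' aa.txt > temp.txt

# Step 2: re-insert it after line 2 (that is, before line 3)
# and drop the original last line.
sed -e '2r temp.txt' -e '$d' aa.txt > moved.txt
```

The commands use the same sed operations as the post ($p or $w to capture the last line, r to read it back in, $d to delete the original), only sequenced so that temp.txt is complete before it is read.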
|

|
|
2005-12-9 00:36 |
|