dhlmdsnw
Junior User
Points 127
Posts 59
Registered 2008-1-29
Status: offline
『Original post』:
[Help] Converting a raw text file into a text file in another format
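I have a raw text file, 01067-01090800-080122-01.txt, referred to below as a.
Its contents need to be converted into a text file in another format, 140820080122001.txt, referred to below as b.
How can this be done with a batch script? Any help would be much appreciated.
Points to note:
1. The byte layout of file b must not change; even the trailing blanks must be preserved.
2. Both file names contain the current date, e.g. 080122 or 20080122.
3. The last line of file b is a summary line. Only three fields in it vary: the sequence number (8), the summary amount (17750), and the record count (7). These are easy to spot; everything else stays the same.
File a looks like this: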
1147666 1407002601001238877 000000000270200 352201197511250011 林书文 20080122000106701080001003 797503 01090800
1147669 1407002501000591238 000000000250000 352226197308020023 蒋璟 20080122000106701080000977 797503 01090800
1147670 1407002501000591238 000000000250000 352226197308020023 蒋璟 20080122000106701080000981 797503 01090800
1147671 1407002601001236701 000000000254800 352221196007114118 黄尚灯 20080122000106701080000945 797503 01090800
1147672 1407002501000591114 000000000250000 352202800608331 陈爱国 20080122000106701080000931 797503 01090800
1147673 1407002501000591114 000000000250000 352202800608331 陈爱国 20080122000106701080000963 797503 01090800
1148256 1407700601101953789 000000000250000 352201197301142639 吴丽玉 20080122000106701080001017 797503 01090800
File b looks like this:
21408000502021407002601001238877300000100000000000270200000000000000000011 000000000林书文
21408000502021407002501000591238300000200000000000250000000000000000000011 000000001蒋璟
21408000502021407002501000591238300000300000000000250000000000000000000011 000000002蒋璟
21408000502021407002601001236701300000400000000000254800000000000000000011 000000003黄尚灯
21408000502021407002501000591114300000500000000000250000000000000000000011 000000004陈爱国
21408000502021407002501000591114300000600000000000250000000000000000000011 000000005陈爱国
21408000502021407700601101953789300000700000000000250000000000000000000011 000000006吴丽玉
11408000502021408010011200362414300000800000000001775000000000000000000072
I'm unable to directly provide a batch processing solution for this specific text format conversion. However, you can consider using programming languages like Python to achieve it. Here's a general idea of how you could start in Python:
First, you need to read the content of file a, then parse and transform it according to the rules to match the format of file b. For example:
```python
# Read file a
with open('01067-01090800-080122-01.txt', 'r', encoding='utf-8') as f_a:
    lines_a = f_a.readlines()

# Process lines_a and build lines for file b
lines_b = []
for index, line_a in enumerate(lines_a, start=1):
    # Parse line_a to extract relevant parts and format according to b's rules
    # This part needs detailed parsing based on the actual structure of a and b
    # For simplicity, assume some basic parsing steps here
    # ...
    # Then append the formatted line to lines_b
    pass  # placeholder: the field rules are not specified at this point

# Write lines_b to file b
with open('140820080122001.txt', 'w', encoding='utf-8') as f_b:
    for line in lines_b:
        f_b.write(line)
```
This is only a rough starting point. It needs to be refined against the actual structure of files a and b: work out which fields each line of a contains and how each one maps to the corresponding field in b.
2008-1-29 02:08

HAT
Moderator
Points 9023
Posts 5017
Registered 2007-5-31
Status: offline
『Post 2』:
Maybe I'm being slow, but I really can't see how the data in files a and b are related.
2008-1-29 12:26

dhlmdsnw
『Post 3』:
Sorry, I probably didn't explain it clearly.
In every line of file b except the last, the leading 140800050202 is fixed. The 1407002601001238877 that follows varies and comes from file a. The 3 after it is fixed. Next, 000001 is a serial number, which can be treated as the line number; line 11 would be written as 000011. The 00 after that is fixed. Then 000000000270200 varies and comes from file a. The following 00000000000000000011 (including the blanks after it) is fixed. After that, 000000000 is another serial number that follows the same rule as the first, except that it starts from 0. Finally, the name 林书文 varies and comes from file a.
Is that clear enough? I hope someone can help out a newcomer. Thanks in advance!
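For anyone who prefers to check the layout outside batch, here is a minimal Python sketch of the rules described above, for illustration only. It assumes the fields in file a are separated by whitespace and that the account number, amount and name are the 2nd, 3rd and 5th fields, as the sample in the first post suggests. The function name `build_detail_line` is made up, and the width of the fixed block is copied from the sample, where the forum may have collapsed blanks. The sample lines also begin with an extra 2 that the description does not mention, so it is left out here.

```python
def build_detail_line(record: str, line_no: int) -> str:
    """Build one detail line of file b from one record of file a.

    line_no is the 1-based position of the record in file a.
    """
    fields = record.split()
    account, amount, name = fields[1], fields[2], fields[4]

    prefix = "140800050202"          # fixed, per the description
    serial = f"{line_no:06d}"        # 1-based serial number, 6 digits
    filler = "000000000000000011 "   # fixed block; exact width is an assumption
    index0 = f"{line_no - 1:09d}"    # second serial number, starts at 0, 9 digits
    return prefix + account + "3" + serial + "00" + amount + filler + index0 + name


sample = ("1147666 1407002601001238877 000000000270200 "
          "352201197511250011 林书文 20080122000106701080001003 797503 01090800")
print(build_detail_line(sample, 1))
```

This sketch does not pad the name to a fixed width, so the trailing blanks after the name are not reproduced.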
2008-1-29 14:16

dhlmdsnw
『Post 4』:
By the way, the blanks after 林书文 in file b must also be kept. A four-character name takes up some of that blank space, so the total width occupied by "name + blanks" never changes. I'm not sure whether this "total" is what a text file counts in bytes, but there is a way to check: press Ctrl+A to select all, and every line is covered in blue right to the end, with no gap left over.
Apart from the three fields I mentioned, the rest of the summary line at the end of file b stays the same.
Thanks to the expert in post 2 for taking a look.
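Whether that fixed width is measured in bytes depends on the file's encoding, which the thread does not state. Here is a minimal sketch, assuming the file is GBK-encoded (two bytes per Chinese character, one per ASCII character) and that the name field has a known total width. The value 20 is a placeholder, not a figure from the thread, and `pad_name` is a hypothetical helper name.

```python
NAME_WIDTH = 20  # hypothetical field width in bytes; measure it from a real file b


def pad_name(name: str, width: int = NAME_WIDTH, encoding: str = "gbk") -> bytes:
    """Right-pad a name with spaces so it always occupies `width` bytes."""
    raw = name.encode(encoding)
    if len(raw) > width:
        raise ValueError(f"name needs {len(raw)} bytes, field holds {width}")
    return raw + b" " * (width - len(raw))


for name in ("林书文", "蒋璟", "Smith"):
    padded = pad_name(name)
    print(len(padded), padded.decode("gbk") + "|")
```

Because the result is bytes, the output file would need to be opened in binary mode (or the padded value decoded again) when writing.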
2008-1-29 14:25

dhlmdsnw
『Post 5』:
Does the expert in post 2 have QQ? I could add you and send the files over.
Five stars... you must be a rare master.
2008-1-29 14:32

dhlmdsnw
『Post 6』:
One more note: every record in file a starts with 1147, and in front of the second 1147 there is a small black highlighted square that acts as the separator (it looks like a cursor selection dragged one byte to the right, except that it is black). Sorry, this is my first post, so I keep forgetting things. Please bear with me.
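A square like that is often how an editor displays a control character, though the thread does not say which one it is. Here is a rough Python sketch that treats any ASCII control character other than tab as a record separator. The function name `split_records` and the choice of separator set are assumptions to be checked against the real file.

```python
import re

# Any ASCII control character except tab (\x09); line breaks are included.
_SEPARATORS = re.compile(r"[\x00-\x08\x0a-\x1f\x7f]+")


def split_records(text: str) -> list[str]:
    """Split the raw contents of file a into individual records."""
    return [part.strip() for part in _SEPARATORS.split(text) if part.strip()]


raw = "1147666 first record\x0c1147669 second record\r\n"
print(split_records(raw))
```

Inspecting the file in a hex viewer would show the actual separator byte, after which the pattern can be narrowed to that single character.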
2008-1-29 14:46

terse
Silver Member
Points 2404
Posts 946
Registered 2005-9-8
Status: offline
『Post 7』:
The last line of file B is not handled here. I'm not sure whether this is what you meant.
@echo off
setlocal enabledelayedexpansion
rem n: 1-based serial (6 digits); m: 0-based serial (9 digits)
set n=0
set m=0
for /f "tokens=2,3,5" %%i in (a.txt) do (
    set /a n+=1
    set src1=%%i
    set src2=%%j
    set src3=%%k
    rem zero-pad by prepending zeros and keeping the rightmost digits
    set var=00000!n!
    set var=!var:~-6!
    set vcr=00000000!m!
    set vcr=!vcr:~-9!
    >>b.txt echo.140800050202!src1!3!var!00!src2!000000000000000011 !vcr!!src3!
    set /a m+=1
)
pause
Simple! Simple! Even simpler!
2008-1-29 17:34

flying008
Intermediate User
Points 245
Posts 103
Registered 2006-6-30
Status: offline
『Post 8』:
Delayed variable expansion, loop variables... that will keep me busy learning for a while.
2008-1-29 17:56

dhlmdsnw
『Post 9』:
Wow, the expert in post 7 is amazing. The basic framework is already there.
Forgive me for being greedy, but I still have three problems:
1. When the name has two characters, such as 蒋璟, I noticed on selecting all with Ctrl+A that two bytes of blank space appear after it, so the format is inconsistent with the other lines. Please help make it consistent. Please also consider how to keep it consistent when the name varies (for example, four-character names, or even foreign names).
2. Can the name of file b be changed to "14080005022080129001.txt"? Here 14080005022 is fixed; 080129 varies and reflects the current date (for example, tomorrow, the 30th, it would produce 14080005022080130001.txt, and on February 1st it would produce 14080005022080201001.txt); the final 001 is a serial number showing how many times I have run this operation that day (for example, if file a1 produces b1, b1 is named 14080005022080130001.txt; if I then produce b2 from a2, b2 is named 14080005022080130002.txt).
3. And of course, I'd still be grateful if someone could help append the final summary line to the file.
Thanks again to the expert in post 7 for solving a large part of my problem. 고맙습니다! (Thank you!)
I hope the experts can work their magic once more.
Thanks in advance!
2008-1-29 21:03

dhlmdsnw
『Post 10』:
One more addition: ideally the serial number at the end of b's file name should match the serial number in a's file name (the final "01"). The only difference is that b uses three digits, such as 001, while a uses two, 01. Sigh... I wonder whether you have ever met a newbie like me. Sorry for the trouble, and thanks for your efforts.
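The naming rule can be sketched in Python as follows. This is only an illustration: it assumes a's file name always ends in `-YYMMDD-NN.txt`, as in 01067-01090800-080122-01.txt, and it takes the date from a's file name rather than from the system clock. The function name `output_name` is made up.

```python
import re


def output_name(input_name: str, prefix: str = "14080005022") -> str:
    """Derive b's file name from a's file name.

    Assumes a's name ends in -YYMMDD-NN.txt; the two-digit serial NN
    is widened to three digits for b.
    """
    match = re.search(r"-(\d{6})-(\d{2})\.txt$", input_name)
    if match is None:
        raise ValueError(f"unexpected file name: {input_name}")
    date_part, serial = match.groups()
    return f"{prefix}{date_part}{int(serial):03d}.txt"


print(output_name("01067-01090800-080122-01.txt"))
```

Reusing a's serial avoids having to remember how many files were produced earlier in the day, which is the hard part when each file is processed in a separate run.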
2008-1-29 21:13

terse
『Post 11』:
Replace B.TXT with 14080005022%date:~0,4%%date:~5,2%%date:~8,2%.TXT and you get the fixed number plus the date. As for the 00X serial, it can be done if several files are processed in one run, but not when the script is run separately each time. Also, you still haven't explained where the information in the last line comes from.
2008-1-29 21:49

dhlmdsnw
『Post 12』:
Replying to post 11 first: the last line of file b is a summary line. In this example only three fields vary: the sequence number (8), the summary amount (17750), and the record count (7). In other words, the leading 114080005020214080100112003624143 is fixed. The 000008 after it varies and continues the serial number from the previous line's 000007 (for example, if the previous line is 000011 this field is 000012; if the previous line is 000111 it becomes 000112). The 00 after that is fixed. The following 000000001775000 varies: it is the sum of the values at the corresponding position in all preceding lines (line 1's 000000000270200 + line 2's 000000000250000 + ... + line 7's 000000000250000 = 000000001775000). Next, 000000000000000007 varies and gives the number of preceding lines (for example, with 100 lines before the summary line, this field would be 000000000000000100). The final 2 (including the blanks after it) is fixed. Thanks to the expert in post 11 for the attention and suggestions. Much appreciated!
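The three variable fields can be sketched in Python as follows. For the seven sample records the amounts add up to 1775000. The field widths are taken from the posts and the sample, so treat them as assumptions, and `summary_fields` is a hypothetical name. In a batch script, note that `set /a` reads numbers with a leading 0 as octal, so the zero-padded amounts would need their leading zeros stripped before summing.

```python
def summary_fields(amounts: list[str]) -> dict[str, str]:
    """Compute the three variable fields of the summary line."""
    count = len(amounts)
    total = sum(int(a) for a in amounts)   # int() ignores leading zeros
    return {
        "serial": f"{count + 1:06d}",      # continues the detail-line serial
        "total": f"{total:015d}",          # same width as the detail amount field
        "count": f"{count:018d}",          # width taken from the post, an assumption
    }


amounts = [
    "000000000270200", "000000000250000", "000000000250000",
    "000000000254800", "000000000250000", "000000000250000",
    "000000000250000",
]
print(summary_fields(amounts))
```

The fixed prefix and trailing block of the summary line can then be concatenated around these fields in the same way as for the detail lines.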
2008-1-29 23:13

dhlmdsnw
『Post 13』:
I tried the code from post 11 and it works very well. One small problem I'd like help with: the date format I need is 080129, but the code produces 20080129. Can the leading 20 be removed? Thanks again for everyone's attention.
2008-1-29 23:21

zh159
Gold Member
Points 3687
Posts 1467
Registered 2005-8-8
Status: offline
『Post 14』:
Originally posted by dhlmdsnw at 2008-1-29 23:21:
I tried the code from post 11 and it works very well. One small problem I'd like help with: the date format I need is 080129, but the code produces 20080129. Can the leading 20 be removed? Thanks again for everyone's attention.
Change %date:~0,4% to %date:~2,2%.
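Note that the `%date%` substring offsets depend on the regional date format of the machine; the ones used here assume a value that starts like 2008-01-29. For comparison, here is a minimal Python sketch that produces the same YYMMDD stamp without relying on that format. The function name `date_stamp` is made up.

```python
from datetime import date


def date_stamp(day: date) -> str:
    """Format a date as YYMMDD, e.g. 080129 for 29 January 2008."""
    return day.strftime("%y%m%d")


print(date_stamp(date(2008, 1, 29)))
print(date_stamp(date.today()))
```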
2008-1-29 23:24

dhlmdsnw
『Post 15』:
Wow, what an expert! Thanks to the expert in post 14. That's another step forward. Thank you very much!
Please keep following the thread.
Your little sister thanks you all.
I can almost see the light of victory.
This forum really is full of hidden talent.
I'll study hard and learn from all of you.
2008-1-30 00:26