In CMD, writing out text that contains special characters has always been a headache. The usual way to keep special characters intact is to wrap the content in quotes before echoing it, but then every output line gains a quote at its start and at its end. If those extra quotes are unacceptable, that approach is ruled out, and there seemed to be no other approach that solves the problem cleanly.
(Note: for the best solution, see bjsh's code from post #23, given below as Code 3.)
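To make the trade-off concrete, here is a minimal sketch. The variable name line and its sample value are made up for illustration; they are not part of the original test. Quoting keeps & and > from being interpreted, but the quotes become part of the output:

```batch
@echo off
rem Hypothetical sample line containing characters that CMD treats specially.
set "line=a & b > c"

rem Quoted: the ampersand and the redirection sign stay literal,
rem but the surrounding quotes are printed as well.
echo "%line%"

rem Unquoted, the same echo would be parsed as "echo a" followed by
rem a second command named b whose output is redirected to a file named c.
```

The echo above prints "a & b > c" with both quotes, which is exactly the side effect the rest of this post tries to avoid.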
I have had some spare time recently and went back to this problem. After several revisions I arrived at Code 1, posted here for everyone to test:
Code 1:
@echo off
:: Idea: escape every special character, then output the line
:: Limitation: the name of the file to be processed cannot be enclosed in quotes
cd.>output.txt
for /f "delims=" %%i in ('findstr /n .* test.txt') do (
    set "str=%%i"
    call set "str=%%str:*:=%%"
    if defined str (call :output) else echo.>>output.txt
)
start output.txt
exit
:output
set "str=%str:^=^^%"
set "str=%str:>=^>%"
set "str=%str:<=^<%"
set "str=%str:|=^|%"
set "str=%str:&=^&%"
set "str=%str:"=^"%"
call echo.%%str%%>>output.txt
goto :eof
Code 2, adapted from bjsh's code in post #21, follows. I consider it a fairly complete solution:
Code 2:
@echo off
cd.>output.txt
for /f "delims=" %%i in ('findstr /n .* test.txt') do (
    set "str=%%i"
    call set "str=%%str:*:=%%"
    if defined str (call :output) else echo.>>output.txt
)
start output.txt
exit
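:output
set "str=%str:^=^^%"
rem In the original post the next line replaced each double quote with an
rem invisible placeholder character, shown as a black box. That character
rem is missing from this copy of the code.
set "str=%str:"=%"
set "str=%str:>=^>%"
set "str=%str:<=^<%"
set "str=%str:&=^&%"
set "str=%str:|=^|%"
rem Originally this line turned the placeholder back into a double quote.
set "str=%str:="%"
(call echo.%%str%%)>>output.txt
goto :eof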
The best version is below (based on bjsh's code in post #23, with only minor changes by me):
Code 3:
@echo off
cd.>output.txt
for /f "delims=" %%i in ('findstr /n .* test.txt') do (
    set "var=%%i"
    setlocal enabledelayedexpansion
    set var=!var:*:=!
    (echo.!var!)>>output.txt
    endlocal
)
start output.txt
Contents of the test file test.txt (note that one of the blank lines consists of spaces):
"aou"eo
;euou%^>
::::aeui
:::E2uo alejou 3<o2io|
^aue||%ou
!aue!
aoue eou 2
!str!auoeu!ueo &&
euo 8
ueyi^^^^aueuo2
~ ! @ # $ % ^ & * ( () " ok " No " <>nul
set ok=^
Analysis of Code 1 and Code 2:
① The usual way to reference variables inside a for loop is to turn on delayed variable expansion with setlocal enabledelayedexpansion. That feature has a fatal flaw here: when the string being processed contains exclamation marks, each pair of exclamation marks and everything between them is replaced with nothing. Code 1 and Code 2 therefore drop the setlocal approach and call a subroutine instead;
② If the text contains an odd number of quotes, the statement echo.%str%>>output.txt fails. Code 2 therefore replaces each quote with a special invisible character (displayed as a black box in the code) while the other characters are escaped, and turns it back into a quote just before the output. Thanks to lxmxn for testing and to bjsh for the analysis;
③ In the :output subroutine, set "str=%str:^=^^%" must come before all the other substitutions. Otherwise the carets inserted by those substitutions would themselves be doubled and the result would be wrong. Thanks to lxmxn for testing;
④ A plain for /f loop skips lines that begin with a semicolon and also skips empty lines, so findstr /n .* test.txt is used to number every line, empty ones included; with the number in front, no line is empty and none begins with a semicolon. Stripping the number with delims=: would also discard every colon at the start of the line content, so call set "str=%%str:*:=%%" is used instead, which removes only the text up to and including the first colon;
⑤ In (call echo.%%str%%)>>output.txt, the dot right after echo cannot be omitted; otherwise a line consisting only of spaces would make echo print its current on/off state. call is used to cope with lines that contain quotes. The parentheses are needed so that a line ending in a space followed by a single digit from 1 to 9 is written correctly: without them, the digit is parsed as the handle number of the redirection;
⑥ The :output section does not use a loop such as for %%i in (^^ ^> ^< ^| ^&) do call set "str=%%str:%%i=^%%i%%" because that statement does not perform the substitutions correctly. How call delays expansion inside a for loop is admittedly hard to follow.
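Points ④ and ⑤ can be checked with a small self-contained script. This is only a sketch: demo_in.txt and demo_out.txt are hypothetical scratch files, and the sample lines are invented for the demonstration rather than taken from test.txt.

```batch
@echo off
setlocal disabledelayedexpansion

rem demo_in.txt and demo_out.txt are hypothetical scratch files.
(
    echo first
    echo.
    echo ;semicolon line
    echo ::colon line
)>demo_in.txt

echo --- plain for /f: the empty line and the semicolon line are skipped
for /f "delims=" %%i in (demo_in.txt) do echo [%%i]

echo --- numbered by findstr, split on the colon: leading colons are lost
for /f "tokens=1* delims=:" %%a in ('findstr /n .* demo_in.txt') do echo [%%b]

echo --- numbered by findstr, prefix removed with a substitution
for /f "delims=" %%i in ('findstr /n .* demo_in.txt') do (
    set "str=%%i"
    call set "str=%%str:*:=%%"
    call echo [%%str%%]
)

rem Point 5: a space plus a digit right before the redirection
rem is read as a handle number, so the digit never reaches the file.
set "str=euo 8"
type nul>demo_out.txt
echo.%str%>>demo_out.txt
(echo.%str%)>>demo_out.txt
echo --- demo_out.txt holds only the parenthesized line
type demo_out.txt

del demo_in.txt demo_out.txt
```

Only the third loop should show all four lines unchanged, including the empty one and the two leading colons. The unparenthesized echo sends "euo" to the console and writes nothing to the file, because 8>> redirects handle 8 instead of standard output.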
As for Code 3, my understanding is limited, so I can only offer a rough analysis: the code uses delayed variable expansion to read the special characters intact and ends delayed expansion at the right moment (endlocal), so that text is not mistaken for a variable reference by one expansion too many. Underneath, this is still CMD's preprocessing mechanism at work. Note that the setlocal statement and the set "var=%%i" statement must not be swapped: if delayed expansion is already enabled when %%i is expanded, exclamation marks are still treated as variable delimiters and are discarded.
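The ordering rule can be seen in isolation with the sketch below. The sample text and the variable name var are for illustration only, and it assumes that no environment variable named aue exists. The first half of the loop body enables delayed expansion before the set; the second half uses the order from Code 3.

```batch
@echo off
setlocal disabledelayedexpansion

rem Illustration only: the sample text is made up, and no variable
rem called aue is assumed to exist.
for /f "delims=" %%i in ("!aue! stays") do (
    setlocal enabledelayedexpansion
    set "var=%%i"
    echo setlocal first: [!var!]
    endlocal

    set "var=%%i"
    setlocal enabledelayedexpansion
    echo set first:      [!var!]
    endlocal
)
```

Under those assumptions the first echo prints [ stays], because !aue! was expanded to nothing while the value was being stored, and the second prints [!aue! stays], because the text was stored before delayed expansion was switched on and the expanded value of !var! is not scanned again.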
[ Last edited by namejm on 2007-8-14 at 09:27 PM ]