无奈何
Honorary Moderator
Points: 1338
Posts: 356
Registered: 2005-7-15
Status: offline
[Original poster]:
[Recommended] sed article collection
A collection of sed-related articles; I will post the good ones here. Links and resources will be added to this first post.
This post received +4 points. Rated by redtek: +2 (2006-10-26 20:53); rated by oilio: +2 (2008-2-27 13:19).

☆ Start\Run (WIN+R) ☆
%ComSpec% /cset,=何奈无── 。何奈可无是原,事奈无做人奈无&for,/l,%i,in,(22,-1,0)do,@call,set/p= %,:~%i,1%<nul&ping/n 1 127.1>nul
2006-10-26 13:19
[Post #2]:
Summary of sed commands
Repost note: the author is unknown and the original link is unknown.
Summary of sed Commands
1. :
Usage:
:label
Marks a line in the script as the target of a control transfer by b or t. The label may contain up to 7 characters.
2. =
Usage:
[address]=
Writes the line number of the addressed line to standard output.
3. a
Usage:
[address]a\
text
Appends text after each line matched by address. If text spans more than one line, the newline at the end of every line except the last must be "hidden" with a preceding backslash. The text ends at the first newline that is not hidden this way. The text is not available in the pattern space, so subsequent commands cannot be applied to it. It is sent to standard output when the list of editing commands is exhausted, regardless of what happens to the current line in the pattern space.
4. b
Usage:
[address1[,address2]]b[label]
Unconditionally transfers control to :label elsewhere in the script; that is, the command following the label becomes the next command applied to the current line. If no label is given, control passes to the end of the script, so no further commands are applied to the current line.
5. c
Usage:
[address1[,address2]]c\
text
Replaces (changes) the lines selected by the address with text. When a range of lines is given, the whole range is replaced as a group by a single copy of text. The newline after each line of text must be escaped with a backslash, except on the last line. In effect the contents of the pattern space are deleted, so no subsequent commands can be applied to it (or to text).
6. d
Usage:
[address1[,address2]]d
Deletes the line from the pattern space, so the line is not passed to standard output. A new input line is read and editing resumes with the first command of the script.
7. D
Usage:
[address1[,address2]]D
Deletes the first part (up to the embedded newline) of a multi-line pattern space created by the N command, and resumes editing with the first command of the script. If this command empties the pattern space, a new input line is read, as if d had been executed.
8. g
Usage:
[address1[,address2]]g
Copies the contents of the hold space (see the h and H commands) into the pattern space, overwriting its current contents.
9. G
Usage:
[address1[,address2]]G
Appends a newline followed by the contents of the hold space (see the h and H commands) to the pattern space. If the hold space is empty, a newline is still appended to the pattern space.
10. h
Usage:
[address1[,address2]]h
Copies the contents of the pattern space into the hold space, a special temporary buffer. The previous contents of the hold space are overwritten.
11. H
Usage:
[address1[,address2]]H
Appends a newline followed by the contents of the pattern space to the hold space. The newline is appended even if the hold space is empty.
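The hold-space commands described above (h, G, and friends) are easiest to follow on a small input. The following is a minimal sketch, not part of the original summary: the function name reverse_lines and the sample input are assumptions made for illustration. It prints the input lines in reverse order.

```shell
# Reverse the order of input lines using only hold-space commands.
# 1!G : on every line except the first, append the hold space to the pattern space
# h   : copy the (growing) pattern space back into the hold space
# $p  : on the last line, print the accumulated result
reverse_lines() {
  sed -n '1!G;h;$p'
}

reversed=$(printf 'one\ntwo\nthree\n' | reverse_lines)
printf '%s\n' "$reversed"
```

Because G always inserts a newline before the held text, each new line ends up in front of the lines seen so far.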
12. i
Usage:
[address]i\
text
Inserts text before each line matched by address.
13. l
Usage:
[address1[,address2]]l
Lists the contents of the pattern space, showing non-printable characters as ASCII codes. Long lines are folded.
14. n
Usage:
[address1[,address2]]n
Reads the next input line into the pattern space. The current line is sent to standard output. The new line becomes the current line and the line counter is incremented. Control passes to the command following n rather than returning to the top of the script.
15. N
Usage:
[address1[,address2]]N
Appends the next input line to the contents of the pattern space; the new line is separated from the previous contents by a newline. (This command is used for pattern matching across two lines. Using \n to match the embedded newline makes multi-line matching possible.)
16. p
Usage:
[address1[,address2]]p
Prints the addressed line. Note that this duplicates output unless default output is suppressed with "#n" in the script or the "-n" command-line option. It is typically used before commands that change flow control (d, n, b) and might prevent the current line from being output.
17. P
Usage:
[address1[,address2]]P
Prints the first part (up to the embedded newline) of a multi-line pattern space created by the N command. If N has not been applied to the line, P behaves like p.
18. q
Usage:
[address]q
Quits when address is encountered. The addressed line is first written to output (if default output is not suppressed), together with any text appended to it by earlier a or r commands.
19. r
Usage:
[address]r file
Reads the contents of file and appends them after the contents of the pattern space. There must be exactly one space between r and the file name.
20. s
Usage:
[address1[,address2]]s/pattern/replacement/[flags]
Replaces pattern with replacement on each addressed line. If a pattern address is used, the empty pattern // stands for the last pattern address specified. The following flags can be given:
n       Replace the nth occurrence of /pattern/ on each addressed line. n is any number from 1 to 512; the default is 1.
g       Replace every occurrence of /pattern/ on each addressed line, not just the first.
p       Print the line if a replacement was made. If several successive substitutions succeed, multiple copies of the line are printed.
w file  Write the line to file if a replacement was made. At most 10 different files can be opened.
replacement is a string that replaces whatever the regular expression matched. Only the following characters have special meaning in it:
&       Replaced by the text matched by the regular expression.
\n      Replaced by the nth substring (n is a single digit) previously marked in pattern with "\(" and "\)".
\       Escapes the ampersand (&), the backslash (\), and the delimiter of the substitute command when they appear in the replacement. It is also used to escape a newline and so create a multi-line replacement string.
Numeric flags
s/pattern/replacement/flag
If flag is a number, it selects which match on the line is replaced. Without a numeric flag the substitute command replaces only the first match, so "1" can be regarded as the default numeric flag.
The replacement metacharacters are the backslash (\), the ampersand (&), and \n.
The backslash is generally used to escape the other metacharacters, but in a replacement string it is also used to include a newline.
For example, given the following line:
column1(tab)column2(tab)column3(tab)column4
and the following substitution, where (tab) stands for a literal tab character:
s/(tab)/\
/2
Note that no space is allowed after the backslash. The script produces this result:
column1(tab)column2
column3(tab)column4
The ampersand (&) as a metacharacter stands for the extent of the pattern match, not for the line that was matched. For example, the command
s/UNIX/\\s-2&\\s0/g
turns the input line
on the UNIX Operating System.
into
on the \s-2UNIX\s0 Operating System.
The ampersand is especially useful when the regular expression matches variations of a word, because it allows a variable replacement string. Suppose references such as "See Section 1.4" or "See Section 12.9" should appear in parentheses, as in "(See Section 12.9)". The regular expression can match the different combinations of digits, and "&" in the replacement string wraps whatever was matched:
s/See Section [1-9][0-9]*\.[1-9][0-9]*/(&)/
Here the ampersand refers to the entire match in the replacement string.
The \n metacharacter selects an individual part of the matched string and recalls it in the replacement string. In sed, escaped parentheses enclose any part of the regular expression and save it for recall. Up to 9 saves are allowed per line. For example, to set the section number in bold when it appears in a cross-reference:
s/\(See Section \)\([1-9][0-9]*\.[1-9][0-9]*\)/\1\\fB\2\\fP/
Another example:
$ cat test1
first:second
one:two
$ sed 's/\(.*\):\(.*\)/\2:\1/' test1
second:first
two:one
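The numeric flag, the g flag, "&", and back-references discussed above can be tried side by side. This is a small sketch under stated assumptions: the sample line and the variable names are made up for illustration, and the pattern is one plausible way to match section numbers such as 1.4 or 12.9.

```shell
line='See Section 1.4 and See Section 12.9'
pattern='See Section [1-9][0-9]*\.[1-9][0-9]*'

# Numeric flag 2: only the second match on the line is wrapped in parentheses.
second_only=$(printf '%s\n' "$line" | sed "s/$pattern/(&)/2")

# Flag g: every match on the line is wrapped.
every_match=$(printf '%s\n' "$line" | sed "s/$pattern/(&)/g")

# Back-references \1 and \2 swap the two fields around the colon.
swapped=$(printf 'first:second\n' | sed 's/\(.*\):\(.*\)/\2:\1/')

printf '%s\n' "$second_only" "$every_match" "$swapped"
```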
21. t
Usage:
[address1[,address2]]t[label]
Tests whether a substitution has succeeded on the addressed lines and, if so, branches to the line marked by label (see b and :). If no label is given, control passes to the end of the script.
22. w
Usage:
[address1[,address2]]w file
Appends the contents of the pattern space to file. This happens when the command is encountered, not when the pattern space is output. There must be exactly one space between w and the file name. A script can open at most 10 files. If the file does not exist, the command creates it. If the file exists, its contents are overwritten each time the script is run; multiple write commands that direct output to the same file append to the end of that file.
23. x
Usage:
[address1[,address2]]x
Exchanges the contents of the pattern space and the hold space.
24. y
Usage:
[address1[,address2]]y/abc/xyz/
Transforms each character in the string abc, by position, into the corresponding character of the string xyz.
[Last edited by 无奈何 on 2006-10-26 at 01:24 PM]
2006-10-26 13:19
[Post #3]:
SED Manual
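Original link: http://phi.sinica.edu.tw/aspac/reports/96/96005/
SED Manual
Computing Centre, Academia Sinica
ASPAC Project
aspac@phi.sinica.edu.tw
Technical report: 96005
December 1, 1996
Version: 1.0
--------------------------------------------------------------------------------
Contents:
Copyright notice
1. Introduction
1.1 When to use sed
1.2 Where to get sed
1.3 What editing actions sed can perform
1.4 How sed works
2. Using sed
2.1 Executing editing commands on the command line
2.2 sed editing commands
2.2.1 Address notation
2.2.2 The available functions
2.3 Executing editing commands from a file
2.4 Editing several files
2.5 Controlling output
3. Examples
3.1 Replacing data in a file
3.2 Moving data in a file
3.3 Deleting data in a file
3.4 Searching for data in a file
4. Function reference: s, d, a, i, c, p, l, r, w, y, !, n, q, =, #, N, D, P, h, H, g, G, x, b, t
Appendix A: Common regular expressions
Appendix B: Support for regular expression special characters in sed on HP-UX Release 9.01 and SunOS 5.4
References
Notes
--------------------------------------------------------------------------------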
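1. Introduction
--------------------------------------------------------------------------------
sed (Stream EDitor) is an editor on UNIX systems that automates editing work, so that the user does not have to edit the data directly. Users combine (note 1) the 20-odd functions sed provides to carry out different editing actions. Because sed edits a file line by line, it is also a line editor.
sed is most often used on files that need the same editing actions repeated over and over, for example replacing one string in a file with another. Compared with changing the file by hand in an ordinary interactive UNIX editor (such as vi or emacs), sed takes less effort. The following sections cover:
When to use sed
Where to get sed
What editing actions sed can perform
How sed works
1.1 When to use sed
If the same editing actions keep recurring while a file is being modified, sed can perform them automatically in one pass. For example, to change the sender's signature "Tom" to "John" in 1000 e-mail messages stored in the file received, a single simple sed command on the command line replaces every "Tom" in the file with "John".
Moreover, when a file needs many different editing actions, sed can perform them all in one run. For example, sed can delete all blank lines, replace strings, and add text supplied by the user at line 6 of the file, all in a single pass.
1.2 Where to get sed
UNIX systems normally ship with sed, although the version differs from system to system. If your UNIX system does not include sed, it can be obtained by anonymous ftp from the following sites:
phi.sinica.edu.tw:/pub/GNU/gnu
gete.sinica.edu.tw:/unix/gnu
ftp.edu.tw:/UNIX/gnu
ftp.csie.nctu.edu.tw:/pub/Unix/GNU
ftp.fcu.edu.tw:/pub3/UNIX/gnu
axp350.ncu.edu.tw:/Packages/gnu
leica.ccu.edu.tw:/pub2/gnu
mail.ncku.edu.tw:/pub/unix/gnu
bbs.ccit.edu.tw:/pub1/UNIX/gnu
prep.ai.mit.edu.tw:/pub/gnu
1.3 What editing actions sed can perform
sed can delete, change, append, insert, join, and exchange lines in a file, or read data from another file into it. It can also substitute strings within lines or transform individual characters. For example: squeeze consecutive blank lines into one, replace the string "local" with "remote", transform the letter "t" into "T", or join line 10 with line 11.
1.4 How sed works
Like other UNIX commands, sed reads the file to be edited from standard input and sends the result to standard output. The figure below shows sed replacing the line "Unix" with "UNIX".

In the figure, standard input at the top is where data is read from and standard output is where the result is sent. The two dashed boxes below the sed block in the middle show sed's workflow: the dashed box on the left shows sed placing the data from standard input into the pattern space, and the dashed box on the right shows sed sending the edited data from the pattern space to standard output.
Inside each dashed box, the two solid boxes represent the pattern space and the sed script. The pattern space is a buffer and is sed's work area; the sed script is the set of editing commands to execute.
In the figure, "Unix" is read from standard input into the pattern space in the left dashed box. In the right dashed box sed then executes the editing command s/Unix/UNIX/ from the sed script (note 2), so "Unix" is replaced by "UNIX", after which "UNIX" is sent from the pattern space to standard output.
To summarize: sed reads one line from standard input into the pattern space, applies the editing commands of the sed script to the pattern space one after another, sends the resulting contents of the pattern space to standard output, and then reads the next line. This cycle repeats until every line has been read.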
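--------------------------------------------------------------------------------
2. Using sed
--------------------------------------------------------------------------------
A sed command line consists of editing commands and input files. The editing commands control all of the editing work; the input files are the files to be processed. Every sed editing command is made up of an address and a function: at run time sed uses the address to decide which lines to edit and the function (note 3) to edit them.
Editing commands can be given on the command line or stored in a file. The only difference is that on the command line each command must be preceded by the option -e, whereas for commands kept in a file (note 4) the file name is preceded by the option -f. sed executes editing commands in the order in which they appear on the command line or in the file.
The following sections cover executing editing commands on the command line, sed editing commands, executing editing commands from a file, editing several files, and controlling sed's output.
2.1 Executing editing commands on the command line
2.2 sed editing commands
2.3 Executing editing commands from a file
2.4 Editing several files
2.5 Controlling sed's output
2.1 Executing editing commands on the command line
When editing commands (see section 2.2) are given on the command line, each must be preceded by the option -e. The command format is:
sed -e 'command1' -e 'command2' ... file
Each editing command follows its -e option directly and is enclosed in single quotes ('). Commands on the command line are executed from left to right.
When there are only a few editing commands, users normally run them directly on the command line. For example, to delete lines 1 to 10 of yel.dat and change every "yellow" in the remaining text to "black":
sed -e '1,10d' -e 's/yellow/black/g' yel.dat
In this command the editing command '1,10d' (note 5) deletes lines 1 to 10, and the editing command 's/yellow/black/g' (note 6) substitutes "black" for "yellow".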
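The two -e commands above can be tried on a throwaway file. This is a minimal sketch: the temporary file created with mktemp stands in for yel.dat, and its contents are invented for the demonstration.

```shell
# Build a 12-line stand-in for yel.dat: 10 header lines, then 2 data lines.
yel=$(mktemp)
{
  i=1
  while [ "$i" -le 10 ]; do
    echo "header line $i with yellow"
    i=$((i + 1))
  done
  echo 'yellow sun over yellow sand'
  echo 'no colour here'
} > "$yel"

# Commands run left to right: first delete lines 1-10, then substitute.
edited=$(sed -e '1,10d' -e 's/yellow/black/g' "$yel")
rm -f "$yel"
printf '%s\n' "$edited"
```

Only the two data lines survive, and the headers never reach the substitution because d ends the cycle for a deleted line.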
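2.2 sed editing commands
The format of a sed editing command is:
[address1[,address2]]function
Here the addresses address1 and address2 are line numbers or regular expressions that select the lines to be edited, and function is one of sed's built-in functions, which specifies the editing action.
The next two subsections describe address notation and the functions available.
2.2.1 Address notation
An address simply identifies the lines to be edited, either by their line numbers or by a string they contain. A few examples (all using the function d, see section 4.2):
To delete line 10 of the file, the command is 10d.
To delete the lines containing the string "man", the command is /man/d.
To delete lines 10 through 200 of the file, the command is 10,200d.
To delete from line 10 through the next line containing "man", the command is 10,/man/d.
What follows describes address notation completely in terms of two aspects, the content of an address and the number of addresses (again using the function d).
Content of an address:
The address is a decimal number: the number is a line number. The editing action named by the function is applied to the line with that number. For example, to delete line 15 of the data file the command is 15d (see section 4.2). Likewise, to delete line m of the data file the command is md.
The address is a regular expression (see Appendix A):
The editing action named by the function is applied to every line containing a string that matches the regular expression. The regular expression must be enclosed in "/". For example, the command /t.*t/d deletes every line that contains two "t" letters. Here "." matches any character and "*" means the preceding character may repeat any number of times, so together ".*" matches any string between the two "t" letters.
Number of addresses: a command with no address applies the editing action to every line; a command with one address edits only the lines that match the address; a command with two addresses, written address1,address2, edits a block of lines, where address1 is the first line of the block and address2 is the last. The examples below illustrate this.
The command
d
deletes every line in the file.
The command
5d
deletes line 5 of the file.
The command
1,/apple/d
deletes the block from line 1 of the file through the line containing "apple".
The command
/apple/,/orange/d
deletes the block from the line containing "apple" through the line containing "orange".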
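The address forms described above can be compared on one small input. This is a sketch only; the helper function sample and its five lines of data are assumptions chosen for illustration.

```shell
# Five-line sample input shared by every command below.
sample() { printf '%s\n' one apple two orange three; }

no_address=$(sample | sed 'd')                          # every line deleted
line_number=$(sample | sed '2d')                        # line 2 only
pattern=$(sample | sed '/apple/d')                      # lines matching a regex
number_to_pattern=$(sample | sed '1,/apple/d')          # line 1 through "apple"
pattern_to_pattern=$(sample | sed '/apple/,/orange/d')  # "apple" through "orange"

printf '%s\n' "$pattern_to_pattern"
```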
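2.2.2 The available functions
The table below describes what each sed function (see chapter 4) does.
Function   Purpose
: label    Defines a position in the script file that other commands can refer to.
#          Starts a comment.
{ }        Groups commands that share the same address.
!          Negates the address: the function is not executed on the lines that match.
=          Prints the line number.
a\         Appends data entered by the user.
b label    Jumps to the position defined with : label.
c\         Replaces data with data entered by the user.
d          Deletes data.
D          Deletes the data in the pattern space up to the first newline (\n).
g          Copies data from the hold space to the pattern space.
G          Appends data from the hold space to the pattern space.
h          Copies data from the pattern space to the hold space.
H          Appends data from the pattern space to the hold space.
l          Prints the data, showing nonprinting characters as ASCII codes.
i\         Inserts lines of data entered by the user.
n          Reads in the next line of data.
N          Appends the next line of data to the pattern space.
p          Prints the data.
P          Prints the data in the pattern space up to the first newline (\n).
q          Quits sed.
r          Reads in the contents of another file.
s          Substitutes strings.
t label    Tests whether a preceding substitution succeeded and, if so, jumps to : label.
w          Writes data to another file.
x          Exchanges the contents of the hold space and the pattern space.
y          Transforms characters.
Although sed has only the handful of basic editing functions listed above, the combination of addresses and the interplay between commands let sed accomplish most editing tasks.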
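2.3 Executing editing commands from a file
When there are too many commands to write comfortably on the command line, they can be stored in a file (named script_file, say) and the option -f script_file makes sed execute the editing commands in that file. The command format is:
sed -f script_file file
The editing commands in script_file are executed from top to bottom. The example from the previous section can be rewritten as:
sed -f ysb.scr yel.dat
where ysb.scr contains:
1,10d
s/yellow/black/g
The options -e and -f can also be mixed on the command line. sed still executes the commands from left to right along the command line, and when it reaches the commands of a file given with -f it executes them from top to bottom.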
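The -f form and the mixing of -e with -f can be sketched as follows. The temporary directory, the shortened script (1,2d instead of 1,10d), and the sample data are assumptions made to keep the demonstration small.

```shell
workdir=$(mktemp -d)

# A two-command script file and a four-line data file.
printf '%s\n' '1,2d' 's/yellow/black/g' > "$workdir/ysb.scr"
printf '%s\n' 'title' 'subtitle' 'yellow and yellow' 'yellow bird' > "$workdir/yel.dat"

# Commands in the script file run from top to bottom.
from_file=$(sed -f "$workdir/ysb.scr" "$workdir/yel.dat")

# -e and -f may be mixed; the -e command runs first because it comes first.
mixed=$(sed -e 's/bird/fish/' -f "$workdir/ysb.scr" "$workdir/yel.dat")

rm -rf "$workdir"
printf '%s\n' "$from_file" "$mixed"
```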
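2.4 Editing several files
Several input files can be edited with one sed command line; they follow the editing commands. For example, to replace "yellow" with "blue" in white.dat, red.dat, and black.dat:
sed -e 's/yellow/blue/g' white.dat red.dat black.dat
When this command runs, sed applies the editing command s/yellow/blue/g (see section 4.1) to white.dat, red.dat, and black.dat in that order.
2.5 Controlling output
The command-line option -n (note 7) puts output under the control of the editing commands. As the previous chapter explained, sed "automatically" sends the data in the pattern space to standard output. With -n this automatic behaviour is switched off, and whether a result is output is decided by the editing commands (note 8) that sed executes.
It follows that -n must be used together with an editing command that produces output; otherwise no result is obtained. For example, to print the lines of white.dat that contain the string "white":
sed -n -e '/white/p' white.dat
In this command the option -n and the editing command /white/p (see section 4.6) work together to control output: -n hands control of output to the editing commands, and /white/p prints the lines containing "white" to the screen.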
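The effect of -n is easiest to see by running the same p command with and without it. A short sketch, with an invented three-line input in place of white.dat:

```shell
sample() { printf '%s\n' 'white cat' 'black dog' 'off-white wall'; }

# With -n only the lines selected by /white/p are printed.
with_n=$(sample | sed -n -e '/white/p')

# Without -n every line is printed automatically, and matching lines
# are printed a second time by p.
without_n=$(sample | sed -e '/white/p')
without_n_count=$(printf '%s\n' "$without_n" | wc -l | tr -d ' ')

printf '%s\n' "$with_n"
```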
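--------------------------------------------------------------------------------
3. Examples
--------------------------------------------------------------------------------
In everyday editing one often needs to replace strings in a file, or to move, delete, and search for lines. Ordinary interactive editors (such as vi or emacs) can do all of this, but they become very inefficient once a file needs such edits in bulk. This chapter shows by example how to perform these edits automatically with sed. Every requirement in this chapter is phrased as:
for the ... data in the file, perform ... (action)
so that it can be turned quickly into an editing command. The "... data" part becomes the address of the command and the "perform ... action" part becomes the function. When the action needs several functions, they can be grouped with "{" and "}" (note 9), giving a command of this form:
address{
function1
function2
function3
.
:
}
This command applies the actions of function1, function2, function3, ... in turn to the data that matches the address. The following sections give examples of sed commands that replace, move, delete, and search for data.
3.1 Replacing data in a file
3.2 Moving data in a file
3.3 Deleting data in a file
3.4 Searching for data in a file
3.1 Replacing data in a file
sed can replace strings, lines, and even whole blocks of lines in a file. Strings are replaced with the function s (see section 4.1); lines and blocks are replaced with the function c (see section 4.5). The three examples below illustrate this.
Example 1. On the lines that contain the string "machine", replace the string "phi" with "beta". The command line is:
sed -e '/machine/s/phi/beta/g' input.dat (from here on the input file is always called input.dat)
Example 2. Replace line 5 of the file with the sentence "Those who in quarrels interpose, must often wipe a bloody nose.". The command line is:
sed -e '5c\
Those who in quarrels interpose, must often wipe a bloody nose.
' input.dat
Example 3. Replace the block of lines 1 to 100 with the following two lines:
How are you?
data be deleted!
The command line is:
sed -e '1,100c\
How are you?\
data be deleted!
' input.dat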
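Example 3 can be checked on a smaller scale. In this sketch the four-line input and the shortened range 2,3 are assumptions standing in for input.dat and lines 1 to 100; the behaviour shown is what GNU sed does, where a range changed with c is replaced by a single copy of the text.

```shell
# Lines 2-3 are replaced as a group by one copy of the two-line text.
changed=$(printf '%s\n' a b c d | sed -e '2,3c\
How are you?\
data be deleted!
')
printf '%s\n' "$changed"
```

The backslash at the end of "How are you?" continues the text onto the next line; the last text line has none.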
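3.2 Moving data in a file
Data being edited can be parked in sed's hold space, moved out to another file with the function w (see section 4.9), or brought in from another file with the function r (see section 4.8). The hold space is a buffer in which sed stores data from the pattern space temporarily: the functions h and H (see sections 4.19 and 4.20) save the pattern space into the hold space, and the functions x, g, and G (see sections 4.21 to 4.23) bring the saved data back into the pattern space. Three examples follow.
Example 1. Move the first 100 lines of the file so that they are output after line 300. The command line is:
sed -f mov.scr file
where mov.scr contains
1,100{
H
d
}
300G
Here
1,100{
H
d
}
saves (see section 4.20) the first 100 lines of the file in the hold space and then deletes them; the command 300G (see section 4.22) appends the contents of the hold space after line 300 of the file on output.
Example 2. Move the lines that contain the string "phi" into the file mach.inf. The command line is:
sed -e '/phi/w mach.inf' file
Example 3. Move the contents of mach.inf to the lines in the file that contain the string "beta". The command line is:
sed -e '/beta/r mach.inf' file
Note that because sed is a stream editor, data that has already been output cannot, in principle, be brought back for further editing.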
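Example 1 above can be traced on five lines instead of three hundred. The sketch below is an illustration under assumptions (a five-line input, lines 1-2 moved after line 4). It also shows one detail worth knowing: because H always puts a newline in front of what it appends, using H for the very first saved line leaves an empty line at the head of the hold space. Saving the first line with h avoids that.

```shell
# Variant that saves the first line with h and the rest with H.
moved=$(printf '%s\n' 1 2 3 4 5 | sed -e '1h' -e '2H' -e '1,2d' -e '4G')

# Same shape as mov.scr: H is used for every saved line.
with_H_only=$(printf '%s\n' 1 2 3 4 5 | sed -e '1,2{
H
d
}' -e '4G')

printf '%s\n' "$moved"
```

With H alone the output contains an extra blank line between "4" and the moved block.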
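3.3 Deleting data in a file
Because sed is a line editor, it can easily delete individual lines or whole blocks of lines. This is normally done with the function d (see section 4.2) or D (see section 4.17). Two examples follow.
Delete every blank line in the file. The command line is
sed -e '/^$/d' file
In the regular expression (see Appendix A) ^$ denotes a blank line: ^ anchors what follows to the beginning of the line and $ anchors what precedes to the end of the line.
Squeeze consecutive blank lines in the file into a single blank line. The command line is
sed -e '/^$/{
N
/^\n$/D
}' file
Here the function N (see section 4.16) appends the line after the blank line to the pattern space, so two consecutive blank lines appear in the pattern space as a single embedded newline. The command /^\n$/D then means: if the appended line is also blank, delete the first blank line and run the commands again on the blank line that remains. Each pass deletes one blank line, and this repeats until the line appended after a blank line is not blank, so only one blank line of each run is finally output.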
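The blank-line squeeze can be verified on a short input. This is a sketch with invented data; it uses /^\n$/ to test for "two blank lines" because after N the pattern space holds both lines joined by an embedded newline.

```shell
# Input has a run of three blank lines and a single blank line.
squeezed=$(printf 'a\n\n\n\nb\n\nc\n' | sed -e '/^$/{
N
/^\n$/D
}')
printf '%s\n' "$squeezed"
```

Each run of blank lines comes out as exactly one blank line, and single blank lines are left alone.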
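3.4 Searching for data in a file
sed can do much the same job as the UNIX command grep, using regular expressions (see Appendix A). For example, to output the lines of a file that contain the string "gamma":
sed -n -e '/gamma/p' file
However, sed is a line editor and its search basically works one line at a time. The usual method therefore fails when a string has been split in two by a line break. In that case the search has to be done by joining two lines, as in the following example.
Example. Output the data in the file that contains the string "omega". The command line is
sed -f gp.scr file
where gp.scr contains:
/omega/b
N
h
s/\n//
/omega/b
g
D
In this sed script (note 10) the function b creates a structure similar to a case statement in C, which lets sed handle three situations separately: the line contains "omega"; "omega" is split across two lines; and the data does not contain "omega". The script is discussed below in three parts, one for each situation.
When the line contains "omega", the editing command executed is
/omega/b
It means that when the line contains the string "omega", sed skips the remaining commands and outputs the line directly.
When the line does not contain "omega", the editing commands executed are
N
h
s/\n//
/omega/b
The function N (see section 4.16) reads in the next line, so that the pattern space holds two consecutive lines. The function h (see section 4.19) saves those two lines in the hold space. The function s/\n// joins (note 11) the two lines in the pattern space into one. /omega/b means that if the joined data contains "omega", the commands after it are skipped and the data is output automatically.
When the joined data still does not contain "omega", the editing commands executed are
g
D
The function g (see section 4.21) puts the two unjoined lines from the hold space back into the pattern space. The function D (see section 4.17) deletes the first of the two lines and reruns the sed script on the line that remains. In this way strings are found whether they lie within a line or across two lines.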
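--------------------------------------------------------------------------------
4. Function reference
--------------------------------------------------------------------------------
This chapter introduces every function sed provides, one function per section:
| s | d | a | i | c | p | l | r | w | y | ! | n | q | = | # | N | D | P | h | H | g | G | x | b | t |
Each section first describes briefly what the function does, then gives the format in which the function is combined with addresses, and along the way describes how sed carries the function out.
4.1 s
The function s substitutes strings in the file. Its command format is:
[address1[,address2]] s/pattern/replacement/[flag]
Notes on this format:
The function s takes at most two addresses.
Notes on "s/pattern/replacement/[flag]" (note 12):
pattern : a regular expression. It describes the string in the file that is to be replaced.
replacement : an ordinary string, except that the following characters have special meaning in it:
& : stands for the string matched by pattern. For example
sed -e 's/test/& my car/' datafile
In this command & stands for the matched string "test", so after execution "test" in the data file has been replaced by "test my car".
\n : stands for the string enclosed by the nth \( and \) in pattern (see Appendix A). For example
sed -e 's/\(test\) \(my\) \(car\)//' datafile
In this command \1 denotes "test", \2 denotes "my", and \3 denotes "car". Because the replacement shown here is empty, "test my car" in the data file is replaced by "" after execution.
\ : restores the literal meaning of special symbols (such as & and \ above), or stands for a line break.
flag : controls how the substitution is carried out:
When flag is g, every matching string is replaced.
When flag is a decimal number m, the m-th matching string on the line is replaced.
When flag is p, the first string matching pattern is replaced and the line is then written to standard output.
When flag is w wfile, the first string matching pattern is replaced and the line is then written to the file wfile (if wfile does not exist, it is created).
When there is no flag, the first string on the line that matches pattern is replaced by replacement.
delimiter : in "/pattern/replacement/" the "/" acts as a delimiter. Any character other than a blank or a newline can be used as the delimiter. For example, in the editing command
s#/usr#/usr1#g
the delimiter is #. If "/" were used as the delimiter here, sed would take the "/" characters inside pattern and replacement for delimiters and report an error.
Example:
Task : in input.dat (the input file is assumed to be input.dat below unless stated otherwise) replace the string "1996" with "1997" and at the same time save the changed lines in year97.dat.
Explanation : the function s tells sed to replace "1996" with "1997", and the flag w in the argument of s tells sed to save the lines that were changed in year97.dat.
sed command line:
sed -e 's/1996/1997/w year97.dat' input.dat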
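4.2 d
The function d deletes lines. Its command format is:
[address1[,address2]] d
Notes on this format:
The function d takes at most two addresses.
sed carries out a deletion as follows:
It deletes the data in the pattern space that matches the address.
It reads the next line into the pattern space.
It starts the sed script again from the top.
Example : see section 3.3.
4.3 a
The function a appends data to the file. Its command format is:
[address] a\
data entered by the user
Notes on this format:
The function a takes at most one address.
The "\" immediately after a marks the end of that line; the data entered by the user must start on the next line. If the data runs over more than one line, every line except the last must end with "\".
sed carries out an append as follows: after the data in the pattern space has been output, sed outputs the data entered by the user.
Example :
Task: append "multitasking operating system" after the line that contains the string "UNIX". Assume input.dat contains:
UNIX
Explanation: the function a appends the entered data after the line containing "UNIX".
The sed command line is:
sed -e '/UNIX/a\
multitasking operating system
' input.dat
After the command has run, the output is:
UNIX
multitasking operating system
4.4 i
The function i inserts data into the file. Its command format is:
[address] i\
data entered by the user
Notes on this format:
The function i takes at most one address.
The "\" immediately after i marks the end of that line; the data entered by the user must start on the next line. If the data runs over more than one line, every line except the last must end with "\".
sed carries out an insertion as follows: before the data in the pattern space is output, sed first outputs the data entered by the user.
Example :
Task: insert "Copyright of this article belongs to Academia Sinica" before the line of input.dat that contains "President : Lee Yuan-tseh". Assume input.dat contains:
President : Lee Yuan-tseh
Explanation: the function i inserts the line "Copyright of this article belongs to Academia Sinica" before the line containing "President : Lee Yuan-tseh".
The sed command line is:
sed -e '/President : Lee Yuan-tseh/i\
Copyright of this article belongs to Academia Sinica
' input.dat
After the command has run, the output is:
Copyright of this article belongs to Academia Sinica
President : Lee Yuan-tseh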
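The functions i and a can be combined on the same address to see the order in which their text appears relative to the matched line. A minimal sketch with invented input and marker text:

```shell
# i\ text is written before the matched line, a\ text after it.
decorated=$(printf 'intro\nUNIX\noutro\n' | sed -e '/UNIX/i\
--- inserted before ---
/UNIX/a\
--- appended after ---
')
printf '%s\n' "$decorated"
```

Neither text passes through the pattern space, so later commands in the script cannot modify it.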
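4.5 c
The function c changes data in the file. Its format is:
[address1[,address2]] c\
data entered by the user
Notes on this format:
The function c takes at most two addresses.
The "\" immediately after c marks the end of that line; the data entered by the user must start on the next line. If the data runs over more than one line, every line except the last must end with "\".
sed carries out a change as follows: at the point where the data in the pattern space would be output, sed outputs the data entered by the user in its place.
Example : see examples 2 and 3 in section 3.1.
4.6 p
The function p prints data. Its command format is:
[address1[,address2]] p
Notes on this format:
The function p takes at most two addresses.
sed carries out printing as follows: it copies the contents of the pattern space to standard output.
Example : see the beginning of section 3.4.
4.7 l
The function l is the same as the function p, except that it lists the nonprinting characters in the data as ASCII codes. For example, take the ^ in the following input.dat file.
4.8 r
The function r reads the contents of another file into the file being edited. Its command format is:
[address] r file
Notes on this format:
The function r takes at most one address.
In the command there must be exactly one space between r and the file name.
sed carries out a read as follows: after the data in the pattern space has been output, sed reads the contents of the other file and outputs them. If the other file does not exist, sed carries on with the remaining commands and produces no error message.
Example : see example 3 in section 3.2.
4.9 w
The function w writes data from the file being edited to another file. Its command format is:
[address1[,address2]] w file
Notes on this format:
The function w takes at most two addresses.
In the command there must be exactly one space between w and the file name.
sed carries out a write as follows: it writes the data in the pattern space to the other file. Writing overwrites the data previously in that file. If the file does not exist, sed creates it.
Example : see example 2 in section 3.2.
4.10 y
The function y transforms characters in the data. Its command format is:
[address1[,address2]] y/abc.../xyz.../
Notes on this format:
The function y takes at most two addresses.
In the command, /abc.../xyz.../ (where a, b, c, x, y, z stand for arbitrary characters) is the argument of y. abc... and xyz... must contain the same number of characters.
When sed performs the transformation, every a in the pattern space becomes x, every b becomes y, every c becomes z, and so on.
Example:
Task: change the lowercase letters in input.dat to uppercase. Assume input.dat contains:
Sodd's Second Law:
Sooner or later, the worst possible set of
circumstances is bound to occur.
Explanation: the function y tells sed to convert the case of the letters.
The sed command line is:
sed -e '
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
' input.dat
After the command has run, the output is:
SODD'S SECOND LAW:
SOONER OR LATER, THE WORST POSSIBLE SET OF
CIRCUMSTANCES IS BOUND TO OCCUR.
4.11 !
The function ! means "do not execute the function". A command of the form
[address1[,address2]] ! function
does not execute the function on the data that matches the address. For example, to delete every line except those containing the string "1996", run
sed -e '/1996/!d' input.dat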
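The effect of ! is clearest next to the same command without it. A short sketch; the three input lines are invented stand-ins for input.dat.

```shell
sample() { printf '%s\n' 'report 1995' 'report 1996' 'memo 1997'; }

# With !, d applies to every line that does NOT match /1996/.
kept=$(sample | sed -e '/1996/!d')

# Without !, d applies to the lines that DO match.
dropped=$(sample | sed -e '/1996/d')

printf '%s\n' "$kept"
```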
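4.12 n
The function n reads in the next line of data. Its command format is:
[address1[,address2]] n
Notes on this format:
The function n takes at most two addresses.
sed carries out the read as follows:
It outputs the data in the pattern space.
It reads the next line into the pattern space.
It executes the next editing command.
Example (compare with the odd-line example under the function P):
Task : output the even-numbered lines of input.dat. Assume input.dat contains:
The
UNIX
Operation
System
Explanation: on the command line,
the option -n hands control of output (see section 2.5) to the commands;
the function n replaces the line in the pattern space (an odd-numbered line) with the next line (an even-numbered line);
the function p outputs the data in the pattern space (the even-numbered line).
In the end the output consists only of the even-numbered lines of the original file.
The sed command line is:
sed -n -e 'n' -e 'p' input.dat
After the command has run, the output is:
UNIX
System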
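The even-line example can be run directly from a pipe. In this sketch the inputs are passed on standard input rather than read from input.dat, and the second call uses an odd number of lines to show that a trailing unpaired line produces no output when -n is in effect.

```shell
# Four lines: n skips to each even-numbered line, p prints it.
even=$(printf '%s\n' The UNIX Operation System | sed -n -e 'n' -e 'p')

# Three lines: on line 3 there is no next line for n to read,
# so sed stops and, because of -n, prints nothing for that line.
even_of_three=$(printf '%s\n' The UNIX Operation | sed -n -e 'n' -e 'p')

printf '%s\n' "$even"
```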
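4.13 q
The function q quits sed. Its command format is:
[address] q
Notes on this format:
The function q takes at most one address.
When sed quits, it stops reading data into the pattern space and stops sending data to standard output.
Example :
Task: apply the editing commands in script_file to the input file until a line containing the string "Linux" is encountered.
Explanation: whatever commands script_file contains, the user only needs to add the command /Linux/q on the command line, and the function q forces sed to quit when it meets "Linux".
The sed command line is:
sed -e '/Linux/q' -f script_file input.dat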
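The following sketch shows what q does to the line that triggers it. A second -e command stands in for script_file, and the three input lines are invented; both are assumptions for the demonstration.

```shell
# The s command plays the role of the commands in script_file.
stopped=$(printf '%s\n' Solaris Linux HP-UX | sed -e '/Linux/q' -e 's/^/checked: /')
printf '%s\n' "$stopped"
```

The line containing "Linux" is still printed, but unmodified, because q ends processing before the later command runs, and the remaining input is never read.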
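4.14 =
The function = prints the line number of the data. Its command format is:
[address1[,address2]] =
Notes on this format:
The function = takes at most two addresses.
When it is executed, the line number is output before the data.
Example :
Task: print the line numbers of input.dat. Assume input.dat contains:
The UNIX
Operating System
Explanation : the function = prints the line number of each line.
The sed command line is:
sed -e '=' input.dat
After the command has run, the output is:
1
The UNIX
2
Operating System
4.15 #
In a script file, the text after the function # is a comment. When a comment runs over several lines, each line break must be preceded by "\".
4.16 N
The function N appends the next line of data to the pattern space. Its command format is:
[address1[,address2]] N
Notes on this format:
The function N takes at most two addresses.
When sed executes it, the next line is read in and appended to the pattern space, with the lines separated by an embedded newline character. In a substitution the newline can be matched with \n.
Example :
Task: join the following two lines. Assume input.dat contains:
The UNIX
Operating System
Explanation : the function N first places both lines in the pattern space, and the function s/\n/ / then replaces the separator \n between them with a blank, so the two lines are output as one.
The sed command line is:
sed -e 'N' -e 's/\n/ /' input.dat
After the command has run, the output is:
The UNIX Operating System
4.17 D
The function D deletes the first line of data in the pattern space. Its command format is:
[address1[,address2]] D
Notes on this format:
The function D takes at most two addresses.
D compares with d as follows:
When the pattern space holds only one line, D and d have the same effect.
When the pattern space holds several lines:
D deletes only the first line in the pattern space; d deletes everything.
After deleting, D does not read the next line into the pattern space but reruns the sed script on the data that remains; d reads the next line and then runs the sed script.
Example : see the second example in section 3.3.
4.18 P
The function P prints the first line of data in the pattern space. Its command format is:
[address1[,address2]] P
Notes on this format:
The function P takes at most two addresses.
P and p are identical except for the number of lines of the pattern space they act on.
Example (compare with the even-line example under the function n):
Task : output the odd-numbered lines of input.dat. Assume input.dat contains:
The
UNIX
Operation
System
Explanation: on the command line,
the option -n hands control of output (see section 2.5) to the commands;
the function N appends the even-numbered line to the odd-numbered line in the pattern space;
the function P outputs the first line in the pattern space (the odd-numbered line).
After the odd-numbered line has been output, the line left in the pattern space (the even-numbered line) is discarded without being output. In the end the output consists only of the odd-numbered lines.
The sed command line is:
sed -n -e 'N' -e 'P' input.dat
After the command has run, the output is:
The
Operation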
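When the input has an odd number of lines, the last line has no partner for N to append. With -n in effect GNU sed then prints nothing for that line, and other implementations also exit without output. A common guard, shown here as a sketch and not as part of the original manual, is to write $!N so that N is skipped on the last line and P can still print it. The inputs are invented.

```shell
# Even number of lines: the guard changes nothing.
odd_of_four=$(printf '%s\n' The UNIX Operation System | sed -n -e '$!N' -e 'P')

# Odd number of lines: $!N skips N on the last line, so P prints it.
odd_of_three=$(printf '%s\n' The UNIX System | sed -n -e '$!N' -e 'P')

printf '%s\n' "$odd_of_three"
```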
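4.19 h
The function h saves the data in the pattern space to the hold space. Its command format is:
[address1[,address2]] h
Notes on this format:
The function h takes at most two addresses.
When sed saves the data, it overwrites whatever was in the hold space.
When sed has finished altogether, the data in the hold space is cleared automatically.
Example : see the example in section 3.4.
4.20 H
The only difference between H and h is that h overwrites the data already in the hold space, whereas H appends the data after what is already in the hold space. For an example see example 1 in section 3.2.
4.21 g
The function g is the reverse of the function h: it puts the data in the hold space back into the pattern space. Its command format is:
[address1[,address2]] g
The function g takes at most two addresses.
When sed puts the data back, it overwrites (note 13) whatever was in the pattern space.
Example : see the example in section 3.4.
4.22 G
The only difference between G and g is that g overwrites the data already in the pattern space, whereas G appends the data after what is already in the pattern space. For an example see example 1 in section 3.2.
4.23 x
The function x exchanges the data in the hold space and the pattern space. Its command format is:
[address1[,address2]] x
The function x is mostly used together with the other functions that work on the hold space. For example, to replace line 3 of input.dat with line 1, the functions h and x are combined: h stores line 1 in the hold space, and when line 3 is in the pattern space, x exchanges the contents of the hold space and the pattern space. Line 3 is thereby replaced by line 1. The command line is:
sed -e '1h' -e '3x' input.dat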
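The h and x combination above can be run on three lines to confirm the result. The input here is an invented stand-in for input.dat.

```shell
# Line 1 is saved with h; on line 3, x swaps it into the pattern space.
swapped_in=$(printf '%s\n' first second third | sed -e '1h' -e '3x')
printf '%s\n' "$swapped_in"
```

After the exchange the original third line sits in the hold space and is never output.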
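4.24 b, :label
The functions : and b give a sed script something like the GOTO statement of BASIC. The function : defines a label, and the function b branches to the label so that execution continues there. In a script file the two work together like this:
.
.
.
editing command m1
:label
editing command m2
.
.
.
[address1[,address2]]b [label]
When sed reaches the command [address1[,address2]]b [label] and the data in the pattern space matches the address, sed branches to the position marked by :label (note 14), that is, execution continues from "editing command m2". If the function b is not followed by a label, sed branches to the end of the script file; this makes it possible to build a structure like the case statement of C in a sed script.
Example :
Task: print the first letter of each line of input.dat 40 times. Assume input.dat contains:
A
B
C
Explanation: the commands b p1 and :p1 form a loop that adds letters, and the command b is also used to leave the loop once there are 40 letters. Taking the first line, "A", as an example, this is how 39 more "A"s are added to the same line:
The command s/A/AA/ (see section 4.1) replaces "A" with "AA".
The commands b p1 and :p1 form a loop so that this action is repeated. Each pass through the loop adds one more "A" to the line: after the first pass the line is "AA", after the second pass "AAA", and so on.
The command /[ABC]\{40\}/b (note 15) is the condition that stops the loop. When the line contains 40 consecutive A's, the function b jumps to the end of the script and editing of this line stops.
The other lines are handled in the same way.
The sed command line is:
sed -e '{
:p1
/A/s/A/AA/
/B/s/B/BB/
/C/s/C/CC/
/[ABC]\{40\}/b
b p1
}' input.dat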
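The loop built from :p1 and b p1 can be observed with a smaller repeat count. This sketch uses 5 instead of 40 and a single substitution with "&&" in place of the three per-letter commands; both simplifications are assumptions for illustration, not the manual's script.

```shell
# :p1 marks the top of the loop; "b" alone leaves the loop once
# five letters are present; "b p1" jumps back to the label.
repeated=$(printf '%s\n' A B C | sed -e ':p1
/[ABC]\{5\}/b
s/[ABC]/&&/
b p1')
printf '%s\n' "$repeated"
```

Each pass doubles the first letter of the line, which adds exactly one letter, so four passes turn "A" into "AAAAA".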
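4.25 t
The function t works much like the function b, except that before branching it tests whether a preceding substitute command has made a replacement. In a script file it is used like this:
.
.
.
editing command m1
:label
editing command m2
.
.
.
s/.../.../
[address1[,address2]]t [label]
editing command m3
The difference from the function b is that t first checks whether the preceding substitute command succeeded. If it did, the branch is taken; if it did not, there is no branch and execution continues with the next editing command, editing command m3 in the outline above.
Example:
Task : in input.dat replace A1 with C1, C1 with B1, and B1 with A1. input.dat contains:
Code
B1
A1
B1
C1
A1
C1
Explanation : each line of input.dat needs exactly one replacement. To keep a line from being replaced more than once, the function t is used to build a structure like the case statement of C in the sed script, so that as soon as a line has been replaced once, t leaves the sequence of substitutions.
The sed command line is:
sed -e '{
s/A1/C1/
t
s/C1/B1/
t
s/B1/A1/
t
}' input.dat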
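Running the same three substitutions with and without t shows why the branch matters. The three input lines are a shortened, invented version of the example data.

```shell
sample() { printf '%s\n' B1 A1 C1; }

# With t: the first successful substitution ends processing of the line.
with_t=$(sample | sed -e 's/A1/C1/
t
s/C1/B1/
t
s/B1/A1/')

# Without t: each result is fed into the next substitution,
# so every line ends up as A1.
without_t=$(sample | sed -e 's/A1/C1/' -e 's/C1/B1/' -e 's/B1/A1/')

printf '%s\n' "$with_t"
```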
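--------------------------------------------------------------------------------
Appendix A: Common regular expressions
--------------------------------------------------------------------------------
ordinary characters  A regular expression made up of ordinary characters means exactly what the string says literally.
^string    Restricts string to the beginning of a line.
string$    Restricts string to the end of a line.
.          Matches any single character.
[...]      A character set: matches any one of the characters between the brackets. [^...] matches any character other than those between the brackets.
-          Inside a character set, "-" specifies a range of characters.
*          The preceding character (or character set) may be repeated any number of times.
\n         Matches an embedded newline character.
\(...\)    "\(" and "\)" enclose part of a regular expression; "\1" then refers to the first part enclosed this way. If "\(" and "\)" are used several times to enclose different parts, these are referred to in order as "\1", "\2", "\3", ... (up to "\9").
Regular expressions are also subject to different restrictions on different platforms; see Appendix B for details.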
--------------------------------------------------------------------------------
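Notes
--------------------------------------------------------------------------------
Notes
Note 1.
This is the sed script that is introduced later.
Note 2.
The instruction s/Unix/UNIX/ replaces "Unix" with "UNIX". See section 4.1.
Note 3.
There are more than 20 function parameters to choose from in an instruction.
Note 4.
From here on, this file is called the script file.
Note 5.
In the edit command 1,10d, the address parameter is 1,10, so the delete action named by the function parameter d is applied to lines 1 through 10.
Note 6.
The edit command s/yellow/black/g has no address parameter, so the substitution named by the function parameter s/yellow/black/g is applied to every data line. In the function parameter s/yellow/black/g, /yellow/black/g is the argument of s; it replaces every "yellow" in a data line with "black".
Note 7.
The command format is as follows:
sed -n .. ..
Note 8.
The function parameter in these edit commands may be one of p, l, and s.
Note 9.
In some cases an edit command can be used in place of a function parameter. See Example 2 in section 3.3.
Note 10.
Here, the sed script is the content of the file gp.scr. It is the set of edit commands that sed executes in this run.
Note 11.
This function parameter replaces (removes) the newline between the two lines in the pattern space, so the pattern space then holds only one line.
Note 12.
/pattern/replacement/ is the argument of the function parameter s.
Note 13.
Note that although the data is put back into the pattern space, the content of the hold space remains unchanged.
Note 14.
Note that there must be no space between ":" and the label.
Note 15.
The address parameter /[ABC]\{40\}/ means 40 letters A, or 40 letters B, or 40 letters C. Here [ABC] means "A" or "B" or "C", and the \{40\} after it means 40 occurrences of the preceding letter. For regular expressions, see Appendix A.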
--------------------------------------------------------------------------------
SED Manual
References
--------------------------------------------------------------------------------
References
``SED - A Non-interactive Text Editor'', Lee E. McMahon, AT&T Bell Laboratories, Murray Hill, New Jersey 07947.
``sed & awk'', Dale Dougherty, O'Reilly & Associates, Inc., 1990.
``SunOS 5.1 Editing Text Files'', Sun Microsystems, Inc., 1992.
``HP 9000 computers -- Text Processing: User Guide'', Hewlett-Packard Company, 1991.
叶大业, ``自动编辑文件的工具 - SED 简介'' (An Introduction to SED, a Tool for Automatically Editing Files), 中央研究院计算中心通讯 (Academia Sinica Computing Centre Newsletter), vol. 12, no. 2.
--------------------------------------------------------------------------------
Original link: http://phi.sinica.edu.tw/aspac/reports/96/96005/
SED Manual
Institute of Mathematics, Academia Sinica
ASPAC Project
aspac@phi.sinica.edu.tw
Technical Report: 96005
December 1, 1996
Version:1.0
--------------------------------------------------------------------------------
Table of Contents:
Copyright Notice
1. Introduction
When to Use sed
Where to Get sed
What sed Can Do
How sed Works
Using sed
Executing Edit Commands on the Command Line
sed's Edit Commands
Address Parameter Notation
Function Parameters
Executing Edit Commands in a File
Editing Multiple Files
Controlling Output
Examples
Substituting Data in Files
Moving Data in Files
Deleting Data in Files
Searching for Data in Files
Introducing Function Parameters
s
d
a
i
c
p
l
r
w
y
!
n
q
=
#
N
D
P
h
H
g
G
x
b
t
Appendix A: Common Regular Expressions
Appendix B: Acceptance of Special Characters in Regular Expressions by sed in HP-UX Release 9.01 and SunOS 5.4
References
Notes
--------------------------------------------------------------------------------
Introduction
--------------------------------------------------------------------------------
1. Introduction
Sed (Stream EDitor) is an editor on UNIX systems that automates editing, so that users do not have to edit the data by hand. Users combine the more than 20 function parameters that sed provides (Note 1) to carry out different editing actions. Because sed processes a file one line at a time, it is also a line editor.
sed is most often used on files that need the same editing actions repeated many times, such as replacing one string in a file with another. Compared with modifying files by hand in an ordinary interactive UNIX editor (such as vi or emacs), using sed saves a great deal of effort. The following sections introduce:
When to Use sed
Where to Get sed
What sed Can Do
How sed Works
1.1 When to Use sed
When the same editing actions have to be repeated over and over while modifying a file, sed can carry them all out automatically in one pass. For example, to change the sender's alias "Tom" to "John" in the 1000 messages of a received-mail file, a single simple sed command on the command line replaces every "Tom" string in the file with "John".
Likewise, when a file needs many different editing actions, sed can perform them all in one pass. For example, in a single run sed can delete all blank lines in a file, substitute strings, and add user-supplied text at the sixth line of the file.
1.2 Where to Get sed
Most UNIX systems come with sed, although the version differs from one system to another. If the UNIX system you are using does not include sed, it can be obtained by anonymous ftp from the following sites:
phi.sinica.edu.tw:/pub/GNU/gnu
gete.sinica.edu.tw:/unix/gnu
ftp.edu.tw:/UNIX/gnu
ftp.csie.nctu.edu.tw:/pub/Unix/GNU
ftp.fcu.edu.tw:/pub3/UNIX/gnu
axp350.ncu.edu.tw:/Packages/gnu
leica.ccu.edu.tw:/pub2/gnu
mail.ncku.edu.tw:/pub/unix/gnu
bbs.ccit.edu.tw:/pub1/UNIX/gnu
prep.ai.mit.edu.tw:/pub/gnu
1.3 What sed Can Do
sed can delete, change, append, insert, merge, and exchange data lines in a file, read data from other files into it, substitute strings, and transform letters. For example, it can squeeze consecutive blank lines into one, replace the string "local" with "remote", convert the letter "t" to "T", or merge lines 10 and 11.
1.4 How sed Works
Like other UNIX commands, sed reads the file to be edited from standard input and sends the result to standard output. The following figure shows sed replacing the data line "Unix" with "UNIX".

At the top of the figure, standard input is where the data is read from and standard output is where the result is sent. The two dashed boxes below the sed box in the middle show how sed works: the left one shows sed putting the standard input data into the pattern space, and the right one shows sed sending the edited data from the pattern space to standard output.
Inside each dashed box, the two solid boxes are the pattern space and the sed script. The pattern space is a buffer in which sed does its work; the sed script is the series of editing instructions to be executed.
In the left dashed box, "Unix" is put into the pattern space from standard input. In the right dashed box, sed executes the editing instruction s/Unix/UNIX/ from the sed script (Note 2), "Unix" becomes "UNIX", and "UNIX" is sent from the pattern space to standard output.
In summary, sed reads one line of data from standard input into the pattern space, applies the editing instructions of the sed script to the pattern space one after another, sends the result in the pattern space to standard output, and then reads the next line. This repeats until every line has been read.
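The read, edit, output cycle can be tried directly from a shell. The sketch below is only an illustration; the three input lines are made up, and the instruction is the s/Unix/UNIX/ from the figure description.

```shell
# Each line is read into the pattern space, edited, and written out in turn.
out=$(printf 'Unix\nis Unix fun?\nlinux\n' | sed -e 's/Unix/UNIX/')
printf '%s\n' "$out"
```

Lines that the instruction does not change, such as the third one, still pass through the pattern space and are output as they are.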
--------------------------------------------------------------------------------
Using sed
--------------------------------------------------------------------------------
Using sed
The sed command line consists of edit commands and files. The edit commands control all the editing work; the files are the ones to be processed. Each sed edit command has two parts, an address and a function. When it runs, sed uses the address parameter to select what to edit and the function parameter (Note 3) to edit it.
Edit commands can be given on the command line or stored in a file. On the command line each one must be preceded by the option -e; for a file (Note 4), put the option -f in front of the file name. Either way, sed executes the edit commands in the order in which they appear on the command line or in the file.
The following sections cover executing edit commands on the command line, sed's edit commands, executing edit commands in a file, editing multiple files, and controlling sed's output.
2.1 Executing Edit Commands on the Command Line
2.2 sed's Edit Commands
2.3 Executing Edit Commands in a File
2.4 Editing Multiple Files
2.5 Controlling sed Output
2.1. Executing Edit Commands on the Command Line
When an edit command (see section 2.2) is given on the command line, it must be preceded by the option -e. The command format is as follows:
sed -e 'edit command 1' -e 'edit command 2' ... file
Each edit command follows its -e option and is enclosed in a pair of single quotes ('). The edit commands on the command line are executed from left to right.
When there are only a few edit commands, users usually give them directly on the command line. For example, to delete lines 1 to 10 of yel.dat and replace the string "yellow" with "black" in the remaining text, the command is as follows:
sed -e '1,10d' -e 's/yellow/black/g' yel.dat
In this command, the edit command '1,10d' (Note 5) deletes lines 1 to 10, and the edit command 's/yellow/black/g' (Note 6) replaces the string "yellow" with "black".
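A small run makes the left-to-right order visible. The sketch below generates 12 sample lines in place of yel.dat (the data is an assumption for illustration) and pipes them through the same two edit commands.

```shell
# Generate 12 sample lines, then delete lines 1-10 and substitute in the rest.
out=$(
  i=1
  while [ "$i" -le 12 ]; do
    echo "yellow $i yellow"
    i=$((i + 1))
  done | sed -e '1,10d' -e 's/yellow/black/g'
)
printf '%s\n' "$out"
```

Only lines 11 and 12 survive the first edit command, and the g flag replaces both occurrences of "yellow" on each of them.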
2.2 sed's Edit Commands
The format of the sed edit command is as follows:
[address1[,address2]]function
Here the address parameters address1 and address2 are line numbers or regular expressions that select the data lines to be edited, and the function parameter function is one of sed's built-in functions and names the editing action to perform.
The next two sections describe the notation for address parameters and the function parameters that are available.
2.2.1 Address Parameter Notation
An address parameter simply identifies the data lines to edit, either by line number or by a string they contain. The following examples illustrate this (using the function parameter d, see section 4.2):
To delete the data in line 10 of the file, the command is 10d.
To delete the data line containing the string "man", the command is /man/d.
To delete the data from line 10 to line 200 in the file, the command is 10,200d.
To delete from line 10 to the data line containing the string "man" in the file, the command is 10,/man/d.
The notation is now described in full, according to the content and the number of the address parameters (again using the function parameter d as the example).
Content of address parameters:
The address is a decimal number: the number is a line number. When the command runs, the editing action named by the function parameter is applied to the data line with that number. For example, to delete line 15 of the data file, the command is 15d (see section 4.2). In general, to delete line m of the data file, the command is md.
The address is a regular expression (see Appendix A):
The editing action named by the function parameter is applied to every data line that contains a string matching the regular expression. The regular expression must be enclosed in "/" characters. For example, the command /t.*t/d deletes every data line that contains two "t" letters. Here "." stands for any character and "*" means that the preceding character may repeat any number of times, so ".*" stands for any string between the two "t" letters.
Number of address parameters: With no address parameter, the function parameter is applied to every data line. With one address parameter, only the data lines matching that address are edited. With two address parameters, written address1,address2, a range of data is edited, where address1 marks the first data line of the range and address2 the last. The following examples make this concrete.
For example, the command is
d
It means deleting all data lines in the file.
For example, the command is
5d
It means deleting the data in line 5 of the file.
For example, the command is
1,/apple/d
It means deleting the data area from the first line of the file to the data line containing the string "apple".
For example, the command is
/apple/,/orange/d
It means deleting the data area from the data line containing the string "apple" to the data line containing the string "orange" in the file.
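The address forms above can be compared on one small input. The sketch below uses five made-up lines and hypothetical variable names; only the address notation itself comes from the text.

```shell
# Five sample lines (illustrative data).
data='pear
apple
fig
orange
plum'

by_number=$(printf '%s\n' "$data" | sed -e '2d')                # one line number
by_range=$(printf '%s\n' "$data" | sed -e '/apple/,/orange/d')  # pattern to pattern
mixed=$(printf '%s\n' "$data" | sed -e '1,/apple/d')            # line number to pattern

echo "2d keeps:" $by_number
echo "/apple/,/orange/d keeps:" $by_range
echo "1,/apple/d keeps:" $mixed
```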
2.2.2 What Function Parameters Are Available
The following table summarizes all of sed's function parameters (see chapter 4).
Function parameter   Function
: label              Set a position in the script file that other instructions can refer to.
#                    Start a comment.
{ }                  Group instructions that share the same address parameter.
!                    Apply the function parameter to the data lines that do not match the address.
=                    Print the line number of the data.
a\                   Append data entered by the user.
b label              Branch to the position set by : label.
c\                   Replace data with data entered by the user.
d                    Delete data.
D                    Delete the pattern space up to and including its first newline character (\n).
g                    Copy the hold space into the pattern space.
G                    Append the hold space to the pattern space.
h                    Copy the pattern space into the hold space.
H                    Append the pattern space to the hold space.
l                    Print data, showing nonprinting characters as ASCII codes.
i\                   Insert data entered by the user.
n                    Read the next line of data.
N                    Append the next line of data to the pattern space.
p                    Print data.
P                    Print the pattern space up to its first newline character (\n).
q                    Quit sed.
r                    Read in the content of another file.
s                    Substitute strings.
t label              Branch to : label if the preceding substitution succeeded.
w                    Write data to another file.
x                    Exchange the contents of the hold space and the pattern space.
y                    Transform characters.
Although sed has only these few basic function parameters, combining them with address parameters and combining instructions with one another lets sed carry out most editing tasks.
2.3 Executing Edit Commands in a File
When there are too many edit commands to write comfortably on the command line, store them in a file (say, script_file) and use the option -f script_file to have sed execute the edit commands in it. The command format is as follows:
sed -f script_file file
The edit commands in script_file are executed from top to bottom. For example, the example in section 2.1 can be rewritten as the following command:
sed -f ysb.scr yel.dat
Among them, the content of ysb.scr is as follows:
1,10d
s/yellow/black/g
The options -e and -f can also be mixed on the command line. sed still processes them from left to right, and when it reaches the file named after -f it executes the edit commands in that file from top to bottom.
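The sketch below shows one way to try a script file without leaving anything behind. It assumes mktemp is available, shortens the deleted range to lines 1 to 2 to keep the sample data small, and reuses the file name ysb.scr from the text; everything else is illustrative.

```shell
# Write the edit commands to a script file in a scratch directory.
tmp=$(mktemp -d)
printf '1,2d\ns/yellow/black/g\n' > "$tmp/ysb.scr"

# Run them with -f; the data comes from a pipe instead of yel.dat.
out=$(printf 'skip me\nskip me too\nyellow and yellow\n' | sed -f "$tmp/ysb.scr")
printf '%s\n' "$out"

rm -r "$tmp"
```

The script file holds the same text that would otherwise follow each -e, one edit command per line and without the surrounding quotes.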
2.4 Editing Multiple Files
A single sed command line can edit several files; list them after the edit commands. For example, to replace the string "yellow" with "blue" in the files white.dat, red.dat, and black.dat, the command is as follows:
sed -e 's/yellow/blue/g' white.dat red.dat black.dat
When this command runs, sed applies the edit command s/yellow/blue/g (see section 4.1) to white.dat, red.dat, and black.dat in that order.
2.5. Controlling Output
The option -n (Note 7) on the command line hands control of the output to the edit commands. As the previous chapter explained, sed "automatically" sends the data in the pattern space to standard output. With the option -n, this automatic output is turned off, and the edit commands that sed executes (Note 8) decide whether a result is output.
It follows that the option -n must be used together with an edit command that produces output; otherwise nothing is printed. For example, to print the data lines containing the string "white" in white.dat, the command is as follows:
sed -n -e '/white/p' white.dat
Here the option -n and the edit command /white/p (see section 4.6) together control the output: -n hands output control to the edit command, and /white/p prints the data lines containing the string "white" on the screen.
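Running the same edit command with and without -n shows what the option changes. The three input lines below are made up for illustration, and the variable names are hypothetical.

```shell
# With -n only the p command produces output.
with_n=$(printf 'white\nred\noff-white\n' | sed -n -e '/white/p')

# Without -n every line is output automatically, and p prints matches a second time.
without_n=$(printf 'white\nred\noff-white\n' | sed -e '/white/p')

printf 'with -n:\n%s\nwithout -n:\n%s\n' "$with_n" "$without_n"
```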
--------------------------------------------------------------------------------
3. Examples
--------------------------------------------------------------------------------
3. Examples
In practice, editing often involves substituting strings in a file and moving, deleting, or searching for data lines. Interactive editors (such as vi and emacs) can do all of this, but they are very inefficient when a file needs a large number of such edits. This chapter uses examples to show how sed performs these editing tasks automatically. The requirement on the file in each example is stated in the following form:
Perform the ... action on the ... data in the file
Stated this way, a requirement converts quickly into an edit command: the "... data" part becomes the address parameter of the instruction, and the "... action" part becomes the function parameter. When the action takes several function parameters to express, group them with "{" and "}" (Note 9). The instruction then has the following form:
address parameter{
function parameter 1
function parameter 2
function parameter 3
.
:
}
This instruction applies the actions of function parameter 1, function parameter 2, function parameter 3, ... in order to the data that matches the address parameter. The following sections give examples of sed commands for substituting, moving, deleting, and searching for data.
3.1 Substituting Data in Files
3.2 Moving Data in Files
3.3 Deleting Data in Files
3.4 Searching for Data in Files
3.1 Substituting Data in Files
sed can substitute strings, data lines, and even whole ranges of data in a file. The function parameter s (see section 4.1) substitutes strings; the function parameter c (see section 4.5) substitutes data lines or ranges of data. The following three examples illustrate these cases.
Example 1. In the data lines containing the string "machine", replace the string "phi" with the string "beta". The command line is as follows:
sed -e '/machine/s/phi/beta/g' input.dat
(From here on, the file name is assumed to be input.dat.)
Example 2. Replace line 5 of the file with the sentence "Those who in quarrels interpose, must often wipe a bloody nose.". The command line is as follows:
sed -e '5c\
Those who in quarrels interpose, must often wipe a bloody nose.
' input.dat
Example 3. Replace the data area from line 1 to 100 in the file with the following two lines of data:
How are you?
data be deleted!
Then the command line is as follows
sed -e '1,100c\
How are you?\
data be deleted!
' input.dat
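The range form of c can be checked on a few lines. The sketch below shrinks the range to lines 1 to 3 of a four-line sample (both assumptions for illustration) and keeps the two replacement lines from Example 3.

```shell
# Lines 1-3 are replaced as a group by one copy of the two-line text.
out=$(printf 'a\nb\nc\nd\n' | sed -e '1,3c\
How are you?\
data be deleted!')
printf '%s\n' "$out"
```

The replacement text appears once for the whole range, not once per line, and the first text line ends with "\" while the last one does not.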
3.2 Moving Data in Files
Users can use sed's hold space to park the data being edited, use the function parameter w (see section 4.9) to move data from the file into another file, or use the function parameter r (see section 4.8) to bring the content of another file into the file. The hold space is a buffer in which sed temporarily keeps data from the pattern space. The function parameters h and H (see sections 4.19 and 4.20) store the pattern space data in the hold space; the function parameters x, g, and G (see sections 4.23, 4.21, and 4.22) bring the stored data back to the pattern space. The following three examples illustrate this.
Example 1. Move the first 100 lines of the file so that they are output after line 300. The command line is as follows:
sed -f mov.scr file
The content of mov.scr is
1,100{
H
d
}
300G
Here,
1,100{
H
d
}
stores the first 100 lines of the file in the hold space (see section 4.20) and then deletes them; the instruction 300G (see section 4.22) appends the data in the hold space after line 300 of the file for output.
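The sketch below scales the move down to five lines: the first 2 lines are moved after line 4. One detail shows up at this size: because H always adds a newline before the appended text, the very first H on an empty hold space leaves an empty line at the front of the moved block. The second variant, which stores line 1 with h instead, is an assumption added here for comparison and is not part of the original script.

```shell
# As in mov.scr, scaled down: move lines 1-2 after line 4.
as_written=$(printf 'l1\nl2\nl3\nl4\nl5\n' | sed -e '1,2{
H
d
}
4G')

# Variant: start the hold space with h so no leading empty line is stored.
no_blank=$(printf 'l1\nl2\nl3\nl4\nl5\n' | sed -e '1{
h
d
}
2{
H
d
}
4G')

printf 'as written:\n%s\nvariant:\n%s\n' "$as_written" "$no_blank"
```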
Example 2. Write the data lines containing the string "phi" to the file mach.inf. The command line is as follows:
sed -e '/phi/w mach.inf' file
Example 3. Read the content of the file mach.inf in after each data line containing the string "beta". The command line is as follows:
sed -e '/beta/r mach.inf' file
Also, because sed is a stream editor (see section 1.4), data that has already been output cannot, in principle, be brought back for further editing.
3.3 Deleting Data in Files
Because sed is a line editor, it can easily delete individual data lines or whole ranges of data. The function parameters d (see section 4.2) and D (see section 4.17) are generally used for this. The following two examples illustrate.
Delete all blank lines in the file. The command line is
sed -e '/^$/d' file
The regular expression ^$ (see Appendix A) matches a blank line: ^ requires the string that follows it to be at the beginning of the line, and $ requires the string before it to be at the end of the line.
Squeeze each run of consecutive blank lines in the file into a single blank line. The command line is
sed -e '/^$/{
N
/^\n$/D
}' file
Here the function parameter N (see section 4.16) appends the line below the blank line to the pattern space. The instruction /^\n$/D applies when the appended line is also blank: it deletes the first blank line and makes the remaining blank line run through the instructions again. Each pass deletes one blank line, and this repeats until the blank line is followed by a non-blank line, so that only one blank line of the run is left and output.
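The squeezing of blank lines can be tried on a short input. The sketch below assumes the inner test is written as /^\n$/, which matches a pattern space that holds exactly two empty lines joined by a newline; the sample data is made up.

```shell
# Three blank lines between a and b, one between b and c.
out=$(printf 'a\n\n\n\nb\n\nc\n' | sed -e '/^$/{
N
/^\n$/D
}')
printf '%s\n' "$out"
```

The run of three blank lines is reduced to one, and the single blank line before c is left alone.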
3.4 Searching for Data in Files
sed can do the same job as the UNIX command grep; in principle any regular expression (see Appendix A) can be used. For example, to output the data lines containing the string "gamma" in the file, the command line is as follows:
sed -n -e '/gamma/p' file
However, sed is a line editor and its searches work line by line, so the usual method fails when a string is split in two by a line break. In that case the search has to merge two lines at a time, as in the following example:
Example. Output the data containing the string "omega" in the file. The command line is as follows
sed -f gp.scr file
The content of gp.scr is as follows:
/omega/b
N
h
s/\n//
/omega/b
g
D
In this sed script (Note 10), the function parameter b gives a structure like the case statement in C, so sed handles three situations separately: the data line contains the string "omega"; the string "omega" is split across two lines; and the data does not contain "omega" at all. The script is discussed below in three parts, one per situation.
When the data contains "omega", execute the edit command
/omega/b
It means that when the data line contains the string "omega", sed skips the remaining instructions for it and outputs it directly.
When the data line itself does not contain "omega", execute the following edit commands
N
h
s/\n//
/omega/b
Here the function parameter N (see section 4.16) reads in the next line so that the pattern space holds two consecutive lines. The function parameter h (see section 4.19) saves those two lines in the hold space. The function parameter s/\n// (Note 11) merges the two lines in the pattern space into one by removing the newline between them. /omega/b then means that if the merged data contains the string "omega", the remaining instructions are skipped and the data is output automatically.
When the merged data still does not contain "omega", execute the following edit commands
g
D
Here the function parameter g (see section 4.21) puts the two unmerged lines saved in the hold space back into the pattern space. The function parameter D (see section 4.17) deletes the first of the two lines and makes the remaining line run through the sed script again. In this way strings are found whether they lie within one line or across two.
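The two-line search can be exercised on a small input in which "omega" appears once whole and once split as "ome" and "ga". The sketch below makes two assumptions that should be read as such: the merge step is written as s/\n//, which joins the two lines by removing the newline, and N is guarded as $!N so that the last line is not read past (some sed versions print the pattern space when N runs out of input).

```shell
# Illustrative data: "omega" split across lines 2-3 and whole on line 4.
out=$(printf 'alpha\nome\nga ray\nbeta omega\nend\n' | sed -e '/omega/b
$!N
h
s/\n//
/omega/b
g
D')
printf '%s\n' "$out"
```

A match found across a line break is output as the merged line, while a match within one line is output unchanged.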
--------------------------------------------------------------------------------
Introducing Function Parameters
--------------------------------------------------------------------------------
Introducing Function Parameters
This chapter introduces every function parameter that sed provides, one per section:
| s | d | a | i | c | p | l | r | w | y | ! | n | q | = | # | N | D | P | h | H | g | G | x | b | t |
Each section first briefly describes what the function parameter does, then explains its format together with the address parameters, and finally describes how sed works when it executes the function parameter.
4.1 s
The function parameter s substitutes strings in the file. The instruction format is as follows:
[address1[,address2]] s/pattern/replacement/[flag]
The following points explain the format:
The function parameter s takes at most two address parameters.
"s/pattern/replacement/[flag]" (Note 12) has the following parts:
pattern: a regular expression. It describes the string in the file that is to be replaced.
replacement: an ordinary string, except that the following characters have special meanings:
&: stands for the string that pattern matched. For example
sed -e 's/test/& my car/' file name
In this instruction, & stands for the matched string "test", so "test" in the data file is replaced with "test my car".
\n: stands for the string matched by the nth \( \) pair in pattern (see Appendix A). For example
sed -e 's/\(test\) \(my\) \(car\)//' file name
In this instruction, \1 stands for "test", \2 for "my", and \3 for "car". Because the replacement here is empty, "test my car" in the data file is replaced with "".
\: restores the literal meaning of a special character (such as & and \ above), or introduces a line break.
flag: controls how the substitution is carried out:
When flag is g, every matching string in the line is replaced.
When flag is the decimal number m, the mth matching string in the line is replaced.
When flag is p, the data line is written to standard output after the first string matching pattern has been replaced.
When flag is w wfile, the data line is written to the file wfile after the first string matching pattern has been replaced (if wfile does not exist, it is created).
When there is no flag, the first string in the data line that matches pattern is replaced with the replacement string.
delimiter: In "/pattern/replacement/", "/" is the delimiter. Any character other than a blank or a newline can serve as the delimiter. For example, the following edit command
s#/usr#/usr1#g
In this command, # is the delimiter. If "/" were used as the delimiter here, sed would take the "/" characters inside pattern and replacement for delimiters and report an error.
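The special characters in the replacement and the choice of delimiter can each be tried on a one-line input. In the sketch below, the replacement \3 \2 \1 and the sample strings are assumptions chosen for illustration; the variable names are hypothetical.

```shell
# & stands for the whole matched string.
amp=$(echo 'test' | sed -e 's/test/& my car/')

# \1, \2, \3 stand for the three \( \) groups; here they are written back in reverse order.
swap=$(echo 'test my car' | sed -e 's/\(test\) \(my\) \(car\)/\3 \2 \1/')

# With # as the delimiter, the / characters in the pattern need no escaping.
delim=$(echo '/usr/bin' | sed -e 's#/usr#/usr1#g')

printf '%s\n%s\n%s\n' "$amp" "$swap" "$delim"
```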
Example:
Topic: Replace the string "1996" in the file input.dat with "1997", and store the data lines in which a replacement was made in the file year97.dat. (From here on, the file name is assumed to be input.dat unless stated otherwise.)
Description: The function parameter s tells sed to replace the string "1996" with "1997", and the flag w in the argument of s tells sed to store the changed data lines in the file year97.dat.
sed command line:
sed -e 's/1996/1997/w year97.dat' input.dat
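The sketch below runs the same substitution in a scratch directory so that the written file can be inspected and removed afterwards. It assumes mktemp is available; the two input lines are made up, and only the file name year97.dat comes from the text.

```shell
tmp=$(mktemp -d)

# Lines where the substitution succeeds are also written to year97.dat.
out=$(printf 'born 1996\nno year here\n' | sed -e "s/1996/1997/w $tmp/year97.dat")
saved=$(cat "$tmp/year97.dat")

printf 'standard output:\n%s\nyear97.dat:\n%s\n' "$out" "$saved"
rm -r "$tmp"
```

Standard output still carries every line, while year97.dat receives only the line that was changed.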
4.2 d
The function parameter d deletes data lines. The instruction format is as follows:
[address1[,address2]] d
The following points explain the format:
The function parameter d takes at most two address parameters.
When sed executes the delete action, it does the following:
Deletes the data in the pattern space that matches the address parameter.
Reads the next line of data into the pattern space.
Starts the sed script again from the beginning.
Example: See section 3.3.
4.3 a
The function parameter a appends data to the file. The instruction format is as follows:
[address1] a\
data entered by the user
The following points explain the format:
The function parameter a takes at most one address parameter.
The "\" character after the function parameter a marks the end of that line; the data entered by the user must start on the next line. If the data runs to more than one line, every line except the last must end with "\".
When sed executes the append action, it outputs the data in the pattern space first and then outputs the data entered by the user.
Example:
Topic: Add "Multitasking System" after the data line containing the string "UNIX". Assume that the content of input.dat is as follows:
UNIX
Description: Use the function parameter a to add the input data after the data line containing the string "UNIX".
The sed command line is as follows:
sed -e '/UNIX/a\
Multitasking System
' input.dat
After executing the above command, the output result is as follows:
UNIX
Multitasking System
4.4 i
The function parameter i inserts data into the file. The instruction format is as follows:
[address1] i\
data entered by the user
The following points explain the format:
The function parameter i takes at most one address parameter.
The "\" character after the function parameter i marks the end of that line; the data entered by the user must start on the next line. If the data runs to more than one line, every line except the last must end with "\".
When sed executes the insert action, it outputs the data entered by the user first and then outputs the data in the pattern space.
Example:
Topic: Insert "The copyright of the article belongs to the Academia Sinica" before the data line containing "Director: Lee Yuan-tseh" in the input.dat file. Assume that the content of input.dat is as follows:
Director: Lee Yuan-tseh
Description: Use the function parameter i to insert the data line "The copyright of the article belongs to the Academia Sinica" before the data line containing "Director: Lee Yuan-tseh".
The sed command line is as follows:
sed -e '/Director: Lee Yuan-tseh/i\
The copyright of the article belongs to the Academia Sinica
' input.dat
After executing the above command, the output is as follows:
The copyright of the article belongs to the Academia Sinica
Director: Lee Yuan-tseh
4.5 c
The function parameter c changes (replaces) data in the file. The format is as follows:
[address1[,address2]]c\ data entered by the user
The following points apply to this format:
The function parameter c accepts at most two address parameters.
The function parameter c must be followed by "\" to end the line; the text supplied by the user starts on the next line. If the text runs over more than one line, each line except the last must end with "\".
When sed performs the change, it discards the data in the pattern space and outputs the text supplied by the user in its place.
Example: Refer to Example 2 and 3 in section 3.1.
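The change action can also be tried in isolation. The following is a minimal sketch, assuming a POSIX shell and a three-line sample input invented for illustration (it is not the input.dat of section 3.1):

```shell
# Replace line 2 with new text; the text starts on the line after "c\".
out=$(printf 'first\nsecond\nthird\n' | sed -e '2c\
SECOND (changed)')
printf '%s\n' "$out"
```

Lines 1 and 3 pass through untouched; only the addressed line is replaced by the supplied text.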
4.6 p
The function parameter p prints data. The instruction format is as follows:
[address1[,address2]] p
The following points apply to this format:
The function parameter p accepts at most two address parameters.
When sed performs the print, it copies the content of the pattern space to standard output.
Example: Refer to the content at the beginning of section 3.4.
4.7 l
The function parameter l behaves like the function parameter p, except that nonprinting characters in the data are listed as ASCII codes.
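A small sketch of l, assuming a POSIX shell; the sample line with an embedded tab is made up for the demonstration. The exact escape used for a given character can differ between sed implementations, so treat the output as typical rather than guaranteed:

```shell
# -n suppresses the normal output so that only the listing from l is shown.
# The tab is listed as \t and the end of the line is marked with $.
out=$(printf 'a\tb\n' | sed -n -e 'l')
printf '%s\n' "$out"
```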
4.8 r
The function parameter r reads the content of another file into the output. The instruction format is as follows:
[address1]r file name
The following points apply to this format:
The function parameter r accepts at most one address parameter.
In the instruction, there must be exactly one space between the function parameter r and the file name.
When sed performs the read, it outputs the data in the pattern space first, then reads the content of the other file and outputs it. If the other file does not exist, sed carries on with the remaining instructions without any error message.
Example: Refer to Example 3 in section 3.1.
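The read action can be sketched as follows, assuming a POSIX shell with mktemp available; the scratch file and the sample text are hypothetical and only serve the demonstration:

```shell
tmp=$(mktemp)                          # hypothetical scratch file
printf 'inserted from file\n' > "$tmp"

# The file content is output after the line that matches /one/.
out=$(printf 'one\ntwo\n' | sed -e "/one/r $tmp")

# A file that does not exist is silently ignored.
missing=$(printf 'one\ntwo\n' | sed -e '/one/r /no/such/file')

rm -f "$tmp"
printf '%s\n' "$out" "$missing"
```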
4.9 w
The function parameter w writes data to another file. The instruction format is as follows:
[address1[,address2]] w file name
The following points apply to this format:
The function parameter w accepts at most two address parameters.
In the instruction, there must be exactly one space between the function parameter w and the file name.
When sed performs the write, it writes the data in the pattern space to the other file, overwriting whatever that file contained before. If the other file does not exist, sed creates it.
Example: Refer to Example 2 in section 3.1.
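A minimal sketch of the write action, assuming a POSIX shell with mktemp; the scratch file and its "stale" content are invented to show that old data is overwritten:

```shell
tmp=$(mktemp)                          # hypothetical scratch file
printf 'stale contents\n' > "$tmp"

# Write every line containing "keep" to the file; -n keeps stdout quiet.
printf 'keep 1\nskip\nkeep 2\n' | sed -n -e "/keep/w $tmp"

saved=$(cat "$tmp")
rm -f "$tmp"
printf '%s\n' "$saved"
```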
4.10 y
The function parameter y converts characters in the data. The instruction format is as follows:
[address1[,address2]]y/abc.../xyz.../
The following points apply to this format:
The function parameter y accepts at most two address parameters.
In the instruction, /abc.../xyz.../ (a, b, c, x, y, z stand for arbitrary characters) is the argument of y. The number of characters in abc... and xyz... must be the same.
When sed performs the conversion, every a character in the pattern space is converted to x, every b to y, every c to z, and so on.
Example:
Topic: Convert the lowercase letters in the input.dat file to uppercase. Assume that the content of input.dat is as follows:
Sodd's Second Law:
Sooner or later, the worst possible set of
circumstances is bound to occur.
Description: Use the function parameter y to instruct sed to convert the case of letters.
The sed command line is as follows:
sed -e '
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
' input.dat
After executing the above command, the output result is as follows:
SODD'S SECOND LAW:
SOONER OR LATER, THE WORST POSSIBLE SET OF
CIRCUMSTANCES IS BOUND TO OCCUR.
4.11 !
The function parameter ! negates the address: the function parameter that follows it is not executed on the addressed data. In an instruction of the form
[address1[,address2]] ! function parameter
the function parameter is applied only to the data that does not match the address parameter. For example, to delete all data lines except those containing the string "1996", execute the following command
sed -e '/1996/!d' input.dat
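The same command can be checked on a small inline sample instead of input.dat; the three lines below are made up for illustration, and a POSIX shell is assumed:

```shell
# Only the lines that do NOT contain "1996" are deleted.
out=$(printf 'born 1996\nborn 1997\nsince 1996\n' | sed -e '/1996/!d')
printf '%s\n' "$out"
```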
4.12 n
The function parameter n reads the next line of data. The instruction format is as follows:
[address1[,address2]] n
The following points are explained for the above format:
The function parameter n cooperates with up to two address parameters.
The situation when sed executes the action of reading the next line is as follows:
Output the data in the pattern space.
Read the next data into the pattern space.
Execute the next edit command.
Example (can be compared with the example in ):
Topic: Output the even-numbered line data in the input.dat file. Assume that the content of input.dat is as follows:
The
UNIX
Operation
System
Description: On the command line
Use the -n option to hand control of the output over to the editing instructions.
Use the function parameter n to replace the data line (odd-numbered line) in the pattern space with the next line of data (even-numbered line).
Use the function parameter p to output the data (even-numbered line) in the pattern space.
In the end, only the original even-numbered lines are output.
The sed command line is as follows:
sed -n -e 'n' -e 'p' input.dat
After executing the above command, the output result is as follows:
UNIX
System
4.13 q
The function parameter q makes sed quit. The instruction format is as follows:
[address1]q
The following points apply to this format:
The function parameter q accepts at most one address parameter.
When sed performs the quit, it outputs the data currently in the pattern space (unless output is suppressed with -n), then stops reading input and exits.
Example:
Topic: Execute the edit commands in script_file on the file, unless the string "Linux" is encountered.
Description: No matter what instructions are in script_file, the user only needs to use the instruction /Linux/q on the command line, and the function parameter q will force sed to jump out when encountering "Linux".
The sed command line is as follows:
sed -e '/Linux/q' -f script_file input.dat
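A reduced sketch of q without a script file, assuming a POSIX shell; the three-line input is invented. It also shows what happens to the matching line when -n is in effect:

```shell
# Default output: the matching line is still printed before sed exits.
out=$(printf 'Unix\nLinux\nBSD\n' | sed -e '/Linux/q')

# With -n the matching line is not printed; only the explicit p on the
# earlier line produces output.
quiet=$(printf 'Unix\nLinux\nBSD\n' | sed -n -e '/Linux/q' -e 'p')

printf '%s\n' "$out" "---" "$quiet"
```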
4.14 =
The function parameter = prints the line number of the data. The instruction format is as follows:
[address1[,address2]] =
The following points apply to this format:
The function parameter = accepts at most two address parameters.
When executed, the line number is output on a line of its own before the data line is output.
Example:
Topic: Print the line number of the data in the input.dat file. Assume that the content of input.dat is as follows:
The UNIX
Operating System
Description: Use the function parameter = to print the line number of the data.
The sed command line is as follows:
sed -e '=' input.dat
After executing the above command, the output result is as follows:
1
The UNIX
2
Operating System
4.15 #
In the script file, the text after the function parameter # is a comment. When a comment runs over more than one line, each line break must be escaped with "\".
4.16 N
The function parameter N appends the next line of data to the pattern space. The instruction format is as follows:
[address1[,address2]] N
The following points apply to this format:
The function parameter N accepts at most two address parameters.
When sed executes N, the next line of data is read and appended to the pattern space, and the lines are separated by an embedded newline character. In a substitution, this newline character can be matched with \n.
Example:
Topic: Merge the following two lines of data. Assume that the content of input.dat is as follows:
The UNIX
Operating System
Description: First use the function parameter N to place the two lines of data in the pattern space, and then use the function parameter s/\n/ / to replace the separator \n between the two lines of data with a space, so that the two lines of data become one line output.
The sed command line is as follows:
sed -e 'N' -e 's/\n/ /' input.dat
After executing the above command, the output result is as follows:
The UNIX Operating System
4.17 D
The function parameter D deletes the first line of data in the pattern space. The instruction format is as follows:
[address1[,address2]]D
The following points apply to this format:
The function parameter D accepts at most two address parameters.
The function parameters D and d compare as follows:
When there is only one line of data in the pattern space, D and d have the same effect.
When there are multiple lines of data in the pattern space
D deletes only the first line of data in the pattern space; d deletes all of it.
After D, sed does not read the next line of data; it runs the sed script again from the top on the data that remains in the pattern space. After d, sed reads the next line and then runs the sed script.
Example: Refer to the second example in section 3.3.
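The restart behaviour of D can be sketched with a short script that squeezes runs of blank lines into one; this is an illustrative example under a POSIX shell, not the example from section 3.3, and the sample input is invented:

```shell
# On a blank line, N appends the next line. If that one is blank too, the
# pattern space ends in a newline and D removes the first (blank) line, then
# restarts the script on what is left without reading new input.
out=$(printf 'a\n\n\n\nb\n' | sed -e '/^$/N' -e '/\n$/D')
printf '%s\n' "$out"
```

With d in place of D, the whole pattern space would be thrown away each time, including the line that N had just appended.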
4.18 P
The function parameter P prints the first line of data in the pattern space. The instruction format is as follows:
[address1[,address2]]P
The following points apply to this format:
The function parameter P accepts at most two address parameters.
P is the same as p, except that when the pattern space holds more than one line P prints only the first line, while p prints all of them.
Example (can be compared with the example in ):
Topic: Output the odd-numbered line data in the input.dat file. Assume that the content of input.dat is as follows:
The
UNIX
System
Description: On the command line
Use the -n option to hand control of the output over to the editing instructions.
Use the function parameter N to append the even-numbered line to the odd-numbered line in the pattern space. It is written as $!N so that it is skipped on the last line; otherwise, when the file has an odd number of lines, N finds no next line and sed quits before P can print the final odd-numbered line.
Use the function parameter P to output the first line (odd-numbered line) in the pattern space.
After the odd-numbered line is output, the remaining data line (even-numbered line) in the pattern space is discarded without being output. In the end, only the original odd-numbered lines are output.
The sed command line is:
sed -n -e '$!N' -e 'P' input.dat
After executing the above command, the output result is as follows:
The
System
4.19 h
The function parameter h temporarily stores the data of the pattern space in the hold space. The instruction format is as follows:
[address1[,address2]] h
The following points apply to this format:
The function parameter h accepts at most two address parameters.
When sed performs the store, it overwrites the data already in the hold space.
When sed finishes executing, the data in the hold space is cleared automatically.
Example: Refer to the example in section 3.4.
4.20 H
The only difference between the function parameters H and h is this: when sed executes h, the data overwrites the original data in the hold space; when it executes H, the data is appended after the original data in the hold space. For an example, refer to Example 1 in section 3.2.
4.21 g
The function parameter g is the opposite of the function parameter h: it puts the data in the hold space back into the pattern space. The instruction format is as follows:
[address1[,address2]]g
The function parameter g accepts at most two address parameters.
When sed puts the data back, it overwrites (Note 13) the original data in the pattern space.
Example: Refer to the example in section 3.4.
4.22 G
The only difference between the function parameters G and g is this: when sed executes g, the data overwrites the original data in the pattern space; when it executes G, the data is appended after the original data in the pattern space. For an example, refer to Example 1 in section 3.2.
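How h and G work together can be sketched with a script that prints the input lines in reverse order. This is an illustrative example assuming a POSIX shell, with an invented three-line input; it is not the example of section 3.2 or 3.4:

```shell
# 1!G : on every line except the first, append the hold space (the lines
#       seen so far, already reversed) after the current line
# h   : store the result back in the hold space
# $p  : on the last line, print the accumulated pattern space
out=$(printf '1\n2\n3\n' | sed -n -e '1!G' -e 'h' -e '$p')
printf '%s\n' "$out"
```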
4.23 x
The function parameter x exchanges the data in the hold space and the pattern space. The instruction format is as follows:
[address1[,address2]] x
The function parameter x is mostly used together with other function parameters that handle the hold space. For example, to replace the data in line 3 of the input.dat file with the data in line 1, use the function parameters h and x together: h stores line 1 in the hold space; when line 3 arrives in the pattern space, x exchanges the contents of the hold space and the pattern space. In this way, line 3 is replaced by line 1. The command line is as follows:
sed -e '1h' -e '3x' input.dat
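The effect is easy to confirm on an inline sample; the three lines below stand in for input.dat and are assumptions made for the demonstration:

```shell
# Line 1 is copied to the hold space; on line 3 the two spaces are swapped,
# so the copy of line 1 is output in place of line 3.
out=$(printf 'line 1\nline 2\nline 3\n' | sed -e '1h' -e '3x')
printf '%s\n' "$out"
```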
4.24 b, :label
Together, the function parameters : and b provide something similar to the GOTO instruction of the BASIC language in a sed script. The function parameter : sets up a mark; the function parameter b branches execution to that mark. In a script file the two work together as follows
.
.
.
Edit command m1
:Mark
Edit command m2
.
.
.
[address1[,address2]]b [Mark]
When sed reaches the instruction [address1[,address2]]b [Mark] and the data in the pattern space matches the address parameter, sed branches to the mark set by :Mark (Note 14), that is, it continues from "Edit command m2". If no mark follows the function parameter b, sed branches to the end of the script file. This makes it possible to give a sed script a structure similar to the case statement of the C language.
Example:
Topic: Repeat the first letter of the data line in the input.dat file 40 times. Assume that the content of input.dat is as follows:
A
B
C
Description: Use the instructions b p1 and :p1 to form a loop that adds letters, and use the instruction b to leave the loop once 40 letters are present. The following takes the first line of data, "A", as an example of how 39 more "A"s are added to the same line:
Use the instruction s/A/AA/ (refer to section 4.1) to replace "A" with "AA".
Use the instructions b p1 and :p1 to form a loop that repeats the above action. Each pass through the loop adds one more "A" to the data line. For example, the data line becomes "AA" in the first pass, "AAA" in the second pass, and so on.
Use the instruction /[ABC]\{40\}/b (Note 15) as the condition that stops the loop. When 40 consecutive A's appear in the data line, the function parameter b branches to the end of the script and editing of this line stops.
The other data lines are processed in the same way.
The sed command line is as follows:
sed -e '{
:p1
/A/s/A/AA/
/B/s/B/BB/
/C/s/C/CC/
/[ABC]\{40\}/b
b p1
}' input.dat
4.25 t
The function parameter t works like the function parameter b, except that before branching it first tests whether a preceding substitution instruction has succeeded. In a script file it appears as follows:
.
.
.
Edit command m1
:Mark
Edit command m2
.
.
.
s/.../.../
[address1[,address2]]t [Mark]
Edit command m3
The difference from the function parameter b is that t first checks whether the preceding substitution instruction succeeded. If it did, sed branches; if it did not, sed does not branch and continues with the next edit command, such as edit command m3 above.
Example:
Topic: Replace A1 with C1, C1 with B1, and B1 with A1 in the input.dat file. The content of input.dat is as follows:
Code
B1
A1
B1
C1
A1
C1
Description: All data lines in the input.dat file only need to execute a substitution action, but to avoid the data being substituted multiple times, the function parameter t is used in the sed script to form a case statement structure similar to C language, so that each line of data can immediately jump out of the substitution edit after being substituted once.
The sed command line is:
sed -e '{
s/A1/C1/
t
s/C1/B1/
t
s/B1/A1/
t
}' input.dat
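To see what t contributes, the same three substitutions can be run with and without it. The sketch below assumes a POSIX shell and feeds the six data lines of the example inline rather than from input.dat:

```shell
input='B1
A1
B1
C1
A1
C1'

# With t: after the first successful substitution, sed branches to the end.
with_t=$(printf '%s\n' "$input" |
  sed -e 's/A1/C1/' -e 't' -e 's/C1/B1/' -e 't' -e 's/B1/A1/')

# Without t: a line that was just changed is changed again by later commands.
without_t=$(printf '%s\n' "$input" |
  sed -e 's/A1/C1/' -e 's/C1/B1/' -e 's/B1/A1/')

printf '%s\n' "$with_t" "---" "$without_t"
```

Without t, every line ends up as A1, because each result is picked up by the next substitution in the chain.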
--------------------------------------------------------------------------------
Common Regular Expressions
--------------------------------------------------------------------------------
Ordinary characters A regular expression made up of ordinary characters matches exactly the literal string it spells out.
^string Requires that the string appear at the beginning of the line.
string$ Requires that the string appear at the end of the line.
. Represents any single character.
[...] Character set: represents any one of the characters between the two brackets.
[^...] Represents any one character other than the characters between the two brackets.
[x-y] Within a character set, "-" can be used to specify a range of characters.
* Means that the preceding character (or character set) may be repeated any number of times.
\n Represents the embedded newline character.
\(...\) Use "\(" and "\)" to enclose a part of the regular expression; later, "\1" refers to the first part enclosed this way. If the regular expression uses "\(" "\)" several times to enclose different parts, they are referred to in order as "\1", "\2", "\3", ... (up to "\9").
In addition, regular expressions are subject to somewhat different restrictions on different platforms. For details, refer to appendix B.
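Two of the constructs above, the character set and the \(...\) grouping, can be sketched as follows. The sample strings are invented, and a POSIX shell with a sed that supports basic regular expressions is assumed:

```shell
# [Uu] matches either an uppercase or a lowercase "u" at that position.
set_match=$(printf 'Unix\nunix\nLinux\n' | sed -n -e '/^[Uu]nix$/p')

# \1 and \2 refer back to the first and second \(...\) group.
swapped=$(printf 'hello world\n' | sed -e 's/^\(hello\) \(world\)$/\2 \1/')

printf '%s\n' "$set_match" "$swapped"
```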
--------------------------------------------------------------------------------
Notes
--------------------------------------------------------------------------------
Note 1.
It is the sed script that will be mentioned later.
Note 2.
The instruction s/Unix/UNIX/ means replacing "Unix" with "UNIX". Please refer to section 4.1.
Note 3.
There are more than 20 function parameters available for selection in the instruction.
Note 4.
This file is called the script file later.
Note 5.
In the edit command 1,10d, the address parameter is 1,10, so the data from line 1 to 10 executes the delete action specified by the function parameter d.
Note 6.
In the edit command s/yellow/black/g, since there is no address parameter, all data lines must execute the replacement action specified by the function parameter s/yellow/black/g. In the function parameter s/yellow/black/g, /yellow/black/g is the argument of s, which means replacing all "yellow" in the data line with "black".
Note 7.
The command format is as follows:
sed -n .. ..
Note 8.
These editing instructions may be one of p, l, s.
Note 9.
In some cases, the edit command can also be used instead of the function parameter. For example, Example 2 in section3.3.
Note 10.
Here, the sed script refers to the content of the gp.scr file. It means the edit command executed by sed this time.
Note 11.
This function parameter means replacing (removing) the newline mark between the two lines in the pattern space. Therefore, there is only one line of data in the pattern space.
Note 12.
/pattern/replacement/ is the argument of the function parameter s.
Note 13.
Note that at this time, although the data is put back into the pattern space, the content of the hold space remains unchanged.
Note 14.
Note that there must be no space between ":" and the mark.
Note 15.
The address parameter [ABC]\{40\} means 40 A letters, or 40 B letters, or 40 C letters. Here [ABC] means "A" or "B" or "C"; the \{40\} that follows means 40 occurrences of the preceding letter. For regular expressions, please refer to Appendix A.
--------------------------------------------------------------------------------
References
--------------------------------------------------------------------------------
Lee E. McMahon, "SED - A Non-interactive Text Editor", AT&T Bell Laboratories, Murray Hill, New Jersey 07947.
Dale Dougherty, "sed & awk", O'Reilly & Associates, Inc., 1990.
"SunOs5.1 Editing Text Files", Sun Microsystem, Inc., 1992.
"HP 9000 computers -- Text Processing : User Guide", Hewlett-Packard Company, 1991.
Ye Daye, "Introduction to SED, a Tool for Automatically Editing Files", Bulletin of the Computing Center, Academia Sinica, Volume 12, Issue 2.
2006-10-26 13:19 |
『Post 4』:
SED One-Line Script Quick Reference
-------------------------------------------------------------------------
SED One-Line Script Quick Reference (Unix stream editor) December 29, 2005
English title: USEFUL ONE-LINE SCRIPTS FOR SED (Unix stream editor)
Original title: HANDY ONE-LINERS FOR SED (Unix stream editor)
Compiled by: Eric Pement - e-mail: pementenorthparkedu version 5.5
Translator: Joe Hong - e-mail: hq00e126com
The latest (English) version of this document can be found at:
http://sed.sourceforge.net/sed1line.txt
http://www.pement.org/sed/sed1line.txt
Other language versions:
Chinese -
http://sed.sourceforge.net/sed1line_zh-CN.html
Czech -
http://sed.sourceforge.net/sed1line_cz.html
Dutch -
http://sed.sourceforge.net/sed1line_nl.html
French -
http://sed.sourceforge.net/sed1line_fr.html
German -
http://sed.sourceforge.net/sed1line_de.html
Portuguese -
http://sed.sourceforge.net/sed1line_pt-BR.html
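Text spacing:
--------
# Add a blank line after every line
sed G
# Delete all existing blank lines, then add a blank line after every line.
# Every line of the output is then followed by exactly one blank line.
sed '/^$/d;G'
# Add two blank lines after every line
sed 'G;G'
# Remove the blank lines produced by the first script (that is, delete every even-numbered line)
sed 'n;d'
# Insert a blank line before every line that matches "regex"
sed '/regex/{x;p;x;}'
# Insert a blank line after every line that matches "regex"
sed '/regex/G'
# Insert a blank line before and after every line that matches "regex"
sed '/regex/{x;p;x;G;}'
Numbering:
--------
# Number every line of a file (simple left alignment). A tab (see the note
# on '\t' at the end of this document) is used instead of spaces to keep the margin aligned.
sed = filename | sed 'N;s/\n/\t/'
# Number every line of a file (line numbers on the left, right-aligned).
sed = filename | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'
# Number every line of a file, but only show line numbers for non-blank lines.
sed '/./=' filename | sed '/./N; s/\n/ /'
# Count lines (emulates "wc -l")
sed -n '$='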
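Text conversion and substitution:
--------
# Unix environment: convert DOS newlines (CR/LF) to Unix format.
sed 's/.$//' # assumes that every line ends with CR/LF
sed 's/^M$//' # in bash/tcsh, press Ctrl-V and then Ctrl-M to enter ^M
sed 's/\x0D$//' # ssed, gsed 3.02.80 and later
# Unix environment: convert Unix newlines (LF) to DOS format.
sed "s/$/`echo -e \\\r`/" # command line under ksh
sed 's/$'"/`echo \\\r`/" # command line under bash
sed "s/$/`echo \\\r`/" # command line under zsh
sed 's/$/\r/' # gsed 3.02.80 and later
# DOS environment: convert Unix newlines (LF) to DOS format.
sed "s/$//" # method 1
sed -n p # method 2
# DOS environment: convert DOS newlines (CR/LF) to Unix format.
# The script below only works with UnxUtils sed 4.0.7 and later. The UnxUtils
# version of sed can be recognized by its own "--text" option: run sed with
# the help option ("--help") and check whether a "--text" entry is listed.
# Other DOS versions of sed cannot do this conversion, but "tr" can.
sed "s/\r//" infile >outfile # UnxUtils sed v4.0.7 or later
tr -d \r <infile >outfile # GNU tr 1.22 or later
# Delete leading whitespace (spaces, tabs) from every line,
# aligning the text flush left
sed 's/^[ \t]*//' # see the note on '\t' at the end of this document
# Delete trailing whitespace (spaces, tabs) from every line
sed 's/[ \t]*$//' # see the note on '\t' at the end of this document
# Delete both leading and trailing whitespace from every line
sed 's/^[ \t]*//;s/[ \t]*$//'
# Insert 5 spaces at the beginning of every line (shifts the text 5 columns to the right)
sed 's/^/     /'
# Right-align all text to a width of 79 columns
sed -e :a -e 's/^.\{1,78\}$/ &/;ta' # 78 characters plus one final space
# Center all text within a width of 79 columns. In method 1, spaces are
# added both at the beginning and at the end of each line to center it. In
# method 2, spaces are only added in front of the text while centering, and
# half of them are removed at the end; no spaces are added at the end of lines.
sed -e :a -e 's/^.\{1,77\}$/ & /;ta' # method 1
sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' # method 2
# Find the string "foo" on each line and replace it with "bar"
sed 's/foo/bar/' # replaces only the first "foo" on each line
sed 's/foo/bar/4' # replaces only the fourth "foo" on each line
sed 's/foo/bar/g' # replaces every "foo" on each line
sed 's/\(.*\)foo\(.*foo\)/\1bar\2/' # replaces the next-to-last "foo"
sed 's/\(.*\)foo/\1bar/' # replaces the last "foo"
# Replace "foo" with "bar" only on lines that contain "baz"
sed '/baz/s/foo/bar/g'
# Replace "foo" with "bar" only on lines that do not contain "baz"
sed '/baz/!s/foo/bar/g'
# Change "scarlet", "ruby" or "puce" to "red"
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g' # works with most seds
gsed 's/scarlet\|ruby\|puce/red/g' # GNU sed only
# Reverse the order of lines, so that the first line becomes the last, and
# so on (emulates "tac"). For some reason, HHsed v1.5 deletes the blank
# lines of the file when the commands below are used.
sed '1!G;h;$!d' # method 1
sed -n '1!G;h;$p' # method 2
# Reverse the characters of each line, so that the first character becomes the last, ... (emulates "rev")
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
# Join pairs of lines into one line (like "paste")
sed '$!N;s/\n/ /'
# If a line ends with a backslash "\", append the next line to it
# and remove the backslash from the end of the original line
sed -e :a -e '/\\$/N; s/\\\n//; ta'
# If a line begins with an equals sign, append it to the previous line
# and replace the leading "=" with a single space
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'
# Add commas to numeric strings, changing "1234567" to "1,234,567"
gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta' # GNU sed
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' # other seds
# Add commas to numbers with decimal points and minus signs (GNU sed)
gsed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'
# Add a blank line after every 5 lines (after lines 5, 10, 15, 20, etc.)
gsed '0~5G' # GNU sed only
sed 'n;n;n;n;G;' # other seds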
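Selective printing of certain lines:
--------
# Print the first 10 lines of a file (emulates "head")
sed 10q
# Print the first line of a file (emulates "head -1")
sed q
# Print the last 10 lines of a file (emulates "tail")
sed -e :a -e '$q;N;11,$D;ba'
# Print the last 2 lines of a file (emulates "tail -2")
sed '$!N;$!D'
# Print the last line of a file (emulates "tail -1")
sed '$!d' # method 1
sed -n '$p' # method 2
# Print the next-to-last line of a file
sed -e '$!{h;d;}' -e x # for a 1-line file, print a blank line
sed -e '1{$q;}' -e '$!{h;d;}' -e x # for a 1-line file, print that line
sed -e '1{$d;}' -e '$!{h;d;}' -e x # for a 1-line file, print nothing
# Print only lines that match the regular expression (emulates "grep")
sed -n '/regexp/p' # method 1
sed '/regexp/!d' # method 2
# Print only lines that do NOT match the regular expression (emulates "grep -v")
sed -n '/regexp/!p' # method 1, corresponds to the command above
sed '/regexp/d' # method 2, similar syntax
# Print the line immediately before each match of "regexp", but not the matching line
sed -n '/regexp/{g;1!p;};h'
# Print the line immediately after each match of "regexp", but not the matching line
sed -n '/regexp/{n;p;}'
# Print each line containing "regexp" together with the line before and the
# line after it, preceded by the line number of the matching line (similar to "grep -A1 -B1")
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h
# Print lines containing "AAA", "BBB" and "CCC" (in any order)
sed '/AAA/!d; /BBB/!d; /CCC/!d' # the order of the strings does not matter
# Print lines containing "AAA", "BBB" and "CCC" (in that fixed order)
sed '/AAA.*BBB.*CCC/!d'
# Print lines containing "AAA", "BBB" or "CCC" (emulates "egrep")
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d # most seds
gsed '/AAA\|BBB\|CCC/!d' # GNU sed only
# Print paragraphs containing "AAA" (paragraphs are separated by blank lines)
# HHsed v1.5 needs "G;" added after "x;"; the same goes for the next 3 scripts
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'
# Print paragraphs containing all three strings "AAA", "BBB" and "CCC" (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'
# Print paragraphs containing any one of "AAA", "BBB" or "CCC" (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d' # GNU sed only
# Print lines of 65 characters or more
sed -n '/^.\{65\}/p'
# Print lines of fewer than 65 characters
sed -n '/^.\{65\}/!p' # method 1, corresponds to the script above
sed '/^.\{65\}/d' # method 2, a simpler way
# Print part of a file: from the line matching the regular expression to the last line
sed -n '/regexp/,$p'
# Print part of a file by line numbers (lines 8 to 12, inclusive)
sed -n '8,12p' # method 1
sed '8,12!d' # method 2
# Print line 52
sed -n '52p' # method 1
sed '52!d' # method 2
sed '52q;d' # method 3, more efficient on large files
# Starting with line 3, print every 7th line
gsed -n '3~7p' # GNU sed only
sed -n '3,${p;n;n;n;n;n;n;}' # other seds
# Print the text between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive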
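Selective deletion of certain lines:
--------
# Print the whole file except the section between two regular expressions
sed '/Iowa/,/Montana/d'
# Delete adjacent duplicate lines from a file (emulates "uniq").
# The first line of each set of duplicates is kept; the rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'
# Delete duplicate lines whether or not they are adjacent. Mind the buffer
# size that the hold space can support, or use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
# Delete all lines except duplicate lines (emulates "uniq -d")
sed '$!N; s/^\(.*\)\n\1$/\1/; t; D'
# Delete the first 10 lines of a file
sed '1,10d'
# Delete the last line of a file
sed '$d'
# Delete the last two lines of a file
sed 'N;$!P;$!D;$d'
# Delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2
# Delete every 8th line
gsed '0~8d' # GNU sed only
sed 'n;n;n;n;n;n;n;d;' # other seds
# Delete lines matching a pattern
sed '/pattern/d' # deletes lines containing pattern; pattern can of
# course be replaced by any valid regular expression
# Delete all blank lines from a file (same effect as "grep '.' ")
sed '/^$/d' # method 1
sed '/./!d' # method 2
# Keep only the first of several consecutive blank lines, and delete the
# blank lines at the top and at the end of the file (emulates "cat -s")
sed '/./,/^$/!d' # method 1, deletes blank lines at the top, allows one at the end
sed '/^$/N;/\n$/D' # method 2, allows one blank line at the top, none at the end
# Keep only the first two of several consecutive blank lines.
sed '/^$/N;/\n$/N;//D'
# Delete all blank lines at the top of a file
sed '/./,$!d'
# Delete all blank lines at the end of a file
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' # works with all seds
sed -e :a -e '/^\n*$/N;/\n$/ba' # same, except with gsed 3.02.*
# Delete the last line of each paragraph
sed -n '/^$/{p;h;};/./{x;/./p;}'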
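Special applications:
--------
# Remove the nroff markup from man pages. The 'echo' command may need
# the -e option under Unix System V or the bash shell.
sed "s/.`echo \\\b`//g" # the outer double quotes are required (Unix environment)
sed 's/.^H//g' # in bash or tcsh, press Ctrl-V and then Ctrl-H
sed 's/.\x08//g' # hexadecimal notation for sed 1.5, GNU sed, ssed
# Extract the header of a newsgroup or e-mail message
sed '/^$/q' # deletes everything after the first blank line
# Extract the body of a newsgroup or e-mail message
sed '1,/^$/d' # deletes everything up to the first blank line
# Extract the "Subject" field from the header and remove the leading "Subject:"
sed '/^Subject: */!d; s///;q'
# Get the return address from the header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'
# Parse out the e-mail address: take the one-line header produced by the
# previous script and strip everything that is not part of the address.
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
# Under DOS: zip up each .TXT file individually
echo @echo off >zipup.bat
dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat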
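Using SED: sed takes one or more editing commands and applies them, in
order, to each line of input as it is read. After the first line of input
has been read, sed applies all the commands to it and outputs the result.
It then reads the second line of input, applies all the commands to it, and
so on. In the preceding examples sed takes its input from standard input
(that is, the console, normally as piped input). When one or more filenames
are given as arguments on the command line, those files replace standard
input as the input of sed. The output of sed is sent to standard output
(the screen). Thus:
cat filename | sed '10q' # uses piped input
sed '10q' filename # same effect, without piped input
sed '10q' filename > newfile # redirects the output to disk
For instructions on using sed commands, including how to use them from a
script file rather than from the command line, see "sed & awk", 2nd
edition, by Dale Dougherty and Arnold Robbins (O'Reilly, 1997;
http://www.ora.com), "UNIX Text Processing" by Dale Dougherty and Tim
O'Reilly (Hayden Books, 1987), or the tutorials by Mike Arst, distributed
in an archive named "U-SEDIT2.ZIP" (found on many sites). To exploit the
full power of sed, one must understand "regular expressions" well. For
material on regular expressions, see "Mastering Regular Expressions" by
Jeffrey Friedl (O'Reilly, 1997).
The manual pages ("man") supplied with Unix systems can also help (try
"man sed", "man regexp", or the section on regular expressions in
"man ed"), but the information in the manual is rather "abstract", which
is what it has always been criticized for. Then again, it was never meant
to teach beginners how to use sed or regular expressions; it is only a
text reference for people who already know these tools.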
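Quoting syntax: the preceding examples mostly use single quotes ('...')
rather than double quotes ("...") around sed commands, because sed is
typically used on Unix platforms. Inside single quotes, the Unix shell
(command interpreter) does not interpret or execute the dollar sign ($)
or backquotes (`...`). Inside double quotes, the dollar sign is expanded
to the value of a variable or parameter, and the command inside backquotes
is executed and replaced by its output. In "csh" and shells derived from
it, an exclamation mark (!) must be preceded by an escaping backslash
(like this: \!) for the examples above to run properly (even inside
single quotes). DOS versions of sed always use double quotes ("...")
rather than single quotes to enclose commands.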
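The difference between the two kinds of quotes can be sketched as follows, assuming a POSIX shell; the variable n and the four-line input are made up for the demonstration:

```shell
n=3

# Double quotes: the shell expands ${n} before sed runs, so sed sees "3p".
dq=$(printf 'a\nb\nc\nd\n' | sed -n "${n}p")

# Single quotes: the $ reaches sed unchanged and means "the last line".
sq=$(printf 'a\nb\nc\nd\n' | sed -n '$p')

printf '%s\n' "$dq" "$sq"
```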
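Use of '\t': to keep this document concise, the scripts use '\t' to stand
for a tab character. However, most versions of sed do not yet recognize the
'\t' shorthand, so when typing a tab into a script on the command line,
press the TAB key to enter it directly instead of typing '\t'. The
following tools all support '\t' as a regular expression metacharacter
for the tab: awk, perl, HHsed, sedmod and GNU sed v3.02.80.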
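For a sed that does not understand '\t', one portable workaround in a script is to put a literal tab into a shell variable. This is a sketch assuming a POSIX shell; the variable name tab is an arbitrary choice, not something taken from the original document:

```shell
tab=$(printf '\t')            # a literal tab character

# Delete leading spaces and tabs without relying on '\t' inside sed.
out=$(printf '\t  indented\n' | sed "s/^[ $tab]*//")
printf '%s\n' "$out"
```

The sed script is in double quotes here so that the shell can substitute the tab before sed sees the expression.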
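Versions of SED: the different versions of sed differ from one another,
and some variation in syntax is to be expected. In particular, most of
them do not support labels (:name) or branch commands (b, t) in the middle
of an editing command, only at the end of one. In this document we have
tried to use syntax that is as portable as possible, so that users of most
versions of sed can use these scripts. The GNU version of sed, however,
allows a more succinct syntax. Imagine how a reader feels on seeing a long
command like this:
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
The good news is that GNU sed lets the command be more compact:
sed '/AAA/b;/BBB/b;/CCC/b;d' # which can even be written as
sed '/AAA\|BBB\|CCC/b;d'
In addition, note that although many versions of sed accept a command
such as "/one/ s/RE1/RE2/", with a space before the 's', some of them
do not accept this one: "/one/! s/RE1/RE2/". In that case, simply remove
the space in the middle.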
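Optimizing for speed: when execution speed has to be increased for some
reason (a large input file, a slow processor or hard disk, and so on),
consider putting an address expression in front of the substitution
command ("s/.../.../"). For example:
sed 's/foo/bar/g' filename # standard substitution command
sed '/foo/ s/foo/bar/g' filename # faster
sed '/foo/ s//bar/g' filename # shorthand form
When only the first part of a file needs to be printed, or the rest needs
to be dropped, the "q" (quit) command can be used in the script. On large
files this saves a great deal of time. Thus:
sed -n '45,50p' filename # prints lines 45 to 50
sed -n '51q;45,50p' filename # same, but much faster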
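That the two forms print the same lines can be checked on generated input. The sketch below assumes a POSIX shell; the helper function numbers is hypothetical and simply emits the integers 1 to 100, one per line:

```shell
# Hypothetical helper: print 1..100, one number per line.
numbers() { i=1; while [ "$i" -le 100 ]; do echo "$i"; i=$((i + 1)); done; }

slow=$(numbers | sed -n '45,50p')       # reads all 100 lines
fast=$(numbers | sed -n '51q;45,50p')   # stops reading at line 51

printf '%s\n' "$fast"
```

On an input this small the time saved is negligible; the benefit only shows on large files, where sed can stop long before the end.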
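If you have other one-line scripts to share, or if you find errors in
this document, please send e-mail to the author of this document (Eric
Pement). Remember to state the version of sed you use, the operating
system it runs on, and a suitable description of the problem. In this
document, a one-line script means a sed script whose command line is 65
characters long or less [translator's note 1]. The scripts in this
document were written or contributed by the following authors:
Al Aab # founder of the "seders" mailing list
Edgar Allen # various
Yiorgos Adamopoulos # various
Dale Dougherty # author of "sed & awk"
Carlos Duarte # author of "do it with sed"
Eric Pement # author of this document
Ken Pizzini # author of GNU sed v3.02
S.G. Ravenhall # the script that strips HTML tags
Greg Ubben # many contributions and much help
-------------------------------------------------------------------------
Translator's note 1: in most cases a sed script can be written on a single
line no matter how long it is (using the `-e' option and `;'), as long as
the command interpreter allows it. So the one-line scripts meant here are
not only writable on one line but are also limited in length. The point of
these scripts is not that they appear on a single line, but that they are
compact enough for users to type conveniently on the command line.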
Last edited by 无奈何 on 2006-10-26 at 01:28 PM ]
-------------------------------------------------------------------------
SED ONE-LINE SCRIPT QUICK REFERENCE (Unix Stream Editor) December 29, 2005
English Title: USEFUL ONE-LINE SCRIPTS FOR SED (Unix stream editor)
Original Title: HANDY ONE-LINERS FOR SED (Unix stream editor)
Organized by: Eric Pement - Email: pementenorthparkedu Version 5.5
Translator: Joe Hong - Email: hq00e126com
The latest (English) version of this document can be found at:
http://sed.sourceforge.net/sed1line.txt
http://www.pement.org/sed/sed1line.txt
Other language versions:
Chinese -
http://sed.sourceforge.net/sed1line_zh-CN.html
Czech -
http://sed.sourceforge.net/sed1line_cz.html
Dutch -
http://sed.sourceforge.net/sed1line_nl.html
French -
http://sed.sourceforge.net/sed1line_fr.html
German -
http://sed.sourceforge.net/sed1line_de.html
Portuguese -
http://sed.sourceforge.net/sed1line_pt-BR.html
Text Spacing:
--------
# Add a blank line after each line
sed G
# Delete all original blank lines and add a blank line after each line.
# This will result in exactly one blank line after each line in the output.
sed '/^$/d;G'
# Add two blank lines after each line
sed 'G;G'
# Delete all blank lines produced by the first script (i.e., delete all even lines)
sed 'n;d'
# Insert a blank line before lines matching the pattern "regex"
sed '/regex/{x;p;x;}'
# Insert a blank line after lines matching the pattern "regex"
sed '/regex/G'
# Insert a blank line before and after lines matching the pattern "regex"
sed '/regex/{x;p;x;G;}'
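As a rough illustration of the spacing scripts above, the following sketch runs two of them on a made-up three-line input. The sample text and the variable names are assumptions for illustration only, not part of the original document.

```shell
# Hypothetical three-line sample input (not from the original document).
sample=$(printf 'one\ntwo\nthree\n')

# G appends the (empty) hold space after a newline,
# so every line is followed by a blank line.
doubled=$(printf '%s\n' "$sample" | sed G)

# x;p;x swaps in the empty hold space, prints it, and swaps back,
# so a blank line appears before the matching line only.
before=$(printf '%s\n' "$sample" | sed '/two/{x;p;x;}')

printf '%s\n---\n%s\n' "$doubled" "$before"
```

Note that the shell's command substitution drops trailing newlines, so the final blank line produced by `sed G` is not visible in `$doubled`.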
Numbering:
--------
# Number each line in the file (simple left-aligned). Here, a "tab" (see the description of '\t' usage at the end of this document) is used instead of spaces to align the edges.
sed = filename | sed 'N;s/\n/\t/'
# Number all lines in the file (line numbers on the left, right-aligned).
sed = filename | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'
# Number all lines in the file, but only show line numbers for non-blank lines.
sed '/./=' filename | sed '/./N; s/\n/ /'
# Count the number of lines (simulates "wc -l")
sed -n '$='
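The numbering scripts rely on the "=" command printing the line number on a line of its own, which a second sed then joins to the text with N. A minimal sketch on made-up input (the input and the variable names are illustrative assumptions):

```shell
# '$=' prints only the number of the last line, i.e. the line count.
count=$(printf 'a\nb\nc\n' | sed -n '$=')

# The first sed emits "1", "a", "2", "b"; the second joins each pair,
# replacing the embedded newline with a space.
numbered=$(printf 'a\nb\n' | sed = | sed 'N;s/\n/ /')

printf '%s\n%s\n' "$count" "$numbered"
```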
Text Conversion and Substitution:
--------
# Unix environment: Convert DOS newlines (CR/LF) to Unix format.
sed 's/.$//' # Assuming all lines end with CR/LF
sed 's/^M$//' # In bash/tcsh, press Ctrl-V then Ctrl-M to enter ^M
sed 's/\x0D$//' # ssed, gsed 3.02.80, and later versions
# Unix environment: Convert Unix newlines (LF) to DOS format.
sed "s/$/`echo -e \\\r`/" # Command used under ksh
sed 's/$'"/`echo \\\r`/" # Command used under bash
sed "s/$/`echo \\\r`/" # Command used under zsh
sed 's/$/\r/' # gsed 3.02.80 and later versions
# DOS environment: Convert Unix newlines (LF) to DOS format.
sed "s/$//" # Method 1
sed -n p # Method 2
# DOS environment: Convert DOS newlines (CR/LF) to Unix format.
# The following script is only valid for UnxUtils sed 4.0.7 and later versions. To identify the UnxUtils version of sed, you can check for its unique "--text" option. You can use the help option ("--help") to see if there is a "--text" item to determine if it is the UnxUtils version. Other DOS versions of sed cannot perform this conversion. But it can be achieved with "tr".
sed "s/\r//" infile >outfile # UnxUtils sed v4.0.7 or later
tr -d \r <infile >outfile # GNU tr 1.22 or later versions
# Delete leading whitespace (spaces, tabs) from each line,
# left-aligning all text
sed 's/^[ \t]*//' # See the description of '\t' usage at the end of this document
# Delete trailing whitespace (spaces, tabs) from each line
sed 's/[ \t]*$//' # See the description of '\t' usage at the end of this document
# Delete both leading and trailing whitespace from each line
sed 's/^[ \t]*//;s/[ \t]*$//'
# Insert 5 spaces at the beginning of each line (shift the entire text 5 characters to the right)
sed 's/^/     /'
# Right-align all text to a width of 79 characters
sed -e :a -e 's/^.\{1,78\}$/ &/;ta' # 78 characters plus one final space
# Center all text to a width of 79 characters. In method 1, spaces are padded at the beginning and end of each line to center the text. In method 2, spaces are only padded in front of the text during the centering process, and eventually half of these spaces will be deleted. Also, no spaces are padded at the end of each line.
sed -e :a -e 's/^.\{1,77\}$/ & /;ta' # Method 1
sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' # Method 2
# Find the string "foo" in each line and replace the found "foo" with "bar"
sed 's/foo/bar/' # Replace only the first "foo" string in each line
sed 's/foo/bar/4' # Replace only the fourth "foo" string in each line
sed 's/foo/bar/g' # Replace all "foo" with "bar" in each line
sed 's/\(.*\)foo\(.*foo\)/\1bar\2/' # Replace the second-to-last "foo"
sed 's/\(.*\)foo/\1bar/' # Replace the last "foo"
# Only replace "foo" with "bar" if the string "baz" appears in the line
sed '/baz/s/foo/bar/g'
# Replace "foo" with "bar" and only replace if the string "baz" does not appear in the line
sed '/baz/!s/foo/bar/g'
# Replace any of "scarlet", "ruby", or "puce" with "red"
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g' # Valid for most seds
gsed 's/scarlet\|ruby\|puce/red/g' # Only valid for GNU sed
# Reverse all lines, with the first line becoming the last line, and so on (simulates "tac").
# For some reason, HHsed v1.5 will delete blank lines in the file when using the following command
sed '1!G;h;$!d' # Method 1
sed -n '1!G;h;$p' # Method 2
# Reverse the characters in each line, so that the first character becomes the last, and so on (simulates "rev")
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
# Combine every two lines into one line (similar to "paste")
sed '$!N;s/\n/ /'
# If the current line ends with a backslash "\", merge the next line to the end of the current line
# and remove the original line end backslash
sed -e :a -e '/\\$/N; s/\\\n//; ta'
# If the current line starts with an equal sign, merge the current line to the end of the previous line
# and replace the original line start "=" with a single space
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'
# Add commas to number strings to separate thousands, changing "1234567" to "1,234,567"
gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta' # GNU sed
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' # Other seds
# Add commas to separate thousands for numerical values with decimal points and negative signs (GNU sed)
gsed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'
# Add a blank line after every 5 lines (add a blank line after lines 5, 10, 15, 20, etc.)
gsed '0~5G' # Only valid for GNU sed
sed 'n;n;n;n;G;' # Other seds
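Several scripts in this section use the ":a ... ta" idiom: the substitution is repeated until it no longer matches. A small sketch of the portable thousands-separator loop, run on made-up numbers (the helper name add_commas is hypothetical):

```shell
# Hypothetical helper wrapping the portable one-liner from this section.
add_commas() {
    # Each pass inserts one comma before the last group of three digits
    # that is directly preceded by a digit; "ta" jumps back to label "a"
    # as long as a substitution was made.
    printf '%s\n' "$1" | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

big=$(add_commas 1234567)     # two passes: 1234,567 then 1,234,567
small=$(add_commas 999)       # fewer than four digits: no substitution
printf '%s\n%s\n' "$big" "$small"
```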
Selectively Display Specific Lines:
--------
# Display the first 10 lines of the file (simulates the behavior of "head")
sed 10q
# Display the first line of the file (simulates "head -1" command)
sed q
# Display the last 10 lines of the file (simulates "tail")
sed -e :a -e '$q;N;11,$D;ba'
# Display the last 2 lines of the file (simulates "tail -2" command)
sed '$!N;$!D'
# Display the last line of the file (simulates "tail -1")
sed '$!d' # Method 1
sed -n '$p' # Method 2
# Display the second-to-last line of the file
sed -e '$!{h;d;}' -e x # Outputs a blank line when there is only one line in the file
sed -e '1{$q;}' -e '$!{h;d;}' -e x # Displays the line when there is only one line in the file
sed -e '1{$d;}' -e '$!{h;d;}' -e x # Does not output when there is only one line in the file
# Only display lines matching the regular expression (simulates "grep")
sed -n '/regexp/p' # Method 1
sed '/regexp/!d' # Method 2
# Only display lines that "do not" match the regular expression (simulates "grep -v")
sed -n '/regexp/!p' # Method 1, corresponding to the previous command
sed '/regexp/d' # Method 2, similar syntax
# Find "regexp" and display the line before the matching line without showing the matching line
sed -n '/regexp/{g;1!p;};h'
# Find "regexp" and display the line after the matching line without showing the matching line
sed -n '/regexp/{n;p;}'
# Display the line containing "regexp" and its preceding and following lines, and add the line number of the line containing "regexp" before the first line (similar to "grep -A1 -B1")
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h
# Display lines containing "AAA", "BBB", and "CCC" (in any order)
sed '/AAA/!d; /BBB/!d; /CCC/!d' # The order of the strings does not affect the result
# Display lines containing "AAA", "BBB", and "CCC" (fixed order)
sed '/AAA.*BBB.*CCC/!d'
# Display lines containing "AAA", "BBB", or "CCC" (simulates "egrep")
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d # Most seds
gsed '/AAA\|BBB\|CCC/!d' # Valid for GNU sed
# Display paragraphs containing "AAA" (paragraphs are separated by blank lines)
# HHsed v1.5 requires "G;" to be added after "x;" in this script and in the three scripts that follow
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'
# Display paragraphs containing "AAA", "BBB", and "CCC" strings (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'
# Display paragraphs containing any one of the strings "AAA", "BBB", "CCC" (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d' # Only valid for GNU sed
# Display lines containing 65 or more characters
sed -n '/^.\{65\}/p'
# Display lines containing fewer than 65 characters
sed -n '/^.\{65\}/!p' # Method 1, corresponding to the above script
sed '/^.\{65\}/d' # Method 2, a simpler method
# Display part of the text - from the line containing the regular expression to the end of the last line
sed -n '/regexp/,$p'
# Display part of the text - specify a line number range (from line 8 to line 12, including lines 8 and 12)
sed -n '8,12p' # Method 1
sed '8,12!d' # Method 2
# Display line 52
sed -n '52p' # Method 1
sed '52!d' # Method 2
sed '52q;d' # Method 3, more efficient when processing large files
# Display every 7th line starting from line 3
gsed -n '3~7p' # Only valid for GNU sed
sed -n '3,${p;n;n;n;n;n;n;}' # Other seds
# Display text between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # Case-sensitive way
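Two of the selection scripts above are easier to follow with a concrete run. The sketch below uses a made-up four-line input (the input and the variable names are illustrative assumptions) to show the hold-space trick for the second-to-last line and the early-quit form for printing a single line.

```shell
lines=$(printf 'a\nb\nc\nd\n')

# Every line except the last is copied to the hold space and deleted;
# on the last line, x swaps the held (previous) line into the pattern space.
penultimate=$(printf '%s\n' "$lines" | sed -e '$!{h;d;}' -e x)

# Lines before line 2 are deleted; at line 2, q prints the line and quits,
# so the rest of the input is never read.
second=$(printf '%s\n' "$lines" | sed '2q;d')

printf '%s\n%s\n' "$penultimate" "$second"
```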
Selectively Delete Specific Lines:
--------
# Display the entire document except the content between two regular expressions
sed '/Iowa/,/Montana/d'
# Delete adjacent duplicate lines in the file (simulates "uniq")
# Only keep the first line of duplicate lines, delete other lines
sed '$!N; /^\(.*\)\n\1$/!P; D'
# Delete duplicate lines in the file, whether adjacent or not. Beware of overflowing the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
# Delete all lines except duplicate lines (simulates "uniq -d")
sed '$!N; s/^\(.*\)\n\1$/\1/; t; D'
# Delete the first 10 lines of the file
sed '1,10d'
# Delete the last line of the file
sed '$d'
# Delete the last two lines of the file
sed 'N;$!P;$!D;$d'
# Delete the last 10 lines of the file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # Method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # Method 2
# Delete lines that are multiples of 8
gsed '0~8d' # Only valid for GNU sed
sed 'n;n;n;n;n;n;n;d;' # Other seds
# Delete lines matching the pattern
sed '/pattern/d' # Delete lines containing pattern. Of course, pattern
# can be replaced with any valid regular expression
# Delete all blank lines in the file (the same effect as "grep '.' ")
sed '/^$/d' # Method 1
sed '/./!d' # Method 2
# Only keep the first line of multiple adjacent blank lines. And delete blank lines at the top and bottom of the file.
# (Simulates "cat -s")
sed '/./,/^$/!d' # Method 1, delete blank lines at the top of the file, allowing one blank line at the bottom to remain
sed '/^$/N;/\n$/D' # Method 2, allowing one blank line at the top to remain, and no blank line at the bottom to remain
# Only keep the first two lines of multiple adjacent blank lines.
sed '/^$/N;/\n$/N;//D'
# Delete all blank lines at the top of the file
sed '/./,$!d'
# Delete all blank lines at the bottom of the file
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' # Valid for all seds
sed -e :a -e '/^\n*$/N;/\n$/ba' # The same as above, except that it does not work with gsed 3.02.*
# Delete the last line of each paragraph
sed -n '/^$/{p;h;};/./{x;/./p;}'
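To make the N/P/D interplay in this section more concrete, here is a small sketch that runs the "uniq" and "cat -s" style scripts on made-up input (the input and the variable names are illustrative assumptions):

```shell
# N appends the next line; if both halves are identical nothing is printed,
# otherwise P prints the first half. D then drops the first half and
# restarts the script with the remainder.
deduped=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N; /^\(.*\)\n\1$/!P; D')

# Only lines inside a range that starts at a non-blank line and ends at the
# next blank line are kept, so runs of blank lines collapse to one.
squeezed=$(printf '\n\nx\n\n\n\ny\n\n' | sed '/./,/^$/!d')

printf '%s\n---\n%s\n' "$deduped" "$squeezed"
```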
Special Applications:
--------
# Remove nroff overstrikes (a character followed by a backspace) from man pages. Under Unix System V or the bash shell, the 'echo' command may need the -e option.
sed "s/.`echo \\\b`//g" # The double quotes are required (Unix environment)
sed 's/.^H//g' # In bash or tcsh, press Ctrl-V then Ctrl-H
sed 's/.\x08//g' # Hexadecimal representation used by sed 1.5, GNU sed, ssed
# Extract the header of a newsgroup or e-mail
sed '/^$/q' # Delete all content after the first blank line
# Extract the body of a newsgroup or e-mail
sed '1,/^$/d' # Delete all content before the first blank line
# Extract the "Subject" from the email header and remove the leading "Subject: "
sed '/^Subject: */!d; s///;q'
# Get the reply address from the email header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'
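# Extract the e-mail address proper. This strips the non-address parts from the one-line return-address header produced by the previous script.
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
# Remove most HTML tags (including tags that span multiple lines)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
# Zip up each .TXT file individually, deleting the source file and naming each .ZIP file after the basename of its .TXT file (under DOS, "dir /b" returns bare filenames in all caps)
echo @echo off >zipup.bat
dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat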
Using SED: Sed accepts one or more editing commands and applies these commands sequentially after each line is read.
After reading the first line of input, sed applies all commands to it and then outputs the result. Then it reads the second line of input, applies all commands to it... and repeats this process. In the previous example, sed gets input from the standard input device (i.e., the command interpreter, usually in the form of pipe input). When one or more filenames are given as parameters on the command line, these files replace the standard input device as the input of sed. The output of sed will be sent to the standard output (display). Therefore:
cat filename | sed '10q' # Use pipe input
sed '10q' filename # The same effect, but without using pipe input
sed '10q' filename > newfile # Redirect the output to disk
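The equivalence of piped input and a filename argument can be checked with a short sketch. The temporary file and its contents below are made up for illustration.

```shell
# Create a hypothetical five-line input file.
tmp=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 > "$tmp"

from_pipe=$(cat "$tmp" | sed '2q')   # input arrives on standard input
from_file=$(sed '2q' "$tmp")         # input is named on the command line

rm -f "$tmp"
printf '%s\n' "$from_file"
```

Both forms print the first two lines; the second simply avoids the extra `cat` process.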
For further instructions on sed commands, including how to run them from a script file (rather than from the command line), see "sed & awk", 2nd Edition, by Dale Dougherty and Arnold Robbins (O'Reilly, 1997; http://www.ora.com), "UNIX Text Processing" by Dale Dougherty and Tim O'Reilly (Hayden Books, 1987), or the tutorials by Mike Arst distributed in "U-SEDIT2.ZIP" (available on many sites). To exploit the full power of sed, one must understand "regular expressions"; for these, see "Mastering Regular Expressions" by Jeffrey Friedl (O'Reilly, 1997).
The manual ("man") pages on Unix systems may also be helpful (try "man sed", "man regexp", or the part about regular expressions in "man ed"), but man pages are notoriously terse. They are not written to teach sed or regular expressions to first-time users, but as a reference for those already familiar with these tools.
Quoting Syntax: The preceding examples enclose sed commands in single quotes ('...') rather than double quotes ("...") because sed is typically used on Unix platforms. Inside single quotes, the Unix shell (command interpreter) does not interpret the dollar sign ($) or backquotes (`...`). Inside double quotes, the dollar sign is expanded to the value of a variable or parameter, and a command in backquotes is executed and replaced by its output. In "csh" and its derived shells, an exclamation point (!) must be preceded by a backslash (like this: \!) for the examples above to run properly, even inside single quotes. Versions of sed written for DOS always require double quotes ("...") instead of single quotes to enclose commands.
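The effect of the two quoting styles can be seen in a small sketch. The shell variable `word` and the sample text are hypothetical, chosen only for illustration.

```shell
word=foo   # hypothetical shell variable, for illustration only

# Single quotes: the shell passes $word to sed untouched, so sed looks for
# the literal text "$word", which does not occur in the input.
single=$(printf 'foo bar\n' | sed 's/$word/X/')

# Double quotes: the shell expands $word first, so sed receives s/foo/X/.
double=$(printf 'foo bar\n' | sed "s/$word/X/")

printf '%s\n%s\n' "$single" "$double"
```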
Usage of '\t': To keep this document concise, we use '\t' in the script to represent a tab. However, most current versions of sed do not recognize the short form of '\t', so when entering the tab character in the command line for the script, you should directly press the TAB key to enter the tab character instead of entering '\t'. The following tools support '\t' as a regular expression character to represent a tab: awk, perl, HHsed, sedmod, and GNU sed v3.02.80.
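For seds that do not understand '\t', one possible workaround (a sketch, not something prescribed by this document) is to let the shell supply a literal tab through a variable. The variable name `tab` is an assumption for illustration.

```shell
# printf produces a real tab character; command substitution keeps it
# because only trailing newlines are stripped.
tab=$(printf '\t')

# Double quotes let the shell place the literal tab inside the bracket
# expression, so sed never has to interpret '\t' itself.
trimmed=$(printf '\t  indented\n' | sed "s/^[ $tab]*//")

printf '%s\n' "$trimmed"
```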
Different Versions of SED: Versions of sed differ, and some variation in syntax is to be expected. In particular, most versions do not support labels (:name) or branch commands (b, t) in the middle of an editing command; they are accepted only at the end. This document tries to use the more portable syntax so that users of most versions of sed can run these scripts, even though the GNU version of sed allows a more concise syntax. Imagine the reader's reaction on seeing a long command such as:
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
The good news is that GNU sed can make the command more compact:
sed '/AAA/b;/BBB/b;/CCC/b;d' # Can even be written as
sed '/AAA\|BBB\|CCC/b;d'
In addition, please note that although many versions of sed accept commands like "/one/ s/RE1/RE2/" with a space before 's', some of these versions do not accept such commands: "/one/! s/RE1/RE2/". In this case, just remove the space in the middle.
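The two spellings discussed above can be compared directly. This sketch assumes a sed that accepts the compact semicolon form (GNU sed does); the sample input is made up.

```shell
input=$(printf 'AAA\nxyz\nBBB\nCCC\n')

# Portable form: each branch command sits in its own -e argument.
portable=$(printf '%s\n' "$input" | sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d)

# Compact form: the same commands separated by semicolons
# (assumed to be supported by the sed in use, as in GNU sed).
compact=$(printf '%s\n' "$input" | sed '/AAA/b;/BBB/b;/CCC/b;d')

printf '%s\n' "$portable"
```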
Speed Optimization: When you need to improve the execution speed of the command for some reason (such as a large input file, slow processor or hard disk, etc.), you can consider adding an address expression in front of the substitution command ("s/.../.../") to improve the speed. For example:
sed 's/foo/bar/g' filename # Standard substitution command
sed '/foo/ s/foo/bar/g' filename # Faster speed
sed '/foo/ s//bar/g' filename # Shorthand form
When you only need to display the front part of the file or need to delete the content at the back, you can use the "q" command (quit command) in the script. When processing large files, this will save a lot of time. Therefore:
sed -n '45,50p' filename # Display lines 45 to 50
sed -n '51q;45,50p' filename # The same, but much faster
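The optimized forms are meant to produce exactly the same output as the plain ones. A quick sketch that checks this on made-up data (the 100-line input and the variable names are illustrative assumptions; it verifies equivalence only, not speed):

```shell
# Generate the numbers 1..100, one per line.
data=$(awk 'BEGIN { for (i = 1; i <= 100; i++) print i }')

plain=$(printf '%s\n' "$data" | sed -n '45,50p')
quick=$(printf '%s\n' "$data" | sed -n '51q;45,50p')   # stops reading at line 51

# The empty pattern in s//bar/g reuses the address pattern /foo/.
shorthand=$(printf 'foo foo\nbaz\n' | sed '/foo/ s//bar/g')

printf '%s\n%s\n' "$quick" "$shorthand"
```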
If you have other one-line scripts to share, or if you find errors in this document, please send an e-mail to the compiler of this document (Eric Pement). Please state the version of sed you are using, the operating system it runs on, and the nature of the problem. To qualify as a one-liner here, the command line must be 65 characters or less (see Note 1 below). The various scripts in this document were written or contributed by the following authors:
Al Aab # Established the "seders" mailing list
Edgar Allen # Many aspects
Yiorgos Adamopoulos # Many aspects
Dale Dougherty # Author of "sed & awk"
Carlos Duarte # Author of "do it with sed"
Eric Pement # Author of this document
Ken Pizzini # Author of GNU sed v3.02
S.G. Ravenhall # Remove html tags script
Greg Ubben # Made many contributions and provided a lot of help
-------------------------------------------------------------------------
Translator's Note 1: In most cases, a sed script of any length can be written on a single line (using the `-e' option and `;'), as long as the command interpreter allows it. The one-line scripts here are therefore limited in length as well as written on one line. Their value lies not in appearing on a single line, but in being compact enough for users to type conveniently at the command line.
2006-10-26 13:19
无奈何
『第 5 楼』:
sed - 非交互式文本编辑器
转贴注:原始链接 http://blog.chinaunix.net/u/13392/showart.php?id=133128
sed - 非交互式文本编辑器
Lee E. McMahon
Bell Laboratories
Murray Hill, New Jersey 07974
翻译:寒蝉退士
译者声明:译者对译文不做任何担保,译者对译文不拥有任何权利并且不负担任何责任和义务。
原文: http://cm.bell-labs.com/7thEdMan/vol2/sed
摘要
sed 是在 UNIX 操作系统上运行的一个非交互式上下文编辑器。sed 被设计在下列三种情况下发挥作用:
1) 编辑那些对舒适的交互式编辑而言太大的文件。
2) 在编辑命令太复杂而难于在交互模式下键入的时候编辑任何大小的文件。
3) 要在对输入的一趟扫描中有效的进行多个‘全局’编辑函数。
本备忘录是给 sed 用户的手册。
August 15, 1978
--------------------------------------------------------------------------------
目录
介绍
1. 整体操作
1.1. 命令行标志
1.2. 编辑命令的应用次序
1.3. 模式空间
1.4. 示例
例子
2. 地址: 选择要编辑的行
2.1. 行号地址
2.2. 上下文地址
2.3. 地址的数目
例子
3. 函数
3.1. 面向整行的函数
例子
3.2. 替换函数
例子
3.3. 输入输出函数
例子
3.4. 多输入行函数
3.5. 保存和取回函数
例子
3.6. 控制流函数
3.7. 杂类函数
引用
--------------------------------------------------------------------------------
介绍
sed 是一个非交互式上下文(context)编辑器,它被设计在下列三种情况下发挥作用:
1) 编辑那些对舒适的交互式编辑而言太大的文件。2) 在编辑命令太复杂而难于在交互模式下键入的时候编辑任何大小的文件。3) 要在对输入的一趟扫描中有效的进行多个‘全局’(global)编辑函数。
因为每次只把输入的某些行驻留在内存中,并且不使用临时文件,所以可编辑的文件的有效大小,只受限于输入和输出要同时共存于次级存储的要求。
可以单独的建立复杂的编辑脚本并作为给 sed 的命令文件。对于复杂的编辑,这节省了可观的键入和随之而来的错误。从命令文件运行 sed 高效于作者所知道的任何交互式编辑器,甚至包括能用预先写好的脚本驱动的编辑器。
相较于交互式编辑器言,根本性的损失是缺乏相对地址(由于操作是每次一行的),和缺乏对命令如期运行的立即验证。
sed 是 UNIX 编辑器 ed 的直系后代。由于在交互式和非交互式操作之间的差异,在 ed 和 sed 之间已经有了可观的变化;甚至 ed 的惯常用户都会经常感到惊讶(并可能气愤),如果他们没有阅读本文档的章节 2 和 3,就草率的使用 sed 的话。在两个编辑器之间最显著的家族性共同之处,在于他们所识别的模式(‘正则表达式’)的种类;匹配模式的代码可以从 ed 的代码几乎原封不动的复制过来,在章节 2 中对正则表达式的描述就是从 UNIX Programmer’s Manual 几乎原封不动的复制过来的。(代码和描述都是 Dennis M. Ritchie 写的)。
--------------------------------------------------------------------------------
1. 整体操作
sed 缺省的把标准输入复制到标准输出,在把每行写到输出之前可能在其上进行一个或多个编辑命令。这种行为可以通过命令行上的标志来更改;参见下面的章节 1.1。
编辑命令的一般格式为:
[address1,address2][function][arguments]
一个或两个地址是可以省略的;地址的格式在章节 2 中给出。可以用任何数目的空白或 tab 把地址和函数分隔开。函数必须出现;在章节 3 中讨论可用的所有命令。依据给出的是哪个函数,参数可能是必需的或是可选的;它们在章节 3 中每个单独的函数之下讨论。
忽略在这些行开始处的 tab 字符和空格。
1.1. 命令行标志
在命令行上识别三个标志:
-n:告诉 sed 不复制所有的行,只复制 p 函数或在 s 函数后 p 标志所指定的行(参见章节 3.3)。
-e:告诉 sed 把下一个参数接受为编辑命令。
-f:告诉 sed 把下一个参数接受为文件名;这个文件应当包含一行一个的编辑命令。
1.2. 编辑命令的应用次序
在做任何编辑之前(实际上,甚至在打开任何文件之前),所有编辑命令都被编译成了在执行阶段(在把这些命令实际应用于输入文件的行的时候)有适当效率的形式。按它们出现的次序编译这些命令;一般而言这也是在执行时尝试应用它们的次序。这些命令一次应用一个;给每个命令的输入都是所有前面命令的输出。
编译命令应用的缺省的线性次序可以通过控制流命令 t 和 b 来变更(参见章节 3)。即使在应用次序被这些命令改变的时候,给任何命令的输入仍是任何此前应用的命令的输出。
1.3. 模式空间
模式匹配的范围叫做模式空间。一般而言,模式空间是输入文本中某一行,但是可以通过使用 N 命令把多于一行读入模式空间(参见章节 3.6.)。
1.4. 示例
例子分散在正文中。除非特别说明,例子都假定了下列输入文本:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
(在任何情况下 sed 命令的输出都不能被当作是对 Coleridge 作品的改进。)
例子:
命令
2q
会在复制了输入的前两行之后退出。输出将是:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
--------------------------------------------------------------------------------
2. 地址: 选择要编辑的行
编辑命令要应用于其上的,输入文件中的行可以通过地址来选择。地址可以是行号或者是上下文地址。
通过用花括号(‘{ }’)组合(group)命令,可以用一个地址(或地址对)来控制一组命令的应用(参见章节 3.6.)。
2.1. 行号地址
行号是十进制整数。在从输入读入每一行的时候,增加一个行号计数器;行号地址匹配(选择)导致这个内部计数器等于地址行号的输入行。计数器在多个输入文件上累计运行,在打开一个新文件的时候它不被复零(reset)。
作为特殊情况,字符 $ 匹配输入文件的最后一行。
2.2. 上下文地址
上下文地址是包围在斜杠中(‘/’)的模式(‘正则表达式’)。sed 识别的正则表达式被构造如下:
1) 普通字符(不是下面讨论的某个字符)是一个正则表达式,并且匹配这个字符。
2) 在正则表达式开始处的‘^’符号(circumflex)匹配在行开始处的空(null)字符。
3) 在正则表达式结束处的美元符号‘$’匹配在行结束处的空字符。
4) 字符‘\n’匹配内嵌的换行字符,而不是在模式空间结束处的换行。
5) 点‘.’匹配除了模式空间的终止换行之外的任何字符。
6) 跟随着星号‘*’的正则表达式,匹配它所跟丛的正则表达式的任何数目(包括 0)的毗连出现。
7) 在方括号‘[ ]’内的字符串,匹配在字符串内的任何字符,而非其他。但是如果这个字符串的第一个字符是‘^’符号,正则表达式匹配除了在这个字符串内的字符和模式空间的终止换行之外的任何字符。
8) 正则表达式的串联(concatenation)是正则表达式,它匹配这个正则表达式的成员所匹配的字符串的串联。
9) 在顺序的‘\(’和‘\)’之间的正则表达式,在效果上等同于没有它修饰的正则表达式,但它有个副作用,将在下面的 s 命令和紧后面的规定 10 中描述。
10) 表达式‘\d’意味着与在同一个表达式中先前的‘\(’和‘\)’中包围的表达式所匹配的那些字符同样的字符串。这里的 d 是一个单一的数字;指定的字符串是‘\(’的从左至右的第 d 个出现所起始的字符串。例如,表达式‘^\(.*\)\1’匹配开始于同一个字符串的两次重复出现的行。
11) 孤立的空正则表达式(就是‘//’)等价于编译的最后一个正则表达式。
要使用特殊字符(^ $ . * \ /)中的某一个字符作为文字(去匹配输入中它们自身的出现),要对这个特殊字符前导一个反斜杠‘\’。
上下文地址‘匹配’输入要求地址内的整个模式匹配模式空间的某个部分。
2.3. 地址的数目
在下一章节中的命令可能有 0, 1 或 2 个地址。在每个命令中都给出了允许的地址的最大数目。地址多于最大允许个数的命令被认为是错误的。
如果命令没有地址,它应用于输入中每个行。
如果命令有一个地址,它应用于匹配这个地址的所有行。
如果命令有两个地址,它应用于匹配第一个地址的第一行,和直到(并包括)匹配第二个地址的第一个后续行的所有后续行。接着在后续的行上再次尝试匹配第一个地址,并重复这个处理。
两个地址用逗号分隔。
例子:
/an/ 匹配我们样例文本的第 1, 3, 4 行
/an.*an/ 匹配第 1 行
/^an/ 没有匹配行
/./ 匹配所有行
/\./ 匹配第 5 行
/r*an/ 匹配第 1,3, 4 行(number = zero!)
/\(an\).*\1/ 匹配第 1 行
3. 函数
所有函数都用一个单一字符来命名。在下面的总结中,允许地址的最大数目在成对的圆括号内给出,接着的单一字符是函数名字,可能有的参数包围在成对的尖括号(< >)内,单一字符名字的英语解释,并在最后描述每个函数做些什么。在参数外围的尖括号不是参数的一部分,在实际编辑命令中不应该键入。
3.1. 面向整行的函数
(2)d -- delete lines
d 函数从文件中删除(不写入输出)匹配它的地址的所有行。
它还有一个副作用,在这个已删除的行上将不再尝试进一步的命令;在执行了 d 之后,马上就从输入读取一个新行,在新行上从头重新启动编辑命令列表。
(2)n -- next line
n 函数从输入读取下一行,替代当前行。当前行被写入输出,如果应该的话。继续执行编辑命令列表在 n 命令之后的部分。
(1)a\
<文本> -- append lines
a 函数导致在匹配它的地址的行之后把参数<文本>写入输出。a 命令是天生多行的;a 必须出现在一行的结束处,而<文本>可以包含任意数目的行。为了保持一行一个命令的构想,内部的换行必须用给换行立即前导上反斜杠字符(‘\’)的方式来隐藏。<文本>参数终止于第一个未隐藏的换行(没有立即前导反斜杠的第一个换行)。
一旦 a 函数成功的执行了,<文本>将被写入输出,而不管后来的命令对触发它的行会做些什么。触发的行可以被完全删除掉;而<文本>仍会被写入输出。
<文本>不被地址匹配所扫描,不尝试对它做编辑命令。它不引起行号计数器的任何变化。
(1)i\
<文本> -- insert lines
i 函数表现得等同于 a 函数,除了<文本>在匹配行之前写入输出之外。关于 a 函数的所有其他注释同样适用于 i 函数。
(2)c\
<文本> -- change lines
c 函数删除它的地址所选择的那些行,并把它们替代为在<文本>中的行。象 a 和 i 一样,c 必须跟随着被反斜杠隐藏了的换行;并且在<文本>中的内部的换行必须用反斜杠隐藏。
c 命令可以有两个地址,所以可选择一定范围内的行。如果找到,在这个范围内的所有行都被删除,只把<文本>的一个复本写入输出,而不是对每个删除的行都写一个复本。同于 a 和 i,<文本>不被地址匹配所扫描,不尝试对它做编辑命令。它不引起行号计数器的任何变化。
在一行已经被 c 函数删除之后,在这个已删除的行上将不再尝试进一步的命令。
如果 a 或 r 函数在某一行之后添加了文本,而这一行随后被 c 函数变更了,则 c 函数所插入的文本将会放置在 a 或 r 函数的文本之前。(r 函数在章节 3.4. 中描述)。
注意: 在这些函数放入输出的文本内,前导的空白和 tab 都会消失,象 sed 的编辑命令一样。要把前导的空白和 tab 放入输出中,需要在想要的第一个空白或 tab 之前前导反斜杠;这个反斜杠不会出现在输出中。
例子:
编辑命令的列表:
n
a\
XXXX
d
应用于我们的标准输入,生成:
In Xanadu did Kubhla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.
在这个特定情况下,下面两列命令列表会生成同样的效果:
n n
i\ c\
XXXX XXXX
d
3.2. 替换函数
这是一个非常重要的函数,它改变在一行之内通过上下文查找而选择出的这一行的某部分。
(2)s<模式><替代><标志> -- substitute
s 函数替代行的(通过<模式>选择的)某部分为<替代>。它可以读做:
替换<模式>为<替代>
<模式>参数包含一个模式,它完全等同于地址中的模式(参见章节 2.2)。在<模式>和上下文地址之间的唯一区别是上下文地址必须用斜杠字符(‘/’)来界定;<模式>可以用不是空格或换行的任何其他字符来界定。
缺省的,只替换匹配<模式>的第一个字符串,参见后面的 g 标志。
<替代>参数紧接着<模式>的第二个分界字符之后开始,并且它必须立即跟随着分界字符的另一个实例。(所以准确的有三个分界字符的实例)。<替代>不是模式,在模式中有特殊意义的字符在<替代>中没有特殊意义。反而有特殊意义的字符是:
& 被替代为匹配<模式>的字符串。
\d (这里的 d 是一个单一的数字)被替代为同<模式>中第 d 个包围在‘\(’和‘\)’内的部分相匹配的子串。如果在<模式>中出现嵌套的子串,第 d 个通过计数开分界符 (‘\(’)来界定。同在模式中一样,特殊字符可以通过前导反斜杠(‘\’)来变为文字。
<标志>参数可以包含任何下列标志:
g -- 把此行中<模式>的所有(不重叠)的实例都替换为<替代>,对<模式>的下一个实例的扫描就开始于插入的 这些字符之后;放置入行中的来自<替代>的字符不会被重新扫描。
p -- 打印此行,如果做了成功替换的话。p 标志导致把输入行写入输出,当且仅当这个 s 函数实际上做了替换。注意如果有多个 s 函数,每个函数都跟随着 p 标志,它们都在同一个输入行上成功的做了替换,会把这一行的多个复本写到输出: 每个成功的替换都写一个复本。
w <文件名> -- 把此行写入一个文件,如果做了成功的替换的话。w 标志导致实际上被 s 函数替代了那些行被写到<文件名>所指名的文件中。如果<文件名>在 sed 运行前就存在,则覆盖它。否则,就建立它。
必须用一个单一的空格分隔 w 和<文件名>。
同 p 一样有着写入一个输入行的多个略有不同的复本的可能性。
在 w 标志和 w 函数(参见后面章节)之后可以提及的不同的文件名字合起来的最大数目为 10 个。
例子:
把下列命令应用于我们的标准输入,
s/to/by/w changes
生成,在标准输出上:
In Xanadu did Kubhla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.
在文件‘changes’中:
Through caverns measureless by man
Down by a sunless sea.
如果不复制选项生效,命令:
s/[.,;?:]/*P&*/gp
生成:
A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*
最后为了展示 g 标志的效果,命令:
/X/s/an/AN/p
生成(假定不复制模式):
In XANadu did Kubhla Khan
而命令:
/X/s/an/AN/gp
生成:
In XANadu did Kubhla KhAN
3.3. 输入输出函数
(2)p -- print
打印函数把寻址到的行写到标准输出文件。在遇到 p 函数的时候就写入它们,而不管后续的编辑命令对这些行会做些什么。
(2)w <文件名> -- write on <filename>
写函数把寻址到的行写到<文件名>指名的文件中。如果这个文件以前就存在,则覆盖它;否则,就建立它。每行都按遇到写函数时现存的样子写入,而不管后续的编辑命令对这些行会做些什么。必须用精确的一个空格分隔 w 和<文件名>。在 s 函数的 w 标志之后和写函数中可以提及的不同的文件名字合起来的最大数目为 10 个。
(1)r <文件名> -- read the contents of a file
读函数读入<文件名>的内容,并把它们添加到匹配这个地址的行的后面。读取这个文件并添加它的内容,而不管后续的编辑命令对匹配它的地址的这些行会做些什么。如果 r 和 a 函数在同一行上执行,来自 a 函数和 r 函数的文本按照这些函数执行的次序写入输出。必须用精确的一个空格分隔 r 和<文件名>。如果 r 函数提及的文件不能打开,它被当作一个空文件,而不是一个错误,所以不给出诊断信息。
注意: 因为对可以同时打开的文件数目是有所限制的,要小心在 w 命令或标志中不要提及多于 10 个(不同的)文件;如果有任何 r 函数出现,这个数目还会再减少一个。(在一个时候只能打开一个读取文件)。
例子
假定文件‘note1’有如下内容:
Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.
则下列命令:
/Kubla/r note1
生成:
In Xanadu did Kubla Khan
Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
3.4. 多输入行函数
有三个用大写字母拼写的函数特殊处理包含内嵌换行的模式空间;它们主要意图提供跨越输入中的行的模式匹配。
(2)N -- Next line
在模式空间中把下一行添加到当前行之后;两个输入行用一个内嵌的换行分隔。模式匹配可以延伸跨越这个内嵌换行。
(2)D -- Delete first part of the pattern space
删除当前模式空间中直到并包括第一个换行字符的所有字符。如果这个模式空间变成了空的(唯一的换行是终止换行),则从输入读取另一行。在任何情况下,都再次从编辑命令列表的起始处开始执行。
(2)P -- Print first part of the pattern space
打印模式空间中的直到并包括第一个换行的所有字符。
P 和 D 函数等价于它们对应的小写函数,如果在模式空间中没有内嵌换行的话。
3.5. 保存和取回函数
有四个函数为将来的使用而保存和取回部分输入。
(2)h -- hold pattern space
h 函数把模式空间的内容复制到保存区域(销毁保存区域以前的内容)。
(2)H -- Hold pattern space
H 函数把模式空间的内容添加到保存区域的内容之后;以前和新的内容用换行分隔。
(2)g -- get contents of hold area
g 函数把保存区域的内容复制到模式空间(销毁模式空间以前的内容)。
(2)G -- Get contents of hold area
G 函数把保存区域的内容添加到模式空间的内容之后;以前和新的内容用换行分隔。
(2)x -- exchange
对换命令交换模式空间和保存区域的内容。
例子
命令
1h
1s/ did.*//
1x
G
s/\n/ :/
应用于我们的标准例子,生成:
In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
3.6. 控制流函数
这些函数不在输入行上做编辑,但是控制函数到地址部分所选择的行的应用。
(2)! -- Don’t
非命令导致(写在同一行上的)下一个命令,应用到所有的且只能是未被地址部分选择到那些输入行上。
(2){ -- Grouping
组合命令‘{’导致下一组命令作为一个块而被应用(或不应用)到组合命令的地址所选择的输入行上。在组合控制下的的命令中的第一个命令可以出现在与‘{’相同的一行或下一行上。
组合的命令由自己独立在一行之上的相匹配的‘}’终止。
组合可以嵌套。
(0):<标号> -- place a label
标号函数在编辑命令列表中标记一个位置,它将来可以被 b 和 t 函数所引用。<标号>可以是八个或更少的字符的任何序列;如果两个不同的冒号函数有相同的标号,就会生成编译时间诊断信息,而不做执行尝试。
(2)b<标号> -- branch to label
分支函数导致应用于当前输入行上的编辑命令序列,被立即重新启动到有相同的<标号>的冒号函数的所在位置之后。如果在所有编辑命令都已经被编译了之后仍没有找到有相同的标号的冒号函数,就会生成一个编译时间诊断信息,而不做执行尝试。
不带有<标号>的 b 函数被当作到编辑命令列表结束处的分支;对当前输入行做应做的无论怎样的处理,并读入其他输入行;编辑命令的列表在这个新行上从头重新启动。
(2)t<标号> -- test substitutions
t 函数测试在当前输入行上是否已经做了任何成功的替换;如果有,它分支到<标号>;否则,它什么都不做。指示已经执行了成功的替换的标志通过如下方式复零:
1) 读取一个新输入行,或
2) 执行 t 函数。
3.7. 杂类函数
(1)= -- equals
= 函数向标准输出写入匹配它的地址的行的行号。
(1)q -- quit
q 函数导致把当前行写到标准输出(如果应该的话),任何添加的或读入的文本也被写出,而且执行会被终止。
--------------------------------------------------------------------------------
引用
Ken Thompson and Dennis M. Ritchie, The UNIX Programmer’s Manual. Bell Laboratories, 1978.
Last edited by 无奈何 on 2006-10-26 at 01:33 PM ]
Repost Note: Original link http://blog.chinaunix.net/u/13392/showart.php?id=133128
sed - Non-interactive Text Editor
Lee E. McMahon
Bell Laboratories
Murray Hill, New Jersey 07974
Translation: Han Chan Tui Shi
Translator's Statement: The translator makes no warranties regarding the translation, has no rights to the translation, and assumes no responsibilities or obligations.
Original: http://cm.bell-labs.com/7thEdMan/vol2/sed
Abstract
sed is a non-interactive context editor that runs on the UNIX operating system. sed is designed to be useful in the following three situations:
1) Editing files that are too large for comfortable interactive editing.
2) Editing files of any size when the editing commands are too complex to type in interactive mode.
3) Efficiently performing multiple 'global' editing functions in a single pass over the input.
This memo is a manual for sed users.
August 15, 1978
--------------------------------------------------------------------------------
Table of Contents
Introduction
1. Overall Operation
1.1. Command-line Flags
1.2. Order of Application of Editing Commands
1.3. Pattern Space
1.4. Examples
Examples
2. Addresses: Selecting Lines to Edit
2.1. Line Number Addresses
2.2. Context Addresses
2.3. Number of Addresses
Examples
3. Functions
3.1. Line-oriented Functions
Examples
3.2. Substitution Functions
Examples
3.3. Input/Output Functions
Examples
3.4. Multi-line Input Functions
3.5. Save and Retrieve Functions
Examples
3.6. Control Flow Functions
3.7. Miscellaneous Functions
References
--------------------------------------------------------------------------------
Introduction
sed is a non-interactive context editor designed to be useful in the following three situations:
1) Editing files that are too large for comfortable interactive editing.
2) Editing files of any size when the editing commands are too complex to type in interactive mode.
3) Efficiently performing multiple 'global' editing functions in a single pass over the input.
Since only certain lines of input are kept in memory at a time and no temporary files are used, the effective size of the file that can be edited is limited only by the requirement that input and output coexist in secondary storage simultaneously.
Complex editing scripts can be built separately and used as command files for sed. For complex edits, this saves significant typing and the resulting errors. Running sed from a command file is more efficient than any interactive editor the author is aware of, even including editors that can be driven by pre-written scripts.
The fundamental loss compared to interactive editors is the lack of relative addressing (since operations are line-by-line) and the lack of immediate verification that commands are running as expected.
sed is a direct descendant of the UNIX editor ed. Because of the differences between interactive and non-interactive operation, considerable changes have been made between ed and sed; even habitual users of ed will often be surprised (and possibly annoyed) if they rashly use sed without reading Sections 2 and 3 of this document. The most striking family resemblance between the two editors is the class of patterns ('regular expressions') they recognize; the code for matching patterns is copied almost verbatim from the code for ed, and the description of regular expressions in Section 2 is copied almost verbatim from the UNIX Programmer's Manual. (Both the code and the description were written by Dennis M. Ritchie.)
--------------------------------------------------------------------------------
1. Overall Operation
sed by default copies standard input to standard output, possibly performing one or more editing commands on each line before writing it to output. This behavior can be changed with flags on the command line; see section 1.1 below.
The general format of an editing command is:
[address[,address]] function [parameters]
One or both addresses may be omitted; the format of addresses is given in section 2. Any number of blanks or tabs can separate the addresses and the function. The function must be present; all available commands are discussed in section 3. Depending on which function is given, parameters may be required or optional; they are discussed under each individual function in section 3.
Tabs and spaces at the start of a command line are ignored.
1.1. Command-line Flags
Three flags are recognized on the command line:
-n: Tells sed not to copy all lines, only lines specified by the p function or the p flag after the s function (see section 3.3).
-e: Tells sed to accept the next parameter as an editing command.
-f: Tells sed to accept the next parameter as a filename; this file should contain one editing command per line.
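As a minimal sketch of the three flags, assuming a POSIX shell and a sed that accepts -n, -e and -f as described above (the variable names and the temporary script file are illustrative only):

```shell
# Sample text used by the examples in this manual.
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# -n turns off the default copy; -e supplies the editing command inline.
second=$(printf '%s\n' "$poem" | sed -n -e '2p')
printf '%s\n' "$second"

# -f reads the editing commands from a file, one command per line.
script=$(mktemp)
printf '%s\n' '2q' > "$script"
first_two=$(printf '%s\n' "$poem" | sed -f "$script")
rm -f "$script"
printf '%s\n' "$first_two"
```

The first command prints only the second line of the sample; the second prints the first two lines and stops.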
1.2. Order of Application of Editing Commands
All editing commands are compiled into an efficient form for execution before any editing is done (in fact, even before any file is opened). These commands are compiled in the order they appear; generally, this is also the order in which they are attempted during execution. These commands are applied one at a time; the input to each command is the output of all previous commands.
The default linear order of command application can be changed with control flow commands t and b (see section 3). Even when the application order is changed by these commands, the input to any command is still the output of any previously applied commands.
1.3. Pattern Space
The range of pattern matching is called the pattern space. Ordinarily, the pattern space is one line of input text, but more than one line can be read into the pattern space by using the N command (see section 3.4).
1.4. Examples
Examples are scattered throughout the text. Unless otherwise specified, examples assume the following input text:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
(In no case should the output of sed commands be considered an improvement on Coleridge's work.)
Examples:
Command
2q
will exit after copying the first two lines of input. The output will be:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
--------------------------------------------------------------------------------
2. Addresses: Selecting Lines to Edit
Lines in the input file to which editing commands are applied can be selected by addresses. Addresses can be line numbers or context addresses.
By grouping commands with braces ('{ }'), a single address (or address pair) can control the application of a group of commands (see section 3.6.).
2.1. Line Number Addresses
Line numbers are decimal integers. As each line is read from input, a line number counter is incremented; a line number address matches (selects) the input line whose internal counter equals the address line number. The counter runs cumulatively across multiple input files and is not reset when a new file is opened.
As a special case, the character $ matches the last line of the input file.
2.2. Context Addresses
Context addresses are patterns enclosed in slashes ('/'). The regular expressions recognized by sed are constructed as follows:
1) A normal character (not one of the characters discussed below) is a regular expression and matches that character.
2) The '^' symbol at the start of a regular expression matches the empty string at the start of the line.
3) The dollar sign '$' at the end of a regular expression matches the empty string at the end of the line.
4) The character '\n' matches an embedded newline character, not the terminating newline of the pattern space.
5) The dot '.' matches any character except the terminating newline of the pattern space.
6) A regular expression followed by an asterisk '*' matches any number (including 0) of contiguous occurrences of the regular expression it follows.
7) A string inside square brackets '[ ]' matches any character in the string, and no others. But if the first character of the string is the '^' symbol, the regular expression matches any character except those in the string and the terminating newline of the pattern space.
8) The concatenation of regular expressions is a regular expression that matches the concatenation of the strings matched by the members of the regular expression.
9) A regular expression between sequential '\(' and '\)' has the same effect as the regular expression without it, but it has a side effect described in the s command below and in rule 10 immediately following.
10) The expression '\d' means the same string as the expression enclosed by the previous '\(' and '\)' in the same expression. Here, d is a single digit; the specified string is the string starting at the d-th occurrence of '\(' from the left. For example, the expression '^\(.*\)\1' matches lines that start with two repeated occurrences of the same string.
11) An isolated empty regular expression (i.e., '//') is equivalent to the last compiled regular expression.
To use one of the special characters (^ $ . * [ ] \ /) as a literal (to match their own occurrence in the input), precede the special character with a backslash '\'.
A context address 'matches' the input if the entire pattern in the address matches part of the pattern space.
2.3. Number of Addresses
Commands in the next section may have 0, 1, or 2 addresses. The maximum number of addresses allowed is given for each command. Commands with more than the maximum number of addresses are considered errors.
If a command has no addresses, it is applied to every line in the input.
If a command has one address, it is applied to all lines that match that address.
If a command has two addresses, it is applied to the first line that matches the first address, and all subsequent lines up to and including the first subsequent line that matches the second address. Then the first address is again attempted on subsequent lines, and this process repeats.
Two addresses are separated by a comma.
Examples:
/an/ matches lines 1, 3, 4 of our sample text
/an.*an/ matches line 1
/^an/ no matching lines
/./ matches all lines
/\./ matches line 5
/r*an/ matches lines 1, 3, 4 (number = zero!)
/\(an\).*\1/ matches line 1
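The address examples above can be checked with a short sketch, assuming a POSIX shell; it uses the = function (section 3.7) under -n so that only the numbers of the selected lines are printed. The variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Context address: numbers of the lines that contain "an".
an_lines=$(printf '%s\n' "$poem" | sed -n '/an/=')

# Back-reference: lines where "an" occurs at least twice.
twice=$(printf '%s\n' "$poem" | sed -n '/\(an\).*\1/=')

# Two addresses: from line 2 through the next line matching /ran/.
range=$(printf '%s\n' "$poem" | sed -n '2,/ran/p')

printf '%s\n' "$an_lines" "$twice" "$range"
```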
3. Functions
All functions are named by a single character. In the following summary, the maximum number of addresses allowed is given in parentheses, followed by the single character function name, possible parameters enclosed in angle brackets (< >), an English explanation of the single character name, and finally a description of what each function does. The angle brackets around parameters are not part of the parameters and should not be typed in actual editing commands.
3.1. Line-oriented Functions
(2)d -- delete lines
The d function deletes (does not write to output) all lines that match its address.
It also has the side effect that no further commands are attempted on the deleted line; after executing d, a new line is read from input and the editing command list is restarted from the beginning on the new line.
(2)n -- next line
The n function reads the next line from input, replacing the current line. The current line is written to output if appropriate. Execution continues with the part of the editing command list after the n command.
(1)a\
<text> -- append lines
The a function causes the parameter <text> to be written to output after the line that matches its address. The a command is inherently multi-line; a must appear at the end of a line, and <text> can contain any number of lines. To maintain the concept of one command per line, internal newlines must be hidden by preceding them immediately with a backslash character ('\'). The <text> parameter ends at the first unhidden newline (the first newline not immediately preceded by a backslash).
Once the a function is successfully executed, <text> will be written to output regardless of what later commands do to the triggering line. The triggering line can be completely deleted; <text> will still be written to output.
<text> is not scanned by the address match and no editing commands are attempted on it. It does not cause any change to the line number counter.
(1)i\
<text> -- insert lines
The i function behaves identically to the a function, except that <text> is written to output before the matching line. All other comments about the a function apply equally to the i function.
(2)c\
<text> -- change lines
The c function deletes the lines selected by its address and replaces them with the lines in <text>. Like a and i, c must be followed by a newline hidden by a backslash, and internal newlines in <text> must be hidden by backslashes.
The c command can have two addresses, so it can select a range of lines. If found, all lines in this range are deleted, and only one copy of <text> is written to output, not one copy for each deleted line. Like a and i, <text> is not scanned by the address match and no editing commands are attempted on it. It does not cause any change to the line number counter.
Once a line has been deleted by the c function, no further commands are attempted on the deleted line.
If the a or r function appends text after a line, and this line is later changed by the c function, the text inserted by the c function will be placed before the text of the a or r function. (The r function is described in section 3.3.)
Note: Leading blanks and tabs in the text placed by these functions disappear, just like in sed editing commands. To put leading blanks and tabs in output, precede the first desired blank or tab with a backslash; this backslash will not appear in the output.
Examples:
The list of editing commands:
n
a\
XXXX
d
applied to our standard input generates:
In Xanadu did Kubla Khan
XXXX
Where Alph, the sacred river, ran
XXXX
Down to a sunless sea.
In this particular case, either of the following two command lists produces the same output:
n
i\
XXXX
d
and:
n
c\
XXXX
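The n / a\ / d example can be reproduced with the sketch below, assuming a POSIX shell. What n does when there is no next line differs between sed implementations; the result shown in the test is what GNU sed gives in its default mode, where the last line is still printed.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# n prints the current line and loads the next one; a\ queues XXXX for
# output; d then deletes the line that n loaded.
out=$(printf '%s\n' "$poem" | sed 'n
a\
XXXX
d')
printf '%s\n' "$out"
```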
3.2. Substitution Functions
This is a very important function that changes part of a line selected by a context search.
(2)s<pattern><replacement><flags> -- substitute
The s function substitutes part of the line (selected by <pattern>) with <replacement>. It can be read as:
Substitute <pattern> with <replacement>
The <pattern> parameter contains a pattern that is exactly the same as the pattern in an address (see section 2.2). The only difference between a context address and <pattern> is that a context address must be bounded by slash characters ('/'); <pattern> can be bounded by any other character that is not a space or newline.
By default, only the first occurrence of <pattern> is substituted; see the g flag below.
The <replacement> parameter starts immediately after the second delimiter character of <pattern> and must be immediately followed by another instance of the delimiter character. (So there are exactly three instances of the delimiter character). <replacement> is not a pattern, and characters with special meaning in the pattern have no special meaning in <replacement>. Instead, the special characters are:
& is replaced with the string that matched <pattern>.
\d (where d is a single digit) is replaced with the substring that matched the d-th part enclosed in '\(' and '\)' in <pattern>. If there are nested substrings in <pattern>, the d-th is counted by opening delimiters ('\('). As in the pattern, special characters can be made literal by preceding them with a backslash ('\').
The <flags> parameter can contain any of the following flags:
g -- substitute all (non-overlapping) instances of <pattern> in this line with <replacement>, starting the scan for the next instance of <pattern> after the inserted characters; characters from <replacement> placed in the line are not re-scanned.
p -- print this line if a successful substitution is made. The p flag causes the input line to be written to output if and only if this s function actually made a substitution. Note that if there are multiple s functions each followed by a p flag, and they all successfully make substitutions on the same input line, multiple copies of this line will be written to output: one copy for each successful substitution.
w <filename> -- write this line to a file if a successful substitution is made. The w flag causes the lines actually substituted by the s function to be written to the file named by <filename>. If <filename> exists before sed runs, it is overwritten. Otherwise, it is created.
w and <filename> must be separated by a single space.
There is the possibility of writing multiple slightly different copies of an input line, just like with the p flag.
The maximum number of different file names that can be referred to after the w flag and the w function (see section 3.3) is 10.
Examples:
Applying the following command to our standard input,
s/to/by/w changes
produces, on standard output:
In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless by man
Down by a sunless sea.
In file 'changes':
Through caverns measureless by man
Down by a sunless sea.
If the no-copy option (-n) is in effect, the command:
s/[.,;?:]/*P&*/gp
produces:
A stately pleasure dome decree*P:*
Where Alph*P,* the sacred river*P,* ran
Down to a sunless sea*P.*
Finally, to show the effect of the g flag, the command:
/X/s/an/AN/p
produces (assuming no copy mode):
In XANadu did Kubla Khan
and the command:
/X/s/an/AN/gp
produces:
In XANadu did Kubla KhAN
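A small sketch of the replacement rules, assuming a POSIX shell; the patterns other than s/an/AN/ are illustrative and not taken from the manual. It contrasts the default first-match behaviour with the g flag, and shows \1, \2 and & in the replacement.

```shell
line='In Xanadu did Kubla Khan'

# Without g only the first occurrence is replaced; with g, every one.
first_only=$(printf '%s\n' "$line" | sed 's/an/AN/')
every=$(printf '%s\n' "$line" | sed 's/an/AN/g')

# \1 and \2 stand for the text matched by the first and second \( \) groups.
swapped=$(printf '%s\n' "$line" | sed 's/\(Kubla\) \(Khan\)/\2 \1/')

# & stands for the whole string matched by the pattern.
marked=$(printf '%s\n' "$line" | sed 's/K[a-z]*/<&>/g')

printf '%s\n' "$first_only" "$every" "$swapped" "$marked"
```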
3.3. Input/Output Functions
(2)p -- print
The print function writes the addressed line to standard output. It is written when the p function is encountered, regardless of what subsequent editing commands do to these lines.
(2)w <filename> -- write on <filename>
The write function writes the addressed line to the file named by <filename>. If the file already existed before, it is overwritten; otherwise, it is created. Each line is written as it exists when the write function is encountered, regardless of what subsequent editing commands do to these lines. w and <filename> must be separated by exactly one space. The maximum number of different file names that can be referred to after the w flag in the s function and in the write function is 10.
(1)r <filename> -- read the contents of a file
The read function reads the contents of <filename> and appends them after the line that matches this address. The file is read and its contents are appended, regardless of what subsequent editing commands do to the lines that match its address. If the r and a functions are executed on the same line, the text from the a function and the r function is written to output in the order the functions are executed. r and <filename> must be separated by exactly one space. If the file referred to by the r function cannot be opened, it is treated as an empty file, not an error, so no diagnostic message is given.
Note: Because there is a limit on the number of files that can be opened simultaneously, be careful not to refer to more than 10 (different) files in the w command or flag; if any r function is present, this number is reduced by one. (Only one read file can be open at a time).
Examples
Assume file 'note1' has the following content:
Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.
Then the following command:
/Kubla/r note1
produces:
In Xanadu did Kubla Khan
Note: Kubla Khan (more properly Kublai Khan; 1216-1294) was the grandson and most eminent successor of Genghiz (Chingiz) Khan, and founder of the Mongol dynasty in China.
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.
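The r example can be tried with the sketch below, assuming a POSIX shell with mktemp available. The note text is shortened and the temporary file stands in for 'note1'; both are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# Temporary file playing the role of note1.
note=$(mktemp)
printf '%s\n' 'Note: Kubla Khan was the grandson of Genghiz Khan.' > "$note"

# r appends the file contents after every line matching /Kubla/.
out=$(printf '%s\n' "$poem" | sed "/Kubla/r $note")
rm -f "$note"
printf '%s\n' "$out"
```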
3.4. Multi-line Input Functions
There are three functions spelled in uppercase that handle pattern spaces containing embedded newlines; they are mainly intended to provide pattern matching across lines in input.
(2)N -- Next line
Adds the next line to the current line in the pattern space; the two input lines are separated by an embedded newline. Pattern matching can extend across this embedded newline.
(2)D -- Delete first part of the pattern space
Deletes all characters in the current pattern space up to and including the first newline character. If the pattern space becomes empty (the only newline is the terminating newline), another line is read from input. In any case, execution is restarted from the beginning of the editing command list.
(2)P -- Print first part of the pattern space
Prints all characters in the pattern space up to and including the first newline.
The P and D functions are equivalent to their corresponding lowercase functions if there is no embedded newline in the pattern space.
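As an illustration of N, P and D working together, here is a well-known idiom (not part of this manual) that removes consecutive duplicate lines, sketched for a POSIX shell. It relies on D restarting the script on the remaining text without reading a new line while the pattern space still contains text, which is how GNU sed behaves.

```shell
# $!N        append the next line, except on the last line
# /.../!P    if the two lines differ, print the first one
# D          drop the first line and restart with the second
out=$(printf 'a\na\nb\nb\nb\nc\n' | sed '$!N
/^\(.*\)\n\1$/!P
D')
printf '%s\n' "$out"
```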
3.5. Save and Retrieve Functions
There are five functions that save and retrieve parts of the input for later use.
(2)h -- hold pattern space
The h function copies the contents of the pattern space to the hold area (destroying the previous contents of the hold area).
(2)H -- Hold pattern space
The H function appends the contents of the pattern space to the contents of the hold area; the previous and new contents are separated by a newline.
(2)g -- get contents of hold area
The g function copies the contents of the hold area to the pattern space (destroying the previous contents of the pattern space).
(2)G -- Get contents of hold area
The G function appends the contents of the hold area to the contents of the pattern space; the previous and new contents are separated by a newline.
(2)x -- exchange
The exchange command swaps the contents of the pattern space and the hold area.
Examples
Command
1h
1s/ did.*//
1x
G
s/\n/ :/
applied to our standard example produces:
In Xanadu did Kubla Khan :In Xanadu
A stately pleasure dome decree: :In Xanadu
Where Alph, the sacred river, ran :In Xanadu
Through caverns measureless to man :In Xanadu
Down to a sunless sea. :In Xanadu
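The hold-area example above can be run as the following sketch, assuming a POSIX shell; the script text is the one shown in the example.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# On line 1: save the line, cut it down to "In Xanadu", then swap so the
# short form sits in the hold area. On every line: append the hold area
# and turn the embedded newline into " :".
out=$(printf '%s\n' "$poem" | sed '1h
1s/ did.*//
1x
G
s/\n/ :/')
printf '%s\n' "$out"
```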
3.6. Control Flow Functions
These functions do not edit lines in input but control the application of functions to lines selected by the address part.
(2)! -- Don’t
The not command causes the next command written on the same line to be applied to all and only the input lines not selected by the address part.
(2){ -- Grouping
The grouping command '{' causes the next group of commands to be applied (or not applied) as a block to the input lines selected by the address of the grouping command. The first command in the grouped control can appear on the same line as '{' or the next line.
The grouped commands are terminated by a matching '}' on its own line.
Grouping can be nested.
(0):<label> -- place a label
The label function marks a position in the editing command list that can be referred to later by the b and t functions. <label> can be any sequence of eight or fewer characters; if two different colon functions have the same label, a compile-time diagnostic message is generated and no execution attempt is made.
(2)b<label> -- branch to label
The branch function causes the sequence of editing commands applied to the current input line to be restarted immediately after the colon function with the same <label>. If a colon function with the same label is not found after all editing commands have been compiled, a compile-time diagnostic message is generated and no execution attempt is made.
A b function without <label> is treated as a branch to the end of the editing command list; whatever processing is to be done on the current input line is done, and other input lines are read; the editing command list is restarted from the beginning on this new line.
(2)t<label> -- test substitutions
The t function tests whether any successful substitutions have been made on the current input line; if so, it branches to <label>; otherwise, it does nothing. The flag indicating that a successful substitution has been executed is reset by:
1) Reading a new input line, or
2) Executing a t function.
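A minimal sketch of a loop built from a label and the t function, assuming a POSIX shell; the label name and the input are illustrative. Each pass turns one more leading space into an underscore, and t branches back only while the substitution keeps succeeding.

```shell
out=$(printf '%s\n' '   x y' | sed ':again
s/^\(_*\) /\1_/
t again')
printf '%s\n' "$out"
```

Only the three leading spaces are converted; the space between x and y is left alone because the pattern is anchored at the start of the line.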
3.7. Miscellaneous Functions
(1)= -- equals
The = function writes the line number of the line that matches its address to standard output.
(1)q -- quit
The q function causes the current line to be written to standard output (if appropriate), any added or read text to be written out, and execution to be terminated.
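Two short uses of = and q, sketched for a POSIX shell with the sample text; the variable names are illustrative.

```shell
poem='In Xanadu did Kubla Khan
A stately pleasure dome decree:
Where Alph, the sacred river, ran
Through caverns measureless to man
Down to a sunless sea.'

# $= prints the number of the last line, that is, the line count.
count=$(printf '%s\n' "$poem" | sed -n '$=')

# q stops after the first line that matches its address.
upto=$(printf '%s\n' "$poem" | sed '/sacred/q')

printf '%s\n' "$count" "$upto"
```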
--------------------------------------------------------------------------------
References
Ken Thompson and Dennis M. Ritchie, The UNIX Programmer’s Manual. Bell Laboratories, 1978.
[ Last edited by 无奈何 on 2006-10-26 at 01:33 PM ]
2006-10-26 13:19 |
无奈何
[Floor 6]:
Advanced sed Usage
Repost note: this is a chapter from "sed & awk", translated by a forum member; the original address is unknown.
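First, be clear about what the pattern space is: it is the buffer that holds the line just read, and every edit sed makes to a line of text happens in that buffer. Keeping this in mind helps with everything that follows.
Normally sed reads the line to be processed into the pattern space, applies the commands in the script to it one after another until the script is finished, then outputs the line and clears the pattern space. It then repeats the cycle with the next line of the file until the whole file has been processed.
Sometimes, however, this normal flow is not what is wanted: the user may want a command to run only under some condition, or may want the pattern space kept for the next round of processing. sed provides a set of advanced commands for these cases.
Broadly, these commands fall into three groups:
1. N, D, P: work with a multi-line pattern space;
2. H, h, G, g, x: save the contents of the pattern space in the hold space for later editing;
3. :, b, t: implement branches and conditionals in a script.
Working with a multi-line pattern space:
Because regular expressions are line-oriented, a phrase that starts at the end of one line and finishes at the beginning of the next is hard to handle with tools such as grep. sed's multi-line commands N, D, and P make this easy.
The multi-line Next (N) command is the counterpart of next (n). n outputs the contents of the pattern space and then reads the next line into it, but the script does not return to the top; execution continues with the command after n. N instead keeps the existing contents of the pattern space and appends the new line, separating the two with a newline ("\n"). After N, control likewise continues with the commands that follow it.
Note that in a multi-line pattern space the special characters "^" and "$" match the very beginning and the very end of the pattern space, not the positions just after and just before an embedded "\n".
Example 1: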
$ cat expl.1
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.
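Now replace "Owner and Operator Guide" with "Installation Guide":
$ sed '/Operator$/{
> N
> s/Owner and Operator\nGuide /Installation Guide\
> /
> }' expl.1
Two things to note in this example: the two lines are joined by an embedded newline, which the pattern matches as "\n"; and to put a newline into the replacement text, escape it with a backslash as shown.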
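Example 1 can be run as the sketch below, assuming a POSIX shell; the input is fed through printf instead of the file expl.1. The pattern here includes the space after "Guide" so that the second output line does not begin with a blank.

```shell
input='Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.'

# N joins the matching line with the next one; the substitution then
# matches across the embedded newline and puts a newline back after
# "Installation Guide".
out=$(printf '%s\n' "$input" | sed '/Operator$/{
N
s/Owner and Operator\nGuide /Installation Guide\
/
}')
printf '%s\n' "$out"
```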
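Here is another example:
Example 2:
$ cat expl.2
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.

Look in the Owner and Operator Guide shipped with your system.

Two manuals are provided including the Owner and
Operator Guide and the User Guide.

The Owner and Operator Guide is shipped with your system.
$ sed 's/Owner and Operator Guide/Installation Guide/
> /Owner/{
> N
> s/ *\n/ /
> s/Owner and Operator Guide */Installation Guide\
> /
> }' expl.2
The result is:
Consult Section 3.1 in the Installation Guide
for a description of the tape drives
available on your system.

Look in the Installation Guide shipped with your system.

Two manuals are provided including the Installation Guide
and the User Guide.

The Installation Guide is shipped with your system.
The two substitutions may look redundant, but they are not. Remove the first one and run the script again, and two problems appear. First, the last line is not replaced (and in some versions of sed is not output at all): it matches "Owner", so N is executed, but there is no next line to read; some versions print the pattern space and exit, others exit without printing. Writing the command as "$!N", so that N is not applied to the last line, avoids this. Second, the paragraph beginning "Look in the" is affected: its line matches "Owner", N appends the blank line that follows, and the substitutions then split the line in two and remove the embedded newline, so the blank line separating it from the next paragraph disappears. So the two substitutions are not redundant at all.
Example 3: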
$ cat expl.3
<para>

This is a test paragraph in Interleaf style ASCII. Another line
in a paragraph. Yet another.

<Figure Begin>

v.1111111111111111111111100000000000000000001111111111111000000
100001000100100010001000001000000000000000000000000000000000000
000000

<Figure End>

<para>

More lines of text to be found after the figure.
These lines should print.
Our sed command is:
$ sed '/<para>/{
> N
> c\
> .LP
> }
> /<Figure Begin>/,/<Figure End>/{
> w fig.interleaf
> /<Figure End>/i\
> .FG\
> <insert figure here>\
> .FE
> d
> }
> /^$/d' expl.3
Running it gives:
.LP
This is a test paragraph in Interleaf style ASCII. Another line
in a paragraph. Yet another.
.FG
<insert figure here>
.FE
.LP
More lines of text to be found after the figure.
These lines should print.
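The text between <Figure Begin> and <Figure End> is written to the file "fig.interleaf". Note that the "d" command does not affect the text inserted by the i command.
The d command deletes the contents of the pattern space, reads a new line, and restarts the script from the top. D differs in that it deletes only the part of the pattern space up to and including the first embedded newline, does not read a new line, and restarts the script on what remains.
Example 4:
$ cat expl.4
This line is followed by 1 blank line.

This line is followed by 2 blank lines.


This line is followed by 3 blank lines.



This line is followed by 4 blank lines.




This is the end.
The two delete commands give different results:
$ sed '/^$/{
> N
> /^\n$/d
> }' expl.4
$ sed '/^$/{
> N
> /^\n$/D
> }' expl.4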
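The difference between the two commands can be seen with a shortened input, sketched below for a POSIX shell. The words one to four stand for lines followed by 1 to 4 blank lines; the results in the test are what GNU sed produces.

```shell
input=$(printf 'one\n\ntwo\n\n\nthree\n\n\n\nfour\n\n\n\n\nend')

# d removes both blank lines of a pair, so an even run of blank lines
# vanishes and an odd run leaves one.
with_d=$(printf '%s\n' "$input" | sed '/^$/{
N
/^\n$/d
}')

# D removes only the first blank line of the pair and restarts the
# script on the second, so every run is reduced to exactly one.
with_D=$(printf '%s\n' "$input" | sed '/^$/{
N
/^\n$/D
}')

printf '%s\n' "$with_d" '-----' "$with_D"
```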
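By default sed outputs every line of the file, whether or not it was edited. With the "-n" option this output is suppressed, and anything you still want to see must be printed explicitly. The print command for a single-line pattern space is "p"; for a multi-line pattern space it is "P", which prints the pattern space up to the first embedded newline.
P usually appears after N and before D, and the three together form an input/output loop. In this loop the pattern space always holds two lines of text and each pass outputs one line: the first line in the pattern space is printed, the script returns to the top, and the second line is processed in turn. Without the loop, the whole pattern space would be output when the script finished, which may not be what the user wants and may make the script less efficient.
Here is an example:
Example 5: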
$ cat expl.5
Here are examples of the UNIX
System. Where UNIX
System appears, it should be the UNIX
Operating System.
$ sed '/UNIX$/{
> N
> /\nSystem/{
> s// Operating &/
> P
> D
> }
> }' expl.5
The result of the substitution is:
Here are examples of the UNIX Operating
System. Where UNIX Operating
System appears, it should be the UNIX
Operating System.
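Try changing "P" and "D" in this sed command to lowercase and compare how the two kinds of command behave.
The next example is considerably harder:
Example 6: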
$ cat expl.6
I want to see @fl(what will happen) if we put the
font change commands @fl(on a set of lines). If I understand
things (correctly), the @fl(third) line causes problems. (No?).
Is this really the case, or is it (maybe) just something else?
Let's test having two on a line @fl(here) and @fl(there) as
well as one that begins on one line and ends @fl(somewhere
on another line). What if @fl(it is here) on the line?
Another @fl(one).
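The task is to replace "@fl(…)" with "\fB…\fR". The following sed command does this:
$ sed 's/@fl(\([^)]*\))/\\fB\1\\fR/g
> /@fl(.*/{
> N
> s/@fl(\(.*\n[^)]*\))/\\fB\1\\fR/g
> P
> D
> }' expl.6
However, if you use N alone instead of this input/output loop, problems arise:
$ sed 's/@fl(\([^)]*\))/\\fB\1\\fR/g
> /@fl(.*/{
> N
> s/@fl(\(.*\n[^)]*\))/\\fB\1\\fR/g
> }' expl.6
A sed script written this way is flawed.
Holding lines:
The pattern space was defined earlier; sed has a second buffer called the hold space. The contents of the pattern space and the hold space can be copied back and forth with the following commands:
Command    Abbreviation   Function
Hold       h or H         Copy or append the pattern space to the hold space
Get        g or G         Copy or append the hold space to the pattern space
Exchange   x              Swap the contents of the pattern space and the hold space
The uppercase and lowercase forms differ in that the uppercase commands append the source to the destination, while the lowercase commands overwrite the destination with the source. Note that the appending forms of both Hold and Get (H and G) first add a newline after the existing contents of the destination and then place the source contents after that newline.
The following example gives a first taste of how this is used:
Example 7: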
$ cat expl.7
1
2
11
22
111
222
The task is to swap lines 1 and 2, lines 3 and 4, and lines 5 and 6. The sed command is:
$ sed '
> /1/{
> h
> d
> }
> /2/{
> G
> }' expl.7
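The process works like this: sed first reads line 1 into the pattern space; the h command saves it in the hold space, and d then empties the pattern space. sed next reads line 2 into the pattern space, and G appends the contents of the hold space to it (a newline is first added after the existing contents of the pattern space).
The final result is: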
2
1
22
11
222
111
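When H or h is used, it is commonly followed by d, so that the script never reaches its end and the contents of the pattern space are not output. In this example, replacing d with n, or G with g, would not give the desired result.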
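Example 7 can be run as the sketch below, assuming a POSIX shell; the input is supplied by printf instead of the file expl.7.

```shell
# Lines containing 1 are saved in the hold space and deleted; lines
# containing 2 get the saved line appended after them by G.
out=$(printf '1\n2\n11\n22\n111\n222\n' | sed '/1/{
h
d
}
/2/{
G
}')
printf '%s\n' "$out"
```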
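For converting letter case, the most convenient tool is probably tr:
$ tr "[a-z]" "[A-Z]" < File
Impressively, sed can do this conversion too; the command to use is y:
$ sed '
> y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' File
However, y transforms every matching character on the whole line, so by itself it cannot change the case of just a few characters within a line. For that you need the Hold and Get commands just described.
Example 8:
$ cat expl.8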
find the Match statement
Consult the Get statement
using the Read statement to retrieve data
$ sed '/the .* statement/{
> h
> s/.*the \(.*\) statement.*/\1/
> y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
> G
> s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
> }' expl.8
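The processing of the first line shows what these commands do:
(1) "find the Match statement" is copied to the hold space;
(2) the substitution reduces the line to: Match;
(3) the result of (2) is converted to uppercase: MATCH;
(4) the contents saved in (1) are fetched from the hold space and appended to the pattern space, which now contains:
MATCH\nfind the Match statement
(5) a final substitution on the pattern space gives: find the MATCH statement.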
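The five steps above can be reproduced with the following sketch, assuming a POSIX shell; the input is supplied inline instead of the file expl.8.

```shell
input='find the Match statement
Consult the Get statement
using the Read statement to retrieve data'

# h keeps a copy of the line, the first s isolates the word, y uppercases
# it, G brings the original line back, and the last s splices the
# uppercase word into place.
out=$(printf '%s\n' "$input" | sed '/the .* statement/{
h
s/.*the \(.*\) statement.*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
}')
printf '%s\n' "$out"
```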
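The next example requires a fairly solid grasp of regular expressions, but take it slowly and it can all be worked out. The text it operates on comes from editing and typesetting, which I am not very familiar with, so I present only the sed script and concentrate on its core, leaving out the peripheral details:
Example 9:
$ cat expl.9.sed
h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
G
s/\n//
(1) h: copies the text line to the hold space.
(2) s/[][\\*.]/\\&/g: this is the hardest expression. Inside a bracket expression "[ ]", a "]" that appears as the first character loses its special meaning. Also, inside "[ ]" only "\" keeps a special meaning, so "*" and "." are read literally there, while the backslash itself has to be written as "\\". Although it does not occur in this expression, note too that inside "[ ]" a "^" means "not" only when it is the first character and is literal everywhere else, and that "$" is special only at the end of a regular expression. In the replacement, "\\" removes the special meaning of "\", and "&" stands for the matched text. So this command replaces each "[", "]", "\", "*" and "." in the pattern space with "\[", "\]", "\\", "\*" and "\." respectively.
(3) x: swaps the pattern space and the hold space. After this command the pattern space holds the original text, and the hold space holds the modified copy in which every special character is preceded by a backslash.
(4) s/[\\&]/\\&/g: operates on the pattern space; every "\" or "&" is replaced by "\\" or "\&".
(5) s/^\.XX //: removes the leading ".XX " from the pattern space.
(6) s/$/\//: easy to follow; it appends a "/" to the end of the pattern space.
(7) x: swaps the two spaces again.
(8) s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//: nothing difficult here, although the escapes are easy to misread; it rewrites "\.XX text" as "/^\.XX /s/text/".
(9) G: appends the hold space to the pattern space.
(10) s/\n//: deletes the newline.
What is this script for? Trying it on the following text makes it clear:
.XX "asterisk (*) metacharacter"
Below are the results after each command; the first line of each pair shows the pattern space and the second the hold space: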
1. .XX "asterisk (*) metacharacter"
.XX "asterisk (*) metacharacter"
2. \.XX "asterisk (\*) metacharacter"
.XX "asterisk (*) metacharacter"
3. .XX "asterisk (*) metacharacter"
\.XX "asterisk (\*) metacharacter"
4. .XX "asterisk (*) metacharacter"
\.XX "asterisk (\*) metacharacter"
5. "asterisk (*) metacharacter"
\.XX "asterisk (\*) metacharacter"
6. "asterisk (*) metacharacter"/
\.XX "asterisk (\*) metacharacter"
7. \.XX "asterisk (\*) metacharacter"
"asterisk (*) metacharacter"/
8. /^\.XX /s/"asterisk (\*) metacharacter"/
"asterisk (*) metacharacter"/
9. /^\.XX /s/"asterisk (\*) metacharacter"/\n"asterisk (*) metacharacter"/
10./^\.XX /s/"asterisk (\*) metacharacter"/"asterisk (*) metacharacter"/
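As you can see, "s/[\\&]/\\&/g" had no effect in this example, but it cannot be left out: in the replacement part of an s command both "\" and "&" are special, so they must be escaped in advance.
Clear? When you want a shell script to generate a sed script that consists mainly of substitution commands, you will find this handling of special characters essential.
Besides the uses above, the hold space can also accumulate many lines for later output. This is particularly useful for text with a clear structure, such as HTML. Here is an example:
Example 10:
$ cat expl.10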
My wife won't let me buy a power saw. She is afraid of an
accident if I use one.
So I rely on a hand saw for a variety of weekend projects like
building shelves.
However, if I made my living as a carpenter, I would
have to use a power
saw. The speed and efficiency provided by power tools
would be essential to being productive.

For people who create and modify text files,
sed and awk are power tools for editing.

Most of the things that you can do with these programs
can be done interactively with a text editor. However,
using these programs can save many hours of repetitive
work in achieving the same result.
$ sed '/^$/!{
> H
> d
> }
> /^$/{
> x
> s/^\n/<p>/
> s/$/<\/p>/
> G
> }' expl.10
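Run this command and see what it produces. The result itself matters less than the idea of flow control that the script illustrates. The first part uses "!" to process the lines that are not blank; because of the "d", execution never reaches the bottom of the script for those lines, so nothing is output. In the second part execution does reach the bottom, and along the way the pattern space and the hold space are emptied, ready for the next paragraph.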
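A shortened run of this script, sketched for a POSIX shell with made-up input text: two paragraphs, each followed by a blank line, so that the closing paragraph is flushed as well.

```shell
# Non-blank lines are collected in the hold space and deleted; each blank
# line swaps the collected paragraph in, wraps it in <p>...</p>, and G
# appends the (now empty) hold space to give a blank line after it.
out=$(printf 'first line\nsecond line\n\nthird line\n\n' | sed '/^$/!{
H
d
}
/^$/{
x
s/^\n/<p>/
s/$/<\/p>/
G
}')
printf '%s\n' "$out"
```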
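That would be the end of the example, except for one case: what happens if the last line of the file is not blank? Clearly the last paragraph of the text is never output. The sensible fix is to "manufacture" a blank line ourselves. The new script is: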
$ sed '${
> /^$/!{
> H
> s/.*//
> }
> }
> /^$/!{
> H
> d
> }
> /^$/{
> x
> s/^\n/<p>/
> s/$/<\/p>/
> G
> }' expl.10
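Flow control commands
To give script writers real freedom, sed lets you set a label in a script with ":" and then control the flow with the "b" and "t" commands. As the names suggest, "b" stands for "branch" and "t" for "test": the former is the branch command, the latter the test command.
First, the format of a label. A label is placed where you want control to resume, on a line of its own, beginning with a colon. No spaces or tabs are allowed between the colon and the label name, and a space at the end of the label is treated as part of the label.
Now the b command. Its format is:
[address]b [label]
If the address matches, control transfers to the label: when a label is given, execution continues with the command following the line that carries that label; when no label is given, control jumps straight to the end of the script. If the address does not match, the commands that follow are executed as usual.
In some situations b resembles the ! command, but ! applies only to the command or {} group immediately following it, whereas b gives the user full freedom to choose which commands in the script are executed and which are not. Here are some classic uses of b:
(1) Creating a loop: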
:top
command1
command2
/pattern/b top
command3
(2) Skipping commands when a condition is met:
command1
/pattern/b end
command2
:end
command3
(3) Executing only one of two sections of the script:
command1
/pattern/b dothere
command2
b
:dothere
command3
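The t command has the same format as b:
[address]t [label]
If the address matches and a successful substitution has been made on the current line, control transfers to the label given to t. The rules for the label are the same as for b. Here is an example: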
s/pattern/replacement/
t break
command
:break
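Take the sed script from Example 6 again. On reflection it is not robust enough: what if an @fl construct spans more than two lines, say three? That calls for this improved version:
$ cat expl.6.sed
:begin
/@fl(\([^)]*\))/{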
s//\\fB\1\\fR/g
b begin
}
/@fl(.*/{
N
s/@fl(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D
[ Last edited by 无奈何 on 2006-10-26 at 01:42 PM ]
Repost Note: This is a chapter from *sed & awk*, translated by a netizen, original address unknown.
Advanced sed Usage <Collection>
First, you need to understand the pattern space. The pattern space is the buffer into which sed reads each input line, and every edit sed makes to a line is carried out in this buffer. Keeping this in mind helps with everything that follows.
Normally, sed reads the line to be processed into the pattern space and applies the commands in the script to it one after another until the end of the script is reached; it then prints the line and clears the pattern space. This cycle repeats with the next input line until the whole file has been processed.
However, a user may want a command in the script to run only under a certain condition, or want the pattern space to be kept for the next cycle, and in such cases sed must depart from this normal flow. sed provides a set of advanced commands for these needs.
In general, these commands can be divided into the following three categories:
1. N, D, P: work with a multi-line pattern space;
2. H, h, G, g, x: copy the content of the pattern space into the storage space (the buffer usually called the hold space) for later editing;
3. :, b, t: implement branching and conditional structures in the script.
Processing of multi-line pattern space:
Regular expressions are line-oriented, so a phrase that starts at the end of one line and finishes at the beginning of the next is hard to handle with commands such as grep. With sed's multi-line commands N, D and P, the task becomes manageable.
The multi-line Next (N) command is the counterpart of the next (n) command. The n command outputs the content of the pattern space and then reads the next line into the pattern space; the script does not return to the top but continues with the command after n. The N command keeps the current content of the pattern space and appends the new line to it, with the two separated by a newline character "\n". After N, control continues with the commands that follow it.
Note that in a multi-line pattern space, the special characters "^" and "$" match only the very beginning and the very end of the pattern space, not the positions just after or before an embedded "\n".
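The difference between n and N, and the behaviour of the anchors, can be checked with a short experiment. This is a minimal sketch assuming GNU sed and a POSIX shell; the four-line input and the variable names are made up for illustration.

```shell
# n: load the next line and continue with the following command
# (-n suppresses the automatic printing, so only every second line is printed).
out_n=$(printf 'a\nb\nc\nd\n' | sed -n 'n;p')

# N: append the next line, so the pattern space holds "a\nb", then "c\nd".
out_N=$(printf 'a\nb\nc\nd\n' | sed 'N;s/\n/+/')

# ^ and $ anchor to the whole two-line pattern space, not to each line inside it.
out_anchor=$(printf 'a\nb\n' | sed 'N;s/^/[/;s/$/]/')

printf '%s\n' "$out_n" "$out_N" "$out_anchor"
```

Only one "[" and one "]" appear in the last result, even though the pattern space contains two lines.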
Example 1:
$ cat expl.1
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.
Now we want to replace "Owner and Operator Guide" with "Installation Guide":
$ sed '/Operator$/{
> N
> s/Owner and Operator\nGuide/Installation Guide\
> /
> }' expl.1
In the example above, note the embedded newline "\n" that the pattern has to match between the two lines. Also, to put a newline into the replacement text, escape it with "\" as shown.
Let's look at another example:
Example 2:
$ cat expl.2
Consult Section 3.1 in the Owner and Operator
Guide for a description of the tape drives
available on your system.
Look in the Owner and Operator Guide shipped with your system.
Two manuals are provided including the Owner and
Operator Guide and the User Guide.
The Owner and Operator Guide is shipped with your system.
$ sed 's/Owner and Operator Guide/Installation Guide/
> /Owner/{
> N
> s/ *\n/ /
> s/Owner and Operator Guide */Installation Guide\
> /
> }' expl.2
The result is:
Consult Section 3.1 in the Installation Guide
for a description of the tape drives
available on your system.
Look in the Installation Guide shipped with your system.
Two manuals are provided including the Installation Guide
and the User Guide.
The Installation Guide is shipped with your system.
The two substitutions for the same phrase may look redundant, but they are not. If the first substitution is removed and the script is run again, the output shows two problems. First, the last line is not replaced (some versions of sed do not even print it). The last line matches "Owner", so N is executed, but there is no next line to read: some versions print the pattern space and exit, while others exit without printing. This can be avoided with "$!N", which skips N on the last line. Second, the "Look in" sentence is split into two lines, and the blank line that separated it from the next paragraph disappears, because N pulled that blank line in and the embedded newline was then replaced.
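The "$!N" idiom mentioned above can be tried on a file with an odd number of lines. A minimal sketch, assuming GNU sed; the three-line input is invented for the demonstration.

```shell
# Join lines in pairs. '$!N' skips N on the last line, so a final unpaired
# line is still processed by the rest of the script and printed.
out=$(printf 'one\ntwo\nthree\n' | sed '$!N;s/\n/ /')
printf '%s\n' "$out"
```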
Example 3:
$ cat expl.3
<para>
This is a test paragraph in Interleaf style ASCII. Another line
in a paragraph. Yet another.
<Figure Begin>
v.1111111111111111111111100000000000000000001111111111111000000
100001000100100010001000001000000000000000000000000000000000000
000000
<Figure End>
<para>
More lines of text to be found after the figure.
These lines should print.
Our sed command is as follows:
$ sed '/<para>/{
> N
> c\
> .LP
> }
> /<Figure Begin>/,/<Figure End>/{
> w fig.interleaf
> /<Figure End>/i\
> .FG\
> <insert figure here>\
> .FE
> d
> }
> /^$/d' expl.3
After running, the result is:
.LP
This is a test paragraph in Interleaf style ASCII. Another line
in a paragraph. Yet another.
.FG
<insert figure here>
.FE
.LP
More lines of text to be found after the figure.
These lines should print.
The content between <Figure Begin> and <Figure End> is written to the file "fig.interleaf". Note that the "d" command does not affect the text inserted by the "i" command.
The "d" command deletes the content of the pattern space, reads a new line, and restarts the script from the beginning. The "D" command, by contrast, deletes only the part of the pattern space up to the first embedded newline; it does not read a new line, and the script restarts from the beginning on the remaining content.
Example 4:
$ cat expl.4
This line is followed by 1 blank line.

This line is followed by 2 blank lines.


This line is followed by 3 blank lines.



This line is followed by 4 blank lines.




This is the end.
Different delete commands get different results:
$ sed '/^$/{
> N
> /^\n$/d
> }' expl.4

$ sed '/^$/{
> N
> /^\n$/D
> }' expl.4
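The two variants can be compared on a small input. This is a sketch assuming GNU sed; the six-line input (one blank line after "a", two after "b") is made up for illustration.

```shell
input=$(printf 'a\n\nb\n\n\nc')

# d: when two blank lines are read together, both are deleted.
out_d=$(printf '%s\n' "$input" | sed '/^$/{
N
/^\n$/d
}')

# D: only the first of the two blank lines is deleted; the script restarts
# on the remaining one, so a run of blank lines collapses to a single one.
out_D=$(printf '%s\n' "$input" | sed '/^$/{
N
/^\n$/D
}')

printf '%s\n--\n%s\n' "$out_d" "$out_D"
```

With d, the blank line between "b" and "c" disappears completely; with D, exactly one blank line is kept.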
By default, sed outputs every line of the file, whether or not it was edited. With the "-n" option this automatic output is suppressed, and anything you still want to see has to be printed explicitly. The print command for a single-line pattern space is "p", and the print command for a multi-line pattern space is "P". The P command prints the part of the pattern space up to the first embedded newline.
The P command usually appears after N and before D, forming an input-output loop. In this loop the pattern space always holds two lines of text, and each pass outputs one line: the first line of the pattern space is printed, the script returns to the top, and the second line is processed next. Without this loop, the whole pattern space would be output when the end of the script is reached, which may not be what the user wants and may make the script less efficient.
Here is an example:
Example 5:
$ cat expl.5
Here are examples of the UNIX
System. Where UNIX
System appears, it should be the UNIX
Operating System.
$ sed '/UNIX$/{
> N
> /\nSystem/{
> s// Operating &/
> P
> D
> }
> }' expl.5
The result of the replacement is:
Here are examples of the UNIX Operating
System. Where UNIX Operating
System appears, it should be the UNIX
Operating System.
You can replace "P" and "D" in the sed command with lowercase letters to compare the differences between the two types of commands.
The following example is quite difficult:
Example 6:
$ cat expl.6
I want to see @fl(what will happen) if we put the
font change commands @fl(on a set of lines). If I understand
things (correctly), the @fl(third) line causes problems. (No?).
Is this really the case, or is it (maybe) just something else?
Let's test having two on a line @fl(here) and @fl(there) as
well as one that begins on one line and ends @fl(somewhere
on another line). What if @fl(it is here) on the line?
Another @fl(one).
The task is to replace "@fl(…)" with "\fB…\fR". The following sed command does this:
$ sed 's/@fl(\([^)]*\))/\\fB\1\\fR/g
> /@fl(.*/{
> N
> s/@fl(\(.*\n[^)]*\))/\\fB\1\\fR/g
> P
> D
> }' expl.6
However, if the input-output loop is dropped and only N is used, problems appear:
$ sed 's/@fl(\([^)]*\))/\\fB\1\\fR/g
> /@fl(.*/{
> N
> s/@fl(\(.*\n[^)]*\))/\\fB\1\\fR/g
> }' expl.6
This version of the script has holes: a line appended by N is printed without ever passing through the first substitute command.
Storing lines:
The pattern space was defined earlier. sed has a second buffer, called the storage space here (usually known as the hold space). Content can be copied between the pattern space and the storage space with the following commands:
Command    Abbreviation   Function
Hold       h or H         Copy or append the content of the pattern space to the storage space
Get        g or G         Copy or append the content of the storage space to the pattern space
Exchange   x              Exchange the content of the pattern space and the storage space
The lowercase commands overwrite the target space with the content of the source space, while the uppercase commands append to it. When appending, H and G first add a newline after the existing content of the target space and then add the content of the source space after that newline.
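The newline added by the appending commands is easy to observe. A minimal sketch, assuming GNU sed; the inputs and variable names are illustrative only.

```shell
# H adds a newline before each appended line, even when the storage space is
# still empty, so the collected text starts with a newline (shown as a comma).
joined=$(printf 'a\nb\nc\n' | sed -n 'H;${x;s/\n/,/g;p;}')

# G on an empty storage space appends only the newline,
# so every line is followed by an empty line.
spaced=$(printf 'a\nb\n' | sed G)

printf '%s\n%s\n' "$joined" "$spaced"
```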
From the following example, you can experience the initial application of this part of the content:
Example 7:
$ cat expl.7
1
2
11
22
111
222
The work we need to do is to swap the first line with the second line, the third line with the fourth line, and the fifth line with the sixth line. The sed command format is:
$ sed '
> /1/{
> h
> d
> }
> /2/{
> G
> }' expl.7
The process is as follows: First, sed reads the first line into the pattern space, then the h command puts it into the storage space to save it, and a d command clears the content in the pattern space; then sed reads the second line into the pattern space, and then the G command appends the content in the storage space to the pattern space (note that a newline character is added at the end of the original content in the pattern space).
The final result is as follows:
2
1
22
11
222
111
When using the H or h command, it is more common to add the d command after this command. In this way, the sed script will not reach the end, and thus the content in the pattern space will not be output. In addition, if d is replaced with n, or G is replaced with g, the purpose will not be achieved.
What is the most convenient way to convert the case of letters? Probably tr:
$ tr "[a-z]" "[A-Z]" < File
sed can also do this conversion, with the y command:
$ sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' File
However, the y command transforms the entire line, so it cannot be used directly to convert only a few characters in the line. For that, you need the Hold and Get commands described above.
Example 8:
$ cat expl.8
find the Match statement
Consult the Get statement
using the Read statement to retrieve data
$ sed '/the .* statement/{
> h
> s/.*the \(.*\) statement.*/\1/
> y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
> G
> s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
> }' expl.8
Take the processing process of the first line to illustrate the meaning of this command:
(1) "find the Match statement" is put into the storage space;
(2) Replace this line to get: Match;
(3) Convert the result of (2) to uppercase: MATCH;
(4) Take out the content reserved in (1) from the storage space and append it to the pattern space. At this time, the content in the pattern space is:
MATCH\nfind the Match statement
(5) Replace the content in the pattern space again to get: find the MATCH statement.
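The five steps can be reproduced on the first line of expl.8. This is a sketch assuming GNU sed; the line is fed through a pipe instead of the file so that the snippet is self-contained.

```shell
out=$(printf '%s\n' 'find the Match statement' | sed '/the .* statement/{
h
s/.*the \(.*\) statement.*/\1/
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
G
s/\(.*\)\n\(.*the \).*\( statement.*\)/\2\1\3/
}')
printf '%s\n' "$out"
```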
The next example uses some fairly dense regular expressions, but take your time and every part of it can be worked out. The text in this example comes from editing and typesetting, which I am not very familiar with, so I only present the sed script, concentrate on its core, and leave out the minor details:
Example 9:
$ cat expl.9.sed
h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
G
s/\n//
(1) h: Copy the text line into the storage space.
(2) s/[][\\*.]/\\&/g: This expression is the hardest part. Inside a bracket expression "[]", a "]" placed first loses its special meaning and is matched literally, and "*" and "." are also taken literally; "^" means "not" only when it comes first in the brackets, and "$" is special only at the end of a regular expression. In the replacement, "\\" stands for a literal "\" and "&" stands for the matched text. So this command puts a "\" in front of every "]", "[", "\", "*" and "." in the pattern space, turning them into "\]", "\[", "\\", "\*" and "\.".
(3) x: Exchange the pattern space and the storage space. Afterwards the pattern space holds the original line, and the storage space holds the copy in which each special character has been escaped with "\".
(4) s/[\\&]/\\&/g: In the pattern space, replace every "\" or "&" with "\\" or "\&". The next command, s/^\.XX //, then removes the leading ".XX " from the pattern space.
(5) s/$/\//: Add a "/" at the end of the pattern space.
(6) x: Exchange the contents of the two spaces again.
(7) s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//: Nothing new here, only a lot of escapes that are easy to confuse. It rewrites the line as "/^\.XX /s/" followed by the escaped text and a closing "/".
(8) G: Append the storage space to the pattern space after a newline.
(9) s/\n//: Delete the newline character.
What is the use of this script? It will be clear when experimenting with the following text:
.XX "asterisk (*) metacharacter"
The following shows the result after each command. For each step, the first line is the content of the pattern space and the second line is the content of the storage space (the last two steps show only the pattern space):
1. .XX "asterisk (*) metacharacter"
   .XX "asterisk (*) metacharacter"
2. \.XX "asterisk (\*) metacharacter"
   .XX "asterisk (*) metacharacter"
3. .XX "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"
4. .XX "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"
5. "asterisk (*) metacharacter"
   \.XX "asterisk (\*) metacharacter"
6. "asterisk (*) metacharacter"/
   \.XX "asterisk (\*) metacharacter"
7. \.XX "asterisk (\*) metacharacter"
   "asterisk (*) metacharacter"/
8. /^\.XX /s/"asterisk (\*) metacharacter"/
   "asterisk (*) metacharacter"/
9. /^\.XX /s/"asterisk (\*) metacharacter"/\n"asterisk (*) metacharacter"/
10. /^\.XX /s/"asterisk (\*) metacharacter"/"asterisk (*) metacharacter"/
As you can see, "s/[\\&]/\\&/g" had no effect in this example, but it is still indispensable: in the replacement part of an s command, "\" and "&" have special meanings, so they must be escaped in advance.
When you want a shell script to generate a sed script that consists mainly of substitute commands, you will find that this handling of special characters is crucial.
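The ten-step trace can be checked by running the script on the sample line. This is a sketch assuming GNU sed and the mktemp utility; the bracket expressions are the reconstructed forms "[][\\*.]" and "[\\&]", and the temporary script file is only a convenience for the demonstration.

```shell
script=$(mktemp)
# Quoted heredoc: the shell leaves every backslash in the sed script untouched.
cat > "$script" <<'EOF'
h
s/[][\\*.]/\\&/g
x
s/[\\&]/\\&/g
s/^\.XX //
s/$/\//
x
s/^\\\.XX \(.*\)$/\/^\\.XX \/s\/\1\//
G
s/\n//
EOF
out=$(printf '%s\n' '.XX "asterisk (*) metacharacter"' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$out"
```

The result is itself a sed substitute command in which the search pattern is escaped and the replacement text is left ready for editing.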
In addition to the above applications, the storage space can even store the content of many lines for subsequent output. In fact, this function is very effective for texts with very obvious structures such as html. The following is a related example:
Example 10:
$ cat expl.10
<p>My wife won't let me buy a power saw. She is afraid of an
accident if I use one.
So I rely on a hand saw for a variety of weekend projects like
building shelves.
However, if I made my living as a carpenter, I would
have to use a power
saw. The speed and efficiency provided by power tools
would be essential to being productive.</p>
<p>For people who create and modify text files,
sed and awk are power tools for editing.</p>
<p>Most of the things that you can do with these programs
can be done interactively with a text editor. However,
using these programs can save many hours of repetitive
work in achieving the same result.</p>
$ sed '/^$/!{
> H
> d
> }
> /^$/{
> x
> s/^\n/<p>/
> s/$/<\/p>/
> G
> }' expl.10
Run this command and look at the result. The result itself matters less than the flow-control idea that the script illustrates. The first part of the script uses "!" to process the lines that do not match, but because of the "d" this processing never reaches the bottom of the script, so nothing is output for those lines. In the second part, the script does run to the end, and the pattern space and the storage space are emptied along the way, ready for the next paragraph.
That would be the end of the example, except for one case: what happens if the last line of the file is not an empty line? The last paragraph of the text is then never output. The simplest fix is to "manufacture" an empty line yourself. The new script looks like this:
$ sed '${
> /^$/!{
> H
> s/.*//
> }
> }
> /^$/!{
> H
> d
> }
> /^$/{
> x
> s/^\n/<p>/
> s/$/<\/p>/
> G
> }' expl.10
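The extended script can be tried on a small input whose last line is not empty. This is a sketch assuming GNU sed and mktemp; the two short "paragraphs" are invented for the demonstration.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
${
/^$/!{
H
s/.*//
}
}
/^$/!{
H
d
}
/^$/{
x
s/^\n/<p>/
s/$/<\/p>/
G
}
EOF
# Two paragraphs separated by one empty line; no empty line at the end.
out=$(printf 'a\nb\n\nc\nd\n' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$out"
```

On the last line, the "${...}" block saves the line and empties the pattern space, so the final paragraph is flushed exactly as if an empty line had followed it.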
Flow control commands
To give script writers real freedom, sed also allows labels to be set in the script with ":" and then used by the "b" and "t" commands for flow control. As the names suggest, "b" stands for "branch" and "t" stands for "test": the former is the branch command, and the latter is the test command.
First, the format of a label. A label is placed where you want execution to continue, on a line of its own, starting with a colon. No spaces or tabs are allowed between the colon and the label, and any spaces at the end of the label are treated as part of the label.
Now the b command. Its format is:
[address]b[label]
If the address matches, control is transferred to the label: when a label is given, sed jumps to the line carrying that label and continues with the commands that follow it; when no label is given, control jumps straight to the end of the script. If the address does not match, the following commands are executed as usual.
In some situations the b command resembles the ! command, but ! only affects the {} block right next to it, whereas b gives the user the freedom to choose which commands in the script are executed and which are not. Here are some classic uses of the b command:
(1) Create a loop:
:top
command1
command2
/pattern/b top
command3
(2) Ignore some commands that do not meet the conditions:
command1
/pattern/b end
command2
:end
command3
(3) Only one of the two parts of the command can be executed:
command1
/pattern/b dothere
command2
b
:dothere
command3
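Pattern (2) can be made concrete with a small case. This is a sketch assuming GNU sed; the label name "end", the substitution and the two-line input are made up for illustration.

```shell
# Lines starting with '#' branch to :end and skip the substitution;
# all other lines have every 'e' replaced by 'E'.
out=$(printf 'keep me\n# comment\n' | sed -e '/^#/b end' -e 's/e/E/g' -e ':end')
printf '%s\n' "$out"
```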
The format of the t command is the same as that of the b command:
[address]t[label]
If the address matches and a substitution has succeeded on the current line, sed transfers control to the label given to the t command; otherwise execution continues with the next command. The rules for the label are the same as for the b command. Here is an example:
s/pattern/replacement/
t break
command
:break
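A common use of t is a loop that repeats a substitution until it no longer matches. This is a sketch assuming GNU sed; the label "again" and the parenthesis-stripping task are illustrative and do not come from the original text.

```shell
# Remove one innermost pair of parentheses per pass; 't again' jumps back
# only if the substitution just made a change, so the loop stops by itself.
out=$(printf '%s\n' '((a)(b))' | sed -e ':again' -e 's/(\([^()]*\))/\1/' -e 't again')
printf '%s\n' "$out"
```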
Let's return to the sed script of Example 6. On reflection, that script is not robust enough: what if an @fl construct spans more than two lines, say three? That calls for the following enhanced version:
$ cat expl.6.sed
:begin
/@fl(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@fl(.*/{
N
s/@fl(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D
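The enhanced script can be tried on a construct that spans three lines. This is a sketch assuming GNU sed and mktemp, using the bracket expressions "[^)]" as reconstructed from context; the three-line input is invented for the test.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
:begin
/@fl(\([^)]*\))/{
s//\\fB\1\\fR/g
b begin
}
/@fl(.*/{
N
s/@fl(\([^)]*\n[^)]*\))/\\fB\1\\fR/g
t again
b begin
}
:again
P
D
EOF
out=$(printf '%s\n' 'x @fl(a' 'b' 'c) y' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$out"
```

The script keeps appending lines with N until the closing parenthesis is in the pattern space, and only then lets P and D emit the lines one at a time.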
[ Last edited by 无奈何 on 2006-10-26 at 01:42 PM ]
2006-10-26 13:20 |
|
|
无奈何
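『Floor 7』:
Common threads -- sed by example
Repost note: original link http://www-128.ibm.com/developerworks/cn/linux/shell/sed/sed-1/index.html
Common threads -- sed by example, Part 1
Daniel Robbins, President/CEO, Gentoo Technologies, Inc.
October 2001
In this series of articles, Daniel Robbins shows you how to use the very powerful (but often forgotten) UNIX stream editor, sed. sed is an ideal tool for batch-editing files or for creating shell scripts that modify existing files in powerful ways.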
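Choosing an editor
In the UNIX world we have many text editors to choose from. Think about it: vi, emacs, jed and many others come to mind. We all have an editor we have come to know and love (along with our favorite key combinations). With a trusted editor, we can handle any number of UNIX-related administration or programming tasks with ease.
While interactive editors are great, they have limits. Their interactive nature can be a strength, but it is also a weakness. Consider a situation where you need to make similar changes to a group of files. You might instinctively fire up your favorite editor and perform a tedious, repetitive and time-consuming series of edits by hand. There is a better way.
Enter sed
It would be nice to automate the process of editing files, so that files could be edited in "batch" fashion, or even to write scripts that make sophisticated changes to existing files. Fortunately, for these situations there is a better way, and it is called "sed".
sed is a lightweight stream editor included with nearly every UNIX flavor, including Linux. sed has many nice features. First, it is very small, typically many times smaller than your favorite scripting language. Second, because sed is a stream editor, it can edit data it receives from standard input, such as a pipeline. So the data to be edited does not need to be stored in a file on disk. Because data can just as easily be piped to sed, it is easy to use sed as part of a long, complex pipeline in a powerful shell script. Try doing that with your favorite editor.
GNU sed
Fortunately for Linux users, one of the best versions of sed happens to be GNU sed, whose current version is 3.02. Every Linux distribution has (or at least should have) GNU sed. GNU sed is popular not only because its source code is freely distributable, but also because it has many handy, time-saving extensions to the POSIX sed standard. GNU sed also does not suffer from many of the limitations of earlier proprietary versions of sed, such as a limit on line length: GNU sed handles lines of any length with ease.
The newest GNU sed
While researching this article, I noticed that several online sed fans mentioned GNU sed 3.02a. Strangely, I could not find sed 3.02a on ftp.gnu.org (see Resources for these links), so I had to look elsewhere. I found it in /pub/sed on alpha.gnu.org. I happily downloaded, compiled and installed it, only to find a few minutes later that the newest version of sed was actually 3.02.80, whose sources sit right next to the 3.02a sources on alpha.gnu.org. After installing GNU sed 3.02.80, I was finally ready to go.
alpha.gnu.org
alpha.gnu.org (see Resources) is the home of new and experimental GNU source code. However, you will also find a lot of good, stable source code there. For some reason, many GNU developers either forget to move stable sources over to ftp.gnu.org, or have unusually long (2 years!) "beta" periods. For example, sed 3.02a is two years old, and even 3.02.80 is one year old, yet neither was available on ftp.gnu.org at the time this article was written (August 2000).
The right sed
This series uses GNU sed 3.02.80. Some (but very few) of the most advanced examples in the upcoming follow-up articles will not work with GNU sed 3.02 or 3.02a. If you are using a non-GNU sed, your results may vary. Why not take some time now to install GNU sed 3.02.80? Then you will not only be ready for the rest of the series, but will also be using what is arguably the best sed in existence.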
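sed examples
sed works by performing any number of user-specified editing operations ("commands") on the input data. sed is line-based, so the commands are performed on each line in order. sed then writes its results to standard output (stdout); it does not modify any input files.
Let's look at some examples. The first few will be a bit odd, because I am using them to show how sed works rather than to perform any useful task. However, if you are new to sed, it is very important that you understand them. Here is the first example:
$ sed -e 'd' /etc/services
If you type this command, you will get no output. So what happened? In this example, sed was called with one editing command, 'd'. sed opened the /etc/services file, read a line into its pattern buffer, performed the editing command ("delete line"), and then printed the pattern buffer (which was empty). It then repeated these steps for each following line. This produced no output, because the "d" command removed every single line from the pattern buffer!
There are a few more things to notice in this example. First, /etc/services was not modified at all. Again, this is because sed only reads from the file you specify on the command line, using it as input; it does not try to modify the file. The second thing to notice is that sed is line-oriented. The 'd' command did not simply tell sed to delete all incoming data in one go. Instead, sed read each line of /etc/services, one by one, into its internal buffer, called the pattern buffer. Once a line was read into the pattern buffer, it performed the 'd' command and printed the contents of the pattern buffer (nothing, in this example). Later, I will show you how to use address ranges to control which lines a command is applied to, but in the absence of addresses, a command is applied to all lines.
The third thing to notice is the use of single quotes around the 'd' command. It is a good idea to get into the habit of using single quotes to enclose your sed commands, so that shell expansion is disabled.
Another sed example
Here is an example of how to use sed to remove the first line of the /etc/services file from the output stream:
$ sed -e '1d' /etc/services | more
As you can see, this command is very similar to the first 'd' command, except that it is preceded by a '1'. If you guessed that the '1' refers to line number one, you are right. Whereas the first example used 'd' by itself, this time the 'd' is preceded by an optional numerical address. By using addresses, you can tell sed to perform edits only on a particular line or lines.
Address ranges
Now let's look at how to specify an address range. In this example, sed deletes lines 1 to 10 of the output:
$ sed -e '1,10d' /etc/services | more
When two addresses are separated by a comma, sed applies the command that follows to the range that starts with the first address and ends with the second address. In this example, the 'd' command is applied to lines 1 to 10, inclusive. All other lines are ignored.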
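Addresses with regular expressions
Now for a more useful example. Suppose you want to view the contents of your /etc/services file, but you are not interested in the comments it contains. As you know, you can place comments in /etc/services by starting a line with the '#' character. To skip the comments, we want sed to delete the lines that start with '#'. Here is how to do it:
$ sed -e '/^#/d' /etc/services | more
Try this example and see what happens. You will notice that sed performs the desired task. Now let's analyze what is going on.
To understand the '/^#/d' command, we first need to dissect it. First, let's set aside the 'd': it is the same delete-line command we used before. The new addition is the '/^#/' part, which is a new kind of address, a regular expression address. Regular expression addresses are always surrounded by slashes. They specify a pattern, and the command that immediately follows a regular expression address is applied only to lines that match this particular pattern.
So '/^#/' is a regular expression. But what does it do? Obviously, this is a good time for a regular expression refresher.
Regular expression refresher
Regular expressions can be used to express patterns that we may find in text. Have you ever used the '*' character on the shell command line? That usage is similar to, but not the same as, regular expressions. Here are the special characters that you can use in regular expressions:
Character   Description
^           Matches the beginning of the line
$           Matches the end of the line
.           Matches any single character
*           Matches zero or more occurrences of the previous character
[ ]         Matches any one of the characters inside the [ ]
Probably the best way to get a feel for regular expressions is to look at a few examples. All of these examples will be accepted by sed as valid addresses appearing on the left side of a command. Here are a few:
Regular expression   Description
/./                  Matches any line that contains at least one character
/../                 Matches any line that contains at least two characters
/^#/                 Matches any line that begins with '#'
/^$/                 Matches all blank lines
/}$/                 Matches any line that ends with '}' (no spaces)
/} *$/               Matches any line ending with '}' followed by zero or more spaces
/[abc]/              Matches any line that contains a lowercase 'a', 'b' or 'c'
/^[abc]/             Matches any line that begins with 'a', 'b' or 'c'
I encourage you to try several of these examples. Take some time to get familiar with regular expressions, and then try a few of your own. You can use a regexp like this:
$ sed -e '/regexp/d' /path/to/my/test/file | more
This causes sed to delete any matching lines. However, it may be easier to get familiar with regular expressions by telling sed to print the lines that match the regexp and delete those that do not, rather than the other way around. This can be done with the following command:
$ sed -n -e '/regexp/p' /path/to/my/test/file | more
Note the new '-n' option, which tells sed not to print the pattern space unless explicitly told to do so. You will also notice that we have replaced the 'd' command with the 'p' command, which, as you might guess, explicitly tells sed to print the pattern space. This way, only the matching lines are printed.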
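The two approaches can be compared on a small sample. This is a minimal sketch assuming any POSIX sed; the three-line sample stands in for /etc/services and is made up for illustration.

```shell
sample=$(printf 'ftp 21/tcp\n# a comment\nssh 22/tcp')

# Delete the matching lines; everything else is printed automatically.
without_comments=$(printf '%s\n' "$sample" | sed -e '/^#/d')

# -n turns automatic printing off; p prints only the matching lines.
only_comments=$(printf '%s\n' "$sample" | sed -n -e '/^#/p')

printf '%s\n--\n%s\n' "$without_comments" "$only_comments"
```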
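More on addresses
Up until now, we have looked at line addresses, line range addresses and regexp addresses. But there are even more possibilities. We can specify two regular expressions separated by a comma, and sed will match all lines starting from the first line that matches the first regular expression, up to and including the line that matches the second regular expression. For example, the following command prints a block of text that begins with a line containing "BEGIN" and ends with a line containing "END":
$ sed -n -e '/BEGIN/,/END/p' /my/test/file | more
If "BEGIN" is not found, no data is printed. If "BEGIN" is found but no "END" is found on any line after it, all subsequent lines are printed. This happens because of sed's stream-oriented nature: it does not know whether or not an "END" will appear.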
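Both cases, with and without a closing "END", can be seen in a small sketch. It assumes any POSIX sed; the inputs are invented for the demonstration.

```shell
# The range ends at the first END line after BEGIN.
with_end=$(printf 'x\nBEGIN\ny\nEND\nz\n' | sed -n -e '/BEGIN/,/END/p')

# No END in the input: the range stays open until the last line.
without_end=$(printf 'x\nBEGIN\ny\nz\n' | sed -n -e '/BEGIN/,/END/p')

printf '%s\n--\n%s\n' "$with_end" "$without_end"
```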
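C source example
If you want to print only the main() function in a C source file, you could type:
$ sed -n -e '/main[[:space:]]*(/,/^}/p' sourcefile.c | more
This command has two regular expressions, '/main[[:space:]]*(/' and '/^}/', and one command, 'p'. The first regular expression matches the string "main" followed by any number of spaces or tabs, followed by an open parenthesis. This should match the start of an average ANSI C main() declaration.
In this particular regular expression, the '[[:space:]]' character class appears. This is just a special keyword that tells sed to match either a TAB or a space. If you wanted, instead of typing '[[:space:]]', you could have typed '[', then a literal space, then Control-V, then a literal tab and a ']'. The Control-V tells bash that you want to insert a "real" tab rather than perform command expansion. It is clearer, especially in scripts, to use the '[[:space:]]' character class.
OK, now on to the second regexp. '/^}/' matches a '}' character that appears at the beginning of a new line. If your code is formatted nicely, this matches the closing brace of your main() function. If it is not, it will not match properly: this is one of the tricky things about pattern matching.
Because we are in '-n' quiet mode, the 'p' command does what it always does, explicitly telling sed to print the line. Try running the command on a C source file; it should output the entire main() { } block, including the initial "main()" and the closing '}'.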
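The command can be tried without a real project. This is a sketch assuming GNU sed (for the [[:space:]] class) and mktemp; the small C file is written on the fly and its contents are made up for illustration.

```shell
src=$(mktemp)
cat > "$src" <<'EOF'
#include <stdio.h>
int helper(void)
{
    return 1;
}
int main(void)
{
    return helper();
}
EOF
# Print from the line containing "main(" up to the next line that starts with '}'.
out=$(sed -n -e '/main[[:space:]]*(/,/^}/p' "$src")
rm -f "$src"
printf '%s\n' "$out"
```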
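Common threads -- sed by example, Part 2
Contents:
Substitution!
Regexp snafus
More character matching
Advanced substitution stuff
Those wonderful backslashed parentheses
Mixing things up
Multiple commands for one address
Append, insert, and change line
October 2001
sed is a very powerful and compact text stream editor. In this second article of the series, Daniel Robbins shows you how to use sed to perform string substitution, create larger sed scripts, and use sed's append, insert, and change line commands.
sed is a very useful (but often forgotten) UNIX stream editor. It is an ideal tool for batch-editing files or for creating shell scripts that modify existing files in powerful ways. This article is a continuation of the previous article introducing sed.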
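Substitution!
Let's look at one of sed's most useful commands, the substitution command. With it, we can replace a particular string or matched regular expression with another string. Here is an example of the most basic use of this command:
$ sed -e 's/foo/bar/' myfile.txt
The command above outputs the contents of myfile.txt to standard output, with the first occurrence of 'foo' (if any) on each line replaced with the string 'bar'. Please note that I said the first occurrence on each line, though this is normally not what you want. Normally, when doing a string replacement, you want to perform it globally, that is, you want to replace all occurrences on every line, as follows:
$ sed -e 's/foo/bar/g' myfile.txt
The additional 'g' option after the last slash tells sed to perform a global replace.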
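The effect of the 'g' flag is easy to see on a single line. A minimal sketch, assuming any POSIX sed; the input string is illustrative.

```shell
# Without g, only the first match on the line is replaced.
first_only=$(printf '%s\n' 'foo foo foo' | sed -e 's/foo/bar/')

# With g, every match on the line is replaced.
all=$(printf '%s\n' 'foo foo foo' | sed -e 's/foo/bar/g')

printf '%s\n%s\n' "$first_only" "$all"
```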
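There are a few other things you should know about the 's///' substitution command. First, it is a command, and a command only; no addresses were specified in any of the examples above. This means that 's///' can also be used with addresses to control which lines it is applied to, as follows:
$ sed -e '1,10s/enchantment/entrapment/g' myfile2.txt
The example above replaces all occurrences of the phrase 'enchantment' with the phrase 'entrapment', but only on lines one through ten, inclusive.
$ sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt
This example replaces 'hills' with 'mountains', but only in blocks of text that begin with a blank line and end with a line beginning with the three characters 'END', inclusive.
Another nice thing about the 's///' command is that there are many options for the '/' separator. If you are performing a string substitution and the regular expression or replacement string has many slashes in it, you can change the separator by specifying a different character after the 's'. For example, this replaces all occurrences of /usr/local with /usr:
$ sed -e 's:/usr/local:/usr:g' mylist.txt
In this example, a colon is used as the separator. If you ever need to specify the separator character itself in the regular expression, put a backslash before it.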
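Regexp snafus
Up until now, we have only performed simple string substitution. While this is handy, we can also match a regular expression. For example, the following sed command matches a phrase that begins with '<' and ends with '>' and contains any number of characters in between. The phrase is deleted (replaced with an empty string):
$ sed -e 's/<.*>//g' myfile.html
This is a good first attempt at a sed script that removes HTML tags from a file, but it will not work well, because of the peculiarities of regular expressions. The reason? When sed tries to match the regular expression on a line, it finds the longest match on the line. This was not an issue in my previous sed article, because we were using the 'd' and 'p' commands, which delete or print the entire line anyway. But with the 's///' command it makes a big difference, because the entire portion that the regular expression matches is replaced with the target string or, in this case, deleted. This means that the example above turns the following line:
<b>This</b> is what <b>I</b> meant.
into this:
meant.
rather than this, which is what we wanted:
This is what I meant.
Fortunately, there is an easy way to fix this. Instead of typing a regular expression that says "a '<' character followed by some characters and ending with a '>' character", we just need to type a regexp that says "a '<' character followed by any number of non-'>' characters and ending with a '>' character". This matches the shortest possible match instead of the longest one. The new command looks like this:
$ sed -e 's/<[^>]*>//g' myfile.html
In the example above, '[^>]' specifies a "non-'>'" character, and the '*' after it completes the expression to mean "zero or more non-'>' characters". Test this command on a few sample html files, pipe the output to "more", and review the results.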
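The difference between the two expressions can be checked directly on the sample line. This is a minimal sketch assuming any POSIX sed; note that the greedy result keeps the space in front of "meant.".

```shell
line='<b>This</b> is what <b>I</b> meant.'

# Greedy: .* runs from the first '<' to the last '>' on the line.
greedy=$(printf '%s\n' "$line" | sed -e 's/<.*>//g')

# [^>]* cannot cross a '>', so each tag is matched on its own.
tags_only=$(printf '%s\n' "$line" | sed -e 's/<[^>]*>//g')

printf '[%s]\n[%s]\n' "$greedy" "$tags_only"
```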
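More character matching
The '[ ]' regular expression syntax has some more options. To specify a range of characters, you can use a '-' as long as it is not in the first or last position, as follows:
'[a-x]*'
This matches zero or more characters, as long as all of them are 'a', 'b', 'c' ... 'v', 'w', 'x'. In addition, the '[:space:]' character class is available for matching whitespace. Here is a fairly complete list of the available character classes:
Character class   Description
[:alnum:]         Alphanumeric
[:alpha:]         Alphabetic
[:blank:]         Space or tab
[:cntrl:]         Any control character
[:digit:]         Numeric digit
[:graph:]         Any visible character (no whitespace)
[:lower:]         Lowercase
[:print:]         Non-control character
[:punct:]         Punctuation character
[:space:]         Whitespace
[:upper:]         Uppercase
[:xdigit:]        Hexadecimal digit
It is advantageous to use character classes whenever possible, because they adapt better to non-English locales (including accented characters where necessary, and so on).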
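Advanced substitution stuff
We have seen how to perform simple and even reasonably complex straight substitutions, but sed can do even more. We can refer to either parts of or the entire matched regular expression, and use these parts to construct the replacement string. As an example, let's say you are replying to a message. The following example prefixes each line with the phrase "ralph said: ":
$ sed -e 's/.*/ralph said: &/' origmsg.txt
The output looks like this:
ralph said: Hiya Jim,
ralph said:
ralph said: I sure like this sed stuff!
ralph said:
In this example, the '&' character is used in the replacement string, which tells sed to insert the entire matched regular expression. So whatever was matched by '.*' (the largest group of zero or more characters on the line, that is, the entire line) can be inserted anywhere in the replacement string, even multiple times. This is great, but sed is even more powerful.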
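Those wonderful backslashed parentheses
Even better than '&', the 's///' command allows us to define regions in our regular expression, and we can refer to these specific regions in our replacement string. As an example, let's say we have a file that contains the following text:
foo bar oni
eeny meeny miny
larry curly moe
jimmy the weasel
Now let's say we want to write a sed script that replaces "eeny meeny miny" with "Victor eeny-meeny Von miny", and so on. To do this, we first write a regular expression that matches three strings separated by spaces:
'.* .* .*'
Now we define regions by inserting backslashed parentheses around each region of interest:
'\(.*\) \(.*\) \(.*\)'
This regular expression works the same as the first one, except that it defines three logical regions that we can refer to in the replacement string. Here is the final script:
$ sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/' myfile.txt
As you can see, we refer to each parentheses-delimited region by typing '\x', where x is the number of the region, starting at one. The output is as follows:
Victor foo-bar Von oni
Victor eeny-meeny Von miny
Victor larry-curly Von moe
Victor jimmy-the Von weasel
As you become more familiar with sed, you will be able to perform fairly powerful text processing with a minimum of effort. You may want to think about how you would have approached this problem using your favorite scripting language: could you have easily fit the solution into one line?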
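The final script can be run on the four sample lines without creating myfile.txt. This is a minimal sketch assuming any POSIX sed; the lines are fed through a pipe instead of a file.

```shell
# \1, \2 and \3 refer to the three backslashed-parenthesis groups.
out=$(printf '%s\n' 'foo bar oni' 'eeny meeny miny' 'larry curly moe' 'jimmy the weasel' |
  sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/')
printf '%s\n' "$out"
```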
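Mixing things up
As we begin creating more complex sed scripts, we need the ability to enter more than one command. There are several ways to do this. First, we can use semicolons between the commands. For example, this series of commands uses the '=' command, which tells sed to print the line number, and the 'p' command, which explicitly tells sed to print the line (since we are in '-n' mode):
$ sed -n -e '=;p' myfile.txt
Whenever two or more commands are specified, each command is applied (in order) to every line in the file. In the example above, first the '=' command is applied to line 1, and then the 'p' command is applied. Then sed proceeds to line 2 and repeats the process. While the semicolon is handy, there are instances where it will not work. Another alternative is to use two -e options to specify two separate commands:
$ sed -n -e '=' -e 'p' myfile.txt
However, when we get to the more complex append and insert commands, even multiple '-e' options will not help us. For complex multi-line scripts, the best way is to put the commands in a separate file and then refer to this script file with the -f option:
$ sed -n -f mycommands.sed myfile.txt
This method, although arguably less convenient, always works.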
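Multiple commands for one address
Sometimes you may want to specify multiple commands that apply to a single address. This comes in especially handy when you are performing lots of 's///' commands to transform words or syntax in the source file. To perform multiple commands per address, enter your sed commands in a file and use the '{ }' characters to group the commands, as follows:
1,20{
        s/[Ll]inux/GNU\/Linux/g
        s/samba/Samba/g
        s/posix/POSIX/g
}
The example above applies three substitution commands to lines 1 through 20, inclusive. You can also use regular expression addresses, or a combination of the two:
1,/^END/{
        s/[Ll]inux/GNU\/Linux/g
        s/samba/Samba/g
        s/posix/POSIX/g
        p
}
This example applies all the commands between '{ }' to the lines starting at line 1 and ending at a line that begins with the letters "END" (or at the end of the file, if "END" is not found in the source file).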
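Grouping can be tried with a shortened address range. This is a sketch assuming GNU sed and mktemp; the range 1,2 and the three input lines are chosen for the demonstration, and the bracket expression [Ll] is the form reconstructed above.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
1,2{
s/[Ll]inux/GNU\/Linux/g
s/samba/Samba/g
}
EOF
# Only lines 1 and 2 are inside the address range; line 3 is left alone.
out=$(printf '%s\n' 'linux and samba' 'Linux' 'linux untouched' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$out"
```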
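Append, insert, and change line
Now that we are writing sed scripts in separate files, we can take advantage of the append, insert, and change line commands. These commands insert a line after the current line, insert a line before the current line, or replace the current line in the pattern space. They can also be used to insert multiple lines into the output. The insert line command is used as follows:
i\
This line will be inserted before each line
If you do not specify an address for this command, it is applied to every line and produces output like this:
This line will be inserted before each line
line 1 here
This line will be inserted before each line
line 2 here
This line will be inserted before each line
line 3 here
This line will be inserted before each line
line 4 here
If you would like to insert multiple lines before the current line, you can add additional lines by appending a backslash to the previous line, like so:
i\
insert this line\
and this one\
and this one\
and, uh, this one too.
The append command works similarly, but inserts a line or lines after the current line in the pattern space. It is used as follows:
a\
insert this line after each line. Thanks! :)
The "change line" command, on the other hand, actually replaces the current line in the pattern space, and is used as follows:
c\
You're history, original line! Muhahaha!
Because the append, insert, and change line commands need to be entered on multiple lines, you will want to type them into text sed scripts and tell sed to execute them with the '-f' option. Using other methods to pass these commands to sed can cause problems.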
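The three commands can be seen together in one small script file. This is a sketch assuming GNU sed and mktemp; the line addresses and texts are made up for illustration.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
2i\
inserted before line 2
2a\
appended after line 2
3c\
replacement for line 3
EOF
out=$(printf '%s\n' 'one' 'two' 'three' | sed -f "$script")
rm -f "$script"
printf '%s\n' "$out"
```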
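Common threads -- sed by example, Part 3
Contents:
Muscular sed
Text translation
Reversing lines
Reversal explained
sed QIF magic
A tale of two formats
Starting the process
Refinement
Finishing touches
Don't get confused
October 2001
In this concluding article of the sed series, Daniel Robbins gives you a taste of the true power of sed. After introducing a handful of essential sed scripts, he demonstrates some basic sed scripting by converting a Quicken .QIF file into a readable text format. The conversion script is not only practical, it also serves as an excellent example of what sed scripting can do.
Muscular sed
In my second sed article, I offered examples that demonstrated how sed works, but very few of them actually did anything particularly useful. In this final sed article, it is time to change that pattern and put sed to real use. I will show you several examples that not only demonstrate the power of sed, but also do some really neat (and handy) things. For example, in the second half of the article, I will show you how I designed a sed script that converts a .QIF file from Intuit's Quicken financial program into a nicely formatted text file. Before doing that, we will look at some less complicated yet useful sed scripts.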
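Text translation
Our first practical script converts UNIX-style text to DOS/Windows format. As you probably know, DOS/Windows-based text files have a CR (carriage return) and LF (line feed) at the end of each line, while UNIX text has only a line feed. There may be times when you need to move some UNIX text to a Windows system, and this script performs the necessary format conversion for you.
$ sed -e 's/$/\r/' myunix.txt > mydos.txt
In this script, the '$' regular expression matches the end of the line, and the '\r' tells sed to insert a carriage return right before it. Insert a carriage return before the line feed, and presto, every line ends in CR/LF. Please note that '\r' is replaced with a CR only when using GNU sed 3.02.80 or later. If you have not installed GNU sed 3.02.80 yet, see my first sed article for instructions on how to do this.
I cannot tell you how many times I have downloaded some example script or C code, only to find that it was in DOS/Windows format. While many programs do not mind DOS/Windows-format CR/LF text files, several definitely do; the most notable is bash, which chokes as soon as it encounters a carriage return. The following sed invocation converts DOS/Windows-format text to trusty UNIX format:
$ sed -e 's/.$//' mydos.txt > myunix.txt
The way this script works is simple: the substitution regular expression matches the last character on the line, which happens to be a carriage return. We replace it with nothing, so it is deleted from the output entirely. If you use this script and notice that the last character of every line of the output has been deleted, you have specified a text file that is already in UNIX format. No need to do that!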
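A round trip shows both directions at once. This is a sketch assuming a GNU sed that understands '\r'; for the way back it deliberately uses 's/\r$//' rather than the article's 's/.$//', so that lines without a carriage return are left untouched.

```shell
# UNIX -> DOS: add a carriage return before each newline (GNU sed extension \r).
dos=$(printf 'a\nb\n' | sed -e 's/$/\r/')

# DOS -> UNIX: remove a trailing carriage return, and only a carriage return.
unix=$(printf '%s\n' "$dos" | sed -e 's/\r$//')

printf '%s\n' "$unix"
```

Running the second command on a file that is already in UNIX format changes nothing, which avoids the pitfall described above.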
反转行
下面是另一个方便的小脚本。与大多数 Linux 发行版中包括的 "tac" 命令一样,该脚本将反转文件中行的次序。"tac" 这个名称可能会给人以误导,因为 "tac" 不反转行中字符的位置(左和右),而是反转文件中行的位置(上和下)。用 "tac" 处理以下文件:
foo bar oni
....将产生以下输出:
oni bar foo
可以用以下 sed 脚本达到相同目的:
$ sed -e '1!G;h;$!d' forward.txt > backward.txt
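You will find this sed script useful if you are logged in to a FreeBSD system that happens not to have a "tac" command. While handy, it is also a good idea to know why this script does what it does. Let's dissect it.
Reversal Explanation
First, this script contains three separate sed commands, separated by semicolons: '1!G', 'h' and '$!d'. Now, it is time to get a good understanding of the addresses used for the first and third commands. If the first command were '1G', the 'G' command would be applied only to the first line. However, there is an additional '!' character -- this '!' negates the address, meaning that the 'G' command applies to all lines except the first. The '$!d' command is similar. If the command were '$d', it would apply the 'd' command only to the last line in the file (the '$' address is a simple way of specifying the last line). With the '!', however, '$!d' applies the 'd' command to all lines except the last. Now, all we need to understand is what the commands themselves do.
When we run the line-reversal script on the text file above, the first command that gets executed is 'h', because '1!G' skips the first line. 'h' tells sed to copy the contents of the pattern space (the buffer that holds the current line being worked on) to the hold space (a temporary buffer). Then the 'd' command is executed, which deletes "foo" from the pattern space, so it is not printed after all the commands have been run for this line.
Now, line two. After "bar" is read into the pattern space, the 'G' command is executed, which appends the contents of the hold space ("foo\n") to the pattern space ("bar\n"), making the pattern space "bar\nfoo\n". The 'h' command puts this back into the hold space for safekeeping, and 'd' deletes the line from the pattern space so that it is not printed.
For the last line, "oni", the same steps are repeated, except that the contents of the pattern space are not deleted (because of the '$!' before the 'd'), and the contents of the pattern space (three lines) are printed to stdout.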
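The walk-through above can be checked with a short sketch. It feeds the same three sample lines through a pipe rather than a file; the variable name `reversed` is only for illustration.

```shell
# 1!G  on every line except the first, append the hold space to the pattern space
# h    copy the (growing) pattern space back into the hold space
# $!d  on every line except the last, delete the pattern space without printing it
reversed=$(printf 'foo\nbar\noni\n' | sed -e '1!G;h;$!d')

# Only the last cycle prints, and by then the pattern space holds all three
# lines in reverse order.
printf '%s\n' "$reversed"
```

Because each line is accumulated in the hold space, this approach keeps the whole input in memory, which is fine for small files.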
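Now, it is time to do some powerful data conversion with sed.
sed QIF Magic
For the last few weeks, I had been thinking about buying a copy of Quicken to balance my bank accounts. Quicken is a very nice financial program and would certainly do the job well. But after thinking it over, I felt that I could easily write some software to balance my checkbook myself. After all, I reasoned, I am a software developer!
I developed a nice little checkbook-balancing program (using awk) that calculates my balance by parsing a text file containing all my transactions. After a bit of tweaking, I improved it so that I could keep track of different credit and debit categories, just as Quicken can. But there was one more feature I wanted to add. I had recently moved my accounts to a bank with an online Web account interface. One day, I noticed that the bank's Web site let me download my account information in Quicken's .QIF format. I immediately decided it would be really neat if I could convert this information into text format.
The Story of Two Formats
Before we look at the QIF format, here is what my checkbook.txt format looks like:
28 Aug 2000    food    -    -      Y    Supermarket    30.94
25 Aug 2000    watr    -    103    Y    Check 103      52.86
In my file, all fields are separated by one or more tabs, with one transaction per line. After the date, the next field lists the type of expense (or "-" if this is an income item). The third field lists the type of income (or "-" if this is an expense item). Then there is a check number field (again, "-" if empty), a transaction-cleared field ("Y" or "N"), a comment and a dollar amount. Now, let's look at the QIF format. When I viewed my downloaded QIF file in a text viewer, this is what I saw:
!Type:Bank
D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET
^
D08/28/2000
T-8.25
N
PCHECKCARD PUNJAB RESTAURANT
^
D08/28/2000
T-17.17
N
PCHECKCARD SUPERMARKET
After scanning the file, it was not hard to figure out the format -- ignoring the first line, the rest is laid out as follows:
D<date>
T<transaction amount>
N<check number>
P<description>
^ (this is the field separator)
Starting the Process
When you are tackling a significant sed project like this, don't get discouraged -- sed lets you massage the data into its final form gradually. As you progress, you can keep refining your sed script until the output appears exactly as intended. You don't need to get it exactly right on the first try.
To start off, first create a file called "qiftrans.sed" and begin massaging the data: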
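1d
/^^/d
s/[[:cntrl:]]//g
The first '1d' command deletes the first line, and the second command removes those pesky '^' characters from the output. The last line removes any control characters that may exist in the file. Since I am dealing with a foreign file format, I want to eliminate the risk of encountering any control characters along the way. So far, so good. Now, it is time to add some processing punch to this basic script:
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^01/Jan/
    s/^02/Feb/
    s/^03/Mar/
    s/^04/Apr/
    s/^05/May/
    s/^06/Jun/
    s/^07/Jul/
    s/^08/Aug/
    s/^09/Sep/
    s/^10/Oct/
    s/^11/Nov/
    s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
}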
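First, I add a '/^D/' address so that sed only begins processing when it encounters 'D', the first character of a QIF date field. All of the commands in the curly braces execute in order as soon as sed reads such a line into its pattern space.
The first command in the curly braces transforms a line that looks like:
D08/28/2000
into one that looks like this:
08/28/2000    OUTY    INNY
Of course, this format is not perfect yet, but that is OK. We will gradually refine the contents of the pattern space as we go. The next 12 lines have the net effect of converting the month to a three-letter abbreviation, and the last line removes the two slashes from the date and swaps the month and day. We end up with this line:
28 Aug 2000    OUTY    INNY
The OUTY and INNY fields are placeholders that will be replaced later. I can't fill them in yet: if the dollar amount is negative, I want to set OUTY and INNY to "misc" and "-", but if the dollar amount is positive, I want to change them to "-" and "inco" respectively. Since the dollar amount has not been read yet, I need to use placeholders for the time being.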
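As a quick check of the date-handling step, the following sketch applies the three relevant substitutions to a single QIF date line. It is a minimal illustration, not the full script: only the month rule for "08" is included, and a literal tab is built with printf (the variable `tab` is an assumption made here) so that the example does not depend on GNU sed's '\t' escape.

```shell
tab=$(printf '\t')   # literal tab character

date_line=$(printf 'D08/28/2000\n' | sed \
    -e "s/^D\(.*\)/\1${tab}OUTY${tab}INNY${tab}/" \
    -e 's/^08/Aug/' \
    -e 's:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:')

# Show the result with tabs made visible as '|'.
printf '%s\n' "$date_line" | tr '\t' '|'
```

The last substitution uses ':' as its separator so the slashes in the date do not need escaping, and its '\2 \1 \3' replacement is what puts the day in front of the month.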
Refinement
Now, it is time for some further refinement:
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^01/Jan/
    s/^02/Feb/
    s/^03/Mar/
    s/^04/Apr/
    s/^05/May/
    s/^06/Jun/
    s/^07/Jul/
    s/^08/Aug/
    s/^09/Sep/
    s/^10/Oct/
    s/^11/Nov/
    s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    N
    N
    N
    s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
    s/NUMNUM/-/
    s/NUM\([^N]*\)NUM/\1/
    s/\([0-9]\),/\1/
}
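The next seven lines get a bit complicated, so we will cover them in detail. First, we have three 'N' commands in a row. The 'N' command tells sed to read the next line of input and append it to the current pattern space. The three 'N' commands cause the next three lines to be appended to the current pattern space buffer, and now our line looks like this:
28 Aug 2000    OUTY    INNY    \nT-8.15\nN\nPCHECKCARD SUPERMARKET
sed's pattern space got ugly -- we need to remove the extra newlines and perform some additional formatting. To do this, we use the substitution command. The pattern we want to match is:
'\nT.*\nN.*\nP.*'
This matches a newline, followed by a 'T', followed by zero or more characters, followed by a newline, followed by an 'N', followed by any number of characters and a newline, followed by a 'P', followed by any number of characters. Phew! This regexp matches the entire contents of the three lines we just appended to the pattern space. But we want to reformat this region, not replace it entirely. The dollar amount, check number (if any) and description need to reappear in the replacement string. To do this, we surround those "interesting parts" with backslashed parentheses, so that we can refer to them in the replacement string (using '\1', '\2' and '\3' to tell sed where to insert them). Here is the final command:
s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
This command transforms our line into:
28 Aug 2000    OUTY    INNY    NUMNUM    Y    CHECKCARD SUPERMARKET    AMT-8.15AMT
While this line is getting better, there are a few things that at first glance appear a bit...er...interesting. The first is that silly "NUMNUM" string -- what purpose does it serve? You will see as you examine the next two lines of the sed script, which replace "NUMNUM" with a "-", and "NUM"<number>"NUM" with <number>. As you can see, surrounding the check number with a silly tag lets us conveniently insert a "-" when the field is empty.
Ending Attempts
The last line removes a comma following a digit. This converts dollar amounts like "3,231.00" to "3231.00", which is the format I use. Now, it is time to take a look at the final script: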
The final "QIF to text" script
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
    s/^D\(.*\)/\1\tOUTY\tINNY\t/
    s/^01/Jan/
    s/^02/Feb/
    s/^03/Mar/
    s/^04/Apr/
    s/^05/May/
    s/^06/Jun/
    s/^07/Jul/
    s/^08/Aug/
    s/^09/Sep/
    s/^10/Oct/
    s/^11/Nov/
    s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    N
    N
    N
    s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
    s/NUMNUM/-/
    s/NUM\([^N]*\)NUM/\1/
    s/\([0-9]\),/\1/
    /AMT-[0-9]*.[0-9]*AMT/b fixnegs
    s/AMT\(.*\)AMT/\1/
    s/OUTY/-/
    s/INNY/inco/
    b done
    :fixnegs
    s/AMT-\(.*\)AMT/\1/
    s/OUTY/misc/
    s/INNY/-/
    :done
}
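The additional eleven lines use substitution and some branching functionality to beautify the output. Let's look at this line first:
/AMT-[0-9]*.[0-9]*AMT/b fixnegs
This line contains a branch command of the form "/regexp/b label". If the pattern space matches the regexp, sed branches to the fixnegs label. You should be able to spot this label easily; it appears as ":fixnegs" in the code. If the regexp does not match, processing continues as normal with the next command.
Now that you understand how the command itself works, let's look at the branches. If you look at the branch regular expression, you will see that it matches the string 'AMT', followed by a '-', followed by any number of digits, a '.', any number of digits, and 'AMT' (strictly speaking, the unescaped '.' matches any single character, not only a literal dot). As I am sure you have figured out, this regexp deals specifically with a negative dollar amount. Earlier, I surrounded the dollar amount with 'AMT' strings so that I could find it easily later. Because the regexp only matches dollar amounts that begin with a '-', the branch only happens when we are dealing with a debit. If we are dealing with a debit, OUTY should be set to 'misc', INNY should be set to '-', and the negative sign in front of the debit amount should be removed. If you follow the flow of the code, you will see that this is exactly what happens. If the branch is not taken, OUTY is replaced with '-' and INNY is replaced with 'inco'. We're finished! Our output line is now perfect:
28 Aug 2000    misc    -    -    Y    CHECKCARD SUPERMARKET    8.15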
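To try the whole conversion end to end, here is a sketch that writes the script to a temporary file and runs it on two records. Several things are assumptions made for this illustration: the second record (a positive amount with a check number and a thousands separator) is invented to exercise the non-branch path, the temporary file stands in for "qiftrans.sed", and a literal tab held in `t` replaces '\t' so the sketch does not depend on GNU sed's escape handling.

```shell
t=$(printf '\t')          # literal tab used in place of \t
script=$(mktemp)          # stands in for qiftrans.sed

# Unquoted here-document: ${t} expands to a tab, sed's backslashes pass through.
cat > "$script" <<EOF
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
    s/^D\(.*\)/\1${t}OUTY${t}INNY${t}/
    s/^01/Jan/
    s/^02/Feb/
    s/^03/Mar/
    s/^04/Apr/
    s/^05/May/
    s/^06/Jun/
    s/^07/Jul/
    s/^08/Aug/
    s/^09/Sep/
    s/^10/Oct/
    s/^11/Nov/
    s/^12/Dec/
    s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
    N
    N
    N
    s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM${t}${t}Y${t}${t}\3${t}AMT\1AMT/
    s/NUMNUM/-/
    s/NUM\([^N]*\)NUM/\1/
    s/\([0-9]\),/\1/
    /AMT-[0-9]*.[0-9]*AMT/b fixnegs
    s/AMT\(.*\)AMT/\1/
    s/OUTY/-/
    s/INNY/inco/
    b done
    :fixnegs
    s/AMT-\(.*\)AMT/\1/
    s/OUTY/misc/
    s/INNY/-/
    :done
}
EOF

converted=$(sed -f "$script" <<'EOF'
!Type:Bank
D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET
^
D08/29/2000
T1,250.00
N103
PPAYCHECK
^
EOF
)
rm -f "$script"

# Squeeze runs of tabs into a single '|' so the fields are easy to read.
flat=$(printf '%s\n' "$converted" | tr -s '\t' '|')
printf '%s\n' "$flat"
```

The debit takes the fixnegs branch and loses its minus sign, while the deposit falls through to the "inco" path and has its comma removed.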
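Don't Get Confused
As you can see, converting data with sed is not that hard, as long as you approach the problem incrementally. Don't try to do everything with a single sed command, or all at once. Instead, work gradually toward the goal, and keep refining your sed script until its output is just the way you want it. sed packs a lot of punch, and I hope you have become familiar with its inner workings and that you will continue to grow in your sed mastery!
Resources
You can read the original English version of this article on the developerWorks worldwide site.
Read Daniel's other sed articles on developerWorks: Common threads: Sed by example, Part 2 and Part 3.
Check out Eric Pement's excellent sed FAQ.
You can find the sed 3.02 sources at ftp://ftp.gnu.org/pub/gnu/sed.
You will find the nice, new sed 3.02.80 at alpha.gnu.org.
Eric Pement also has a handy list of sed one-liners that any aspiring sed guru should take a look at.
If you would like a good old-fashioned book, O'Reilly's sed & awk, 2nd Edition is an excellent choice.
You may want to read the 7th edition UNIX sed man page (circa 1978!).
Read Felix von Leitner's short sed tutorial.
Read David Mertz's "Text processing in Python" on developerWorks.
On regular expressions:
Brush up on finding and modifying patterns in text with the free, dW-exclusive tutorial "Using regular expressions".
See the regular expression how-to document at Python.org.
Refer to the overview of regular expressions from the University of Kentucky.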
Last edited by 无奈何 on 2006-10-26 at 01:54 PM ]
Repost Note: Original link http://www-128.ibm.com/developerworks/cn/linux/shell/sed/sed-1/index.html
Common threads -- Sed by example, Part 1
Daniel Robbins, President/CEO, Gentoo Technologies, Inc.
October 2001
In this article series, Daniel Robbins will show you how to use the very powerful (but often forgotten) UNIX stream editor sed. sed is an ideal tool for editing files in batch mode or creating shell scripts in a very efficient way to modify existing files.
Choosing an Editor
There are many text editors available in the UNIX world for us to choose from. Think about it -- vi, emacs, jed, and many other tools will come to mind. We all have editors that we have gradually come to know and love (and our favorite key combinations). With a reliable editor, we can easily handle any number of UNIX-related administrative or programming tasks.
While interactive editors are great, they do have limitations. Their interactive nature can be a strength, but it can also be a weakness. Consider a situation where you need to make similar changes to a set of files. You might instinctively fire up your favorite editor and then perform a series of tedious, repetitive, and time-consuming edits by hand. However, there is a better way.
Enter sed
It would be nice if we could automate the process of editing files, so that we could edit them in "batch" mode or even write scripts that make complex changes to existing files. Fortunately, there is a tool for exactly this situation, and it is called "sed".
sed is a lightweight stream editor included with almost all UNIX flavors, including Linux. sed has many nice features. First of all, it is very small, usually much smaller than your favorite scripting language. Second, because sed is a stream editor, it can edit data it receives from standard input, such as from a pipe. Therefore, you do not need to store the data to be edited in a file on disk. Because data can just as easily be piped to sed, it is easy to use sed as one stage of a long, complex pipeline in a powerful shell script. Try doing that with your favorite editor.
GNU sed
Fortunately for Linux users, one of the best versions of sed happens to be GNU sed, whose current version is 3.02. Every Linux distribution has (or at least should have) GNU sed. GNU sed is popular not only because its source code can be freely distributed, but also because it happens to have many convenient and time-saving extensions to the POSIX sed standard. In addition, GNU sed does not have many of the limitations of earlier, proprietary versions of sed, such as a limited line length -- GNU sed can easily handle lines of any length.
Latest GNU sed
When I was researching this article, I noticed that several online sed enthusiasts mentioned GNU sed 3.02a. Strangely, sed 3.02a cannot be found on ftp.gnu.org (see Resources for these links), so I had to look elsewhere. I found it in /pub/sed on alpha.gnu.org. So I happily downloaded it, compiled it, and installed it, and a few minutes later I found that the latest version of sed was 3.02.80 -- its source code can be found next to the 3.02a source code on alpha.gnu.org. After installing GNU sed 3.02.80, I was completely ready.
alpha.gnu.org (see Resources) is the home of new and experimental GNU source code. However, you will also find many excellent, stable source code there. For some reason, either many GNU developers have forgotten to move the stable source code to ftp.gnu.org, or their "beta" period is exceptionally long (2 years!). For example, sed 3.02a has been around for two years, and even 3.02.80 has been around for a year, but they are still not available on ftp.gnu.org (when this article was written in August 2000).
Proper sed
In this series, we will use GNU sed 3.02.80. In the upcoming subsequent articles in this series, some (but very few) of the most advanced examples will not work in GNU sed 3.02 or 3.02a. If you are not using GNU sed, the results may be different. Why not take some time now to install GNU sed 3.02.80? That way, you will not only be ready for the rest of this series, but you will also be using what may be the best sed available at present.
sed Examples
sed works by performing any number of user-specified editing operations ("commands") on the input data. sed is line-based, so it executes commands on each line in sequence. sed then writes its results to standard output (stdout), and it does not modify any input files.
Let's look at some examples. The first few will be a bit strange because I will use them to demonstrate how sed works, not to perform any useful tasks. However, if you are new to sed, it is very important to understand them. Here is the first example:
$ sed -e 'd' /etc/services
If you enter this command, you will get no output. So, what happened? In this example, sed is called with an editing command 'd'. sed opens the /etc/services file, reads a line into its pattern buffer, executes the edit command ("delete the line"), and then prints the pattern buffer (the buffer is empty). Then, it repeats these steps for each subsequent line. This produces no output because the "d" command removes each line in the pattern buffer!
There are a few other things to notice in this example. First, /etc/services is not modified at all. This is still because sed only reads the file specified on the command line and uses it as input -- it does not attempt to modify the file. The second thing to notice is that sed is line-based. The 'd' command does not simply tell sed to delete all input data at once. Instead, sed reads each line of /etc/services into its internal buffer called the pattern buffer line by line. Once a line is read into the pattern buffer, it executes the 'd' command and then prints the contents of the pattern buffer (there is nothing in this example). I will show you later how to use address ranges to control which lines the commands are applied to -- but if no address is used, the command will be applied to all lines.
The third thing to notice is the use of single quotes to enclose the 'd' command. It is a good idea to get into the habit of using single quotes to enclose sed commands so that shell expansion is disabled.
Another sed example
Here is an example of using sed to remove the first line of the /etc/services file from the output stream:
$ sed -e '1d' /etc/services | more
As you can see, this command is very similar to the first 'd' command except that there is a '1' in front. If you guessed that '1' refers to the first line, you are correct. Unlike the first example, where only 'd' is used, this time an optional numeric address is used in front of 'd'. By using an address, you can tell sed to edit only certain specific lines.
Address Ranges
Now, let's see how to specify an address range. In this example, sed will delete lines 1 to 10 of the output:
$ sed -e '1,10d' /etc/services | more
When two addresses are separated by a comma, sed will apply the subsequent command to the range from the first address to the second address. In this example, the 'd' command is applied to lines 1 to 10 (inclusive). All other lines are ignored.
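The three address forms seen so far can be compared side by side with a small sketch. The five sample lines are an assumption made for illustration; they stand in for /etc/services so the results are easy to read.

```shell
sample=$(printf 'line 1\nline 2\nline 3\nline 4\nline 5')

# No address: 'd' applies to every line, so nothing is printed.
none=$(printf '%s\n' "$sample" | sed -e 'd')

# Single numeric address: only line 1 is deleted.
first=$(printf '%s\n' "$sample" | sed -e '1d')

# Address range: lines 1 through 3 (inclusive) are deleted.
range=$(printf '%s\n' "$sample" | sed -e '1,3d')

printf 'no address: [%s]\n' "$none"
printf '1d:\n%s\n' "$first"
printf '1,3d:\n%s\n' "$range"
```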
Addresses with Regular Expressions
Now let's demonstrate a more useful example. Suppose you want to view the contents of the /etc/services file, but you are not interested in viewing the comment sections included in it. As you know, comments can be placed in the /etc/services file by lines starting with the '#' character. To avoid comments, we want sed to delete lines that start with '#'. Here's how:
$ sed -e '/^#/d' /etc/services | more
Try this example and see what happens. You will notice that sed successfully accomplishes the intended task. Now, let's analyze what happened.
To understand the '/^#/d' command, first we need to parse it. First, let's remove 'd' -- this is the same delete line command we used earlier. The new addition is the '/^#/' part, which is a new regular expression address. Regular expression addresses are always enclosed in slashes. They specify a pattern, and the command that follows the regular expression address will only apply to lines that exactly match that particular pattern.
So, '/^#/' is a regular expression. But what does it do? Obviously, it's time to review regular expressions now.
Review of Regular Expressions
Regular expressions can be used to represent patterns that may be found in text. Have you used the '*' character on the shell command line? This usage is similar to regular expressions, but not the same. Here are the special characters that can be used in regular expressions:
Character Description
^ Matches the start of a line
$ Matches the end of a line
. Matches any single character
* Matches zero or more occurrences of the previous character
[ ] Matches any character within the [ ]
Probably the best way to get a feel for regular expressions is to look at a few examples. All of these examples will be accepted by sed as valid addresses that appear on the left side of the command. Here are a few examples:
Regular Expression Description
/./ Matches any line that contains at least one character
/../ Matches any line that contains at least two characters
/^#/ Matches any line that starts with '#'
/^$/ Matches all empty lines
/}$/ Matches any line that ends with '}' (no spaces)
/} *$/ Matches any line that ends with '}' followed by zero or more spaces
/[abc]/ Matches any line that contains lowercase 'a', 'b', or 'c'
/^[abc]/ Matches any line that starts with 'a', 'b', or 'c'
I encourage you to try a few of these examples. Spend some time getting familiar with regular expressions, and then try a few of your own. You can use a regexp as follows:
$ sed -e '/regexp/d' /path/to/my/test/file | more
This will cause sed to delete any matching lines. However, it is more helpful to get familiar with regular expressions by telling sed to print the regexp match and delete the non-matching content, rather than the other way around. You can do this with the following command:
$ sed -n -e '/regexp/p' /path/to/my/test/file | more
Note the new '-n' option, which tells sed not to print the pattern space unless explicitly asked to. You will also notice that we replaced the 'd' command with the 'p' command, which, as you might guess, explicitly asks sed to print the pattern space. That's it, only the matching part will be printed.
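The difference between deleting matches and printing only matches can be seen in a short sketch. The four sample lines below are invented for illustration and imitate the style of /etc/services.

```shell
sample=$(printf '# a comment\nftp 21/tcp\n# another comment\nssh 22/tcp')

# Delete matching lines; everything else is printed by default.
without_comments=$(printf '%s\n' "$sample" | sed -e '/^#/d')

# -n suppresses the default output; 'p' prints only the matching lines.
only_comments=$(printf '%s\n' "$sample" | sed -n -e '/^#/p')

printf '%s\n' "$without_comments"
printf '%s\n' "$only_comments"
```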
More on Addresses
So far, we have seen line addresses, line range addresses, and regexp addresses. But there are more possibilities. We can specify two regular expressions separated by a comma, and sed will match all lines from the first line that matches the first regular expression to the line that matches the second regular expression (inclusive). For example, the following command will print the text block starting from the line containing "BEGIN" and ending with the line containing "END":
$ sed -n -e '/BEGIN/,/END/p' /my/test/file | more
If "BEGIN" is not found, no data will be printed. If "BEGIN" is found but "END" is not found in all subsequent lines, all subsequent lines will be printed. This happens because of sed's stream-oriented nature -- it doesn't know if "END" will appear.
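The three cases described above can be reproduced with tiny inputs. This is a minimal sketch; the one-letter filler lines are made up for the demonstration.

```shell
# Both markers present: the block is printed, markers included.
both=$(printf 'a\nBEGIN\nb\nEND\nc\n' | sed -n -e '/BEGIN/,/END/p')

# "END" never appears: everything from "BEGIN" to the end of input is printed.
no_end=$(printf 'a\nBEGIN\nb\nc\n' | sed -n -e '/BEGIN/,/END/p')

# "BEGIN" never appears: nothing is printed.
no_begin=$(printf 'a\nb\nEND\nc\n' | sed -n -e '/BEGIN/,/END/p')

printf 'both:\n%s\nno END:\n%s\nno BEGIN: [%s]\n' "$both" "$no_end" "$no_begin"
```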
C Source Code Example
If you just want to print the main() function in a C source file, you can enter:
$ sed -n -e '/main[[:space:]]*(/,/^}/p' sourcefile.c | more
This command has two regular expressions, '/main[[:space:]]*(/' and '/^}/', and one command, 'p'. The first regular expression matches the string "main" followed by any number of spaces or tabs, followed by an opening parenthesis. This should match the start of a typical ANSI C main() declaration.
In this particular regular expression, we encounter the '[[:space:]]' character class. This is simply a special keyword that tells sed to match whitespace, such as a TAB or a space. If you wanted, instead of specifying '[[:space:]]', you could have typed '[', then a literal space, then Control-V, then a literal tab and ']' -- the Control-V tells bash that you want to insert a "real" tab rather than perform command expansion. It is clearer, especially in scripts, to use the '[[:space:]]' character class.
OK, now on to the second regexp. '/^}/' matches a '}' character that appears at the beginning of a new line. If your code is formatted nicely, this will match the closing brace of your main() function. If it is not, it won't match properly -- one of the tricky parts of pattern matching.
Because we are in '-n' quiet mode, the 'p' command still does its usual job, which is to explicitly tell sed to print the line. Try running this command on a C source file -- it should output the entire main() { } block, including the starting "main()" and the closing '}'.
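Here is a sketch of that command run against a tiny C file. The file contents are hypothetical and deliberately well formatted, with a helper function placed before main() to show that its closing brace does not interfere.

```shell
src=$(mktemp)
cat > "$src" <<'EOF'
#include <stdio.h>

static int helper(void)
{
    return 0;
}

int main (void)
{
    printf("hello\n");
    return helper();
}
EOF

# Print from the line containing "main", optional whitespace and "(" through
# the first following line that begins with "}".
main_block=$(sed -n -e '/main[[:space:]]*(/,/^}/p' "$src")
printf '%s\n' "$main_block"
rm -f "$src"
```

If the closing brace of main() were indented, the range would run on to the next line starting with '}' or to the end of the file.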
Common threads -- Sed by example, Part 2
Content:
Substitution!
Regular Expression Mayhem
More Character Matching
Advanced Substitution Features
Those Wonderful Parentheses with Backslashes
Combining Commands
Multiple Commands for One Address
Appending, Inserting, and Changing Lines
Next Part
Resources
October 2001
sed is a very powerful and compact text stream editor. In the second article of this series, Daniel Robbins shows you how to use sed to perform string substitution, create larger sed scripts, and how to use sed's appending, inserting, and changing line commands.
sed is a useful (but often forgotten) UNIX stream editor. It is an ideal tool for editing files in batch mode or creating shell scripts in an efficient way to modify existing files. This article is a continuation of the previous article introducing sed.
Substitution!
Let's take a look at one of the most useful commands in sed, the substitution command. With this command, you can replace a specific string or a matching regular expression with another string. Here is an example of the most basic usage of this command:
$ sed -e 's/foo/bar/' myfile.txt
The above command replaces the first occurrence (if any) of 'foo' in each line of myfile.txt with the string 'bar', and then outputs the contents of the file to standard output. Please note that I said the first occurrence per line, although this is usually not what you want. When performing string substitution, you usually want to perform a global substitution. That is, you want to replace all occurrences in each line, as follows:
$ sed -e 's/foo/bar/g' myfile.txt
The 'g' option appended after the last slash tells sed to perform a global substitution.
There are a few other things to know about the 's///' substitution command. First, it is a command, and it is just a command, and no address is specified in all the above examples. This means that 's///' can also be used with an address to control which lines the command is applied to, as follows:
$ sed -e '1,10s/enchantment/entrapment/g' myfile2.txt
The above example will cause all occurrences of the phrase 'enchantment' to be replaced with the phrase 'entrapment', but only on lines 1 to 10 (inclusive).
$ sed -e '/^$/,/^END/s/hills/mountains/g' myfile3.txt
This example will replace 'hills' with 'mountains', but only on the text block starting from an empty line and ending with a line starting with the three characters 'END' (inclusive).
Another nice thing about the 's///' command is that we have many options for the '/' separator. If you are performing a string substitution and the regular expression or replacement string has a lot of slashes in it, you can change the separator by specifying a different character after the 's'. For example, the following replaces all occurrences of /usr/local with /usr:
$ sed -e 's:/usr/local:/usr:g' mylist.txt
In this example, a colon is used as the delimiter. If you need to specify a delimiter character in the regular expression, you can precede it with a backslash.
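The substitution variants discussed so far can be compared on one line of input. This is a small sketch; the sample text in `line` is made up for the purpose.

```shell
line='foo foo /usr/local/bin'

# Without 'g', only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed -e 's/foo/bar/')

# With 'g', every match on the line is replaced.
all=$(printf '%s\n' "$line" | sed -e 's/foo/bar/g')

# A ':' separator avoids escaping the slashes in the path ...
colon=$(printf '%s\n' "$line" | sed -e 's:/usr/local:/usr:g')

# ... which is what the default '/' separator would otherwise require.
escaped=$(printf '%s\n' "$line" | sed -e 's/\/usr\/local/\/usr/g')

printf '%s\n%s\n%s\n%s\n' "$first_only" "$all" "$colon" "$escaped"
```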
Regular Expression Mayhem
So far, we have only performed simple string substitutions. Although this is convenient, we can also match regular expressions. For example, the following sed command will match a phrase starting with '<' and ending with '>', and containing any number of characters in between. The following example will delete this phrase (replace it with an empty string):
$ sed -e 's/<.*>//g' myfile.html
This is the first good attempt at a sed script to remove HTML tags from a file, but it won't work well because of the peculiar rules of regular expressions. Why? When sed tries to match a regular expression in a line, it looks for the longest match in the line. In my previous sed article, this was not a problem because we were using 'd' and 'p' commands, which always delete or print the entire line. However, when using the 's///' command, there is a big difference because the entire part matched by the regular expression will be replaced by the target string, or, in this example, deleted. This means that the following line:
<b>This</b> is what <b>I</b> meant.
will become:
meant.
We don't want this, but rather:
This is what I meant.
Fortunately, there is an easy way to correct this problem. Instead of entering a regular expression that is "< followed by some characters and ending with >", we just need to enter a regular expression that is "< followed by any number of non-> characters and ending with >". This will match the shortest, not the longest, possibility. The new command is as follows:
$ sed -e 's/<[^>]*>//g' myfile.html
In the above example, '[^>]' specifies a "non->" character, and the '*' after it completes the expression to mean "zero or more non-> characters". Test this command on a few html files, pipe the output to "more", and review the results carefully.
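The difference between the two expressions is easy to reproduce on the sample line from the text. A minimal sketch:

```shell
html='<b>This</b> is what <b>I</b> meant.'

# '<.*>' takes the longest match: from the first '<' to the last '>' on the line.
greedy=$(printf '%s\n' "$html" | sed -e 's/<.*>//g')

# '<[^>]*>' cannot run past a '>', so each tag is matched on its own.
tags_only=$(printf '%s\n' "$html" | sed -e 's/<[^>]*>//g')

printf '[%s]\n[%s]\n' "$greedy" "$tags_only"
```

Note that the greedy result keeps the space that preceded "meant.", because only the text between the first '<' and the last '>' is removed.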
More Character Matching
There are some additional options for the '[ ]' regular expression syntax. To specify a range of characters, you can use a '-' as long as it is not in the first or last position, as follows:
'[a-x]*'
This matches zero or more characters, as long as all of them are 'a', 'b', 'c'...'v', 'w', 'x'. In addition, you can use the '[:space:]' character class to match whitespace. Here is a fairly complete list of the available character classes:
Character Class Description
[:alnum:] Alphanumeric
[:alpha:] Alphabetic
[:blank:] Space or tab
[:cntrl:] Any control character
[:digit:] Digit
[:graph:] Any visible character (no whitespace)
[:lower:] Lowercase
[:print:] Non-control character
[:punct:] Punctuation character
[:space:] Whitespace
[:upper:] Uppercase
[:xdigit:] Hexadecimal digit
It is beneficial to use character classes whenever possible because they adapt better to non-English locales (including some necessary accented characters, etc.).
Advanced Substitution Features
We have seen how to perform simple and even somewhat complex direct substitutions, but sed can do much more. In fact, you can reference part or all of the matched regular expression and use these parts to construct the replacement string. As an example, suppose you are replying to a message. The following example will prepend the phrase "ralph said: " to each line:
$ sed -e 's/.*/ralph said: &/' origmsg.txt
The output is as follows:
ralph said: Hiya Jim,
ralph said:
ralph said: I sure like this sed stuff!
ralph said:
The '&' character is used in the replacement string of this example, which tells sed to insert the entire matched regular expression. Therefore, you can insert anything matched by '.*' (the largest group or entire line of zero or more characters in the line) anywhere in the replacement string, even multiple times. This is very good, but sed is even more powerful.
Those Wonderful Parentheses with Backslashes
The 's///' command is even better than '&' because it allows us to define regions in the regular expression, which can then be referenced in the replacement string. As an example, suppose you have a file containing the following text:
foo bar oni
eeny meeny miny
larry curly moe
jimmy the weasel
Now suppose you want to write a sed script that will replace "eeny meeny miny" with "Victor eeny-meeny Von miny" and so on. To do this, first write a regular expression that matches three strings separated by spaces:
'.* .* .*'
Now, enclose each area of interest with backslash-enclosed parentheses to define the areas:
'\(.*\) \(.*\) \(.*\)'
This regular expression works the same as the first regular expression, except that it defines three logical areas that can be referenced in the replacement string. Here is the final script:
$ sed -e 's/\(.*\) \(.*\) \(.*\)/Victor \1-\2 Von \3/' myfile.txt
As you can see, each area delimited by parentheses is referenced by entering '\x' (where x is the area number, starting from 1). The output is as follows:
Victor foo-bar Von oni
Victor eeny-meeny Von miny
Victor larry-curly Von moe
Victor jimmy-the Von weasel
As you become more familiar with sed, you can do quite powerful text processing with minimal effort. You might be wondering how you would handle such a problem with a familiar scripting language -- can you easily implement such a solution in one line of code?
Combining Commands
When you start creating more complex sed scripts, you need to be able to enter multiple commands. There are several ways to do this. First, you can use semicolons between commands. For example, the following command series uses the '=' command and the 'p' command. The '=' command tells sed to print the line number, and the 'p' command explicitly tells sed to print the line (because it is in '-n' mode).
$ sed -n -e '=;p' myfile.txt
Whenever you specify two or more commands, each command is applied to each line of the file in sequence. In the above example, the '=' command is first applied to line 1, then the 'p' command is applied. Then, sed continues to process line 2 and repeats the process. Although semicolons are convenient, they don't work properly in some situations. Another alternative is to use two -e options to specify two different commands:
$ sed -n -e '=' -e 'p' myfile.txt
However, when using more complex append and insert commands, even multiple '-e' options won't help us. For complex multi-line scripts, the best approach is to put the commands in a separate file. Then, reference the script file with the -f option:
$ sed -n -f mycommands.sed myfile.txt
This approach may be less convenient, but it always works.
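The three ways of supplying several commands should give the same output for simple commands like '=' and 'p'. The sketch below checks that on two sample lines; the temporary file stands in for mycommands.sed and is created only for the demonstration.

```shell
sample=$(printf 'alpha\nbeta')

# 1. Commands separated by a semicolon.
with_semicolon=$(printf '%s\n' "$sample" | sed -n -e '=;p')

# 2. One -e option per command.
with_two_e=$(printf '%s\n' "$sample" | sed -n -e '=' -e 'p')

# 3. Commands stored one per line in a script file, loaded with -f.
script=$(mktemp)
printf '=\np\n' > "$script"
with_file=$(printf '%s\n' "$sample" | sed -n -f "$script")
rm -f "$script"

printf '%s\n' "$with_file"
```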
Multiple Commands for One Address
Sometimes, you may want to specify multiple commands to be applied to one address. This is especially convenient when performing many 's///' to transform words and syntax in a source file. To execute multiple commands on one address, enter the sed commands in a file, and then group these commands using '{}' characters, as follows:
1,20{
    s/[Ll]inux/GNU\/Linux/g
    s/samba/Samba/g
    s/posix/POSIX/g
}
The above example will apply three substitution commands to lines 1 to 20 (inclusive). You can also use regular expression addresses or a combination of both:
1,/^END/{
    s/[Ll]inux/GNU\/Linux/g
    s/samba/Samba/g
    s/posix/POSIX/g
    p
}
This example will apply all commands between '{}' to all lines starting from line 1 and ending with the line that starts with the letters "END" (if "END" is not found in the source file, it will end at the end of the file).
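Grouping can be tried on a three-line input with a shortened range. In this sketch the range '1,2' and the sample text are assumptions chosen to keep the output small; the third line shows what happens outside the address.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
1,2{
    s/samba/Samba/g
    s/posix/POSIX/g
}
EOF

# Lines 1 and 2 get both substitutions; line 3 is outside the range.
grouped=$(printf 'samba posix\nsamba\nsamba posix\n' | sed -f "$script")
printf '%s\n' "$grouped"
rm -f "$script"
```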
Appending, Inserting, and Changing Lines
Now that we are writing sed scripts in a separate file, we can take advantage of the append, insert, and change line commands. These commands will insert a line after the current line, insert a line before the current line, or replace the current line in the pattern space. They can also be used to insert multiple lines into the output. The insert line command is used as follows:
i\
This line will be inserted before each line
If you don't specify an address for this command, it will be applied to each line and produce output like the following:
This line will be inserted before each line
line 1 here
This line will be inserted before each line
line 2 here
This line will be inserted before each line
line 3 here
This line will be inserted before each line
line 4 here
If you want to insert multiple lines before the current line, you can add additional lines by appending a backslash after the previous line, as follows:
i\
insert this line\
and this one\
and this one\
and, uh, this one too.
The append command is used similarly, but it will insert one or more lines after the current line in the pattern space. Its usage is as follows:
a\
insert this line after each line. Thanks! :)
On the other hand, the "change line" command will actually replace the current line in the pattern space, and its usage is as follows:
c\
You're history, original line! Muhahaha!
Because the append, insert, and change line commands require multiple lines of input, you will enter them into a text sed script and then tell sed to execute them by using the '-f' option. Using other methods to pass commands to sed will cause problems.
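The three commands can be seen together in one script file. This is a sketch only: the numeric addresses and the inserted text are invented so that each command acts on a different line of a three-line input.

```shell
script=$(mktemp)
cat > "$script" <<'EOF'
1i\
inserted before line 1
2a\
appended after line 2
3c\
line 3 was replaced
EOF

edited=$(printf 'one\ntwo\nthree\n' | sed -f "$script")
printf '%s\n' "$edited"
rm -f "$script"
```

The here-document delimiter is quoted so the shell leaves the trailing backslashes alone and sed receives them as written.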
Common threads -- Sed by example, Part 3
Content:
Robust sed
Text Transformation
Reversing Lines
Reversal Explanation
sed QIF Magic
The Story of Two Formats
Starting the Process
Refinement
Ending Attempts
Don't Get Confused
Resources
October 2001
In this concluding article of the sed series, Daniel Robbins takes you through the true power of sed. After introducing several important sed scripts, he will demonstrate the writing of some basic sed scripts by converting a Quicken .QIF file into a readable text format. This conversion script is not only practical, but also an excellent example of demonstrating sed scripting capabilities.
Robust sed
In the second sed article, I provided some examples to demonstrate how sed works, but few of them actually do anything particularly useful. In this final article of the sed series, I will change that and use sed to do something practical. I will show you a few examples that not only demonstrate sed's capabilities, but also do something really clever (and convenient). For example, later in this article, I will show you how to design a sed script to convert a .QIF file from Intuit's Quicken financial program into a well-formatted text file. Before doing that, we will look at some less complex but useful sed scripts.
Text Transformation
The first practical script will convert UNIX-style text to DOS/Windows format. As you may know, text files based on DOS/Windows have a CR (carriage return) and LF (line feed) at the end of each line, while UNIX text has only a line feed. Sometimes you may need to move some UNIX text to a Windows system, and this script will perform the necessary format conversion for you.
$ sed -e 's/$/\r/' myunix.txt > mydos.txt
In this script, the '$' regular expression will match the end of the line, and '\r' tells sed to insert a carriage return before it. Inserting a carriage return before the line feed immediately causes each line to end with CR/LF. Please note that '\r' will be replaced with CR only if you are using GNU sed 3.02.80 or later. If you haven't installed GNU sed 3.02.80 yet, see the instructions on how to do so in my first sed article.
I can't remember how many times I have downloaded some example scripts or C code only to find that it is in DOS/Windows format. Although many programs don't care about CR/LF text files in DOS/Windows format, a few programs do -- the most famous one is bash, which will have problems as soon as it encounters a carriage return. The following sed call will convert text in DOS/Windows format to reliable UNIX format:
$ sed -e 's/.$//' mydos.txt > myunix.txt
The script is simple: the regular expression matches the last character of each line, which in a DOS/Windows file is the carriage return. We replace it with nothing, which removes it from the output. If you run this script and find that the last character of every line has disappeared, the input was already in UNIX format and did not need converting.
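As a quick check of the two conversions above, here is a minimal sketch that round-trips a two-line sample. The scratch directory and the roundtrip.txt name are made up for illustration, and it assumes GNU sed, since `\r` in an s command is a GNU extension. The reverse step uses `s/\r$//` rather than the article's `s/.$//`; this is a swapped-in variant that only strips a character when it really is a carriage return.

```shell
# Assumes GNU sed: \r is not understood by every sed implementation.
workdir=$(mktemp -d)                      # hypothetical scratch directory
printf 'one\ntwo\n' > "$workdir/myunix.txt"

# UNIX -> DOS: add a carriage return before each line feed.
sed -e 's/$/\r/' "$workdir/myunix.txt" > "$workdir/mydos.txt"

# DOS -> UNIX: remove a trailing carriage return only if one is present.
sed -e 's/\r$//' "$workdir/mydos.txt" > "$workdir/roundtrip.txt"

# cmp is silent and returns 0 when the two files are byte-identical.
cmp "$workdir/myunix.txt" "$workdir/roundtrip.txt" && echo "round trip OK"
```

Because the reverse command is anchored on `\r`, running it on a file that is already in UNIX format should leave the file unchanged.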
Reversing Lines
Here is another convenient little script. Like the "tac" command included with most Linux distributions, this script will reverse the order of lines in a file. The name "tac" may be misleading because "tac" does not reverse the position of characters in a line (left and right), but rather reverses the position of lines in a file (top and bottom). Using "tac" to process the following file:
foo
bar
oni
...will produce the following output:
oni
bar
foo
You can achieve the same purpose with the following sed script:
$ sed -e '1!G;h;$!d' forward.txt > backward.txt
If you log in to a FreeBSD system that happens to not have the "tac" command, you will find that this sed script is very useful. Although convenient, it is better to know why this script works that way. Let's discuss it.
Reversal Explanation
First, this script contains three separate sed commands separated by semicolons: '1!G', 'h', and '$!d'. To follow it, we need to understand the addresses on the first and third commands. If the first command were '1G', the 'G' command would be applied only to the first line. The '!' character negates the address, so 'G' is applied to every line except the first. The '$!d' command works the same way. If the command were '$d', 'd' would be applied only to the last line of the file (the '$' address is a simple way to specify the last line). With the '!', '$!d' applies 'd' to every line except the last. Now we need to understand what the commands themselves do.
When the reversal script is executed on the above text file, the first command to be executed is 'h'. This command tells sed to copy the contents of the pattern space (the buffer that holds the current line being processed) to the hold space (a temporary buffer). Then, the 'd' command is executed, which deletes "foo" from the pattern space so that it is not printed after all commands are executed for this line.
Now, for the second line. After "bar" is read into the pattern space, the 'G' command is executed, which appends the contents of the hold space ("foo\n") to the pattern space ("bar\n"), so the pattern space becomes "bar\nfoo\n". The 'h' command copies this back into the hold space for safekeeping, and then 'd' deletes the pattern space so that nothing is printed.
For the last "oni" line, the same steps are repeated except that the contents of the pattern space (three lines) are not deleted by 'd' (due to '$!') and are printed to standard output.
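To watch the hold-space shuffle described above, here is a small sketch that feeds the three sample lines through the script. The variable name is only for illustration.

```shell
# 1!G : on every line but the first, append the hold space to the pattern space
# h   : copy the pattern space (the lines reversed so far) to the hold space
# $!d : on every line but the last, delete the pattern space and print nothing
reversed=$(printf 'foo\nbar\noni\n' | sed -e '1!G;h;$!d')
printf '%s\n' "$reversed"
```

Only the last cycle prints anything, and by then the pattern space holds all three lines in reverse order.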
Now, you can use sed to perform some powerful data transformations.
sed QIF Magic
For the past few weeks, I have been wanting to buy a copy of Quicken to balance my bank account. Quicken is a very good financial program and will certainly do the job successfully. However, after thinking about it, I felt that I could easily write some software to balance my checkbook. I thought, after all, I am a software developer!
I developed a nice little checkbook balancing program (using awk) that calculates the balance by parsing a text file containing all my transactions. After some tweaking, I improved it so that I could track different credit and debit categories, as Quicken does. But there was one more feature I wanted to add. I had recently moved my account to a bank with an online Web interface, and one day I noticed that the bank's Web site let me download my account information in Quicken's .QIF format. I immediately thought it would be great if I could convert this information into my text format.
The Story of Two Formats
Before looking at the QIF format, let's look at my checkbook.txt format:
28 Aug 2000	food	-	-	Y	Supermarket	30.94
25 Aug 2000	watr	-	103	Y	Check 103	52.86
In my file, all fields are separated by one or more tabs, and each transaction occupies one line. The next field after the date lists the expenditure type (if it is an income item, it is "-"). The third field lists the income type (if it is an expenditure item, it is "-"). Then, there is a check number field (if it is empty, it is still "-"), a transaction completion field ("Y" or "N"), a comment, and a dollar amount field. Now, let's look at the QIF format. When you view the downloaded QIF file with a text viewer, it looks like the following:
!Type:Bank
D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET
^
D08/28/2000
T-8.25
N
PCHECKCARD PUNJAB RESTAURANT
^
D08/28/2000
T-17.17
N
PCHECKCARD SUPERMARKET
After browsing the file, it's not difficult to guess its format -- ignoring the first line, the rest are formatted as follows:
D<date>
T<transaction amount>
N<check number>
P<description>
^ (This is the field separator)
Starting the Process
When you take on a sed project of this size, don't get discouraged: sed lets you massage the data into its final form gradually. As you go along, you can keep refining the sed script until the output is exactly what you expect. There is no need to get it completely right on the first try.
To get started, first create a file named "qiftrans.sed" and start modifying the data:
1d
/^^/d
s/[[:cntrl:]]//g
The first command, '1d', deletes the first line, and the second removes those annoying '^' separator lines from the output. The last line strips any control characters that may be present in the file. Since we are dealing with a foreign file format, I want to eliminate the risk of running into control characters along the way. So far, so good. Now, add some processing to this basic script:
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
  s/^D\(.*\)/\1\tOUTY\tINNY\t/
  s/^01/Jan/
  s/^02/Feb/
  s/^03/Mar/
  s/^04/Apr/
  s/^05/May/
  s/^06/Jun/
  s/^07/Jul/
  s/^08/Aug/
  s/^09/Sep/
  s/^10/Oct/
  s/^11/Nov/
  s/^12/Dec/
  s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
}
First, add a '/^D/' address so that sed will only start processing when it encounters the first character 'D' of the QIF data field. When sed reads such a line into its pattern space, it will execute all commands in the braces in sequence.
The first command in the braces will transform a line like this:
D08/28/2000
into:
08/28/2000 OUTY INNY
Of course, the format is not perfect yet, but that's okay. We will gradually refine the contents of the pattern space as we go. The net effect of the next 12 lines is to turn the month number into a three-letter abbreviation, and the last line removes the slashes from the date and swaps the month and day. We end up with this line:
28 Aug 2000 OUTY INNY
The OUTY and INNY fields are placeholders that will be replaced later. We can't determine them yet because if the dollar amount is negative, we will set OUTY and INNY to "misc" and "-", but if the dollar amount is positive, we will change them to "-" and "inco" respectively. Since we haven't read the dollar amount yet, we need to use placeholders for now.
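The first stage can be tried on a single date line. This is a minimal sketch, assuming GNU sed (the `\t` escape in the replacement is a GNU extension); only the August substitution is included because the sample date is in August, and the variable name is hypothetical.

```shell
# Stage 1 on one QIF date line: strip the leading D, add the placeholders,
# replace the month number, then reorder month/day and drop the slashes.
stage1=$(printf 'D08/28/2000\n' | sed \
  -e 's/^D\(.*\)/\1\tOUTY\tINNY\t/' \
  -e 's/^08/Aug/' \
  -e 's:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:')
printf '%s\n' "$stage1"
```

The result is the date followed by the two tab-separated placeholders and a trailing tab, ready for the later stages to fill in.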
Refinement
Now further refine:
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
  s/^D\(.*\)/\1\tOUTY\tINNY\t/
  s/^01/Jan/
  s/^02/Feb/
  s/^03/Mar/
  s/^04/Apr/
  s/^05/May/
  s/^06/Jun/
  s/^07/Jul/
  s/^08/Aug/
  s/^09/Sep/
  s/^10/Oct/
  s/^11/Nov/
  s/^12/Dec/
  s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
  N
  N
  N
  s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
  s/NUMNUM/-/
  s/NUM\([0-9]*\)NUM/\1/
  s/\([0-9]\),/\1/
}
The last seven lines are a bit complex, so let's discuss them in detail. First come three consecutive 'N' commands. The 'N' command tells sed to read the next line of input and append it to the current pattern space. These three 'N' commands append the next three lines to the pattern space, which now looks like this:
28 Aug 2000 OUTY INNY \nT-8.15\nN\nPCHECKCARD SUPERMARKET
sed's pattern space becomes messy -- we need to remove the extra newlines and perform some additional formatting. To do this, we will use the substitution command. The pattern to match is:
'\nT.*\nN.*\nP.*'
This will match a newline followed by 'T', zero or more characters, a newline, 'N', any number of characters, a newline, 'P', and any number of characters. Yikes! This regular expression will match the entire content of the three lines just appended to the pattern space. But we want to reformat this area, not replace it entirely. The dollar amount, check number (if any), and description need to appear in the replacement string. To do this, we enclose those "interesting parts" with backslash-enclosed parentheses so that we can reference them in the replacement string (using '\1', '\2', and '\3' to tell sed where to insert them). Here is the final command:
s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
This command transforms our line into:
28 Aug 2000 OUTY INNY NUMNUM Y CHECKCARD SUPERMARKET AMT-8.15AMT
Although the line is getting better, a few things look a bit... ah... interesting. First is that silly "NUMNUM" string: what is it for? The two lines that follow the substitution in the sed script answer that. They replace "NUMNUM" with "-" and replace "NUM"<number>"NUM" with <number>. As you can see, wrapping the check number in silly markers lets us easily insert a "-" when the field is empty.
Finishing Touches
The last line removes a comma that follows a digit. It turns a dollar amount such as "3,231.00" into "3231.00", the format I use. Now, let's look at the final script:
Final "QIF to Text" Script
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
  s/^D\(.*\)/\1\tOUTY\tINNY\t/
  s/^01/Jan/
  s/^02/Feb/
  s/^03/Mar/
  s/^04/Apr/
  s/^05/May/
  s/^06/Jun/
  s/^07/Jul/
  s/^08/Aug/
  s/^09/Sep/
  s/^10/Oct/
  s/^11/Nov/
  s/^12/Dec/
  s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
  N
  N
  N
  s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
  s/NUMNUM/-/
  s/NUM\([0-9]*\)NUM/\1/
  s/\([0-9]\),/\1/
  /AMT-[0-9]*.[0-9]*AMT/b fixnegs
  s/AMT\(.*\)AMT/\1/
  s/OUTY/-/
  s/INNY/inco/
  b done
:fixnegs
  s/AMT-\(.*\)AMT/\1/
  s/OUTY/misc/
  s/INNY/-/
:done
}
The additional eleven lines use substitution and some branching functions to beautify the output. First, look at this line:
/AMT-[0-9]*.[0-9]*AMT/b fixnegs
This line contains a branch command in the format "/regexp/b label". If the pattern space matches the regular expression, sed will branch to the fixnegs label. You should be able to easily find this label, which is ":fixnegs" in the code. If the regular expression does not match, processing continues in the normal way.
Now that you understand how the command itself works, let's look at the branch. The branch regular expression matches the string 'AMT' followed by '-', any number of digits, a '.', any number of digits, and 'AMT'. As I'm sure you've guessed, this regular expression exists specifically to handle negative dollar amounts. Earlier, the dollar amount was wrapped in 'AMT' markers so that it could easily be found later. Since the regular expression only matches dollar amounts that start with '-', the branch is taken only when we are dealing with a debit. For a debit, OUTY should be set to 'misc', INNY should be set to '-', and the negative sign in front of the amount should be removed. If you follow the flow of the code, you will see that this is exactly what happens. If the branch is not taken, OUTY is replaced with '-' and INNY with 'inco'. Done! Now the output line is perfect:
28 Aug 2000 misc - - Y CHECKCARD SUPERMARKET 8.15
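To try the whole conversion end to end, the sketch below writes the script and a small sample to a scratch directory and runs them. Several things here are assumptions rather than the article's text: the bracket expressions (`[[:cntrl:]]`, `[0-9]`) are reconstructed from the prose descriptions, the second (credit) record and the file and variable names are invented for illustration, and GNU sed is assumed because of the `\t` escapes.

```shell
workdir=$(mktemp -d)                       # hypothetical scratch directory

# Sample input modelled on the QIF excerpt; the second record is invented.
cat > "$workdir/sample.qif" <<'EOF'
!Type:Bank
D08/28/2000
T-8.15
N
PCHECKCARD SUPERMARKET
^
D08/29/2000
T1,250.00
N103
PPAYCHECK
^
EOF

# The conversion script, one command per line.
cat > "$workdir/qiftrans.sed" <<'EOF'
1d
/^^/d
s/[[:cntrl:]]//g
/^D/ {
  s/^D\(.*\)/\1\tOUTY\tINNY\t/
  s/^01/Jan/
  s/^02/Feb/
  s/^03/Mar/
  s/^04/Apr/
  s/^05/May/
  s/^06/Jun/
  s/^07/Jul/
  s/^08/Aug/
  s/^09/Sep/
  s/^10/Oct/
  s/^11/Nov/
  s/^12/Dec/
  s:^\(.*\)/\(.*\)/\(.*\):\2 \1 \3:
  N
  N
  N
  s/\nT\(.*\)\nN\(.*\)\nP\(.*\)/NUM\2NUM\t\tY\t\t\3\tAMT\1AMT/
  s/NUMNUM/-/
  s/NUM\([0-9]*\)NUM/\1/
  s/\([0-9]\),/\1/
  /AMT-[0-9]*.[0-9]*AMT/b fixnegs
  s/AMT\(.*\)AMT/\1/
  s/OUTY/-/
  s/INNY/inco/
  b done
:fixnegs
  s/AMT-\(.*\)AMT/\1/
  s/OUTY/misc/
  s/INNY/-/
:done
}
EOF

# One tab-separated output line per transaction.
converted=$(sed -f "$workdir/qiftrans.sed" "$workdir/sample.qif")
printf '%s\n' "$converted"
```

The debit record takes the fixnegs branch and loses its minus sign; the credit record falls through, gets 'inco' in the income field, and has the comma removed from its amount.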
Don't Get Confused
As you can see, converting data with sed is not that difficult as long as you solve the problem step by step. Don't try to use one sed command or solve all problems at once. Instead, proceed towards the goal step by step and continuously refine the sed script until its output is exactly as you want. sed has many features, and I hope you are already very familiar with its internal working principles and continue to work hard to master it further!
Resources
You can refer to the English original of this article on the developerWorks global site.
Read other sed articles by Daniel on developerWorks: General Thread: sed Examples, Part 2 and Part 3.
View Eric Pement's excellent sed FAQ.
You can find sed 3.02 resources at ftp://ftp.gnu.org/pub/gnu/sed.
You will find the excellent new sed 3.02.80 at alpha.gnu.org.
In addition, Eric Pement also has some convenient sed one-liners, which any aspiring sed master should take a look at.
If you want to read a good old-fashioned book, O'Reilly's sed & awk, 2nd Edition will be an excellent choice.
You may want to read the 7th edition UNIX's sed man page (approximately 1978!).
Read Felix von Leitner's short sed tutorial.
Read David Mertz's "Text processing in Python" on developerWorks.
About Regular Expressions:
Brush up on finding and modifying patterns in text with Using Regular Expressions, a free dW-exclusive tutorial.
View the regular expression how-to document at Python.org.
Refer to the overview of regular expressions at the University of Kentucky in the United States.
Last edited by 无奈何 on 2006-10-26 at 01:54 PM ]
|

|
|
2006-10-26 13:20 |
|
|
无奈何
荣誉版主
      
积分 1338
发帖 356
注册 2005-7-15
状态 离线
|
『Post 8』:
Summary of sed usage (compiled by the author)
Repost Note: Original link http://bbs.chinaunix.net/archiver/?tid-691881.html
sed Usage:
sed 'Command' filename(s) Only display the result without modifying the file.
1、sed '2,5d' file Display file without lines 2-5. No error is reported if the range extends past the last line of the file.
sed '/10/d' file Display file without the lines that contain "10" (for example, lines containing 101-104).
sed '2,$d' file Display only the first line of the file. sed '2,$!d' file displays every line except the first.
sed '/^ *$/d' file Delete blank lines from the file.
2、sed -n '/10/p' file Display only the lines of file that contain "10" (for example, 101-104). (-n and p must be used together; with p alone, the whole file is printed and each matching line is printed twice.)
sed -n '5p' file Only display the 5th line of the file
3、sed 's/moding/moden/g' file Replace moding with moden
4、sed -n 's/^west/north/p' file Replace west at the start of a line with north, and display only the lines that were changed.
5、sed 's/[0-9][0-9][0-9]$/&.5/' file On lines that end with three digits, append ".5" to those digits; & stands for the matched string.
6、sed 's/\(mod\)ing/\1en/g' file Capture mod as group 1 with escaped parentheses, then reuse it in the replacement.
sed 's/...$//' file Delete the last three characters of each line.
sed 's/^...//' file Delete the first three characters of each line.
7、sed 's#moding#moden#g' file Replace moding with moden, # after s represents the delimiter between the search string and the replacement string.
8、sed -n '/101/,/105/p' file Display from the matching line of 101 to the matching line of 105. If only the matching line of 101 is found, then from the matching line of 101 to the end of the file.
sed -n '2,/999/p' file Display from line 2 to the matching line.
9、sed '/101/,/105/s/$/ 20050119/' file Add " 20050119" content at the end of lines from the matching line of 101 to the matching line of 105.
10、sed -e '1,3d' -e 's/moding/moden/g' file First delete lines 1-3 of the file, then perform replacement.
sed -e '/^#/!d' file Display lines starting with # in the file.
11、sed '/101/r newfile' file Add the content of file newfile at each matching line
sed '/101/w newfile' file Write matching lines to newfile.
12、sed '/101/a\
> ###' file Add a new line after the matching line.
sed '/101/i\
> ###' file Add a new line before the matching line.
sed '/101/c\
> ###' file Replace the matching line with a new line.
13、sed 'y/abcd/ABCD/' file Replace a, b, c, d with ABCD respectively.
14、sed '5q' file Display the first 5 lines, then quit.
15、sed '/101/{ n; s/moding/moden/g; }' file After finding the matching line in the file, perform replacement on the next line (n).
sed '/101/{ s/moding/moden/g; q; }' file After finding the first matching line in the file, perform replacement and then exit.
16、sed -e '/101/{ h; d; }' -e '/104/{ G; }' file After finding the matching line with 101 in the file, first store it in a cache, then place it after the matching line with 104.
sed -e '/101/{ h; d; }' -e '/104/{ g; }' file After finding the matching line with 101 in the file, first store it in a cache, then replace the matching line with 104.
sed -e '/101/h' -e '$G' file Place the last matching line at the end of the file.
sed -e '/101/h' -e '$g' file Replace the last line of the file with the last matching line.
sed -e '/101/h' -e '/104/x' file Save the line matching 101 in the hold space, then exchange it with the line matching 104: the saved 101 line is printed in place of the 104 line, and the original 104 line is left in the hold space.
17、sed -f sfile file Operate according to the command list in file sfile.
cat sfile
/101/a\
####101####\
****101****
/104/c\
####104 deleted####\
****104 deleted****
1i\
####test####\
****test****
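A few of the commands in the list above can be tried against a small sample. This is only a sketch: the sample contents, the scratch directory, and the variable names are invented for illustration.

```shell
workdir=$(mktemp -d)                      # hypothetical scratch directory
printf '%s\n' '101 moding' '102 alpha' '103 beta' '104 gamma' '105 delta' \
  > "$workdir/file"

# Item 1: delete lines 2-5, so only the first line is left.
first_only=$(sed '2,5d' "$workdir/file")

# Item 5: append ".5" to a line that ends in three digits (& is the match).
with_half=$(printf 'total 123\n' | sed 's/[0-9][0-9][0-9]$/&.5/')

# Items 4 and 6: reuse the captured group \1 and print only changed lines.
renamed=$(sed -n 's/\(mod\)ing/\1en/gp' "$workdir/file")

# Item 16: hold the 101 line, delete it, and append it after the 104 line.
moved=$(sed -e '/101/{ h; d; }' -e '/104/{ G; }' "$workdir/file")

printf '%s\n' "$first_only" "$with_half" "$renamed" "$moved"
```

None of these commands modifies the sample file; each one only writes its result to standard output, as the first line of the summary says.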
Last edited by 无奈何 on 2006-10-26 at 02:03 PM ]
|

|
|
2006-10-26 13:20 |
|
|
无奈何
荣誉版主
      
积分 1338
发帖 356
注册 2005-7-15
状态 离线
|
『Post 9』:
Using loops in sed
Repost Note: Original link http://blah.blogsome.com/2006/04/01/sed_iter/
Using Loops in sed
Loops can usually be written in the form of conditional jumps. sed does not have a dedicated loop statement, but it provides jump commands, so we can still implement loops. This article summarizes several ways to use sed for looping. The way sed processes text itself is a kind of loop:
do while not EOF
read line
... do sth
end do.
1 Making Judgments in sed
Because sed only processes characters and line numbers, it can only make string matching judgments or line number judgments through patterns. So the judgment conditions need to appear in the form of strings or line numbers.
1.1 Storing Flag Bits in holdspace
Store a string as a flag bit in hs, such as:
# Perform 6 operations
1{x; s/^/654321/; x}
:a
x;
/./{ s/.//; x; s/reg/ex/; ba}
x;
do sth else;
# Perform 6 replacement operations on each line
1{x; s/^/654321/; x}
G
:a
/\n./{ s/reg/ex/; s/\n.$//; tb; s/.$//; ba}
:b
do sth else
1.2 Storing Flag Bits in pattern space
Use pattern space (the flag bit is attached to the front or back), which is basically the same as the hs method except that the flag bit is placed in ps.
1{ s/^/654321\n/ }
:a
/.\n/{ s/.//; s/reg/ex/; ba }
do sth else
1.3 Judging with Addresses
Addresses (commonly used addresses are 1,$). When the loop condition is related to addresses or line numbers, this method can be used.
sed '/./{H;d};x;/re/p' # Display a certain paragraph
2,8{H;d}; $G # Similar to :2,8m$ in ed. d has two roles here: a clear PS. b force to enter the next cycle.
1.4 Judging with Patterns
Use the content of the current ps as the standard for judgment. When the loop condition is related to the input content, this method can be used. This method is similar to the flag bit method. The difference is that we do not artificially set the flag but use the current ps content as the flag. Please refer to the following example:
:a
do sth with regexp
/regexp/ba
If there are s commands in the middle, we often use t to jump, so the above can be written as:
:a
s/regexp/blah/
ta
2 Command Analysis Commonly Used in Loops
q, b, t, T, d, n, N, :label
2.1 b/t Control Loops
b is the branch (jump) command in sed. Its format is `b label`, which can also be written without the space as `blabel`. It continues executing the script from :label. The label must be defined in the script by prefixing it with a colon (:), such as `:loop` or `:label`. Most sed implementations limit the length of labels; for the specific limits, refer to the sed FAQ. If b is not followed by a label, it jumps to the end of the script.
Note that many sed implementations do not allow another command to follow a branch or label on the same line, so:
gsed 'b abc;s/^/eee/;:abc;…'
Such a command in other versions of sed should be written as:
sed 'b abc
s/^/eee/
:abc'
or:
sed -e 'b abc' -e 's/^/eee/; :abc'
The t command is similar to b. The difference is that t determines whether to jump based on whether the previous s command is successful. If it is successful, it jumps. If multiple conditional jumps are used after one s command, the second and subsequent ts will all fail. GNU sed also provides the T command opposite to t.
$ echo abc|sed 's/a/a/;bb;:b;tc;s/^/zzz/;:c'
abc
$ echo abc|sed 's/a/a/;tb;:b;tc;s/^/zzz/;:c'
zzzabc
b/t control loop:
s/re/&/;t
s/re/&/;T
is equivalent to
/re/b
/re/!b
But in the following content, we will see that the combination of s/t can be used in some more complex situations.
When using b for looping, the following method is often used to exit the loop:
Use a pattern or address (loop condition) before b:
:a
...
/regexp/ba
In the above case, it is usually required that the intermediate statements modify /regexp/ or have another exit command, otherwise it will become an infinite loop.
2.1.0.1 Exit Conditions
When using b/t loops, in order to avoid infinite loops, exit conditions are usually set. Such as:
:a
...
/xxx/b
...
/regexp/ba
In addition to using b, t, N, q, d can also be used as exit commands, and patterns or line numbers can be used as exit conditions. When there is a successful replacement before, `t` can be used to jump to any position in the script - of course, it can also jump out of the loop body. Using N in the last line will exit the script. Using this, we can exit the loop body. However, the results of using `N` in the last line are different in different versions of sed - some will display the content of ps, and some will not. The `q` command is used to exit the script, so it will not loop again. The side effects of the `d` and `D` commands are to force the script to enter the next cycle, which allows us to use these two commands to form a loop between lines. If the next cycle does not meet the entry conditions of the loop, the loop stops - so these two commands are both commands for looping and exiting commands. Of course, patterns or line numbers can be used as conditions for running commands in front of these lines.
If the intermediate statement does not modify /regexp/, the result is similar:
:a
...
ba
Example: delete every xxx that follows the # sign in the input.
Input:
abc#efxxxghxxxxijxxx
Output:
abc#efghxij
sed ':a
s/\(#.*\)xxx/\1/ # Modify the loop flag
/#.*xxx/ba'
Of course, then t can come in handy:
sed ':a; s/\(#.*\)xxx/\1/; ta'
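Both forms of the loop above can be run on the sample input. The sketch below splits the script across -e options so that the label stands alone, which also suits sed versions that do not accept a command after a label; the variable names are only for illustration.

```shell
input='abc#efxxxghxxxxijxxx'

# Explicit test: branch back to :a while an xxx still follows the #.
with_b=$(printf '%s\n' "$input" | sed -e ':a' -e 's/\(#.*\)xxx/\1/' -e '/#.*xxx/ba')

# Same loop with t: branch back whenever the substitution succeeded.
with_t=$(printf '%s\n' "$input" | sed -e ':a' -e 's/\(#.*\)xxx/\1/' -e 'ta')

printf '%s\n%s\n' "$with_b" "$with_t"
```

Because `.*` is greedy, each pass removes the last remaining xxx, and the loop ends once no xxx is left after the #.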
Here are some examples:
echo -e "abc\nefg" | sed ':a;/re/b;ba'
echo -e "abc\nefg" | sed ':a;/re/!b;ba'
echo -e "abc\nefg" | sed ':a;s/re/&/;t;ba'
echo -e "abc\nefg" | gsed ':a;s/re/&/;T;ba'
To control the loop:
echo -e "abc\nefg" | sed ':a;/re/ba'
echo -e "abc\nefg" | sed ':a;/re/!ba'
echo -e "abc\nefg" | sed ':a;s/re/&/;ta'
echo -e "abc\nefg" | sed ':a;s/re/&/;Ta'
echo -e "abc\nefg" | sed '/re/b; :'
2.2 d/D Control Loops
Using d to control the loop is also a relatively common technique in sed scripts. The reason why d can be used to control the loop is mainly because after deleting, it will not execute the following commands and will directly enter the next cycle. Example:
sed 'd; s/^/abc/' # The s command will not be executed
2.3 Flag Bit Control Loop
We have already talked about some methods for making conditional judgments in sed. These methods can be used not only as conditions for entering the loop but also as conditions for exiting the loop. Here is an example:
G;s/$/123456789/ # Loop 9 times
:loop
s/\n$//;t break # Exit the loop
... do sth # Perform operations
s/.$// # Decrement by one
b loop # next
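Here is a runnable variant of the counter loop sketched above. It is an adaptation, not the original author's script: the counter is appended directly to the pattern space instead of coming from the hold space, it counts three iterations rather than nine, and the "do sth" step is filled in with a made-up action (indent by one space). It assumes GNU sed, where `\n` in the replacement inserts a newline.

```shell
# Counter "123" sits after a newline at the end of the pattern space.
# Each pass does the work (one space of indent) and removes one digit;
# when only the newline is left, branch out and clean it up.
indented=$(printf 'abc\nde\n' | sed \
  -e 's/$/\n123/' \
  -e ':loop' \
  -e '/\n$/b done' \
  -e 's/^/ /' \
  -e 's/.$//' \
  -e 'b loop' \
  -e ':done' \
  -e 's/\n$//')
printf '%s\n' "$indented"
```

Testing the exit condition with an address in front of b, rather than with s and t, keeps the loop independent of whether earlier substitutions succeeded.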
2.4 N as Exit Condition
N appends the next line to the current PS. If N is run on the last line, sed usually exits, so inside a loop it also ends the loop. Note that different sed versions handle N on the last line differently: some exit quietly, while others, such as GNU sed, print the contents of PS by default before exiting.
See the following example:
echo -e "abc\nefg" | sed ':a;ba'
echo -e "abc\nefg" | sed ':a;n;ba'
echo -e "abc\nefg" | sed ':a;N;ba'
echo -e "abc\nefg" | sed ':a;$!n;ba'
echo -e "abc\nefg" | sed ':a;$!N;ba'
In the first example, sed enters an infinite loop. The second and third examples exit normally: the second displays the input, while the behavior of the third depends on the sed version (GNU sed displays the input). The fourth and fifth examples also loop forever, because once the last line is reached $! skips n or N and ba branches back unconditionally. As mentioned earlier, $!N makes the result of N visible in every version of sed, but you still have to make sure the loop has a way to end.
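For contrast with the examples above, here is a sketch of a loop built on $!N that does terminate, because the branch back is also guarded by $!. It joins all input lines with commas; the sample input and variable name are made up for illustration.

```shell
# $!N  : append the next line unless we are already on the last one
# $!ba : loop back only while more input remains, so the loop always ends
# s    : once everything is collected, turn the embedded newlines into commas
joined=$(printf 'abc\nefg\nhij\n' | sed -e ':a' -e '$!N' -e '$!ba' -e 's/\n/,/g')
printf '%s\n' "$joined"
```

Since N is never executed on the last line, the result should not depend on how a given sed version treats N at end of input.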
3 Examples
There are many examples of using loops in sed. Here are two examples.
# An example in "sed one-line script"
# Right-align all text with a width of 79 characters
sed -e :a -e 's/^.\{1,78\}$/ &/;ta' # 78 characters plus a final space
# Move lines 2 to 8 to the end of the file
# Similar to :2,8m$ in ed.
# d has two roles here: a. clear PS. b. force to enter the next cycle.
# do while 2<=linenum<=8; H; end do
sed '2,8{H;d}; $G'
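Both examples can be tried on short inputs. In this sketch the range is 2,3 on a five-line input and the width is 5 instead of 79, purely to keep the output small; a `;` is added before the closing brace for seds that require it, and the variable names are hypothetical.

```shell
# Move lines 2-3 to the end. H puts a newline in front of each saved line,
# even when the hold space is empty, so a blank line separates the moved
# block from the original last line.
moved=$(printf '1\n2\n3\n4\n5\n' | sed -e '2,3{H;d;}' -e '$G')

# Right-align to width 5: keep prepending a space while the line is 1-4 long.
aligned=$(printf 'ab\nabcd\n' | sed -e ':a' -e 's/^.\{1,4\}$/ &/' -e 'ta')

printf '%s\n' "$moved" "$aligned"
```

If the blank line in front of the moved block is unwanted, one option is to strip the leading newline from the hold space before appending it, for example with `x;s/^\n//;x` on the last line.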
Last edited by 无奈何 on 2006-10-26 at 02:08 PM ]
|

|
|
2006-10-26 13:20 |
|
|
redtek
金牌会员
     
积分 2902
发帖 1147
注册 2006-9-21
状态 离线
|
『第 10 楼』:
sed is really powerful~ : )
|

Redtek,一个永远在网上流浪的人……
_.,-*~'`^`'~*-,.__.,-*~'`^`'~*-,._,_.,-*~'`^`'~*-,._,_.,-*~'`^`'~*-,._ |
|
2006-10-26 20:53 |
|
|
electronixtar
铂金会员
      
积分 7493
发帖 2672
注册 2005-9-2
状态 离线
|
『第 11 楼』:
GNU stuff is really good.
|

C:\>BLOG http://initiative.yo2.cn/
C:\>hh.exe ntcmds.chm::/ntcmds.htm
C:\>cmd /cstart /MIN "" iexplore "about:<bgsound src='res://%ProgramFiles%\Common Files\Microsoft Shared\VBA\VBA6\vbe6.dll/10/5432'>" |
|
2006-10-26 20:57 |
|
|
9527
Silver Member
Trying hard to be a bad guy
Points 1185
Posts 438
Registered 2006-8-28, from Beijing
Status: offline
|
『Post 12』:
Not learning such a powerful command would be letting the author down.................. sigh
|

|
2006-10-26 21:49 |
|
|
lxmxn
Moderator
Points 11386
Posts 4938
Registered 2006-7-23
Status: offline
|
『Post 13』:
Just looking at how long this is makes my head spin... sigh
|
|
2006-10-27 00:14 |
|
|
vkill
Gold Member
Points 4103
Posts 1744
Registered 2006-1-20, from Linze, Gansu
Status: offline
|
|
2006-10-27 01:36 |
|
|
chenall
Silver Member
Points 1276
Posts 469
Registered 2002-12-23, from Quanzhou, Fujian
Status: offline
|
『Post 15』:
SED One-Line Script Quick Reference
SED is very powerful, but reading it feels like deciphering hieroglyphics. I'll study it when I have time.
Here is another article I found that covers much the same ground.
The original text and versions in other languages are available at:
http://sed.sourceforge.net/sed1line_zh-CN.html
SED One-Line Script Quick Reference
-------------------------------------------------------------------------
SED One-Line Script Quick Reference (Unix Stream Editor) December 29, 2005
English title: USEFUL ONE-LINE SCRIPTS FOR SED (Unix stream editor)
Original title: HANDY ONE-LINERS FOR SED (Unix stream editor)
Compiled by: Eric Pement - Email: pemente[at]northpark[dot]edu Version 5.5
Translator: Joe Hong - Email: hq00e[at]126[dot]com
The latest (English) version of this document can be found at the following address:
http://sed.sourceforge.net/sed1line.txt
http://www.pement.org/sed/sed1line.txt
Other language versions:
Chinese - http://sed.sourceforge.net/sed1line_zh-CN.html
Czech - http://sed.sourceforge.net/sed1line_cz.html
Dutch - http://sed.sourceforge.net/sed1line_nl.html
French - http://sed.sourceforge.net/sed1line_fr.html
German - http://sed.sourceforge.net/sed1line_de.html
Portuguese - http://sed.sourceforge.net/sed1line_pt-BR.html
Text spacing:
--------
# Add a blank line after each line
sed G
# Delete all original blank lines and add a blank line after each line.
# This way, there will be exactly one blank line after each line in the output text.
sed '/^$/d;G'
# Add two blank lines after each line
sed 'G;G'
# Delete all blank lines generated by the first script (i.e., delete all even lines)
sed 'n;d'
# Insert a blank line before the line matching the pattern "regex"
sed '/regex/{x;p;x;}'
# Insert a blank line after the line matching the pattern "regex"
sed '/regex/G'
# Insert a blank line before and after the line matching the pattern "regex"
sed '/regex/{x;p;x;G;}'
Numbering:
--------
# Number each line in the file (simple left-aligned style). Here, a "tab" is used
# (tab, see the description of the usage of '\t' at the end of this article) instead of a space to align the edges.
sed = filename | sed 'N;s/\n/\t/'
# Number all lines in the file (line numbers on the left, right-aligned).
sed = filename | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'
# Number all lines in the file, but only display the line numbers of non-blank lines.
sed '/./=' filename | sed '/./N; s/\n/ /'
# Count the number of lines (simulate "wc -l")
sed -n '$='
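The numbering scripts above rely on the = command, which prints the line number on a line of its own, followed by a second sed that joins each number to its line. A sketch under the assumption that your sed may not understand '\t' in the replacement: the tab is produced by printf and passed in through double quotes. The function names are hypothetical.

```shell
# Hypothetical helpers illustrating the numbering scripts.
tab=$(printf '\t')

# "sed =" emits "1", "a", "2", "b", ...; the second sed reads each pair
# with N and replaces the embedded newline with a literal tab.
number_lines() {
    sed = | sed "N;s/\n/$tab/"
}

# $= prints the number of the last line, i.e. the line count.
count_lines() {
    sed -n '$='
}

printf 'a\nb\n' | number_lines
printf 'a\nb\nc\n' | count_lines
```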
Text conversion and substitution:
--------
# Unix environment: Convert DOS newline characters (CR/LF) to Unix format.
sed 's/.$//' # Assuming all lines end with CR/LF
sed 's/^M$//' # In bash/tcsh, press Ctrl-V then Ctrl-M to enter ^M
sed 's/\x0D$//' # ssed, gsed 3.02.80, and later versions
# Unix environment: Convert Unix newline characters (LF) to DOS format.
sed "s/$/`echo -e \\\r`/" # Command used under ksh
sed 's/$'"/`echo \\\r`/" # Command used under bash
sed "s/$/`echo \\\r`/" # Command used under zsh
sed 's/$/\r/' # gsed 3.02.80 and later versions
# DOS environment: Convert Unix newline characters (LF) to DOS format.
sed "s/$//" # Method 1
sed -n p # Method 2
# DOS environment: Convert DOS newline characters (CR/LF) to Unix format.
# The following script is only valid for UnxUtils sed 4.0.7 and later versions. To identify the UnxUtils version of
# sed, you can use its unique "--text" option. You can use the help option ("--help") to see
# if there is a "--text" item in it to determine whether the used version is UnxUtils. Other DOS
# versions of sed cannot perform this conversion. But it can be achieved with "tr".
sed "s/\r//" infile >outfile # UnxUtils sed v4.0.7 or later
tr -d \r <infile >outfile # GNU tr 1.22 or later
# Delete leading "whitespace characters" (spaces, tabs) from each line
# so that the text is left-aligned
sed 's/^[ \t]*//' # See the note on '\t' at the end of this article
# Delete trailing "whitespace characters" (spaces, tabs) from each line
sed 's/[ \t]*$//' # See the note on '\t' at the end of this article
# Delete both leading and trailing whitespace characters from each line
sed 's/^[ \t]*//;s/[ \t]*$//'
# Insert 5 spaces at the beginning of each line (shift the whole text 5 characters to the right)
sed 's/^/     /'
# Right-align all text with a width of 79 characters
sed -e :a -e 's/^.\{1,78\}$/ &/;ta' # 78 characters plus a final space
# Center all text with a width of 79 characters. In method 1, spaces are filled at the beginning and end of each line for centering. In method 2, spaces are only filled in front of the text during centering, and finally half of these spaces will be deleted. In addition, no spaces are filled at the end of each line.
sed -e :a -e 's/^.\{1,77\}$/ & /;ta' # Method 1
sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' # Method 2
# Find the string "foo" in each line and replace the found "foo" with "bar"
sed 's/foo/bar/' # Only replace the first "foo" string in each line
sed 's/foo/bar/4' # Only replace the fourth "foo" string in each line
sed 's/foo/bar/g' # Replace all "foo" in each line with "bar"
sed 's/\(.*\)foo\(.*foo\)/\1bar\2/' # Replace the second-to-last "foo"
sed 's/\(.*\)foo/\1bar/' # Replace the last "foo"
# Only replace "foo" with "bar" if the string "baz" appears in the line
sed '/baz/s/foo/bar/g'
# Replace "foo" with "bar" and only replace it if the string "baz" does not appear in the line
sed '/baz/!s/foo/bar/g'
# Replace "scarlet", "ruby", or "puce" with "red" regardless
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g' # Valid for most seds
gsed 's/scarlet\|ruby\|puce/red/g' # Only valid for GNU sed
# Reverse all lines, with the first line becoming the last line, and so on (simulate "tac").
# For some reason, HHsed v1.5 will delete blank lines in the file when using the following command
sed '1!G;h;$!d' # Method 1
sed -n '1!G;h;$p' # Method 2
# Reverse the characters in each line, so that the first character becomes the last, and so on (simulate "rev")
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'
# Concatenate every two lines into one line (similar to "paste")
sed '$!N;s/\n/ /'
# If the current line ends with a backslash "\", append the next line to the end of the current line
# And remove the original line ending backslash
sed -e :a -e '/\\$/N; s/\\\n//; ta'
# If the current line starts with an equal sign, append the current line to the end of the previous line
# And replace the original line start "=" with a single space
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'
# Add comma separators to numeric strings, changing "1234567" to "1,234,567"
gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta' # GNU sed
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' # Other seds
# Add comma separators to values with decimal points and minus signs (GNU sed)
gsed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'
# Add a blank line after every 5 lines (add a blank line after lines 5, 10, 15, 20, etc.)
gsed '0~5G' # Only valid for GNU sed
sed 'n;n;n;n;G;' # Other seds
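Several scripts in this section are loops built from a label and the t command, which branches only if a substitution succeeded in the current cycle. As an illustration, here is the portable comma-insertion script wrapped in a hypothetical function (add_commas). Each pass inserts one comma before the last run of three digits that still has a digit in front of it, and the loop stops when no such run remains.

```shell
# Hypothetical wrapper around the portable comma script.
# Pass 1 on 1234567 gives 1234,567; pass 2 gives 1,234,567;
# pass 3 finds no four consecutive digits, so t does not branch.
add_commas() {
    sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

echo 1234567 | add_commas
```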
Selectively display specific lines:
--------
# Display the first 10 lines of the file (simulate the behavior of "head")
sed 10q
# Display the first line of the file (simulate "head -1" command)
sed q
# Display the last 10 lines of the file (simulate "tail")
sed -e :a -e '$q;N;11,$D;ba'
# Display the last 2 lines of the file (simulate "tail -2" command)
sed '$!N;$!D'
# Display the last line of the file (simulate "tail -1")
sed '$!d' # Method 1
sed -n '$p' # Method 2
# Display the second-to-last line of the file
sed -e '$!{h;d;}' -e x # Outputs a blank line when there is only one line in the file
sed -e '1{$q;}' -e '$!{h;d;}' -e x # Displays the line when there is only one line in the file
sed -e '1{$d;}' -e '$!{h;d;}' -e x # Does not output when there is only one line in the file
# Only display lines matching the regular expression (simulate "grep")
sed -n '/regexp/p' # Method 1
sed '/regexp/!d' # Method 2
# Only display lines that "do not" match the regular expression (simulate "grep -v")
sed -n '/regexp/!p' # Method 1, corresponding to the previous command
sed '/regexp/d' # Method 2, similar syntax
# Find "regexp" and display the line before the matching line, but do not display the matching line
sed -n '/regexp/{g;1!p;};h'
# Find "regexp" and display the line after the matching line, but do not display the matching line
sed -n '/regexp/{n;p;}'
# Display the line containing "regexp" and its preceding and following lines, and add the line number of the line where "regexp" is located before the first line (similar to "grep -A1 -B1")
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h
# Display lines containing "AAA", "BBB", and "CCC" (in any order)
sed '/AAA/!d; /BBB/!d; /CCC/!d' # The order of strings does not affect the result
# Display lines containing "AAA", "BBB", and "CCC" (fixed order)
sed '/AAA.*BBB.*CCC/!d'
# Display lines containing "AAA", "BBB", or "CCC" (simulate "egrep")
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d # Most seds
gsed '/AAA\|BBB\|CCC/!d' # Valid for GNU sed
# Display paragraphs containing "AAA" (paragraphs are separated by blank lines)
# HHsed v1.5 must add "G;" after "x;", and the next three scripts are like this
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'
# Display paragraphs containing the three strings "AAA", "BBB", and "CCC" (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'
# Display paragraphs containing any one of the three strings "AAA", "BBB", "CCC" (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d' # Only valid for GNU sed
# Display lines containing 65 or more characters
sed -n '/^.\{65\}/p'
# Display lines containing fewer than 65 characters
sed -n '/^.\{65\}/!p' # Method 1, corresponding to the above script
sed '/^.\{65\}/d' # Method 2, a simpler method
# Display part of the text - from the line containing the regular expression to the end of the file
sed -n '/regexp/,$p'
# Display part of the text - specify the line number range (from line 8 to line 12, including lines 8 and 12)
sed -n '8,12p' # Method 1
sed '8,12!d' # Method 2
# Display line 52
sed -n '52p' # Method 1
sed '52!d' # Method 2
sed '52q;d' # Method 3, more efficient when processing large files
# Display every 7th line starting from line 3
gsed -n '3~7p' # Only valid for GNU sed
sed -n '3,${p;n;n;n;n;n;n;}' # Other seds
# Display the text between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # Case-sensitive way
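The "line before the match" script in this section works by saving every line in the hold space at the end of its cycle, so that the hold space always contains the previous line when a match is found. A small sketch, with an invented function name and the pattern passed as the first argument:

```shell
# Hypothetical helper: print the line preceding each line that matches $1.
# On a match, g replaces the pattern space with the saved previous line
# and 1!p prints it (nothing is printed for a match on line 1).
# The trailing h saves the current pattern space for the next cycle.
line_before() {
    sed -n '/'"$1"'/{g;1!p;};h'
}

printf 'a\nb\nc\n' | line_before c
```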
Selectively delete specific lines:
--------
# Display the entire document except the content between the two regular expressions
sed '/Iowa/,/Montana/d'
# Delete adjacent duplicate lines in the file (simulate "uniq")
# Only keep the first line of duplicate lines, delete other lines
sed '$!N; /^\(.*\)\n\1$/!P; D'
# Delete duplicate lines in the file, whether adjacent or not. Mind the buffer size that the hold space can support, or use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
# Delete all lines except duplicate lines (simulate "uniq -d")
sed '$!N; s/^\(.*\)\n\1$/\1/; t; D'
# Delete the first 10 lines of the file
sed '1,10d'
# Delete the last line of the file
sed '$d'
# Delete the last two lines of the file
sed 'N;$!P;$!D;$d'
# Delete the last 10 lines of the file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # Method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # Method 2
# Delete lines that are multiples of 8
gsed '0~8d' # Only valid for GNU sed
sed 'n;n;n;n;n;n;n;d;' # Other seds
# Delete lines matching the pattern
sed '/pattern/d' # Delete lines containing pattern. Of course, pattern
# can be replaced with any valid regular expression
# Delete all blank lines in the file (the same effect as "grep '.' ")
sed '/^$/d' # Method 1
sed '/./!d' # Method 2
# Collapse each run of adjacent blank lines into a single blank line, and delete blank lines at the top and bottom of the file.
# (Simulate "cat -s")
sed '/./,/^$/!d' # Method 1, delete blank lines at the top of the file, allowing one blank line at the bottom to remain
sed '/^$/N;/\n$/D' # Method 2, allowing one blank line at the top to remain, and no blank line at the bottom to remain
# Only keep the first two lines of multiple adjacent blank lines.
sed '/^$/N;/\n$/N;//D'
# Delete all blank lines at the top of the file
sed '/./,$!d'
# Delete all blank lines at the bottom of the file
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' # Valid for all seds
sed -e :a -e '/^\n*$/N;/\n$/ba' # The same as above, except that it does not work on gsed 3.02.*
# Delete the last line of each paragraph
sed -n '/^$/{p;h;};/./{x;/./p;}'
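The "uniq" script in this section is a good example of the N/P/D sliding window: the pattern space holds two lines at a time, P prints the first only when the two differ, and D drops the first and restarts the cycle with the second. A sketch with a hypothetical wrapper name:

```shell
# Hypothetical wrapper around the adjacent-duplicate script.
# $!N pulls in the next line (except on the last line);
# the address matches "X\nX", and !P prints the first line only
# when the pair is NOT a duplicate; D removes the first line and
# starts the next cycle with what is left.
dedupe_adjacent() {
    sed '$!N; /^\(.*\)\n\1$/!P; D'
}

printf 'a\na\nb\nb\nb\nc\n' | dedupe_adjacent
```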
Special applications:
--------
# Remove nroff marks in man pages. When using the 'echo' command under Unix System V or bash shell, the -e option may be required.
sed "s/.`echo \\\b`//g" # The outer double quotes are required (Unix environment)
sed 's/.^H//g' # In bash or tcsh, press Ctrl-V then Ctrl-H
sed 's/.\x08//g' # Hexadecimal representation used by sed 1.5, GNU sed, ssed
# Extract the newsgroup or e-mail header
sed '/^$/q' # Delete all content after the first blank line
# Extract the body part of the newsgroup or e-mail
sed '1,/^$/d' # Delete all content before the first blank line
# Extract the "Subject" (title field) from the mail header and remove the initial "Subject: " words
sed '/^Subject: */!d; s///;q'
# Get the reply address from the mail header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'
# Get the e-mail address proper. This strips the non-address parts from the single header line produced by the previous script. (See the previous script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'
# Add an angle bracket and a space at the beginning of each line (quoted information)
sed 's/^/> /'
# Remove the angle bracket and space at the beginning of each line (unquote)
sed 's/^> //'
# Remove most HTML tags (including cross-line tags)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
# Decode uuencode files divided into multiple volumes. Remove the header information and only keep the uuencode encoding part.
# The file must be passed to sed in a specific order. The first version of the script below can be directly entered on the command line;
# The second version can be put into a shell script with execute permission. (Modified from a script by Rahul Dhesi.)
sed '/^end/,/^begin/d' file1 file2 ... fileX | uudecode # vers. 1
sed '/^end/,/^begin/d' "$@" | uudecode # vers. 2
# Sort the paragraphs in the file alphabetically. Paragraphs are separated by one or more blank lines. GNU sed uses the character "\v" to represent vertical tab, which is used here as a placeholder for the newline character - of course, you can also use other characters not used in the file instead.
sed '/./{H;d;};x;s/\n/={NL}=/g' file | sort | sed '1s/={NL}=//;s/={NL}=/\n/g'
gsed '/./{H;d};x;y/\n/\v/' file | sort | sed '1s/\v//;y/\v/\n/'
# Compress each .TXT file separately, delete the original file after compression, and name the compressed .ZIP file the same as the original (only the extension is different). (DOS environment: "dir /b" shows filenames without paths)
echo @echo off >zipup.bat
dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat
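Among the special applications above, the tag-stripping script is worth a closer look because it handles tags that span lines. The sketch below wraps it in a made-up function (strip_tags) and assumes a sed in which [^>] also matches an embedded newline and an empty regex // reuses the last regex applied, as GNU sed does.

```shell
# Hypothetical wrapper around the HTML tag remover.
# 1. s/<[^>]*>//g removes every complete tag in the pattern space.
# 2. If a "<" is left over, the tag continues on the next line: N appends it.
# 3. //ba reuses the /</ test and loops back to remove the completed tag.
strip_tags() {
    sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
}

printf '<b>bold</b> text\n' | strip_tags
printf '<a\nhref="x">link</a>\n' | strip_tags
```

As the original comment says, this removes "most" tags only; it is not an HTML parser.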
Using SED: sed accepts one or more editing commands and applies them, in order, to each line as it is read in.
After reading the first line of input, sed applies all of the commands to it and outputs the result. It then reads the second line, applies all of the commands to it, and so on, repeating the process until the input is exhausted. In the previous examples, sed takes its input from the standard input device (i.e., the command interpreter, usually as piped input). When one or more filenames are given as arguments on the command line, those files replace standard input as the input to sed. The output of sed is sent to standard output (the screen). Thus:
cat filename | sed '10q' # Use pipeline input
sed '10q' filename # The same effect, but without using pipeline input
sed '10q' filename > newfile # Redirect the output to disk
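The three invocations above can be tried side by side. This is only a sketch: the temporary file and the variable names are invented for the demonstration.

```shell
# Hypothetical demo file holding the numbers 1..20, one per line.
demo=$(mktemp)
i=1
while [ "$i" -le 20 ]; do echo "$i"; i=$((i + 1)); done > "$demo"

from_pipe=$(cat "$demo" | sed '10q')   # input arrives on standard input
from_file=$(sed '10q' "$demo")         # input file named as an argument
sed '10q' "$demo" > "$demo.out"        # output redirected to a file on disk

# Both ways of supplying the input yield the same first ten lines.
[ "$from_pipe" = "$from_file" ] && echo "same output either way"
rm -f "$demo" "$demo.out"
```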
For instructions on using sed commands, including how to run them from a script file (rather than from the command line), see "sed & awk", 2nd edition, by Dale Dougherty and Arnold Robbins (O'Reilly, 1997; http://www.ora.com), "UNIX Text Processing" by Dale Dougherty and Tim O'Reilly (Hayden Books, 1987), or the tutorial by Mike Arst, distributed as the archive "U-SEDIT2.ZIP" (available on many sites). To exploit the full potential of sed, you need a solid understanding of "regular expressions"; for those, see "Mastering Regular Expressions" by Jeffrey Friedl (O'Reilly, 1997).
The manual pages ("man") that come with Unix systems can also help (try "man sed", "man regexp", or the section on regular expressions in "man ed"), but the information they provide is rather "abstract", which is a long-standing complaint. Then again, man pages were never meant to teach beginners how to use sed or regular expressions; they are a reference for people who already know these tools.
Quoting syntax: The examples above generally wrap sed commands in single quotes ('...') rather than double quotes ("...") because sed is typically used on Unix platforms. Inside single quotes, the Unix shell (command interpreter) does not interpret the dollar sign ($) or backquotes (`...`). Inside double quotes, a dollar sign is expanded to the value of a variable or parameter, and a command in backquotes is executed and replaced by its output. In "csh" and its derivatives, an exclamation mark (!) must be preceded by an escaping backslash (like this: \!) for the examples above to run correctly, even inside single quotes. DOS versions of sed always use double quotes ("...") rather than single quotes to enclose commands.
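A short sketch of the quoting rules for a POSIX-style shell (the csh and DOS cases are not covered here). The function names are hypothetical.

```shell
# Single quotes: the shell passes $p to sed untouched.
last_line_single() {
    sed -n '$p'
}

# Double quotes: the $ must be escaped, otherwise the shell would try
# to expand a variable named p before sed ever sees the script.
last_line_double() {
    sed -n "\$p"
}

# Double quotes are useful when expansion is what you want: here the
# shell substitutes the function arguments into the sed script.
replace_word() {
    sed "s/$1/$2/g"
}

printf 'a\nb\nc\n' | last_line_single
printf 'a\nb\nc\n' | last_line_double
echo 'foo foo' | replace_word foo bar
```

Note that replace_word inserts its arguments into the script verbatim, so arguments containing "/" or regex metacharacters would need escaping first.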
Usage of '\t': To keep this document concise, the scripts use '\t' to stand for a tab character. However, most versions of sed do not recognize the '\t' shorthand, so when typing these scripts on the command line you should press the TAB key to enter a literal tab instead of typing '\t'. The following tools do support '\t' as a regular expression token for a tab: awk, perl, HHsed, sedmod, and GNU sed v3.02.80.
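In a shell script, where pressing TAB is not convenient, one possible workaround is to let printf produce the tab and splice it into a double-quoted sed script. This is a sketch of that idea, not something taken from the original document; the names tab and trim are made up.

```shell
# A literal tab character, generated by printf rather than typed.
tab=$(printf '\t')

# Hypothetical helper: strip leading and trailing spaces and tabs.
# The script is in double quotes so that $tab is expanded; the
# end-of-line anchor is written \$ to protect it from the shell.
trim() {
    sed "s/^[ $tab]*//;s/[ $tab]*\$//"
}

printf ' \t hi there \t\n' | trim
```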
Different versions of SED: Versions of sed differ from one another, and some variation in syntax is to be expected. In particular, most of them do not allow labels (:name) or branch instructions (b, t) in the middle of an editing command, only at the end. This document tries to use portable syntax so that users of most versions of sed can run these scripts. The GNU version of sed, however, allows a more concise syntax. Imagine how a reader feels on seeing a command as long as this:
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
The good news is that GNU sed can make the command more compact:
sed '/AAA/b;/BBB/b;/CCC/b;d' # Can even be written as
sed '/AAA\|BBB\|CCC/b;d'
In addition, please note that although many versions of sed accept commands like "/one/ s/RE1/RE2/" with a space before 's', some of these versions do not accept such commands: "/one/! s/RE1/RE2/". At this time, you only need to remove the space in the middle.
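The portable and compact forms discussed above can be compared directly. In this sketch the function names are invented, and the second function assumes GNU sed, since ';' after b and the \| operator are the GNU conveniences the text describes.

```shell
# Portable form: one -e per command, so each b ends its own chunk.
# A bare b jumps to the end of the script, where the line is printed;
# lines that match none of the patterns reach d and are deleted.
match_any_portable() {
    sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
}

# Compact form; assumes GNU sed.
match_any_gnu() {
    sed '/AAA\|BBB\|CCC/b;d'
}

printf 'xAAAx\nnone\nCCC\n' | match_any_portable
```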
Speed optimization: When you need to improve the command execution speed for some reason (such as a large input file, slow processor or hard disk, etc.), you can consider adding an address expression in front of the substitution command ("s/.../.../") to improve the speed. For example:
sed 's/foo/bar/g' filename # Standard substitution command
sed '/foo/ s/foo/bar/g' filename # Faster speed
sed '/foo/ s//bar/g' filename # Shorthand form
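The three forms above produce the same output; only the amount of work per line differs. The sketch below checks the equivalence on a tiny input with hypothetical function names. Whether the addressed form is measurably faster depends on the sed implementation and the data, so treat the speed claim as something to measure yourself.

```shell
plain()     { sed 's/foo/bar/g'; }        # substitution attempted on every line
addressed() { sed '/foo/ s/foo/bar/g'; }  # substitution only on matching lines
shorthand() { sed '/foo/ s//bar/g'; }     # empty regex reuses the /foo/ address

printf 'foo x foo\nnone\n' | plain
printf 'foo x foo\nnone\n' | addressed
printf 'foo x foo\nnone\n' | shorthand
```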
When you only need to display the front part of the file or need to delete the content at the back, you can use the "q" command (exit command) in the script. When processing large files, this will save a lot of time. Therefore:
sed -n '45,50p' filename # Display lines 45 to 50
sed -n '51q;45,50p' filename # The same, but much faster
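A scaled-down sketch of the early-exit idea, using lines 2-3 of a five-line input instead of lines 45-50. The function name is hypothetical, and the commands are given with separate -e options so the example does not depend on ';' being accepted after q.

```shell
# Print lines 2-3, then quit as soon as line 4 is read.
# With -n, q exits without printing line 4, and lines 5 onward are
# never processed at all, which is where the time is saved on big files.
print_2_to_3_then_quit() {
    sed -n -e '4q' -e '2,3p'
}

printf '1\n2\n3\n4\n5\n' | print_2_to_3_then_quit
```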
If you have other one-line scripts to share, or if you find an error in this document, please send an email to its author (Eric Pement). Remember to include the version of sed you are using, the operating system it runs on, and a suitable description of the problem. The one-line scripts in this document are sed scripts whose command line is 65 characters or shorter [Note 1]. The scripts in this document were written or contributed by the following authors:
Al Aab # Established the "seders" mailing list
Edgar Allen # Various contributions
Yiorgos Adamopoulos # Various contributions
Dale Dougherty # Author of "sed & awk"
Carlos Duarte # Author of "do it with sed"
Eric Pement # Author of this document
Ken Pizzini # Author of GNU sed v3.02
S.G. Ravenhall # Script to remove html tags
Greg Ubben # Made many contributions and provided a lot of help
-------------------------------------------------------------------------
Note 1 (translator's note): In most cases a sed script of any length can be written on a single line (using the `-e' option and `;'), as long as the command interpreter supports it, so the one-line scripts here are limited in length as well as being written on one line. The point of these scripts is not that they happen to fit on one line, but that they are compact enough to be used conveniently on the command line.
|

|
2006-10-31 05:06 |
|
|