China DOS Union

[Recommendation] Collection of awk and gawk articles
Original Poster · Posted 2006-10-27 02:19
This is a collection of some articles, resources, and links about awk and gawk. Supplementary additions to this post will be made.

Floor 3 · Posted 2006-10-27 02:19
AWK
Zhongke Yonglian Advanced Technology Training Center(www.itisedu.com)

   AWK is an excellent text-processing tool. It is not only one of the most powerful data-processing engines available in Linux, but one of the most powerful in any environment. The greatest strength of this programming and data-manipulation language (its name comes from the first letters of its creators Alfred Aho, Peter Weinberger, and Brian Kernighan) depends on how much knowledge the user has. AWK provides extremely powerful features: pattern loading, flow control, mathematical operators, process control statements, and even built-in variables and functions. It has nearly all the fine characteristics that a complete language should have. In fact, AWK really does have its own language: the AWK programming language, formally defined by its three creators as a “pattern scanning and processing language.” It allows you to create short programs that read input files, sort data, process data, perform calculations on input, generate reports, and do countless other things.

You may be quite familiar with UNIX yet very unfamiliar with awk, and that is not strange at all. Indeed, awk is still far from having the reputation its capabilities deserve. What is awk? Unlike most other UNIX commands, its name gives no hint of what it does: it is neither an English word with a meaning of its own nor an abbreviation of several related words. In fact, awk is formed from the initials of three surnames: (Alfred) Aho, (Peter) Weinberger, and (Brian) Kernighan. It was these three people who created awk, an excellent pattern scanning and processing tool.

   Simply put, AWK is a programming language tool for processing text. AWK is similar in many ways to the shell programming language, although AWK has a syntax entirely its own. Its design ideas came from SNOBOL4, sed, Marc Rochkind’s utility language, the language tools yacc and lex, and of course it also absorbed some excellent ideas from C. When AWK was first created, its purpose was text processing, and the basis of the language was this: whenever there is a pattern match in the input data, a series of instructions is executed. This utility scans each line in a file, looking for patterns that match what is given on the command line. If a match is found, it proceeds to the next programming step. If no match is found, it continues processing the next line.

   Although the operations may be complex, the command syntax is always:

   awk '{pattern + action}' {filenames}

   Here pattern indicates what AWK is looking for in the data, and action is the series of commands executed when a match is found. Braces ({}) do not always have to appear in the program, but they are used to group a series of instructions according to a particular pattern.
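
The pattern and action parts can be seen working together in a minimal sketch. The sample records and the shell variable out are made up for illustration; only standard awk behaviour is assumed.

```shell
# Hypothetical input: a name and a score on each line.
out=$(printf 'alice 90\nbob 45\ncarol 72\n' |
  awk '$2 >= 60 { print $1 " passed" }')   # pattern: $2 >= 60, action: print
echo "$out"
```

Only the records whose second field satisfies the pattern reach the action; the other record is skipped silently.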

   gawk is the GNU version of AWK.

I. What can AWK do?

Very similar to sed and grep, awk is also a pattern scanning and processing tool. But its capabilities are far stronger than sed and grep. awk provides extremely powerful functions: it can do almost everything grep and sed can do, and at the same time it can also perform pattern loading, flow control, mathematical operators, process control statements, and even built-in variables and functions. It has nearly all the fine characteristics that a complete language should have. In fact, awk really does have its own language: the awk programming language, which its three creators formally defined as: a pattern scanning and processing language.

II. Why use awk?

Even so, you may still ask, why should I use awk?

The first reason for using awk is that text-based pattern scanning and processing is work we often do. What awk does is somewhat like a database, but unlike a database, what it handles are text files. These files have no special storage format, and ordinary people can edit, read, understand, and process them. Database files, however, often have special storage formats, so they must be handled with database processing programs. Since we often run into this kind of database-like processing work, we should find a simple and practical way to handle it. UNIX has many tools in this area, such as sed, grep, sort, and find, and awk is an outstanding one among them.

The second reason for using awk is that awk is a simple tool, at least relative to its powerful capabilities. Indeed, UNIX has many excellent tools; for example, C, UNIX’s native development tool, and its continuation C++, are both excellent. But compared with them, awk is much more convenient and concise for accomplishing the same work. First of all, awk provides solutions suited to many different needs, from awk command lines for solving simple problems to complex and elegant programs in the awk programming language. The advantage is that you do not have to use a complicated method to solve what was originally a simple problem. For example, you can solve a simple problem with a single awk command line, but not with C: even the simplest C program must go through the whole process of writing and compiling. Secondly, awk is interpreted, so awk programs need no compilation step, and this also lets awk fit very well with shell script programs. Finally, awk is simpler than C. Although awk has absorbed many excellent elements of C, and familiarity with C is a great help in learning awk, awk does not require you to know C, a powerful development tool but one that takes a great deal of time to master.

The third reason for using awk is that awk is easy to get. Unlike C and C++, awk is only one file (/bin/awk), and almost every version of UNIX provides its own version of awk, so you do not need to worry at all about how to get awk. But C is not like this. Although C is UNIX’s native development tool, that development tool is distributed separately. In other words, you must pay separately for the C development tools for your version of UNIX (unless of course you use a pirated version), obtain and install them, and only then can you use them.

Based on the above reasons, plus awk’s powerful capabilities, we have reason to say that if you want to deal with work related to text pattern scanning, awk should be your first choice. Here is a general principle you can follow: if you have difficulty using ordinary shell tools or shell script, try awk; if awk still cannot solve the problem, then use C; if C still fails, then move on to C++.

III. Ways of invoking awk

As mentioned earlier, awk provides different solutions suited to different needs. They are:

1. The awk command line. You can use awk just like an ordinary UNIX command, and you can use the awk programming language directly on the command line. Although awk supports multi-line input, entering a long command line and making sure it is correct is a headache, so this method is generally used only to solve simple problems. Of course, you can also use awk command lines or even awk program scripts inside shell script programs.

2. Use the -f option to invoke an awk program. awk allows you to write a section of awk code into a text file, and then invoke and execute that program on the awk command line with the -f option. We will talk about the specific method later in the awk syntax section.

3. Use the command interpreter to invoke an awk program: using the command interpreter features supported by UNIX, we can write a section of awk code into a text file, then add the following to its first line:
#!/bin/awk -f
and give this text file execute permission. After doing this, you can invoke and execute that awk program on the command line in a way similar to the following.

$ awk_script_file file_to_process
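
The three ways of invoking awk can be compared side by side in a small sketch. The scratch directory and the file names data.txt, count.awk, and count are assumptions made for this illustration, and the interpreter path is looked up with command -v because awk does not live in /bin on every system.

```shell
# Assumed names: a scratch directory, data.txt, and the script files below.
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/data.txt"

# 1. Program text given directly on the command line.
n1=$(awk 'END { print NR }' "$dir/data.txt")

# 2. Program stored in a file and loaded with -f.
printf 'END { print NR }\n' > "$dir/count.awk"
n2=$(awk -f "$dir/count.awk" "$dir/data.txt")

# 3. Executable script whose first line names the interpreter.
printf '#!%s -f\nEND { print NR }\n' "$(command -v awk)" > "$dir/count"
chmod +x "$dir/count"
n3=$("$dir/count" "$dir/data.txt")

echo "$n1 $n2 $n3"
rm -rf "$dir"
```

All three forms run the same one-line program, so each reports the same record count.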

awk syntax:

Like other UNIX commands, awk has its own syntax:

awk [-F re] [parameter...] ['prog'] [-f progfile] [in_file...]

Parameter explanation:

-F re: allows awk to change its field separator.

parameter: this parameter helps assign values to different variables.

'prog': the awk program statement section. This statement section must be enclosed in single quotes: ' and ', to prevent it from being interpreted by the shell. The standard form of this program statement section is:

'pattern {action}'

Here pattern can be any egrep regular expression, and it can also be formed with the syntax /re/ plus some pattern-matching techniques. Similar to sed, you can also use "," to separate two patterns to select a certain range. For details on matching, you can refer to the appendix. If you still do not understand, get a UNIX book and study grep and sed (I myself mastered matching techniques while learning ed). The action parameter is always enclosed in braces, and consists of a series of awk statements separated by ";". awk interprets them and executes their operations on records that match the pattern given by pattern. Like the shell, you can also use “#” as a comment marker; it makes everything from “#” to the end of the line a comment, and it will be ignored during interpretation and execution. You may omit either pattern or action, but not both at the same time. When pattern is omitted there is no pattern matching, meaning the operation is performed on all lines (records). When action is omitted, the default action is performed—displaying on standard output.

-f progfile: allows awk to invoke and execute the program file specified by progfile. progfile is a text file, and it must conform to awk syntax.

in_file: the input file for awk. awk allows processing multiple input files. It is worth noting that awk does not modify input files. If no input file is specified, awk accepts standard input and displays the result on standard output. awk supports input/output redirection.

awk records, fields, and built-in variables:

As mentioned earlier, the work awk handles has something in common with database processing. One of those common points is that awk supports processing records and fields. Processing fields is something grep and sed cannot do, and this is one of the reasons awk is superior to the two. In awk, by default, one line in a text file is always regarded as one record, and some part of a line is regarded as a field within that record. In order to manipulate these different fields, awk borrows the shell’s method and uses $1,$2,$3... to represent the different fields in a line (record) in sequence. In particular, awk uses $0 to represent the entire line (record). Different fields are separated by characters called delimiters. The default delimiter is whitespace, that is, any run of spaces or tabs. awk allows you to change this delimiter on the command line in the form -F re. In fact, awk uses a built-in variable FS to remember this delimiter. awk has several such built-in variables, for example the record separator variable RS, the current record count NR, and so on. A later appendix in this article lists all the built-in variables. These built-in variables can be referenced or modified in awk programs. For example, you can use the NR variable to specify a working range in pattern matching, or by modifying the record separator RS, use a special character rather than newline as the record separator.

Example: display the first, third, and seventh fields separated by the character % in lines 7 through 15 of the text file myfile:

awk -F % 'NR==7,NR==15 {print $1, $3, $7}' myfile
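
A scaled-down sketch of the same idea, using made-up data so that it can be run anywhere: the first command selects a record range with NR and a custom field separator, and the second changes RS so that ';' rather than newline ends a record. The variable names range and recs are only for illustration.

```shell
# Made-up data: '%'-separated fields, one record per line.
f=$(mktemp)
printf 'a1%%b1%%c1\na2%%b2%%c2\na3%%b3%%c3\na4%%b4%%c4\n' > "$f"

# Fields 1 and 3 of records 2 through 3.
range=$(awk -F % 'NR==2,NR==3 { print $1, $3 }' "$f")
echo "$range"

# With RS set to ";", each ';'-terminated chunk is one record.
recs=$(printf 'x y;p q;' | awk 'BEGIN { RS = ";" } { print NR ": " $2 }')
echo "$recs"
rm -f "$f"
```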

IV. AWK built-in functions

One reason awk became an excellent programming language is that it absorbed many advantages from some excellent programming languages (for example C). One of these advantages is the use of built-in functions. awk defines and supports a series of built-in functions. Because of these functions, the capabilities provided by awk are more complete and powerful. For example, awk uses a series of built-in string-processing functions (these functions look similar to the string-processing functions in C, and are also used in almost the same way as functions in C); it is precisely because of these built-in functions that awk’s ability to process strings is much stronger. The appendix later in this article lists the built-in functions generally provided by awk. These built-in functions may differ somewhat from your version of awk, so before using them, it is best to check the online help on your system.

As an example of a built-in function, we will introduce awk’s printf function here. This function makes awk output consistent with C language output. In fact, many forms in awk are borrowed from C. If you are familiar with C, you may remember its printf function, whose powerful formatted output brought us much convenience. Fortunately, we meet it again in awk. printf in awk is almost exactly the same as in C. If you are familiar with C, you can completely use printf in awk following the C style. Therefore here we only give one example; if you are not familiar with it, just pick up any beginner’s C book and flip through it.

Example: display the line number and the 3rd field in the file myfile:

$awk '{printf "%03d%s\n",NR,$3}' myfile
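
For readers who want to see a few more conversions, here is a short sketch with invented input. It pads the record number with zeros, left-justifies a string in a six-character column, and prints a number with two decimals.

```shell
# Invented input: an item name and a price.
out=$(printf 'apple 1.5\nkiwi 12\n' |
  awk '{ printf "%03d|%-6s|%6.2f\n", NR, $1, $2 }')
echo "$out"
```

Unlike print, printf adds no newline of its own, so the format string ends with \n.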

V. Using awk on the command line

In order, we ought to explain awk programming now, but before that we will use some examples to review the previous material. These examples are all used on the command line, and from them we can see how convenient it is to use awk on the command line. One reason for doing this is to pave the way for the following content, and another is to introduce some ways to solve simple problems. There is absolutely no need to use complicated methods to solve simple problems—since awk provides a simpler way.

Example: display all lines in text file mydoc that match (contain) the string "sun".

$awk '/sun/{print}' mydoc

Since displaying the whole record (the whole line) is awk’s default action, the action part can be omitted.

$awk '/sun/' mydoc

Example: the following is a more complex matching example:

$awk '/[Ss]un/,/[Mm]oon/ {print}' myfile

It will display every line from the first line matching Sun or sun through the next line matching Moon or moon, inclusive, and output them to standard output.
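
A range pattern of this kind can be tried on a few invented lines; the default action prints each selected record.

```shell
# Invented input; the range opens at "Sun rises" and closes at "moon sets".
out=$(printf 'a\nSun rises\nb\nmoon sets\nc\n' | awk '/[Ss]un/,/[Mm]oon/')
echo "$out"
```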

Example: the following example shows the use of built-in variables and the built-in function length():

$awk 'length($0)>80 {print NR}' myfile

This command line will display the line numbers of all lines in text file myfile that exceed 80 characters. Here, $0 represents the whole record (line), and the built-in variable NR does not use the '$' marker.

Example: as a more practical example, suppose we want to perform a security check on UNIX users by examining the passwd file under /etc and checking whether the passwd field (the second field) is empty. If it is empty, the user has no password set, and that username (the first field) should be displayed. We can implement it with the following statement:

#awk -F: '$2=="" {printf("%s no password!\n",$1)}' /etc/passwd

In this example, the field separator of the passwd file is “:”, so -F: must be used to change the default field separator. This example also involves the use of the built-in function printf.

VI. awk variables

Like other programming languages, awk allows variables to be set in the programming language. In fact, providing variables is a basic requirement of a programming language; personally, I have never seen a programming language that does not provide variables.

awk provides two kinds of variables. One kind is awk built-in variables, which we already discussed above. Built-in variables are referenced by name alone, without the "$" marker (recall the earlier use of NR). The other kind is user-defined variables. awk allows users to define and use their own variables in awk program statements. Of course, such variables cannot have the same names as built-in variables or other awk reserved words. User-defined variables are likewise referenced by name alone: in awk, "$" is the field operator, so $n means the field whose number is the value of n, not the variable n itself. Unlike C, variables in awk do not need to be declared or initialized; awk determines their type from the context in which they are used. A variable that has never been assigned holds the empty string, which counts as 0 in a numeric context. Here is a trick: if you want the type of a variable in your awk program to be explicit, assign it an initial value in the program. We will use this trick in later examples.
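
In standard awk, a variable is referenced by its bare name, and "$" applied to a variable selects the field with that number. The sketch below uses made-up names (n, unset) to show both points, along with the value of a variable that was never assigned.

```shell
out=$(printf 'a b c\n' | awk '{
  n = 3                  # user-defined variable, referenced as n
  print n                # the value of the variable
  print $n               # field number n, i.e. $3
  print unset + 0        # never assigned: acts as 0 in numeric context
  print "[" unset "]"    # and as the empty string in string context
}')
echo "$out"
```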

VII. Operations and tests

As one of the characteristics a programming language should have, awk supports many kinds of operations, basically the same as those provided by C, such as +, -, *, /, % and so on. At the same time, awk also supports ++, --, +=, -= and similar operators as in C, which brings great convenience to users familiar with C when writing awk programs. As an extension of operational ability, awk also provides a series of built-in arithmetic functions (such as log, sqrt, cos, sin and so on) and some functions used to manipulate strings (such as length, substr and so on). The use of these functions greatly improves awk’s computational capabilities.

As part of conditional transfer statements, relational testing is something every programming language has, and awk is no exception. awk allows many kinds of tests, such as the commonly used == (equal), != (not equal), > (greater than), < (less than), >= (greater than or equal), <= (less than or equal) and so on. At the same time, for pattern matching, it also provides ~ (matches) and !~ (does not match) tests.

As an extension of testing, awk also supports using logical operators: ! (not), && (and), || (or) and parentheses () for multiple tests, greatly enhancing awk’s capabilities. The appendix in this article lists the operations, tests, and operator precedence allowed in awk.
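
The following sketch combines the match tests with relational and logical operators. The input lines and the message texts are invented for the illustration.

```shell
out=$(printf 'sun 3\nmoon 10\nsunset 7\n' | awk '
  $1 ~ /^sun/ && $2 > 5 { print $1 ": match and > 5" }
  $1 !~ /sun/ || $2 % 2 { print $1 ": no sun, or odd" }
')
echo "$out"
```

Both rules are tested against every record, so a record such as "sunset 7" that satisfies both produces two lines of output.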

VIII. Flow control in awk

Flow control statements are an indispensable part of any programming language. Any good language has some statements for carrying out flow control. awk provides complete flow control statements similar to those in C, which brings us great convenience in programming.

1. BEGIN and END:

In awk there are two special expressions, BEGIN and END. Both can be used in pattern (see the earlier awk syntax). The function of BEGIN and END is to give the program an initial state and to perform cleanup work after the program ends. Any operations listed after BEGIN (inside {}) will be executed before awk starts scanning input, while operations listed after END will be executed after all input has been scanned. Therefore, BEGIN is usually used to display variables and preset (initialize) variables, while END is used to output the final result.

Example: accumulate the sales amount in sales file xs (assuming the sales amount is in the third field of each record):

$awk 'BEGIN { FS=":";print "Sales amount statistics";total=0}
>{print $3;total=total+$3;}
>END {printf "Total sales amount: %.2f\n",total}' xs
(Note: > is the secondary prompt provided by the shell. Inside the single quotes, the awk program may simply continue on the next line. To break a shell command line outside the quotes, or to break a single awk statement in the middle, add a backslash at the end of the line.)

Here, BEGIN presets the internal variable FS (field separator) and the user-defined variable total, and at the same time displays the output heading before scanning. END then prints the grand total after scanning is complete.
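
A self-contained variant of this pattern, with an invented two-line sales file so that the total can be checked by hand:

```shell
# Invented sales records: region:item:amount
f=$(mktemp)
printf 'east:pens:12.50\nwest:ink:7.25\n' > "$f"
out=$(awk 'BEGIN { FS = ":"; total = 0 }
           { total += $3 }
           END { printf "Total: %.2f\n", total }' "$f")
echo "$out"
rm -f "$f"
```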

2. Flow control statements

awk provides complete flow control statements, and their usage is similar to that in C. We explain them one by one below:

2.1. if...else statement:

Format:
if (expression)
statement 1
else
statement 2

In this format, "statement 1" may consist of multiple statements, in which case they must be enclosed in {}. Braces also make the structure easier for you to read. awk branch structures allow nesting, and their form is:

if (expression 1) {
    if (expression 2)
        statement 1
    else
        statement 2
    statement 3
} else {
    if (expression 3)
        statement 4
    else
        statement 5
}
statement 6

Of course, in actual operation you may not use such a complex branch structure; it is given here only to show its form.
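
A small runnable instance of a nested branch, with invented scores and an invented grade variable:

```shell
out=$(printf '95\n70\n40\n' | awk '{
  if ($1 >= 60) {
    if ($1 >= 90)
      grade = "A"
    else
      grade = "pass"
  } else {
    grade = "fail"
  }
  print $1, grade
}')
echo "$out"
```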

2.2. while statement

The format is:

while (expression)
statement

2.3. do-while statement

The format is:

do
{
statement
} while (condition)

2.4. for statement

Format:

for (initial expression; termination condition; step expression)
{statement}

In awk’s while, do-while and for statements, it is allowed to use break and continue statements to control the flow, and also statements such as exit to quit. break interrupts the loop currently being executed and jumps outside the loop to execute the next statement. continue jumps from the current position to the beginning of the loop for execution. There are two cases for exit: when the exit statement is not in END, any exit command in an operation behaves as if end-of-file has been reached, all pattern or action execution stops, and the actions in the END pattern are executed. But an exit appearing in END will cause the program to terminate.
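
The effect of continue, break, and exit can be observed in one short sketch. The input and the output format are invented; the point is that exit outside END stops reading input but still runs the END action.

```shell
out=$(printf '1\n2\n3\n' | awk '
  {
    for (i = 1; i <= 5; i++) {
      if (i == 2) continue     # skip the rest of this iteration
      if (i == 4) break        # leave the loop entirely
      printf "%s.%d ", $1, i
    }
    if (NR == 2) exit          # stop reading input; END still runs
  }
  END { print "done after record " NR }
')
echo "$out"
```

The third input record is never processed because exit is reached while handling the second.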

Example: in order to

IX. User-defined functions in awk

Defining and calling users’ own functions is a feature found in almost every high-level language, and awk is no exception. But original awk does not provide function capability; only in nawk or newer versions of awk can functions be added.

A function is used in two parts: function definition and function call. The function definition includes the code to be executed (the function itself) and the temporary variables (parameters) through which the main program code passes values to the function.

The method for defining an awk function is as follows:

function function_name(parameter_list){
function body
}

In gawk, function may be abbreviated as func, but other versions of awk do not allow this. The function name must be a legal identifier. The parameter list may provide no parameters (but when calling the function, the pair of parentheses after the function name is still indispensable), or it may provide one or more parameters. Similar to C, scalar parameters in awk are passed by value; arrays, however, are passed by reference.

Calling a function in awk is relatively simple, and its method is similar to that of C, but awk is more flexible than C because it does not check parameter validity. In other words, when you call a function, you may list more or fewer parameters than the function expects (as specified in the function definition). Extra parameters will be ignored by awk, while missing parameters will be set by awk to the default value 0 or the empty string. Which value is used specifically depends on how the parameter is used.

awk functions have two return methods: implicit return and explicit return. When awk executes to the end of a function, it automatically returns to the calling program; in this case the function returns implicitly. If you need to exit a function before its end, you can explicitly use a return statement to exit in advance. The method is to use a statement in the function of the form: return return_value.

Example: the following example demonstrates the use of functions. In this example, a function named print_header is defined. This function takes two parameters, FileName and PageNum. The FileName parameter passes the name of the file currently being used to the function, and PageNum is the current page number. The function’s job is to print (display) the current file name and the current page number. After completing this function, it returns the page number of the next page.

nawk 'BEGIN{pageno=1;file=ARGV[1]   # FILENAME is not yet set inside BEGIN
>pageno=print_header(file,pageno);#call function print_header
>printf("The current page number is: %d\n",pageno);
>}
>#define function print_header
>function print_header(FileName,PageNum){
>printf("%s %d\n",FileName,PageNum);
>PageNum++;return PageNum;
>}' myfile

Executing this program will display the following:

myfile 1
The current page number is: 2
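
Two further points about awk functions are worth a quick sketch: scalar arguments are copied while arrays are shared with the caller, and a parameter that the caller does not supply behaves as a local variable. The function name bump and all variable names here are hypothetical.

```shell
out=$(awk '
  function bump(n, arr,    tmp) {   # tmp is never passed: it acts as a local
    tmp = n + 1
    n = tmp              # changes only the copy of the scalar argument
    arr["k"] = tmp       # arrays are shared with the caller
    return tmp
  }
  BEGIN {
    x = 1
    a["k"] = 0
    r = bump(x, a)
    print x, r, a["k"]
  }')
echo "$out"
```

The caller's x keeps its value, while the array element changed inside the function is visible afterwards.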

X. Advanced input and output in awk

1. Read the next record:

awk’s next statement stops processing of the current record, makes awk read the next record, and restarts pattern matching from the first pattern, executing the corresponding actions. It is usually used in the action of a matching pattern. next causes any remaining patterns to be skipped for the current record.

2. Simply read one record

awk’s getline statement is used to simply read one record. If one logical data record is spread over two physical records, getline is especially useful. It performs normal field splitting (setting $0, NF, FNR, and NR). It returns 1 on success, 0 at end of file, and -1 on error. If you want to simply read a file, you can write code like the following:

Example: example of using getline

{while(getline==1)
{
#process the inputted fields
}
}

getline can also store the input record in a variable instead of splitting it into fields, through the form getline variable. When this method is used, $0 and NF are left unchanged, and FNR and NR are incremented.

Users can also use getline<"filename" to input data from a given file, rather than from the files listed on the command line. At this time, getline will perform normal field splitting (setting $0 and NF). It returns -1 if the file cannot be read (for example, if it does not exist), 1 on success, and 0 at end of file. The user can read data from the given file into a variable, and may also replace filename with stdin (standard input device) or a variable containing that file name. It is worth noting that when this method is used, FNR and NR are not modified.
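
Reading a named file into a variable can be sketched as follows. The temporary file and the names file, line, last, and n are assumptions for the illustration; the file name is handed to awk with the standard -v option.

```shell
f=$(mktemp)
printf 'first\nsecond\n' > "$f"
out=$(awk -v file="$f" 'BEGIN {
  n = 0
  # getline yields 1 per record read, 0 at end of file, -1 on error,
  # so testing "> 0" ends the loop in both of the latter cases.
  while ((getline line < file) > 0) {
    n++
    last = line
  }
  close(file)
  print n, last, NR      # NR stays 0: this form does not count records
}')
echo "$out"
rm -f "$f"
```

Testing the return value with > 0 rather than using it directly avoids an endless loop when the file cannot be opened.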

Another way of using the getline statement is to accept input from a UNIX command, for example the following:

Example: example of accepting input from a UNIX command

{while("who -u"|getline)
{
#process each line from the who command
}
}
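
A runnable variant of the same loop, with echo standing in for who -u so that the output is predictable (the variable cmd is introduced only for this sketch):

```shell
out=$(awk 'BEGIN {
  cmd = "echo one two; echo three four"   # stand-in for who -u
  while ((cmd | getline) > 0) {           # each line becomes $0 and is split
    print $2
  }
  close(cmd)                              # same string that opened the pipe
}')
echo "$out"
```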

Of course, the following form can also be used:

"command" | getline variable

3. Close a file:

awk allows closing an input or output file in the program by using awk’s close statement.

close("filename")

filename can be a file opened by getline (it can also be stdin, a variable containing the file name, or the exact command used by getline), or an output file (it can be stdout, a variable containing the file name, or the exact command used with a pipe).

4. Output to a file:

awk allows outputting results to a file in the following ways:

printf("hello world!\n")>"datafile"
or
printf("hello world!\n")>>"datafile"

5. Output to a command

awk allows outputting results to a command in the following way:

printf("hello world!\n")|"sort -t','"
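
The difference between >, >>, and | can be checked with a small sketch. The scratch directory and the variable dest are assumptions; the file is written, closed, read back with getline, and piped through sort.

```shell
dir=$(mktemp -d)
out=$(awk -v dest="$dir/datafile" 'BEGIN {
  print "b" > dest       # ">" empties the file when it is first opened
  print "a" > dest       # the file is still open, so this line is appended
  close(dest)
  print "c" >> dest      # ">>" appends to what the file already holds
  close(dest)
  while ((getline line < dest) > 0) {
    print line | "sort"
  }
  close(dest)
  close("sort")          # ends the pipe so sort can emit its output
}')
echo "$out"
rm -rf "$dir"
```

Within a single awk run, > truncates the file only on the first write; later writes to the same open file are added after it.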

XI. Mixed programming with awk and shell script

Because awk can be used as a shell command, awk can be integrated very well with shell batch programs, which makes mixed programming with awk and shell programs possible. The key to mixed programming is the dialogue between awk and shell script. In other words, it is the exchange of information between awk and shell script: awk obtains the information it needs from shell script (usually the values of variables), executes shell command lines in awk, shell script sends the results of command execution to awk for processing, and shell script reads the execution results of awk, and so on.

1.awk reads Shell script program variables

In awk we can read variables from a shell script program by using the form “'$variable_name'”.

Example: in the following example, we will read the variable Name from the shell script program. This variable stores the author of the text file myfile, and awk will print that person’s name.

$cat writename
:
# @(#)
#
.
.
.
Name="张三"
# ARGV[1] is used because FILENAME is not yet set inside BEGIN
nawk 'BEGIN {name="'$Name'"; printf("\t%s\tauthor%s\n",ARGV[1],name);}
{...}END{...}' myfile
.
.
.
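
Besides splicing the shell variable into the program text, two other standard mechanisms are available: the -v option and the ENVIRON array. The sketch below compares all three with an invented value that contains a space, which is why the spliced form wraps $Name in double quotes.

```shell
Name="Zhang San"

# Quote splicing, as in the text; the inner "$Name" keeps the space intact.
a=$(awk 'BEGIN { name = "'"$Name"'"; print "author: " name }')

# -v assigns an awk variable before the program starts.
b=$(awk -v name="$Name" 'BEGIN { print "author: " name }')

# ENVIRON exposes environment variables inside awk.
c=$(Name="$Name" awk 'BEGIN { print "author: " ENVIRON["Name"] }')

printf '%s\n%s\n%s\n' "$a" "$b" "$c"
```

The -v and ENVIRON forms keep the awk program text constant, which tends to be easier to read than nested quotes.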

2.Send the execution result of a shell command to awk for processing

As one way of transmitting information, we can pass the result of a shell command to awk through a pipe(|)for processing:

Example: example of awk processing the execution result of a shell command

$who -u | awk '{printf("%s is running %s\n",$2,$1)}'

This command prints, for each logged-in terminal, the terminal name (second field) followed by the user name logged in on it (first field).

3.shell script program reads the execution result of awk

To allow a shell script program to read the result produced by awk, we can adopt some special methods. For example, we can store the result produced by awk into a shell script variable in the form variable_name=`awk statement`. Of course, we can also use a pipe to pass the result produced by awk to a shell script program for processing.

Example: as one mechanism for passing messages, UNIX provides a command wall for sending messages to all its users (meaning write to all). This command allows sending messages to all working users (terminals). Therefore, we can simulate this program through a shell batch program wall.shell (in fact, in relatively old versions wall was just such a shell batch program):

$cat wall.shell
:
# @(#) wall.shell: send messages to each logged-in terminal
#
cat >/tmp/$$        # user enters message text
who -u | awk '{print $2}' | while read tty
do
cat /tmp/$$ >/dev/$tty        # who prints terminal names relative to /dev
done

In this program, awk accepts the execution result of the who -u command, which prints information on all logged-in terminals. The second field is the device name of the logged-in terminal, so the awk command extracts that device name, then the while read tty statement reads these file names in a loop into the variable tty (a shell script variable), as the final destination address for message delivery.
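
Both directions of this exchange can be tried without real terminals. In the sketch below, printf supplies invented lines in place of who -u; the first command captures awk output in a shell variable, and the second feeds awk output to a while read loop.

```shell
# Command substitution: the shell variable receives what awk prints.
count=$(printf 'u1 tty1\nu2 tty2\n' | awk 'END { print NR }')

# Pipe: each line printed by awk is read into the shell variable tty.
list=$(printf 'u1 tty1\nu2 tty2\n' | awk '{ print $2 }' |
  while read -r tty; do
    printf '[%s]' "$tty"
  done)

echo "$count $list"
```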

4.Execute shell command lines in awk: the embedded function system()

system() is an embedded function that belongs to neither the string functions nor the numeric functions. The function’s job is to process the string passed to it as a parameter. system processes this parameter by treating it as a command, that is, executing it just like a command line. This allows users to flexibly execute commands or scripts in their own awk programs whenever needed.

Example: the following program uses the system embedded function to print a report file prepared by the user. This file is stored in a file named myreport.txt. For brevity, we list only its END part:

.
.
.
END {close("myreport.txt");system("lp myreport.txt");}

In this example, we first use the close statement to close the file myreport.txt, and then use the system embedded function to send myreport.txt to the printer for printing.
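
The same close-then-system sequence can be tried without a printer. In this sketch cat is a stand-in for lp, and the scratch directory and the variable report are assumptions; system() returns the exit status of the command it ran.

```shell
dir=$(mktemp -d)
out=$(awk -v report="$dir/myreport.txt" 'BEGIN {
  print "line 1" > report
  close(report)                     # flush before another program reads it
  status = system("cat " report)    # cat stands in for lp
  print "exit status: " status
}')
echo "$out"
rm -rf "$dir"
```

Without the close call, the report could still be sitting in awk's output buffer when the external command reads the file.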

Writing to this point, I have to say goodbye to my friends. Honestly speaking, this is still only introductory knowledge of awk. Computers are a science that is always moving forward, and awk is no exception. All this article can do is pave a small beginning for you on the long road ahead; as for the rest of the road, you still have to walk it yourself. To be frank, if this article can truly bring you even a little convenience on the road ahead, then I will be satisfied!

If you have any questions about this article, please E-mail To:Chizlong@yeah.net or leave a message at the homepage http://chizling.yeah.net.


Appendix:

1.awk regular-expression metacharacters

\ Escape sequence
^ Match at the beginning of a string
$ Match at the end of a string
. Match any single character
[ABC] Match any one character inside [ ]
[A-Ca-c] Match characters in the ranges A-C and a-c (in alphabetical order)
[^ABC] Match any character except those inside [ ]
Desk|Chair Match either Desk or Chair
[ABC][DEF] Concatenation. Match any one character from A, B, C, and it must be followed by any one character from D, E, F.
[ABC]* Match a character from A, B, or C appearing zero or more times
[ABC]+ Match a character from A, B, or C appearing one or more times
[ABC]? Match an empty string or match any one of A, B, or C as a single character
(Blue|Black)berry Combined regular expression, matching Blueberry or Blackberry
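
A few of these metacharacters can be checked directly from the shell. This is a small sketch with made-up input words; awk is assumed to be any POSIX awk (gawk behaves the same here).

```shell
# Each pattern below is tried against the same five words.
words='Desk
Chair
Blueberry
Blackberry
AD'
alt=$(printf '%s\n' "$words" | awk '/^(Desk|Chair)$/')       # alternation, anchored at both ends
grp=$(printf '%s\n' "$words" | awk '/(Blue|Black)berry/')    # grouping combined with alternation
seq=$(printf '%s\n' "$words" | awk '/^[ABC][DEF]$/')         # two bracket expressions in a row
printf '%s\n' "$alt" "$grp" "$seq"
```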

2.awk arithmetic operators

Operator Purpose
------------------
x^y x to the power y
x**y Same as above
x%y Compute the remainder of x/y (modulo)
x+y x plus y
x-y x minus y
x*y x times y
x/y x divided by y
-y Negative y (the sign switch of y); also called unary minus
++y Increment y by 1, then use y (prefix increment)
y++ Use y, then increment by 1 (postfix increment)
--y Decrement y by 1, then use y (prefix decrement)
y-- Use y, then decrement by 1 (postfix decrement)
x=y Assign the value of y to x
x+=y Assign the value of x+y to x
x-=y Assign the value of x-y to x
x*=y Assign the value of x*y to x
x/=y Assign the value of x/y to x
x%=y Assign the value of x%y to x
x^=y Assign the value of x^y to x
x**=y Assign the value of x**y to x

3.Tests allowed in awk:

Operator Meaning

x==y x equals y
x!=y x does not equal y
x>y x is greater than y
x>=y x is greater than or equal to y
x<y x is less than y
x<=y x is less than or equal to y
x~re x matches regular expression re
x!~re x does not match regular expression re

4.awk operators (in ascending order of precedence)

= 、+=、 -=、 *= 、/= 、 %=
||
&&
> >= < <= == != ~ !~
xy (string concatenation, 'x''y' becomes "xy")
+ -
* / %
++ --

5.awk built-in variables (predefined variables)

Explanation: in the table, V indicates the first tool supporting the variable (same below): A=awk,N=nawk,P=POSIX awk,G=gawk

V Variable Meaning Default value
--------------------------------------------------------
N ARGC number of command-line arguments
G ARGIND ARGV index of the file currently being processed
N ARGV command-line argument array
G CONVFMT numeric conversion format %.6g
P ENVIRON UNIX environment variables
N ERRNO UNIX system error message
G FIELDWIDTHS whitespace-separated string of input field widths
A FILENAME name of the current input file
P FNR current record number
A FS input field separator space
G IGNORECASE controls case sensitivity 0(case-sensitive)
A NF number of fields in the current record
A NR number of records already read
A OFMT output format for numbers %.6g
A OFS output field separator space
A ORS output record separator newline
A RS input record separator newline
N RSTART start of the string matched by the match function
N RLENGTH length of the string matched by the match function
N SUBSEP subscript separator "\034"

6.awk built-in functions

V Function Purpose or return value
------------------------------------------------
N gsub(reg,string,target) Replace every match of regular expression reg in target with string
N index(string,search) Return the position of search in string
A length(string) Count the number of characters in string
N match(string,reg) Return the position in string where regular expression reg matches
N printf(format,variable) Formatted output: print variable according to the format given by format
N split(string,store,delim) Split string into elements of the array store according to delimiter delim
N sprintf(format,variable) Return a string containing formatted data based on format; variable is the data to be placed into the string
G strftime(format,timestamp) Return a date or time string based on format; timestamp is a time as returned by systime()
N sub(reg,string,target) Replace the first match of regular expression reg in target with string
A substr(string,position,len) Return the substring of string of length len starting at position
P tolower(string) Return string with its uppercase characters converted to lowercase
P toupper(string) Return string with its lowercase characters converted to uppercase
A atan2(y,x) arctangent of y/x (radians)
N cos(x) x’s cosine (radians)
A exp(x) e to the x power
A int(x) integer part of x
A log(x) natural logarithm of x
N rand() random number between 0 and 1
N sin(x) x’s sine (radians)
A sqrt(x) x’s square root
A srand(x) Initialize the random number generator. If x is omitted, the current time (as returned by systime()) is used
G systime() Return the time elapsed since January 1, 1970 (in seconds)

[ Last edited by 无奈何 on 2006-10-27 at 02:31 AM ]

Floor 4 Posted 2006-10-27 02:20 ·  中国 浙江 宁波 鹏博士宽带
Repost note: original link unknown
GAWK Manual

Author: Wilbur Lang

Chapter 1 Preface
Chapter 2 Introduction
Chapter 3 Reading input files
Chapter 4 Printing
Chapter 5 Patterns
Chapter 6 Expressions as descriptions of Actions
Chapter 7 Control statements inside Actions
Chapter 8 Built-in Functions
Chapter 9 User-defined functions
Chapter 10 Examples
Chapter 11 Conclusion

Chapter 1 Preface

awk is a programming language with very powerful capabilities for data processing. For processing such as modifying, comparing, and extracting data in text files, awk can easily accomplish it with very short programs. If you use languages such as C or Pascal to write programs to do the above, it would be inconvenient and very time-consuming, and the programs written would also be quite large.

awk can break down input data according to user-defined formats, and can also print data according to user-defined formats.

The name awk comes from the first letters of the surnames of its original designers: Alfred V. Aho, Peter J. Weinberger, Brian W. Kernighan.
awk was first completed in 1977. A new version of awk was released in 1985, and its functions were much enhanced compared with the old version.

gawk is GNU's awk. gawk was first completed in 1986, and has since been continuously improved and updated. gawk includes all the functions of awk.

The following gawk examples will use the 2 input files below for illustration.

File 'BBS-list':
aardvark 555-5553 1200/300 B
alpo-net 555-3412 2400/1200/300 A
barfly 555-7685 1200/300 A
bites 555-1675 2400/1200/300 A
camelot 555-0542 300 C
core 555-2912 1200/300 C
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sdace 555-3430 2400/1200/300 A
sabafoo 555-2127 1200/300 C


File 'shipped':
Jan 13 25 15 115
Feb 15 32 24 226
Mar 15 24 34 228
Apr 31 52 63 420
May 16 34 29 208
Jun 31 42 75 492
Jul 24 34 67 436
Aug 15 34 47 316
Sep 13 55 37 277
Oct 29 54 68 525
Nov 20 87 82 577
Dec 17 35 61 401

Jan 21 36 64 620
Feb 26 58 80 652
Mar 24 75 70 495
Apr 21 70 74 514



Chapter 2 Introduction

gawk's main function is to search each line of a file for specified patterns.
When a line matches the specified patterns, gawk executes the specified actions on that line. gawk processes every line of the input file this way until the end of the input file.

A gawk program is made up of many patterns and actions. The action is written inside braces { }. A pattern is followed by an action. The whole gawk program looks like this:

pattern {action}
pattern {action}

In the rules inside a gawk program, the pattern or action can be omitted,
but the two cannot both be omitted at the same time. If the pattern is omitted,
the action will be executed for every line in the input file. If the action is omitted, the default action prints all input lines that match the pattern.
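
The two shortened rule forms can be seen side by side in a short sketch. The three input lines are made up for illustration, and awk stands for any POSIX awk, gawk included.

```shell
# Three lines of made-up input (not taken from the sample files).
input='apple 3
banana 0
cherry 7'
# Pattern only: the default action prints the whole matching line.
only_pattern=$(printf '%s\n' "$input" | awk '$2 > 0')
# Action only: the action runs for every input line.
only_action=$(printf '%s\n' "$input" | awk '{ print $1 }')
printf '%s\n' "$only_pattern" "$only_action"
```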



2.1 How to execute a gawk program

Basically, there are 2 ways to execute a gawk program.

□If the gawk program is short, the program can be written directly on the command line, as shown below:

gawk 'program' input-file1 input-file2 ...

Here program includes some patterns and actions.

□If the gawk program is longer, a more convenient method is to store the gawk program in a file,
that is, write the patterns and actions in a file named program-file. The format for executing
gawk is shown below:

gawk -f program-file input-file1 input-file2 ...

If there is more than one gawk program file, the format for executing gawk is shown below:

gawk -f program-file1 -f program-file2 ... input-file1 input-file2 ...



2.2 A simple example

Now let us look at a simple example. Because the gawk program is short, the gawk program is written directly on the command line.

gawk '/foo/ {print $0}' BBS-list

The actual gawk program is /foo/ {print $0}. /foo/ is the pattern, meaning it searches
every line in the input file to see whether it contains the substring 'foo'. If it contains 'foo', then the action is executed.
The action is print $0, which prints the contents of the current line. BBS-list is the input file.

After executing the above command, the following result will be printed:
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sabafoo 555-2127 1200/300 C



2.3 A more complex example

gawk '$1 == "Feb" {sum=$2+$3} END {print sum}' shipped

This example compares the first field of each record in the input file 'shipped' with "Feb".
If they are equal, the sum of the 2nd and 3rd fields is assigned to the variable sum.
This action is repeated for every line in the input file until every line has been processed.
Because each assignment overwrites the previous value, sum ends up holding the result for the last matching line (the second Feb record, 26+58).
Finally, the value of sum is printed. END {print sum} means that after all input has been
read, the action print sum is executed once, that is, the value of sum is printed.

The result is:
84
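
The command above uses plain assignment, so only the last Feb record determines the result. To total every matching record, += can be used instead. The sketch below contrasts the two, using three records copied from 'shipped'; the variable names are illustrative.

```shell
# One Jan record and the two Feb records from the 'shipped' sample file.
data='Jan 13 25 15 115
Feb 15 32 24 226
Feb 26 58 80 652'
# "=" keeps only the value computed for the last matching record.
last=$(printf '%s\n' "$data" | awk '$1 == "Feb" { sum = $2 + $3 } END { print sum }')
# "+=" accumulates over all matching records.
total=$(printf '%s\n' "$data" | awk '$1 == "Feb" { sum += $2 + $3 } END { print sum }')
echo "last Feb record: $last, all Feb records: $total"
```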


Chapter 3 Reading input files

gawk input can be read from standard input or from specified files. The unit
of input is called a "record". gawk processes one record at a time.
By default each record is one line, and a record is divided into multiple
fields.



3.1 How input is divided into records

The gawk language divides input into records. Records are separated from each other by the
record separator. The default value of the record separator is the newline
character, so the default record separator makes each line of text one record.

The record separator changes when the built-in variable RS changes. RS is a string,
and its default value is "\n". Only the first character of RS is effective; it is treated as the record
separator, while the other characters in RS are ignored.

The built-in variable FNR stores the number of records read so far from the current input file.
The built-in variable NR stores the total number of records read so far from all input files.
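
A short sketch of RS, FNR, and NR at work. The two temporary files and their contents are assumptions made for this illustration.

```shell
# Two small files created only for this sketch.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/one"
printf 'c\nd\ne\n' > "$dir/two"
# FNR restarts at 1 for each file, NR keeps counting across files.
counts=$(awk '{ print FNR, NR }' "$dir/one" "$dir/two")
# A one-character RS: records are separated by ";" instead of a newline.
recs=$(printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
printf '%s\n' "$counts" "$recs"
rm -rf "$dir"
```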

3.2 Fields

gawk automatically divides each record into multiple fields. Similar to words in a
line, gawk's default behavior treats fields as being separated by whitespace. In
gawk, whitespace means one or more spaces or tabs.

In a gawk program, '$1' represents the first field, '$2' the second field,
and so on. For example, suppose one input line is as follows:

This seems like a pretty nice example.

The first field or $1 is 'This', the second field or $2 is 'seems', and so on.
One point deserves special attention: the seventh field or $7 is 'example.' rather than 'example'.

No matter how many fields there are, $NF can be used to represent the last field of a record. Using
the example above, $NF is the same as $7, namely 'example.'.

NF is a built-in variable whose value represents the number of fields in the current record. $0 may look like the zeroth field, but it is a special case; it represents the entire record.
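
The sample line above can be fed to awk to confirm these values. This is a minimal sketch; $(NF-1) is added only to show that the field number may be any expression.

```shell
line='This seems like a pretty nice example.'
# NF is the field count, $NF the last field, $(NF-1) the one before it.
fields=$(echo "$line" | awk '{ print NF; print $1; print $NF; print $(NF-1) }')
echo "$fields"
```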

Here is a somewhat more complex example:

gawk '$1~/foo/ {print $0}' BBS-list

The result is as follows:
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
macfoo 555-6480 1200/300 A
sabafoo 555-2127 1200/300 C

This example checks the first field of each record in the input file 'BBS-list'. If
it contains the substring 'foo', then that record is printed.



3.3 How records are divided into fields

gawk divides a record into fields according to the field separator. The field separator is represented by the built-in variable FS.

For example, if the field separator is 'oo', then the following line:

moo goo gai pan

will be divided into three fields: 'm', ' g', ' gai pan'.

In a gawk program, '=' can be used to change the value of FS. For example:

gawk 'BEGIN {FS=","}; {print $2}'

The input line is as follows:

John Q. Smith, 29 Oak St., Walamazoo, MI 42139

Executing gawk will print the string ' 29 Oak St.'. The action after BEGIN is executed once
before the first record is read.
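
The comma-separated example can be run as follows. This is a sketch: the -F option shown in the second command is the command-line way of setting FS, and the shell variable names are illustrative.

```shell
addr='John Q. Smith, 29 Oak St., Walamazoo, MI 42139'
# FS set in a BEGIN rule, before the first record is read.
second=$(echo "$addr" | awk 'BEGIN { FS = "," } { print $2 }')
# The same separator given on the command line with -F.
third=$(echo "$addr" | awk -F, '{ print $3 }')
# The brackets make the leading space of each field visible.
echo "[$second] [$third]"
```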



Chapter 4 Printing

In gawk programs, the thing actions do most often is printing. For simple
printing, use the print statement. For complex formatted printing, use the printf statement.



4.1 The print statement

The print statement is used for simple, standard output format. The statement format is as follows:

print item1, item2, ...

When outputting, each item is separated by a space, and a newline is added at the end.

If nothing follows 'print', it has the same effect as 'print $0': it prints the current record. To print a blank line, use 'print ""'. To print a fixed piece of text, enclose the text in double quotes, for example 'print "Hello there"'.

Here is an example that prints the first two fields of each input record:

gawk '{print $1,$2}' shipped

The result is as follows:
Jan 13
Feb 15
Mar 15
Apr 31
May 16
Jun 31
Jul 24
Aug 15
Sep 13
Oct 29
Nov 20
Dec 17

Jan 21
Feb 26
Mar 24
Apr 21



4.2 Output Separators

Earlier we already mentioned that if a print statement contains multiple items, and the items
are separated by commas, then when printed each item will be separated by a space. You can use any
string as the output field separator; you change it by setting
the built-in variable OFS. The initial value of OFS is " ", that is, one
space.

The output of the entire print statement is called the output record. After the print statement
outputs the output record, it then outputs a string called the output
record separator. The built-in variable ORS is used to indicate this string. The initial value
of ORS is "\n", that is, a newline.

The following example prints the first and second fields of each record. These two
fields are separated by a semicolon ';', and a blank line is added after each output line.

gawk 'BEGIN {OFS=";"; ORS="\n\n"} {print $1, $2}' BBS-list

The result is as follows:
aardvark;555-5553

alpo-net;555-3412

barfly;555-7685

bites;555-1675

camelot;555-0542

core;555-2912

fooey;555-1234

foot;555-6699

macfoo;555-6480

sdace;555-3430

sabafoo;555-2127




4.3 The printf statement

The printf statement makes it easier to control the output format precisely. The printf statement can
specify the width of each printed item, and can also specify various numeric formats.

The format of the printf statement is:

printf format, item1, item2, ...

The difference between print and printf lies in format: printf takes one more argument than print, the
string format. The form of format is the same as the format of ANSI C's printf.

printf does not automatically output a newline. The built-in variables OFS and ORS have no effect on printf statements.

A format specification begins with the character '%', followed by a format control letter.

The format control letters are as follows:

'c' Print a number as an ASCII character.
For example, 'printf "%c",65' prints the character 'A'.

'd' Print a decimal integer.

'i' Print a decimal integer.

'e' Print a number in scientific notation.
For example

printf "%4.3e",1950

The result will print '1.950e+03'.

'f' Print a number in floating-point form.

'g' Print a number either in scientific notation or in floating-point form, whichever is shorter. As in C,
scientific notation is used when the exponent is less than -4 or not less than the precision
(6 by default); otherwise the number is printed in floating-point form.

'o' Print an unsigned octal integer.

's' Print a string.

'x' Print an unsigned hexadecimal integer. 10 through 15 are represented by 'a' through 'f'.

'X' Print an unsigned hexadecimal integer. 10 through 15 are represented by 'A' through 'F'.

'%' It is not really a format control letter; '%%' prints '%'.

A modifier can be added between % and the format control letter. A modifier is used to further
control the output format. Possible modifiers are as follows:

'-' Used before width, indicating left alignment. If '-' does not appear, then it will be
right-aligned within the specified width. For example:

printf "%-4s", "foo"

will print 'foo '.

'width' This number indicates the width to be used when printing the corresponding field. For example:

printf "%4s","foo"

will print ' foo'.

The value of width is a minimum width, not a maximum width. If an item
requires more width than width, then it is not affected by width. For example

printf "%4s","foobar"
will print 'foobar'.

'.prec' This number specifies the precision when printing. It specifies the number of digits to the right of the decimal point. If
a string is to be printed, it specifies how many characters
of this string will be printed at most.
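
The modifiers described above can be compared in one run. The following is a sketch with made-up values; the square brackets are there only to make the padding visible.

```shell
out=$(awk 'BEGIN {
    printf "[%-6s]\n", "foo"        # left-aligned in a width of 6
    printf "[%6s]\n", "foo"         # right-aligned in a width of 6
    printf "[%.3s]\n", "foobar"     # precision limits a string to 3 characters
    printf "[%8.2f]\n", 3.14159     # width 8, 2 digits after the decimal point
    printf "[%c]\n", 65             # the character whose code is 65
}')
echo "$out"
```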



Chapter 5 Patterns

In a gawk program, only when a pattern matches the current input record does its
corresponding action get executed.



5.1 Types of patterns

Here is a summary of the various forms of patterns in gawk:

/regular expression/
A regular expression used as a pattern. Whenever an input record
contains the regular expression, it is considered a match.

expression
A single expression. When a value is not 0, or a string is not empty,
it can be considered a match.

pat1,pat2
A pair of patterns separated by a comma, specifying a range of records.

BEGIN
END
These are special patterns; gawk will execute the actions corresponding
to BEGIN or END when starting execution or when finishing.

null
This is an empty pattern. It is considered to match every input record.
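
Several of these pattern forms can be combined in one program. The sketch below uses made-up input in which the words START and END mark a range; those marker words and the messages are assumptions for illustration only.

```shell
out=$(printf '%s\n' a START b c END d | awk '
    BEGIN { print "-- begin --" }    # runs before any input is read
    /START/,/END/                    # range pattern with the default print action
    END   { print "-- end --" }      # runs after all input is read
')
echo "$out"
```

The range rule starts printing at the first record matching /START/ and stops after the next record matching /END/, so the lines a and d are not printed.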

5.2 Regular Expressions as Patterns

A regular expression, abbreviated regexp, is a way of describing a string. A regular expression
enclosed in slashes ('/') serves as a gawk pattern.

If an input record contains the regexp, it is considered a match. For example, if the pattern is /foo/,
then any input record containing 'foo' is considered a match.

The following example prints the 2nd field of input records containing 'foo'.

gawk '/foo/ {print $2}' BBS-list

The result is as follows:
555-1234
555-6699
555-6480
555-2127

regexp can also be used in comparison expressions.

exp ~ /regexp/
If exp matches regexp, the result is true.

exp !~ /regexp/
If exp does not match regexp, the result is true.



5.3 Comparison Expressions as Patterns

Comparison patterns are used to test relationships between two numbers or strings such as greater than, equal to,
or less than. Some comparison patterns are listed below:

x<y If x is less than y, the result is true.
x<=y If x is less than or equal to y, the result is true.
x>y If x is greater than y, the result is true.
x>=y If x is greater than or equal to y, the result is true.
x==y If x is equal to y, the result is true.
x!=y If x is not equal to y, the result is true.
x~y If x matches regular expression y, the result is true.
x!~y If x does not match regular expression y, the result is true.

For x and y mentioned above, if both are numbers then it is treated as a numeric comparison;
otherwise they are converted to strings and compared as strings. Two strings are compared by
first comparing the first character, then the second character, and so on, until a difference
appears. If two strings are equal up to the end of the shorter one, then the longer
string is considered greater than the shorter one. For example, "10" is less than "9", and "abc" is less than "abcd".
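
The difference between numeric and string comparison shows up as soon as the same digits are compared both ways. A minimal sketch (a true comparison prints 1, a false one 0):

```shell
cmp=$(awk 'BEGIN {
    print (10 < 9)            # numeric comparison
    print ("10" < "9")        # string comparison: "1" sorts before "9"
    print ("abc" < "abcd")    # equal up to the end of the shorter string
}')
echo "$cmp"
```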



5.4 Patterns Using Boolean Operators

A boolean pattern combines other patterns using the boolean operators "or" ('||'), "and"
('&&'), and "not" ('!').
For example:

gawk '/2400/ && /foo/' BBS-list
gawk '/2400/ || /foo/' BBS-list
gawk '! /foo/' BBS-list
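
To see which records each of these commands selects, the same three patterns can be run on four lines taken from BBS-list. In this sketch an action is added that prints only the first field, to keep the output short.

```shell
# Four records copied from the BBS-list sample file.
bbs='alpo-net 555-3412 2400/1200/300 A
fooey 555-1234 2400/1200/300 B
foot 555-6699 1200/300 B
camelot 555-0542 300 C'
both=$(printf '%s\n' "$bbs" | awk '/2400/ && /foo/ { print $1 }')     # both must match
either=$(printf '%s\n' "$bbs" | awk '/2400/ || /foo/ { print $1 }')   # at least one matches
neither=$(printf '%s\n' "$bbs" | awk '! /foo/ { print $1 }')          # foo does not occur
printf '%s\n' "and: $both" "or: $either" "not: $neither"
```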


Chapter 6 Expressions as Actions

Expressions are the basic building blocks of actions in gawk programs.



6.1 Arithmetic operations

The arithmetic operations in gawk are as follows:

x+y addition
x-y subtraction
-x negative
+x positive. Actually it has no effect.
x*y multiplication
x/y division
x%y remainder. For example 5%3=2.
x^y
x**y x to the power y. For example 2^3=8.



6.2 Comparison Expressions and Boolean Expressions

A comparison expression is used to compare relationships
between strings or numbers; the operator symbols are the same as in the C language. They are listed below:

x<y
x<=y
x>y
x>=y
x==y
x!=y
x~y
x!~y

If the comparison result is true, its value is 1.
Otherwise its value is 0.
There are three kinds of boolean expressions:

boolean1 && boolean2
boolean1 || boolean2
! boolean



6.3 Conditional Expressions

A conditional expression is a special kind of expression that contains 3 operands.
Conditional expressions are the same as in the C language:

selector ? if-true-exp : if-false-exp

It has 3 subexpressions. The first subexpression selector is evaluated first. If it is true,
then if-true-exp is evaluated and its value becomes the value of the whole expression. Otherwise
if-false-exp is evaluated and its value becomes the value of the whole expression.

For example, the following expression produces the absolute value of x:
x>0 ? x : -x
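
The absolute-value expression can be applied to a few input numbers as a quick check. This is a sketch; the input values are made up.

```shell
# One number per input line; the conditional expression picks x or -x.
abs=$(printf '%s\n' -5 12 -3.5 | awk '{ x = $1; print (x > 0 ? x : -x) }')
echo "$abs"
```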



Chapter 7 Control statements inside Actions

In gawk programs, control statements such as if and while control the flow
of program execution. The control statements in gawk are similar to those in C.

Many control statements include other statements; the included statements are called the body. If
the body includes more than one statement, these statements must be enclosed in braces { },
and the statements must be separated by newlines or semicolons.



7.1 The if statement

if (condition) then-body [else else-body]

If condition is true, then then-body is executed; otherwise else-body is executed. The else part is optional.

An example is as follows:

if (x % 2 == 0)
    print "x is even"
else
    print "x is odd"



7.2 The while statement

while (condition)
body

The first thing a while statement does is test condition. If condition is true, then
the body statement is executed. After the body statement has finished executing, condition is tested again. If
condition is true, then the body is executed again. This process is repeated until
condition is no longer true. If condition is false on the first test, then
the body is never executed.

The following example prints the first three fields of each input record.

gawk '{ i=1
    while (i <= 3) {
        print $i
        i++
    }
}'



7.3 The do-while statement

do
body
while (condition)

This do loop executes body once, and then repeats body as long as condition is true.
Even if condition is false at the start, body is still executed once.

The following example prints each input record ten times.

gawk '{ i = 1
    do {
        print $0
        i++
    } while (i <= 10)
}'



7.4 The for statement

for (initialization; condition; increment)
body

This statement executes initialization at the start, and then as long as condition is true, it
repeatedly executes body and performs increment.

The following example prints the first three fields of each input record.

gawk '{ for (i=1; i<=3; i++)
    print $i
}'



7.5 The break statement

A break statement jumps out of the innermost enclosing for, while, or do-while loop.

The following example finds the smallest divisor of any integer, and also determines whether it is prime.

gawk '# find smallest divisor of num
{ num=$1
  for (div=2; div*div <=num; div++)
      if (num % div == 0)
          break
  if (num % div == 0)
      printf "Smallest divisor of %d is %d\n", num, div
  else
      printf "%d is prime\n", num }'



7.6 The continue statement
The continue statement is used inside for, while, and do-while loops. It skips
the rest of the loop body, causing the next loop iteration to begin immediately.

The following example prints all the numbers from 0 to 20, but 5 will not be printed.

gawk 'BEGIN {
    for (x=0; x<=20; x++) {
        if (x==5)
            continue
        printf ("%d ",x)
    }
    print ""
}'



7.7 The next statement, next file statement, and exit statement

The next statement forces gawk to immediately stop processing the current record and continue with the next
record.

The next file statement (spelled nextfile in current gawk releases) is similar to next. However, it forces gawk to immediately stop processing the current
data file and move on to the next one.

The exit statement causes the gawk program to stop executing and exit. However, if END appears,
it will execute the END actions.
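
A short sketch of next and exit working together. The input words, including the "stop" marker and the "#" prefix for skipped lines, are made up for illustration.

```shell
out=$(printf '%s\n' one '#skip' two stop three | awk '
    /^#/         { next }               # drop this record and read the next one
    $1 == "stop" { exit }               # stop reading input; the END rule still runs
                 { n++; print n, $0 }
    END          { print "records printed:", n }')
echo "$out"
```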



Chapter 8 Built-in Functions

Built-in functions are functions built into gawk, and built-in
functions can be called anywhere in a gawk program.



8.1 Numeric built-in functions

int(x) gets the integer part of x, truncating toward 0. For example: int(3.9)
is 3, and int(-3.9) is -3.
sqrt(x) gets the positive square root of x. Example: sqrt(4)=2
exp(x) gets e raised to the power x. Example: exp(2) means e*e.
log(x) gets the natural logarithm of x.
sin(x) gets the sine value of x, where x is in radians.
cos(x) gets the cosine value of x, where x is in radians.
atan2(y,x) gets the arctangent value of y/x, and the resulting value is in radians.
rand() produces a random number value, uniformly distributed between 0 and 1. The
value can be 0 but is never 1.
Each time gawk runs, rand starts producing numbers from the same point, or seed.
srand(x) sets the starting point, or seed, for generating random numbers to x. If you set
the same seed value a second time, you get the same sequence of random numbers again.
If the argument x is omitted, for example srand(), then the current date and time are
used as the seed, so the sequence differs from one run to the next.
The return value of srand is the previously set seed value.
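
The effect of a fixed seed can be checked by running the same program twice. This is a sketch; the seed 42 is arbitrary, and the actual numbers depend on the awk implementation, so none are shown here.

```shell
# Two separate runs with the same seed should produce the same sequence.
first=$(awk 'BEGIN { srand(42); print int(100 * rand()), int(100 * rand()) }')
second=$(awk 'BEGIN { srand(42); print int(100 * rand()), int(100 * rand()) }')
if [ "$first" = "$second" ]; then
    echo "same seed, same sequence: $first"
fi
```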



8.2 String built-in functions

index(in, find)
It looks in the string in for the first occurrence of the string find, and the return value is
the position where string find appears in string in. If string find cannot be found in string in,
then the return value is 0.
For example:
print index("peanut","an")
will print 3.

length(string)
Gets how many characters string has.
For example:
length("abcde")
is 5.

match(string,regexp)
The match function looks in the string string for the longest, leftmost
substring that matches regexp. The return value is the position in string
where that substring begins, that is, its index, or 0 if there is no match.
The match function sets the built-in variable RSTART equal to that index, and also sets the built-in
variable RLENGTH equal to the number of matched characters. If there is no match, then RSTART is set to
0 and RLENGTH to -1.

sprintf(format,expression1,...)
Similar to printf, but sprintf does not print; instead it returns a string.
For example:
sprintf("pi = %.2f (approx.)",22/7)
the returned string is "pi = 3.14 (approx.)"

sub(regexp, replacement,target)
In the string target, find the longest, leftmost place that matches regexp, and
replace the leftmost regexp with the string replacement.
For example:
str = "water, water, everywhere"
sub(/at/, "ith",str)
The resulting string str becomes
"wither, water, everywhere"

gsub(regexp, replacement, target)
gsub is similar to the previous sub. In the string target, find all places that match regexp,
and replace all regexp occurrences with the string replacement.
For example:
str="water, water, everywhere"
gsub(/at/, "ith",str)
The resulting string str becomes
"wither, wither, everywhere"

substr(string, start, length)
Returns a substring of string string. This substring has a length of length characters,
starting from position start.
For example:
substr("washington",5,3)
the return value is "ing"
If length does not appear, then the returned substring starts from position start
and continues to the end.
For example:
substr("washington",5)
the return value is "ington"

tolower(string)
Changes uppercase letters in string string to lowercase letters.
For example:
tolower("MiXeD cAsE 123")
the return value is "mixed case 123"

toupper(string)
Changes lowercase letters in string string to uppercase letters.
For example:
toupper("MiXeD cAsE 123")
the return value is "MIXED CASE 123"
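
The following sketch exercises match together with RSTART and RLENGTH, and also shows that gsub returns the number of replacements it made. The test string is made up for illustration.

```shell
out=$(awk 'BEGIN {
    s = "foobar fooooo"
    if (match(s, /fo+/))                  # leftmost match, longest at that position
        print RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    n = gsub(/o/, "0", s)                 # gsub returns the number of replacements
    print n, s
    m = match(s, /xyz/)                   # no match: returns 0, RSTART 0, RLENGTH -1
    print m, RSTART, RLENGTH
}')
echo "$out"
```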



8.3 Input/output built-in functions

close(filename)
Closes the input or output file filename.

system(command)
This function allows the user to execute operating system commands; after execution, it returns to the gawk
program.
For example:
BEGIN {system("ls")}



Chapter 9 User-defined Functions

Complex gawk programs can often be simplified by using user-defined
functions. Calling a user-defined function is the same as calling a built-in function.



9.1 Function definition format

A function definition can be placed anywhere in a gawk program.

The format of a user-defined function is as follows:

function name (parameter-list) {
body-of-function
}

name is the name of the defined function. A valid function name can include a sequence of letters,
digits, and underscores, but it cannot begin with a digit.

parameter-list lists all the function's arguments, separated
from each other by commas.

body-of-function contains gawk statements. It is the most important part
of the function definition, and it determines what the function actually does.



9.2 An example of a function definition

The following example adds together the square of the value of the first field of each record and the square of the value of the second
field.

{print "sum =",SquareSum($1,$2)}
function SquareSum(x,y) {
    sum=x*x+y*y
    return sum
}
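
In the example above, sum is a global variable, so calling SquareSum changes it for the whole program. A common awk convention, sketched below, is to list such working variables as extra parameters, which makes them local to the function. The input numbers are made up.

```shell
out=$(printf '3 4\n1 2\n' | awk '
    # "sum" is listed as an extra parameter, which makes it local to the function.
    function SquareSum(x, y,    sum) {
        sum = x * x + y * y
        return sum
    }
    { print "sum =", SquareSum($1, $2) }
    END { print "global sum is empty: [" sum "]" }')
echo "$out"
```

The extra spaces before sum in the parameter list are only a visual hint that it is not meant to be passed by the caller.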



Chapter 10 Examples

Some examples of gawk programs will be listed here.

gawk '{if (NF > max) max = NF}
END {print max}'
This program prints the maximum number of fields among all input lines.

gawk 'length($0) > 80'
This program prints every line that exceeds 80 characters. Here only the pattern is
listed; the action uses the default print.

gawk 'NF > 0'
This program prints every line that has at least one field. This is a simple
way to delete all blank lines in a file.

gawk '{if (NF > 0) print}'
This program also prints every line that has at least one field. Here the rule matches
every line, and the action decides whether to print it.

gawk 'BEGIN {for (i = 1; i <= 7; i++)
print int(101 * rand())}'
This program prints 7 random numbers in the range from 0 to 100.

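ls -l files | gawk '{x += $5}; END {print "total bytes: " x}'
This program prints the total number of bytes of all specified files. It assumes the usual 'ls -l' layout, in which the size is the 5th field; if your ls omits the group column, use $4 instead.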

expand file | gawk '{if (x < length()) x = length()}
END {print "maximum line length is " x}'
This program prints the length of the longest line in the specified file. expand changes tabs
into spaces, so comparison is done using the actual right margin length.

gawk 'BEGIN {FS = ":"}
{print $1 | "sort"}' /etc/passwd
This program prints all users' login names in alphabetical order

gawk '{nlines++}
END {print nlines}'
This program prints the total number of lines in a file.

gawk 'END {print NR}'
This program also prints the total number of lines in a file, but the work of counting lines is done by gawk.

gawk '{print NR,$0}'
When this program prints the contents of a file, it prints the line number at the very beginning of each line. Its function
is similar to 'cat -n'.



Chapter 11 Conclusion

gawk has very powerful capabilities for data processing. It can accomplish
what you want to do with very short programs; sometimes just one or two lines of code can complete the specified task. The same piece
of work is usually much shorter when written in gawk than in other programming languages.
gawk is GNU's awk. It is free software distributed under the GNU General Public License and may be used free of charge.

[ Last edited by 无奈何 on 2006-10-27 at 02:40 AM ]

Floor 5 Posted 2006-10-27 02:20 ·  中国 浙江 宁波 鹏博士宽带
Repost Note: Original link http://bbs.chinaunix.net/viewthread.php?tid=691456&extra=page%3D1

awk Usage: awk ' pattern {action} '

Variable   Meaning
ARGC       Number of command-line arguments
ARGV       Array of command-line arguments
FILENAME   Name of the current input file
FNR        Record number within the current file
FS         Input field separator (default: a single space, which splits on runs of blanks and tabs)
RS         Input record separator (default: newline)
NF         Number of fields in the current record
NR         Number of records read so far
OFS        Output field separator (default: a single space)
ORS        Output record separator (default: newline)
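
To make a few of these variables concrete, here is a minimal sketch. The colon-separated records are invented for the example (they only resemble /etc/passwd), and the names data and report are arbitrary.

```shell
# Hypothetical colon-separated records, similar in shape to /etc/passwd.
data=$(printf 'root:0:/root\nalice:1000:/home/alice')

# FS is set before the first record is read; OFS is written between the
# comma-separated items of print; NR and NF are updated for every record.
report=$(printf '%s\n' "$data" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }')

printf '%s\n' "$report"
```

$NF is the last field of the record because NF holds the field count.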

1. awk '/101/' file Displays the lines in file that contain 101.
awk '/101/,/105/' file Displays each range of lines from one containing 101 through the next one containing 105.
awk '$1 == 5' file Displays the lines whose first field equals 5.
awk '$1 == "CT"' file Displays the lines whose first field is the string CT. Note that double quotes must be used.
awk '$1 * $2 >100 ' file Displays the lines where the product of the first two fields is greater than 100.
awk '$2 >5 && $2<=15' file Displays the lines whose second field is greater than 5 and at most 15.
2. awk '{print NR,NF,$1,$NF}' file Displays the current record number, the number of fields, and the first and last fields of each line in file.
awk '/101/ {print $1,$2 + 10}' file Displays the first field and the second field plus 10 for matching lines.
awk '/101/ {print $1$2}' file
awk '/101/ {print $1 $2}' file Both forms display the first and second fields of matching lines joined together, with no separator between them.
3. df | awk '$4>1000000 ' Obtains input through a pipe; here it displays the lines whose 4th field meets the condition.
4. awk -F "|" '{print $1}' file Splits fields on the new separator "|".
awk 'BEGIN { FS="[: \t|]" }
{print $1,$2,$3}' file Changes the input separator by setting FS in a BEGIN block (FS="[: \t|]").

Sep="|"
awk -F "$Sep" '{print $1}' file Uses the value of the shell variable Sep as the separator.
awk -F '[ :\t|]' '{print $1}' file Uses a regular expression as the separator; here space, :, TAB, and | all act as separators.
awk -F '[][]' '{print $1}' file Uses a regular expression as the separator; here [ and ] act as separators.
5. awk -f awkfile file Runs the awk program stored in the file awkfile against file.
cat awkfile
/101/{print "\047 Hello! \047"} --Prints ' Hello! ' after encountering a matching line. \047 represents a single quote.
{print $1,$2} --Prints the first two fields of each line since there is no pattern control.
6. awk '$1 ~ /101/ {print $1}' file Displays lines (records) in the file where the first field matches 101.
7. awk 'BEGIN { OFS="%"}
{print $1,$2}' file Modifies the output format by setting the output separator (OFS="%").
8. awk 'BEGIN { max=100 ;print "max=" max} BEGIN indicates operations performed before any input line is processed.
{max=($1 >max ?$1:max); print $1,"Now max is "max}' file Tracks the maximum value of the first field in the file, starting from 100.
(The expression "expression 1 ? expression 2 : expression 3" is equivalent to:
if (expression 1)
expression 2
else
expression 3)
awk '{print ($1>4 ? "high "$1: "low "$1)}' file
9. awk '$1 * $2 >100 {print $1}' file Displays the first field of the lines (records) where the product of the first two fields is greater than 100.
10. awk '$1 == "Chi" {$3 = "China"; print}' file After finding a matching line, replaces the 3rd field and then displays the line (record).
awk '{$7 %= 3; print $7}' file Divides the 7th field by 3, assigns the remainder to the 7th field, and then prints it.
11. awk '/tom/ {wage=$2+$3; print wage}' file After finding a matching line, assigns a value to the variable wage and prints it.
12. awk '/tom/ {count++;}
END {print "tom was found "count" times"}' file END indicates operations performed after processing all input lines.
13. awk '{gsub(/\$/,"");gsub(/,/,""); cost+=$4}
END {print "The total is $" cost>"filename"}' file Uses the gsub function to replace $ and , with empty strings, sums the 4th field, and writes the result to filename.
1 2 3 $1,200.00
1 2 3 $2,300.00
1 2 3 $4,000.00

awk '{gsub(/\$/,"");gsub(/,/,"");
if ($4>1000&&$4<2000) c1+=$4;
else if ($4>2000&&$4<3000) c2+=$4;
else if ($4>3000&&$4<4000) c3+=$4;
else c4+=$4; }
END {printf "c1=%d;c2=%d;c3=%d;c4=%d\n",c1,c2,c3,c4}' file
Implements conditional branching with if and else if.
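
For reference, a runnable sketch of the same idea applied to the three sample records listed above. The %d conversions and the variable names records and totals are assumptions made for this example.

```shell
# The three sample records from the text above.
records=$(printf '1 2 3 $1,200.00\n1 2 3 $2,300.00\n1 2 3 $4,000.00')

totals=$(printf '%s\n' "$records" | awk '
{
    gsub(/\$/, ""); gsub(/,/, "")      # strip "$" and "," from the whole record
    if      ($4 > 1000 && $4 < 2000) c1 += $4
    else if ($4 > 2000 && $4 < 3000) c2 += $4
    else if ($4 > 3000 && $4 < 4000) c3 += $4
    else                             c4 += $4
}
END { printf "c1=%d;c2=%d;c3=%d;c4=%d\n", c1, c2, c3, c4 }')

printf '%s\n' "$totals"
```

The value 4000 is not strictly less than 4000, so it falls through to the final else branch, and c3 is never assigned and prints as 0.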

awk '{gsub(/\$/,"");gsub(/,/,"");
if ($4>3000&&$4<4000) exit;
else c4+=$4; }
END {printf "c1=%d;c2=%d;c3=%d;c4=%d\n",c1,c2,c3,c4}' file
Uses exit to stop reading input when a condition is met; the END actions are still performed.
awk '{gsub(/\$/,"");gsub(/,/,"");
if ($4>3000) next;
else c4+=$4; }
END {printf "c4=%d\n",c4}' file
Uses next to skip the current line when a condition is met and continue with the next line.
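
The difference between next and exit is easiest to see side by side. A minimal sketch on invented numbers (nums, with_next and with_exit are names chosen for this example):

```shell
# Hypothetical input: one number per line, with one large value in the middle.
nums=$(printf '10\n20\n3500\n40')

# next: skip records above 3000 and keep summing the rest.
with_next=$(printf '%s\n' "$nums" |
  awk '$1 > 3000 { next } { s += $1 } END { print s }')

# exit: stop reading at the first record above 3000; END still runs.
with_exit=$(printf '%s\n' "$nums" |
  awk '$1 > 3000 { exit } { s += $1 } END { print s }')

printf 'next: %s\nexit: %s\n' "$with_next" "$with_exit"
```

With next the trailing 40 is still added; with exit it is never read.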


14. awk '{ print FILENAME,$0 }' file1 file2 file3>fileall Writes the entire content of file1, file2, and file3 to fileall, with each line prefixed by the name of the file it came from.
15. awk ' $1!=previous { close(previous); previous=$1 }
{print substr($0,index($0," ") +1)>$1}' fileall Splits the merged file back into the 3 files, each identical to its original.
16. awk 'BEGIN {"date"|getline d; print d}' Sends the execution result of date to getline through the pipe, assigns it to variable d, and then prints it.
17. awk 'BEGIN {system("echo \"Input your name:\\c\""); getline d;print "\nYour name is",d,"\b!\n"}'
Interactively inputs name through the getline command and displays it.
awk 'BEGIN {FS=":"; while((getline< "/etc/passwd") >0) { if($1~"050[0-9]_") print $1}}'
Prints the user names in the /etc/passwd file that contain 050x_, where x is a digit.
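
The two getline forms used in item 16 and just above can be tried without touching /etc/passwd. This is only a sketch: the temporary file, its contents, and the variable names are made up for the example.

```shell
# "cmd" | getline var reads one line of the command output into var.
greeting=$(awk 'BEGIN { "echo hello" | getline d; close("echo hello"); print "got: " d }')

# (getline line < file) > 0 loops over a file; the parentheses keep the
# redirection and the comparison from being parsed ambiguously.
tmp=$(mktemp)
printf 'a:1\nb:2\n' > "$tmp"
names=$(awk -v f="$tmp" 'BEGIN {
    while ((getline line < f) > 0) { split(line, p, ":"); print p[1] }
    close(f)
}')
rm -f "$tmp"

printf '%s\n%s\n' "$greeting" "$names"
```

Testing for > 0 rather than for a non-zero value matters because getline returns -1 when the file cannot be opened, which would otherwise loop forever.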

18. awk '{ i=1;while(i<=NF) {print NF,$i;i++}}' file Loops over the fields with a while statement.
awk '{ for(i=1;i<=NF;i++) {print NF,$i}}' file Loops over the fields with a for statement.
type file|awk -F "/" '
{ for(i=1;i<NF;i++)
{ if(i==NF-1) { printf "%s",$i }
else { printf "%s/",$i } }}' Displays the path of a file without its last component.
The following uses for and if to display the dates of a year:
awk 'BEGIN {
for(j=1;j<=12;j++)
{ flag=0;
printf "\nMonth %d\n",j;
for(i=1;i<=31;i++)
{
if (j==2&&i>28) flag=1;
if ((j==4||j==6||j==9||j==11)&&i>30) flag=1;
if (flag==0) {printf "%02d%02d ",j,i}
}
}
}'
19. To use a shell variable inside an awk program, close the single quotes around it so that the shell expands it. Written inside double quotes within the single-quoted program, it is just a string.
Flag=abcd
awk '{print "'$Flag'"}' The result is abcd
awk '{print "$Flag"}' The result is $Flag
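
Two other ways of handing a shell value to awk avoid the quote juggling altogether: the -v option and the ENVIRON array, both part of POSIX awk. A minimal sketch (the names via_v, via_env and literal are made up for this example):

```shell
Flag=abcd

# Option 1: -v assigns the shell value to an awk variable before BEGIN runs.
via_v=$(awk -v flag="$Flag" 'BEGIN { print flag }')

# Option 2: read it from the environment through the ENVIRON array.
via_env=$(Flag="$Flag" awk 'BEGIN { print ENVIRON["Flag"] }')

# Inside single quotes the shell does not expand $Flag, so awk sees the
# literal string.
literal=$(awk 'BEGIN { print "$Flag" }')

printf '%s\n%s\n%s\n' "$via_v" "$via_env" "$literal"
```

One difference worth knowing: -v processes backslash escapes in the value, while ENVIRON passes it through unchanged.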

[ Last edited by 无奈何 on 2006-10-27 at 02:42 AM ]

Floor 6 Posted 2006-10-27 02:20 ·  中国 浙江 宁波 鹏博士宽带

Floor 7 Posted 2006-10-27 02:20 ·  中国 浙江 宁波 鹏博士宽带
Reserving the first reply

Floor 8 Posted 2006-10-27 02:20 ·  中国 浙江 宁波 鹏博士宽带

Floor 9 Posted 2006-10-27 02:20 ·  中国 浙江 宁波 鹏博士宽带
Repost note: original link http://blog.chinaunix.net/u/13392/showart.php?id=134410

awk - A Pattern Scanning and Processing Language (Second Edition)
awk - A Pattern Scanning and Processing Language (by Aho, Kernighan, Weinberger, Chinese translation)

Alfred V. Aho
Brian W. Kernighan
Peter J. Weinberger
Bell Laboratories
Murray Hill, New Jersey 07974

Translation: 寒蝉退士

Translator's statement: The translator makes no warranty of any kind for the translation, claims no rights to the translation, and assumes no responsibilities or obligations.
Original: http://cm.bell-labs.com/7thEdMan/vol2/awk

Abstract

awk is a programming language whose basic operation is to search for patterns in a set of files, and perform specified actions on those lines or fields that contain instances of these patterns. awk makes selection and transformation operations on specific data easier to express; for example, the awk program

length > 72
prints all input lines longer than 72 characters; the program

NF % 2 == 0
prints all lines having an even number of fields; and the program

{ $1 = log($1); print }
replaces the first field of each line with its logarithm.

awk patterns may include regular expressions and arbitrary boolean combinations of relational operators on strings, numeric values, fields, variables, and array elements. Actions may include the same pattern-matching constructs used in patterns, as well as arithmetic and string expressions and assignments, if-else, while, for statements, and multiple output streams.

This report contains a user guide, a discussion of the design and implementation of awk, and some timing statistics.

September 1, 1978

--------------------------------------------------------------------------------

1. Introduction
1.1. Usage
1.2. Program structure
1.3. Records and fields
1.4. Printing
2. Patterns
2.1. BEGIN and END
2.2. Regular expressions
2.3. Relational expressions
2.4. Pattern combinations
2.5. Pattern ranges
3. Actions
3.1. Built-in functions
3.2. Variables, expressions, and assignment
3.3. Field variables
3.4. String concatenation
3.5. Arrays
3.6. Control flow statements
4. Design
5. Implementation
References

--------------------------------------------------------------------------------

1. Introduction
awk is a programming language designed to make many common information retrieval and text-processing tasks easy to describe and carry out.

The basic operation of awk is to scan a set of input files in sequence, looking for lines that match any pattern in a set of patterns already specified by the user. For each pattern, an action can be specified; this action is performed on each line that matches that pattern.

Readers familiar with the UNIX program grep will recognize this approach, although patterns in awk may be more general than those in grep, and the allowed actions are more complex than merely printing matching lines. For example, the awk program

{print $3, $2}

prints the third and second columns of a table in sequence. The program

$2 ~ /A|B|C/

prints all input lines whose second column is A, B, or C. The program

$1 != prev { print; prev = $1 }
prints all lines whose first field differs from the previous first field.

1.1. Usage
The command

awk program [files]

executes the awk commands in the string program on the named set of files, or on standard input if no files are specified. The statements can also be placed into a file pfile, and executed with the command

awk -f pfile [files]

1.2. Program structure
An awk program is a sequence of statements of the form:

pattern { action }
pattern { action }
...

Each input line is matched against each pattern in turn. For each matching pattern, the associated action is executed. When all patterns have been tested, the next line is fetched and matching starts over from the beginning.

A pattern or an action may be omitted, but not both at the same time. If a pattern has no action, the matching line is simply copied to the output. (Thus a line matching multiple patterns may be printed multiple times.) If an action has no pattern, the action is performed on all input. Lines that do not match any pattern are ignored.

Since patterns and actions are both optional, actions must be enclosed in braces to distinguish them from patterns.
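
A minimal sketch of these rules on invented input: one rule has only a pattern, the other has a pattern and an action, and a line that matches both is handled by both. The variable name out is arbitrary.

```shell
out=$(printf 'apple pie\nplain\napple and pear\n' | awk '
# Pattern only: a matching line is copied to the output.
/apple/
# Pattern and action: the action runs for every matching line.
/pear/ { print "pear: " $0 }
')

printf '%s\n' "$out"
```

The line "plain" matches neither pattern and is ignored; the last input line appears twice, once for each rule it matches.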

1.3. Records and fields
awk input is divided into "records" terminated by a record separator. The default record separator is a newline, so by default awk processes one line of its input at a time. The current record number is available in the variable NR.

Each input record is treated as having been divided into "fields". Fields are usually separated by whitespace, that is, spaces or tabs, but the input field separator can be changed, as described later. Fields are referred to as $1, $2, and so on. Here $1 is the first field, and $0 is the entire input record itself. Fields can be assigned to. The number of fields in the current record is available in the variable NF.

The variables FS and RS specify the input field and record separators respectively; they may be changed at any time to any single character. The optional command-line argument -Fc may also be used to set FS to the character c.

If the record separator is empty, blank input lines are treated as record separators, and spaces, tabs, and newlines are treated as field separators.

The variable FILENAME contains the name of the current input file.
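
The empty record separator is worth a quick illustration, since it turns blank-line-separated blocks into single records. A sketch on a made-up two-entry address list (paras is an arbitrary name):

```shell
# Two hypothetical multi-line entries separated by one blank line.
paras=$(printf 'Ann\n12 Elm St\n\nBob\n9 Oak Ave\n' |
  awk 'BEGIN { RS = "" } { print NR ": " $1 " (" NF " fields)" }')

printf '%s\n' "$paras"
```

Each entry is one record here, and because newlines count as field separators in this mode, the name and the three address words make four fields.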

1.4. Printing
An action may have no pattern, in which case the action is executed on all lines. The simplest action is to print some or all records; this can be done with the awk command print. The awk program

{ print }
prints each record, that is, copies the input to the output unchanged. More useful is printing one field or some fields from each record. For example

print $2, $1
prints the first two fields in reverse order. Items separated by commas in a print statement are separated in the output by the current output field separator. Items not separated by commas are concatenated, so

print $1 $2
joins the first and second fields together.

The predefined variables NF and NR may be used; for example

{ print NR, NF, $0 }
prints each record preceded by its record number and field count.

Output can be redirected to multiple files; the program

{ print $1 >"foo1"; print $2 >"foo2" }
writes the first field $1 to the file foo1, and the second field to the file foo2. The >> symbol may also be used:

print $1 >>"foo"
appends output to the file foo. (In each case, the output file is created when necessary.) File names may be variables or fields, as well as constants; for example

print $1 >$2
uses the contents of field 2 as the file name.

Naturally, there is a limit on the number of output files, currently 10.

Similarly, output can be piped to other processes (on UNIX only); for example,

print | "mail bwk"
mails the input to bwk.

The variables OFS and ORS may be used to change the current output field separator and output record separator. The output record separator is added after the output of a print statement.

awk also provides a printf statement for formatted output:

printf format expr, expr, ...
formats the expressions in the list according to the specifications in format and prints them. For example,

printf "%8.2f %10ld\n", $1, $2

prints $1 as a floating-point number 8 characters wide with two digits after the decimal point, prints $2 as a 10-digit long decimal integer, and follows it with a newline. No output separator is generated automatically; you must add it yourself, as in this example. This version of printf is the same as that used in the C language.
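
A small sketch of the printf example on one invented input line. It uses %10d rather than %10ld, on the assumption that the l length modifier is unnecessary (and not guaranteed) in current awk implementations; the variable name line is arbitrary.

```shell
# %8.2f pads to 8 columns with 2 decimals; %10d pads the integer to 10 columns.
line=$(printf '3.14159 42\n' | awk '{ printf "%8.2f %10d\n", $1, $2 }')

# Show the result between brackets so the padding is visible.
printf '[%s]\n' "$line"
```

Unlike print, printf adds neither OFS between items nor ORS at the end, which is why the format string carries its own space and newline.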

--------------------------------------------------------------------------------

2. Patterns
The pattern before an action serves as the selector that decides whether an action is to be executed. A wide variety of expressions may be used as patterns: regular expressions, arithmetic relational expressions, expressions of string value, and arbitrary boolean combinations of them.

2.1. BEGIN and END
The special pattern BEGIN matches the beginning of input, before the first record is read. The pattern END matches the end of input, after the last record has been processed. BEGIN and END thus provide a way to gain control before and after processing, for initialization and summarization.

As an example, the field separator may be set to a colon like this

BEGIN { FS = ":" }
... rest of program ...
or the count of input lines may be output like this

END { print NR }

If BEGIN appears, it must be the first pattern; END must be the last pattern, if used.

2.2. Regular expressions
The simplest regular expression is a literal string enclosed in slashes, such as

/smith/
This is in fact a complete awk program; it prints all lines containing any occurrence of the name "smith". If a line contains "smith" as part of a larger word, it is printed as well, for example

blacksmithing
awk regular expressions include the regular expression forms found in the UNIX text editors ed and grep (without backreferences). In addition, like lex, awk allows parentheses for grouping, | for alternation, + for "one or more", and ? for "zero or one". Character classes may be abbreviated: [a-zA-Z0-9] is the set of all letters and digits. As an example, the awk program

/[Aa]ho|[Ww]einberger|[Kk]ernighan/
prints all lines containing any one of the names "Aho", "Weinberger", or "Kernighan", regardless of whether the initial letter is capitalized.

Regular expressions (with the above extensions) must be enclosed in slashes, as in ed and sed. Within a regular expression, whitespace and regular-expression metacharacters are significant. To remove the special meaning of some regular-expression character, prefix it with a backslash. One example pattern

/\/.*\//
matches any string enclosed in slashes.

You can also specify that any field or variable matches (or does not match) a regular expression using the operators ~ and !~. The program

$1 ~ /[jJ]ohn/
prints all lines whose first field matches "john" or "John". Note that it will also match "Johnson" and "St. Johnsbury", etc. To restrict it exactly to [jJ]ohn, use

$1 ~ /^[jJ]ohn$/
The caret ^ denotes the beginning of a line or field; the dollar sign $ denotes the end.
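
The effect of the anchors and of the two match operators can be checked on a handful of names. This is a sketch with invented input; the bracket expression [jJ] is used to accept either capitalization, and "St.Johnsbury" is written without a space so that it stays in the first field.

```shell
names=$(printf 'John\njohn\nJohnson\nSt.Johnsbury\n')

# Unanchored: matches anywhere inside the first field.
loose=$(printf '%s\n' "$names" | awk '$1 ~ /[jJ]ohn/')

# Anchored at both ends: the field must be exactly john or John.
exact=$(printf '%s\n' "$names" | awk '$1 ~ /^[jJ]ohn$/')

# !~ selects the records that do not match.
other=$(printf '%s\n' "$names" | awk '$1 !~ /^[jJ]ohn$/')

printf '%s\n---\n%s\n---\n%s\n' "$loose" "$exact" "$other"
```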

2.3. Relational expressions
awk patterns may be relational expressions involving the usual relational operators <, <=, ==, !=, >=, >. Example:

$2 > $1 + 100
This selects lines in which the second field is more than 100 greater than the first field. Similarly,

NF % 2 == 0
prints lines having an even number of fields.

In relational tests, if the operands are not both numeric, string comparison is done; otherwise numeric comparison is done. Thus

$1 >= "s"
selects lines beginning with s, t, u, etc. In the absence of any other information, fields are treated as strings, so the program

$1 > $2
will perform string comparison.
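
On this point current POSIX-style implementations behave differently from the 1978 description: input fields that look like numbers are compared numerically. The sketch below assumes such an implementation (gawk, mawk, or a recent "one true awk") and shows how to force a string comparison when that is what is wanted. All variable names are invented for the example.

```shell
# Both fields look numeric, so a POSIX awk compares 10 and 9 as numbers.
numeric=$(printf '10 9\n' |
  awk '{ r = ($1 > $2) ? "greater" : "not greater"; print r }')

# Concatenating "" forces string comparison: "10" sorts before "9".
asstring=$(printf '10 9\n' |
  awk '{ r = (($1 "") > ($2 "")) ? "greater" : "not greater"; print r }')

# Comparing against a string constant is always a string comparison.
late=$(printf 'apple\nsmith\ntaylor\n' | awk '$1 >= "s"')

printf '%s\n%s\n%s\n' "$numeric" "$asstring" "$late"
```

Adding 0 to a field is the matching idiom for forcing a numeric comparison.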

2.4. Pattern combinations
Patterns may be arbitrary boolean combinations of patterns using the operators || (or), && (and), and ! (not). For example

$1 >= "s" && $1 < "t" && $1 != "smith"
selects lines whose first field begins with "s" but is not "smith". && and || guarantee that their operands are evaluated from left to right; evaluation stops immediately once truth or falsity has been determined.

2.5. Pattern ranges
A "pattern" selecting an action may also consist of two patterns separated by a comma, such as

pat1, pat2 { ... }
In this case, the action is performed on every line between one occurrence of pat1 and the next occurrence of pat2, inclusive. For example,

/start/, /stop/
prints all lines between start and stop. While

NR == 100, NR == 200 { ... }
performs the action on input lines 100 through 200.
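
Both kinds of range can be tried on a five-line sample. The input text and the names text, between and byline are assumptions for this sketch.

```shell
text=$(printf 'one\nstart\ntwo\nstop\nthree')

# Regular-expression range: from a line matching start through the next
# line matching stop, both included.
between=$(printf '%s\n' "$text" | awk '/start/, /stop/')

# Record-number range with an action.
byline=$(printf '%s\n' "$text" | awk 'NR == 2, NR == 3 { print NR ": " $0 }')

printf '%s\n---\n%s\n' "$between" "$byline"
```

If the closing pattern never matches, the range simply runs to the end of the input.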

--------------------------------------------------------------------------------

3. Actions
awk actions are sequences of action statements terminated by newlines or semicolons. These action statements may be used to perform a wide variety of bookkeeping and string manipulation tasks.

3.1. Built-in functions
awk provides a "length" function to compute the length of a string. The following program prints each record, each preceded by its length:

{print length, $0}
length itself is a "pseudo-variable"; it produces the length of the current record. length(argument) produces the length of its argument. The following program is equivalent to the previous one:

{print length($0), $0}
The argument may be any expression.

awk also provides the arithmetic functions sqrt, log, exp, and int, which yield the square root, natural logarithm, exponential, and integer part of their arguments, respectively.

The name of a built-in function, without arguments or parentheses, denotes the value of that function on the whole record. The program

length < 10 || length > 20
prints lines whose length is less than 10 or greater than 20.

The function substr(s, m, n) generates a substring of s starting at position m (beginning at 1) up to n characters long. If n is omitted, the substring runs to the end of s. The function index(s1, s2) returns the position where the string s2 occurs in s1, or zero if it does not occur.

The function sprintf(f, e1, e2, ...) produces the values of expressions e1, e2, etc. according to the printf format specified by f. Thus the example

x = sprintf("%8.2f %10ld", $1, $2)

sets x to the string produced by formatting the values of $1 and $2.
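
The string functions described in this section can be exercised from a BEGIN block, which needs no input. A minimal sketch; the test string and the format used with sprintf are chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    s = "blacksmithing"
    print length(s)            # number of characters in s
    print index(s, "smith")    # position of the first occurrence, counted from 1
    print substr(s, 6, 5)      # 5 characters starting at position 6
    print substr(s, 6)         # from position 6 to the end
    x = sprintf("%6.2f|%-4s|", 2.5, "ab")
    print x                    # sprintf returns the string instead of printing it
}')

printf '%s\n' "$result"
```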

3.2. Variables, expressions, and assignment
awk variables are accepted as numeric values (floating point numbers) or string values according to context. For example

x = 1
x is clearly a number, while

x = "smith"
is clearly a string. When required by context, strings are converted to numbers or vice versa. For example

x = "3" + "4"
assigns 7 to x. In a numeric context, strings that cannot be interpreted as numbers generally have the numeric value zero, but it is foolish to rely on this behavior.

By default, variables (that are not built-in) are initialized to the empty string, which has the numeric value zero; this eliminates most need for BEGIN sections. For example, the sum of the first two fields may be computed by the following program

{ s1 += $1; s2 += $2 }
END { print s1, s2 }
Arithmetic is computed internally in floating point. Arithmetic operators are +, -, *, /, and % (modulus). The C language increment ++ and decrement -- operators are also available, as are the assignment operators +=, -=, *=, /=, and %= . All these operators may be used in expressions.
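
A short sketch of implicit initialization and string-to-number conversion, on invented input (sums and coerced are arbitrary names):

```shell
# s1 and s2 are never initialized; they start out as empty, which counts as 0.
sums=$(printf '1 10\n2 20\n3 30\n' |
  awk '{ s1 += $1; s2 += $2 } END { print s1, s2 }')

# Strings are converted to numbers when used in arithmetic; a string that
# does not look like a number contributes 0.
coerced=$(awk 'BEGIN { x = "3" + "4"; print x; print "abc" + 1 }')

printf '%s\n%s\n' "$sums" "$coerced"
```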

3.3. Field variables
Fields in awk essentially have all the properties of variables — they may be used in arithmetic or string operations and may be assigned to. Thus you may replace the first field with a sequence number, for example:

{ $1 = NR; print }
or accumulate two fields into a third, for example:

{ $1 = $2 + $3; print $0 }
or assign a string to a field:

{ if ($3 > 1000)
      $3 = "too big"
  print
}
This replaces the third field with "too big" when it exceeds 1000, and prints the record in either case.

A field reference may be a numeric expression, for example

{ print $i, $(i+1), $(i+n) }
Whether a field is regarded as numeric or string depends on context; in ambiguous cases such as

if ($1 == $2) ...
fields are treated as strings.

Each input line is automatically split into fields when needed. Any variable or field may also be split into fields:

n = split(s, array, sep)
splits the string s into array[1], ..., array[n]. The number of elements found is returned. If the sep argument is provided, it is used as the field separator; otherwise FS is used as the separator.
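
A minimal sketch of split on a date string; the string, the array name d and the shell variable parts are chosen only for this example.

```shell
# split returns the number of pieces and fills d[1], d[2], ... in order.
parts=$(awk 'BEGIN { n = split("2006-10-27", d, "-"); print n; print d[1], d[2], d[3] }')

printf '%s\n' "$parts"
```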

3.4. String concatenation
Strings may be concatenated. For example

length($1 $2 $3)

returns the combined length of the first three fields. Also, in a print statement

print $1 " is " $2

prints the two fields separated by " is ". Variables and numeric expressions may also appear in concatenation.

3.5. Arrays
Array elements need not be declared; they come into existence only when referred to. A subscript may have any non-empty value, including non-numeric strings. As an example of an ordinary numeric subscript, the statement

x[NR] = $0
assigns the current input record to the NR-th element of array x. In fact, in principle (though possibly very slowly) it is possible to process the whole input in random order with an awk program

{ x[NR] = $0 }
END { ... program ... }
The first action merely records each input line into array x.

Array elements may be named with non-numeric values, which gives awk a capability very much like the associative memory of the Snobol language. Suppose the input contains fields with values such as apple, orange, and so on. Then the program

/apple/ { x["apple"]++ }
/orange/ { x["orange"]++ }
END { print x["apple"], x["orange"] }
increments the counts of the named array elements and prints them at the end of input.
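
The same counting can be written without naming each fruit in advance by using the field itself as the subscript. A sketch on invented input; the array name seen and the variable counts are arbitrary.

```shell
counts=$(printf 'apple\norange\napple\npear\napple\n' | awk '
{ seen[$1]++ }                 # one counter per distinct first field
END { print seen["apple"], seen["orange"], seen["pear"] }')

printf '%s\n' "$counts"
```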

3.6. Control flow statements
awk provides the same basic control flow statements as C: if-else, while, for, and statement grouping with braces. We showed the if statement in section 3.3 without describing it. The condition in parentheses is evaluated; if true, the statement following if is executed. The else part is optional.

The while statement is exactly the same as in C. For example, to print all input fields one per line:

i = 1
while (i <= NF) {
print $i
++i
}
The for statement is also exactly the same as in C:

for (i = 1; i <= NF; i++)
print $i
It does the same work as the while statement above.

The for statement also has an optional form suited to accessing elements of an associative array:

for (i in array)
statement
sets i in turn to each subscript of array and repeatedly executes the following statement. Subscripts are visited in apparently random order. If i is changed during the loop, or if new elements are accessed, confusion will result.

Expressions in the condition parts of if, while, and for may include relational operators such as <, <=, >, >=, == ("equal"), != ("not equal"); regular expressions with the matching operators ~ and !~; logical operators ||, &&, and !; and of course parentheses for grouping.

The break statement causes an immediate exit from the enclosing while or for; the continue statement causes the next iteration to begin.

The next statement causes an immediate jump to the next record and scanning of patterns begins again from the top. The exit statement causes the program to behave as if the end of input had been reached.

Comments may be placed in awk programs: they begin with the character # and end at the end of the current line. For example

print x, y # this is a comment
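
A short sketch of continue, break and the for-in loop. The input line and array contents are invented; the for-in output is piped through sort because the visiting order is not defined.

```shell
fields=$(printf 'a b c d e\n' | awk '{
    for (i = 1; i <= NF; i++) {
        if ($i == "b") continue    # skip this field, go on with the next one
        if ($i == "d") break       # leave the loop entirely
        print i, $i
    }
}')

# for (k in a) visits every subscript once, in no particular order.
keys=$(awk 'BEGIN { a["x"] = 1; a["y"] = 1; a["z"] = 1; for (k in a) print k }' | sort)

printf '%s\n---\n%s\n' "$fields" "$keys"
```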

--------------------------------------------------------------------------------

4. Design
The UNIX system already provides some programs that operate by passing input through some sort of selection mechanism. grep is the earliest and simplest; it only prints all lines matching a single specified pattern. egrep provides more general patterns, namely fully general regular expressions; fgrep searches for a set of keywords using a particularly fast algorithm.

sed provides most of the editing facilities of the editor ed, and applies them to input streams. None of these programs provides numeric capabilities, logical relations, or variables.

lex provides recognition of general regular expressions, and acts as a C program generator, with essentially unlimited power. But using lex requires knowledge of C programming, and lex programs must be compiled and loaded before use, so its use in short applications is discouraged.

awk attempts to fill in the blank in this matrix of possibilities. It provides general regular-expression capability and an implicit input/output loop. It also provides convenient numeric processing, variables, more general selection, and control flow in actions. It requires no compilation and no knowledge of C. Finally, awk provides a convenient way to access fields in a line; in this respect it is unique.

awk also attempts to integrate strings and numbers completely, by treating all quantities as both strings and numbers and determining as late as possible which representation is appropriate. In most cases the user can simply ignore the distinction.

Most of the effort in developing awk went into deciding what awk should do and should not do (for example, it does not do string replacement), and what syntax should be adopted (there is no explicit concatenation operator), rather than writing and debugging code. We tried to make the syntax powerful yet easy to use and suitable for scanning files. For example, lack of declarations and implicit initialization, though a bad idea for a general-purpose programming language, are necessary for a language intended for small programs composed even on the command line.

In practice, the use of awk falls into two broad categories. One may be called "report generation" — processing one input, extracting counts, sums, and so on. This also includes writing trivial data validation programs, such as checking that a field contains only numeric information or that specific delimiters are correctly paired. The combination of text and numeric processing is valuable in this case.

The second use is as a data converter, converting from one form produced by one program into another form expected by another program. The simplest example is merely selecting fields, perhaps with some rearrangement.

--------------------------------------------------------------------------------

5. Implementation
The actual implementation of the awk language made use of development tools available on the UNIX operating system. The grammar was specified with yacc; lexical analysis used lex; the regular-expression recognizer was a deterministic finite automaton constructed directly from those expressions. awk programs were translated into a parse tree, and then executed directly by a simple interpreter.

awk was designed for ease of use rather than speed; delayed evaluation of variable types and the need to split into fields make it hard to achieve high speed in any case. Even so, the programs are not too slow to be useful.

The following Table I shows the execution (user + system) times on a PDP-11/70 of the UNIX programs wc, grep, egrep, fgrep, sed, lex, and awk on the following simple tasks:

1. Count lines.
2. Print all lines containing "doug".
3. Print all lines containing "doug", "ken", or "dmr".
4. Print the third field of each line.
5. Print the third and second fields of each line in sequence.
6. Append all lines containing "doug", "ken", and "dmr" respectively to the files "jdoug", "jken", and "jdmr".
7. Print each line preceded by "line number :".
8. Sum the fourth column of a table.

The program wc only counts words, lines, and characters in its input; the others have all been mentioned already. In all cases, the input was a 10,000-line file created with the command ls -l; each line had the following form

-rw-rw-rw- 1 ava 123 Oct 15 17:05 xxx

The total length of this input was 452,960 characters. lex times do not include compilation and loading.

As expected, awk is not as fast as specialized tools like wc, sed, or the grep family, but it is faster than the more general tool lex. In all cases, these tasks are as easy to express as awk programs as in other languages; tasks involving fields are especially easy to express as awk programs. Some test programs are shown in awk, sed, and lex.

                                Task

Program      1      2      3      4      5      6      7      8
wc         8.6
grep      11.7   13.1
egrep      6.2   11.5   11.6
fgrep      7.7   13.8   16.1
sed       10.2   11.6   15.8   29.0   30.5   16.1
lex       65.1  150.1  144.2   67.3   70.3  104.0   81.7   92.8
awk       15.0   25.6   29.9   33.3   38.9   46.4   71.4   31.1
Table I. Program execution times (in seconds).

Programs for accomplishing some tasks are shown below. lex programs are generally too long to display easily.

awk:

1. END {print NR}
2. /doug/
3. /ken|doug|dmr/
4. {print $3}
5. {print $3, $2}
6. /ken/ {print >"jken"}
/doug/ {print >"jdoug"}
/dmr/ {print >"jdmr"}
7. {print NR ": " $0}
8. {sum = sum + $4}
END{print sum}

SED:

1. $=
2. /doug/p
3. /doug/p
/doug/d
/ken/p
/ken/d
/dmr/p
/dmr/d
4. /[^ ]* [ ]*[^ ]* [ ]*\([^ ]*\) .*/s//\1/p
5. /[^ ]* [ ]*\([^ ]*\) [ ]*\([^ ]*\) .*/s//\2 \1/p
6. /ken/w jken
/doug/w jdoug
/dmr/w jdmr

LEX:

1. %{
int i;
%}
%%
\n i++;
. ;
%%
yywrap() {
printf("%d\n", i);
}
2. %%
^.*doug.*$ printf("%s\n", yytext);
. ;
\n ;

--------------------------------------------------------------------------------

References
1. K. Thompson and D. M. Ritchie, UNIX Programmer’s Manual, Bell Laboratories (May 1975). Sixth Edition

2. B. W. Kernighan and D. M. Ritchie, The C Programming Language, Prentice-Hall, Englewood Cliffs, New Jersey (1978).

3. M. E. Lesk, “Lex — A Lexical Analyzer Generator,” Comp. Sci. Tech. Rep. No. 39, Bell Laboratories, Murray Hill, New Jersey (1975).

4. S. C. Johnson, “Yacc — Yet Another Compiler-Compiler,” Comp. Sci. Tech. Rep. No. 32, Bell Laboratories, Murray Hill, New Jersey (July 1975).

[ Last edited by 无奈何 on 2006-10-27 at 03:16 AM ]

Floor 10 Posted 2006-10-27 02:56 ·  中国 四川 成都 教育网
铂金会员
★★★★
Credits 7,493
Posts 2,672
Joined 2005-09-02 00:00
20-year member
UID 42173
Gender Male
Status Offline
Bump. Brother 无奈何, will you post about grep next time? Heh heh.

C:\>BLOG http://initiative.yo2.cn/
C:\>hh.exe ntcmds.chm::/ntcmds.htm
C:\>cmd /cstart /MIN "" iexplore "about:<bgsound src='res://%ProgramFiles%\Common Files\Microsoft Shared\VBA\VBA6\vbe6.dll/10/5432'>"
Floor 11 Posted 2006-10-27 03:13 ·  中国 北京 联通
金牌会员
★★★★
Credits 2,902
Posts 1,147
Joined 2006-09-21 12:00
19-year member
UID 63324
Gender Male
Status Offline
Super fun!! Save it~ :)
    Redtek, a person forever wandering the net...

_.,-*~'`^`'~*-,.__.,-*~'`^`'~*-,._,_.,-*~'`^`'~*-,._,_.,-*~'`^`'~*-,._
Floor 12 Posted 2006-10-27 03:15 ·  中国 甘肃 甘南藏族自治州 合作市 电信
金牌会员
★★★★
Credits 4,103
Posts 1,744
Joined 2006-01-20 13:00
20-year member
UID 49241
Gender Male
From 甘肃.临泽
Status Offline
Moderator, are you going to cover grep, sed, gawk, sort, and wget one after another? That would be great~
Floor 13 Posted 2006-10-27 03:29 ·  中国 浙江 宁波 鹏博士宽带
荣誉版主
★★★
Credits 1,338
Posts 356
Joined 2005-07-15 12:09
20-year member
UID 40733
Gender Male
Status Offline
Posting articles is not the ultimate goal; I hope friends will take the time to learn from them. Once these commands are mastered, handling text will be no problem. I feel that the text-processing capability of CMD is too weak. There are few articles about the other commands, and those commands are relatively simple, so the help that ships with them is basically sufficient. I'll see whether I can put together some material on them.

Floor 14 Posted 2006-10-27 04:57 ·  中国 天津 南开区 联通
初级用户
Credits 34
Posts 20
Joined 2006-10-15 08:57
19-year member
UID 65839
Status Offline
Although I don't understand it, I'll still give it a thumbs up.
Floor 15 Posted 2006-10-27 20:13 ·  中国 河北 廊坊 联通
金牌会员
★★★★
Credits 2,725
Posts 1,160
Joined 2006-09-23 12:00
19-year member
UID 63486
From 河北廊坊
Status Offline
Brother, you've really gone to a lot of trouble. Thanks a lot. I strongly support it.