awk(1) awk(1)
NAME
awk - pattern-directed scanning and processing language
SYNOPSIS
awk [-Ffs] [-v var=value] [program | -f progfile ...] [file ...]
DESCRIPTION
awk scans each input file for lines that match any of a set of
patterns specified literally in program or in one or more files
specified as -f progfile. With each pattern there can be an
associated action that is to be performed when a line in a file
matches the pattern. Each line is matched against the pattern portion
of every pattern-action statement, and the associated action is
performed for each matched pattern. The file name - means the
standard input. Any file of the form var=value is treated as an
assignment, not a filename. An assignment is evaluated at the time it
would have been opened if it were a filename, unless the -v option is
used.
An input line is made up of fields separated by white space, or by
regular expression FS. The fields are denoted $1, $2, ...; $0 refers
to the entire line.
Options
awk recognizes the following options and arguments:
-F fs Specify regular expression used to separate
fields. The default is to recognize space and tab
characters, and to discard leading spaces and
tabs. If the -F option is used, leading input
field separators are no longer discarded.
-f progfile Specify an awk program file. Up to 100 program
files can be specified. The pattern-action
statements in these files are executed in the same
order as the files were specified.
-v var=value Cause var=value assignment to occur before the
BEGIN action (if it exists) is executed.
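As a minimal sketch of the -F and -v options described above, the following shell fragment splits colon-separated records and sets a threshold before any input is read. The sample data and the variable name min are made up for illustration.

```shell
# Hypothetical colon-separated input: name, numeric id, group.
data='alice:1001:staff
bob:42:admin
carol:2040:staff'

# -F: makes the colon the field separator.
# -v min=1000 assigns min before the BEGIN action and before input is read.
out=$(printf '%s\n' "$data" | awk -F: -v min=1000 '$2 >= min { print $1, $3 }')
printf '%s\n' "$out"
```

Only the records whose second field is at least min are printed, with the first and third fields separated by the default output field separator.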
Statements
A pattern-action statement has the form:
pattern { action }
A missing { action } means print the line; a missing pattern always
matches. Pattern-action statements are separated by new-lines or
semicolons.
An action is a sequence of statements. A statement can be one of the
following:
Hewlett-Packard Company - 1 - HP-UX 11i Version 2: August 2003
if(expression) statement [ else statement ]
while(expression) statement
for(expression;expression;expression) statement
for(var in array) statement
do statement while(expression)
break
continue
{[statement ...]}
expression # commonly var = expression
print [expression-list] [ > expression]
printf format [, expression-list] [ > expression]
return [expression]
next # skip remaining patterns on this input line.
delete array[expression] # delete an array element.
exit [expression] # exit immediately; status is expression.
Statements are terminated by semicolons, newlines or right braces. An
empty expression-list stands for $0. String constants are quoted
(""), with the usual C escapes recognized within. Expressions take on
string or numeric values as appropriate, and are built using the
operators +, -, *, /, %, ^ (exponentiation), and concatenation
(indicated by a blank). The operators ++, --, +=, -=, *=, /=, %=, ^=,
**=, >, >=, <, <=, ==, !=, "" (double quotes, string conversion
operator), and ?: are also available in expressions. Variables can be
scalars, array elements (denoted x[i]) or fields. Variables are
initialized to the null string. Array subscripts can be any string,
not necessarily numeric (this allows for a form of associative
memory). Multiple subscripts such as [i,j,k] are permitted. The
constituents are concatenated, separated by the value of SUBSEP.
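The subscript rules above can be illustrated with a short sketch. The input data and the array name total are hypothetical; the point is that a subscript written as [i,j] is the same string as i SUBSEP j.

```shell
# Hypothetical input: region, item, quantity.
data='east apples 3
west pears 5
east apples 4'

out=$(printf '%s\n' "$data" | awk '
    { total[$1, $2] += $3 }          # subscript is $1 SUBSEP $2
    END {
        # Membership test with a parenthesized subscript list.
        if (("east", "apples") in total)
            print "east apples", total["east", "apples"]
        # The same kind of element addressed with an explicit SUBSEP.
        print "west pears", total["west" SUBSEP "pears"]
    }')
printf '%s\n' "$out"
```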
The print statement prints its arguments on the standard output (or on
a file if >file or >>file is present or on a pipe if |cmd is present),
separated by the current output field separator, and terminated by the
output record separator. file and cmd can be literal names or
parenthesized expressions. Identical string values in different
statements denote the same open file. The printf statement formats
its expression list according to the format (see printf(3S)).
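The following sketch shows print writing to a pipe, assuming a sort(1) command is available on PATH. The sample data is invented. Because identical command strings denote the same open pipe, every print feeds a single sort process.

```shell
# Hypothetical input: item name, count.
data='pear 2
apple 5
fig 1'

out=$(printf '%s\n' "$data" | awk '
    BEGIN { OFS = "\t" }              # output field separator
    { print $2, $1 | "sort -n" }      # all records go to one sort process
    END { close("sort -n") }          # close the pipe so sort can finish
')
printf '%s\n' "$out"
```

The fields of each record are swapped, joined with a tab, and emitted in ascending numeric order once the pipe is closed.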
Built-In Functions
The built-in function close(expr) closes the file or pipe expr opened
by a print or printf statement or a call to getline with the same
string-valued expr. This function returns zero if successful;
otherwise, it returns non-zero.
The customary functions exp, log, sqrt, sin, cos, atan2 are built in.
Other built-in functions are:
blength[([s])] Length of its associated argument (in bytes)
taken as a string, or of $0 if no argument.
length[([s])] Length of its associated argument (in characters)
taken as a string, or of $0 if no argument.
rand() Returns a random number between zero and one.
srand([expr]) Sets the seed value for rand, and returns the
previous seed value. If no argument is given,
the time of day is used as the seed value;
otherwise, expr is used.
int(x) Truncates x to an integer value.
substr(s, m [, n])
Return the at most n-character substring of s
that begins at position m, numbering from 1. If
n is omitted, the substring is limited by the
length of string s.
index(s, t) Return the position, in characters, numbering
from 1, in string s where string t first occurs,
or zero if it does not occur at all.
match(s, ere) Return the position, in characters, numbering
from 1, in string s where the extended regular
expression ere occurs, or 0 if it does not. The
variables RSTART and RLENGTH are set to the
position and length of the matched string.
split(s, a[, fs]) Splits the string s into array elements a[1],
a[2], ..., a[n], and returns n. The separation
is done with the regular expression fs, or with
the field separator FS if fs is not given.
sub(ere, repl [, in])
Substitutes repl for the first occurrence of the
extended regular expression ere in the string in.
If in is not given, $0 is used.
gsub(ere, repl [, in])
Same as sub except that all occurrences of the
regular expression are replaced; sub and gsub
return the number of replacements.
sprintf(fmt, expr, ...)
String resulting from formatting expr ...
according to the printf(3S) format fmt.
system(cmd) Executes cmd and returns its exit status.
toupper(s) Converts the argument string s to uppercase and
returns the result.
tolower(s) Converts the argument string s to lowercase and
returns the result.
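A short sketch of several of the string functions listed above follows. The sample string and variable names are arbitrary. Only functions common to POSIX awk are used, so blength is left out.

```shell
out=$(awk 'BEGIN {
    s = "hello, world"
    print length(s)                  # length in characters
    print substr(s, 8, 5)            # 5 characters starting at position 8
    print index(s, "wor")            # position of the first "wor"
    pos = match(s, /o+/)             # also sets RSTART and RLENGTH
    print pos, RSTART, RLENGTH
    n = split("a:b:c", parts, ":")   # parts[1] through parts[n]
    print n, parts[1], parts[3]
    t = s
    count = gsub(/o/, "0", t)        # replace every o in the copy t
    print count, t
    print toupper(substr(s, 1, 5))
}')
printf '%s\n' "$out"
```

The results of match and gsub are stored in variables before printing so that the output does not depend on the order in which print evaluates its arguments.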
The built-in function getline sets $0 to the next input record from
the current input file; getline < file sets $0 to the next record from
file. getline x sets variable x instead. Finally, cmd | getline
pipes the output of cmd into getline; each call of getline returns the
next line of output from cmd. In all cases, getline returns 1 for a
successful input, 0 for end of file, and -1 for an error.
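As an illustration of the cmd | getline form, the sketch below reads the output of a small shell command line by line. The command string and the variable names are hypothetical. Comparing the return value with > 0 ends the loop on both end of file (0) and error (-1).

```shell
out=$(awk 'BEGIN {
    cmd = "echo one; echo two; echo three"
    # getline returns 1 for each record read, 0 at end of file, -1 on error.
    while ((cmd | getline line) > 0) {
        n++
        last = line
    }
    close(cmd)                       # release the pipe
    print n, last
}')
printf '%s\n' "$out"
```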
Patterns
Patterns are arbitrary Boolean combinations (with ! || &&) of regular
expressions and relational expressions. awk supports Extended Regular
Expressions as described in regexp(5). Isolated regular expressions
in a pattern apply to the entire line. Regular expressions can also
occur in relational expressions, using the operators ~ and !~. /re/
is a constant regular expression; any string (constant or variable)
can be used as a regular expression, except in the position of an
isolated regular expression in a pattern.
A pattern can consist of two patterns separated by a comma; in this
case, the action is performed for all lines from an occurrence of the
first pattern through an occurrence of the second.
A relational expression is one of the following:
expression matchop regular-expression
expression relop expression
expression in array-name
(expr,expr,...) in array-name
where a relop is any of the six relational operators in C, and a
matchop is either ~ (matches) or !~ (does not match). A conditional
is an arithmetic expression, a relational expression, or a Boolean
combination of the two.
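The pattern forms described above can be combined as in the following sketch. The input data and the array name watch are made up for illustration.

```shell
# Hypothetical input: a number and a status word.
data='10 ok
25 error
7 error
40 ok'

out=$(printf '%s\n' "$data" | awk '
    BEGIN { watch["error"] = 1 }
    # Relational expression combined with a match expression.
    $1 > 20 && $2 ~ /^err/ { print "relop and matchop:", $0 }
    # Array membership combined with a relational expression.
    $2 in watch && $1 < 10 { print "in operator:", $0 }
    # Negated isolated regular expression, applied to the whole line.
    !/ok/ { n++ }
    END { print "lines without ok:", n }
')
printf '%s\n' "$out"
```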
The special patterns BEGIN and END can be used to capture control
before the first input line is read and after the last. BEGIN and END
do not combine with other patterns.
Special Characters
The following special escape sequences are recognized by awk in both
regular expressions and strings:
Escape Meaning
\a alert character
\b backspace character
\f form-feed character
\n new-line character
\r carriage-return character
\t tab character
\v vertical-tab character
\nnn 1- to 3-digit octal value nnn
\xhhh 1- to n-digit hexadecimal number
Variable Names
Variable names with special meanings are:
FS Input field separator regular expression; a
space character by default; also settable by
option -Ffs.
NF The number of fields in the current record.
NR The ordinal number of the current record from
the start of input. Inside a BEGIN action the
value is zero. Inside an END action the value
is the number of the last record processed.
FNR The ordinal number of the current record in the
current file. Inside a BEGIN action the value
is zero. Inside an END action the value is the
number of the last record processed in the last
file processed.
FILENAME A pathname of the current input file.
RS The input record separator; a newline character
by default.
OFS The print statement output field separator; a
space character by default.
ORS The print statement output record separator; a
newline character by default.
OFMT Output format for numbers (default %.6g). If
the value of OFMT is not a floating-point
format specification, the results are
unspecified.
CONVFMT Internal conversion format for numbers (default
%.6g). If the value of CONVFMT is not a
floating-point format specification, the
results are unspecified.
Refer to the UNIX95 variable under EXTERNAL
INFLUENCES for additional information on
CONVFMT.
SUBSEP The subscript separator string for multidimensional
arrays; the default value is "\034".
ARGC The number of elements in the ARGV array.
ARGV An array of command line arguments, excluding
options and the program argument, numbered from
zero to ARGC-1.
The arguments in ARGV can be modified or added
to; ARGC can be altered. As each input file
ends, awk will treat the next non-null element
of ARGV, up to the current value of ARGC-1,
inclusive, as the name of the next input file.
Thus, setting an element of ARGV to null means
that it will not be treated as an input file.
The name - indicates the standard input. If an
argument matches the format of an assignment
operand, this argument will be treated as an
assignment rather than a file argument.
ENVIRON Array of environment variables; subscripts are
names. For example, if environment variable
V=thing, ENVIRON["V"] produces thing.
RSTART The starting position of the string matched by
the match function, numbering from 1. This is
always equivalent to the return value of the
match function.
RLENGTH The length of the string matched by the match
function.
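A brief sketch of a few of these variables in use, with invented input: NR counts records, NF gives the number of fields in each record, $NF is the last field, and OFS separates the items printed by print.

```shell
data='a b c
d e
f'

out=$(printf '%s\n' "$data" | awk '
    BEGIN { OFS = "-" }              # used between print arguments
    { print NR, NF, $NF }            # record number, field count, last field
    END { print "records", NR }      # NR holds the last record number here
')
printf '%s\n' "$out"
```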
Functions can be defined (at the position of a pattern-action
statement) as follows:
function foo(a, b, c) { ...; return x }
Parameters are passed by value if scalar, and by reference if array
name. Functions can be called recursively. Parameters are local to
the function; all other variables are global.
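The rules above can be sketched with two small functions. The names fact, fill, and sq are hypothetical. Since only parameters are local, a common idiom (an assumption about style, not a requirement) is to declare an extra, unused parameter to serve as a local variable.

```shell
out=$(awk '
    # Recursive function; the scalar n is passed by value.
    function fact(n) { return (n <= 1) ? 1 : n * fact(n - 1) }

    # arr is an array, so it is passed by reference and the caller
    # sees the new elements. The extra parameter i acts as a local.
    function fill(arr, n,    i) {
        for (i = 1; i <= n; i++)
            arr[i] = i * i
    }

    BEGIN {
        fill(sq, 3)
        print fact(5), sq[1], sq[2], sq[3]
    }')
printf '%s\n' "$out"
```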
Note that if pattern-action statements are used in an HP-UX command
line as an argument to the awk command, the pattern-action statement
must be enclosed in single quotes to protect it from the shell. For
example, to print lines longer than 72 characters, the pattern-action
statement as used in a script (-f progfile command form) is:
length > 72
The same pattern-action statement used as an argument to the awk
command is quoted in this manner:
awk 'length > 72'
EXTERNAL INFLUENCES
Environment Variables
UNIX95 If defined, specifies to use the XPG4 behavior for this
command. The changes for XPG4 include support for the
entire behavior specified above and include the
following behavioral change:
+ If CONVFMT is not specified and UNIX95 is set, %d is
used as the internal conversion format for numbers by
default.
LANG Provides a default value for the internationalization
variables that are unset or null. If LANG is unset or
null, the default value of "C" (see lang(5)) is used.
If any of the internationalization variables contains
an invalid setting, awk will behave as if all
internationalization variables are set to "C". See
environ(5).
LC_ALL If set to a non-empty string value, overrides the
values of all the other internationalization variables.
LC_CTYPE Determines the interpretation of text as single and/or
multi-byte characters, the classification of characters
as printable, and the characters matched by character
class expressions in regular expressions.
LC_NUMERIC Determines the radix character used when interpreting
numeric input, performing conversion between numeric
and string values and formatting numeric output.
Regardless of locale, the period character (the
decimal-point character of the POSIX locale) is the
decimal-point character recognized in processing awk
programs (including assignments in command-line
arguments).
LC_COLLATE Determines the locale for the behavior of ranges,
equivalence classes and multi-character collating
elements within regular expressions.
LC_MESSAGES Determines the locale that should be used to affect the
format and contents of diagnostic messages written to
standard error and informative messages written to
standard output.
NLSPATH Determines the location of message catalogues for the
processing of LC_MESSAGES.
PATH Determines the search path when looking for commands
executed by system(cmd), or input and output pipes.
In addition, all environment variables will be visible via the awk
variable ENVIRON.
International Code Set Support
Single- and multi-byte character code sets are supported except that
variable names must contain only ASCII characters and regular
expressions must contain only valid characters.
DIAGNOSTICS
awk supports up to 199 fields ($1, $2, ..., $199) per record.
EXAMPLES
Print lines longer than 72 characters:
length > 72
Print first two fields in opposite order:
{ print $2, $1 }
Same, with input fields separated by comma and/or blanks and tabs:
BEGIN { FS = ",[ \t]*|[ \t]+" }
{ print $2, $1 }
Add up first column, print sum and average:
{ s += $1 }
END { print "sum is", s, " average is", s/NR }
Print all lines between start/stop pairs:
/start/, /stop/
Simulate echo command (see echo(1)):
BEGIN { # Simulate echo(1)
for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
printf "\n"
exit }
AUTHOR
awk was developed by AT&T, IBM, OSF, and HP.
SEE ALSO
lex(1), sed(1).
A. V. Aho, B. W. Kernighan, P. J. Weinberger: The AWK Programming
Language, Addison-Wesley, 1988.
STANDARDS CONFORMANCE
awk: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2