AWK

Appeared in: 1977

Influenced by: C, Unix shell

File extensions: .awk

Programming language

AWK (the name is derived from the surnames of its authors) is an interpreted scripting language designed for processing text data.

AWK was created in 1977 at Bell Labs by Alfred Aho, Peter Weinberger and Brian Kernighan. This version of the language was included in Unix V7 (1979). In 1988 the book “The AWK Programming Language” was published, describing a new dialect of the language that was included in Unix SysV. The old dialect was incompatible with the new one, so to avoid confusion they are referred to as oawk (old awk) and nawk (new awk), respectively. nawk was released under a free software license in 1996 and is maintained by Kernighan. The language is standardized in IEEE Std 1003.1-2004.

Today AWK is a mandatory component of Unix-like systems: along with the Unix shell, it is part of every standard Unix environment. AWK implementations also exist for most other platforms.

An AWK program takes a stream of text data (from a file or the console) as input and processes it record by record. The program is a series of rules of the form pattern { action }, where pattern is an expression (typically a regular expression) and action is a series of commands. A program can also contain user-defined functions.
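As a minimal sketch of this structure, the shell snippet below pipes three made-up lines into a program consisting of one rule and one user-defined function. The sample input and the function name shout are assumptions chosen for illustration only.

```sh
# Sample input and the function name "shout" are made up for illustration.
out=$(printf 'apple pie\nbanana split\napple tart\n' | awk '
# User-defined function: returns its argument in upper case.
function shout(s) { return toupper(s) }

# Rule: the pattern is a regular expression, the action prints the record.
/apple/ { print shout($0) }
')
printf '%s\n' "$out"
```

Only the two lines containing "apple" match the pattern, so only they reach the action.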

When the input stream is processed, each line of input is matched against every pattern of the program, and the actions of the matching patterns are executed. A pattern can be given as:

a single expression: the action is executed for every line for which the expression evaluates to true.
a pair of expressions separated by a comma: the action is executed for all lines from a line for which the first expression is true through a line for which the second expression is true.
the special patterns BEGIN and END: their actions are executed before and after processing the input, respectively.
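The three kinds of patterns can be sketched in one small program. The input lines and the START/STOP markers are hypothetical; the snippet only shows which rule fires for which line.

```sh
# Input lines and the START/STOP markers are made up for illustration.
out=$(printf 'skip\nSTART\nalpha\nSTOP\nskip\n' | awk '
BEGIN          { print "begin" }             # runs before any input is read
NR == 1        { print "first line: " $0 }   # single expression
/START/,/STOP/ { print "in range: " $0 }     # pair of expressions (a range)
END            { print "lines: " NR }        # runs after the last line
')
printf '%s\n' "$out"
```

The range rule fires for the START line, the STOP line and everything between them; the two "skip" lines fall outside the range.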

AWK can process each line as a whole string (stored in the variable $0) or as a record made of fields (stored in $1, $2, ...). There are other built-in variables in the language, such as NR, the number of lines read so far, and NF, the number of fields in the current record. System variables tune the data processing mode, for example by setting the record and field separators (RS and FS; the default values are a line feed and a space).
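A short sketch of fields and built-in variables, assuming colon-separated input loosely modeled on a password file (the two records are invented):

```sh
# The two colon-separated records are invented sample data.
out=$(printf 'root:x:0:0\ndaemon:x:1:1\n' | awk '
BEGIN { FS = ":" }    # split fields on colons instead of whitespace
{ print NR ": " $1 " has uid " $3 " (" NF " fields)" }
')
printf '%s\n' "$out"
```

Here $1 and $3 are the first and third fields of the current record, NR is its ordinal number, and NF is its field count.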

AWK is a contextually typed language: all primitive data units are stored as strings, though they can be interpreted as numbers depending on the context in which they are used (for example, in arithmetic expressions). The main data structure of the language is the associative array (indexed by strings).
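The following sketch illustrates both points with invented key/value input: the second field is text read from the input but is summed as a number, and the sums are kept in an associative array keyed by the first field.

```sh
# The key/value lines are invented sample data.
out=$(printf 'a 10\nb 5\na 7\n' | awk '
{ total[$1] += $2 }     # $2 is input text, used here as a number
END {
    print "a=" total["a"], "b=" total["b"]
    x = "3" + 4         # a string in arithmetic context is treated as a number
    y = 3 4             # juxtaposition concatenates, giving the string 34
    print x, y
}')
printf '%s\n' "$out"
```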

Language shortcomings include:

lack of captures in regular expressions: the language standard provides no way to identify the substrings matched by parts of a regular expression. gawk adds this as an extension.
non-repeatability: it is impossible to apply the same rule to a string more than once without explicitly programming it. Note that sed, one of AWK's predecessors, does not have this limitation.
no way to evaluate a string as part of the program.
a confusing method of defining local variables for a user-defined function: they are declared as extra parameters.
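Two of these points can be sketched with standard features only. The function name sum_to and the sample input are hypothetical. Local variables are declared as extra parameters, and a matched substring can be recovered by hand from RSTART and RLENGTH, which match() sets; this is a common workaround, not a replacement for real captures.

```sh
# The function name "sum_to" and the input line are made up for illustration.
out=$(printf 'id=42 name=bob\n' | awk '
# Parameters after the real one (i and s) serve as local variables;
# the extra spaces before them are only a naming convention.
function sum_to(n,    i, s) {
    for (i = 1; i <= n; i++) s += i
    return s
}
{
    i = "global"    # the global i is not touched by the function
    # match() sets RSTART and RLENGTH; substr() then cuts out the digits.
    if (match($0, /id=[0-9]+/)) {
        id = substr($0, RSTART + 3, RLENGTH - 3)
    }
    print sum_to(4), i, id
}')
printf '%s\n' "$out"
```

The global variable i keeps its value because the function's i is a separate local, and the offset of 3 skips the literal "id=" prefix of the match.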