Class 15: AWK internal variables

December 12, 2019

Arithmetic

+ addition, sum
- subtraction, difference
* multiplication, product
/ division, quotient
% remainder, modulo
^ exponentiation, power
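A quick way to try all six operators is a program with only a `BEGIN` block, so awk needs no input at all (`BEGIN` is explained later in this class). The numbers 7 and 2 are arbitrary values chosen for illustration.

```shell
# Each operator applied to 7 and 2; no input file is needed
# because the whole program runs inside BEGIN
awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, 7 ^ 2 }'
# prints: 9 5 14 3.5 1 49
```

Notice that `/` is not integer division: awk stores numbers as floating point, so 7 / 2 gives 3.5.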

Example

What does this command do?

seq 100 | awk '$1 % 7 == 0 {print $1}'

Example

What does this command do?

seq 100 | awk '($1 % 3 == 0) && ($1 % 2 == 0) {print $1}'

Assignments

The symbol = is used to assign a new value to a variable

variable = value

Variables are created automatically the first time you use them; no declaration is needed

Variables can be numeric or text
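A small sketch of assignments: `n` holds a number that is reassigned, `word` holds text, and `count` is never assigned, so it starts out empty, which counts as 0 in arithmetic. The variable names are just examples.

```shell
# n is reassigned using its own old value; word holds text
awk 'BEGIN { n = 3; n = n * 2; word = "cats"; print n, word }'
# prints: 6 cats

# count was never assigned: it is created on first use and acts as 0
awk 'BEGIN { print count + 1 }'
# prints: 1
```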

Automatic variables

Besides the variables we create with assignments, we have some predefined variables

$1, $2, and so on: the fields of each record
$0: complete input record
NF: Number of Fields in the current record
NR: Number of the current Record, counted across all input files
FILENAME: name of the current file being processed
FNR: Number of the current Record in the current file
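To see some of these variables change from record to record, the sketch below pipes two made-up lines into awk. Since `NF` is a number, `$NF` is the last field of the record, a common idiom.

```shell
# Two records: "a b c" (3 fields) and "d e" (2 fields)
# NR = record number, NF = number of fields, $NF = last field
printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }'
# prints:
# 1 3 c
# 2 2 e
```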

Exercise

Write an awk command that prints

Name of the current file
Global Number of Record
File Number of Record
the record itself

Apply this to all .txt files in your folder

Conditions

comparisons
- ==, !=, <, >, <=, >=
regular expressions
- matching anywhere
- matching a single column
BEGIN, END
combinations using &&, || and !
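Here is one sketch of each kind of condition, using a made-up list of names and ages. The helper function `people` is hypothetical, defined only to produce the sample input. A condition with no `{ action }` prints the records that match.

```shell
# Hypothetical sample data: name and age, one person per line
people() { printf 'ana 23\nbob 31\ncarla 17\n'; }

people | awk '$2 >= 18'                   # comparison: ana 23, bob 31
people | awk '/a/'                        # regex anywhere: ana 23, carla 17
people | awk '$1 ~ /^b/'                  # regex on column 1 only: bob 31
people | awk '$2 >= 18 && !/bob/'         # && with !: ana 23
people | awk '$2 < 18 || $1 == "bob"'     # ||: bob 31, carla 17
```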

`BEGIN` and `END`

Each of these special conditions is true only once per run of the program

BEGIN is true before reading any record
- We can define initial values for some variables
END is true after reading all records
- We can print the variables that we computed while reading the records
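A minimal sketch of the usual pattern: `BEGIN` initializes, the main block runs once per record, and `END` reports. In `END`, `NR` still holds the number of records read, so it can be used to compute a mean. (The `sum = 0` is optional, since new variables start at 0.)

```shell
# BEGIN sets up the accumulator, { sum += $1 } runs for every line,
# and END prints the result after the last line has been read
seq 10 | awk 'BEGIN { sum = 0 } { sum += $1 } END { print "sum:", sum, "mean:", sum / NR }'
# prints: sum: 55 mean: 5.5
```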

Example

ls -l | awk '$6 == "Nov" { sum += $5 }
             END { print sum }'
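In typical `ls -l` output, field 5 is the file size in bytes and field 6 is the month of the last modification, so the command above adds up the sizes of the files modified in November. The exact layout can vary between systems and locales, so the sketch below feeds three made-up lines in that layout to make the result predictable.

```shell
# Three made-up lines in the usual `ls -l` layout:
# permissions links owner group size month day time name
printf '%s\n' \
  '-rw-r--r-- 1 ana staff 1200 Nov 3 10:15 a.txt' \
  '-rw-r--r-- 1 ana staff 300 Dec 1 09:00 b.txt' \
  '-rw-r--r-- 1 ana staff 500 Nov 20 18:42 c.txt' |
  awk '$6 == "Nov" { sum += $5 }
       END { print sum }'
# prints: 1700
```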

Variables that control awk behavior

FS: Field Separator. Regex that separates fields
RS: Record Separator. Regex that separates records
OFS: Output Field Separator. Text to separate printed fields
ORS: Output Record Separator. Text to separate printed records
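A sketch of each of the four variables, set in `BEGIN` so they apply from the first record on. The inputs are made up. For `RS`, a single character is the portable choice; longer values are treated as regular expressions by some awks (such as gawk) but are not guaranteed everywhere.

```shell
# FS: split fields on ":" instead of whitespace
echo 'a:b:c' | awk 'BEGIN { FS = ":" } { print $2 }'
# prints: b

# RS: each ";"-separated chunk becomes one record
printf 'x;y;z' | awk 'BEGIN { RS = ";" } { print NR, $0 }'
# prints:
# 1 x
# 2 y
# 3 z

# OFS: the commas in print produce "-" between fields
echo 'a b c' | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }'
# prints: a-b-c

# ORS: each print ends with ";" instead of a new line
seq 3 | awk 'BEGIN { ORS = ";" } { print $1 }'
# prints: 1;2;3;
```

Because of the last `ORS`, the output does not end with a new line, so the shell prompt appears right after it.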

Default values

Notice that Output separators are text, but Input separators are regular expressions

By default,

ORS="\n", that is, records are separated by new line
OFS=" ", that is, fields are separated by space
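With these defaults, the comma in `print` matters: values separated by a comma are joined with `OFS` (a space), while values written next to each other are concatenated with nothing in between. The input line is made up.

```shell
# First print: comma, so OFS (a space) goes between the fields
# Second print: no comma, so the fields are glued together
echo 'a b' | awk '{ print $1, $2; print $1 $2 }'
# prints:
# a b
# ab
```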

Writing special characters

Here we find something new

Most characters are easy to write. Just use the keyboard

Some are harder, because they have other meanings, or they are not on the keyboard

To write them, we use \ followed by another character (usually a letter)

There are several cases, such as tab, new line, beep, backspace

(if you are curious, look at “line endings in UNIX vs. Windows”)

Special characters

The most common special characters are written as follows

name	how to write it
TAB	`\t`
new line	`\n`
backslash	`\\`
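A short sketch using the three escapes inside awk strings: `\t` as an output field separator, `\n` to split one printed string over two lines, and `\\` to print a single backslash.

```shell
# \t inside an awk string is a TAB: use it as the output separator
echo 'a b c' | awk 'BEGIN { OFS = "\t" } { print $1, $2, $3 }'
# prints a, b and c separated by TABs

# \n starts a new line and \\ writes a single backslash
awk 'BEGIN { print "one\ntwo"; print "back\\slash" }'
# prints:
# one
# two
# back\slash
```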

More default values

Notice that Input separators are regular expressions

By default,

RS="\n", that is, records are separated by new line
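FS=" ", a special value: fields are separated by whitespace (one or more spaces or tabs), and leading and trailing whitespace is ignored. It behaves almost like the regex /[ \t]+/, except that the regex would produce an empty first field when a line starts with whitespace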
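The difference between the default `FS` and a plain regex shows up on a line that starts with whitespace. This sketch uses a made-up line with leading spaces and a TAB in the middle.

```shell
# Default FS (" "): leading whitespace is skipped, so there are 3 fields
printf '   a \t b   c\n' | awk '{ print NF }'
# prints: 3

# FS set to the regex [ \t]+: the leading spaces now delimit
# an empty first field, so there are 4 fields
printf '   a \t b   c\n' | awk 'BEGIN { FS = "[ \t]+" } { print NF }'
# prints: 4
```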

Useful application

We can change the input’s field separator to process different kinds of files

For example, the file /home/andres/population_total.csv contains values separated by comma

geo,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,
Afghanistan,3280000,3280000,3280000,3280000,3280000,3280000,3280000,3280000,
Albania,410000,412000,413000,414000,416000,417000,418000,420000,421000,422000,

Here fields are separated by ,

Application

Let’s print the names of the countries (field 1) that had over 100 million people in 1800 (field 2)

awk 'BEGIN {FS=","} $2>100e6 {print $1}' population_total.csv
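If the file is not at hand, the same idea can be tried on the first columns of the three lines shown above. Their 1800 values are far below 100 million, so this sketch lowers the threshold to 1 million (an arbitrary value chosen only for illustration) and also prints the population. The header line drops out by itself, because its field 2 is the number 1800.

```shell
# First columns of the three sample lines (the real file has many more)
printf '%s\n' \
  'geo,1800,1801' \
  'Afghanistan,3280000,3280000' \
  'Albania,410000,412000' |
  awk 'BEGIN { FS = "," } $2 > 1e6 { print $1, $2 }'
# prints: Afghanistan 3280000
```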

There is a shortcut

Changing FS is so useful that there is a shortcut for it

awk has the -F option for it. Upper case F

awk -F "," '$2>100e6 {print $1}' population_total.csv
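`-F` works with other separators too. Two common cases are colon-separated files (like `/etc/passwd`) and tab-separated files, where `-F '\t'` sets `FS` to a TAB. The sample lines below are made up.

```shell
# Colon-separated fields, in the style of /etc/passwd
printf 'ana:x:1001:1001\n' | awk -F ':' '{ print $1, $3 }'
# prints: ana 1001

# Tab-separated fields; NR > 1 skips the header line
printf 'name\tage\nana\t23\n' | awk -F '\t' 'NR > 1 { print $2 }'
# prints: 23
```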
