December 12, 2019

Arithmetic

  • + addition, sum
  • - subtraction, difference
  • * multiplication, product
  • / division, quotient
  • % remainder, modulo
  • ^ exponentiation, power
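
A quick way to try the operators above is a BEGIN-only program, which needs no input. This is a minimal sketch; the numbers 7 and 2 and the shell variable name result are arbitrary choices for illustration.

```shell
# Each operator applied to 7 and 2, printed on one line
result=$(awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, 7 ^ 2 }')
echo "$result"   # 9 5 14 3.5 1 49
```

Note that / is not integer division: 7 / 2 gives 3.5.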

Example

What does this command do?

seq 100 | awk '$1 % 7 == 0 {print $1}'

Example

What does this command do?

seq 100 | awk '($1 % 3 == 0) && ($1 % 2 == 0) {print $1}'

Assignments

The symbol = is used to assign a new value to a variable

variable = value

Variables are created automatically the first time you assign a value to them

Variables can be numeric or text
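
As a small sketch of assignments, the command below uses one numeric variable and one text variable. The names total and label (and the shell variable result) are made up for this example. Since total is never initialized, awk creates it on first use with an empty value that counts as 0 in arithmetic.

```shell
# total accumulates the numbers 1..5; label holds text
result=$(seq 5 | awk '{ total = total + $1 }
                      END { label = "total"; print label, total }')
echo "$result"   # total 15
```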

Automatic variables

Besides the variables we create with assignments, we have some predefined variables

  • $1, $2, and so on: the fields of each record
  • $0: complete input record
  • NF: Number of Fields in the current record
  • NR: Number of the current Record
  • FILENAME: name of the current file being processed
  • FNR: Number of the current Record in the current file
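
To see some of these variables in action, here is a minimal sketch on two made-up input lines (the text "a b c" and "d e" is only sample data). It prints the record number, the number of fields, and the whole record.

```shell
# NR = record number, NF = number of fields, $0 = whole record
result=$(printf 'a b c\nd e\n' | awk '{ print NR, NF, $0 }')
echo "$result"
# 1 3 a b c
# 2 2 d e
```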

Exercise

Write an awk command that prints

  • the name of the current file
  • the global record number
  • the record number within the current file
  • the record itself

Apply this to all .txt files in your folder

Conditions

  • comparisons
    • ==, !=, <, >, <=, >=
  • regular expressions
    • matching anywhere
    • matching a single column
  • BEGIN, END
  • combinations using &&, || and !
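
The kinds of conditions listed above can be sketched on a small made-up table (the fruit names and numbers are invented sample data, and the shell variable names are arbitrary). The ~ operator tests a regular expression against a single field instead of the whole record.

```shell
data='apple 3
banana 12
cherry 7'

# Regular expression matching anywhere in the record
anywhere=$(echo "$data" | awk '/an/ { print $1 }')
# Regular expression matching only column 1
column=$(echo "$data" | awk '$1 ~ /^c/ { print $1 }')
# Comparison combined with a negated regular expression
combined=$(echo "$data" | awk '$2 > 5 && !/an/ { print $1 }')

echo "$anywhere $column $combined"   # banana cherry cherry
```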

BEGIN and END

These special conditions are only true once in every program

  • BEGIN is true before reading any record
    • We can define initial values for some variables
  • END is true after reading all records
    • We can print the variables that we computed while reading the records

Example

ls -l | awk '$6 == "Nov" { sum += $5 }
             END { print sum }'
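
A second sketch that uses both special conditions: BEGIN sets an initial value, the middle rule runs for every matching record, and END prints the result. The variable name count is an arbitrary choice.

```shell
# Count the even numbers between 1 and 10
result=$(seq 10 | awk 'BEGIN { count = 0 }
                       $1 % 2 == 0 { count += 1 }
                       END { print count }')
echo "$result"   # 5
```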

Variables that control awk behavior

  • FS: Field Separator. Regex that separates fields
  • RS: Record Separator. Regex that separates records
  • OFS: Output Field Separator. Text to separate printed fields
  • ORS: Output Record Separator. Text to separate printed records

Default values

Notice that output separators are plain text, but input separators are regular expressions

By default,

  • ORS="\n", that is, records are separated by new line
  • OFS=" ", that is, fields are separated by space
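
Changing the output separators can be sketched as follows. The values "-" and ";" are arbitrary choices for illustration. OFS is placed wherever print has a comma, and ORS is written after every print.

```shell
# OFS replaces the space between printed fields
fields=$(echo "a b c" | awk 'BEGIN { OFS = "-" } { print $1, $2, $3 }')
echo "$fields"    # a-b-c

# ORS replaces the new line after each printed record
records=$(seq 3 | awk 'BEGIN { ORS = ";" } { print $1 }')
echo "$records"   # 1;2;3;
```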

Writing special characters

Here we find something new

Most characters are easy to write. Just use the keyboard

Some are harder, because they have other meanings, or they are not on the keyboard

To write them, we use \ followed by a letter

There are several cases, such as tab, new line, beep, backspace

(if you are curious, look up “line endings in UNIX vs. Windows”)

Special characters

The most common special characters are written as follows

name        how to write it
TAB         \t
new line    \n
backslash   \\
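
A minimal sketch using the three escapes from the table inside one awk string. The words in the string are arbitrary sample text.

```shell
# \t becomes a TAB, \n starts a new line, \\ prints one backslash
result=$(awk 'BEGIN { print "name\tvalue\nback\\slash" }')
echo "$result"
# prints name<TAB>value, then back\slash on the next line
```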

More default values

Notice that input separators are regular expressions

By default,

  • RS="\n", that is, records are separated by new line
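  • FS=" ", a single space, which awk treats as a special case: fields are separated by one or more spaces or tabs, much like the regex /[ \t]+/, and whitespace at the start and end of the record is ignored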
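
The default field splitting can be checked with a made-up line that mixes several spaces and a tab (the letters a, b, c are only sample data). Runs of whitespace count as one separator, and the leading spaces do not create an empty first field.

```shell
# Input: two leading spaces, then a, space, TAB, space, b, three spaces, c
result=$(printf '  a \t b   c\n' | awk '{ print NF, $2 }')
echo "$result"   # 3 b
```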

Useful application

We can change the input’s field separator to process different kinds of files

For example, the file /home/andres/population_total.csv contains values separated by commas

geo,1800,1801,1802,1803,1804,1805,1806,1807,1808,1809,1810,1811,1812,1813,
Afghanistan,3280000,3280000,3280000,3280000,3280000,3280000,3280000,3280000,
Albania,410000,412000,413000,414000,416000,417000,418000,420000,421000,422000,

Here fields are separated by ,

Application

Let’s print the names of the countries (field 1) that had over 100 million people in 1800 (field 2)

awk 'BEGIN {FS=","} $2>100e6 {print $1}' population_total.csv

There is a shortcut

Changing FS is so useful that there is a shortcut for it

awk has the -F option for it. Upper case F

awk -F "," '$2>100e6 {print $1}' population_total.csv
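
To check that the two ways of setting the separator agree, here is a sketch on a tiny made-up CSV. The names Aland and Bland and their numbers are invented for illustration and are not taken from population_total.csv.

```shell
csv='geo,1800
Aland,150000000
Bland,2000000'

# Separator set in BEGIN
with_begin=$(echo "$csv" | awk 'BEGIN { FS = "," } $2 > 100e6 { print $1 }')
# Same separator set with the -F option
with_flag=$(echo "$csv" | awk -F "," '$2 > 100e6 { print $1 }')

echo "$with_begin $with_flag"   # Aland Aland
```

The header line is skipped without any extra work, because its second field (1800) is not greater than 100e6.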