November 29, 2018

Arithmetic

  • + addition, sum
  • - subtraction, difference
  • * multiplication, product
  • / division, quotient
  • % remainder, modulo
  • ^ exponentiation, power
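
To try the operators quickly, we can use a program with only a BEGIN rule, which runs without reading any input. This is a small sketch; the numbers 7 and 2 are arbitrary values chosen for illustration

```shell
awk 'BEGIN { print 7 + 2, 7 - 2, 7 * 2, 7 / 2, 7 % 2, 7 ^ 2 }'
# prints: 9 5 14 3.5 1 49
```

Notice that / gives a decimal result, and % gives what is left after the integer division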

Example

What does this command do?

seq 100 | awk '$1 % 7 == 0 {print $1}'

Example

What does this command do?

seq 100 | awk '($1 % 3 == 0) && ($1 % 2 == 0) {print $1}'

Assignments

The symbol = is used to assign a new value to a variable

variable = value

Variables are created automatically when you assign them

Variables can be numeric or text
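
A minimal sketch of both kinds of variables. The names n, name and unset are made up for this example

```shell
awk 'BEGIN {
    n = 42            # numeric value
    name = "awk"      # text value
    print name, n + 1
    # a variable that was never assigned acts as 0 in arithmetic
    # and as empty text when printed
    print unset + 0, "[" unset "]"
}'
# prints:
# awk 43
# 0 []
```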

Assignments

x += increment      Add increment to the value of x.
x -= decrement      Subtract decrement from the value of x.
x *= coefficient    Multiply the value of x by coefficient.
x /= divisor        Divide the value of x by divisor.
x %= modulus        Set x to its remainder by modulus.
x ^= power          Raise x to the given power.
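
As a sketch, we can apply each of these assignments in turn to the same variable. The starting value 10 and the other numbers are arbitrary

```shell
awk 'BEGIN {
    x = 10
    x += 5; print x    # 15
    x -= 3; print x    # 12
    x *= 2; print x    # 24
    x /= 4; print x    # 6
    x %= 4; print x    # 2
    x ^= 3; print x    # 8
}'
```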

Increasing and decreasing

We want to read a value, and change it by 1

long form          short form
y = x; x += 1      y = x++
y = x; x -= 1      y = x--
x += 1; y = x      y = ++x
x -= 1; y = x      y = --x
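
The difference is whether y receives the value before or after the change. A small sketch, resetting x to 5 (an arbitrary value) before each case

```shell
awk 'BEGIN {
    x = 5; y = x++; print y, x    # 5 6  (y gets the old value)
    x = 5; y = ++x; print y, x    # 6 6  (y gets the new value)
    x = 5; y = x--; print y, x    # 5 4
    x = 5; y = --x; print y, x    # 4 4
}'
```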

Automatic variables

Besides the variables we create with assignments, we have some predefined variables

  • $1, $2, and so on are the fields of each record
  • $0: complete input record
  • NF: Number of Fields in the current record
  • NR: Number of the current Record
  • FILENAME: name of the current file being processed
  • FNR: Number of the current Record in the current file
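
A short sketch of some of these variables, using two made-up input lines instead of a file. Since NF is the number of fields, $NF is the last field of the record

```shell
printf 'a b c\nd e\n' | awk '{ print NR, NF, $1, $NF }'
# prints:
# 1 3 a c
# 2 2 d e
```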

Exercise

Write an awk command that prints

  • Name of the current file
  • Global record number (counted across all files)
  • Record number within the current file
  • the record itself

Apply this to all .txt files in your folder

Conditions

  • comparisons
    • ==, !=, <, >, <=, >=
  • regular expressions
    • matching anywhere
    • matching a single column
  • BEGIN, END
  • combinations using &&, || and !
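
A sketch with one rule for each kind of condition (except BEGIN and END). The three input lines are invented for this example

```shell
printf 'apple 3\nbanana 12\ncherry 7\n' | awk '
    $2 > 5          { print "big:", $1 }            # comparison
    /an/            { print "has an:", $0 }         # regex anywhere in the record
    $1 ~ /^c/       { print "starts with c:", $1 }  # regex on a single column
    $2 > 5 && !/an/ { print "both:", $1 }           # combination
'
# prints:
# big: banana
# has an: banana 12
# big: cherry
# starts with c: cherry
# both: cherry
```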

BEGIN and END

These special conditions are only true once in every program

  • BEGIN is true before reading any record
    • We can define initial values for some variables
  • END is true after reading all records
    • We can print the variables that we computed while reading the records

Example

ls -l | awk '$6 == "Nov" { sum += $5 }
             END { print sum }'
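
The result of the ls example depends on the files in your folder. Here is a sketch of the same idea with a predictable input, the numbers 1 to 10. The BEGIN rule is optional here, since new variables start as 0

```shell
seq 10 | awk 'BEGIN { sum = 0 }
              { sum += $1 }
              END { print "sum:", sum, "records:", NR }'
# prints: sum: 55 records: 10
```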

Variables that control awk behavior

  • FS: Field Separator. Regex that separates fields
  • RS: Record Separator. Regex that separates records
  • OFS: Output Field Separator. Text to separate printed fields
  • ORS: Output Record Separator. Text to separate printed records

Default values

Notice that Output separators are text, but Input separators are regular expressions

By default,

  • ORS="\n", that is, records are separated by new line
  • OFS=" ", that is, fields are separated by space
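
A sketch of what happens when we change both output separators. The values "-" and ";" are arbitrary. The comma in print is what inserts OFS between the fields

```shell
printf 'a b\nc d\n' | awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $2 }'
# prints: a-b;c-d;   (no final newline, because ORS replaced it)
```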

Writing special characters

Here we find something new

Most characters are easy to write. Just use the keyboard

Some are harder, because they have other meanings, or they are not on the keyboard

To write them, we use \ followed by a letter

There are several cases, such as tab, new line, beep, backspace

(if you are curious, look up “line endings in UNIX vs. Windows”)

Special characters

The most common special characters are written as follows

name how to write it
TAB \t
new line \n
backslash \\
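
A small sketch that prints each of these special characters inside a text. The words are placeholders

```shell
awk 'BEGIN {
    print "name\tvalue"     # a TAB between the two words
    print "line1\nline2"    # one print, two output lines
    print "back\\slash"     # a single backslash in the output
}'
```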

More default values

Notice that Input separators are regular expressions

By default,

  • RS="\n", that is, records are separated by new line
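  • FS=" ", a single space, which awk treats in a special way: fields are separated by whitespace (one or more spaces or tabs, roughly like the regex /[ \t]+/), and whitespace at the start and end of the record is ignored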
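
A sketch of the default field splitting. The input line is made up: it has extra spaces at both ends, three spaces after a, and two tabs before c

```shell
printf '  a   b\t\tc  \n' | awk '{ print NF; print $1 "|" $2 "|" $3 }'
# prints:
# 3
# a|b|c
```

Any mix of spaces and tabs counts as one separator, so awk finds exactly three fields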

This is why the file /home/andres/world_2007.txt is different from /home/andres/gapminder-2007.txt

Exercise

Write an awk command that prints the file name and the number of fields for /home/andres/world_2007.txt and /home/andres/gapminder-2007.txt

  • filter duplicates, to understand better
  • What is the UNIX command to compare two files?

Useful application

We can change the input’s field separator to process different kinds of files

For example, the list of users in UNIX is stored in the file /etc/passwd

busrabal:x:1060:1022:Busra Bala,,,:/home/busrabal:/bin/bash
simay-24:x:1061:1006:Simay Goknil Urek,0401170068:/home/simay-24:/bin/bash
mert-sir:x:1062:1006:Mert Sırmalı,0401170090:/home/mert-sir:/bin/bash

Here fields are separated by :

Application

Let’s print the username (field 1) for the users in group (field 4) 1006

awk 'BEGIN {FS=":"} $4==1006 {print $1}' /etc/passwd

Exercise

Write an awk command that prints the file name and the number of fields for /home/andres/world_2007.txt and /home/andres/gapminder-2007.txt

  • this time, use TAB as field separator
  • filter duplicates, to understand better

There is a shortcut

Changing FS is so useful that there is a shortcut for it

awk has the -F option for it. Upper case F

awk -F ":" '$4==1006 {print $1}' /etc/passwd
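
To check that both forms do the same, without depending on the users of your own system, here is a sketch that feeds the three sample lines shown before to each command. The shell variable sample is only a helper for this example

```shell
# Three sample lines in /etc/passwd format, copied from the slide above.
sample='busrabal:x:1060:1022:Busra Bala,,,:/home/busrabal:/bin/bash
simay-24:x:1061:1006:Simay Goknil Urek,0401170068:/home/simay-24:/bin/bash
mert-sir:x:1062:1006:Mert Sırmalı,0401170090:/home/mert-sir:/bin/bash'

printf '%s\n' "$sample" | awk 'BEGIN {FS=":"} $4==1006 {print $1}'
printf '%s\n' "$sample" | awk -F ":" '$4==1006 {print $1}'
# each command prints:
# simay-24
# mert-sir
```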

Several Rules

awk reads the input files one line at a time

For each line, awk tries the patterns of each rule

If several patterns match, then several actions execute in the order in which they appear in the awk program

If no patterns match, then no actions run.

Several Rules

After processing all the rules that match the line, awk reads the next line.

This continues until the program reaches the end of the input.

For example, the following awk program contains two rules:

/12/  { print $0 }
/21/  { print $0 }
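
A sketch of how the two rules behave, using four made-up input lines. The line 1221 matches both patterns, so it is printed twice. The line 33 matches none, so it is not printed

```shell
printf '12\n21\n1221\n33\n' | awk '/12/  { print $0 }
                                   /21/  { print $0 }'
# prints:
# 12
# 21
# 1221
# 1221
```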

Longer programs

Sooner or later you will have too many rules to fit on the command line

And it becomes tedious to type everything again and again

In this case we can write the whole program in a .awk file

We use a text editor (like nano or vim) to edit it

Example

Let’s write the file gdp.awk with this content

BEGIN { FS="\t" }

NF>0 {gdp = $2*$5; total+=gdp; print $1,gdp }

END {print "Total", total}

Running the example

To tell awk to read commands from a file, we use the option -f (lower case f)

awk -f gdp.awk /home/andres/gapminder-2007.txt
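
If you do not have access to that file, here is a self-contained sketch. It creates gdp.awk and a small input file in a temporary folder. The file sample.txt and its rows are invented for this example: five columns separated by TAB, with numbers in columns 2 and 5, and one empty line to show the effect of NF>0

```shell
# Work in a scratch folder so that no existing file is overwritten.
cd "$(mktemp -d)"

cat > gdp.awk <<'EOF'
BEGIN { FS="\t" }

NF>0 {gdp = $2*$5; total+=gdp; print $1,gdp }

END {print "Total", total}
EOF

# Made-up data: two rows with five tab-separated columns, and one empty line.
printf 'A land\t10\tx\ty\t3\n\nB land\t20\tx\ty\t2\n' > sample.txt

awk -f gdp.awk sample.txt
# prints:
# A land 30
# B land 40
# Total 70
```

The empty line has NF equal to 0, so the second rule skips it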

Be careful not to confuse -f (read the program from a file) with -F (set the field separator)