December 7, 2018

Quiz Solution

Change your working directory to quiz2

There is a file quiz2.sh in your folder. You will edit it with nano

nano quiz2.sh

For each of the following questions, write the corresponding command after the question

If you like, you can open a second connection to the server

Write the awk command to show the first 5 lines of world_2007.txt

awk 'NR<=5' world_2007.txt
Afghanistan 31889923    Asia    43.828  974.5803384
Albania 3600523 Europe  76.423  5937.029526
Algeria 33333216    Africa  72.301  6223.367465
Angola  12420476    Africa  42.731  4797.231267
Argentina   40301927    Americas    75.32   12779.37964

Show the data between lines 45 and 50 (inclusive) of world_2007.txt

awk 'NR<=50 && NR>=45' world_2007.txt
France  61083916    Europe  80.657  30470.0167
Gabon   1454867 Africa  56.735  13206.48452
Gambia  1688359 Africa  59.448  752.7497265
Germany 82400996    Europe  79.406  32170.37442
Ghana   22873338    Africa  60.022  1327.60891
Greece  10706290    Europe  79.483  27538.41188

Write the awk command to show the data corresponding to Turkey

awk '$1=="Turkey"' world_2007.txt
Turkey  71158647    Europe  71.777  8458.276384

Write the awk command to find how many lines are there in world_2007.txt

awk 'END {print NR}' world_2007.txt
142

How many countries have “Rep” in their name?

Write the command to find how many lines of world_2007.txt contain the exact word “Rep”, but not “Republic”

awk '/Rep[^u]/' world_2007.txt
Congo,_Dem._Rep.    64606759    Africa  46.462  277.5518587
Congo,_Rep. 3800610 Africa  55.322  3632.557798
Korea,_Dem._Rep.    23301725    Asia    67.297  1593.06548
Korea,_Rep. 49044790    Asia    78.623  23348.13973
Yemen,_Rep. 22211743    Asia    62.698  2280.769906
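
The command above prints the matching lines rather than counting them. A minimal sketch of a counting variant follows. It reads three made-up sample lines instead of world_2007.txt, and the numbers in them are placeholders, not real data:

```shell
# Count the lines that contain "Rep" followed by any character other than "u".
# The three input lines are placeholders for illustration only.
count=$(printf 'Congo,_Rep. 1 Africa\nCzech_Republic 2 Europe\nKorea,_Rep. 3 Asia\n' |
    awk '/Rep[^u]/ {n++} END {print n+0}')
echo "$count"
```

Writing n+0 makes awk print 0 instead of an empty line when nothing matches. On the real file the same pattern should report the five lines listed above.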

Which African country has 135031164 inhabitants?

awk '$3=="Africa" && $2==135031164' world_2007.txt
Nigeria 135031164   Africa  46.859  2013.977305

Write an awk command that prints population and country (in that order) when the continent is “Europe”

awk '$3=="Europe" {print $2,$1}' world_2007.txt
3600523 Albania
8199783 Austria
10392226 Belgium
4552198 Bosnia_and_Herzegovina
7322858 Bulgaria
4493312 Croatia
10228744 Czech_Republic
5468120 Denmark
5238460 Finland
61083916 France
82400996 Germany
10706290 Greece
9956108 Hungary
301931 Iceland
4109086 Ireland
58147733 Italy
684736 Montenegro
16570613 Netherlands
4627926 Norway
38518241 Poland
10642836 Portugal
22276056 Romania
10150265 Serbia
5447502 Slovak_Republic
2009245 Slovenia
40448191 Spain
9031088 Sweden
7554661 Switzerland
71158647 Turkey
60776238 United_Kingdom

Which countries have the largest population in Europe?

Sort the output of the last command, from biggest to smallest. Use | and sort

awk '$3=="Europe" {print $2,$1}' world_2007.txt | sort -nr
82400996 Germany
71158647 Turkey
61083916 France
60776238 United_Kingdom
58147733 Italy
40448191 Spain
38518241 Poland
22276056 Romania
16570613 Netherlands
10706290 Greece
10642836 Portugal
10392226 Belgium
10228744 Czech_Republic
10150265 Serbia
9956108 Hungary
9031088 Sweden
8199783 Austria
7554661 Switzerland
7322858 Bulgaria
5468120 Denmark
5447502 Slovak_Republic
5238460 Finland
4627926 Norway
4552198 Bosnia_and_Herzegovina
4493312 Croatia
4109086 Ireland
3600523 Albania
2009245 Slovenia
684736 Montenegro
301931 Iceland

What is the ranking of Turkey’s population in Europe?

Take the sorted output of the last command and pipe it into an awk command that prints the row number, the country, and the population for the line that matches “Turkey”

awk '$3=="Europe" {print $2,$1}' world_2007.txt |
    sort -nr | awk '/Turkey/ {print NR,$2,$1}'
2 Turkey 71158647
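
The ranking works because, after sorting, the row number NR in the second awk is the position in the sorted list. A small sketch of the same pipeline, fed with four of the European lines shown earlier in place of the full file:

```shell
# Four "population country" lines taken from the European list above, unsorted.
rank=$(printf '3600523 Albania\n71158647 Turkey\n82400996 Germany\n61083916 France\n' |
    sort -nr | awk '$2=="Turkey" {print NR, $2, $1}')
echo "$rank"
```

This variant compares the second field exactly instead of using the regular expression /Turkey/, which would also match any other line containing that text. Either form gives the same answer for this data.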

What is the ranking of Turkey’s GDP per capita in Europe?

Repeat the last command, replacing “Population” with “GDP per capita”

awk '$3=="Europe" {print $5,$1}' world_2007.txt |
    sort -nr | awk '/Turkey/ {print NR,$2,$1}'
28 Turkey 8458.276384

What is the ranking of Turkey’s Life expectancy in Europe?

Repeat the last command, replacing “GDP per capita” with “Life Expectancy”

awk '$3=="Europe" {print $4,$1}' world_2007.txt |
    sort -nr | awk '/Turkey/ {print NR,$2,$1}'
30 Turkey 71.777

How many countries are there in Europe?

Write an awk command that counts how many lines have “Europe” in the third field

awk '$3=="Europe" {n++} END {print n}' world_2007.txt
30

Why we do quizzes

  • To rehearse the exam procedures
  • To test your learning
  • To do realistic work
  • To see if you can follow simple instructions
  • To find which of you will be good scientists

Quizzes are important if you want to succeed

Functions in awk

Numeric Functions

AWK has the following built-in arithmetic functions:

int(expr) Truncate to integer.
rand() Return a random number N, between 0 and 1, such that 0 ≤ N < 1.
srand([expr]) Use expr as the new seed for the random number generator. If no expr is provided, use the time of day. Return the previous seed for the random number generator.

atan2(y, x) Return the arctangent of y/x in radians.
cos(expr) Return the cosine of expr, which is in radians.
sin(expr) Return the sine of expr, which is in radians.
exp(expr) The exponential function.
log(expr) The natural logarithm function.
sqrt(expr) Return the square root of expr.
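
A short sketch that exercises several of these functions in a BEGIN block, so no input file is needed. The input values are arbitrary choices for illustration:

```shell
# int() truncates toward zero; atan2(0, -1) is one common way to obtain pi.
result=$(awk 'BEGIN {
    print int(3.9), int(-3.9)          # truncation, not rounding
    print sqrt(16), exp(0), log(1)     # 4, 1 and 0
    printf "%.4f\n", atan2(0, -1)      # pi to four decimals
}')
echo "$result"
```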

Examples

Print seven random numbers from 0 to 100, inclusive:

awk 'BEGIN { for (i = 1; i <= 7; i++)
                 print int(101 * rand()) }'
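
In some awk implementations (gawk, for example) rand() replays the same sequence on every run unless srand() is called first. The sketch below illustrates how the seed controls the sequence. The seed 42 is an arbitrary value chosen for this illustration:

```shell
# Seeding twice with the same value should replay the same random number.
# The seed 42 is an arbitrary choice for this illustration.
same=$(awk 'BEGIN {
    srand(42); a = int(101 * rand())
    srand(42); b = int(101 * rand())
    if (a == b && a >= 0 && a <= 100) print "same"; else print "different"
}')
echo "$same"
```

Calling srand() with no argument seeds from the time of day instead, which gives different numbers on different runs.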

String Functions

tolower(str)
Return a copy of the string str, with all the uppercase characters in str translated to their corresponding lowercase counterparts.
Non-alphabetic characters are left unchanged.
toupper(str)
Return a copy of the string str, with all the lowercase characters in str translated to their corresponding uppercase counterparts.
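
A quick sketch of both functions on one sample line. The text is arbitrary, and the digits show that non-alphabetic characters pass through unchanged:

```shell
# Convert the whole line ($0) to uppercase and to lowercase.
upper=$(echo 'Hello World 42' | awk '{print toupper($0)}')
lower=$(echo 'Hello World 42' | awk '{print tolower($0)}')
echo "$upper"
echo "$lower"
```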

length([s])
Return the length of the string s, or the length of $0 if s is not supplied.
substr(s, i [, n])
Return the substring of s that starts at position i (counting from 1) and is at most n characters long.
If n is omitted, use the rest of s.
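
A minimal sketch of length and substr applied to the first field of a one-word line:

```shell
# length of the word, its first three characters, and everything from position 4 on.
parts=$(echo 'Turkey' | awk '{print length($1), substr($1, 1, 3), substr($1, 4)}')
echo "$parts"
```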

More functions later

Exercise

Write an awk program that changes the first word to Title Case
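
One possible approach, offered as a sketch rather than the expected answer: combine substr, toupper and tolower from the previous slides. The input line is a made-up example:

```shell
# Uppercase the first character of the first word, lowercase the rest of it.
titled=$(echo 'tURKEY 71158647 Europe' |
    awk '{$1 = toupper(substr($1, 1, 1)) tolower(substr($1, 2)); print}')
echo "$titled"
```

Assigning to $1 makes awk rebuild the line with single spaces between fields, so the original spacing of tab-aligned input is not preserved.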

Loops

Updated data

Our Midterm and Quiz used data from 2007

The file /home/andres/population_total.csv has data for all years and all countries

Take a look by doing this:

head /home/andres/population_total.csv

Pivot table

We want to change the shape of this table

The output should be in three columns

  • country
  • year
  • population

Fields are separated by commas

We need to use the -F option. Something like

awk -F ',' '{print $1, 1800, $2; 
             print $1, 1801, $3; 
             print $1, 1802, $4;
         }' /home/andres/population_total.csv

with one print command for every field

Can we do it smarter?

for loops

Like many other computer languages, awk can repeat the same commands several times

awk -F ',' '{for(i=2; i<=NF; i++) {
                print $1, 1798+i, $i
            }
        }' /home/andres/population_total.csv
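
To see what the loop produces without opening the real file, the sketch below runs it on a single made-up CSV line. The country name and the three values are placeholders:

```shell
# One made-up CSV line: a country name followed by three yearly values.
long=$(echo 'Exampleland,100,110,120' |
    awk -F ',' '{for (i = 2; i <= NF; i++) print $1, 1798 + i, $i}')
echo "$long"
```

Field 2 becomes year 1800, field 3 becomes 1801, and so on, because 1798 + i is 1800 when i is 2. If the real file starts with a header line, putting the condition NR > 1 before the braces would skip it. Whether it has one is not stated here.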

for loops have four parts

The general form of a for loop looks like this:

for(A;B;C){D}

  • A, B, and C are separated by ; (semicolon)
  • D is wrapped in {}

The four parts of for

for(A;B;C){D}

  • A is the initialization
  • B is the while condition, tested before each repetition
  • C is the update
  • D is one or more commands to be executed

The while condition

A, C and D are normal awk commands or assignments

B is a TRUE/FALSE condition

The D part is repeated while B is true

B must eventually become FALSE, otherwise the loop never finishes
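
Because B works as a while condition, a for loop can be rewritten as a while loop by moving A before the loop and C to the end of the body. A minimal sketch of the two forms side by side, counting from 1 to 3:

```shell
# The same three repetitions written as a for loop and as a while loop.
with_for=$(awk 'BEGIN { for (i = 1; i <= 3; i++) printf "%d ", i }')
with_while=$(awk 'BEGIN { i = 1; while (i <= 3) { printf "%d ", i; i++ } }')
echo "$with_for"
echo "$with_while"
```

Both commands print the same numbers. The for form keeps the initialization, condition and update together on one line.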

Big Picture

  • A
  • B TRUE
  • D
  • C
  • B TRUE
  • D
  • C
  • B TRUE
  • D
  • C
  • B FALSE
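
The order of the steps can be observed directly. In this sketch the first loop runs its body twice, and the second loop never runs its body because its condition is already false when it is first tested. The variable names and bounds are arbitrary:

```shell
# Trace a short for loop, then show a loop whose condition is false from the start.
trace=$(awk 'BEGIN {
    for (i = 1; i <= 2; i++)
        print "D runs with i = " i
    print "B is FALSE when i = " i
    for (j = 5; j <= 2; j++)
        print "never printed"
}')
echo "$trace"
```

After the first loop ends, i holds 3: the update C ran one last time, and then the test B failed.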