December 19, 2019

This course is easy

if you do the right things

  • First, you need to understand the question in your own language
    • Make a drawing
  • Decompose the problem into smaller parts
    • You can eat an elephant, one piece at a time
  • Write your answer in your own words
  • Translate them into the computer language
    • In this case, awk

Arrays

How many countries on each continent?

awk '$2=="americas" {n_americas++}
$2=="africa" {n_africa++}
$2=="asia" {n_asia++}
$2=="europe" {n_europe++}
END {print "americas", n_americas;
     print "africa", n_africa;
     print "asia", n_asia;
     print "europe", n_europe;
}' world2017.txt

Arrays make this easier

awk '$2=="americas" {n["americas"]++}
$2=="africa" {n["africa"]++}
$2=="asia" {n["asia"]++}
$2=="europe" {n["europe"]++}
END {print "americas", n["americas"];
     print "africa", n["africa"];
     print "asia", n["asia"];
     print "europe", n["europe"];
}' world2017.txt

$2 is the continent, so we can use it directly as the key

awk '{n[$2]++}
END {print "americas", n["americas"];
     print "africa", n["africa"];
     print "asia", n["asia"];
     print "europe", n["europe"];
}' world2017.txt
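Listing the keys by hand has a subtle trap. A key that never occurs, such as a continent with no countries in the file, prints as an empty string rather than 0. Here is a minimal sketch that adds `+ 0` to force a number. The sample rows are made up and stand in for world2017.txt:

```shell
# Made-up sample rows (country, continent) standing in for world2017.txt
counts=$(printf 'countryA americas\ncountryB asia\ncountryC americas\n' |
  awk '{n[$2]++}
  END {print "americas", n["americas"] + 0;   # + 0 turns an unset element into 0
       print "africa", n["africa"] + 0;
       print "asia", n["asia"] + 0;
       print "europe", n["europe"] + 0;
  }')
echo "$counts"
```

Without `+ 0`, the lines for africa and europe would show only the continent name.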

Repeat commands using a for loop

awk '{n[$2]++}
END {for(continent in n) {
    print continent, n[continent]
    }
}' world2017.txt

Notice that the output may come out in any order: awk does not specify the order in which for (key in array) visits the keys
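If a fixed order matters, one common approach is to pipe awk's output through sort. Here is a minimal sketch, with made-up rows standing in for world2017.txt:

```shell
# Made-up sample rows (country, continent) standing in for world2017.txt
sorted=$(printf 'countryA europe\ncountryB asia\ncountryC americas\ncountryD asia\n' |
  awk '{n[$2]++}
  END {for (continent in n) print continent, n[continent]}' |
  sort)   # sort the lines alphabetically, whatever order awk produced
echo "$sorted"
```

To list the biggest continent first instead, sort numerically on the second field in reverse with `sort -k2,2nr`.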

Parts of an array

An array contains several elements

Each element is a pair: a key and a value

We access a value using [], writing the key inside the brackets

We can read or write that value

array[key] = value
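To make reading and writing concrete, here is a small sketch that uses a hypothetical array `pop` inside a BEGIN block, so no input file is needed. It also shows two standard awk features that the slides do not cover. The `in` operator tests for a key without creating it, and `delete` removes an element.

```shell
# pop is a hypothetical array name; BEGIN runs without reading any input
out=$(awk 'BEGIN {
  pop["asia"] = 10          # write: store 10 under the key "asia"
  pop["asia"] += 5          # read the old value, add 5, write 15 back
  print pop["asia"]         # read: prints 15
  if ("europe" in pop)      # "in" tests for a key without creating it
    print "europe found"
  else
    print "europe missing"
  delete pop["asia"]        # remove the key-value pair
  if ("asia" in pop)
    print "asia present"
  else
    print "asia deleted"
}')
echo "$out"
```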

Exercise: Frequency table (histogram)

Using the int() function we can cut the income per capita down to a whole number (int() truncates; it does not round)
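A quick check in a BEGIN block shows what int() does with a few values. It includes a negative number and the +0.5 trick for rounding positive numbers. The income figure 12345.67 is made up.

```shell
ints=$(awk 'BEGIN {
  print int(3.9)               # 3: the fractional part is dropped
  print int(-3.9)              # -3: truncation goes toward zero
  print int(12345.67 / 1000)   # 12: an income of 12345.67 dollars lands in bin 12
  print int(3.9 + 0.5)         # 4: adding 0.5 first rounds a positive number
}')
echo "$ints"
```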

What is the absolute frequency of income (in thousands of dollars)? Each line below gives the income bin followed by the number of countries in it.

0 5
1 20
2 17
3 6
4 8
5 7
6 4
7 8
8 5
9 3
10 8
11 5
12 3
13 3
14 4
15 5
16 5
17 3
18 1
19 5
20 1
21 2
22 2
23 5
24 1
25 3
26 2
27 1
28 1
29 1
30 2
31 2
32 2
33 1
34 1
37 3
39 1
40 1
41 2
42 3
43 2
44 2
45 2
49 1
51 1
56 1
62 2
74 1
78 1
79 1
91 1
123 1
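Here is one possible solution, sketched under an assumption. This section does not say which column of world2017.txt holds income per capita, so the sketch uses made-up rows where $3 is income per capita in dollars. Adjust the field number for the real file.

```shell
# Hypothetical layout: $1 country, $2 continent, $3 income per capita in dollars.
# These rows are made up; the real column in world2017.txt may differ.
table=$(printf 'countryA americas 1500\ncountryB asia 2999\ncountryC europe 41000\ncountryD africa 800\ncountryE asia 1200\n' |
  awk '{f[int($3 / 1000)]++}   # the key is the income in whole thousands
  END {for (bin in f) print bin, f[bin]}' |
  sort -n)   # for-in order is unspecified, so sort the bins numerically
echo "$table"
```

Only bins that actually occur get an entry. That is why some values, such as 35, 36 and 38, are missing from the table above.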