October 25, 2018

df

The df command reports on the space left on the file system. For example, to find out how much space is left on the fileserver, type

$ df .

du

The du command outputs the number of kilobytes used by each subdirectory. This is useful if you have gone over quota and want to find out which directory takes up the most space. In your home directory, type

$ du -s *

The -s flag will display only a summary (total size) and the * means all files and directories.
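To rank the entries by size, the output of du can be piped to sort. The following is a minimal sketch, assuming a throwaway directory created with mktemp; the names big and small and the file sizes are invented for the demonstration.

```shell
# Build a small demo tree in a temporary directory (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/small" "$demo/big"
head -c 4096 /dev/urandom > "$demo/small/a.dat"       # about 4 KB of data
head -c 1048576 /dev/urandom > "$demo/big/b.dat"      # about 1 MB of data

# Summarise each entry in kilobytes (-k), sort numerically,
# keep the last line (the largest) and print only its name.
largest=$(cd "$demo" && du -sk * | sort -n | tail -1 | cut -f 2)
echo "largest entry: $largest"
```

In a real home directory, `du -sk * | sort -n` on its own lists every entry with the biggest at the bottom.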

gzip

This reduces the size of a file, thus freeing valuable disk space. For example, type

$ ls -l science.txt

and note the size of the file using ls -l. Then to compress science.txt, type

$ gzip science.txt

This will compress the file and replace it with a file called science.txt.gz

To see the change in size, type ls -l again.

gunzip

To expand the file, use the gunzip command.

$ gunzip science.txt.gz

zcat

zcat will read gzipped files without needing to uncompress them first.

$ zcat science.txt.gz

If the text scrolls too fast for you, pipe the output through less

$ zcat science.txt.gz | less
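The three commands gzip, zcat and gunzip can be tried together on a scratch file. This is a minimal sketch, assuming a temporary copy of science.txt filled with one repeated line; the contents are invented, and repetitive text compresses unusually well.

```shell
demo=$(mktemp -d)
f="$demo/science.txt"

# Write 1000 identical lines as stand-in content.
awk 'BEGIN { for (i = 0; i < 1000; i++) print "the quick brown fox" }' > "$f"
before=$(wc -c < "$f")

gzip "$f"                          # science.txt is replaced by science.txt.gz
after=$(wc -c < "$f.gz")
lines=$(zcat "$f.gz" | wc -l)      # read the text without creating science.txt

gunzip "$f.gz"                     # science.txt is back, the .gz file is gone
echo "before: $before bytes, compressed: $after bytes, lines: $lines"
```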

file

file classifies the named files according to the type of data they contain, for example ASCII (text), pictures, compressed data, etc.

To report on all files in your current directory, type

$ file *

diff

diff compares the contents of two files and shows the differences

Suppose you have a file called file1 and you edit some part of it and save it as file2

To see the differences type

$ diff file1 file2
  • Lines beginning with a < are in file1
  • Lines beginning with a > are in file2
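A small worked case may make the output easier to read. The sketch below assumes two three-line files whose contents are made up; only the second line differs.

```shell
demo=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$demo/file1"
printf 'apple\nblueberry\ncherry\n' > "$demo/file2"

# diff exits with status 1 when the files differ, so "|| true" keeps
# a script that runs under "set -e" from stopping here.
changes=$(diff "$demo/file1" "$demo/file2" || true)
echo "$changes"
```

The first line printed is 2c2, meaning that line 2 of file1 was changed into line 2 of file2. It is followed by `< banana`, a separator line `---`, and `> blueberry`.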

find

This searches through the directories for files and directories with a given name, date, size, or any other attribute you care to specify

It is a simple command but with many options - use man find.

To search for all files with the extension .txt, starting at the current directory (.) and working through all sub-directories, then printing the name of the file to the screen, type

$ find . -name '*.txt' -print

To find files larger than 1 MB, and display the result as a long listing, type

$ find . -size +1M -ls
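Several tests can be given in one find command, and a file is reported only if it passes all of them. This is a minimal sketch, assuming a scratch directory with invented file names, that looks for .txt files larger than 1 MB.

```shell
demo=$(mktemp -d)
mkdir "$demo/sub"
head -c 2097152 /dev/zero > "$demo/sub/big.txt"    # 2 MB, name matches
echo 'short note' > "$demo/small.txt"              # name matches, too small
head -c 2097152 /dev/zero > "$demo/big.dat"        # big enough, wrong extension

# Both conditions must hold: the name ends in .txt AND the size is over 1M.
found=$(cd "$demo" && find . -name '*.txt' -size +1M)
echo "$found"
```

Only ./sub/big.txt is printed, because it is the one file that satisfies both tests.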

history

The shell keeps an ordered list of all the commands that you have entered. Each command is given a number according to the order it was entered.

$ history

(show command history list)

You can use the exclamation character (!) to recall commands easily.

$ !!

(recall last command)

Using the !

$ !-3

(recall third most recent command)

$ !5

(recall 5th command in list)

$ !grep

(recall last command starting with grep)

Interactive History

Remember that you can use the arrows to look back in the history

Another useful way is to press Ctrl-R

Then you start typing some text contained in the command you are looking for

To search more, press Ctrl-R again

To cancel, press Ctrl-C

Asking questions of text files

Scientific data is often text

In many cases, scientific data will be a table in a text file

Each row represents an observation

Each column represents a feature or field

Usually columns are separated by TAB, comma, semicolon, or white space

TAB is the best option. Unfortunately, this is not standard

We already know some tools

We already know commands to handle text files

  • head
  • tail
  • wc
  • grep
  • sort

How can we show only some columns?

The command cut -f lets us select which fields (columns) to show. By default, cut treats TAB as the field separator

$ head -3 exam-plan.txt | cut -f 2
02.11.2018
07.11.2018
06.11.2018
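The file exam-plan.txt is not reproduced here, so the sketch below builds a small stand-in table whose rows and column layout are invented. It shows selecting one column, selecting several columns, and using the -d option when the separator is a comma instead of TAB.

```shell
demo=$(mktemp -d)
# A made-up three-column table; \t is a TAB.
printf 'Biology\t02.11.2018\tB12\nPhysics\t07.11.2018\tA03\nChemistry\t06.11.2018\tB12\n' > "$demo/plan.txt"

dates=$(cut -f 2 "$demo/plan.txt")            # second column only
name_room=$(cut -f 1,3 "$demo/plan.txt")      # first and third columns

# For comma-separated data, name the delimiter with -d.
printf 'Biology,02.11.2018,B12\n' > "$demo/plan.csv"
csv_date=$(cut -d ',' -f 2 "$demo/plan.csv")

echo "$dates"
echo "$name_room"
echo "$csv_date"
```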

Head first or cut first?

In this case we can also say

$ cut -f 2 exam-plan.txt | head -3
02.11.2018
07.11.2018
06.11.2018

The result is the same, but putting head first can be more efficient: cut then has to process only three lines instead of starting on the whole file

Sorting and counting

After we choose the column, it is sometimes useful to sort it

$ cut -f 2 exam-plan.txt | sort

There is a practical consequence:

All equal values are together

Sorted data is easier to process

How many unique values

For example, if a line is identical to the previous one, we know it is repeated

The uniq command shows each value only once

$ cut -f 2 exam-plan.txt | sort | uniq

The input of uniq must be sorted first, because uniq only compares each line with the one just before it
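The effect of skipping the sort can be seen on a tiny file. This is a minimal sketch, assuming a made-up list of rooms in which B12 appears twice but not on neighbouring lines.

```shell
demo=$(mktemp -d)
printf 'B12\nA03\nB12\n' > "$demo/rooms.txt"    # B12 twice, not adjacent

unsorted=$(uniq "$demo/rooms.txt" | wc -l)          # nothing is merged
sorted=$(sort "$demo/rooms.txt" | uniq | wc -l)     # duplicates are now adjacent
echo "without sort:" $unsorted "with sort:" $sorted
```

Without sort, uniq still reports three lines; with sort, it reports the two distinct rooms.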

How many unique values are there?

Combining with wc -l to count how many lines, we get

$ cut -f 2 exam-plan.txt | sort | uniq | wc -l

How many of each unique value?

We use uniq with the option -c (count)

$ cut -f 2 exam-plan.txt | sort | uniq -c
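A common next step is to sort the counts themselves, so that the most frequent value comes first. The sketch below assumes a made-up list of dates in place of the second column of exam-plan.txt.

```shell
demo=$(mktemp -d)
printf '02.11.2018\n07.11.2018\n02.11.2018\n06.11.2018\n02.11.2018\n07.11.2018\n' > "$demo/dates.txt"

# Count each date, then sort the counts numerically (-n) in reverse (-r).
ranked=$(sort "$demo/dates.txt" | uniq -c | sort -rn)
echo "$ranked"

# The first line now holds the most frequent date; awk prints its second field.
top=$(echo "$ranked" | awk 'NR == 1 { print $2 }')
echo "most frequent: $top"
```

Here 02.11.2018 appears three times, 07.11.2018 twice and 06.11.2018 once, so 02.11.2018 ends up on the first line.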

Finding duplicates

This query is connected to finding uniques

The command uniq has the option -d to show duplicates

For example, to see which exams are shared by several departments (i.e. which complete lines are duplicated), we do

$ sort exam-plan.txt | uniq -d

Can we see it better?

We can see it better using less and some options

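$ sort exam-plan.txt | uniq -d | less -S -x50,62,75

The -x option sets the tab stops, that is, the screen column where the text after each TAB begins. The -S option cuts long lines at the screen edge instead of wrapping them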

Try less without -S or with different numbers after -x to see the difference

Filtering rows and selecting columns

Which rooms will be used on Oct 31?

$ grep '31.10.2018' exam-plan.txt | cut -f 4

We would like to split each line at the - symbol

That is, we want to translate each - into a newline

$ grep '31.10.2018' exam-plan.txt | cut -f 4 | tr '-' '\n'
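The effect of tr can be checked on a single value. This is a minimal sketch, assuming a made-up room field in which three rooms are joined by the - symbol.

```shell
# A hypothetical room field with several rooms joined by "-".
rooms='B12-A03-C07'

# Replace every "-" with a newline, giving one room per line.
split=$(echo "$rooms" | tr '-' '\n')
echo "$split"

# With one room per line, the earlier tools apply again.
count=$(echo "$rooms" | tr '-' '\n' | sort | uniq | wc -l)
echo "distinct rooms:" $count
```

The three rooms B12, A03 and C07 are printed on separate lines, and the count of distinct rooms is 3.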