These are the topics that every CMB student needs to know by the end of the course. Some of these topics will be evaluated on the makeup exam; others have already been evaluated.

- Understand and value the use of computers in Science
  - Data handling and processing are essential to Science
  - Computers are useful beyond the web, email and text editing. **Computers are not typewriters**
- Know the essential parts of computers and what makes them different from other tools
  - What computer memory is
  - Why there are different kinds of memory

- Understand the role of **text files** for scientific computing

- Learn the basic rules of the R platform
  - How to write commands and give orders to the computer
  - How to assign values to variables and read them back
  - How to use pre-existing functions
  - How to add new functions using `install.packages()` and `library()`
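
As a small illustration of these rules, the sketch below assigns values to variables, reads them back and calls pre-existing functions. The variable names are invented for this example, and `ggplot2` is only an example of a package name; the `install.packages()` line is commented out because it needs an internet connection.

```r
# Assign a value to a variable and read it back
x <- 5
x

# Use the variable in a new command
y <- x * 2

# Use a pre-existing function
root <- sqrt(y)

# Add new functions: install once, then load in every session
# install.packages("ggplot2")   # example package name, run only once
library(stats)                   # 'stats' ships with R, so this always works

# median() comes from the 'stats' package
m <- median(c(1, 3, 10))
```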

- Know the four basic **data types** of R: *numeric*, *character*, *factor* and *logical*
  - Recognize them in R output
  - Know when to use each one
  - Be able to create a vector of each type
    - How to make a numeric vector
    - How to make a character vector
    - The same with factor and logical vectors
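
One possible way to create a vector of each type is sketched below. The variable names and values are made up for illustration; `class()` is used to check which type R assigned.

```r
# Numeric vector: measurements
weight <- c(60.5, 72.1, 55.0)

# Character vector: free text
name <- c("Ana", "Ben", "Cem")

# Factor: a limited set of categories
group <- factor(c("control", "treated", "control"))

# Logical vector: TRUE or FALSE answers
is_adult <- c(TRUE, FALSE, TRUE)

# Ask R which type each vector has
class(weight)
class(name)
class(group)
class(is_adult)

# A factor also knows its possible categories
levels(group)
```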

- Know the four basic **data structures** of R: *vectors*, *matrices*, *lists* and *data frames*
  - Recognize them in R output
  - Know when to use each one
  - Be able to create objects of each type
    - Make new vectors
    - Make new matrices
    - Make lists with several vectors and other lists
    - Read data frames from an existing text file
    - Write data frames to new text files
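
The following sketch builds one object of each structure and then writes a data frame to a text file and reads it back. All names and values are invented, and the file is a temporary one created with `tempfile()`; with real data you would use the path of your own file. The tab separator is an assumption, so check which separator your file uses.

```r
# A vector and a matrix with 2 rows and 3 columns
v <- c(1, 2, 3)
m <- matrix(1:6, nrow = 2, ncol = 3)

# A list can mix vectors of different types and other lists
l <- list(numbers = v, letters = c("a", "b"), inner = list(flag = TRUE))

# A data frame: one named vector per column
df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Write the data frame to a tab-separated text file
path <- tempfile(fileext = ".txt")
write.table(df, path, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back into a new data frame
df2 <- read.table(path, header = TRUE, sep = "\t")
df2
```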

- Use **indices** to access and modify components of data structures
  - Single indices for vectors and lists
  - Double indices (separated by a comma) for matrices and data frames
    - Empty indices mean “all the rows” or “all the columns”
  - Positive numbers can be used as indices
  - Negative numbers can be used as indices
    - Be sure to understand the difference
  - Logical vectors as indices. **This is the most important case**
  - Characters as indices
    - In this case you need to give names to each element
    - For vectors and lists use `names()`. You can also assign names when you create the vector or list
    - For matrices and data frames use `rownames()` and `colnames()`
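
The different kinds of indices can be compared side by side in a short sketch. The vector `v` and the matrix `m` are invented for this example.

```r
# Names can be assigned when the vector is created
v <- c(a = 10, b = 20, c = 30, d = 40)

second    <- v[2]       # positive index: keep only element 2
no_second <- v[-2]      # negative index: keep everything except element 2
big       <- v[v > 15]  # logical index: keep elements where the test is TRUE
by_name   <- v["c"]     # character index: keep the element named "c"

# Matrices take two indices separated by a comma
m <- matrix(1:6, nrow = 2)
rownames(m) <- c("r1", "r2")
colnames(m) <- c("x", "y", "z")

first_row <- m[1, ]     # empty column index: all the columns of row 1
col_y     <- m[, "y"]   # empty row index: all the rows of column "y"

# Indices also work on the left side, to modify a component
m[2, 3] <- 0L
```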

- Handling **data frames**
  - Data frames are the most useful structures
  - Each column is a vector and has a name
  - You can see the names using `colnames()` or `summary()`
    - It is always a good idea to use `summary()` and verify whether the values make sense or there is an error
    - Verify the minimum, the maximum and whether there are `NA` values
  - You can access any complete column using the `$` sign
    - This does not work in matrices
  - You can change a column just by assigning a new vector to an existing column
  - You can add new columns just by assigning a vector to a new column name
  - You can delete a column just by assigning `NULL` to an existing column
  - You can compare any column to a fixed value and get a logical vector
    - Then you can use that vector as an index for the rows. **This is the most common case**
  - You should be able to index any column, any row and any combination of rows and columns
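
A minimal sketch of these operations follows, using a small invented data frame with one missing value. The extra `!is.na()` test is a choice made for this example: comparing `NA` to a number gives `NA`, and an `NA` in a row index produces a row of `NA` values.

```r
# A small invented data frame; one length is missing
df <- data.frame(
  sample = c("s1", "s2", "s3", "s4"),
  length = c(12.1, NA, 9.8, 15.3)
)

colnames(df)   # names of the columns
summary(df)    # minimum, maximum and number of NA values

# Access a complete column
df$length

# Add a new column, then delete it again
df$length_mm <- df$length * 10
df$length_mm <- NULL

# Compare a column to a fixed value to get a logical vector
keep <- !is.na(df$length) & df$length > 10

# Use the logical vector as an index for the rows
selected <- df[keep, ]

# Any combination of rows and columns
selected_names <- df[keep, "sample"]
```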

- Plotting vectors one by one
  - You can plot vectors one by one using `plot()`, `points()` and `lines()`
    - You can choose the type of plot (lines, symbols, both, none)
    - You can choose colors (`col=`), symbols (`pch=`) and sizes (`cex=`)
    - You can include a title, a subtitle and axis names
    - You can select the range to plot with `xlim=` and `ylim=`
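
The options above can be combined in a single call, as in this sketch with invented values. The `pdf(NULL)` and `dev.off()` lines are an addition so that the example also runs as a script without opening a window; in an interactive session you can skip them.

```r
pdf(NULL)  # draw on a null device; not needed in an interactive session

# Invented values
time   <- 1:10
growth <- time^2

# First vector: symbols and lines ("b" means both), with colors,
# symbol, size, titles, axis names and ranges
plot(time, growth, type = "b", col = "blue", pch = 16, cex = 1.2,
     main = "Growth over time", sub = "invented values",
     xlab = "Time", ylab = "Size",
     xlim = c(0, 12), ylim = c(0, 120))

# Add more vectors to the same plot, one by one
lines(time, 10 * time, col = "red")
points(time, 5 * time, col = "darkgreen", pch = 2)

dev.off()
```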
- Scatter plots
  - You can also plot two vectors at the same time, one on each axis
    - You probably need to use indices to get the correct vectors from a data frame
  - You can use the same options as before to make a nicer plot
    - There are many other options that you can find in `help(par)`
  - The drawing changes if one of the vectors is a factor
    - When `x` is a factor you get a *boxplot*. **This is an important case**
    - You need to know the meaning of the symbols
  - When the vectors are numeric you can use logarithmic axes using `log=`
    - Make sure you understand when to use a logarithmic scale
  - You can use `plot(data$x, data$y)` or a formula like `plot(y~x, data)`
    - When you use the *formula*, the axis names are automatic
    - But you can always change the axis names using `xlab=` and `ylab=`
  - **Always** look at the data first. You have to **understand** the data. There is no *easy* way.
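
The two ways of calling `plot()` and the factor case can be sketched as follows. The data frame `data` and its columns are invented for this example, and `pdf(NULL)` is only there so the code runs as a script.

```r
pdf(NULL)  # draw on a null device; not needed in an interactive session

# Invented data: a dose, a response and a group
data <- data.frame(
  dose     = c(1, 10, 100, 1000, 1, 10, 100, 1000),
  response = c(2.1, 3.9, 6.2, 8.0, 1.8, 4.4, 5.9, 8.3),
  group    = factor(rep(c("control", "treated"), each = 4))
)

# Two vectors taken from the data frame, logarithmic x axis
plot(data$dose, data$response, log = "x")

# The same plot with a formula; axis names can still be changed
plot(response ~ dose, data, log = "x", xlab = "Dose", ylab = "Response")

# When x is a factor the result is a boxplot
plot(response ~ group, data)

dev.off()
```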
- Histograms
  - To understand a single vector you can see the distribution of values using a histogram
  - The command `hist()` groups all values into classes and counts them
  - The number of classes can be controlled using `nclass=`
  - It is better to use `col=` to see the columns
  - This is just another tool you have to understand the data
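
A short sketch of a histogram, using random values generated only for this example. `hist()` also returns the classes and counts it calculated, which is a way to check that every value was counted. Note that R treats `nclass=` as a suggestion and may choose a slightly different number of classes.

```r
pdf(NULL)  # draw on a null device; not needed in an interactive session

# Invented values: 200 random numbers around 50
set.seed(1)
values <- rnorm(200, mean = 50, sd = 10)

# Group the values into about 20 classes and color the columns
h <- hist(values, nclass = 20, col = "grey",
          main = "Distribution of values", xlab = "Value")

# The counts of all classes add up to the number of values
total <- sum(h$counts)

dev.off()
```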

## In summary

The purpose of the course is to help you learn how to handle data. Only you can learn. The course is a guide.

Each problem is different. You need to understand each problem and
use the best tool for each case. Memorizing is not important. There is no
*easy* way.

The important part is to **understand** the data,
**see the common patterns** and
**generalize**. Learn the rules and apply them.

If you succeed in this course you will be able to apply the concepts in Molecular Biology, in Science, in any other job and in everyday life. Data is the essence of the 21st century.