November 15th, 2018

## Get data from the web

The best way is to download the data file and save it in a local folder

Then you can read it as many times as you like
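The idea of downloading once and reading locally can be sketched as below. This is only a minimal sketch: the helper name `read_local_copy`, the example URL and the file name are hypothetical, and it assumes the file is a text table with a header line.

```r
# Hypothetical helper: download the file only if there is no local copy yet,
# then read the local copy.
read_local_copy <- function(url, dest, ...) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, quiet = TRUE)
  }
  read.table(dest, header = TRUE, ...)
}

# Usage (hypothetical URL and file name):
# survey <- read_local_copy("http://example.org/survey.txt", "survey.txt")
```

The second time the code runs, the file already exists, so nothing is downloaded.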

## Get data from the web

If you want to load data directly from the web, use a code chunk like this

{r cache=TRUE}
URL <- “http://anaraven.bitbucket.io/static/2018/cmb1/


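Please include the cache=TRUE chunk option

The result of the chunk is then stored and reused, so the data is not downloaded again every time the document is built. This is faster and puts less load on the server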

## Choosing color, size and symbol

The commands on this page produce the plots on the following page

plot(survey$height_cm, main="1")
plot(survey$height_cm, main="2", col="red")
plot(survey$height_cm, main="3", cex=2)
plot(survey$height_cm, main="4", cex=0.5)
plot(survey$height_cm, main="5", pch=16)
plot(survey$height_cm, main="6", pch=".")

## Choosing the type of plot

The commands on this page produce the plots on the following page

plot(survey$height_cm, main="1", type = "l")
plot(survey$height_cm, main="2", type = "o")
plot(survey$height_cm, main="3", type = "b")
plot(survey$height_cm, main="4", type = "p")
plot(survey$height_cm, main="5", xlim=c(1,20))
plot(survey$height_cm, main="6", xlim=c(30,51))

## Two plots in parallel

plot(survey$height_cm, ylim=c(0,200))
points(survey$weight_kg, pch=2)

plot(survey$height_cm, type="l", ylim=c(0,200))
lines(survey$weight_kg, col="red")

## Decoration

plot(survey$height_cm, col=survey$Gender)
legend("topleft", legend=c("Female", "Male"), fill=c(1,2))

plot(survey$height_cm)
abline(h=mean(survey$height_cm), col="red", lwd=5)

## AB line

This command adds a straight line in a specific position

• abline(h=1) adds a horizontal line at $$y=1$$
• abline(v=2) adds a vertical line at $$x=2$$
• abline(a=3, b=4) adds the line $$y=a + b\cdot x$$, here $$y=3 + 4\cdot x$$
  • a is the intercept, the value of $$y$$ when $$x=0$$
  • b is the slope
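The three forms of abline() can be tried on a small vector. This is a sketch with made-up heights, not the course survey data; the name `height_cm` and the chosen positions are only for illustration.

```r
# Made-up heights, only for illustration (not the course survey data)
height_cm <- c(158, 172, 165, 180, 169, 175, 162, 178)

plot(height_cm)
abline(h = mean(height_cm), col = "red")  # horizontal line at the mean height
abline(v = 4, col = "blue")               # vertical line at index 4
abline(a = 160, b = 2)                    # line y = 160 + 2 * x
```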

plot(survey$height_cm)
abline(v=20, col="blue")
abline(a=160, b=0.5)

## Scatter plots

## Comparing two variables

plot(survey$height_cm, survey$weight_kg)

## Other example

plot(survey$height_cm, survey$hand_span_cm)

## Formulas in R

## Formulas are summaries of a relationship

Instead of

plot(survey$height_cm, survey$weight_kg)

we can write

plot(survey$weight_kg ~ survey$height_cm)

or even

plot(weight_kg ~ height_cm, data = survey)

## Using formulas makes life easier

plot(height_cm ~ hand_span_cm, data = survey)
plot(height_cm ~ hand_span_cm, data = survey, subset = Gender=="Female")
plot(height_cm ~ hand_span_cm, data = survey, subset = Gender=="Male")

It is easier to specify the data.frame and which values to plot

## Graphics depend on the type of data

## Numeric v/s Numeric

plot(height_cm ~ weight_kg, data=survey)

## Factor v/s Factor

survey$handness <- as.factor(survey$handness)
plot(Gender ~ handness, data=survey)

## Factor v/s Numeric

plot(Gender ~ weight_kg, data=survey)

## Numeric v/s Factor

plot(weight_kg ~ Gender, data=survey)

## This is called “Boxplot”

Plotting a numeric value depending on a factor results in a boxplot

It is a graphical version of summary().
• The center line is the median
• The box is between the first and third quartile (50% of cases)
• The whiskers extend to the most extreme values within 1.5 times the box height from the box, which covers most cases
• Points beyond the whiskers are outliers

## Nicer boxplot

plot(weight_kg ~ Gender, data=survey, boxwex=0.3, notch=TRUE, col="grey")

## Exploring all data: plot data frame

plot(survey)

## Summary

## Plot function

• plot() can be used with one or two vectors, or with a formula
• plot(y ~ x) gives the same plot as plot(x, y)
• Formulas are nice: plot(y~x, data=dframe) is better than plot(dframe$x, dframe$y)
• In general the defaults are good
  • axis labels are the names of the variables being plotted
  • ranges are automatic
• You can use numbers to choose colors, symbols and sizes of points
• You can choose the ranges, labels and titles

## Plot is a generic function

The figure type depends on the data type of the vector

• numeric: similar to points() or lines()
• factor: counts frequencies and draws a barplot()
• numeric v/s factor: same as boxplot()
• complete data frame: same as pairs()
• factor v/s factor: like a histogram in 2D

## Adding details to a plot

• The plot() command defines the ranges, labels and title
• You can add more elements over a pre-existing plot:
  • points(), lines()
  • text()
  • segments(), arrows()
  • rect(), polygon(), xspline()
  • legend()

Learn more on the help page of each command

## Colors

Colors can be specified in several ways:

• A numeric value is an index into a palette
• A character with a color name in English
  • such as “red” or “steelblue”
• A character with a hexadecimal code
  • such as “#A11F1F”
  • Google “hexadecimal colors” to learn more

## Exam answers

## Question 1.1

Write an R command to assign this list to the variable x, and show its content

x <- list(weight=c(5.58, 6.11, 4.61, 4.53, 5.14, 4.17, 3.59, 3.83, 4.89, 4.69, 5.12, 5.50, 5.29, 6.15, 5.26),
          group=factor(rep(c("ctrl", "trt1", "trt2"), c(5,5,5))))
x

x <- list(weight=c(5.58, 6.11, 4.61, 4.53, 5.14, 4.17, 3.59, 3.83, 4.89, 4.69, 5.12, 5.50, 5.29, 6.15, 5.26),
          group=factor(c("ctrl", "ctrl", "ctrl", "ctrl", "ctrl", "trt1", "trt1", "trt1", "trt1", "trt1", "trt2", "trt2", "trt2", "trt2", "trt2")))

## Question 1.2

Write the code to print all weights except the third one

x$weight[-3]
x[[1]][-3]
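Both answers rely on negative indices, which drop positions instead of selecting them. A small sketch, using only the first five weights of the exam list as a stand-alone vector `w` (a name chosen here for illustration):

```r
# First five weights from the exam list, as a stand-alone vector
w <- c(5.58, 6.11, 4.61, 4.53, 5.14)

w[-3]          # all elements except the third
w[-c(1, 3)]    # drop several positions at once
w[-length(w)]  # drop the last element
```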

weird answer (people?)

people$weight[-3]

## Question 1.3

Write the code to print the mean of weight only for the cases where group is equal to "trt1"

mean(x$weight[x$group == "trt1"])

a <- x$weight[x$group=="trt1"]
mean(a)

## Question 1.3 weird answers

mean(x$weight[6:10])
mean(weight[c(-1,-2,-3,-4,-5,-11,-12,-13,-14,-15)])
x$weight[x$group=="trt1"]
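One way to see why selecting by condition is safer than selecting by position: the sketch below shuffles the cases and repeats both computations. The shuffle, the seed and the name `shuffled` are only for illustration, and tapply() is shown as one possible way to get the mean of every group at once.

```r
x <- list(weight = c(5.58, 6.11, 4.61, 4.53, 5.14, 4.17, 3.59, 3.83, 4.89, 4.69,
                     5.12, 5.50, 5.29, 6.15, 5.26),
          group = factor(rep(c("ctrl", "trt1", "trt2"), c(5, 5, 5))))

# Shuffle the cases, keeping each weight together with its group
set.seed(1)
idx <- sample(15)
shuffled <- list(weight = x$weight[idx], group = x$group[idx])

# Selecting by condition still picks exactly the trt1 cases
mean(shuffled$weight[shuffled$group == "trt1"])

# Selecting by position depends on the order of the data,
# so positions 6 to 10 are in general no longer the trt1 cases
mean(shuffled$weight[6:10])

# Mean of weight for every group in one command
tapply(x$weight, x$group, mean)
```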

## Question 1.4

Write the code to find the group of the element whose weight is equal to the median of all weight

x$group[x$weight==median(x$weight)]

a <- median(x$weight)
x$group[a==x$weight]
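These answers work because the list has an odd number of weights, so the median is one of the values. A short sketch of that point, where `w_even` is a name introduced here for a version of the data with one case removed:

```r
x <- list(weight = c(5.58, 6.11, 4.61, 4.53, 5.14, 4.17, 3.59, 3.83, 4.89, 4.69,
                     5.12, 5.50, 5.29, 6.15, 5.26),
          group = factor(rep(c("ctrl", "trt1", "trt2"), c(5, 5, 5))))

m <- median(x$weight)
which(x$weight == m)    # position of the case whose weight is the median
x$group[x$weight == m]  # its group

# With an even number of values the median is the average of the two
# middle values, so it may not be in the data and the comparison finds nothing
w_even <- x$weight[-1]
median(w_even) %in% w_even
```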

## Question 2.1

Write the command to show the column names of world.

colnames(world)

colnames<-("population", "contenient", "life_exp")

## Question 2.2

Write the command to find how many lines there are in the world data frame

nrow(world)

dim(world)
summary(world)

## Question 2.3

Write the command to show the first 5 lines of world.

world[1:5, ]
M <- world[1:5, ]
M

weird

list(world[1, ], world[2, ], world[3, ], world[4, ], world[5, ])

## Question 2.4

Write the command to show the data corresponding to Turkey

world["Turkey",]

weird

world[row.names="Turkey"]

## Question 2.5

Write the command to count the number of times that each value appears in the continent column

table(world$continent)

weird

ncol(world$continent)
colnames(contenient)

better

rownames(world)[(world$population>200000000)]

## Question 2.7

Write the command to create a data frame called Africa with all the rows of world where continent=='Africa'. Show the result of summary(Africa)

Africa <- world[world$continent=="Africa",]
summary(Africa)

weird

weird

summary(world[world$continent== "Africa" , ])

## Question 2.9

Show the row of the country whose population is the maximum.

Africa[which.max(Africa$population),]

Alternative

Africa[Africa$population==max(Africa$population),]
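The two answers differ only when several rows share the maximum. The sketch below uses a small made-up data frame called `toy` (it is not the course's world data) to show the difference.

```r
# Made-up data frame for illustration; it is not the course's world data
toy <- data.frame(population = c(10, 50, 50, 20),
                  life_exp = c(60, 70, 65, 55),
                  row.names = c("A", "B", "C", "D"))

# which.max() gives the position of the first maximum only
toy[which.max(toy$population), ]

# comparing with max() keeps every row tied for the maximum
toy[toy$population == max(toy$population), ]
```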

## Question 2.10

What is the life expectancy of the African country with the smallest population? Show the life_exp column of the country whose population is the minimum.

Africa$life_exp[which.min(Africa$population)]

Alternatives:

Africa[Africa$population==min(Africa$population), "life_exp"]
world$life_exp[world$continent=="Africa"][which.min(world$population[world$continent=="Africa"])]

## Question 2.10 Alternatives

z <- min(Africa$population)
Africa$life_exp[z==Africa$population]

min <- which.min(Africa$population)
Africa[min,"life_exp"]

a <- world[world$continent=="Africa", ]
a[min(a$population)==a$population, "life_exp"]

## Question 2.10 weird answers

min(life_expectancy$Africa)
min(world$population)