Homework 4

Epilepsy affects 1% of the world population. Thus, for one randomly chosen person the probabilities are c(Yes=0.01, No=0.99). We can simulate n people with the following code:

epilepsy <- function(n) {
    return(
        sample(c("Yes","No"), 
            size=n, 
            replace = TRUE,
            prob = c(0.01, 0.99)
        )
    )
}

There are 97 people in the course, including me. To simulate a group like ours, we use

epilepsy(97)

##  [1] "No"  "No"  "No"  "No"  "Yes" "No"  "No"  "No"  "No"  "No"  "No" 
## [12] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
## [23] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
## [34] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
## [45] "No"  "No"  "Yes" "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
## [56] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
## [67] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
## [78] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No" 
## [89] "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"  "No"
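
The output above will change on every run, because sample() draws at random. If you want to get the same draw twice while debugging, one option is to fix the random seed first. This is only a sketch: the seed value 2024 is arbitrary, and the names first and second are hypothetical.

```r
# Same definition as above, repeated so this block runs on its own
epilepsy <- function(n) {
    sample(c("Yes", "No"), size = n, replace = TRUE, prob = c(0.01, 0.99))
}

set.seed(2024)          # arbitrary seed, chosen only for illustration
first <- epilepsy(97)

set.seed(2024)          # resetting the same seed repeats the same draw
second <- epilepsy(97)

identical(first, second)
```

Without the calls to set.seed(), the two vectors would usually differ.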

We replicate the experiment to get 1000 samples of size 97:

courses <- replicate(1000, epilepsy(97))
dim(courses)

## [1]   97 1000

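Here courses is a character matrix with 97 rows (one per person) and 1000 columns (one per simulated course), not a data frame. To extract a column we have to use [row, col] indexing with the row left empty, as in courses[, col], not [[column]].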
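
The difference between the two kinds of indexing is easier to see on a small matrix. The sketch below builds a toy stand-in with 5 rows and 3 columns; the names toy, second_col and one_cell are hypothetical and are not part of the homework.

```r
# Toy stand-in for courses: 5 people (rows) by 3 simulated groups (columns)
toy <- replicate(3, sample(c("Yes", "No"), size = 5, replace = TRUE,
                           prob = c(0.01, 0.99)))
dim(toy)

# Empty row index plus a column index: the whole second column,
# returned as a character vector of length 5
second_col <- toy[, 2]
length(second_col)

# Double brackets on a matrix return a single element, counted
# down the columns: here, row 2 of column 1
one_cell <- toy[[2]]
length(one_cell)
```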

We got a lot of data, but very little information. We did not learn anything from the 97,000 words. We want numbers, not words. We need a function that takes one column of courses and gives us an integer with the number of “Yes” values in it.

This is your mission. Write the code to produce a vector named cases of length 1000, with the number of “Yes” in each column of courses. The simulation is random, so your counts will not match these exactly, but if everything goes right you will get something similar to this:

table(cases)

## cases
##   0   1   2   3   4 
## 356 376 190  69   9

barplot(table(cases))
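
To judge whether your table is "similar", it may help to compare it with the counts that probability theory predicts. Since each simulated person is drawn independently with probability 0.01, the number of “Yes” in a group of 97 follows a binomial distribution. The sketch below computes the expected number of courses with 0 to 4 cases; the variable names are hypothetical, and it does not produce cases for you.

```r
n_people  <- 97     # size of each simulated course
p_yes     <- 0.01   # probability of "Yes" for one person
n_courses <- 1000   # number of simulated courses

# Expected number of courses (out of 1000) with k = 0, 1, ..., 4 cases
k <- 0:4
expected <- n_courses * dbinom(k, size = n_people, prob = p_yes)
names(expected) <- k

round(expected)
```

Your simulated counts should be close to these values, but will rarely match them exactly.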