“Şerefim üzerine söz veririm ki, bu sınav sırasında etik kurallari çiǧnemedim”

All answers are strictly personal. Any unethical behavior will be penalized.

This is serious.

You need to answer only two questions. Choose wisely.

The exact meaning of this phase is this: The total grade is the sum of each question’s grade, divided by 2. If this number is greater than 100, then the grade is 100. If you answer only two questions then the grade is the average of them. If you answer a third one, you can only be better, never worse.

Random process

Cin Ali has 26 pairs of socks, each one marked with an English letter from A to Z. Each week he loses one sock in the laundry. After 10 weeks he has only 42 socks, some are paired, some are unpaired. We want to know what is the probability that the next sock he takes from the drawer is an unpaired sock (that is, has no pair).

Let’s not forget this. We are looking for “the probability that the next sock he takes from the drawer is an unpaired sock”. To make the question more easy, we decompose it into simpler steps.

The socks will be represented by a vector of character. Each element is a single letter, that can appear one or two times. Initially the socks vector is

socks <- c(LETTERS, LETTERS)

In this case Cin Ali starts with 26 pairs of socks, that is, 52 paired socks.

let’s see

length(socks)

## [1] 52

table(socks)

## socks
## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z 
## 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

Write a function that randomly chooses a sock. The result is the index of the sock. In other words, it is a random number between 1 and length(socks)
```
random_sock <- function(socks) {
    # write your code here
}
```
since the answer must be a random number, the easiest way is to use sample.int. We choose one sock, thus size=1.

In one line:
```
random_sock <- function(socks) {
    return(sample.int(length(socks), size=1))
}
```
Alternative solution: sample(1:length(socks), size=1)

When you have completed the random_sock function, the following function will simulate “loosing n socks”
```
laundry <- function(n, socks) {
    for(i in 1:n) {
        lost <- random_sock(socks)
        socks <- socks[-lost]
    }
    return(socks)
}
length(laundry(10, socks))
```
```
## [1] 42
```
we do nothing here, except verifying that the answer is 42.
Write a function that receives a sock called this and counts how many socks are like this. In other words, this function must count how many elements of the vector socks are equal to this.
```
how_many_like_this <- function(this, socks) {
    # write your code here
}
```
We see that the function has two inputs: this (one sock) and socks (a set of several socks). The key to this question is to understand it.

Step one: “elements of the vector socks are equal to this”. If we write socks==this we get a logic vector, which is TRUE only when an element of socks is equal to this.

Step two: we need to count how many TRUE are in the logic vector. As we saw in classes, that is very easy: we use sum()

In one line:
```
how_many_like_this <-function(this, socks) {
    return(sum(socks==this))
}
```
If your function is correct, you can test it with the following code,
```
how_many_like_this("Q", socks)
```
```
## [1] 2
```
We can also try other values, where the results must be 2 and 0:
```
how_many_like_this("A", socks)
```
```
## [1] 2
```
```
how_many_like_this("2", socks)
```
```
## [1] 0
```
After 10 weeks, Cin Ali takes a random sock from the drawer. What is the probability that he takes an unpaired sock?
```
is_single_sock <- function(n, socks) {
    remaining <- # write here
    new_sock <- # write here
    ans <- how_many_like_this(new_sock, remaining)==1
    return(ans)
}
```
(Here the question is not clear. What is n? We better ask the professor)

Ok. Here we put all together. Notice that we do not need to know how does random_sock or laundry work. We only need to know what they do.

The remaining socks are the output of the laundry for n weeks.

The new_sock is a random_sock among the remaining.

Now we write this in R language:

Now we can use this function to measure the frequency of taking an unpaired sock after 10 laundries, using this code:
```
n <- 500
m <- sum(replicate(n, is_single_sock(10, socks)))
m/n
```
```
## [1] 0.226
```
(your number may be a little different, but similar to this)
(Bonus: this question is optional. If you answer right, you get more points) With this result we cannot be sure about the probability of finding an unpaired sock. But you can find an interval that contains the correct value with probability 95%. Write the code to get that interval. Your values may be different from these ones.

This was the only formula we saw in the last classes, and was exercise 4.2. The formula for the limits of the interval is: \[\frac{m+2}{n + 4}\pm 2{\sqrt{\frac{m+2}{(n + 4)^2}\left(1-\frac{m+2}{n + 4}\right)}}\] You can write directly from class 23:
```
sigma <- sqrt( (m+2)/(n+4)^2 * (1-(m+2)/(n+4)) )
p_est <- c((m+2)/(n+4)-2*sigma, (m+2)/(n+4)+2*sigma)
p_est
```
```
## [1] 0.1907887 0.2655605
```
If you want a better approximation of the probability, you need a bigger \(n\)

Genetic Algorithms

Since Babylonian and Egyptian times (much before Pythagoras), people know that in a right triangle with sides \(a, b\) and \(c\), you always have that \(a^2+b^2=c^2\). For example \(3^2+4^2=5^2.\) Can we find another set of three integer numbers that solve \(a^2+b^2=c^2\)?

To be more specific, we are looking for a vector of integers \(v\) such that \[v[1]^2+v[2]^2-v[3]^2=0\] Also, we want the numbers to be greater than 4 and less than 20, so \(4<v[i]<20\) and \(v[i]\in \mathbb N\) for all \(i\).

As we saw in classes, we can solve this problem using genetic algorithms. Using the framework we saw in classes, we can build a population of vectors and make them evolve to minimize a fitness function. For that we need to define the size of the vectors, the range of acceptable values, the fitness function and a function to randomly mutate a vector.

This is nearly the same as class 23 and quiz 5. We can use the same ideas here

What is the value of m (size of vectors)?
```
m <- # write here
```
Easy. We have 3 variables: \(a, b, c\), also called \(v[1], v[2], v[3]\)
```
m <- 3
```
What are the possible values?
```
values <- # write here
```
Easy. The question says “we want the numbers to be greater than 4 and less than 20”. Therefore 5 to 19 would be fine.
```
values <- 5:19
```

Propose a good fitness function

fitness <- function(v) {
    # write here
}

This has two parts. The easy one is to put the formula we want to make equal to zero inside abs().

That is: abs(v[1]^2+v[2]^2-v[3]^2)

The second part is to put a high value when we are outside the valid range. We

fitness <- function(v) {
    bad1 <- (v[1]<5) | (v[1]>19)
    bad2 <- (v[2]<5) | (v[2]>19)
    bad3 <- (v[3]<5) | (v[3]>19)
    bad  <- bad1 | bad2 | bad3
    return(abs(v[1]^2+v[2]^2-v[3]^2)+1e3*bad)
}

What is a reasonable mutation function in this case?
```
mutation <- function(v) {
    # write here
}
```
Again, from the class and quiz, we know this has to include two lines. First, choose a random position inside the vector v. The code is like in the quiz:
```
k <- sample.int(length(v), size=1)
```
Then you have to update v[k]. The safest way is to add a small value, positive or negative. Something in the range -3:3 seem reasonable, but other similar values are also valid.
```
mutation <- function(v) {
    k <- sample.int(length(v), size=1)
    v[k] <- v[k] + sample(-3:3, size=1)
    return(v)
}
```

We are not doing the optimization here, because it can take too much time. You can try later at home if you like. Now you can test part 2.3 using this code

fitness(c(6,8,10))

## [1] 0

fitness(c(7,9,11))

## [1] 9

mutation(c(7,9,11))

## [1]  7 12 11

Well, now we have time to test it. Let’s do it using the same code as in Class 23 and quiz 5.

initial_pop <- function(N, m, values) {
  pop <- list()
  for(i in 1:N) {
    pop[[i]] <- sample(values, size=m, replace=TRUE)
  }
  return(pop)
}
score_all <- function(pop, fitness) {
    score <- rep(NA, length(pop))
    for(i in 1:length(pop)) {
        score[i] <- fitness(pop[[i]])
    }
    return(score)
}
who_survives <- function(pop, ranking, min) {
    survival <- rep(NA, length(pop))
    range <- max(ranking)-min(ranking)
    for(i in 1:length(pop)) {
        p <- (ranking[i]-min(ranking))/range
        survival[i] <- sample(c(!min, min),
                              size=1, prob=c(p,1-p))
    }
    return(survival)
}

next_gen <- function(pop, survival, mutation) {
    parents <- pop[survival]
    for(i in 1:N) {
        if(!survival[i]){
            j <- sample.int(length(parents), size=1)
            pop[[i]] <- parents[[j]] # cloning
            pop[[i]] <- mutation(pop[[i]]) # mutation
        }
    }
    return(pop)
}
N <- 1000
num_generations <- 300
best_score <- rep(NA, num_generations)
pop <- initial_pop(N, m, values)
for(i in 1:num_generations) {
    score <- score_all(pop, fitness)
    best_score[i] <- max(score)
    ranking <- rank(score, ties.method = "random")
    survival <- who_survives(pop, ranking, min=TRUE)
    pop <- next_gen(pop, survival, mutation)
}

We need to evaluate the score for the last population, and get the individuals with score equal to zero. There may be more that one copy of each solution, so we use the unique() function to see each one only once

unique(pop[score_all(pop, fitness)==0])

## [[1]]
## [1]  9 12 15
## 
## [[2]]
## [1]  6  8 10
## 
## [[3]]
## [1]  8  6 10
## 
## [[4]]
## [1]  5 12 13
## 
## [[5]]
## [1] 12  9 15

You can verify that all these vectors are solution to the Pythagoras’ formula.

Computational thinking

You will program your own version of some standard functions using only for(), if() and indices. All the following functions receive a vector.

Please write your own version of the following functions:

vector_min(x), equivalent to min(x). Returns the smallest element in x.
```
vector_min <- function(x) {
    # write here
}
```
This question was asked in quiz 4 and again in the exercises that the professor give to practice. And the professor insisted several times that we should do the exercises. It is probably a good idea to do all the exercises.

How do you find the smallest item of a collection? You know how to do it, you do every day. Now you have to say it explicitly.

You need to keep in memory only one item: the tentative answer. You start with the first item of x
```
ans <- x[1]
```
Some people had another idea and they start with a very big number
```
ans <- Inf
```
Then you check all the values in x. If you see a smaller value, you update ans.

This is a very easy exercise that children can solve.
```
vector_min <- function(x) {
    ans <- x[1]
    for(i in 1:length(x)) {
        if(ans>x[i]) {
            ans <- x[i]
        }
    }
    return(ans)
}
```
vector_which_min(x), equivalent to which.min(x). Returns the index of the smallest element in x.
```
vector_which_min <- function(x) {
    # write here
}
```
This is a little more complex than the previous one, because you need to keep in memory two things: the smallest value so far, and its index. The easiest way to solve this is to start with the previous answer and add the code to update idx.

The real difficulty of this problem is to understand that the answer must be an index, not a value.
```
vector_which_min <- function(x) {
    ans <- x[1]
    idx <- 1
    for(i in 1:length(x)) {
        if(ans>x[i]) {
            ans <- x[i]
            idx <- i
        }
    }
    return(idx)
}
```
vector_cumsum(x), equivalent to cumsum(x). Returns a vector of the same length of x with the cumulative sum of x.
```
vector_cumsum <- function(x) {
    # write here
}
```
This question was also in the exercises, so hard working students should have seen it before. Also, it was exactly what we did every class before the midterm.

Like then, we have to prepare an “empty” answer vector ans, take the initial value, and then use a loop to see all values of x. Each “today’s” value on ans will be “yesterday’s” value plus the new value we get “today”, like we do the balance of a bank account.
```
vector_cumsum <- function(x) {
    ans <- rep(NA, length(x))
    ans[1] <- x[1]
    for(i in 2:length(x)) {
        ans[i] <- ans[i-1] + x[i]
    }
    return(ans)
}
```
vector_diff(x), equivalent to diff(). Returns a vector one element shorter than x with the difference between consecutive elements of x.
```
vector_diff <- function(x) {
    # write here
}
```
This is kind of the opposite of the previous question. Here we know the daily bank balance, and we want to know how much we expended or received.

Now the ans vector is one element shorter than x, so we create an empty vector of size length(x)-1. We must be careful to look only inside x and never use index less than 1 or greater than length(x). And the sign must be correct
```
vector_diff <- function(x) {
    ans <- rep(NA, length(x)-1)
    for(i in 1:length(ans)) {
        ans[i] <- x[i+1] - x[i]
    }
    return(ans)
}
```

You can test your functions with the following code

x <- sample(5:20, size=10)
x

##  [1] 13 10 15  5  9  7  8 18 11 17

min(x)

## [1] 5

vector_min(x)

## [1] 5

which.min(x)

## [1] 4

vector_which_min(x)

## [1] 4

cumsum(x)

##  [1]  13  23  38  43  52  59  67  85  96 113

vector_cumsum(x)

##  [1]  13  23  38  43  52  59  67  85  96 113

diff(x)

## [1]  -3   5 -10   4  -2   1  10  -7   6

vector_diff(x)

## [1]  -3   5 -10   4  -2   1  10  -7   6

Final Exam–with comments

Computing in Molecular Biology and Genetics II. Istanbul University.

Comments by Andrés Aravena, Ph.D.

May 30, 2018

Random process

Genetic Algorithms

Computational thinking