*“Şerefim üzerine söz veririm ki, bu sınav sırasında etik kurallari çiǧnemedim”*

**All answers are strictly personal. Any unethical behavior will be penalized.**

This is serious.

You need to answer only two questions. Choose wisely.

The exact meaning of this phase is this: The total grade is the sum of each question’s grade, divided by 2. If this number is greater than 100, then the grade is 100. If you answer only two questions then the grade is the average of them. If you answer a third one, you can only be better, never worse.

Cin Ali has 26 pairs of socks, each one marked with an English letter from A to Z. Each week he loses one sock in the laundry. After 10 weeks he has only 42 socks, some are paired, some are unpaired. We want to know what is the probability that the next sock he takes from the drawer is an unpaired sock (that is, has no pair).

Let’s not forget this. We are looking for “the probability that the next sock he takes from the drawer is an unpaired sock”. To make the question more easy, we

decomposeit into simpler steps.

The socks will be represented by a vector of character. Each element is a single letter, that can appear one or two times. Initially the `socks`

vector is

`socks <- c(LETTERS, LETTERS)`

In this case Cin Ali starts with 26 pairs of socks, that is, 52 paired socks.

let’s see

`length(socks)`

`## [1] 52`

`table(socks)`

`## socks ## A B C D E F G H I J K L M N O P Q R S T U V W X Y Z ## 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2`

Write a function that randomly chooses a sock. The result is the index of the sock. In other words, it is a random

*number*between 1 and`length(socks)`

`random_sock <- function(socks) { # write your code here }`

since the answer must be a

*random number*, the easiest way is to use`sample.int`

. We choose*one*sock, thus`size=1`

.In one line:

`random_sock <- function(socks) { return(sample.int(length(socks), size=1)) }`

Alternative solution:

`sample(1:length(socks), size=1)`

When you have completed the

`random_sock`

function, the following function will simulate “loosing`n`

socks”`laundry <- function(n, socks) { for(i in 1:n) { lost <- random_sock(socks) socks <- socks[-lost] } return(socks) } length(laundry(10, socks))`

`## [1] 42`

we do nothing here, except verifying that the answer is 42.

Write a function that receives a sock called

`this`

and counts how many socks are like`this`

. In other words, this function must count how many elements of the vector`socks`

are equal to`this`

.`how_many_like_this <- function(this, socks) { # write your code here }`

We see that the function has two inputs:

`this`

(one sock) and`socks`

(a set of several socks). The key to this question is to*understand it*.Step one: “elements of the vector

`socks`

are*equal*to`this`

”. If we write`socks==this`

we get a logic vector, which is TRUE only when an element of`socks`

is equal to`this`

.Step two: we need to count how many TRUE are in the logic vector. As we saw in classes, that is very easy: we use

`sum()`

In one line:

`how_many_like_this <-function(this, socks) { return(sum(socks==this)) }`

If your function is correct, you can test it with the following code,

`how_many_like_this("Q", socks)`

`## [1] 2`

We can also try other values, where the results must be 2 and 0:

`how_many_like_this("A", socks)`

`## [1] 2`

`how_many_like_this("2", socks)`

`## [1] 0`

After 10 weeks, Cin Ali takes a

*random*sock from the drawer. What is the probability that he takes an unpaired sock?`is_single_sock <- function(n, socks) { remaining <- # write here new_sock <- # write here ans <- how_many_like_this(new_sock, remaining)==1 return(ans) }`

(Here the question is not clear. What is

`n`

? We better ask the professor)Ok. Here we put all together. Notice that we do not need to know

*how*does`random_sock`

or`laundry`

work. We only need to know what they*do*.The

`remaining`

socks are the output of the`laundry`

for`n`

weeks.The

`new_sock`

is a`random_sock`

among the`remaining`

.Now we write this in

*R*language:Now we can use this function to measure the frequency of taking an unpaired sock after 10 laundries, using this code:

`n <- 500 m <- sum(replicate(n, is_single_sock(10, socks))) m/n`

`## [1] 0.226`

(your number may be a little different, but similar to this)

*(Bonus: this question is optional. If you answer right, you get more points)*With this result we cannot be sure about the probability of finding an unpaired sock. But you can find an*interval*that contains the correct value with probability 95%. Write the code to get that interval. Your values may be different from these ones.This was the only formula we saw in the last classes, and was exercise 4.2. The formula for the limits of the interval is: \[\frac{m+2}{n + 4}\pm 2{\sqrt{\frac{m+2}{(n + 4)^2}\left(1-\frac{m+2}{n + 4}\right)}}\] You can write directly from class 23:

`sigma <- sqrt( (m+2)/(n+4)^2 * (1-(m+2)/(n+4)) ) p_est <- c((m+2)/(n+4)-2*sigma, (m+2)/(n+4)+2*sigma) p_est`

`## [1] 0.1907887 0.2655605`

If you want a better approximation of the probability, you need a bigger \(n\)

Since Babylonian and Egyptian times (much before Pythagoras), people know that in a right triangle with sides \(a, b\) and \(c\), you always have that \(a^2+b^2=c^2\). For example \(3^2+4^2=5^2.\) Can we find *another* set of three integer numbers that solve \(a^2+b^2=c^2\)?

To be more specific, we are looking for a vector of integers \(v\) such that \[v[1]^2+v[2]^2-v[3]^2=0\] Also, we want the numbers to be greater than 4 and less than 20, so \(4<v[i]<20\) and \(v[i]\in \mathbb N\) for all \(i\).

As we saw in classes, we can solve this problem using genetic algorithms. Using the framework we saw in classes, we can build a population of vectors and make them evolve to minimize a *fitness* function. For that we need to define the size of the vectors, the range of acceptable values, the fitness function and a function to randomly mutate a vector.

This is nearly the same as class 23 and quiz 5. We can use the same ideas here

What is the value of

`m`

(size of vectors)?`m <- # write here`

Easy. We have 3 variables: \(a, b, c\), also called \(v[1], v[2], v[3]\)

`m <- 3`

What are the possible

`values`

?`values <- # write here`

Easy. The question says “we want the numbers to be greater than 4 and less than 20”. Therefore 5 to 19 would be fine.

`values <- 5:19`

Propose a good

`fitness`

function`fitness <- function(v) { # write here }`

This has two parts. The easy one is to put the formula we want to make equal to zero inside

`abs()`

.That is:

`abs(v[1]^2+v[2]^2-v[3]^2)`

The second part is to put a high value when we are outside the valid range. We

`fitness <- function(v) { bad1 <- (v[1]<5) | (v[1]>19) bad2 <- (v[2]<5) | (v[2]>19) bad3 <- (v[3]<5) | (v[3]>19) bad <- bad1 | bad2 | bad3 return(abs(v[1]^2+v[2]^2-v[3]^2)+1e3*bad) }`

What is a reasonable

`mutation`

function in this case?`mutation <- function(v) { # write here }`

Again, from the class and quiz, we know this has to include two lines. First, choose a random position inside the vector

`v`

. The code is like in the quiz:`k <- sample.int(length(v), size=1)`

Then you have to update

`v[k]`

. The safest way is to add a small value, positive or negative. Something in the range`-3:3`

seem reasonable, but other similar values are also valid.`mutation <- function(v) { k <- sample.int(length(v), size=1) v[k] <- v[k] + sample(-3:3, size=1) return(v) }`

We are not doing the optimization here, because it can take too much time. You can try later at home if you like. Now you can test part 2.3 using this code

`fitness(c(6,8,10))`

`## [1] 0`

`fitness(c(7,9,11))`

`## [1] 9`

`mutation(c(7,9,11))`

`## [1] 7 12 11`

Well, now we have time to test it. Let’s do it using the same code as in Class 23 and quiz 5.

`initial_pop <- function(N, m, values) { pop <- list() for(i in 1:N) { pop[[i]] <- sample(values, size=m, replace=TRUE) } return(pop) } score_all <- function(pop, fitness) { score <- rep(NA, length(pop)) for(i in 1:length(pop)) { score[i] <- fitness(pop[[i]]) } return(score) } who_survives <- function(pop, ranking, min) { survival <- rep(NA, length(pop)) range <- max(ranking)-min(ranking) for(i in 1:length(pop)) { p <- (ranking[i]-min(ranking))/range survival[i] <- sample(c(!min, min), size=1, prob=c(p,1-p)) } return(survival) } next_gen <- function(pop, survival, mutation) { parents <- pop[survival] for(i in 1:N) { if(!survival[i]){ j <- sample.int(length(parents), size=1) pop[[i]] <- parents[[j]] # cloning pop[[i]] <- mutation(pop[[i]]) # mutation } } return(pop) } N <- 1000 num_generations <- 300 best_score <- rep(NA, num_generations) pop <- initial_pop(N, m, values) for(i in 1:num_generations) { score <- score_all(pop, fitness) best_score[i] <- max(score) ranking <- rank(score, ties.method = "random") survival <- who_survives(pop, ranking, min=TRUE) pop <- next_gen(pop, survival, mutation) }`

We need to evaluate the score for the last population, and get the individuals with score equal to zero. There may be more that one copy of each solution, so we use the

`unique()`

function to see each one only once`unique(pop[score_all(pop, fitness)==0])`

`## [[1]] ## [1] 9 12 15 ## ## [[2]] ## [1] 6 8 10 ## ## [[3]] ## [1] 8 6 10 ## ## [[4]] ## [1] 5 12 13 ## ## [[5]] ## [1] 12 9 15`

You can verify that all these vectors are solution to the Pythagoras’ formula.

You will program your own version of some standard functions using only `for()`

, `if()`

and indices. All the following functions receive a vector.

Please write your own version of the following functions:

`vector_min(x)`

, equivalent to`min(x)`

. Returns the smallest element in`x`

.`vector_min <- function(x) { # write here }`

This question was asked in quiz 4 and again in the exercises that the professor give to practice. And the professor insisted several times that we should do the exercises. It is probably a good idea to do all the exercises.

How do you find the smallest item of a collection? You know how to do it, you do every day. Now you have to say it explicitly.

You need to keep in memory only one item: the tentative answer. You start with the first item of

`x`

`ans <- x[1]`

Some people had another idea and they start with a very big number

`ans <- Inf`

Then you check all the values in

`x`

. If you see a smaller value, you update`ans`

.This is a

*very easy*exercise that children can solve.`vector_min <- function(x) { ans <- x[1] for(i in 1:length(x)) { if(ans>x[i]) { ans <- x[i] } } return(ans) }`

`vector_which_min(x)`

, equivalent to`which.min(x)`

. Returns the index of the smallest element in`x`

.`vector_which_min <- function(x) { # write here }`

This is a little more complex than the previous one, because you need to keep in memory two things: the smallest value so far, and its index. The easiest way to solve this is to start with the previous answer and add the code to update

`idx`

.The real difficulty of this problem is to understand that the answer must be an index, not a value.

`vector_which_min <- function(x) { ans <- x[1] idx <- 1 for(i in 1:length(x)) { if(ans>x[i]) { ans <- x[i] idx <- i } } return(idx) }`

`vector_cumsum(x)`

, equivalent to`cumsum(x)`

. Returns a vector of the same length of`x`

with the cumulative sum of`x`

.`vector_cumsum <- function(x) { # write here }`

This question was also in the exercises, so hard working students should have seen it before. Also, it was exactly what we did

*every class*before the midterm.Like then, we have to prepare an “empty” answer vector

`ans`

, take the initial value, and then use a loop to see all values of`x`

. Each*“today’s”*value on`ans`

will be*“yesterday’s”*value plus the new value we get*“today”*, like we do the balance of a bank account.`vector_cumsum <- function(x) { ans <- rep(NA, length(x)) ans[1] <- x[1] for(i in 2:length(x)) { ans[i] <- ans[i-1] + x[i] } return(ans) }`

`vector_diff(x)`

, equivalent to`diff()`

. Returns a vector one element shorter than`x`

with the difference between consecutive elements of`x`

.`vector_diff <- function(x) { # write here }`

This is kind of the opposite of the previous question. Here we know the daily bank balance, and we want to know how much we expended or received.

Now the

`ans`

vector is one element shorter than`x`

, so we create an empty vector of size`length(x)-1`

. We must be careful to look only inside`x`

and never use index less than 1 or greater than`length(x)`

. And the sign must be correct`vector_diff <- function(x) { ans <- rep(NA, length(x)-1) for(i in 1:length(ans)) { ans[i] <- x[i+1] - x[i] } return(ans) }`

You can test your functions with the following code

```
x <- sample(5:20, size=10)
x
```

`## [1] 13 10 15 5 9 7 8 18 11 17`

`min(x)`

`## [1] 5`

`vector_min(x)`

`## [1] 5`

`which.min(x)`

`## [1] 4`

`vector_which_min(x)`

`## [1] 4`

`cumsum(x)`

`## [1] 13 23 38 43 52 59 67 85 96 113`

`vector_cumsum(x)`

`## [1] 13 23 38 43 52 59 67 85 96 113`

`diff(x)`

`## [1] -3 5 -10 4 -2 1 10 -7 6`

`vector_diff(x)`

`## [1] -3 5 -10 4 -2 1 10 -7 6`