Blog of Andrés Aravena

Homework 4, Bioinformatics 2022

17 October 2022, by Andrés Aravena, Ph.D. Deadline: Monday, 24 October, 9:00.

We do not have anything hard this week. Still, we need to keep our minds sharp and learn some new things.

Searching in electronic spreadsheets

I’ve shared with you a Google Sheet for this homework. There are two issues that we want to solve.

In the first worksheet (called Hamming) you can see the code we saw in the last class to calculate the Hamming distance between genetic codes. But copying and pasting each genetic code sequence by hand was too cumbersome. Instead, we want to tell the computer to do it for us.

You can see that there is a place to write the id of the genetic code. This id can be used to search in the second worksheet, called codes, and automatically fill in the sequence. The best way to do this is to use the function MATCH() to find the number of the row that holds that id, and then use the function INDEX() to get the genetic code sequence located in that row.

Your homework is to learn how to use the MATCH() and INDEX() functions in Excel or Google Sheets. Be aware that the function names may change if you use Turkish or any other non-English language as your default language.
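
To make the two-step idea concrete, here is a minimal Python sketch of what MATCH() and INDEX() do. The helper names `match` and `index`, and the layout of the codes worksheet (ids in one column, sequences in the next), are assumptions made for illustration; the real worksheet may be organized differently.

```python
# Python stand-ins for the spreadsheet functions MATCH() and INDEX().
# The data layout below is an assumption, not the actual shared sheet.

def match(key, lookup_range):
    """Like MATCH(key, range, 0): 1-based position of the first exact match."""
    for position, value in enumerate(lookup_range, start=1):
        if value == key:
            return position
    raise LookupError(f"{key!r} not found")  # a spreadsheet would show #N/A

def index(values, row):
    """Like INDEX(range, row): the value stored at a 1-based row position."""
    return values[row - 1]

# Hypothetical contents of the "codes" worksheet (two NCBI translation tables).
code_ids = [1, 2]
code_sequences = [
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",
]

row = match(2, code_ids)               # step 1: which row holds id 2?
sequence = index(code_sequences, row)  # step 2: fetch the sequence in that row
print(row, sequence)
```

In a spreadsheet the two steps are usually nested in one formula, with MATCH() written inside INDEX() as its row argument. The third argument of MATCH() should be 0 to ask for an exact match.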

Reading BLOSUM62 substitution scoring matrix

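In the last worksheet we have the BLOSUM62 matrix. We want to automatically find the score for the two amino acids entered as row and column in cells D28 and D29. This can also be solved using MATCH() and INDEX(): one MATCH() finds the row, another finds the column, and INDEX() returns the cell where they cross.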
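
The same lookup in two dimensions can be sketched in Python as follows. Only a four-letter corner of BLOSUM62 is included to keep the example short, and the names `labels`, `blosum62_subset` and `score` are hypothetical.

```python
# Two-dimensional lookup: find the row, find the column, read the cell.
# Only the A, R, N, D corner of BLOSUM62 is shown here.

labels = ["A", "R", "N", "D"]  # row and column headers, in the same order
blosum62_subset = [
    [ 4, -1, -2, -2],  # A
    [-1,  5,  0, -2],  # R
    [-2,  0,  6,  1],  # N
    [-2, -2,  1,  6],  # D
]

def score(first, second):
    """Substitution score for a pair of amino acids in the subset."""
    row = labels.index(first)    # plays the role of MATCH() on the row labels
    col = labels.index(second)   # plays the role of MATCH() on the column labels
    return blosum62_subset[row][col]  # plays the role of INDEX(matrix, row, col)

print(score("N", "D"))
```

Note that Python positions start at 0 while MATCH() and INDEX() count from 1, so the numbers differ by one even though the logic is the same.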


When we spoke about “the number of comparisons”, someone suggested that the number was 25! (twenty-five factorial). That was not the correct answer, but it was in the right direction.

How much is 25 factorial? If you can count one number per second, how long would it take to count up to 25 factorial?

Give your answer in reasonable units. Approximate answers are accepted, as long as they are not too far away from the final result.
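
One way to check your estimate is a short Python sketch like the one below. It assumes a year of 365.25 days, which is a rounding choice and not part of the question; the variable names are only illustrative.

```python
import math

# 25! is the number of seconds needed when counting one number per second.
total_seconds = math.factorial(25)

# Assumption: one year = 365.25 days of 24 hours.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = total_seconds / SECONDS_PER_YEAR

print(f"25! = {total_seconds}")
print(f"about {years:.1e} years")
```

Years are probably still too small a unit for this result, so try comparing it with something larger, such as the age of the universe.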

Deadline: Monday, 24 October, 9:00.

Originally published at