
# Homework 8

28 November 2022. Deadline: Friday, 9 December, 9:00. By Andrés Aravena, Ph.D.

This week we practice drawing trees and doing multiple alignments.

The answer to this homework should be a well-structured document, like a paper or a report. It can be written in Markdown or Google Docs. In either case, send your answer as an attached file, so that it remains stored in the email server and cannot be changed after delivery. In other words, sending links is not enough: you must send files.

The difference between an attached file and a link is important from the technical and legal perspectives. Be sure to understand it.

# 1. Build a neighbor joining tree of Turkey

The following table shows the distance (in kilometers) between some cities in Turkey, according to Wolfram Alpha.

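| | Izmir | Bursa | Ankara | (unnamed) |
|---|---|---|---|---|
| Istanbul | 336 | 100 | 351 | 711 |
| Izmir | | 257 | 520 | 737 |
| Bursa | | | 323 | 649 |
| Ankara | | | | 390 |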

Using these distances, build a neighbor-joining tree. Show every step of the construction:

• The Q matrix of every step
• Distance between old nodes and the new node at every step
• The new D matrix of each step
• The partial tree of each step
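
As a reminder of what each item in the list involves, the standard neighbor-joining formulas can be written as below. The notation is an assumption and may differ from the one used in class: $n$ is the current number of nodes, $d(i,j)$ is the current distance matrix, $f$ and $g$ are the two nodes being joined (the pair with the smallest $Q$), and $u$ is the new node.

```latex
\begin{aligned}
Q(i,j) &= (n-2)\,d(i,j) - \sum_{k=1}^{n} d(i,k) - \sum_{k=1}^{n} d(j,k) \\
\delta(f,u) &= \frac{1}{2}\,d(f,g) + \frac{1}{2(n-2)}\left[\sum_{k=1}^{n} d(f,k) - \sum_{k=1}^{n} d(g,k)\right] \\
\delta(g,u) &= d(f,g) - \delta(f,u) \\
d(u,k) &= \frac{1}{2}\left[d(f,k) + d(g,k) - d(f,g)\right] \quad \text{for every other node } k
\end{aligned}
```

The first line gives the Q matrix, the second and third give the branch lengths from the old nodes to the new node, and the last gives the rows of the new D matrix.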

This step should be done “manually”. You can use a calculator, but not a packaged function or an advanced program. Everyone should build at least one tree by hand once in their life.
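
Once the hand calculation is finished, a few lines of Python can help double-check the arithmetic of the first Q matrix. This is only a minimal sketch for verification, not a substitute for the manual work. It assumes the usual definition Q(i,j) = (n-2)·d(i,j) minus the row sums of i and j. The label `City5` is a placeholder for the fifth city in the table, and the names `dist` and `q_matrix` are illustrative.

```python
from itertools import combinations

# Distances in km, copied from the table above.
# "City5" is a placeholder label for the fifth city of the table.
labels = ["Istanbul", "Izmir", "Bursa", "Ankara", "City5"]
upper = {
    ("Istanbul", "Izmir"): 336, ("Istanbul", "Bursa"): 100,
    ("Istanbul", "Ankara"): 351, ("Istanbul", "City5"): 711,
    ("Izmir", "Bursa"): 257, ("Izmir", "Ankara"): 520,
    ("Izmir", "City5"): 737,
    ("Bursa", "Ankara"): 323, ("Bursa", "City5"): 649,
    ("Ankara", "City5"): 390,
}


def dist(a, b):
    """Symmetric lookup in the upper-triangular distance table."""
    if a == b:
        return 0
    return upper[(a, b)] if (a, b) in upper else upper[(b, a)]


def q_matrix(labels):
    """Return {(a, b): Q(a, b)} for every unordered pair of labels."""
    n = len(labels)
    # Sum of distances from each node to all the others.
    row_sum = {a: sum(dist(a, k) for k in labels) for a in labels}
    return {
        (a, b): (n - 2) * dist(a, b) - row_sum[a] - row_sum[b]
        for a, b in combinations(labels, 2)
    }


q = q_matrix(labels)
for pair, value in q.items():
    print(pair, value)
```

The pair with the smallest value in `q` is the one joined in the first step. Compare it with your own result only after you have finished the step by hand.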

# 2. Align keratin proteins

We want to align the following proteins and build a phylogenetic tree from them:

XP_036351858.1 XP_041361653.1 XP_031771706.1
XP_020892825.1 XP_037745389.1 XP_050186301.1
XP_044574083.1 XP_041361499.1 XP_030371273.1
XP_048565601.1 XP_027050739.1 WP_003121308.1
XP_040571383.1 XP_046913081.1 XP_045135256.1
XP_019233715.1 XP_024370096.1 XP_020325383.1
XP_045899490.1 XP_047326303.1 XP_027525420.1
XP_035144276.1 XP_035778297.1 XP_046802344.1
XP_027981243.1 XP_018564106.1 XP_017722073.1

Use at least two methods from the EBI website (https://www.ebi.ac.uk/Tools/msa) to build a multiple sequence alignment and the corresponding phylogenetic tree. Please let me know which methods you use.

You will probably need to download the sequences from NCBI’s Protein database. There are several ways of getting the FASTA files. One option is the Batch Entrez page (https://ncbi.nlm.nih.gov/sites/batchentrez). In this case you will need to prepare a text file with one accession number on each line. Do not use Microsoft Word for this. Word files are not text files. You need to use a text editor, not a word processor. We recommend using Visual Studio Code, but there are hundreds of alternatives, including Notepad and WordPad, which are already included in Microsoft Windows. In macOS you can use TextEdit.
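
If you prefer, the accession file can also be produced by a short script instead of a text editor. The sketch below writes the 27 accession numbers of this homework to a plain text file, one per line, ready to upload to Batch Entrez. The file name `accessions.txt` and the function names are illustrative choices. The `efetch_url` helper is an optional alternative that this homework does not require: it only builds a URL for NCBI's E-utilities `efetch` service and downloads nothing.

```python
from pathlib import Path
from urllib.parse import urlencode

# The 27 accession numbers listed in this homework.
ACCESSIONS = """
XP_036351858.1 XP_041361653.1 XP_031771706.1
XP_020892825.1 XP_037745389.1 XP_050186301.1
XP_044574083.1 XP_041361499.1 XP_030371273.1
XP_048565601.1 XP_027050739.1 WP_003121308.1
XP_040571383.1 XP_046913081.1 XP_045135256.1
XP_019233715.1 XP_024370096.1 XP_020325383.1
XP_045899490.1 XP_047326303.1 XP_027525420.1
XP_035144276.1 XP_035778297.1 XP_046802344.1
XP_027981243.1 XP_018564106.1 XP_017722073.1
""".split()


def write_accession_file(path, accessions):
    """Write one accession number per line, as plain UTF-8 text."""
    Path(path).write_text("\n".join(accessions) + "\n", encoding="utf-8")


def efetch_url(accessions):
    """Build (but do not fetch) an E-utilities URL for protein FASTA records."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
    query = urlencode({
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{base}?{query}"


write_accession_file("accessions.txt", ACCESSIONS)
print(len(ACCESSIONS), "accession numbers written to accessions.txt")
```

The resulting `accessions.txt` is a genuine text file, so it can be uploaded to Batch Entrez as is.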