In the last two classes we learned to build phylogenetic trees from multiple sequence alignments. However, the examples we used were closely related sequences, so it was hard to see what happens in real life with more distant sequences. To understand this better, we will align sequences generated with higher mutation rates.
For this exercise we will use simulated data, where we know the original relationships. The first sequence (A) is the same in all files. Sequences B and C are variations of A, sequences D and E are variations of B, sequences F and G are variations of C, and sequence H is a variation of D. The mutation rate is different in each file.
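The derivation scheme described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure used to generate the exercise files: the DNA alphabet, the uniform substitution model, and the names `PARENT`, `mutate`, `simulate`, and `newick` are all assumptions made for the example. The sketch also writes the known relationships as a Newick string with labelled internal nodes, which may be handy when comparing against the trees you recover.

```python
import random

# Parent of each derived sequence, as described in the text.
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "C", "G": "C", "H": "D"}


def mutate(seq, rate, rng):
    """Return a copy of seq where each site is substituted with probability `rate`.

    Assumption: DNA alphabet and uniform substitutions, no insertions or deletions.
    """
    bases = "ACGT"
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in bases if b != base]))
        else:
            out.append(base)
    return "".join(out)


def simulate(root_seq, rate, seed=0):
    """Derive sequences B..H from the root sequence A, following PARENT."""
    rng = random.Random(seed)
    seqs = {"A": root_seq}
    # Alphabetical order guarantees every parent is built before its children.
    for child in sorted(PARENT):
        seqs[child] = mutate(seqs[PARENT[child]], rate, rng)
    return seqs


def newick(node="A"):
    """Write the known relationships as Newick, labelling internal nodes."""
    children = sorted(c for c, p in PARENT.items() if p == node)
    if not children:
        return node
    return "(" + ",".join(newick(c) for c in children) + ")" + node


print(newick() + ";")  # (((H)D,E)B,(F,G)C)A;
```

Because substitutions never change the length, every simulated sequence is as long as A, which matches the ungapped data used in this exercise.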
Please build a phylogenetic tree for each of the following files.
Your answer must include, for each file:
- the multiple alignment, in FASTA or Clustal format,
- the phylogenetic tree (not the guide tree) in Newick format,
- a graphical representation of the tree.
Note that you may not recover the original tree in all cases. It is a good idea to try different tools, such as MEGA, MUSCLE, Clustal, T-Coffee, or any other similar tool. In the same way, you should find a suitable tool to transform the Newick format into a visual representation.
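To get a feeling for what a Newick string encodes before choosing a drawing tool, a minimal sketch like the following may help. It is a toy parser written for this illustration, not a replacement for a real tree viewer: it assumes well-formed input, ignores branch lengths, and does not handle quoted labels. The tree in `example` and its branch lengths are made up and are not a result from the exercise files.

```python
def parse_newick(text):
    """Parse a Newick string into nested (label, children) tuples.

    Branch lengths (the part after ':') are dropped. Well-formed input is assumed.
    """
    text = text.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # skip "("
            children.append(node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        label = text[start:pos].split(":")[0].strip()
        return (label, children)

    return node()


def render(tree, indent=0):
    """Return the tree as indented lines; unlabelled nodes are shown as '*'."""
    label, children = tree
    lines = ["  " * indent + (label or "*")]
    for child in children:
        lines.extend(render(child, indent + 1))
    return lines


# Hypothetical example tree, only to show the output format.
example = "((A:0.1,B:0.2):0.05,(C:0.3,D:0.1):0.02);"
print("\n".join(render(parse_newick(example))))
```

Each level of indentation corresponds to one pair of parentheses in the Newick string. For a proper drawing with branch lengths, a dedicated library or viewer is a better choice. For instance, Biopython's `Bio.Phylo` module can read Newick files and draw them.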
This is not a hard exercise, because all sequences have the same length and there are no gaps, either internal or external. We may try sequences with gaps later.
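If you want to confirm that property on an input file or on the alignment you produce, a small check along these lines could be used. It is a sketch under simple assumptions: the FASTA text is already loaded into a string, `-` is the only gap character, and the names `read_fasta` and `check_ungapped` as well as the two sample sequences are invented for the example.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header line
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def check_ungapped(records):
    """True if all sequences have the same length and none contains '-'."""
    lengths = {len(seq) for seq in records.values()}
    has_gaps = any("-" in seq for seq in records.values())
    return len(lengths) == 1 and not has_gaps


# Hypothetical two-sequence sample, not taken from the exercise files.
sample = """>A
ACGTACGT
>B
ACGTTCGT
"""
print(check_ungapped(read_fasta(sample)))  # True
```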