- More than two sequences at the same time
- semi-global alignment
- Instead of filling a two-dimensional matrix, we would have to fill an \(n\)-dimensional array
- Cost: \(L^n\) in memory and time
  - for \(n\) sequences of length \(L\)

- Impossible for any practical case
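
To get a feeling for how fast \(L^n\) grows, here is a minimal sketch that counts the cells of the \(n\)-dimensional array. The function name `dp_cells` and the example length of 100 are illustrative assumptions, and the count ignores the extra border cells a real implementation would add.

```python
def dp_cells(n_sequences: int, length: int) -> int:
    """Approximate number of cells in the n-dimensional array: L ** n."""
    return length ** n_sequences


# Hypothetical example: sequences of length 100.
for n in (2, 3, 5, 10):
    print(f"n = {n:2d} sequences -> {dp_cells(n, 100):.1e} cells")
```

With only 10 sequences of length 100 the array already has \(10^{20}\) cells, far more than any computer can store.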

December 11, 2018

Nasrettin Hoca had lost his ring in the living room. He searched for it for a while, but since he could not find it, he went out into the yard and began to look there. His wife, who saw what he was doing, asked: “Hocam, you lost your ring in the room, why are you looking for it in the yard?”

Nasrettin Hoca stroked his beard and said: “The room is too dark and I can’t see very well. I came out to the courtyard to look for my ring because there is much more light out here.”

Algorithm: gives a precise answer to a question

Heuristic: gives a fast, approximate answer

A heuristic is an algorithm for a simpler question that is related to the original one

We hope that the approximate answer will be close to the real one

- The local alignment algorithm is *Smith-Waterman*
  - “filling the matrix with positive numbers and finding diagonals”

- BLAST and other *index-based* methods are *heuristics* that solve a simpler problem
  - BLAST may miss some alignments: false negatives
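
The idea of an index-based heuristic, and how it can produce a false negative, can be illustrated with a toy example. This is only a sketch of exact \(k\)-mer seeding, not BLAST's actual implementation; the function names, the sequences, and the value `k = 4` are all assumptions made for illustration.

```python
from collections import defaultdict


def build_kmer_index(subject: str, k: int) -> dict:
    """Map every k-mer of the subject to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(subject) - k + 1):
        index[subject[i:i + k]].append(i)
    return index


def find_seeds(query: str, index: dict, k: int) -> list:
    """Return (query_pos, subject_pos) pairs where a k-mer matches exactly."""
    seeds = []
    for j in range(len(query) - k + 1):
        for i in index.get(query[j:j + k], []):
            seeds.append((j, i))
    return seeds


subject = "ACGTACGTTAGC"
index = build_kmer_index(subject, k=4)

# An exact substring of the subject produces seeds.
print(find_seeds("TACGTT", index, k=4))

# This query differs from the start of the subject in only 2 of 8 positions,
# but no 4-mer matches exactly, so no seed is found: a false negative.
print(find_seeds("ACGAACGA", index, k=4))
```

The simpler question answered here is "which short words are shared exactly?", which is fast to answer with an index but can miss similar sequences that share no exact word.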

- Build a “guide tree” to organize the alignment
- The tree is built based on the *edit distance* between all pairs of sequences
  - i.e. we need to calculate \(\approx n^2\) distances
  - this is an \(n\times n\) matrix
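
As a minimal sketch of this step, the code below computes the edit distance with unit costs for insertions, deletions, and substitutions (an assumption, since the costs are not specified here) and fills the \(n\times n\) matrix. The function names and the example sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance with unit costs, computed one matrix row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def distance_matrix(seqs: list) -> list:
    """Symmetric n x n matrix of pairwise edit distances."""
    n = len(seqs)
    dist = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = edit_distance(seqs[i], seqs[j])
    return dist


for row in distance_matrix(["ACGT", "ACGA", "TCGA"]):
    print(row)
```

Because the matrix is symmetric, only \(n(n-1)/2\) distances are actually computed, which is still on the order of \(n^2\).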

- Find the “closest neighbors” and “join” them
- Record which sequences are being joined

- Create a new matrix where the two rows (and columns) are replaced by a single one
- new matrix is \((n-1)\times (n-1)\)

- Repeat \(n-1\) times
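
The joining procedure above can be sketched as follows. How the merged row is computed is not specified in these notes; this sketch assumes the simple average of the two joined rows, and other rules exist. The function name, the labels, and the distance values are hypothetical.

```python
def build_guide_tree(dist: list, labels: list):
    """Join the closest pair repeatedly; return the tree and the join order."""
    dist = [row[:] for row in dist]  # work on a copy
    nodes = list(labels)
    joins = []
    while len(nodes) > 1:  # runs n - 1 times for n sequences
        n = len(nodes)
        # Find the closest neighbors.
        i, j = min(((a, b) for a in range(n) for b in range(a + 1, n)),
                   key=lambda p: dist[p[0]][p[1]])
        merged = (nodes[i], nodes[j])
        joins.append(merged)  # record which sequences are being joined
        # Replace the two rows (and columns) by a single one.
        # Assumption: the new distance is the average of the two old ones.
        keep = [k for k in range(n) if k not in (i, j)]
        new_row = [(dist[i][k] + dist[j][k]) / 2 for k in keep]
        new_dist = [[dist[a][b] for b in keep] + [new_row[x]]
                    for x, a in enumerate(keep)]
        new_dist.append(new_row + [0])
        dist = new_dist
        nodes = [nodes[k] for k in keep] + [merged]
    return nodes[0], joins


example = [[0, 2, 6, 10],
           [2, 0, 6, 10],
           [6, 6, 0, 8],
           [10, 10, 8, 0]]
tree, joins = build_guide_tree(example, ["A", "B", "C", "D"])
print(tree)
print(joins)
```

The tree is represented as nested tuples, so the order of the joins can be read from the innermost pair outwards.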

Once we have the tree, we use it as a guide for the alignment

- First align the “closest neighbors”, inserting gaps if necessary
- Then align the other sequences to this alignment
  - The score of each position is the average of all the scores at that position
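
The averaging rule can be illustrated with a small sketch that scores one new symbol against one column of an existing alignment. The scoring values (match 1, mismatch -1, gap -2) and the function names are assumptions chosen only for illustration.

```python
def pair_score(a: str, b: str, match: int = 1, mismatch: int = -1,
               gap: int = -2) -> int:
    """Score for aligning two symbols (hypothetical scoring values)."""
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def column_score(column: str, symbol: str) -> float:
    """Average score of a new symbol against every symbol in the column."""
    return sum(pair_score(c, symbol) for c in column) / len(column)


# One column of an alignment of four sequences: three A's and one T.
print(column_score("AATA", "A"))
print(column_score("AATA", "T"))
```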

- The aligned sequences can be represented by a “frequency matrix”
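
A minimal sketch of such a frequency matrix is shown below: for each column of the alignment, it stores the fraction of sequences that carry each symbol. The alphabet `"ACGT-"`, the function name, and the example alignment are assumptions for illustration.

```python
from collections import Counter


def frequency_matrix(aligned: list, alphabet: str = "ACGT-") -> dict:
    """Per-column frequency of each symbol in a set of aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned sequences must all have the same length")
    n = len(aligned)
    matrix = {sym: [0.0] * length for sym in alphabet}
    for pos in range(length):
        counts = Counter(s[pos] for s in aligned)
        for sym in alphabet:
            matrix[sym][pos] = counts[sym] / n
    return matrix


aligned = ["ACGT", "A-GT", "ACGA", "TCGT"]
matrix = frequency_matrix(aligned)
for sym, row in matrix.items():
    print(sym, row)
```

Each column of the matrix sums to 1, so the alignment can be treated as a single "averaged" sequence when further sequences are aligned to it.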