Class 8: Enter the matrix

Systems Biology

Andrés Aravena, PhD

November 23, 2023

Molecular Evolution

The number of observed mutations is not proportional to time

Multiple substitutions at the same position cannot be observed: only the last one is visible


So we underestimate the divergence time
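A small simulation can make this concrete. The sketch below is only an illustration under simple assumptions (uniformly random sites, a four-letter DNA alphabet, a fixed random seed); the function name `observed_vs_actual` is hypothetical and not part of the course material. It applies a known number of substitutions to a sequence and counts how many differences remain visible.

```python
import random

def observed_vs_actual(length, n_substitutions, seed=0):
    """Apply substitutions at random sites and return the visible differences."""
    rng = random.Random(seed)
    bases = "ACGT"
    original = [rng.choice(bases) for _ in range(length)]
    evolved = list(original)
    for _ in range(n_substitutions):
        site = rng.randrange(length)
        # Replace the base with a different one. A later hit on the same
        # site overwrites this change, and may even restore the original base.
        evolved[site] = rng.choice([b for b in bases if b != evolved[site]])
    # Only positions that differ at the end can be observed.
    return sum(a != b for a, b in zip(original, evolved))

for k in (10, 100, 1000):
    print(k, "substitutions applied,", observed_vs_actual(200, k), "observed")
```

The observed count can never exceed the sequence length, so as the number of substitutions grows, the visible differences fall further and further behind the true number.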

BLAST hits for Taz1 (Saccharomyces cerevisiae, QHB12384.1) in RefSeq Select proteins

Probability of mutation

We know that \[ℙ(A,B)=ℙ(A)⋅ℙ(B|A)\] Therefore \[ℙ(B|A)=\frac{ℙ(A,B)}{ℙ(A)}\]

Here \(A\) is “initial amino acid is Valine” and
\(B\) is “new amino acid is Leucine”

(or any other combination of amino acids)
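As a quick numeric check of the formula, here is a minimal sketch with made-up probabilities. The two values are hypothetical, chosen only so the arithmetic is easy to follow; they are not measured frequencies.

```python
from fractions import Fraction

# Hypothetical values, for illustration only.
p_A = Fraction(7, 100)          # P(A): initial amino acid is Valine
p_A_and_B = Fraction(1, 1000)   # P(A,B): initial is Valine and new is Leucine

# Conditional probability: P(B|A) = P(A,B) / P(A)
p_B_given_A = p_A_and_B / p_A
print(p_B_given_A)  # 1/70

# Multiplying back recovers the joint probability: P(A) * P(B|A) = P(A,B)
print(p_A * p_B_given_A == p_A_and_B)  # True
```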

Estimating short-term probabilities

By comparing highly similar sequences, Margaret Dayhoff determined the short-term mutation frequencies for each pair of amino acids.

This is a matrix, called PAM1 (“Point Accepted Mutations”), representing

\[ℙ(A\text{ at time }t, B\text{ at time }t+1)\]

We can write it as a matrix \[P_1 (A,B) = ℙ(A\text{ at time }t, B\text{ at time }t+1)\]
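One way to picture how such a matrix could be estimated is to count aligned pairs and divide by the total. The sketch below assumes a toy alphabet of three amino acids and invented counts; the real matrix is 20 × 20 and comes from Dayhoff's alignments, not from these numbers.

```python
import numpy as np

amino_acids = ["V", "L", "I"]  # toy alphabet, for illustration only

# counts[i, j]: hypothetical number of aligned positions with
# amino_acids[i] at time t and amino_acids[j] at time t+1
counts = np.array([
    [90,   6,  4],
    [ 5, 140,  5],
    [ 3,   7, 40],
])

# Joint probabilities: each count divided by the total number of pairs
P1 = counts / counts.sum()

print(P1.sum())   # all entries together add up to 1
print(P1[0, 1])   # estimated P(V at time t, L at time t+1)
```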

Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In Atlas of Protein Sequence and Structure. Washington, DC: National Biomedical Research Foundation, 1978.

Calculating long-term evolution

Let’s make the matrix of conditional probabilities \[ \begin{aligned} M_1(A,B) &= ℙ(B\text{ at time }t+1|A\text{ at time }t)\\ &= \frac{ℙ(A\text{ at time }t, B\text{ at time }t+1)}{ℙ(A\text{ at time }t)} \end{aligned} \]

We can build this matrix if we know \(ℙ(A\text{ at time }t)\)

We can find that probability by counting the frequency of each amino acid.
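The division can be sketched in a few lines. This example uses a hypothetical 3 × 3 joint matrix and assumes that the amino acid frequencies come from the same data, so that \(ℙ(A\text{ at time }t)\) equals the row sums of the joint matrix. With frequencies counted from a different data set, the rows would only sum to 1 approximately.

```python
import numpy as np

# Hypothetical joint probabilities P1[A, B] for a toy 3-letter alphabet
P1 = np.array([
    [0.30, 0.02, 0.01],
    [0.02, 0.45, 0.02],
    [0.01, 0.02, 0.15],
])

# P(A at time t): frequency of each initial amino acid (row sums here)
freq = P1.sum(axis=1)

# M1[A, B] = P1[A, B] / P(A at time t); each row is divided by its frequency
M1 = P1 / freq[:, np.newaxis]

print(M1)
print(M1.sum(axis=1))  # every row of conditional probabilities sums to 1
```

Each row of `M1` answers the question "given that we start with amino acid A, what is the probability of each amino acid one step later?".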