Class 8: Enter the matrix

Systems Biology

Andrés Aravena, PhD

November 23, 2023

Molecular Evolution

The number of observed differences is not proportional to time

Multiple substitutions of the same base cannot be observed

GLMTVMNHMSMVDDPLVWATLPYKLFTSLDNIRWSLGAHNICFQNKFLANFFSLGQVLST
GVLVVPNHRSTLDDPLMWGVLPWSMLLRPRLMRWSLGAAELCFTNAVTSSMSSLAQVLAT 
GVLVVPNHRSTLDDPLMWGTLPWSMLLRPRLMRWSLGAAELCFTNPVTSMMSSLAQVLAT
GLITVSNHQSCMDDPHLWGILKLRHIWNLKLMRWTPAAADICFTKELHSHFFSLGKCVPV

So we underestimate the divergence time
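The saturation effect can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not part of the lecture: substitutions hit positions uniformly at random, and every replacement amino acid is equally likely. The names `evolve` and `observed_differences` are made up for this example.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def evolve(sequence, n_substitutions, rng):
    """Apply n_substitutions random point substitutions to a copy of sequence."""
    residues = list(sequence)
    for _ in range(n_substitutions):
        position = rng.randrange(len(residues))
        # Choose a residue different from the current one, so each event is a real change.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[position]]
        residues[position] = rng.choice(choices)
    return "".join(residues)

def observed_differences(seq_a, seq_b):
    """Count the aligned positions where the two sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))

rng = random.Random(8)  # fixed seed so the run is repeatable
ancestor = "GLMTVMNHMSMVDDPLVWATLPYKLFTSLDNIRWSLGAHNICFQNKFLANFFSLGQVLST"

results = []
for n in (10, 50, 100, 200, 400):
    descendant = evolve(ancestor, n, rng)
    results.append((n, observed_differences(ancestor, descendant)))

for n, observed in results:
    print(f"{n:4d} substitutions -> {observed:2d} observed differences")
```

The observed count can never exceed the number of substitutions or the sequence length, so it flattens out while the true number of substitutions keeps growing.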

Blast hits for Taz1 (Saccharomyces cerevisiae, QHB12384.1) in RefSeq select proteins

Probability of mutation

We know that \[ℙ(A,B)=ℙ(A)⋅ℙ(B|A)\] Therefore \[ℙ(B|A)=\frac{ℙ(A,B)}{ℙ(A)}\]

Here \(A\) is “the initial amino acid is valine” and
\(B\) is “the new amino acid is leucine”

(or any other combination of amino acids)
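As a quick numerical check of the formula, here is a minimal sketch. The two probabilities are hypothetical values chosen for illustration; they do not come from any dataset.

```python
# Hypothetical values, for illustration only.
p_a = 0.07          # P(A): the initial amino acid is valine
p_a_and_b = 0.0014  # P(A, B): valine initially and leucine afterwards

# Conditional probability: P(B | A) = P(A, B) / P(A)
p_b_given_a = p_a_and_b / p_a

print(f"P(B|A) = {p_b_given_a:.3f}")
```

With these numbers, about 2% of the valines would be replaced by leucine.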

Estimating short-term probabilities

By comparing highly similar sequences, Margaret Dayhoff determined the short-term mutation frequency for each pair of amino acids.

This is a matrix, called PAM1 (“Point Accepted Mutations”), representing

\[ℙ(A\text{ at time }t, B\text{ at time }t+1)\]

We can write it as a matrix \[P_1 (A,B) = ℙ(A\text{ at time }t, B\text{ at time }t+1)\]
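The idea of estimating joint frequencies by counting can be sketched with two of the highly similar sequences shown earlier. This is a simplified illustration, not Dayhoff's full procedure: it assumes the sequences are already aligned without gaps, and it arbitrarily treats the first one as "time \(t\)" and the second as "time \(t+1\)". The function name `joint_frequencies` is made up for this example.

```python
from collections import Counter

# Two highly similar sequences from the alignment above (same length, no gaps).
seq_t  = "GVLVVPNHRSTLDDPLMWGVLPWSMLLRPRLMRWSLGAAELCFTNAVTSSMSSLAQVLAT"
seq_t1 = "GVLVVPNHRSTLDDPLMWGTLPWSMLLRPRLMRWSLGAAELCFTNPVTSMMSSLAQVLAT"

def joint_frequencies(seq_a, seq_b):
    """Estimate P(A at time t, B at time t+1) by counting aligned pairs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pair_counts = Counter(zip(seq_a, seq_b))
    total = sum(pair_counts.values())
    return {pair: count / total for pair, count in pair_counts.items()}

joint = joint_frequencies(seq_t, seq_t1)

# Show only the pairs where the amino acid changed.
changed = {pair: p for pair, p in joint.items() if pair[0] != pair[1]}
for (a, b), p in sorted(changed.items()):
    print(f"P({a} at t, {b} at t+1) = {p:.4f}")
```

A single pair of short sequences gives very noisy estimates; a usable matrix needs counts pooled over many alignments.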

Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In Atlas of Protein Sequence and Structure. Washington, DC: National Biomedical Research Foundation, 1978. https://doi.org/10.1.1.145.4315.

Calculating long-term evolution

Let’s make the matrix of conditional probabilities \[ \begin{aligned} M_1(A,B) &= ℙ(B\text{ at time }t+1 \mid A\text{ at time }t)\\ &= \frac{ℙ(A\text{ at time }t, B\text{ at time }t+1)}{ℙ(A\text{ at time }t)} \end{aligned} \]

We can build this matrix if we know \(ℙ(A\text{ at time }t)\)

We can find that probability by counting the frequency of each amino acid.
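The division step can be sketched as follows. The joint probabilities are hypothetical values for a toy alphabet of two amino acids, and `conditional_matrix` is a made-up name. Here \(ℙ(A\text{ at time }t)\) is taken as the row sum of the joint matrix, which matches the amino acid frequency when both are counted from the same data.

```python
def conditional_matrix(joint):
    """Convert joint probabilities P1[(a, b)] into M1[(a, b)] = P(b at t+1 | a at t)."""
    # Marginal P(a at time t): add up the joint probabilities over all b.
    marginal = {}
    for (a, _b), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p
    # Divide each joint probability by the marginal of its initial amino acid.
    return {(a, b): p / marginal[a] for (a, b), p in joint.items()}

# Hypothetical joint probabilities for a toy alphabet {V, L}; they sum to 1.
P1 = {
    ("V", "V"): 0.49, ("V", "L"): 0.01,
    ("L", "V"): 0.02, ("L", "L"): 0.48,
}

M1 = conditional_matrix(P1)
for (a, b), p in sorted(M1.items()):
    print(f"P({b} at t+1 | {a} at t) = {p:.2f}")
```

Each row of the resulting matrix adds up to 1, because the initial amino acid must end up as some amino acid at time \(t+1\).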