# Class 8: Enter the matrix

# Systems Biology

## Andrés Aravena, PhD

### November 23, 2023

## Molecular Evolution

The number of observed mutations is not proportional to time

Multiple substitutions of the same base cannot be observed

```
GLMTVMNHMSMVDDPLVWATLPYKLFTSLDNIRWSLGAHNICFQNKFLANFFSLGQVLST
GVLVVPNHRSTLDDPLMWGVLPWSMLLRPRLMRWSLGAAELCFTNAVTSSMSSLAQVLAT
GVLVVPNHRSTLDDPLMWGTLPWSMLLRPRLMRWSLGAAELCFTNPVTSMMSSLAQVLAT
GLITVSNHQSCMDDPHLWGILKLRHIWNLKLMRWTPAAADICFTKELHSHFFSLGKCVPV
```

So we underestimate the divergence time

BLAST hits for Taz1 (*Saccharomyces cerevisiae*, QHB12384.1) in RefSeq select proteins
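The effect of multiple substitutions can be illustrated with a small simulation. This is only a sketch under simple assumptions that are not part of the lecture: every position mutates with the same probability, and every replacement amino acid is equally likely. The helper names `simulate_substitutions` and `observed_differences` are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def simulate_substitutions(sequence, n_events, rng):
    """Apply n_events random point substitutions and return the new sequence."""
    residues = list(sequence)
    for _ in range(n_events):
        pos = rng.randrange(len(residues))
        # Assumption: any amino acid other than the current one is equally likely.
        choices = [a for a in AMINO_ACIDS if a != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)

def observed_differences(seq_a, seq_b):
    """Count the positions where two equal-length sequences differ."""
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)

rng = random.Random(0)  # fixed seed, so the run is repeatable
ancestor = "GLMTVMNHMSMVDDPLVWATLPYKLFTSLDNIRWSLGAHNICFQNKFLANFFSLGQVLST"

results = []
for n_events in (10, 50, 100, 200, 400):
    descendant = simulate_substitutions(ancestor, n_events, rng)
    results.append((n_events, observed_differences(ancestor, descendant)))

for n_events, n_observed in results:
    print(f"{n_events:4d} substitutions -> {n_observed:2d} observed differences")
```

The observed count can never exceed the sequence length, while the true number of substitutions keeps growing. Counting differences therefore gives a lower bound on the amount of change.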

## Probability of mutation

We know that \[ℙ(A,B)=ℙ(A)⋅ℙ(B|A)\]
Therefore \[ℙ(B|A)=\frac{ℙ(A,B)}{ℙ(A)}\]

Here \(A\) is “initial amino acid is Valine”

\(B\) is “new amino acid is Leucine”

(or any other combination of amino acids)
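A small numeric example of the formula may help. The joint probabilities below are invented for illustration only. They assume, unrealistically, that Valine can only stay Valine or change to Leucine or Isoleucine.

```python
# Hypothetical joint probabilities P(A, B); the numbers are made up.
# Keys are (initial amino acid, new amino acid).
joint = {
    ("V", "V"): 0.060,
    ("V", "L"): 0.006,
    ("V", "I"): 0.004,
}

# P(A = V): add the joint probabilities over every possible new amino acid B.
p_initial_v = sum(p for (a, _), p in joint.items() if a == "V")

# P(B = L | A = V) = P(A = V, B = L) / P(A = V)
p_l_given_v = joint[("V", "L")] / p_initial_v

print(f"P(V) = {p_initial_v:.3f}")
print(f"P(L | V) = {p_l_given_v:.4f}")
```

The joint probability of “Valine, then Leucine” is small partly because Valine itself is not very frequent. Dividing by \(ℙ(A)\) removes that effect.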

## Estimating short-term probabilities

By comparing highly similar sequences, Margaret Dayhoff determined
the frequencies of mutation for each pair of amino acids in the short
term.

This is a matrix, called PAM1 (“Point Accepted Mutations”),
representing

\[ℙ(A\text{ at time }t, B\text{ at time }t+1)\]

We can write it as a matrix \[P_1 (A,B) =
ℙ(A\text{ at time }t, B\text{ at time }t+1)\]
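As a rough illustration, the joint frequencies can be estimated by counting aligned pairs. The sketch below uses two of the highly similar sequences from the alignment shown earlier and treats the first one as “time \(t\)” and the second as “time \(t+1\)”. That is a simplifying assumption: the published matrix was built from many protein families, not from a single pair. The function name `joint_frequencies` is hypothetical.

```python
from collections import Counter

def joint_frequencies(seq_a, seq_b):
    """Estimate P(A, B) as the fraction of aligned positions showing the pair (A, B)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter(zip(seq_a, seq_b))
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}

# Two closely related sequences taken from the alignment above.
seq_1 = "GVLVVPNHRSTLDDPLMWGVLPWSMLLRPRLMRWSLGAAELCFTNAVTSSMSSLAQVLAT"
seq_2 = "GVLVVPNHRSTLDDPLMWGTLPWSMLLRPRLMRWSLGAAELCFTNPVTSMMSSLAQVLAT"

p1 = joint_frequencies(seq_1, seq_2)

# Keep only the pairs where the amino acid changed.
changed = {pair: p for pair, p in p1.items() if pair[0] != pair[1]}
for (a, b), p in sorted(changed.items()):
    print(f"P({a}, {b}) = {p:.4f}")
```

With only 60 positions most pairs never appear, so their estimated probability is zero. A usable matrix needs far more data.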

Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in
Proteins.” In *Atlas of Protein Sequence and Structure*. Washington, DC:
National Biomedical Research Foundation, 1978.
https://doi.org/10.1.1.145.4315.

## Calculating long-term evolution

Let’s make the matrix of conditional probabilities \[
\begin{aligned}
M_1(A,B) &= ℙ(B\text{ at time }t+1|A\text{ at time }t)\\
&= \frac{ℙ(A\text{ at time }t, B\text{ at time }t+1)}{ℙ(A\text{ at time }t)}
\end{aligned}
\]

We can build this matrix if we know \(ℙ(A\text{ at time }t)\)

We can find that probability by counting the frequency of each amino acid.
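The division can be sketched in a few lines. The joint matrix below is a made-up example with only three amino acids, and `conditional_matrix` is a hypothetical helper name. Here \(ℙ(A\text{ at time }t)\) is obtained by summing each row of the joint matrix, which should agree with the amino acid frequencies counted in the same data.

```python
def conditional_matrix(joint):
    """Turn joint probabilities P(A, B) into conditional probabilities P(B | A)."""
    # P(A at time t): add the joint probabilities over all B.
    marginal = {}
    for (a, _), p in joint.items():
        marginal[a] = marginal.get(a, 0.0) + p
    # M_1(A, B) = P(A, B) / P(A)
    return {(a, b): p / marginal[a] for (a, b), p in joint.items()}

# Hypothetical joint matrix for a toy alphabet of three amino acids.
# Missing pairs have probability zero; all entries add up to 1.
p1 = {
    ("V", "V"): 0.38, ("V", "L"): 0.02,
    ("L", "V"): 0.02, ("L", "L"): 0.37, ("L", "I"): 0.01,
    ("I", "L"): 0.01, ("I", "I"): 0.19,
}

m1 = conditional_matrix(p1)

for (a, b), p in sorted(m1.items()):
    print(f"M1({a}, {b}) = {p:.4f}")

# Each row of M1 is a probability distribution over the new amino acid B.
row_sums = {}
for (a, _), p in m1.items():
    row_sums[a] = row_sums.get(a, 0.0) + p
```

Each row of \(M_1\) adds up to 1, because an amino acid either stays the same or becomes some other amino acid.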