# Systems Biology

## Molecular Evolution

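The number of observed mutations is not proportional to time.

Multiple substitutions at the same position cannot be observed: if a site changes twice, or changes and then reverts, we only see the final state.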

```
GLMTVMNHMSMVDDPLVWATLPYKLFTSLDNIRWSLGAHNICFQNKFLANFFSLGQVLST
GVLVVPNHRSTLDDPLMWGVLPWSMLLRPRLMRWSLGAAELCFTNAVTSSMSSLAQVLAT
GVLVVPNHRSTLDDPLMWGTLPWSMLLRPRLMRWSLGAAELCFTNPVTSMMSSLAQVLAT
GLITVSNHQSCMDDPHLWGILKLRHIWNLKLMRWTPAAADICFTKELHSHFFSLGKCVPV
```

So simply counting the differences underestimates the divergence time.

BLAST hits for Taz1 (*Saccharomyces cerevisiae*, QHB12384.1) in RefSeq Select proteins
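A minimal simulation sketch of this effect. It assumes, for illustration only, that every substitution hits a uniformly random position and replaces the residue with a uniformly random *different* amino acid; real substitution rates vary between sites and between amino acids. The function names, the sequence length (1000) and the random seed are hypothetical choices.

```python
import random

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, n_substitutions: int, rng: random.Random) -> str:
    """Apply n random substitutions; the same site may be hit more than once."""
    residues = list(seq)
    for _ in range(n_substitutions):
        pos = rng.randrange(len(residues))
        # Assumption: each substitution changes the residue to a different amino acid.
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def observed_differences(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)
ancestor = "".join(rng.choice(AMINO_ACIDS) for _ in range(1000))
for n in (50, 200, 1000, 3000):
    descendant = mutate(ancestor, n, rng)
    # True number of substitutions vs. what a sequence comparison can see.
    print(n, observed_differences(ancestor, descendant))
```

Because later substitutions can land on already-changed sites (or revert them), the observed count grows more slowly than the true number of substitutions and can never exceed the sequence length.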

## Probability of mutation

We know that $ℙ(A,B)=ℙ(A)⋅ℙ(B|A)$. Therefore $ℙ(B|A)=\frac{ℙ(A,B)}{ℙ(A)}$.

Here $A$ is “the initial amino acid is valine” and $B$ is “the new amino acid is leucine” (or any other combination of amino acids).
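A small sketch of this formula applied to counts. The observations below are made-up illustrative numbers, not real data: of 200 sites, 100 start as valine, and 6 of those become leucine, so $ℙ(A,B)=6/200$, $ℙ(A)=100/200$ and $ℙ(B|A)=0.06$. The helper names are hypothetical.

```python
from collections import Counter

# Hypothetical observations (illustrative only):
# (amino acid at the start, amino acid a short time later) for 200 sites.
pairs = (
    [("V", "V")] * 90 + [("V", "L")] * 6 + [("V", "I")] * 4
    + [("L", "L")] * 95 + [("L", "V")] * 5
)

joint_counts = Counter(pairs)                   # counts of (A, B)
initial_counts = Counter(a for a, _ in pairs)   # counts of A alone
total = len(pairs)


def p_joint(a: str, b: str) -> float:
    """ℙ(A, B): fraction of sites that start as a and end as b."""
    return joint_counts[(a, b)] / total


def p_initial(a: str) -> float:
    """ℙ(A): fraction of sites that start as a."""
    return initial_counts[a] / total


def p_conditional(b: str, a: str) -> float:
    """ℙ(B | A) = ℙ(A, B) / ℙ(A)."""
    return p_joint(a, b) / p_initial(a)


print(p_conditional("L", "V"))  # ℙ(new is leucine | initial is valine)
```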

## Estimating short-term probabilities

By comparing highly similar sequences, Margaret Dayhoff determined the short-term frequency of mutation for each pair of amino acids.

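This is a matrix, called PAM1 (“Point Accepted Mutations”), in which one time step corresponds to about one accepted point mutation per 100 amino acids. Each entry is the joint probability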

$ℙ(A\text{ at time }t, B\text{ at time }t+1)$

We denote this matrix by $P_1(A,B) = ℙ(A\text{ at time }t, B\text{ at time }t+1)$.

Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In *Atlas of Protein Sequence and Structure*. Washington, DC: National Biomedical Research Foundation, 1978. https://doi.org/10.1.1.145.4315.
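A simplified sketch of the counting step. It assumes each input is a gap-aligned pair of highly similar sequences and, because we do not know which sequence is ancestral, counts every aligned pair in both directions. Dayhoff's original procedure, which worked from phylogenetic trees of closely related proteins, is more involved; the function name `joint_pair_frequencies` is hypothetical. As an example it uses the two nearly identical sequences from the alignment above, which differ at 3 of their 60 positions.

```python
from collections import Counter


def joint_pair_frequencies(alignments: list[tuple[str, str]]) -> dict[tuple[str, str], float]:
    """Estimate P1(A, B) from aligned pairs of highly similar sequences (simplified)."""
    counts = Counter()
    for s1, s2 in alignments:
        if len(s1) != len(s2):
            raise ValueError("aligned sequences must have equal length")
        for a, b in zip(s1, s2):
            if a == "-" or b == "-":
                continue  # ignore gap positions
            # No known ancestor, so count the pair in both directions.
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


# Two nearly identical sequences from the alignment above.
seq1 = "GVLVVPNHRSTLDDPLMWGVLPWSMLLRPRLMRWSLGAAELCFTNAVTSSMSSLAQVLAT"
seq2 = "GVLVVPNHRSTLDDPLMWGTLPWSMLLRPRLMRWSLGAAELCFTNPVTSMMSSLAQVLAT"
p1 = joint_pair_frequencies([(seq1, seq2)])
print(p1[("V", "T")], p1[("T", "V")])
```

With a single pair most amino-acid combinations never occur, so a usable estimate has to pool many such alignments.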

## Calculating long-term evolution

Let’s make the matrix of conditional probabilities

$$
\begin{aligned}
M_1(A,B) &= ℙ(B\text{ at time }t+1 \mid A\text{ at time }t)\\
&= \frac{ℙ(A\text{ at time }t, B\text{ at time }t+1)}{ℙ(A\text{ at time }t)}
\end{aligned}
$$

We can build this matrix if we know $ℙ(A\text{ at time }t)$.

We can find that probability by counting the frequency of each amino acid.
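A minimal sketch of both steps, using a hypothetical two-letter toy alphabet and made-up $P_1$ values (not real PAM1 numbers). The function names `amino_acid_frequencies` and `conditional_matrix` are illustrative. The frequencies are counted from hypothetical sequences chosen to match the row sums of the toy $P_1$; in a real estimate both would come from the same data, which makes every row of $M_1$ sum to 1.

```python
from collections import Counter


def amino_acid_frequencies(sequences: list[str]) -> dict[str, float]:
    """Estimate ℙ(A at time t) by counting each residue across all sequences."""
    counts = Counter(aa for seq in sequences for aa in seq if aa != "-")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def conditional_matrix(
    p1: dict[tuple[str, str], float], freqs: dict[str, float]
) -> dict[tuple[str, str], float]:
    """M1(A, B) = P1(A, B) / ℙ(A): divide each row of P1 by the frequency of A."""
    return {(a, b): p / freqs[a] for (a, b), p in p1.items()}


# Hypothetical toy values (illustrative only, not real PAM1 entries).
p1 = {("V", "V"): 0.36, ("V", "L"): 0.04, ("L", "V"): 0.04, ("L", "L"): 0.56}
freqs = amino_acid_frequencies(["VVLLL", "VLVLL"])  # V: 4/10, L: 6/10
m1 = conditional_matrix(p1, freqs)
for a in ("V", "L"):
    print(a, {b: round(m1[(a, b)], 3) for b in ("V", "L")})
```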