The number of observed differences is not proportional to time
Multiple substitutions of the same base cannot be observed
GLMTVMNHMSMVDDPLVWATLPYKLFTSLDNIRWSLGAHNICFQNKFLANFFSLGQVLST
GVLVVPNHRSTLDDPLMWGVLPWSMLLRPRLMRWSLGAAELCFTNAVTSSMSSLAQVLAT
GVLVVPNHRSTLDDPLMWGTLPWSMLLRPRLMRWSLGAAELCFTNPVTSMMSSLAQVLAT
GLITVSNHQSCMDDPHLWGILKLRHIWNLKLMRWTPAAADICFTKELHSHFFSLGKCVPV
So we underestimate the divergence time
BLAST hits for Taz1 (Saccharomyces cerevisiae, QHB12384.1) in RefSeq select proteins
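The saturation effect described above can be illustrated with a small simulation. This is only a sketch under a strong simplifying assumption: every substitution hits a uniformly random site and replaces the residue with a uniformly random different amino acid. The function names `evolve` and `observed_differences` are invented for this example, and the ancestor is simply one of the aligned sequences shown above.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def evolve(sequence, n_substitutions, rng):
    """Apply n_substitutions random point substitutions (assumed uniform model)."""
    seq = list(sequence)
    for _ in range(n_substitutions):
        site = rng.randrange(len(seq))
        # Replace the residue with a different amino acid, chosen uniformly.
        seq[site] = rng.choice([a for a in AMINO_ACIDS if a != seq[site]])
    return "".join(seq)

def observed_differences(a, b):
    """Count the sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)  # fixed seed so the run is reproducible
ancestor = "GVLVVPNHRSTLDDPLMWGVLPWSMLLRPRLMRWSLGAAELCFTNAVTSSMSSLAQVLAT"

for n in (5, 20, 60, 200, 1000):
    descendant = evolve(ancestor, n, rng)
    print(n, "substitutions ->", observed_differences(ancestor, descendant), "observed differences")
```

The observed count can never exceed the sequence length, while the true number of substitutions keeps growing with time, so counting differences alone underestimates divergence.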
We know that \[ℙ(A,B)=ℙ(A)⋅ℙ(B|A)\] Therefore \[ℙ(B|A)=\frac{ℙ(A,B)}{ℙ(A)}\]
Here \(A\) is “initial amino acid is Valine” and \(B\) is “new amino acid is Leucine” (or any other combination of amino acids).
By comparing highly similar sequences, Margaret Dayhoff determined the short-term mutation frequencies for each pair of amino acids.
This is a matrix, called PAM1 (“Point Accepted Mutations”), representing
\[ℙ(A\text{ at time }t, B\text{ at time }t+1)\]
We can write it as a matrix \[P_1 (A,B) = ℙ(A\text{ at time }t, B\text{ at time }t+1)\]
Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In Atlas of Protein Sequence and Structure. Washington, DC: National Biomedical Research Foundation, 1978. https://doi.org/10.1.1.145.4315.
Let’s make the matrix of conditional probabilities \[ \begin{aligned} M_1(A,B)=&ℙ( B\text{ at time }t+1|A\text{ at time }t)\\ =& \frac{ℙ(A\text{ at time }t, B\text{ at time }t+1)}{ℙ(A\text{ at time }t)} \end{aligned} \]
We can build this matrix if we know \(ℙ(A\text{ at time }t)\)
We can find that probability by counting the frequency of each amino acid.
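As a minimal sketch of this construction, the snippet below divides each row of a joint matrix by the frequency of the initial amino acid. The three-letter alphabet and all numbers are invented for illustration and are not Dayhoff's data. The helper names `amino_acid_frequencies` and `conditional_matrix` are hypothetical, and the frequencies are taken here as the row sums of the toy joint matrix, which is an assumption standing in for counting amino acids in real sequences.

```python
# Toy joint probabilities P1[a][b] = P(a at time t, b at time t+1).
# Alphabet and values are made up for illustration; all entries sum to 1.
P1 = {
    "V": {"V": 0.380, "L": 0.015, "I": 0.005},
    "L": {"V": 0.015, "L": 0.330, "I": 0.005},
    "I": {"V": 0.005, "L": 0.005, "I": 0.240},
}

def amino_acid_frequencies(joint):
    """P(a at time t), computed here as the row sums of the joint matrix."""
    return {a: sum(row.values()) for a, row in joint.items()}

def conditional_matrix(joint, freq):
    """M1[a][b] = P(b at time t+1 | a at time t) = P1[a][b] / P(a at time t)."""
    return {
        a: {b: p / freq[a] for b, p in row.items()}
        for a, row in joint.items()
    }

freq = amino_acid_frequencies(P1)
M1 = conditional_matrix(P1, freq)

for a, row in M1.items():
    print(a, {b: round(p, 4) for b, p in row.items()})
```

Each row of `M1` sums to 1, because given the initial amino acid, the new amino acid must be one of the possible outcomes (including staying the same).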