Class 20: Designing primers

Bioinformatics

Andrés Aravena

December 22, 2022

Conditions for primers

What is a good primer

It is a short sequence, 16-24 bp
It binds spontaneously to the target
It does not bind easily to other things
It works well with the other primer
It works well with the polymerase

In other words

It has to be thermodynamically stable
It has to be taxonomically specific

These two conditions imply that the sequence must be short, but not too short

Stability

should not form hairpins
should not form homodimers
should not form heterodimers
should be stable at the 3’ end
- so the polymerase can extend
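The dimer conditions can be screened with a crude first pass, sketched here in base R. It only asks whether the last few bases at the 3’ end of one primer are perfectly complementary to some stretch of another primer. The function names, the example primers, and the window size `k = 4` are assumptions made for illustration; real design tools use thermodynamic models instead.

```r
# Reverse complement of a DNA sequence given as a single string
rev_comp <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  paste(rev(chartr("ACGT", "TGCA", bases)), collapse = "")
}

# TRUE if the last k bases of primer `a` can pair perfectly with
# some stretch of primer `b` (crude heuristic, k is an assumption)
three_prime_pairs <- function(a, b, k = 4) {
  tail_a <- substr(a, nchar(a) - k + 1, nchar(a))
  grepl(rev_comp(tail_a), toupper(b), fixed = TRUE)
}

primer_fwd <- "AGGTCACTGACCTGAATTCC"  # made-up primers
primer_rev <- "TTGACCGTAGGCTAGGAATT"

three_prime_pairs(primer_fwd, primer_fwd)  # homodimer check
three_prime_pairs(primer_fwd, primer_rev)  # heterodimer check
```

A `TRUE` here is only a warning sign that the pair deserves a closer look.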

Specificity

should match the target organism only once
- even if there are some mismatches
should not match other organisms
- even if there are some mismatches
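Counting matches "even if there are some mismatches" can be sketched as a sliding window with a mismatch budget. This is a minimal base R illustration, assuming a made-up sequence and a hypothetical helper `count_matches`; it searches one strand only, and a real search would also scan the reverse complement and use an indexed tool.

```r
# Count positions where `primer` matches `target` with at most
# `max_mismatch` mismatches (Hamming distance over a sliding window)
count_matches <- function(primer, target, max_mismatch = 1) {
  p <- strsplit(toupper(primer), "")[[1]]
  tgt <- strsplit(toupper(target), "")[[1]]
  n_windows <- length(tgt) - length(p) + 1
  if (n_windows < 1) return(0L)
  hits <- 0L
  for (start in seq_len(n_windows)) {
    window <- tgt[start:(start + length(p) - 1)]
    if (sum(window != p) <= max_mismatch) hits <- hits + 1L
  }
  hits
}

genome <- "TTACGGATCCATTACGCATCCAGG"  # made-up sequence
count_matches("ACGGATCC", genome, max_mismatch = 0)
count_matches("ACGGATCC", genome, max_mismatch = 1)
```

A primer that matches once exactly but twice when one mismatch is allowed would fail the "only once" condition.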

Four possible outcomes

When we use a primer to detect a target, 4 things can happen.
The primer either

binds to the target (True Positive)
binds to something else (False Positive)
does not bind to the target, even if the target is there (False Negative)
does not bind to anything, and the target was not there (True Negative)

Evaluation

When we test a candidate primer, any of the four outcomes may happen

We measure the number of each case, using a database of all possible sequences

\(TP\): True Positive number
\(TN\): True Negative number
\(FP\): False Positive number
\(FN\): False Negative number

Sensitivity and Specificity

\[\begin{aligned} \text{Sensitivity}&=\frac{TP}{TP+FN}=\frac{\text{Detected targets}}{\text{All targets}}\\ \text{Specificity}&=\frac{TN}{TN+FP}=\frac{\text{Non-targets not detected}}{\text{All non-targets}} \end{aligned}\]
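As a small worked example, the four counts and the two ratios can be computed from two logical vectors, one entry per database sequence. The vectors are invented for illustration; `is_target` and `binds` are hypothetical names.

```r
# Hypothetical evaluation of one primer against a sequence database
is_target <- c(TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
binds     <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

TP <- sum(is_target & binds)    # binds to the target
FN <- sum(is_target & !binds)   # misses a target
FP <- sum(!is_target & binds)   # binds to something else
TN <- sum(!is_target & !binds)  # correctly does not bind

sensitivity <- TP / (TP + FN)
specificity <- TN / (TN + FP)
c(sensitivity = sensitivity, specificity = specificity)
```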

“Tell the truth, the whole truth, and nothing but the truth”

Tools

qPCR

Polymerase Chain Reaction

The Polymerase Chain Reaction (PCR) is a method used to synthesize millions of copies of a given DNA sequence.

A typical PCR reaction consists of a series of cycles, each with three steps:

template DNA denaturation,
primer annealing, and
extension of the annealed primers by DNA polymerase.

This loop is repeated between 25 and 30 times

What is the question?

Why do you want to use qPCR?

qPCR can answer several questions

Presence of a specific mRNA molecule in the sample
Concentration of that mRNA in the sample
Change in concentration of that mRNA between two samples

(you can replace “mRNA” with other keywords)

Always declare what question you want to answer

Questions are more important than answers

How to bake your cake

Use the following ingredients

Template: the thing we want to study
Primers for that template, in high concentration
dNTP, in high concentration
Taq polymerase
Some marker, such as a fluorophore
Salt & pepper (Na+, K+, Mg++)
Cooking machine: thermocycler with sensors

The thermocycler is a computer.
You program it to cook your cake

Thermocyclers are kitchen robots

PCR model

We simplify and we forget about the polymerase and the dNTP

They both will be represented by primers

The system is represented by this diagram:

How does it work

Long term behavior

Question:

Does the final DNA concentration depend on

initial DNA concentration?
initial primer concentration?
PCR reaction rate?

qPCR depends on initial concentration

The curve depends on the initial DNA concentration
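One way to explore these questions is a toy simulation. The model below is an assumption made for illustration, not necessarily the one in the lecture's diagram: on each cycle, the fraction of DNA molecules that get copied equals the fraction of primer still available, and each copy consumes one primer.

```r
# Toy PCR model (an assumption): growth slows down as primers run out
simulate_pcr <- function(dna0, primer0, cycles = 40) {
  dna <- numeric(cycles + 1)   # dna[1] is cycle 0
  dna[1] <- dna0
  primer <- primer0
  for (i in seq_len(cycles)) {
    new_copies <- min(dna[i] * primer / primer0, primer)
    dna[i + 1] <- dna[i] + new_copies
    primer <- primer - new_copies
  }
  dna
}

high <- simulate_pcr(dna0 = 1,    primer0 = 1000)
low  <- simulate_pcr(dna0 = 0.01, primer0 = 1000)

# Both runs end near the same plateau, set by the primers
tail(high, 1)
tail(low, 1)

# The diluted sample crosses half of the plateau several cycles later
which(high > 500)[1] - 1
which(low  > 500)[1] - 1
```

In this sketch the final amount tells us almost nothing about the initial DNA; the information is in when the curve rises.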

Finding the initial concentration

We care only about the exponential phase

The signal doubles on every cycle

\[X(C) = X(0)⋅2^C\]

So we can find the initial concentration

\[X(0) = X(C)⋅2^{-C}\]
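A quick numeric check of the two formulas, using made-up values for the cycle and the signal:

```r
# Signal measured at cycle C during the exponential phase (made-up numbers)
signal_at_C <- 0.2
C <- 12

# X(0) = X(C) * 2^(-C)
initial <- signal_at_C * 2^(-C)
initial

# Doubling `initial` C times gives the measured signal back
initial * 2^C
```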

CT: cycle where Signal crosses threshold

DNA concentration crosses 50% at 13.73 cycles
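CT is usually a fractional cycle, so it has to be interpolated between two measured cycles. A minimal sketch using linear interpolation follows; the helper `find_ct` and the sigmoid signal are assumptions for illustration, and real instruments may interpolate differently.

```r
# Cycle at which `signal` first crosses `threshold`, by linear
# interpolation between neighboring cycles; signal[1] is cycle 0
find_ct <- function(signal, threshold) {
  above <- which(signal >= threshold)
  if (length(above) == 0) return(NA_real_)  # never crosses
  i <- above[1]
  if (i == 1) return(0)                     # already above at cycle 0
  (i - 2) + (threshold - signal[i - 1]) / (signal[i] - signal[i - 1])
}

# Made-up sigmoid signal for cycles 0..40, with midpoint at cycle 15
cycles <- 0:40
signal <- 1 / (1 + 2^(15 - cycles))
find_ct(signal, 0.5)
find_ct(signal, 0.05)
```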

Standard curve: CT changes with concentration

Start with a large concentration of template, and dilute it several times. Measure the CT of each dilution

Standard curve gives concentration

If everything is right, we get

\[X(0) = X(CT)⋅2^{-CT}\]

But sometimes we do not know \(X(CT)\)

because the Signal is not 100% DNA concentration

It depends on the fluorophore assimilation

Delta CT

We still can measure change of concentration

Let’s say we extract mRNA before and after a shock

\[\begin{aligned} X_B(0) & = X_B(CT_B)⋅2^{-CT_B}\\ X_A(0) &= X_A(CT_A)⋅2^{-CT_A}\end{aligned}\]

therefore the fold change of expression is

\[\frac{X_B(0)}{X_A(0)} = \frac{X_B(CT_B)⋅2^{-CT_B}}{X_A(CT_A)⋅2^{-CT_A}}\]

Fold change of expression

If we assume that the DNA concentrations are the same when the signal crosses the threshold, i.e.

\[X_B(CT_B)=X_A(CT_A)\]

then

\[\frac{X_B(0)}{X_A(0)} = 2^{-(CT_B-CT_A)} = 2^{-Δ CT}\]

Here \(Δ CT = CT_B-CT_A\) is the change in CT for one gene between two conditions
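For example, with two made-up CT values for the same gene:

```r
CT_B <- 24.1  # made-up CT before the shock
CT_A <- 21.6  # made-up CT after the shock

delta_CT <- CT_B - CT_A
ratio_B_over_A <- 2^(-delta_CT)   # X_B(0) / X_A(0)
ratio_B_over_A
1 / ratio_B_over_A                # X_A(0) / X_B(0)
```

A lower CT after the shock means more template after the shock, so the ratio of before over after is below 1.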

Technical correction

Why does the CT change?

This change of concentration has two components

The real biological change
The variability of the RNA extraction protocol

To avoid the second component, we use an endogenous reference

(typically, a housekeeping gene)

The reference must be taken together with the target

A single pipet, at the same time

Delta Delta CT

We normalize each sample

\[\begin{aligned} \frac{X_B(0)}{R_B(0)} &= \frac{X_B(CT_{BX})⋅2^{-CT_{BX}}}{R_B(CT_{BR})⋅2^{-CT_{BR}}}= K_B⋅ 2^{-(CT_{BX}-CT_{BR})}\\ \frac{X_A(0)}{R_A(0)} &= \frac{X_A(CT_{AX})⋅2^{-CT_{AX}}}{R_A(CT_{AR})⋅2^{-CT_{AR}}}= K_A⋅ 2^{-(CT_{AX}-CT_{AR})} \end{aligned}\]

\(K_A\) and \(K_B\) are constants that depend on the target and reference genes, and how each Signal changes with concentration

Ratio of relative expressions

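\[\frac{X_B(0)}{R_B(0)}÷\frac{X_A(0)}{R_A(0)} = \frac{K_B}{K_A} ⋅ 2^{-(Δ CT_B-Δ CT_A)}\]

where \(Δ CT_B = CT_{BX}-CT_{BR}\) and \(Δ CT_A = CT_{AX}-CT_{AR}\) are the target-minus-reference CT differences in each sample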

We can assume that \(K_B=K_A,\) because we are comparing the same pair of genes every time

In that case the change in relative expression is

\[\frac{X_B(0)}{R_B(0)}÷\frac{X_A(0)}{R_A(0)} = 2^{-(\Delta CT_B-Δ CT_A)}\]
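A short numeric sketch of this formula, with invented CT values for a target gene X and a reference gene R in samples B and A:

```r
# Hypothetical CT values (first letter: sample, second letter: gene)
CT <- c(BX = 25.0, BR = 18.0, AX = 22.5, AR = 18.5)

delta_B <- CT[["BX"]] - CT[["BR"]]   # Delta CT in sample B
delta_A <- CT[["AX"]] - CT[["AR"]]   # Delta CT in sample A

ratio <- 2^(-(delta_B - delta_A))    # relative expression, B over A
ratio
log2(ratio)                          # equals delta_A - delta_B
```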

Log fold change

It is usually a good idea to take logarithms

Using \(\log_2\) we get log fold change, which can be written as

\[\begin{aligned} Δ CT_A - Δ CT_B &= \overbrace{(CT_{AX}-CT_{AR})}^{\text{after}} - \overbrace{(CT_{BX}-CT_{BR})}^{\text{before}}\\ &=\underbrace{(CT_{AX}-CT_{BX})}_{\text{target}} - \underbrace{(CT_{AR}-CT_{BR})}_{\text{reference}} \end{aligned}\]

This works very well with a linear model

We can change the order of deltas

The reordering of deltas is equivalent to \[\frac{X_B(0)}{R_B(0)}÷\frac{X_A(0)}{R_A(0)} = \frac{X_B(0)}{R_B(0)}⋅ \frac{R_A(0)}{X_A(0)} = \frac{X_B(0)}{X_A(0)}÷\frac{R_B(0)}{R_A(0)}\]

In other words, the ratio of normalized values is also the ratio between ratios of change

(I think the formula is clearer than the text)

Primers’ Efficiency

Primers may not be 100% efficient

In that case the standard curve slope changes

Corrected formula

If the primer efficiency is \(E_X,\) the correct formula is

\[\frac{X_B(0)}{X_A(0)} = \frac{X_B(CT_{BX})⋅(1+E_X)^{-CT_{BX}}}{X_A(CT_{AX})⋅(1+E_X)^{-CT_{AX}}} =(1+E_X)^{-(CT_{BX}-CT_{AX})}\]

The efficiencies may be different for the housekeeping gene

\[\frac{R_B(0)}{R_A(0)} = \frac{R_B(CT_{BR})⋅(1+E_R)^{-CT_{BR}}}{R_A(CT_{AR})⋅(1+E_R)^{-CT_{AR}}} =(1+E_R)^{-(CT_{BR}-CT_{AR})}\]

If any efficiency is not 100%

\[\frac{X_B(0)}{X_A(0)}÷\frac{R_B(0)}{R_A(0)}= \frac{(1+E_X)^{-(CT_{BX}-CT_{AX})}}{(1+E_R)^{-(CT_{BR}-CT_{AR})}}\]

The log ratio is

\[F_X(CT_{BX}-CT_{AX}) - F_R(CT_{BR}-CT_{AR})\]

where \(F_X=-\log_2 (1+E_X)\) and \(F_R=-\log_2 (1+E_R)\) are the slopes of the corresponding standard curves
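The efficiency-corrected ratio can be wrapped in a small function. This is a sketch with illustrative names and made-up CT values; `E = 1` stands for 100% efficiency, in which case the result reduces to the uncorrected formula.

```r
# Efficiency-corrected ratio of relative expression, B over A
corrected_ratio <- function(CT_BX, CT_AX, CT_BR, CT_AR, E_X = 1, E_R = 1) {
  target    <- (1 + E_X)^(-(CT_BX - CT_AX))
  reference <- (1 + E_R)^(-(CT_BR - CT_AR))
  target / reference
}

# Both primer pairs at 100% efficiency
corrected_ratio(25.0, 22.5, 18.0, 18.5)

# Target primers at 80% efficiency
corrected_ratio(25.0, 22.5, 18.0, 18.5, E_X = 0.8)
```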

Calculating the efficiencies

We know the dilutions, the CT values, and their relationship

\[X(0) = X(C)⋅(1+E)^{-C}\]

That is

\[\text{initial}_i = \text{threshold}⋅(1+E)^{-CT_i}\]

Taking logarithms, we have

\[\log(\text{initial}_i) = \log(\text{threshold}) - \log(1+E)⋅CT_i\]

Notice that the threshold concentration should be the same for all \(i\)

We care about the slope coefficient

We can find \(E\) using a linear model like \[\log_2(\text{initial}_i) = β_0 + β_1 ⋅ CT_i + e_i\]

Fitting the linear model gives us

\[β_1 = -\log_2(1+E)\]

and from there we can find the primer efficiency \(E = 2^{-β_1}-1\)

Results

For the first set of primers, we have

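# CT_X holds the dilution series: initial concentration (DNA_ini) and measured CT
model <- lm(log(DNA_ini) ~ CT, data = CT_X)
# 95% confidence intervals for the intercept and the slope
intervals <- confint(model)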

             2.5 % 97.5 %
(Intercept)  0.288  0.332
CT          -0.694 -0.692

We get intervals for \(\beta_0\) and \(\beta_1.\) This fit uses natural logarithms, so the efficiency is \(e^{-β_1}-1\)

exp(-intervals["CT",])-1

 2.5 % 97.5 %
 1.000  0.997

In this case the real value is 100%

Second set of primers

For the second set of primers, we have

             2.5 % 97.5 %
(Intercept)  0.108  0.566
CT          -0.606 -0.582

The efficiency is in the interval

 2.5 % 97.5 %
 0.833  0.790

The real value is 80%

It is recommended to always run the standard curve in triplicate
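The fits above use data that is not shown here. The following self-contained sketch simulates a dilution series in triplicate and recovers the efficiency with the same kind of model. All values (true efficiency, threshold, dilutions, noise level) and the name `CT_sim` are assumptions for illustration.

```r
set.seed(1)
E_true <- 0.8                              # assumed true efficiency
threshold <- 1                             # same threshold for every dilution
DNA_ini <- rep(10^(-(1:6)), each = 3)      # six ten-fold dilutions, in triplicate

# From initial = threshold * (1 + E)^(-CT), plus small measurement noise
CT <- log(threshold / DNA_ini) / log(1 + E_true) +
  rnorm(length(DNA_ini), sd = 0.05)
CT_sim <- data.frame(DNA_ini, CT)

# Natural logarithm in the model, so exp() is used to invert the slope
model_sim <- lm(log(DNA_ini) ~ CT, data = CT_sim)
E_hat <- exp(-coef(model_sim)[["CT"]]) - 1
E_hat
exp(-confint(model_sim)["CT", ]) - 1
```

The estimate should land close to the assumed 80%, and the replicates are what makes the confidence interval narrow.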

Choosing the Threshold

Threshold in the machine is not perfect

All our measuring devices have a margin of error.
There may be a small error measuring the signal.
That will affect the resulting CT value

Lower thresholds are more sensitive

When the initial concentration is small, the signal may not reach 50% in 30 or 40 cycles

Then we will use a lower threshold, and pay the price

Margin of Error

Since the curve is nearly flat at low signal values, a small error in the signal has a large impact on the estimated concentration

Thr	CT	Concentration	Error.ratio
0.05	10.00	1025.1
0.06	10.21	1186.4	0.1573
0.50	13.73	13624.5
0.51	13.78	14054.3	0.0315

Thus, a signal error of 0.01 near threshold 0.05 (0.05 versus 0.06) produces an apparent increase of 16% in concentration, while the same error near threshold 0.5 (0.50 versus 0.51) produces an error of only 3%
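The error ratios can be recomputed from the concentrations in the table. Because the table values are rounded, the last digit may differ slightly.

```r
# Concentrations from the table, for thresholds 0.05, 0.06, 0.50, 0.51
conc <- c(1025.1, 1186.4, 13624.5, 14054.3)

# Relative change in estimated concentration for a 0.01 error in the signal
error_low  <- conc[2] / conc[1] - 1   # around threshold 0.05
error_high <- conc[4] / conc[3] - 1   # around threshold 0.50
c(low = error_low, high = error_high)
```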

References

Livak KJ, Schmittgen TD. “Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method”. Methods. 2001 Dec; 25(4):402-8. doi: 10.1006/meth.2001.1262

Pfaffl, Michael W. “Relative Quantification.” In Real-Time PCR, Published by International University Line (Editor: T. Dorak), 64–82, 2007.