Class 11: qPCR Normalization

Systems Biology

Andrés Aravena, PhD

November 23, 2021

This class is out of order

It seems a good time to speak about qPCR

Polymerase Chain Reaction

The Polymerase Chain Reaction (PCR) is a method used to synthesize millions of copies of a given DNA sequence.

A typical PCR reaction consists of a series of cycles, each with three steps:

template DNA denaturation,
primer annealing, and
extension of the annealed primers by DNA polymerase.

This loop is repeated between 25 and 30 times
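
As a rough check of the "millions of copies" claim, here is a minimal sketch that assumes ideal doubling in every cycle and a single starting molecule (both are simplifications introduced here, not values from the slides):

```r
# Ideal PCR: the number of copies doubles on every cycle
cycles <- c(25, 30)
copies <- 2^cycles   # copies obtained from one starting molecule
copies
```

Under these assumptions, 25 cycles give about 33 million copies and 30 cycles give about 1 billion.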

What is the question?

Why do you want to use qPCR?

qPCR can answer several questions

Presence of a specific mRNA molecule in the sample
Concentration of that mRNA in the sample
Change in concentration of that mRNA between two samples

(you can replace “mRNA” with other keywords)

Always state the question you want to answer

Questions are more important than answers

How to bake your cake

Use the following ingredients

Template: the thing we want to study
Primers for that template, in high concentration
dNTP, in high concentration
Taq polymerase
Some marker, such as a fluorophore
Salt & pepper (Na+, K+, Mg++)
Cooking machine: thermocycler with sensors

The thermocycler is a computer.
You program it to cook your cake

Thermocyclers are kitchen robots

PCR model

We simplify and we forget about the polymerase and the dNTP

They both will be represented by primers

The system is represented by this diagram:

How does it work

qPCR depends on initial concentration

The curve depends on the initial DNA concentration
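
A minimal simulation can illustrate this dependence. The model below is a hypothetical simplification, not necessarily the one behind the slides' figures: each cycle, the template copies itself with an efficiency equal to the fraction of primers still available, and each new copy consumes one primer. The function name `simulate_pcr` and all numeric values are assumptions made for this sketch.

```r
# Hypothetical discrete PCR model with primer depletion
simulate_pcr <- function(x0, primers = 1, cycles = 40) {
  x <- numeric(cycles + 1)
  x[1] <- x0                 # template concentration at cycle 0
  p <- primers               # primers still available
  for (i in seq_len(cycles)) {
    # amplification slows down as primers run out; never use more than p
    new_copies <- min(x[i] * p / primers, p)
    x[i + 1] <- x[i] + new_copies
    p <- p - new_copies
  }
  x
}

initial <- c(1e-3, 1e-4, 1e-5)          # three assumed initial concentrations
curves <- sapply(initial, simulate_pcr) # one column per initial concentration
# First cycle where each curve exceeds half of the primer pool
ct <- apply(curves, 2, function(x) which(x > 0.5)[1] - 1)
ct
```

In this sketch, a lower initial concentration shifts the whole curve to the right, so the signal crosses any fixed level at a later cycle.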

Finding the initial concentration

We care only about the exponential phase

The signal doubles on every cycle

\[X(C) = X(0)⋅2^C\]

So we can find the initial concentration

\[X(0) = X(C)⋅2^{-C}\]

CT: cycle where Signal crosses threshold

DNA concentration crosses 50% at 13.73 cycles

Standard curve: CT changes with concentration

Start with a large concentration of template, and dilute it several times. Measure the CT of each dilution
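
The following sketch shows what such a dilution series looks like in the ideal case. The threshold value, the starting concentration and the ten-fold dilution factor are assumptions chosen for illustration, and the CT is obtained by solving threshold = initial ⋅ 2^CT.

```r
threshold <- 0.5                      # assumed threshold concentration (arbitrary units)
initial <- 1e-3 * 10^(-(0:4))         # hypothetical ten-fold dilution series
ct <- log2(threshold / initial)       # cycle at which each dilution reaches the threshold
step <- diff(ct)                      # CT increase between consecutive dilutions
step
```

With perfect doubling, every ten-fold dilution delays the CT by log2(10), a little over 3.3 cycles.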

Standard curve gives concentration

If everything is right, we get

\[X(0) = X(CT)⋅2^{-CT}\]

But sometimes we do not know \(X(CT)\)

because the signal is not exactly the DNA concentration

It depends on the fluorophore assimilation

Delta CT

We still can measure change of concentration

Let’s say we extract mRNA before and after a shock

\[\begin{aligned} X_B(0) & = X_B(CT_B)⋅2^{-CT_B}\\ X_A(0) &= X_A(CT_A)⋅2^{-CT_A}\end{aligned}\]

therefore the fold change of expression is

\[\frac{X_B(0)}{X_A(0)} = \frac{X_B(CT_B)⋅2^{-CT_B}}{X_A(CT_A)⋅2^{-CT_A}}\]

Fold change of expression

If we assume that the DNA concentrations are the same when the signal crosses the threshold, i.e.

\[X_B(CT_B)=X_A(CT_A)\] then \[\frac{X_B(0)}{X_A(0)} = 2^{-(CT_B-CT_A)} = 2^{-Δ CT}\]

Here \(Δ CT\) means the change in CT for one gene in two conditions

Technical correction

Why does the CT change?

This change of concentration has two components

The real biological change
The variability of the RNA extraction protocol

To avoid the second component, we use an endogenous reference

(typically, a housekeeping gene)

The reference must be taken together with the target

A single pipet, at the same time

Delta Delta CT

We normalize each sample

\[\begin{aligned} \frac{X_B(0)}{R_B(0)} &= \frac{X_B(CT_{BX})⋅2^{-CT_{BX}}}{R_B(CT_{BR})⋅2^{-CT_{BR}}}= K_B⋅ 2^{-(CT_{BX}-CT_{BR})}\\ \frac{X_A(0)}{R_A(0)} &= \frac{X_A(CT_{AX})⋅2^{-CT_{AX}}}{R_A(CT_{AR})⋅2^{-CT_{AR}}}= K_A⋅ 2^{-(CT_{AX}-CT_{AR})} \end{aligned}\]

\(K_B = X_B(CT_{BX})/R_B(CT_{BR})\) and \(K_A = X_A(CT_{AX})/R_A(CT_{AR})\) are constants that depend on the target and reference genes, and on how each signal changes with concentration

Ratio of relative expressions

\[\frac{X_B(0)}{R_B(0)}÷\frac{X_A(0)}{R_A(0)} = \frac{K_B}{K_A} ⋅ 2^{-(Δ CT_B-Δ CT_A)}\]

Here \(Δ CT_B = CT_{BX}-CT_{BR}\) and \(Δ CT_A = CT_{AX}-CT_{AR}\) are the differences between target and reference within each condition

We can assume that \(K_B=K_A,\) because we are comparing the same pair of genes every time

In that case the change in relative expression is

\[\frac{X_B(0)}{R_B(0)}÷\frac{X_A(0)}{R_A(0)} = 2^{-(\Delta CT_B-Δ CT_A)}\]
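
A short numeric sketch of this formula, using made-up CT values (B is before, A is after, X is the target and R is the reference; none of these numbers come from real data):

```r
# Hypothetical CT values for target (X) and reference (R), before (B) and after (A)
CT <- c(BX = 24.0, BR = 18.0, AX = 22.5, AR = 18.5)

delta_B <- CT[["BX"]] - CT[["BR"]]    # target minus reference, before
delta_A <- CT[["AX"]] - CT[["AR"]]    # target minus reference, after
ratio <- 2^(-(delta_B - delta_A))     # normalized expression before / after
ratio
```

With these values the ratio is 0.25, meaning that the normalized expression is four times higher after the shock than before.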

Log fold change

It is usually a good idea to take logarithms

Using \(\log_2\) we get log fold change, which can be written as

\[\begin{aligned} Δ CT_A - Δ CT_B &= \overbrace{(CT_{AX}-CT_{AR})}^{\text{after}} - \overbrace{(CT_{BX}-CT_{BR})}^{\text{before}}\\ &=\underbrace{(CT_{AX}-CT_{BX})}_{\text{target}} - \underbrace{(CT_{AR}-CT_{BR})}_{\text{reference}} \end{aligned}\]

This works very well with a linear model
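
One possible way to set up such a linear model is sketched below. The data frame, its column names and the CT values are all hypothetical. With `reference` and `before` as baseline levels, the interaction coefficient of `CT ~ gene * condition` equals \(-(Δ CT_B-Δ CT_A)\), the exponent in the ratio of relative expressions.

```r
# Hypothetical triplicate CT measurements
qpcr <- data.frame(
  condition = factor(rep(c("before", "after"), each = 6),
                     levels = c("before", "after")),
  gene = factor(rep(rep(c("reference", "target"), each = 3), times = 2),
                levels = c("reference", "target")),
  CT = c(18.0, 18.1, 17.9,   # before, reference
         24.0, 24.2, 23.8,   # before, target
         18.5, 18.4, 18.6,   # after, reference
         22.5, 22.6, 22.4)   # after, target
)

fit <- lm(CT ~ gene * condition, data = qpcr)
# Interaction term: (target - reference) after minus (target - reference) before
log_ratio <- coef(fit)[["genetarget:conditionafter"]]
log_ratio
```

An advantage of this formulation is that `confint(fit)` also gives a confidence interval for the log ratio, using the replicates to estimate the measurement error.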

We can change the order of deltas

The reordering of deltas is equivalent to \[\frac{X_B(0)}{R_B(0)}÷\frac{X_A(0)}{R_A(0)} = \frac{X_B(0)}{R_B(0)}⋅ \frac{R_A(0)}{X_A(0)} = \frac{X_B(0)}{X_A(0)}÷\frac{R_B(0)}{R_A(0)}\]

In other words, the ratio of normalized values is also the ratio between ratios of change

(I think the formula is more clear than the text)

Primers’ Efficiency

Primers may not be 100% efficient

In that case the standard curve slope changes

Corrected formula

If the primer efficiency is \(E_X,\) the correct formula is

\[\frac{X_B(0)}{X_A(0)} = \frac{X_B(CT_{BX})⋅(1+E_X)^{-CT_{BX}}}{X_A(CT_{AX})⋅(1+E_X)^{-CT_{AX}}} =(1+E_X)^{-(CT_{BX}-CT_{AX})}\]

The efficiencies may be different for the housekeeping gene

\[\frac{R_B(0)}{R_A(0)} = \frac{R_B(CT_{BR})⋅(1+E_R)^{-CT_{BR}}}{R_A(CT_{AR})⋅(1+E_R)^{-CT_{AR}}} =(1+E_R)^{-(CT_{BR}-CT_{AR})}\]

If any efficiency is not 100%

\[\frac{X_B(0)}{X_A(0)}÷\frac{R_B(0)}{R_A(0)}= \frac{(1+E_X)^{-(CT_{BX}-CT_{AX})}}{(1+E_R)^{-(CT_{BR}-CT_{AR})}}\]

The log ratio is \[F_X(CT_{BX}-CT_{AX}) - F_R(CT_{BR}-CT_{AR})\] where \(F_X=-\log_2 (1+E_X)\) and \(F_R=-\log_2 (1+E_R)\) are the slopes of the corresponding standard curves
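
The effect of the correction can be seen in a small sketch. The function name `corrected_ratio`, the CT values and the efficiencies are assumptions made for illustration; the function computes the ratio of the two efficiency-corrected fold changes, target over reference.

```r
# Efficiency-corrected ratio: target fold change divided by reference fold change
corrected_ratio <- function(E_X, E_R, CT_BX, CT_AX, CT_BR, CT_AR) {
  target    <- (1 + E_X)^(-(CT_BX - CT_AX))
  reference <- (1 + E_R)^(-(CT_BR - CT_AR))
  target / reference
}

# Same hypothetical CT values, with and without a less efficient reference primer
ideal <- corrected_ratio(E_X = 1, E_R = 1.0, 24.0, 22.5, 18.0, 18.5)
real  <- corrected_ratio(E_X = 1, E_R = 0.8, 24.0, 22.5, 18.0, 18.5)
c(ideal = ideal, real = real)
```

When both efficiencies are 100% the result matches the uncorrected \(2^{-(Δ CT_B-Δ CT_A)}\) value; with an 80% efficient reference primer the same CT values give a different ratio, so ignoring the efficiency would bias the result.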

Calculating the efficiencies

We know the dilutions, the CT values, and their relationship \[X(0) = X(C)⋅(1+E)^{-C}\] That is \[\text{initial}_i = \text{threshold}⋅(1+E)^{-CT_i}\] Taking logarithms, we have \[\log(\text{initial}_i) = \log(\text{threshold}) - \log(1+E)⋅CT_i\] Notice that the threshold concentration should be the same for all \(i\)

We care about the slope coefficient

We can find \(E\) using a linear model like \[\log(\text{initial}_i) = β_0 + β_1 ⋅ CT_i + e_i\]

With natural logarithms, fitting the linear model gives us \[β_1 = -\log(1+E)\] and from there we can find the primer efficiency \(E = \exp(-β_1) - 1\)
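
The procedure can be tested on simulated data where the efficiency is known. In this sketch the true efficiency, the threshold and the dilution series are assumed values, the data are noise-free, and natural logarithms are used so that the efficiency is recovered with `exp()`.

```r
true_E <- 0.8                         # assumed primer efficiency
threshold <- 0.5                      # assumed threshold concentration
initial <- 1e-3 * 2^(-(0:5))          # hypothetical two-fold dilution series

# CT of each dilution, solving threshold = initial * (1 + E)^CT
CT <- log(threshold / initial) / log(1 + true_E)

fit <- lm(log(initial) ~ CT)          # slope is -log(1 + E)
E_hat <- exp(-coef(fit)[["CT"]]) - 1  # recover the efficiency from the slope
E_hat
```

With real measurements the points do not fall exactly on a line, and the confidence interval of the slope translates into an interval for the efficiency.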

Results

For the first set of primers, we have

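# CT_X: standard-curve data for the first primer set (columns DNA_ini and CT)
model <- lm(log(DNA_ini) ~ CT, data = CT_X)
intervals <- confint(model)  # 95% confidence intervals for both coefficients
intervals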

             2.5 % 97.5 %
(Intercept)  0.288  0.332
CT          -0.694 -0.692

We get intervals for \(\beta_0\) and \(\beta_1.\) The efficiency is

exp(-intervals["CT",])-1

 2.5 % 97.5 % 
 1.000  0.997

The bounds are swapped because \(E\) decreases as \(\beta_1\) increases. In this case the real value is 100%

Second set of primers

For the second set of primers, we have

             2.5 % 97.5 %
(Intercept)  0.108  0.566
CT          -0.606 -0.582

The efficiency is in the interval

 2.5 % 97.5 % 
 0.833  0.790

The real value is 80%

It is recommended to always do the standard curve in triplicate

Choosing the Threshold

Threshold in the machine is not perfect

All our measuring devices have a margin of error.
There may be a small error measuring the signal.
That will affect the resulting CT value

Lower thresholds are more sensitive

When the initial concentration is small, the signal may not reach 50% in 30 or 40 cycles

Then we will use a lower threshold, and pay the price

Margin of Error

Since the curve is nearly flat at low signal values, a small error in the signal has a large impact on the estimated concentration

 Thr     CT  Concentration  Error.ratio
0.05  10.00         1025.1
0.06  10.21         1186.4       0.1573
0.50  13.73        13624.5
0.51  13.78        14054.3       0.0315

Thus, reading a signal of 0.06 instead of 0.05 results in an apparent increase of 16% in concentration, whereas reading 0.51 instead of 0.50 results in an error of only 3%
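
The error ratios can be checked directly from the concentration column of the table. This is only a sketch of the arithmetic: the values are copied from the table, so the results agree with it up to rounding.

```r
# Concentrations from the table, in the same row order
concentration <- c(1025.1, 1186.4, 13624.5, 14054.3)

# Relative change between each pair of neighbouring thresholds
error_ratio <- concentration[c(2, 4)] / concentration[c(1, 3)] - 1
names(error_ratio) <- c("0.05 vs 0.06", "0.50 vs 0.51")
error_ratio
```

The first ratio is about 0.157 and the second about 0.032, which are the 16% and 3% quoted above.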

References

Livak KJ, Schmittgen TD. “Analysis of relative gene expression data using real-time quantitative PCR and the 2(-Delta Delta C(T)) Method”. Methods. 2001 Dec; 25(4):402-8. doi: 10.1006/meth.2001.1262

Pfaffl, Michael W. “Relative Quantification.” In Real-Time PCR, Published by International University Line (Editor: T. Dorak), 64–82, 2007.