Class 20: Definitions

Bioinformatics

Andrés Aravena

20 December 2021

Chromosome

A single molecule of double-stranded DNA

For assembly projects, plasmids are chromosomes

Fragment

Physical piece of DNA, part of a chromosome

Shotgun

Process of breaking the chromosomes into many fragments

The breaking points are randomly distributed
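
As a toy illustration of randomly distributed breaking points, the sketch below cuts a single copy of a sequence at uniformly random positions. The function name `shotgun`, the example sequence, and the number of breaks are assumptions made for illustration; a real shotgun experiment breaks many copies of the chromosome, so fragments from different copies overlap.

```python
import random

def shotgun(chromosome, n_breaks, rng):
    """Break one copy of a sequence at n_breaks random positions (toy model)."""
    # Pick distinct internal cut positions, uniformly at random, and sort them
    cuts = sorted(rng.sample(range(1, len(chromosome)), n_breaks))
    bounds = [0] + cuts + [len(chromosome)]
    # Each consecutive pair of bounds delimits one fragment
    return [chromosome[a:b] for a, b in zip(bounds, bounds[1:])]

rng = random.Random(1)  # fixed seed so the example is repeatable
chromosome = "ACGTTGCAAGCTTAGGCTAACGTA"
fragments = shotgun(chromosome, 4, rng)
print(fragments)
```

Joining the fragments in order gives back the original sequence, which is exactly the information that is lost in a real experiment and that assembly tries to recover.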

Sequencing

Process of transforming DNA fragments into one or more chromatograms, and then into reads

Base calling

Process of assigning DNA letters and quality values from the signal in the chromatograms

Read

Digital representation of a strand of a DNA fragment

Most of the time it is shorter than the fragment

Fragment length

Physical length of the fragment, in base pairs

Usually we know it approximately

Read length

Number of letters in the read with high quality

The low-quality beginning and end of the read are trimmed (discarded)
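
A minimal sketch of end trimming, assuming a read is stored as a string of bases plus a list of qualities. The function name `trim_read` and the threshold of 20 are illustrative assumptions, not values from the course; real trimming tools often use sliding windows instead of a per-base threshold.

```python
def trim_read(bases, qualities, min_quality=20):
    """Drop low-quality bases from both ends of a read (threshold is an assumption)."""
    start = 0
    end = len(bases)
    # Move the start forward while the leading bases are low quality
    while start < end and qualities[start] < min_quality:
        start += 1
    # Move the end backward while the trailing bases are low quality
    while end > start and qualities[end - 1] < min_quality:
        end -= 1
    return bases[start:end], qualities[start:end]

bases = "NNACGTACGTNN"
qualities = [2, 5, 30, 35, 40, 38, 12, 36, 33, 31, 8, 3]
trimmed, trimmed_q = trim_read(bases, qualities)
read_length = len(trimmed)
print(trimmed, read_length)
```

Only the ends are trimmed here: the base with quality 12 in the middle of the read is kept, and the read length is counted after trimming.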

Base quality

A number representing the confidence of the base-calling process in each base of the read

It is a non-negative integer, usually between 0 and 90
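
The slide does not say which scale is used. Assuming the widely used Phred convention, where quality Q corresponds to an error probability of 10^(-Q/10), the conversion can be sketched as follows. The function name `error_probability` is hypothetical.

```python
def error_probability(quality):
    """Probability that a base call is wrong, assuming Phred-scaled quality."""
    return 10 ** (-quality / 10)

for q in (0, 10, 20, 30, 40):
    print(q, error_probability(q))
```

Under this assumption, every 10 points of quality divide the error probability by 10.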

Assembly

Process of combining many reads to get one or more contigs that (hopefully) correspond to the real chromosome

Contig

Set of mutually overlapping reads that can be aligned to form a coherent layout

Consensus sequence

Sequence defined by the majority of the reads in a contig

The majority rule considers the base quality
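
One simple way to make the majority rule consider base quality is to let each read vote with a weight equal to its quality. This weighting is an assumption chosen for illustration; the slides do not specify it, and real assemblers use more elaborate models. Each column below lists the (base, quality) pairs that the aligned reads report at one position of the contig.

```python
from collections import defaultdict

def consensus_base(column):
    """Pick the base with the largest total quality in one alignment column."""
    weight = defaultdict(int)
    for base, quality in column:
        weight[base] += quality
    return max(weight, key=weight.get)

def consensus(columns):
    """Build the consensus sequence column by column."""
    return "".join(consensus_base(column) for column in columns)

columns = [
    [("A", 40), ("A", 35), ("A", 30)],
    [("C", 10), ("C", 12), ("G", 40)],  # two low-quality C against one high-quality G
    [("T", 38), ("T", 36)],
]
print(consensus(columns))
```

In the second column a plain count would choose C, but the quality-weighted vote chooses G.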

Depth

Number of reads that report a base pair

It is a property of the base pair
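
A minimal sketch of depth as a per-base property, assuming each read has already been placed on the sequence as a (start, length) pair with 0-based positions. The function name `depth_per_base` and the example placements are hypothetical.

```python
def depth_per_base(genome_length, reads):
    """Count, for every position, how many reads cover it."""
    depth = [0] * genome_length
    for start, length in reads:
        # Clip reads that would run past the end of the sequence
        for pos in range(start, min(start + length, genome_length)):
            depth[pos] += 1
    return depth

reads = [(0, 5), (3, 5), (4, 4)]
depth = depth_per_base(10, reads)
print(depth)
```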

Average Depth

Average of the base pair depth over all base pairs

It is a property of the sequencing project

It can be estimated before the project starts, from the planned number of reads, the read length, and the genome size
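
A common way to estimate the average depth in advance is to divide the total number of sequenced bases (number of reads times read length) by the genome size. The sketch below compares that estimate with the average computed from per-base depths; the function names and all numbers are illustrative assumptions.

```python
def expected_average_depth(n_reads, read_length, genome_length):
    """Planned average depth: total sequenced bases over genome size."""
    return n_reads * read_length / genome_length

def observed_average_depth(depth):
    """Average of the per-base depth over all base pairs."""
    return sum(depth) / len(depth)

# Planning: 50,000 reads of 500 bases on a 5 Mbp genome (made-up numbers)
expected = expected_average_depth(n_reads=50_000, read_length=500,
                                  genome_length=5_000_000)

# After assembly: per-base depth of a tiny 10 bp example
observed = observed_average_depth([1, 1, 1, 2, 3, 2, 2, 2, 0, 0])
print(expected, observed)
```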

Coverage

It can mean two things

  • Depth of coverage

  • Breadth of coverage

Depth of Coverage

a.k.a. Coverage Depth

It is the same as average depth

Breadth of Coverage

Percentage of the base pairs in the genome that have a depth of at least 1

(Sometimes we increase that threshold to a larger number)
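
A minimal sketch of breadth of coverage computed from a list of per-base depths, with the threshold as a parameter. The function name `breadth_of_coverage` and the example depths are hypothetical.

```python
def breadth_of_coverage(depth, min_depth=1):
    """Percentage of positions whose depth is at least min_depth."""
    covered = sum(1 for d in depth if d >= min_depth)
    return 100 * covered / len(depth)

depth = [1, 1, 1, 2, 3, 2, 2, 2, 0, 0]
print(breadth_of_coverage(depth))               # threshold 1
print(breadth_of_coverage(depth, min_depth=2))  # stricter threshold
```

Raising the threshold can only keep the breadth the same or lower it, since fewer positions qualify as covered.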