Class 16: Mapping Reads to Reference

Bioinformatics

Andrés Aravena

December 8, 2022

What is the question?

We have reads that are closely related to a reference genome

For example, messenger RNA (mRNA)

Or DNA from a new individual from the same species

Or DNA from a species in the same genus

Or reads that we want to map to an assembled genome

Why do we want to do this?

The answer depends on the case

  • With messenger RNA, we measure gene expression
  • With DNA from a new individual, we identify polymorphisms
    • They may explain some genetic traits
  • For a new species of the same genus, this can be used instead of de novo assembly
  • For a genome assembled with de Bruijn graphs, we can find which contig each read belongs to

What tools can we use to align reads to a genome?

There are several tools for the same goal

Three tools that are popular today are:

  • bwa: Burrows-Wheeler Alignment Tool
  • bowtie and bowtie2
  • hisat and hisat2

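They share a similar philosophy: first build an index of the reference based on the Burrows-Wheeler transform, then align the reads against that index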

Can you find others?

Using bwa

First, bwa builds an index of the reference genome

bwa index ref.fa

Then it aligns the reads to the genome

bwa mem ref.fa reads.fq > aln-se.sam

The output uses the extension .sam (Sequence Alignment/Map). These are large, tab-separated text files
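Because .sam files are plain text, standard command-line tools can read them. The sketch below is only an illustration: the three records are written by hand (they are not real bwa output), and the file and variable names are hypothetical. It relies on two facts from the SAM format: header lines start with "@", and column 2 holds the FLAG, where the bit with value 4 means "read unmapped".

```shell
# Toy SAM file written by hand for illustration; it is NOT real bwa output.
sam=$(mktemp)
printf '@SQ\tSN:chr1\tLN:1000\n' > "$sam"
printf 'read1\t0\tchr1\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$sam"
printf 'read2\t16\tchr1\t250\t60\t8M\t*\t0\t0\tTTGACCAA\tIIIIIIII\n' >> "$sam"
printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\tGGGGCCCC\tIIIIIIII\n' >> "$sam"

# Skip header lines (they start with "@").
# int(FLAG / 4) % 2 extracts the "unmapped" bit without needing gawk's and().
mapped=$(awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 0 { n++ } END { print n + 0 }' "$sam")
unmapped=$(awk -F'\t' '!/^@/ && int($2 / 4) % 2 == 1 { n++ } END { print n + 0 }' "$sam")

echo "mapped=$mapped unmapped=$unmapped"
```

On a real alignment the same two awk commands can be pointed at aln-se.sam instead of the toy file.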

To save space, .sam files are often converted to the smaller binary .bam format
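One common way to do this conversion is with samtools. That tool is not part of these slides, so treat its use here as an assumption. The sketch below writes a hand-made toy .sam file (hypothetical names, not real bwa output) and converts it only if samtools is installed. It assumes a samtools 1.x release, which detects SAM input automatically.

```shell
# Hypothetical toy input written by hand; it is NOT real bwa output.
workdir=$(mktemp -d)
printf '@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:chr1\tLN:1000\n' > "$workdir/toy.sam"
printf 'read1\t0\tchr1\t101\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n' >> "$workdir/toy.sam"

# Assumption: samtools 1.x may or may not be installed, so check first.
if ! command -v samtools >/dev/null 2>&1; then
    status="skipped"      # samtools not found, nothing converted
elif samtools view -b -o "$workdir/toy.bam" "$workdir/toy.sam"; then
    status="converted"    # -b requests BAM output, -o names the output file
else
    status="failed"
fi

echo "$status"
```

For a file this tiny the .bam may not be smaller than the .sam. The saving shows up on real alignments with many reads.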