Blog of Andrés Aravena

Homework 4 (Practical)

06 December 2019

Genome Assembly

Using all reads and SPAdes

  • Reads are on the server, at /home/bioinfo/reads/4-F20-96_S2_L001_R?_001.fastq.gz
  • quality control using fastqc
    • put the results in public_html folder, so you can see them on-line
  • clean and trim using some tool like trimmomatic
    • Suggest some alternative
  • quality control again using fastqc. Is it better?
  • assemble using SPAdes
  • visualize using Bandage
  • Calculate N50
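
N50 is the contig length L such that contigs of length L or longer hold at least half of all assembled bases. A minimal sketch in Python, assuming the assembly is a plain FASTA file (the file name contigs.fasta in the usage note is an assumption about where your assembler wrote its output):

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half of the
        # total assembly size defines N50.
        if running >= half:
            return length
    return 0


def fasta_lengths(path):
    """Yield the length of every sequence in a FASTA file."""
    length = None
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line.strip())
    if length is not None:
        yield length
```

For example, `n50(fasta_lengths("contigs.fasta"))` gives the N50 of that file. The same function works for the phrap assembly if you point it at the contigs file phrap produced.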

Using phrap

  • Convert all reads from fastq format to fasta and qual. It can be done using Python
  • assemble using phrap
  • visualize using consed
  • calculate N50
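
The conversion from fastq to fasta and qual can be sketched with the standard library only. This assumes four lines per record and Phred+33 quality encoding (typical for recent Illumina reads, but check your fastqc report). The function name and the output file names are illustrative, not part of any tool.

```python
import gzip


def open_text(path):
    """Open a plain or gzip-compressed file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def fastq_to_fasta_qual(fastq_path, fasta_path, qual_path, offset=33):
    """Write FASTA and QUAL files from a FASTQ file; return the read count.

    Assumes four lines per record and qualities encoded as
    chr(score + offset), with offset 33 by default.
    """
    count = 0
    with open_text(fastq_path) as fq, \
            open(fasta_path, "w") as fa, \
            open(qual_path, "w") as ql:
        while True:
            header = fq.readline().rstrip("\r\n")
            if not header:
                break  # end of file
            seq = fq.readline().rstrip("\r\n")
            fq.readline()  # the '+' separator line
            quals = fq.readline().rstrip("\r\n")
            if not header.startswith("@") or len(seq) != len(quals):
                raise ValueError(f"malformed record near read {count + 1}")
            name = header[1:]
            scores = " ".join(str(ord(c) - offset) for c in quals)
            fa.write(f">{name}\n{seq}\n")
            ql.write(f">{name}\n{scores}\n")
            count += 1
    return count
```

A call like `fastq_to_fasta_qual("reads.fastq.gz", "reads.fasta", "reads.fasta.qual")` writes both files. As far as I know phrap looks for the quality file under the FASTA name plus ".qual", but check the phrap documentation. Also note that forward and reverse reads usually share the same name, so you may need to add a suffix to tell them apart.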

Using reference sequence

  • Terje suggested that this plasmid could be similar to NC_025175. Test this hypothesis by aligning all reads to this sequence.
    • Use bowtie, bowtie2, bwa mem and bwa aln
    • all these produce SAM files
    • are all results the same? How can you compare them?
    • Which alignment is “the best”?
  • prepare fastq files containing only the reads that align to the plasmid
  • assemble these reads using phrap

Analysis of alignment

  • what is the coverage of each nucleotide of the reference plasmid?
    • are there any missing regions? What genes are there?
    • are there any regions with atypical coverage? What genes are there?
  • Which reads are
    • only in the plasmid
    • only in the chromosome
    • in both plasmid and chromosome
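
Per-nucleotide coverage can be computed from a SAM file by walking the CIGAR string of every mapped read. The following is a rough sketch, not a replacement for a dedicated tool: it assumes you know the reference name and length (both are in the SAM header), and it does not count deleted bases as covered. Function names are illustrative. The last function classifies reads with set operations, given the sets of read names that aligned to the plasmid and to the chromosome.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def coverage_from_sam(sam_path, ref_name, ref_length):
    """Return a list with the read depth at each reference position."""
    cov = [0] * ref_length
    with open(sam_path) as handle:
        for line in handle:
            if line.startswith("@"):
                continue  # header line
            f = line.rstrip("\n").split("\t")
            if len(f) < 11:
                continue
            flag, rname, pos, cigar = int(f[1]), f[2], int(f[3]), f[5]
            if flag & 0x4 or rname != ref_name or cigar == "*":
                continue  # unmapped, or aligned to another sequence
            ref_pos = pos - 1  # SAM positions are 1-based
            for count, op in CIGAR_OP.findall(cigar):
                count = int(count)
                if op in "M=X":  # aligned bases: count them
                    for i in range(ref_pos, min(ref_pos + count, ref_length)):
                        cov[i] += 1
                    ref_pos += count
                elif op in "DN":  # gap in the read: skip, do not count
                    ref_pos += count
                # I, S, H and P do not advance along the reference
    return cov


def zero_coverage_regions(cov):
    """Return (start, end) intervals, 1-based inclusive, with depth 0."""
    regions, start = [], None
    for i, depth in enumerate(cov):
        if depth == 0 and start is None:
            start = i
        elif depth != 0 and start is not None:
            regions.append((start + 1, i))
            start = None
    if start is not None:
        regions.append((start + 1, len(cov)))
    return regions


def classify_reads(plasmid_names, chromosome_names):
    """Split read names into plasmid only, chromosome only, and both."""
    return {
        "plasmid_only": plasmid_names - chromosome_names,
        "chromosome_only": chromosome_names - plasmid_names,
        "both": plasmid_names & chromosome_names,
    }
```

The intervals returned by `zero_coverage_regions` can then be compared with the gene coordinates in the annotation of the reference. For atypical coverage you could compare each position with the mean or median depth.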

Multiple sequence alignment

We want to recreate some of the phylogenetic trees from the paper

Aas, Jørn A, Bruce J Paster, Lauren N Stokes, Ingar Olsen, and Floyd E Dewhirst. “Defining the Normal Bacterial Flora of the Oral Cavity.” Journal of Clinical Microbiology 43, no. 11 (2005): 5721–32.

This paper has 8 figures. You can replicate any of them. The accession IDs of the sequences used in each figure can be found in the following files. Thanks to Reyhan Aydın for doing the manual labor.


You can download all of them at once in a zip file:

Some programs can work directly with accession numbers. Others will need the sequences in FASTA format. You will need to download them. It is better to download only the sequences you need for the figure you are making.
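
One way to download only the sequences you need is the NCBI E-utilities efetch service. The sketch below assumes the accession IDs are stored one per line in a text file (the format of the files above may differ) and that the sequences are in the nucleotide database ("nuccore"). The function names and file names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(path):
    """Read accession IDs from a text file, one per line (assumed format)."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def efetch_url(accessions, db="nuccore"):
    """Build the efetch URL that returns the given accessions as FASTA."""
    query = urlencode({
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def download_fasta(accessions, out_path, db="nuccore"):
    """Download the sequences and save them in a single FASTA file."""
    with urlopen(efetch_url(accessions, db)) as response:
        data = response.read()
    with open(out_path, "wb") as out:
        out.write(data)
```

For example, `download_fasta(read_accessions("figure1.txt"), "figure1.fasta")`, where figure1.txt is a hypothetical file name. For long lists it is probably better to download in smaller batches.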

You can use multiple aligners on the web or on the server, such as

Then you can build the phylogenetic tree using one of these tools:

Some of these tools are available on the server. If you need something else, let me know.
