Genome Assembly

Using all reads and SPAdes

The reads are on the rstudio.iu.edu.tr server, in /home/bioinfo/reads/4-F20-96_S2_L001_R?_001.fastq.gz
quality control using fastqc
- put the results in your public_html folder, so you can see them online
clean and trim the reads using a tool such as Trimmomatic
- suggest some alternatives
quality control again using fastqc. Is the quality better?
assemble using SPAdes
visualize using Bandage
Calculate N50
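
As a minimal sketch of the N50 step: N50 is the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly length. The function names `n50` and `fasta_lengths` are made up for this illustration, and the FASTA parser assumes a plain, uncompressed file.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # no contigs at all


def fasta_lengths(lines):
    """Yield the length of each sequence in an iterable of FASTA lines."""
    length = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length
```

For example, `n50(fasta_lengths(open("contigs.fasta")))` would give the N50 of an assembly, assuming the contigs are in a file with that name in the current folder.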

Using phrap

Convert all reads from FASTQ format to FASTA and qual files. This can be done using Python.
assemble using phrap
visualize using consed
calculate N50
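
The conversion from FASTQ to FASTA and qual could look roughly like the sketch below. It uses only the standard library and assumes four-line FASTQ records with Phred+33 qualities (the usual encoding for recent Illumina reads). The function names and the `suffix` parameter are hypothetical. The suffix is there because the forward and reverse reads of a pair usually share the same identifier, and an assembler may need unique names.

```python
def read_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # the '+' separator line
        qual = handle.readline().strip()
        # Keep only the identifier, dropping '@' and any description.
        yield header[1:].split()[0], seq, qual


def fastq_to_fasta_qual(fastq, fasta, qual, suffix="", offset=33):
    """Write FASTA and qual records for every read; return the read count."""
    count = 0
    for name, seq, quality in read_fastq(fastq):
        scores = " ".join(str(ord(char) - offset) for char in quality)
        fasta.write(f">{name}{suffix}\n{seq}\n")
        qual.write(f">{name}{suffix}\n{scores}\n")
        count += 1
    return count
```

The compressed input can be opened with `gzip.open(path, "rt")`. The output names are up to you; check the phrap documentation for the file name it expects for the quality file.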

Using reference sequence

Terje suggested that this plasmid could be similar to NC_025175. Test this hypothesis by aligning all reads to this sequence.
- Use bowtie, bowtie2, bwa mem and bwa aln
- all these produce SAM files
- are all results the same? How can you compare them?
- which alignment is “the best”?
prepare fastq files containing only the reads that align to the plasmid
assemble these reads using phrap
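
One possible way to find the aligned reads and to extract them is sketched below. In a SAM file the second column is a bit flag, and bit 0x4 means that the read is unmapped. The function names are invented for this example. The sketch keeps a read pair when at least one mate aligns, which is a design choice and not the only reasonable one.

```python
def mapped_read_names(sam_lines):
    """Return the set of read names with at least one mapped record."""
    names = set()
    for line in sam_lines:
        if line.startswith("@"):  # header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag = int(fields[1])
        if not flag & 0x4:  # bit 0x4 set means "read unmapped"
            names.add(fields[0])
    return names


def filter_fastq(fastq_lines, keep_names):
    """Yield the lines of the FASTQ records whose read name is in keep_names."""
    record = []
    for line in fastq_lines:
        record.append(line)
        if len(record) == 4:
            name = record[0][1:].split()[0]
            if name in keep_names:
                yield from record
            record = []
```

The same sets also give a simple way to compare aligners: if `a` and `b` are the name sets from two SAM files, then `a & b` holds the reads aligned by both programs, and `a - b` the reads aligned only by the first one.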

Analysis of alignment

what is the coverage of each nucleotide of the reference plasmid?
- are there any missing regions? What genes are there?
- are there any regions with atypical coverage? What genes are there?
Which reads are
- only in the plasmid
- only in the chromosome
- in both plasmid and chromosome
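
The questions above can be approached with a small script like the following sketch. It computes the depth at each reference position from the POS and CIGAR columns of a SAM file, lists the regions without coverage, and splits read names into the three groups. All function names are made up here. Deleted bases are not counted as covered, and the two read-name sets are assumed to come from separate alignments against the plasmid and against a chromosome reference that you choose.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def coverage(sam_lines, ref_name, ref_length):
    """Return a list with the read depth at each position of ref_name."""
    depth = [0] * ref_length
    for line in sam_lines:
        if line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 11 or int(f[1]) & 0x4 or f[2] != ref_name or f[5] == "*":
            continue
        pos = int(f[3]) - 1  # SAM positions are 1-based
        for count, op in CIGAR_RE.findall(f[5]):
            count = int(count)
            if op in "M=X":  # aligned bases: count them and advance
                for i in range(pos, min(pos + count, ref_length)):
                    depth[i] += 1
                pos += count
            elif op in "DN":  # deletion or skip: advance without counting
                pos += count
            # I, S, H and P do not consume reference positions
    return depth


def zero_regions(depth):
    """Return 1-based inclusive (start, end) intervals with zero coverage."""
    regions, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d != 0 and start is not None:
            regions.append((start + 1, i))
            start = None
    if start is not None:
        regions.append((start + 1, len(depth)))
    return regions


def classify(plasmid_reads, chromosome_reads):
    """Split two sets of read names into the three groups of interest."""
    return {
        "plasmid_only": plasmid_reads - chromosome_reads,
        "chromosome_only": chromosome_reads - plasmid_reads,
        "both": plasmid_reads & chromosome_reads,
    }
```

The intervals returned by `zero_regions` can then be compared with the gene coordinates in the annotation of the reference sequence.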

Multiple sequence alignment

We want to recreate some of the phylogenetic trees from the paper

Aas, Jørn A, Bruce J Paster, Lauren N Stokes, Ingar Olsen, and Floyd E Dewhirst. “Defining the Normal Bacterial Flora of the Oral Cavity.” Journal of Clinical Microbiology 43, no. 11 (2005): 5721–32. https://doi.org/10.1128/JCM.43.11.5721-5732.2005.

This paper has 8 figures. You can replicate any of them. The accession IDs of the sequences used in each figure can be found in the following files (thanks to Reyhan Aydın for doing the manual labor):

You can download all of them at once in a zip file: Aas2005-fig-acc.zip

Some programs can work directly with accession numbers. Others will need the sequences in FASTA format. You will need to download them. It is better to download only the sequences you need for the figure you are making.
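
Downloading the sequences for one figure could be done along these lines, using the NCBI E-utilities `efetch` service. This is only a sketch: the function names are invented, the nucleotide database (`nuccore`) is assumed to be the right one for these accessions, and the accession files are assumed to contain one accession per line, which you should check against the real files.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(lines):
    """Return the non-empty lines, assuming one accession per line."""
    return [line.strip() for line in lines if line.strip()]


def efetch_url(accessions, db="nuccore"):
    """Build an efetch URL that returns the accessions as FASTA text."""
    query = urlencode({
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH}?{query}"


def download_fasta(accessions, out_path):
    """Fetch the sequences and save them to out_path (needs network access)."""
    with urlopen(efetch_url(accessions)) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

For long lists it is kinder to the NCBI servers to download in batches and to pause between requests.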

You can use multiple aligners on the web or on the server, such as

Then you can build the phylogenetic tree using one of these tools:

Some of these tools are available on the server. If you need something else, let me know.