December 21, 2018

Working with the plasmid

  • SNPs are more interesting in the plasmid
  • for the chromosome, we only care about which reads match and which do not

Indexing and aligning as before

These are two command lines

bwa index pKEC-a3c.fna 

bwa mem -R '@RG\tID:foo\tSM:bar\tLB:library1' \
    pKEC-a3c.fna \
    /home/bioinfo/1-F5-96_S1_L001_R1_001.fastq.gz \
    /home/bioinfo/1-F5-96_S1_L001_R2_001.fastq.gz \
    > plasmid.sam

Then we follow the protocol

samtools fixmate -O bam plasmid.sam plasmid-fixed.bam
samtools sort -O bam -o plasmid-sorted.bam \
    -T /tmp/plasmid plasmid-fixed.bam 

The protocol is missing some steps

samtools faidx pKEC-a3c.fna
samtools dict pKEC-a3c.fna > pKEC-a3c.fna.dict
samtools index plasmid-sorted.bam 

What is inside the indices?

If you are curious (and you should), take a look at the indices

cat  pKEC-a3c.fna.fai 
cat  pKEC-a3c.fna.dict

We continue with the protocol

java -Xmx2g -jar /home/anaraven/GenomeAnalysisTK.jar \
    -T RealignerTargetCreator \
    -R pKEC-a3c.fna \
    -I plasmid-sorted.bam -o plasmid.intervals

Here the name of the output file is critical

We use the intervals for the next step

java -Xmx2g -jar /home/anaraven/GenomeAnalysisTK.jar \
    -T IndelRealigner \
    -R pKEC-a3c.fna \
    -I plasmid-sorted.bam \
    -targetIntervals plasmid.intervals \
    -o plasmid-realigned.bam

Finally, we make the vcf file

bcftools mpileup -Ou -f pKEC-a3c.fna \
    plasmid-realigned.bam | \
    bcftools call -vmO z -o study.vcf

Summary

This was, more or less, what we did on classes

All the relevant information is on the files

  • study.vcf
  • plasmid.sam
  • ecoliK12.sam (or the one corresponding to the chromosome you studied)

Final comments

  • It may be interesting to transform plasmid-fixed.bam into sam format, to work with a cleaner version
  • To understand which genes are involved in polymorphisms, you can use a tool like Artemis from Sanger Institute, and open the GeneBank file
  • Ask yourself questions, play the game, and enjoy it