# Bioinformatics

## Genomics and DNA analysis

17 September 2019

The main subject is “metagenomics”. We will learn how to handle the output of DNA sequencing machines, how to assemble the chromosome, how to find genes and how to determine the probable function of the proteins they encode. If time allows, we will also study phylogenetic trees and microarray analysis.

Classes are held each week on Tuesdays at 13:00–17:00 (at the Physics Dept. Computer Lab) and on Fridays at 14:00–17:00 (at the Astronomy Dept. Computer Lab). Most of the practice is done on Linux servers, so you may be interested in other courses on that subject.

# Classes

Here you will find the slides used in class. Note that they are usually not published immediately, so you had better take good notes. We recommend taking notes with pen and paper using the Cornell Method.

• Searching patterns in text (Sept 20)
  • Computational cost
  • Sets and logic
  • Relations
  • Distance, dissimilarity
• Edit distance. Searching with mismatches (Sept 24)
  • Probabilities. Joint probability, Bayes' theorem
• Needleman–Wunsch method for global alignment. Searching with gaps (Oct 4)
  • Global versus semi-global alignment. Some gaps are not bad
  • Bio+Tech+noise
  • Application of Bayes' theorem, solution of homework using a tree diagram
  • Log-odds ratio
  • How to assign a score to each substitution
• Class 1: Why do we care about Bioinformatics? (Sep 17, 2019). A personal perspective on Metagenomics and Bioinformatics [Slides].

• Class 7: Scores and probabilities. (Oct 11, 2019). Statistics [Slides].

• Class 8: Understanding BLAST. (Oct 14, 2019). Using NCBI website [Slides].

• Class 12: DNA sequencing and assembly. (Nov 19, 2019). How can we know the DNA sequence? [Document].

• Class 13: Assembly Workshop. (Nov 22, 2019). Based on the extension activities of project Nucleo Milenio P01-005 “Information and Randomness” at Universidad de Chile. Original date October 14, 2005 [Document].

• Class 15: SAM, BAM & BWA. (Nov 26, 2019). Also, summary of homework answers [Slides].

• Class 17: Primer Design. (Dec 3, 2019). How to calculate Melting Temperature [Slides].

• Class 19: NCBI Entrez. (Dec 10, 2019). Using NCBI website [Slides].

• Class 21: Motif Finding and Identification. (Dec 17, 2019). Finding Motifs and Taxonomy Identification without alignment [Slides].

• Class 22: Alignment free methods. (Dec 20, 2019). Finding Motifs and Taxonomy Identification without alignment [Slides].

• Example of Position Specific Scoring Matrix on Google Sheets.

# Homework

• Homework 4 (Practical)
Preparation for final exam.
• Homework 3
Preparation for the midterm exam.
• Homework 2 (Deadline: Friday, October 4, at 14:00).
We will explore some methods to find which parts of a text are similar to a pattern. For instance, the text can be a genome, and the pattern can be a gene or a motif, but the same ideas apply to any text and any fixed pattern.
• Homework 1 (Deadline: Tuesday, September 24, at 13:00).
Write a function to find the location of a word in a large text.
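
To make the idea behind Homeworks 1 and 2 concrete, here is a minimal sketch in Python of searching for a pattern in a text, first exactly and then allowing a few mismatches. It is only an illustration, not the expected homework answer: the function names (`find_exact`, `hamming`, `find_with_mismatches`), the choice of Python, and the toy sequence are all assumptions made for this example, and the mismatch search uses a simple sliding window, which may be too slow for a real genome.

```python
def find_exact(pattern, text):
    """Return the 0-based start positions where pattern occurs exactly in text."""
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start)
        # Resume one character later so overlapping occurrences are also found.
        start = text.find(pattern, start + 1)
    return positions


def hamming(a, b):
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_with_mismatches(pattern, text, max_mismatches):
    """Return start positions where pattern matches text with at most
    max_mismatches substitutions (no insertions or deletions)."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming(pattern, text[i:i + m]) <= max_mismatches
    ]


# Hypothetical toy "genome", used only for illustration.
genome = "ACGTACGTTAGCACGA"
print(find_exact("ACG", genome))
print(find_with_mismatches("ACGT", genome, 1))
```

The sliding-window search compares the pattern against every window of the text, so its cost grows with the product of the two lengths. It handles substitutions only; handling gaps requires an alignment method such as the ones discussed in class.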

# Bibliography

These are some of the papers we want to read and understand during this semester. The most important ones are marked in boldface. Start by reading those.

If you find that a web link is wrong, or if you find any of the missing URLs, please let me know.

## Protein Clusters

