Blog of Andrés Aravena

Homework 5, Bioinformatics 2022

24 October 2022, by Andrés Aravena, Ph.D. Deadline: Friday, 28 October, 9:00.

This is the last week before midterm exams. Thus, we have exercises for this week and exercises to prepare you for the midterm exam.

This week

  1. (optional but recommended) Read the short story “The Library of Babel” by Jorge Luis Borges. You may read it in your own language if you find that more comfortable.

For the midterm

Please finish these exercises before the midterm exam.

  1. What is the difference between the Hamming distance and the Levenshtein distance?

  2. What is the difference between global, semi-global and local alignments?

  3. What distance will you use to compare two sequences of different length? Why?

  4. What kind of alignment will you use to find shared domains between proteins?

  5. In proteins, why are some substitution scores for amino acids positive and others negative? What is the biological interpretation of positive substitution scores?

  6. What is the name of gap costs with two parts: existence and extension?

  7. How can you get the accession numbers of all proteins sequenced from any strain of Escherichia coli? And from any strain of Bacillus subtilis?

  8. We want to know how many papers have been published by our department in the last 5 years, and to compare that number with the rest of Turkey.

    How would you answer that question using NCBI Entrez on PubMed? Please describe a plan to answer this question.

  9. (bonus) Execute the plan from the previous question, and describe the impact of our department relative to the rest of Turkey.

  10. Write an Entrez query to download the accession numbers for all proteins from Methylobacterium phyllostachyos (NCBI txid582672). To make life easier, we will only consider proteins in RefSeq.

  11. How many protein sequences did you find? Is that a reasonable number? Why?

  12. Take the first 20 protein accession numbers found in the previous query and align them against all proteins from Acidithiobacillus albertensis (NCBI txid119978). To make life easier, we will only consider proteins in RefSeq. Get the top 10 results for each protein and report their accession numbers.

  13. (bonus) Take the results from Acidithiobacillus albertensis and align them against all proteins from Methylobacterium phyllostachyos. Do you get the same proteins as before?
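
If you want to experiment while thinking about the first questions on distances, the following is a minimal sketch in Python of the two distances, written from their textbook definitions. The function names `hamming` and `levenshtein` are my own choice for illustration, and the sketch assumes unit cost for every substitution, insertion and deletion. It is a tool to try examples with, not an answer to the questions.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions
    needed to turn sequence a into sequence b (unit costs assumed)."""
    # previous[j] is the distance between the prefix of a seen so far
    # and the first j letters of b; start with the empty prefix of a.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]  # turning i letters of a into an empty string
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # match or substitute
            ))
        previous = current
    return previous[-1]
```

Try it with pairs such as `"ACGT"` and `"CGTA"`, or with two sequences of different length, and compare what each function returns.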

Deadline: Friday, 28 October, 9:00.

