Homework 5

03 November 2023. Deadline: Thursday, 9 November 2023, 12:00. By Andrés Aravena, Ph.D.

The literature suggests that hemoglobin is a highly conserved protein, present in many organisms. We want to test whether this is true by looking directly at the data.

We will use BLASTP to search for homologs of human hemoglobin subunit beta (accession NP_000509.1) in the Reference proteins database (refseq_protein). To make things interesting, we will limit the search to specific taxa using the “Organism” option on the NCBI BLAST webpage.

  1. Search for at least 100 proteins similar to human hemoglobin subunit beta (NP_000509.1) in the taxonomic branch Proteobacteria (taxid:1224). Save the resulting Hit Table (in either text or comma-separated values format). Use the default scoring matrix and gap costs.

  2. Repeat the previous search using the PAM30 scoring matrix and gap costs of 5 (existence) and 2 (extension). Save the resulting Hit Table in a separate file.

  3. Prepare a spreadsheet (using Excel, Google Sheets or a similar tool) containing both Hit Tables in separate sheets. You may need to use the Import tool or the “Text to Columns” command. Plot the E-value versus the bit score and describe their relationship.

  4. (Bonus) Repeat the search of question 1, this time looking for proteins outside the taxonomic branch of animals (Metazoa, taxid:33208). Use the “exclude” checkbox next to the “Organism” field on the NCBI BLAST webpage.
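
For question 3, the relationship can also be checked numerically. Under the Karlin-Altschul statistics used by BLAST, the E-value of a hit is approximately E ≈ m × n × 2^(-S), where S is the bit score, m is the (effective) query length and n is the (effective) total length of the database. Taking logarithms gives log10(E) ≈ log10(m × n) - S × log10(2), so within a single search the points should lie close to a straight line with slope of about -0.301 when E is drawn on a logarithmic axis. The Python sketch below reads a Hit Table and fits that line by least squares. It assumes the standard 12-column tabular layout, in which the E-value and the bit score are the 11th and 12th columns (check the “# Fields:” header line of your file, if it has one). The function names are hypothetical, and hits reported with an E-value of 0 are skipped because their logarithm is undefined.

```python
import csv
import math
from typing import Iterable, List, Tuple

# Assumed layout: BLAST's standard 12-column tabular Hit Table, where
# column 11 holds the E-value and column 12 the bit score (0-based 10 and 11).
EVALUE_COL = 10
BITSCORE_COL = 11


def read_hit_table(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Return (bit score, E-value) pairs from a Hit Table in text or CSV format."""
    pairs: List[Tuple[float, float]] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and BLAST comment/header lines
        # Text Hit Tables are tab-separated; CSV Hit Tables use commas.
        delimiter = "\t" if "\t" in line else ","
        fields = next(csv.reader([line], delimiter=delimiter))
        pairs.append((float(fields[BITSCORE_COL]), float(fields[EVALUE_COL])))
    return pairs


def fit_log_evalue(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of log10(E) = a + b * S; returns (a, b).

    Hits with E = 0 are ignored because log10(0) is undefined.
    """
    points = [(s, math.log10(e)) for s, e in pairs if e > 0]
    if len(points) < 2:
        raise ValueError("need at least two hits with E > 0")
    n = len(points)
    mean_s = sum(s for s, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((s - mean_s) ** 2 for s, _ in points)
    if sxx == 0:
        raise ValueError("all bit scores are equal; the slope is undefined")
    sxy = sum((s - mean_s) * (y - mean_y) for s, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_s, slope


# Slope predicted by E ≈ m * n * 2^(-S):
EXPECTED_SLOPE = -math.log10(2)  # about -0.301
```

For example, opening a saved Hit Table (say, a hypothetical file hits_q1.csv) and passing its lines to read_hit_table and then to fit_log_evalue returns the intercept and slope. Because the Hit Table prints rounded E-values and bit scores, expect the fitted slope to be close to, not exactly equal to, EXPECTED_SLOPE; comparing the two searches this way is a quick sanity check on the spreadsheet plots.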

On the first page of the spreadsheet you must write your name and student number. Please send the resulting Excel file to the same email address used for the previous homework. If you are using Google Sheets, use the option “File → Email” and choose the “Microsoft Excel” format. Do not forget to write your student number in the subject line.

