Blog of Andrés Aravena
CMB2:

# Exercises for Computing in Molecular Biology

18 February 2016

These exercises are not graded, but they help you understand the subject and learn effectively. They can be solved individually or in groups. Remember that learning is a team activity.

1. Structured documents

2. Structured data: Entity-Relationship models and GEO.

3. Finding Structures: Clustering

4. Reading structured genomic data. Also the last part of clustering. (Slides, Homework)

5. Statistical properties of genomic sequences. (Slides, Homework)

6. Genetic code. Genes and protein sequences. (Slides)

7. ORF, CDS, Transcription Factor Binding Sites. (Slides)

8. Identify and write down the new words or concepts presented in each class and Google them. Learn them.

9. Describe in your own words what surjective and injective relationships are.

10. Identify the entities and relationships in the following cases. Draw an Entity-Relationship (E-R) diagram for each of them.

• You as a student at Istanbul University
• Human cells
• Streets of Istanbul
11. Look up the concept of "desirable difficulty" on the webpage of the Bjork Learning and Forgetting Lab.

12. Practice using GEOquery. Read the series GSE3541.

• Use the RStudio Environment pane to examine the components of the resulting object.
• Which are the samples?
• Which is the platform?
• Which are the genes?
13. If the mean is a single value that summarizes a vector, how can we choose 2 values to get a better model? Or 3? Or any number k of values?
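One possible reading of the last question: pick k representative values so that every element of the vector is summarized by the nearest one, and each representative is the mean of the elements assigned to it. The sketch below follows that idea (the iteration usually called k-means) for a plain list of numbers. It is written in Python only as an illustration; the name `k_means_1d` and the way the starting values are chosen are assumptions, not something taken from the course material.

```python
def k_means_1d(values, k, iterations=100):
    """Summarize `values` with k representative values (hypothetical helper)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    ordered = sorted(values)
    # Assumption: start from k evenly spaced elements of the sorted data.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Update step: each center becomes the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            sum(g) / len(g) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # nothing moved, so the summary is stable
        centers = new_centers
    return centers
```

With k = 1 every element falls in the same group, so the result is just the mean. Larger k gives one mean per group, which is a way to compare the "one value" model with the "k values" model on the same data.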

• Write a function that takes a list of gene sequences and counts the number of times each codon is used.
• What should be the output of this function?
• Write the definition of the Codon Adaptation Index (CAI).
• What is it useful for?
• Describe how to calculate it for each gene.
• Write an R function that takes a list of gene sequences and calculates the CAI for each of them.
• Write an RMarkdown document to report the genes with atypical CAI in E. coli.

Hint: The function syncodons() from the seqinr package may be useful.
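To make the steps concrete without giving away the R solution, here is a rough Python sketch of the same pipeline. It assumes the usual definition of CAI: each codon gets a weight w equal to its count in a reference set divided by the count of the most used codon for the same amino acid, and the CAI of a gene is the geometric mean of w over its codons. The function names (`count_codons`, `relative_adaptiveness`, `cai`) are made up for this example, and skipping Met, Trp and stop codons is a common convention that is assumed here, not a requirement stated in the exercise.

```python
from collections import Counter
from itertools import product
from math import exp, log

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codons(sequences):
    """Count each codon over a list of in-frame coding sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Read non-overlapping triplets; an incomplete trailing codon is ignored.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def relative_adaptiveness(reference_counts):
    """w = count of a codon / count of the most used synonymous codon."""
    best = {}
    for codon, n in reference_counts.items():
        aa = GENETIC_CODE.get(codon)
        if aa is not None:
            best[aa] = max(best.get(aa, 0), n)
    return {
        codon: n / best[GENETIC_CODE[codon]]
        for codon, n in reference_counts.items()
        if codon in GENETIC_CODE and n > 0
    }


def cai(sequence, weights):
    """Geometric mean of w over the codons of one gene."""
    # Assumption: skip Met, Trp and stop codons (no synonymous choice)
    # and any codon that never appears in the reference set.
    logs = [
        log(weights[codon])
        for codon, n in count_codons([sequence]).items()
        for _ in range(n)
        if codon in weights and GENETIC_CODE[codon] not in "MW*"
    ]
    return exp(sum(logs) / len(logs)) if logs else float("nan")
```

A typical use would be to build the weights once from a reference set, `weights = relative_adaptiveness(count_codons(reference_genes))`, and then call `cai(gene, weights)` for each gene. Which genes go into the reference set is a choice you have to justify in your report.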

1. How do we determine the function of a CDS?
• What is homology?
• What is orthology?
• What is optimal pairwise sequence alignment?
  • In the global case
  • In the local case
  • In the semiglobal case

Hint: Needleman–Wunsch, Smith–Waterman
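As a starting point for comparing the three cases, the sketch below computes only the optimal alignment score (not the alignment itself) with dynamic programming. It is an illustration under stated assumptions: the function name `alignment_score`, the scoring values (match +1, mismatch -1, gap -2) and the linear gap penalty are arbitrary choices, and "semiglobal" is taken to mean that gaps at the ends of either sequence are free, which is one common convention among several.

```python
def alignment_score(a, b, mode="global", match=1, mismatch=-1, gap=-2):
    """Optimal pairwise alignment score with a linear gap penalty."""
    if mode not in ("global", "local", "semiglobal"):
        raise ValueError("unknown mode: " + mode)
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    if mode == "global":
        # Leading gaps are penalized only in the global case.
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            best = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
            # A local alignment may restart anywhere, so it never drops below 0.
            score[i][j] = max(best, 0) if mode == "local" else best
    if mode == "global":
        return score[n][m]  # Needleman-Wunsch: both sequences end to end
    if mode == "local":
        return max(max(row) for row in score)  # Smith-Waterman: best cell anywhere
    # Semiglobal: trailing gaps are free, so take the best cell
    # on the last row or the last column.
    return max(max(score[n]), max(row[m] for row in score))
```

The three modes share the same recurrence and differ only in how the first row and column are initialized and where the final score is read. Try the same pair of sequences in all three modes and explain why the scores differ.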