Exercises for Computing in Molecular Biology

These exercises are not graded, but they help you understand the subject and learn effectively. They can be solved individually or in groups. Remember that learning is a team activity.

You can always send your questions (in any language) to iu-cmb@googlegroups.com or post them on https://groups.google.com/d/forum/iu-cmb. You can also answer other people's questions.

Structured documents
Structured data: Entity-Relationship models and GEO
Finding structures: clustering
Reading structured genomic data; the last part of clustering
Statistical properties of genomic sequences
Genetic code; genes and protein sequences
ORF, CDS, transcription factor binding sites
Identify and write down the new words and concepts presented in each class, look them up online, and learn them.
Describe in your own words what surjective and injective relationships are.
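To check your description, here is a minimal Python sketch that tests both properties on a finite relationship. It assumes the relationship is a function stored as a dict (each input maps to exactly one output); E-R relationships can be more general than that. The example mappings are hypothetical.

```python
def is_injective(mapping):
    """True if no two inputs are sent to the same output (one-to-one)."""
    outputs = list(mapping.values())
    return len(outputs) == len(set(outputs))

def is_surjective(mapping, codomain):
    """True if every element of the codomain is reached by some input (onto)."""
    return set(codomain) <= set(mapping.values())

# Hypothetical examples for illustration.
codon_to_aa = {"TTT": "F", "TTC": "F", "ATG": "M"}   # two codons share "F"
student_to_id = {"Ayse": 101, "Mehmet": 102}          # ID 103 is never used

print(is_injective(codon_to_aa), is_surjective(codon_to_aa, {"F", "M"}))          # False True
print(is_injective(student_to_id), is_surjective(student_to_id, {101, 102, 103}))  # True False
```

The codon example hints at why the genetic code is not injective: several codons can encode the same amino acid.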
Identify the entities and relationships in the following cases, and draw an Entity-Relationship (E-R) diagram for each of them:
- You as a student of Istanbul University
- Human cells
- Streets of Istanbul
- Your class schedule
Look up the concept of "desirable difficulties" on the webpage of the Bjork “Learning & Forgetting” Lab
Practice using GEOquery by reading the series GSE3541:
- use the RStudio Environment pane to examine the components of the resulting object
- what are the samples?
- what is the platform?
- do you need to download the platform?
- what are the genes?
If the mean is a single value that summarizes a vector, how can we choose 2 values to get a better model? Or 3? Or any number k of values?
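One common answer, tied to the clustering topic above, is k-means: choose k centers so that each value is close to its nearest center. Below is a minimal Python sketch of Lloyd's algorithm in one dimension. The function name, the random initialization from data points, and the iteration cap are assumptions made for illustration. With k = 1 it returns the mean, which connects back to the question.

```python
import random

def kmeans_1d(values, k, iterations=100, seed=0):
    """Summarize a vector by k centers using Lloyd's k-means in one dimension."""
    rng = random.Random(seed)
    centers = rng.sample(list(values), k)  # start from k randomly chosen data points
    for _ in range(iterations):
        # Assignment step: each value joins the group of its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its old center).
        new_centers = [sum(g) / len(g) if g else centers[j]
                       for j, g in enumerate(groups)]
        if new_centers == centers:  # no change: converged
            break
        centers = new_centers
    return sorted(centers)

print(kmeans_1d([1.0, 2.0, 3.0], 1))                      # [2.0], the mean
print(kmeans_1d([1.0, 1.2, 0.8, 10.0, 10.4, 9.6], 2))     # one center near 1.0, one near 10.0
```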
Can you guess which questions about these subjects will be asked in the exam?

Codon Adaptation Index

deadline: midterm

Write a function that takes a list of gene sequences and counts how many times each codon is used
- What should be the output of this function?
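One reasonable output is a table with one row per gene and one column per codon (64 columns), so that genes can be compared directly. A minimal Python sketch of that idea follows; the exercise itself asks for R, but the logic carries over. It assumes each sequence starts at the first base of its reading frame, and it ignores a trailing partial codon and any codon containing characters other than A, C, G and T.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order, so every gene gets the same columns.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

def count_codons(sequences):
    """Return one {codon: count} dict per gene, including zero counts."""
    tables = []
    for seq in sequences:
        seq = seq.upper()
        # Step through the reading frame 3 bases at a time; a trailing
        # partial codon is never produced by this range.
        counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
        # Keep only the 64 standard codons (drops e.g. codons with N).
        tables.append({codon: counts.get(codon, 0) for codon in CODONS})
    return tables

genes = ["ATGAAAAAGTAA", "ATGGCTGCCGCTTAA"]
table = count_codons(genes)
print(table[0]["AAA"], table[1]["GCT"])  # 1 2
```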
Write the definition of CAI
- What is it useful for?
Describe how to calculate it for each gene
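As a reference point for your written definition, the usual formulation of CAI (Sharp and Li, 1987) is summarized below. Here $X_{ij}$ is the count of codon $i$ for amino acid $j$ in a reference set of highly expressed genes, the maximum runs over the synonymous codons $k$ of amino acid $j$, $w_\ell$ is the relative adaptiveness of the $\ell$-th codon of the gene, and $L$ is the number of codons counted (single-codon amino acids such as Met and Trp, and stop codons, are usually excluded).

```latex
w_{ij} = \frac{X_{ij}}{\max_{k} X_{kj}},
\qquad
\mathrm{CAI} = \Bigl(\prod_{\ell=1}^{L} w_{\ell}\Bigr)^{1/L}
             = \exp\Bigl(\frac{1}{L}\sum_{\ell=1}^{L} \ln w_{\ell}\Bigr)
```

A CAI of 1 means the gene uses only the most frequent synonymous codon of the reference set; lower values mean rarer codons.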
Write an R function that takes a list of gene sequences and calculates the CAI of each one.
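The steps can be sketched in Python (the exercise asks for R; the hint's syncodons() presumably plays the role of the synonymous-codon lookup done here with the genetic code table). This is a sketch under stated assumptions: the standard genetic code, sequences already in frame, and a hypothetical `pseudocount` of 0.5 that replaces zero counts so that every logarithm is defined. All function names are illustrative.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons ordered TCAG x TCAG x TCAG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

def codons(seq):
    """Split an in-frame sequence into codons, dropping a trailing partial one."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]

def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """w = codon count / count of the most used synonymous codon."""
    counts = Counter(c for g in reference_genes for c in codons(g) if c in CODE)
    # Assumption: a codon absent from the reference set counts as `pseudocount`.
    x = {c: counts[c] or pseudocount for c in CODE}
    w = {}
    for codon, aa in CODE.items():
        best = max(x[s] for s, a in CODE.items() if a == aa)
        w[codon] = x[codon] / best
    return w

def cai(gene, w):
    """Geometric mean of w over the gene's codons, skipping Met, Trp and stops."""
    logs = [math.log(w[c]) for c in codons(gene)
            if c in CODE and CODE[c] not in "MW*"]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

def cai_all(genes, reference_genes):
    """CAI for each gene, using w estimated from the reference genes."""
    w = relative_adaptiveness(reference_genes)
    return [cai(g, w) for g in genes]

# Toy reference set: Ala is encoded 3 times by GCT and once by GCC.
print(cai_all(["ATGGCTGCTTAA", "ATGGCCGCCTAA"], ["GCTGCTGCTGCC"]))
# first value 1.0, second approximately 0.333
```

For E. coli, the reference set would normally be a group of highly expressed genes; choosing it is part of the exercise.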
Write an R Markdown document that reports the genes of E. coli with atypical CAI values

Hint: The function syncodons() may be useful

How do we determine the function of a CDS?

What is homology?
What is orthology?
What is an optimal pairwise sequence alignment?
- In the global case
- In the local case
- In the semiglobal case

Hint: Needleman–Wunsch, Smith–Waterman
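The three cases differ only in how the dynamic-programming table is initialized and where the final score is read. A minimal Python sketch follows; it returns the optimal score only (no traceback), uses a linear gap penalty rather than an affine one, and its scoring values (match = 1, mismatch = -1, gap = -2) are arbitrary assumptions.

```python
def alignment_score(a, b, mode="global", match=1, mismatch=-1, gap=-2):
    """Score of an optimal pairwise alignment of a and b (linear gap penalty).

    mode="global":     Needleman-Wunsch; both sequences aligned end to end.
    mode="local":      Smith-Waterman; best-scoring pair of substrings.
    mode="semiglobal": like global, but gaps at either end are free.
    """
    if mode not in ("global", "local", "semiglobal"):
        raise ValueError(f"unknown mode: {mode}")
    n, m = len(a), len(b)
    # F[i][j] = best score for an alignment ending at a[:i] and b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    if mode == "global":
        # Only the global case charges for leading gaps.
        for i in range(1, n + 1):
            F[i][0] = i * gap
        for j in range(1, m + 1):
            F[0][j] = j * gap
    best_local = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # pair a[i-1] with b[j-1]
                          F[i - 1][j] + gap,    # a[i-1] against a gap
                          F[i][j - 1] + gap)    # b[j-1] against a gap
            if mode == "local":
                F[i][j] = max(F[i][j], 0)       # start a fresh alignment here
                best_local = max(best_local, F[i][j])
    if mode == "global":
        return F[n][m]
    if mode == "local":
        return best_local
    # Semiglobal: trailing gaps are free, so take the best end
    # in the last row or the last column.
    return max(max(F[n]), max(row[m] for row in F))

for mode in ("global", "semiglobal", "local"):
    print(mode, alignment_score("GATTACA", "TTAC", mode))
# global -2, semiglobal 4, local 4
```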