Things to do
In this course we teach how to interpret and understand the results of bioinformatic analyses. Most molecular biologists will work in a team with (or hire, or be hired by) bioinformatic teams, so even if they do not use the tools themselves, all molecular biologists need to understand what the results mean. It is important to speak the same language and to be aware of the key aspects that can lead to an experiment’s success or failure.
This year’s slides differ from previous years’ in content and organization. Sometimes we use copyrighted material that can be shown in class but cannot be posted on the web; in those cases the slides are not here. We recommend that you take notes during classes, since many important things are written on the whiteboard but not in the slides. We recommend taking notes with pen and paper using the Cornell Method.
You can find the slides and videos of previous years at Bioinfo 2021 and Bioinfo 2022. Here you will find this year’s slides.
- Class 1: Why do we care about Bioinformatics?
(Oct 5, 2023). [Slides].
What is and what is not Bioinformatics. What will we do here?
- Class 2: Taxonomy. (Oct 12, 2023).
How to understand the universe.
- Class 3: Comparing sequences. (Oct 19,
How many genetic codes are there? What are their differences? See also:
- Google Sheet used in class: “class03-bioinfo-2023”.
- Class 4: Global and Local Alignment. (Oct 26,
How to know if (parts of) two sequences are similar. See also:
- Google Sheet used in class: “class04-bioinfo-2023”.
- Needleman, Saul B., and Christian D. Wunsch. “A General Method Applicable to the Search for Similarities in the Amino Acid Sequence of Two Proteins.” Journal of Molecular Biology 48, no. 3 (1970): 443–53.
- Smith, T. F., and M. S. Waterman. “Identification of Common Molecular Subsequences.” Journal of Molecular Biology 147, no. 1 (1981): 195–97. https://doi.org/10.1016/0022-2836(81)90087-5.
- Dayhoff, M. O., and R. M. Schwartz. “A Model of Evolutionary Change in Proteins.” In Atlas of Protein Sequence and Structure. Washington, DC: National Biomedical Research Foundation, 1978. https://doi.org/10.1.1.145.4315.
- Henikoff, S., and J. G. Henikoff. “Amino Acid Substitution Matrices from Protein Blocks.” Proceedings of the National Academy of Sciences 89, no. 22 (1992): 10915–19. https://doi.org/10.1073/pnas.89.22.10915.
- Class 5: Finding Local Alignments. (Nov 2,
Looking for local matches is different from looking for global ones. We need to use scores. They make more biological sense. See also:
- Google Sheet used in class: “class05-bioinfo-2023”.
- Class 7: Trees representing distance. (Nov 30,
Trees are used to represent phylogenetic relationships.
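To illustrate the distinction between the global alignment of Class 4 and the local alignment of Class 5, here is a minimal sketch in Python. It only computes the best score, not the alignment itself. The function name `alignment_scores` and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for this example; they are not necessarily the values used in class, and real tools use substitution matrices such as PAM or BLOSUM instead.

```python
def alignment_scores(a, b, match=1, mismatch=-1, gap=-2):
    """Return (global_score, local_score) for sequences a and b.

    Global follows the Needleman-Wunsch recurrence, local follows the
    Smith-Waterman recurrence. Scoring values are illustrative only.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # Global table: gaps at the start of either sequence are penalized.
    glob = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        glob[i][0] = i * gap
    for j in range(1, cols):
        glob[0][j] = j * gap

    # Local table: first row and column stay at zero.
    loc = [[0] * cols for _ in range(rows)]
    best_local = 0

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            glob[i][j] = max(glob[i - 1][j - 1] + s,   # align two letters
                             glob[i - 1][j] + gap,     # gap in b
                             glob[i][j - 1] + gap)     # gap in a
            # Same choices, but a local alignment may restart at zero.
            loc[i][j] = max(0,
                            loc[i - 1][j - 1] + s,
                            loc[i - 1][j] + gap,
                            loc[i][j - 1] + gap)
            best_local = max(best_local, loc[i][j])

    # Global score is the bottom-right cell; local score is the best cell.
    return glob[-1][-1], best_local


print(alignment_scores("ACGT", "ACGT"))      # (4, 4)
print(alignment_scores("ACGT", "GGACGTGG"))  # (-4, 4)
```

In the second call the short sequence is contained in the long one: the local score ignores the flanking letters, while the global score pays for four gaps.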
Homework is an integral part of this course, since we want to match theory and practice. Besides, without practice it is easier to forget. All homework should be sent to firstname.lastname@example.org before the deadline to get a grade. Please be careful to meet the deadline; otherwise you will get a grade of zero.
1 (Deadline: Thursday 19 of October at
Practice creating NCBI Entrez queries.
2 (Deadline: Thursday 19 of October at
Read about genetic codes.
3 (Deadline: Thursday 26 of October at
How would you calculate the Hamming distance between genetic codes?
4 (Deadline: Thursday 2 of November at
Calculate global and semi-global Levenshtein distances between sequences.
5 (Deadline: Thursday 9 of November at
Find proteins similar to human hemoglobin.
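As a reminder of the definitions behind the distance homework, here is a minimal Python sketch of the Hamming and Levenshtein distances between two strings. The function names and the `semi_global` flag are assumptions made for this example. "Semi-global" is taken here to mean that the first sequence may start and end anywhere inside the second one at no cost; other definitions exist, so check which one is used in class.

```python
def hamming(a, b):
    """Number of positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b, semi_global=False):
    """Minimum number of substitutions, insertions and deletions.

    With semi_global=True (an assumption of this sketch), letters of b
    before and after the region matched by a are not counted.
    """
    # prev[j] is the distance between the processed prefix of a and b[:j].
    # In semi-global mode, skipping a prefix of b is free, so row 0 is zero.
    prev = [0] * (len(b) + 1) if semi_global else list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # aligning a[:i] with an empty prefix costs i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    # In semi-global mode, skipping a suffix of b is also free.
    return min(prev) if semi_global else prev[-1]


print(hamming("GATTACA", "GACTATA"))                      # 2
print(levenshtein("kitten", "sitting"))                   # 3
print(levenshtein("ACGT", "TTACGTTT"))                    # 4
print(levenshtein("ACGT", "TTACGTTT", semi_global=True))  # 0
```

The sketch works on plain strings only; how to represent a genetic code so that two codes can be compared is left to the homework.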
Sequences used in classes
By regulation from the Rectory, students need to attend at least 70% of the classes. If you cannot attend, you must deliver all homework on time. Late submissions will not be accepted.
The attendance book is updated every week and can be seen in Google Sheets.
This course does not require knowledge of coding or programming, but knowing how to write a program will always be a strong advantage, both in this course and in professional life.
You will need:
- A computer with internet access for doing the homework.
- To know how to handle files and folders on the computer, how to copy and move files, and how folders are structured.
- To know the difference between text and binary files, and between text editors, word processors, and integrated development environments.
- A text editor (not a word processor). There are many, and you can use your favorite one. We recommend Visual Studio Code.
We recommend (but do not require):
- Learn how to use the Unix/Linux command line.
- You can install Linux on your computer, either in parallel with Windows, or as a virtual machine.
- Alternatively, you can install Git for Windows and use the bash command line in Windows. This will work for ≈90% of the commands.
- Sometimes it is an advantage to write some small programs. It is good to know a little bit of R or Python. We recommend using RStudio and Jupyter Notebooks.
We partially follow the plan proposed by Sayres (2018) [1]. At the end of the course students should be able to:
- Understand the role of computation and data mining in hypothesis-driven processes within the life sciences
- Understand computational concepts used in bioinformatics
- Know the basic file types used in bioinformatics (FASTA, GBK, GFF, BLAST, FASTQ, SAM)
- Understand tree structures that are used to understand biological entities: phylogeny, taxonomy, ontology. Understand the difference between taxonomy and phylogeny
- Know how to access genomic data on the web
- Access NCBI nucleotide, protein, GEO, SRA databases, Entrez query system, EBI databases.
- Know how to handle the basic file types used in bioinformatics
- How to read them, how to understand them, how to transform one into another.
- Know how to visualize DNA sequences, partial genome assembly results, and protein domains
- Understand the results given by a bioinformatic tool
- Know the different types of pairwise alignments (global, semi-global, local) and when to use each one
- Know the biological hypotheses behind the alignment scores
- Understand the challenges of multiple alignments and how to use them to find SNPs. Know how to build phylogenetic trees.
- Understand how database search works:
- Understand the difference between algorithms and heuristics, the role of indices
- Know how to assign putative functions to coding genes, using COG and Gene Ontology
- Know how to assign putative taxonomic identity, using alignment and alignment-free methods
- Understand the main DNA-assembly methodologies: Overlap-layout-consensus and De Bruijn graphs.
- Know how to design PCR primers and understand how to calculate the DNA melting temperature
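As an example of the last goal, the melting temperature of a short primer can be estimated with the Wallace rule, Tm ≈ 2·(A+T) + 4·(G+C) in °C. The sketch below is a rough illustration only: the rule is a rule of thumb for short oligonucleotides (roughly 14 to 20 nt), it ignores salt and primer concentration, and the course may use a different formula. The function names `wallace_tm` and `gc_fraction` are assumptions made for this example.

```python
def _check_dna(seq):
    """Raise an error if the sequence has letters other than A, C, G, T."""
    bad = set(seq) - set("ACGT")
    if bad:
        raise ValueError(f"unexpected letters in primer: {sorted(bad)}")


def wallace_tm(primer):
    """Estimate Tm in degrees Celsius with the Wallace rule."""
    seq = primer.upper()
    _check_dna(seq)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    # Each A/T pair adds about 2 degrees, each G/C pair about 4 degrees.
    return 2 * at + 4 * gc


def gc_fraction(primer):
    """Fraction of G and C letters in the primer (0.0 for an empty one)."""
    seq = primer.upper()
    _check_dna(seq)
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


primer = "ATGCATGCATGCATGC"
print(wallace_tm(primer))   # 48
print(gc_fraction(primer))  # 0.5
```

For real primer design, nearest-neighbor thermodynamic models give better estimates than this rule of thumb.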
Online supplementary material
The list of recommended and mandatory papers is on a separate page.
NCBI Videos: Sequences
These videos complement our classes. They cover the same topics in more detail. Please watch them to better understand this course.
- NCBI Minute: A Beginner’s Guide to Genes and Sequences at NCBI (33:44)
- NCBI Minute: How to Quickly Retrieve Sequences from NCBI (23:38)
- NCBI: Download a custom set of records (03:11)
- NCBI: Retrieve Sequences for an Organism (01:36)
- Obtain Genomic Sequence for a gene (02:47)
- Webinar: Accessing 1000 Genomes Data at NCBI (32:15)
- NCBI Minute: Important Changes Coming to the Sequence Databases - GI Numbers (24:26)
NCBI Genome Visualization
NCBI Literature Search
- Webinar: PubMed for Scientists (45:19)
- NCBI Minute: Tailor Your PubMed Search Experience with My NCBI (07:47)
- NCBI Minute: Keeping Current and Getting Help with NCBI Resources (14:22)
- NCBI Minute: On the NCBI Bookshelf, Textbooks for Free! (19:42)
- NCBI Minute: An Updated PubMed is on its Way! (25:30)
- Need the Full Text Article? (02:03)
- The NCBI Minute: PubMed Commons (12:06)
- NCBI Minute: Finding Genes in PubMed (11:50)
- The NCBI Minute: How You and Your Journal Club Can Contribute Using PubMed Commons (12:48)
- PubMed: Using the Advanced Search Builder (03:12)
- NCBI Minute: Finding Gene, Protein and Chemical Names, Aliases and Synonyms (15:17)
- NCBI Minute: How to Locate and Use Human Genomes and Annotations from the NCBI (09:08)
- Find in This Sequence (02:17)
- Save Search Results in Collections, including Favorites (02:57)
- NCBI Minute: Setting Up Alerts for New Data in My NCBI (07:46)
- NCBI Minute: Automate PubMed Searches & Save Citation Collections with My NCBI (12:55)
- My NCBI (02:30)
- PubMed Advanced Search Builder (02:27)
- PubMed: The Filters Sidebar (02:02)
- Use MeSH to Build a Better PubMed Query (03:03)
- E-Utilities Introduction (03:46)