# Bioinformatics

## Other taxonomies

There are other classifications used in biology

• EC numbers are a taxonomy of enzymatic reactions
• Clusters of Proteins are a taxonomy of proteins

In a taxonomy the relationships are “is-a”

• a primate is a mammal
• a mammal is a vertebrate
• a vertebrate is an animal

In an ontology other relationships are possible. For example

• a toe belongs to a foot
• a foot belongs to a leg
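The two kinds of relationship can be sketched as small lookup tables. This is only an illustration using the examples above; the names `IS_A`, `PART_OF` and `ancestors` are made up for this sketch and do not come from any ontology library.

```python
# Each table maps a term to its direct parent under one kind of relationship.
IS_A = {"primate": "mammal", "mammal": "vertebrate", "vertebrate": "animal"}
PART_OF = {"toe": "foot", "foot": "leg"}


def ancestors(term, relation):
    """Follow one relationship upward and return every term reached, nearest first."""
    found = []
    while term in relation:
        term = relation[term]
        found.append(term)
    return found


print(ancestors("primate", IS_A))  # ['mammal', 'vertebrate', 'animal']
print(ancestors("toe", PART_OF))   # ['foot', 'leg']
```

A taxonomy needs only the first table. An ontology keeps several tables, one per kind of relationship.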

## Gene Ontology

This is probably the most important ontology for molecular biologists

# Pipelines

## Pipelines: putting it all together

When we design molecular biology experiments, or when we analyze their results, we need to chain several tools together

Today we are going to see an example using the NCBI website

## Filter results: only Legumes

Most of the time it is a good idea to check the Taxonomy database

Each sequence on GenBank is tagged with a taxon id

Using taxid is more precise than using common names

For example, a human protein can be labeled “95% similar to mouse”

Is that a human or a mouse protein?
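A taxid filter avoids this ambiguity, because it matches the organism tag and not the words in the description. The sketch below builds such a filter in Entrez query syntax. The function name `restrict_to_taxon` and the search word `leghemoglobin` are hypothetical, and 3803 is assumed to be the taxid of Fabaceae (legumes); check it in the Taxonomy database before using it.

```python
def restrict_to_taxon(query, taxid):
    """Return an Entrez query limited to one taxon and all of its descendants."""
    # The [Organism:exp] field tag "explodes" the taxon to include its subtree.
    return f"({query}) AND txid{taxid}[Organism:exp]"


# Assumption: 3803 is the taxid of Fabaceae (legumes). Verify it in Taxonomy.
LEGUMES_TAXID = 3803

print(restrict_to_taxon("leghemoglobin", LEGUMES_TAXID))
# (leghemoglobin) AND txid3803[Organism:exp]
```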

• Decide Format
• Decide Content

In this case we only need accession ids

It is essential that your protocol can be replicated

It is a very good idea to save the search strategy in a file

It is also wise to save the output in a text file

Separate the fields by tab or by comma
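One possible way to keep the search strategy and the output together is to write both into the same tab-separated file. This is a minimal sketch, not a fixed recipe: the function name `save_search` and the layout (comment lines starting with `#`, then one accession id per row) are choices made for this example.

```python
import csv
from datetime import date


def save_search(path, database, query, accessions):
    """Write the search strategy as comment lines, then one accession id per row."""
    with open(path, "w", newline="") as handle:
        # Record how the results were obtained, so the search can be repeated.
        handle.write(f"# database: {database}\n")
        handle.write(f"# query: {query}\n")
        handle.write(f"# date: {date.today().isoformat()}\n")
        # Tab-separated body with a header row.
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["accession"])
        for accession in accessions:
            writer.writerow([accession])
```

For example, `save_search("legumes.tsv", "protein", query, ids)` would write a file that anyone can read back, skipping the lines that start with `#`.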

## It is boring to do it one by one

And it takes a lot of time

It is easy to make mistakes

It is hard to replicate

Can we do it automatically?

## E-tools: Entrez Pipelines

• ESearch -> ESummary
• ESearch -> EFetch
• EPost -> ESummary
• EPost -> EFetch
• EPost -> ESearch
• EPost -> ESearch -> ESummary
• EPost -> ESearch -> EFetch
• EPost -> ELink -> ESearch -> ESummary
• EPost -> ELink -> ESearch -> EFetch

## Map of E-tools

The E-tools can also be used from a programming language through a library. For example, in R the package is called rentrez