Class 4: NCBI Taxonomy

Bioinformatics

Andrés Aravena

30 October 2020

What is “Taxonomy”

Taxonomy is

  • the practice and science of classification of things or concepts,

    • including the principles that underlie such classification.
  • Originally used only for biological classification,

  • taxonomy has developed to be a synonym for classification.

Example of a non-biological taxonomy

Bloom’s taxonomy

  • a set of three hierarchical models used to classify educational learning objectives into levels of complexity and specificity

    • cognitive

    • affective

    • sensory

Another example

Dewey Decimal Classification for libraries

000 – Computer science, information & general works
100 – Philosophy & psychology
200 – Religion
300 – Social sciences
400 – Language
500 – Pure Science
600 – Technology
700 – Arts & recreation
800 – Literature
900 – History & geography

Taxonomy in Biology

  • Classification of organisms
    • Group together organisms sharing “the same” characteristics
    • Initially based on phenotypical characteristics
    • Today it also uses genotypical information
  • Hierarchy of groups
    • Depending on the characteristics, we get different groups
    • More attributes result in more groups

Classification (set theory)

The set \(U\) of all things is separated into several subsets \(C_1,…, C_n\) called equivalence classes \[U=C_1 ∪ C_2 ∪ … ∪ C_n\]

This means that every organism must belong to some equivalence class

Everything is classified

Classification (set theory)

All equivalence classes are disjoint \[C_i ∩ C_j = ∅\quad\text{if }i≠j\]

This means that every organism belongs only to one equivalence class

There is only one classification for each thing

\(x\) is either in \(C_i\) or \(C_j\) but not in both
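
The two conditions above (the classes cover \(U\), and the classes are pairwise disjoint) can be checked mechanically. The following is a minimal sketch in Python; the function name `is_partition` and the toy organisms are assumptions made for illustration, not part of any real classification.

```python
from itertools import combinations

def is_partition(universe, classes):
    """Return True if `classes` cover `universe` and are pairwise disjoint."""
    # Condition 1: the union of all classes equals the universe
    covers = set().union(*classes) == set(universe)
    # Condition 2: no two different classes share an element
    disjoint = all(a.isdisjoint(b) for a, b in combinations(classes, 2))
    return covers and disjoint

# Hypothetical toy data
U = {"dog", "cat", "eagle", "salmon"}
good = [{"dog", "cat"}, {"eagle"}, {"salmon"}]
overlapping = [{"dog", "cat"}, {"cat", "eagle"}, {"salmon"}]  # "cat" is in two classes
incomplete = [{"dog", "cat"}, {"eagle"}]                      # "salmon" is unclassified

print(is_partition(U, good))         # True
print(is_partition(U, overlapping))  # False
print(is_partition(U, incomplete))   # False
```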

Hierarchical classification

In a taxonomy each equivalence class is further divided into smaller equivalence classes

  • Some animals are vertebrates
  • Some vertebrates are mammals
  • Some mammals are primates

There is a hierarchy of classes, with different levels

Classes of the same level are disjoint

Classes of different levels can be subsets
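
As a small illustration of these two statements, Python sets can stand in for the classes. The organisms and class contents below are hypothetical toy data chosen only to make the subset and disjointness relations visible.

```python
# Hypothetical toy data: each class is a set of example organisms
animals = {"human", "lemur", "dog", "eagle", "salmon", "bee"}
vertebrates = {"human", "lemur", "dog", "eagle", "salmon"}
invertebrates = {"bee"}
mammals = {"human", "lemur", "dog"}
birds = {"eagle"}
primates = {"human", "lemur"}

# Classes of different levels can be subsets of each other
nested = primates <= mammals <= vertebrates <= animals

# Classes of the same level are disjoint
same_level_disjoint = (vertebrates.isdisjoint(invertebrates)
                       and mammals.isdisjoint(birds))

print(nested, same_level_disjoint)  # True True
```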

Tree representation

Hierarchical classifications are often represented by trees

Trees have a root, branches, internal nodes and leaves

Edges (branches) connect nodes

Each node (except the root) has one unique parent node

A node can have several descendants. If a node has no descendants, we call it a leaf
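
Because every node except the root has exactly one parent, a tree can be stored as a simple "child to parent" table. The sketch below assumes that representation; the node names and the helper functions `children`, `leaves` and `path_to_root` are hypothetical and only illustrate the vocabulary above.

```python
# Hypothetical toy tree stored as child -> parent; the root has parent None
parent = {
    "animals": None,
    "vertebrates": "animals",
    "invertebrates": "animals",
    "mammals": "vertebrates",
    "birds": "vertebrates",
    "primates": "mammals",
}

def children(node):
    """Direct descendants of `node`."""
    return sorted(n for n, p in parent.items() if p == node)

def leaves():
    """Nodes that are nobody's parent."""
    parents = set(parent.values())
    return sorted(n for n in parent if n not in parents)

def path_to_root(node):
    """Follow the unique parent links from `node` up to the root."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path

print(children("vertebrates"))    # ['birds', 'mammals']
print(leaves())                   # ['birds', 'invertebrates', 'primates']
print(path_to_root("primates"))   # ['primates', 'mammals', 'vertebrates', 'animals']
```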

Taxonomical Hierarchy in Biology

Originally each hierarchy level (a.k.a. rank) was named

  • domain
  • kingdom
  • phylum
  • class
  • order
  • family
  • genus
  • species

Today there are more intermediate ranks

Binomial nomenclature

(literally “system of two names”)

Each organism is labeled with two words: genus and species

  • Genus describes what it is in general
    • same root as “genre”, used to classify movies
  • Species describes what is special about it

This is a good approach for any definition:
“X is like Y but with Z difference”
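
A two-word label is easy to handle in code. The following minimal sketch splits a name into its two parts and applies the usual writing convention (genus capitalized, species in lowercase). The names `Binomial` and `parse_binomial` are hypothetical, and the sketch assumes exactly two words, so names with a subspecies or strain would be rejected.

```python
from typing import NamedTuple

class Binomial(NamedTuple):
    genus: str
    species: str

def parse_binomial(name: str) -> Binomial:
    """Split a two-word scientific name into genus and species."""
    words = name.split()
    if len(words) != 2:
        raise ValueError(f"expected exactly two words, got {name!r}")
    genus, species = words
    # By convention the genus is capitalized and the species is lowercase
    return Binomial(genus.capitalize(), species.lower())

print(parse_binomial("Homo sapiens"))
print(parse_binomial("escherichia COLI"))
```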

Taxonomy is not phylogeny

Taxonomic trees are similar to phylogenetic trees

But “genus” is not “common ancestor”

Each node in a phylogenetic tree is a species

Moreover, an organism has more than one ancestor

NCBI taxonomy

There is no “official” taxonomy

People are still figuring out many cases

NCBI has a taxonomy tree that is often used in practice

This tree changes over time

NCBI taxonomy is a tree

Each node has

  • a unique id called taxid
  • an official scientific name (as Linnaeus designed)
  • any alternative aliases by which the organism can be known
  • the taxid of the parent
  • zero or more descendants

Using NCBI taxid prevents many errors
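
The node description above maps naturally to a small record type. The sketch below is a hand-written toy excerpt, not NCBI's actual data format or API: the class `TaxNode` and the functions `find_taxid` and `lineage` are hypothetical, most intermediate nodes are omitted (so Hominidae hangs directly from the root here), and the taxids and aliases should be verified against NCBI before being relied on. It shows one way taxids can prevent errors: different names for the same organism resolve to a single number.

```python
from dataclasses import dataclass

@dataclass
class TaxNode:
    taxid: int
    scientific_name: str
    parent_taxid: int
    aliases: tuple = ()

# Toy excerpt, heavily pruned: real lineages contain many more nodes.
# Assumption: the root is stored as its own parent.
nodes = {
    1: TaxNode(1, "root", 1),
    9604: TaxNode(9604, "Hominidae", 1, ("great apes",)),
    9605: TaxNode(9605, "Homo", 9604),
    9606: TaxNode(9606, "Homo sapiens", 9605, ("human",)),
}

def find_taxid(name):
    """Resolve a scientific name or alias to a taxid (case-insensitive)."""
    wanted = name.lower()
    for node in nodes.values():
        known = [node.scientific_name.lower()] + [a.lower() for a in node.aliases]
        if wanted in known:
            return node.taxid
    return None

def lineage(taxid):
    """Scientific names from `taxid` up to the root, following parent taxids."""
    names = []
    while True:
        node = nodes[taxid]
        names.append(node.scientific_name)
        if node.parent_taxid == taxid:  # the root is its own parent
            return names
        taxid = node.parent_taxid

print(find_taxid("human"), find_taxid("Homo sapiens"))  # 9606 9606
print(lineage(9606))  # ['Homo sapiens', 'Homo', 'Hominidae', 'root']
```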