# Looking for biological meaning

## Enrichment analysis

At this point we have a list of DE genes

Let’s say they are 10% of all genes

Each gene is part of one or more networks

• Enzymes are in metabolic networks
• Transcription factors are in regulatory networks
• The genes they regulate are also in regulatory networks
• Other genes are part of signaling networks

## Gene Ontology

Beyond pathways, it is useful to look at “Gene Ontology”

It has three branches

• Molecular Function
• Cellular Component
• Biological Process

## Which networks shall we look at?

To find the relevant networks, we use Enrichment Analysis

For example, let’s say we have 10000 genes in total, and 1000 DE genes

Let’s say that carbon metabolism takes 3000 genes in total

If, among the DE genes, 300 genes are in carbon metabolism, we will not be surprised

## Is this random or not?

3K carbon metabolism genes out of 10K total genes is 30%

If we take 1K random genes, we expect 30% of them to be in carbon metabolism, just by chance

If instead we find 600 carbon metabolism genes among the 1K DE genes, we are surprised

In that case we say that the carbon metabolism gene set is enriched among the DE genes
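The arithmetic of this example can be written out as a short sketch. The variable names and the `fold_enrichment` helper are illustrative choices, not part of the original slides; the numbers are the ones used in the example.

```python
# Numbers from the carbon metabolism example
total_genes = 10_000     # all genes
pathway_genes = 3_000    # genes in carbon metabolism
de_genes = 1_000         # differentially expressed (DE) genes

# Expected number of pathway genes among 1000 randomly chosen genes
expected_in_pathway = de_genes * pathway_genes / total_genes


def fold_enrichment(observed: int) -> float:
    """Ratio of observed to expected pathway genes among the DE genes."""
    return observed / expected_in_pathway


print(expected_in_pathway)    # 300.0
print(fold_enrichment(300))   # 1.0, exactly what chance predicts
print(fold_enrichment(600))   # 2.0, twice as many as chance predicts
```

A ratio near 1 is unsurprising. A ratio well above 1 suggests enrichment, but it does not yet say how unlikely the observation is.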

## How to do enrichment analysis

This problem follows a hypergeometric distribution

$ℙ(k \text{ observed successes} | N,K,n)$

• $N$ is the population size
• $K$ is the number of success states in the population
• $n$ is the number of draws (i.e. quantity drawn in each trial)
• $k$ is the number of observed successes
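For reference, the hypergeometric probability of exactly $k$ successes has a standard closed form. Here $X$ is a symbol introduced only for this formula; it denotes the random number of successes in the sample.

$$ℙ(X = k \mid N, K, n) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

To judge enrichment we usually ask for the probability of seeing $k$ or more successes, which adds up the upper tail:

$$ℙ(X \ge k \mid N, K, n) = \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}}$$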

## In our terms

$ℙ(k \text{ diff. expr. genes in group} | N,K,n)$

• $N$ is the total number of genes
• $K$ is the total number of genes in the group
• $n$ is the number of DE genes
• $k$ is the number of DE genes in the group
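As a minimal sketch, the upper-tail hypergeometric probability can be computed with the standard library alone. The function name `enrichment_p_value` is an illustrative choice; the arguments follow the $N$, $K$, $n$, $k$ definitions above, and the numbers are those of the carbon metabolism example.

```python
from math import comb


def enrichment_p_value(k: int, N: int, K: int, n: int) -> float:
    """Probability of finding k or more group genes among n DE genes,
    if the n DE genes were drawn at random from N genes, K of them in the group."""
    upper = min(K, n)
    # Count the favorable samples with exact integers, then divide once at the end
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


# 10000 genes, 3000 in carbon metabolism, 1000 DE genes
p_300 = enrichment_p_value(300, N=10_000, K=3_000, n=1_000)
p_600 = enrichment_p_value(600, N=10_000, K=3_000, n=1_000)

print(p_300)  # roughly one half: 300 is the expected count
print(p_600)  # extremely small: 600 is very unlikely by chance
```

In practice many groups are tested at once, so the resulting p-values would normally be corrected for multiple testing before deciding which groups are enriched.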

# Networks

## Definitions

There are many types of networks

"Graph" is the mathematical name for a network

Let’s talk about what they all have in common

| Engineers | Mathematicians |
|-----------|----------------|
| Network   | Graph          |
| Node      | Vertex         |
| Nodes     | Vertices       |

## Everyday examples

It is easy to talk about networks using family or friendship relationships

• Each person is a node
• In a friendship network, A and B are connected when A and B are friends

• In a follower network, A is connected to B when A follows B

## These are two kinds of graphs

• Undirected: if A is a friend of B, then B is a friend of A

• Directed: A can follow B even if B does not follow A
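Both kinds of graph can be stored the same way, as a mapping from each node to its neighbors. The only difference is whether each link is recorded in both directions. The sketch below assumes this adjacency-set representation; the `build_graph` helper and the people's names are made up for illustration.

```python
from collections import defaultdict


def build_graph(edges, directed=False):
    """Return an adjacency mapping: node -> set of neighbors."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b]  # touching the key registers b as a node even with no outgoing links
        if not directed:
            adjacency[b].add(a)  # friendship goes both ways
    return dict(adjacency)


links = [("Ana", "Bora"), ("Bora", "Cem")]
friendships = build_graph(links)             # undirected
follows = build_graph(links, directed=True)  # directed

print("Ana" in friendships["Bora"])  # True: Bora is also a friend of Ana
print("Ana" in follows["Bora"])      # False: Bora does not follow Ana
```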

## Attributes

Nodes and links can have many attributes

• Name
• Color
• Length
• Cost
• Class

and others, depending on the case

These values are given: they come from reality

## Moving through the network

Streets of a city can be modeled as edges in a graph

Each edge has a length

If we are pedestrians, the graph is undirected

We can go from Istanbul University to Taksim by different paths

Each path will have a total length

## Shortest paths

In general there may be many paths from A to B

At least one of them is the shortest one

The length of the shortest path between two nodes is called distance

(If the links do not have a length attribute, we take each length to be 1.
In other words, we count the number of edges.)
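One common way to compute the distance is Dijkstra's algorithm, sketched here under the assumption that all lengths are non-negative. The graph layout (node -> {neighbor: length}), the function name `distance`, and the street lengths are illustrative choices, not data from the slides. A length of `None` stands for "no length attribute" and is counted as 1.

```python
import heapq


def distance(graph, source, target):
    """Length of the shortest path from source to target, or None if unreachable.

    graph maps node -> {neighbor: length}; a length of None counts as 1.
    """
    best = {source: 0}
    queue = [(0, source)]  # (length so far, node), shortest first
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # outdated entry: a shorter way to this node was already found
        for neighbor, length in graph.get(node, {}).items():
            step = 1 if length is None else length
            candidate = dist + step
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return None


# Undirected street graph: every edge is listed in both directions
streets = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "C": 1, "D": 7},
    "C": {"A": 5, "B": 1, "D": 3},
    "D": {"B": 7, "C": 3},
}
print(distance(streets, "A", "D"))  # 6, along A-B-C-D

# Same graph without length attributes: the distance is the number of edges
hops = {node: {nb: None for nb in nbs} for node, nbs in streets.items()}
print(distance(hops, "A", "D"))     # 2
```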

## Direction matters

If the graph is undirected, then everything is symmetrical

$\text{distance}(A,B)=\text{distance}(B,A)$

But if the graph is directed, distances can be very different

It is like driving a car through a city with one-way streets
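A small sketch makes the asymmetry concrete. It uses breadth-first search to count edges on a directed graph; the four-node one-way ring and the function name `hop_distance` are made up for illustration.

```python
from collections import deque


def hop_distance(graph, source, target):
    """Number of edges on a shortest directed path, or None if there is none."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


# Four corners joined by one-way streets: A -> B -> C -> D -> A
one_way_ring = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}

print(hop_distance(one_way_ring, "A", "B"))  # 1
print(hop_distance(one_way_ring, "B", "A"))  # 3, the long way around
```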

## Nodes can be of different classes

We can make a graph connecting papers and authors

An author A is connected to a paper P if A wrote the paper P

Notice that there are no direct connections between authors, nor between papers

This is called a bipartite graph. Bipartite graphs are very common

## Examples of bipartite graphs

• Marriages (men, women)

• Chemical reactions (compounds, reactions)

• Systems, as we teach in CMB

• Transcription regulation (genes encoding proteins, proteins regulating genes)

• Movies (actors, movies they acted in)

## We can collapse a bipartite graph

We can build a new graph from a bipartite graph

For example, author A1 is connected to author A2 if they are coauthors of at least one paper

Choosing a fixed author, let’s say Einstein, we can calculate our distance to him through the coauthor network

This is called the Einstein number
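Collapsing the bipartite graph and measuring distances to a fixed author can be sketched as follows. The paper identifiers and the author names other than Einstein are placeholders, not real bibliographic data, and both function names are illustrative.

```python
from collections import deque
from itertools import combinations


def collapse_to_coauthors(paper_authors):
    """Turn a paper -> authors mapping into an author -> coauthors graph."""
    coauthors = {}
    for authors in paper_authors.values():
        for author in authors:
            coauthors.setdefault(author, set())  # keep single-author writers as nodes
        for a, b in combinations(authors, 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    return coauthors


def distances_from(graph, source):
    """Breadth-first search: number of edges from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Hypothetical bipartite data: each paper lists its authors
papers = {
    "P1": ["Einstein", "A1"],
    "P2": ["A1", "A2"],
    "P3": ["A2", "A3"],
    "P4": ["A4"],
}

network = collapse_to_coauthors(papers)
einstein_number = distances_from(network, "Einstein")
print(einstein_number["A3"])    # 3
print("A4" in einstein_number)  # False: A4 has no coauthor path to Einstein
```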

## Same with movies

Some people invented a game. Given any actor A, find the shortest path between A and Kevin Bacon

You win if the length is less than 6

The game is called “Six Degrees of Kevin Bacon”

(later Kevin Bacon created a philanthropic organization with that name. It’s a nice story)

## WARNING: Wrong name

“Six degrees” is a bad name, because distance and degree are different ideas

The degree of a node is the number of edges that touch it

That is, the number of neighbors (friends)
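Counting degrees only requires looking at each edge once, with no path search. The sketch below assumes an undirected graph given as a list of edges; the node labels and the function name `degrees` are illustrative.

```python
def degrees(edges):
    """Count the edges touching each node of an undirected graph."""
    degree = {}
    for a, b in edges:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return degree


friend_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
print(degrees(friend_edges))  # {'A': 3, 'B': 1, 'C': 1, 'D': 2, 'E': 1}
```

Here A has degree 3, while the distance from A to E is 2 (A-D-E). The two numbers answer different questions.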