At this point we have a list of differentially expressed (DE) genes

Let’s say they are 10% of all genes

Each gene is part of one or more networks

- Enzymes are in metabolic networks
- Transcription factors are in regulatory networks, and so are the genes they regulate
- Other genes are part of signaling networks

Beyond pathways, it is useful to look at “Gene Ontology”

It has three branches

- Molecular Function
- Cellular Component
- Biological Process

To find the relevant networks, we use *Enrichment Analysis*

For example, let’s say we have 10000 genes in total, and 1000 DE genes

Let’s say that carbon metabolism involves 3000 genes in total

If, among the DE genes, 300 genes are in carbon metabolism, we will not be surprised

3000 carbon metabolism genes out of 10000 total genes is 30%

If we take 1000 random genes, we expect about 30% of them to be in carbon metabolism, just by chance

If instead we have 600 carbon metabolism genes among the 1000 DE genes, we are surprised

In that case we say that *the set of carbon metabolism is enriched* in the DE experiment

This problem follows a *hypergeometric* distribution

\[ℙ(k \text{ observed successes} | N,K,n)\]

- \(N\) is the population size,
- \(K\) is the number of success states in the population
- \(n\) is the number of draws (i.e. quantity drawn in each trial)
- \(k\) is the number of observed successes

\[ℙ(k \text{ diff. expr. genes in group} | N,K,n)\]

- \(N\) is the total number of genes,
- \(K\) is the total number of genes in the group
- \(n\) is the number of DE genes
- \(k\) is the number of DE genes in the group
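The carbon metabolism example can be checked numerically. The sketch below assumes that "surprise" is measured as the probability of seeing \(k\) or more DE genes in the group, \(ℙ(X \ge k | N,K,n)\); the function name `enrichment_p_value` is made up for this illustration.

```python
from math import comb

def enrichment_p_value(N, K, n, k):
    """P(X >= k) when X follows a hypergeometric distribution.

    N: total number of genes
    K: number of genes in the group
    n: number of DE genes
    k: number of DE genes in the group
    """
    upper = min(n, K)
    # Number of ways to choose n genes so that at least k are in the group
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    # Divide by the number of ways to choose any n genes out of N
    return favorable / comb(N, n)

# 300 of 1000 DE genes in carbon metabolism: exactly what we expect by chance
print(enrichment_p_value(10000, 3000, 1000, 300))
# 600 of 1000 DE genes in carbon metabolism: a very small probability
print(enrichment_p_value(10000, 3000, 1000, 600))
```

The first value is close to one half, so 300 genes is not surprising. The second value is extremely small, which is what we mean when we say the set is enriched.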

There are many types of networks

*Graph* is the mathematical name for a network

Let’s talk about what they all have in common

Engineers | Mathematicians |
---|---|
Network | Graph |
Node | Vertex |
Nodes | Vertices |
Links | Edges, Arcs |

It is easy to talk about networks using *family* or *friendship* relationships

Facebook is a Graph

- Each person is a node
- A and B are connected when A and B are *friends*

Twitter is also a graph

- A is connected to B when A *follows* B

Notice that Facebook is *symmetrical*

- If A is a friend of B, then B is a friend of A

but Twitter is different

- A can follow B even if B does not follow A

Twitter is a *directed* graph, Facebook is *undirected*

Directed links are *arcs*, undirected links are *edges*
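One possible way to store a graph in a program is a dictionary that maps each node to the set of its neighbors. The sketch below is only an illustration: the function `build_graph` and the names of the people are invented, and the only difference between the two cases is whether each link is also stored in the reverse direction.

```python
from collections import defaultdict

def build_graph(links, directed):
    """Return a dict mapping each node to the set of nodes it links to."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b]  # touch b so it appears even without outgoing links
        if not directed:
            neighbors[b].add(a)  # an edge can be used in both directions
    return dict(neighbors)

# Hypothetical people, used only for this example
links = [("Ana", "Berk"), ("Berk", "Cem")]
friends = build_graph(links, directed=False)  # Facebook-like: edges
follows = build_graph(links, directed=True)   # Twitter-like: arcs

print("Ana" in friends["Berk"])  # True: friendship is symmetrical
print("Ana" in follows["Berk"])  # False: Berk does not follow Ana back
```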

Nodes and links can have many *attributes*

- Name
- Color
- Length
- Cost
- Class

and others, depending on the case

These values are *given*, they come from reality

Streets of a city can be modeled as edges in a graph

Each edge has a length

If we are pedestrians, the graph is undirected

We can go from Istanbul University to Taksim by different paths

Each path will have a total length

In general there may be many paths from A to B

At least one of them is the shortest one

**The length of the shortest path between two nodes is called distance**

(If the links do not have a length attribute, we take each length to be 1. In other words, we count the number of edges.)
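When every link counts as 1, the distance can be found with a breadth-first search, which visits nodes in order of increasing number of edges from the start. This is a minimal sketch: the graph is a made-up example stored as a dictionary of neighbor sets, and the function name `distance` is chosen to match the text.

```python
from collections import deque

def distance(graph, start, goal):
    """Number of edges on a shortest path from start to goal, or None."""
    seen = {start: 0}       # node -> number of edges from start
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return seen[node]
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return None             # goal cannot be reached from start

# Hypothetical undirected graph: every edge is listed in both directions
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
print(distance(graph, "A", "E"))  # 3
```

There are two shortest paths here (through B or through C), but the distance is a single number.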

If the graph is undirected, then everything is symmetrical

\[\text{distance}(A,B)=\text{distance}(B,A)\]

But if the graph is directed, distances can be *very* different

It is like moving in a car through the city
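To see how different the two directions can be, here is a small sketch with one-way streets around a block. It uses Dijkstra's algorithm, a standard method for shortest paths when links have non-negative lengths. The street layout, the lengths, and the function name `weighted_distance` are all invented for the example.

```python
import heapq

def weighted_distance(arcs, start, goal):
    """Length of a shortest path that follows the arcs, or None."""
    best = {start: 0}
    heap = [(0, start)]     # (length so far, node)
    while heap:
        dist, node = heapq.heappop(heap)
        if node == goal:
            return dist
        if dist > best.get(node, float("inf")):
            continue        # outdated entry, a shorter path was found already
        for neighbor, length in arcs.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return None

# Hypothetical one-way streets around a block, lengths in meters
streets = {
    "A": {"B": 100},
    "B": {"C": 100},
    "C": {"D": 100},
    "D": {"A": 100},
}
print(weighted_distance(streets, "A", "B"))  # 100
print(weighted_distance(streets, "B", "A"))  # 300
```

Going from A to B takes one street, but coming back means driving around the whole block.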

We can make a graph connecting *papers* and *authors*

An *author* A is connected to a *paper* P if A wrote the paper P

Notice that there are no direct connections between *authors*, nor between *papers*

This is called a *bipartite* graph, and they are very common

- Marriages (men, women)
- Chemical reactions (compounds, reactions)
- Systems, as we teach in CMB
- Transcription regulation (genes encoding proteins, proteins regulating genes)
- Movies (actors, movies they acted in)

We can build a new graph from a bipartite graph

For example, author \(A_1\) is connected to author \(A_2\) if they are coauthors of at least one paper
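This construction can be sketched in a few lines. The example assumes the bipartite graph is given as a dictionary from each paper to its list of authors. The paper and author labels and the function name `coauthor_graph` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def coauthor_graph(papers):
    """Build an author-author graph from a paper -> authors mapping."""
    coauthors = defaultdict(set)
    for authors in papers.values():
        for author in authors:
            coauthors[author]  # keep authors who only wrote alone as isolated nodes
        # Every pair of authors of the same paper becomes an edge
        for a, b in combinations(sorted(set(authors)), 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    return dict(coauthors)

# Hypothetical bipartite data: papers on one side, authors on the other
papers = {
    "P1": ["A1", "A2"],
    "P2": ["A2", "A3"],
    "P3": ["A4"],
}
graph = coauthor_graph(papers)
print(sorted(graph["A2"]))  # ['A1', 'A3']
```

In the new graph the papers disappear: A1 and A3 are not connected to each other, but both are neighbors of A2.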

Choosing a fixed author, let’s say Einstein, we can calculate our distance to him through the coauthor network

This is called the *Einstein number*

Some people invented a game. Given any actor A, find the shortest path between A and Kevin Bacon

You win if the length is less than 6

The game is called “Six Degrees of Kevin Bacon”

(later Kevin Bacon created a philanthropic organization with that name. It’s a nice story)

“Six degrees” is a bad name, because *distance* and *degree* are different ideas

The *Degree* of a node is its number of edges

That is, the number of neighbors (friends)
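A short sketch of how degrees can be counted from a list of undirected edges. It assumes a simple graph (no repeated edges and no self-loops), so the number of edges of a node equals its number of neighbors. The node names and the function name `degrees` are invented.

```python
from collections import Counter

def degrees(edges):
    """Count, for each node, how many undirected edges touch it."""
    count = Counter()
    for a, b in edges:
        count[a] += 1
        count[b] += 1
    return count

# Hypothetical graph: "hub" has three neighbors, "w" hangs off "z"
edges = [("hub", "x"), ("hub", "y"), ("hub", "z"), ("z", "w")]
deg = degrees(edges)
print(deg["hub"])  # 3
print(deg["w"])    # 1
```

Here "w" has degree 1, while its distance to "hub" is 2 (through "z"). The two numbers answer different questions.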