At this point we have a list of differentially expressed (DE) genes
Let’s say they are 10% of all genes
Each gene is part of one or more networks
Beyond pathways, it is useful to look at “Gene Ontology”
It has three branches: Biological Process, Molecular Function, and Cellular Component
To find the relevant networks, we use Enrichment Analysis
For example, let’s say we have 10000 genes in total, and 1000 DE genes
Let’s say that carbon metabolism involves 3000 genes in total
If, among the DE genes, 300 genes are in carbon metabolism, we will not be surprised
3K carbon metabolism genes out of 10K total genes is 30%
If we take 1K random genes, we expect about 30% of them (300 genes) to be in carbon metabolism, just by chance
If instead we have 600 carbon metabolism genes among the 1K DE genes, we are surprised
In that case we say that the carbon metabolism gene set is enriched among the DE genes
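This problem follows a hypergeometric distribution
\[ℙ(k \text{ observed successes} \mid N,K,n)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}\]
\[ℙ(k \text{ diff. expr. genes in group} \mid N,K,n)\]
where N is the total number of genes, K is the number of genes in the group (e.g. carbon metabolism), and n is the number of DE genes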
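A minimal sketch of the enrichment test for the carbon metabolism example, using only Python's standard library (`math.comb`). Here N is the total number of genes, K the number of genes in the group, n the number of DE genes, and k the number of DE genes that fall in the group. The function names are hypothetical; the p-value is taken as the upper tail ℙ(X ≥ k), i.e. the chance of seeing k or more group genes by chance.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k group genes among n DE genes), N genes total, K in the group."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def enrichment_p_value(k, N, K, n):
    """Upper tail P(X >= k): chance of k or more group genes among n random genes."""
    top = min(K, n)  # cannot draw more group genes than exist or than we draw
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return hits / comb(N, n)  # exact big-integer ratio, rounded once to float

# Numbers from the carbon metabolism example in the notes.
N, K, n = 10_000, 3_000, 1_000
print(n * K / N)                          # 300.0 genes expected by chance
print(enrichment_p_value(300, N, K, n))   # about 0.5: not surprising
print(enrichment_p_value(600, N, K, n))   # astronomically small: enriched
```

Python integers are unbounded, so the binomial coefficients are computed exactly and only the final ratio is rounded.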
There are many types of networks
“Graph” is the mathematical name for a network
Let’s talk about what they all have in common
Engineers | Mathematicians |
---|---|
Network | Graph |
Node | Vertex |
Nodes | Vertices |
Links | Edges, Arcs |
It is easy to talk about networks using family or friendship relationships
Facebook is a graph
Twitter is also a graph
Notice that Facebook is symmetrical: if A is a friend of B, then B is a friend of A
but Twitter is different: A can follow B without B following A
Twitter is a directed graph, Facebook is undirected
Directed links are arcs, undirected links are edges
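A small sketch of the difference, assuming links are stored as an adjacency dict (node mapped to the set of its neighbours). The user names and follow relations are invented toy data. The same pairs are read once as directed arcs (Twitter-like) and once as undirected edges (Facebook-like).

```python
# Hypothetical toy data: (u, v) means "u follows v".
follows = {("ali", "bora"), ("bora", "ali"), ("ali", "cem")}

def to_adjacency(pairs, directed):
    """Build {node: set of neighbours} from (u, v) pairs."""
    adj = {}
    for u, v in pairs:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set())
        if not directed:
            # An undirected edge can be used in both directions.
            adj[v].add(u)
    return adj

def is_symmetric(adj):
    """True if every link u -> v has a matching link v -> u."""
    return all(u in adj[v] for u in adj for v in adj[u])

twitter = to_adjacency(follows, directed=True)
facebook = to_adjacency(follows, directed=False)
print(is_symmetric(twitter))   # False: ali follows cem, but cem does not follow ali
print(is_symmetric(facebook))  # True
```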
Nodes and links can have many attributes, such as the length of a link,
and others, depending on the case
These values are given: they come from the reality we are modeling
Streets of a city can be modeled as edges in a graph
Each edge has a length
If we are pedestrians, the graph is undirected
We can go from Istanbul University to Taksim by different paths
Each path will have a total length
In general there may be many paths from A to B
At least one of them is the shortest one
The length of the shortest path between two nodes is called distance
(If the links do not have a length attribute, we take each length to be 1.
In other words, we count the number of edges.)
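The notes do not say how the shortest path is found. One standard choice for non-negative lengths is Dijkstra's algorithm, sketched below with Python's `heapq`. The street network, node names and lengths are invented for illustration. Setting every length to 1 turns the distance into an edge count, as described above.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from source.

    graph: {node: {neighbour: length}}, with non-negative lengths.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already found
        for v, length in graph[u].items():
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Hypothetical pedestrian street network (lengths in metres are made up);
# each undirected street is listed in both directions.
streets = {
    "A": {"B": 400, "C": 900},
    "B": {"A": 400, "C": 300, "D": 1200},
    "C": {"A": 900, "B": 300, "D": 500},
    "D": {"B": 1200, "C": 500},
}
print(dijkstra(streets, "A")["D"])  # 1200, via A-B-C-D

# No length attribute: every link counts as 1, so we count edges.
unit = {u: {v: 1 for v in nbrs} for u, nbrs in streets.items()}
print(dijkstra(unit, "A")["D"])     # 2, via A-B-D or A-C-D
```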
If the graph is undirected, then everything is symmetrical
\[distance(A,B)=distance(B,A)\]
But if the graph is directed, distances can be very different
It is like driving a car through a city with one-way streets
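A sketch of why directed distances need not be symmetric, assuming links without lengths (so each arc counts as 1) and using breadth-first search. The one-way street loop is a made-up example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Number of arcs on a shortest path from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical one-way streets: each arc can only be travelled u -> v.
one_way = {
    "A": ["B"],
    "B": ["C"],
    "C": ["D"],
    "D": ["A"],
}
print(bfs_distances(one_way, "A")["B"])  # 1
print(bfs_distances(one_way, "B")["A"])  # 3, via B -> C -> D -> A
```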
We can make a graph connecting papers and authors
An author A is connected to a paper P if A wrote the paper P
Notice that there are no direct connections between authors, nor between papers
This is called a bipartite graph, and they are very common
Marriages (men, women)
Chemical reactions (compounds, reactions)
Systems, as we teach in CMB
Transcription regulation (genes encoding proteins, proteins regulating genes)
Movies (actors, movies they acted in)
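A graph is bipartite when its nodes can be split into two groups so that every link joins the two groups. A common way to check this (a technique chosen here, not one the notes prescribe) is to 2-colour the graph with breadth-first search. The author/paper links below are invented.

```python
from collections import deque

def undirected(edges):
    """Adjacency dict {node: set of neighbours} for undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def two_colour(adj):
    """Return {node: 0 or 1} with every edge joining different sides, or None."""
    side = {}
    for start in adj:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in side:
                    side[v] = 1 - side[u]
                    queue.append(v)
                elif side[v] == side[u]:
                    return None  # an edge inside one side: not bipartite
    return side

# Hypothetical author-paper links: (author, paper) means "author wrote paper".
wrote = [("A1", "P1"), ("A2", "P1"), ("A2", "P2"), ("A3", "P2")]
print(two_colour(undirected(wrote)) is not None)             # True
# A direct author-author link creates the odd cycle A1-P1-A2-A1.
print(two_colour(undirected(wrote + [("A1", "A2")])) is None)  # True
```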
We can build a new graph from a bipartite graph
For example, author A1 is connected to author A2 if they are coauthors of at least one paper
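A sketch of this projection, assuming the bipartite graph is stored as a dict from each author to the set of papers they wrote. The data is made up; every pair of authors sharing a paper becomes a coauthor edge.

```python
from itertools import combinations

# Hypothetical author -> papers data (invented for illustration).
papers_by_author = {
    "A1": {"P1"},
    "A2": {"P1", "P2"},
    "A3": {"P2"},
    "A4": {"P3"},
}

def coauthor_graph(papers_by_author):
    """Project the author-paper bipartite graph onto the authors."""
    authors_by_paper = {}
    for author, papers in papers_by_author.items():
        for p in papers:
            authors_by_paper.setdefault(p, set()).add(author)
    adj = {a: set() for a in papers_by_author}
    for authors in authors_by_paper.values():
        # Every pair of authors on the same paper becomes a coauthor edge.
        for a, b in combinations(sorted(authors), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

g = coauthor_graph(papers_by_author)
print(sorted(g["A2"]))  # ['A1', 'A3']
print(g["A4"])          # set(): A4 has no coauthors
```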
Choosing a fixed author, let’s say Einstein, we can calculate our distance to him through the coauthor network
This is called the Einstein number
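The Einstein number is just a shortest-path distance in the coauthor network, so breadth-first search from Einstein gives it for every connected author. In the sketch below, the coauthor graph is toy data: apart from the name "Einstein", all names and links are invented. Authors not reached by the search have no finite number.

```python
from collections import deque

def collaboration_numbers(coauthors, root):
    """BFS distance from root in the coauthor graph; unreachable authors are absent."""
    number = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in coauthors[u]:
            if v not in number:
                number[v] = number[u] + 1
                queue.append(v)
    return number

# Hypothetical coauthor graph; the links are invented for illustration.
coauthors = {
    "Einstein": {"X"},
    "X": {"Einstein", "Y"},
    "Y": {"X", "Me"},
    "Me": {"Y"},
    "Loner": set(),
}
numbers = collaboration_numbers(coauthors, "Einstein")
print(numbers["Me"])       # 3
print("Loner" in numbers)  # False: no finite Einstein number
```

The same function with an actor graph and "Kevin Bacon" as the root would compute the distances used in the game below.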
Some people invented a game: given any actor A, find the shortest path between A and Kevin Bacon, where two actors are connected if they acted in the same movie
You win if the length is less than 6
The game is called “Six Degrees of Kevin Bacon”
(Later, Kevin Bacon created a philanthropic organization with that name. It’s a nice story.)
“Six degrees” is a bad name, because distance and degree are different ideas
The degree of a node is its number of edges
That is, the number of neighbors (friends)
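A minimal sketch of computing degrees from an undirected edge list, with invented friendship data. Each edge adds one to the degree of both of its endpoints.

```python
from collections import Counter

# Hypothetical undirected friendships, each edge listed once.
friendships = [("ali", "bora"), ("ali", "cem"), ("ali", "deniz"), ("bora", "cem")]

degree = Counter()
for u, v in friendships:
    # An undirected edge counts once for each endpoint.
    degree[u] += 1
    degree[v] += 1

print(degree["ali"])    # 3
print(degree["deniz"])  # 1
```

Note the contrast with distance: ali has degree 3, while the distance from ali to any of those three friends is 1.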