Class 13: Matrix Representation of Graphs

Systems Biology

Andrés Aravena, PhD

December 03, 2021

Graphs

Vertices connected by edges

or sometimes

nodes connected by links

Matrix representation

Let’s say that the graph has \(n\) nodes

We can build an \(n×n\) matrix \(A\) such that

\[A_{ij} = \begin{cases} 1\quad\text{if }i\text{ is connected to }j\\ 0\quad\text{otherwise} \end{cases}\]

This is called the adjacency matrix

Example

\[A= \begin{pmatrix} 0 & 1 & 0\\ 1 & 0 & 1\\ 0 & 1 & 0\\ \end{pmatrix} \]

The matrix elements are either 1 or 0
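The slides use R with igraph; as a language-neutral sketch, the same matrix can be built in Python with numpy (the edge list and node count below are illustrative, not from the slides):

```python
import numpy as np

# Hypothetical edge list for the 3-node path graph shown above
edges = [(0, 1), (1, 2)]
n = 3

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected: store the edge in both directions

print(A)
```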

Undirected graph ⟺ symmetric matrix

\[A= \begin{pmatrix} 0 & 1 & 0 & 0\\ 1 & 0 & 1 & 1\\ 0 & 1 & 0 & 1\\ 0 & 1 & 1 & 0\\ \end{pmatrix} \]

The matrix of a directed graph may be asymmetric

\[A= \begin{pmatrix} 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ 0 & 1 & 0 & 0\\ \end{pmatrix} \]

A symmetric matrix can also represent a directed graph, one where every edge is reciprocated
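Checking whether a graph is undirected (or has every edge reciprocated) therefore reduces to a symmetry test, \(A = A^T\). A small numpy sketch using the two matrices from the slides:

```python
import numpy as np

# The undirected and directed adjacency matrices shown above
A_undirected = np.array([[0, 1, 0, 0],
                         [1, 0, 1, 1],
                         [0, 1, 0, 1],
                         [0, 1, 1, 0]])
A_directed   = np.array([[0, 1, 0, 0],
                         [0, 0, 1, 0],
                         [0, 0, 0, 1],
                         [0, 1, 0, 0]])

def is_symmetric(A):
    # symmetric adjacency matrix <=> every edge is reciprocated
    return np.array_equal(A, A.T)

print(is_symmetric(A_undirected))  # True
print(is_symmetric(A_directed))    # False
```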

in- and out-degree

It is easy to see that the row sums are the number of out-neighbors \[\text{out–degree}(i) = \sum_j A_{ij}\]

Naturally, the column sums are the in-degrees \[\text{in–degree}(j) = \sum_i A_{ij}\]

Degree

If the matrix is symmetric, then row sums and column sums are the same \[\text{degree}(i) = \sum_j A_{ij}=\sum_j A_{ji}\]
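These sums are one-liners in any matrix library. A numpy sketch on the directed example from the slides:

```python
import numpy as np

# Directed 4-cycle example from the slides
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 1, 0, 0]])

out_degree = A.sum(axis=1)  # row sums
in_degree  = A.sum(axis=0)  # column sums
print(out_degree)  # [1 1 1 1]
print(in_degree)   # [0 2 1 1]
```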

Degree distribution

When we have a graph, we can always make its adjacency matrix

So we can always calculate the degree of each node

The degree distribution is the frequency of each degree

degree(g2)
A B C D 
1 3 2 2 
degree.distribution(g2)
[1] 0.00 0.25 0.50 0.25
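The output above comes from igraph's `degree` and `degree.distribution` in R. The same numbers can be reproduced with numpy on the undirected 4-node matrix from the slides (the node labels A–D are implicit in the row order):

```python
import numpy as np

# Undirected graph from the slides (rows correspond to nodes A, B, C, D)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]])

degree = A.sum(axis=1)                    # [1 3 2 2]
dist = np.bincount(degree) / len(degree)  # frequency of degrees 0, 1, 2, ...
print(degree)  # [1 3 2 2]
print(dist)    # [0.   0.25 0.5  0.25]
```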

Degree centrality

One way to determine the “important” node is to find the one with the highest degree

These are called hubs

max(degree(g2))
[1] 3

Can we get from here to there?

If \(A_{ij}=1\) then there is a direct way to go from \(i\) to \(j\)

If \(A_{ij}=0\) then we cannot go from \(i\) to \(j\) in one step

If instead we allow an intermediate node \(k\) and go \(i→k→j,\) then there may be many ways to go from \(i\) to \(j\)

The number of two-step paths between \(i\) and \(j\) is \[\sum_k A_{ik}A_{kj}\]

This is matrix multiplication

We have shown that \(A^2\) represents the number of two-step paths between each pair of nodes

In other words, \((A^2)_{ij}≠0\) exactly when there is at least one two-step path between \(i\) and \(j\)
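On the 3-node path graph from the first example, one matrix product is enough to see this (a numpy sketch; note that the diagonal entries count paths that leave a node and come back):

```python
import numpy as np

# Path graph from the first example: 1 - 2 - 3
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

A2 = A @ A  # (A2)[i, j] = number of two-step paths from i to j
print(A2)
# The middle node has two round trips (to each end and back),
# and the two end nodes are connected by exactly one two-step path.
```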

In the same way, \(A^k\) represents the number of \(k\)-step paths between each pair of nodes

We can calculate the distance between any pair \((i,j)\) by looking for the smallest \(k\) such that \((A^k)_{ij}≠0\)

If \((A^k)_{ij}=0\) for every \(k\) from 1 to \(n,\) then there is no path from \(i\) to \(j\)

Distance is another matrix

Notice that \(A^k\) is not the distance

We build the distance matrix \(D\) by looking at each power \(A^k\)

There are many efficient methods to calculate the distance between nodes
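As an illustrative (not efficient) sketch of the idea, \(D\) can be built by scanning the powers \(A^k\) and recording, for each pair, the smallest \(k\) with a nonzero entry; the function name and the \(-1\) marker for unreachable pairs are my own conventions:

```python
import numpy as np

def distance_matrix(A):
    # D[i, j] = smallest k with (A^k)[i, j] != 0; -1 marks unreachable pairs
    n = len(A)
    D = np.where(np.eye(n, dtype=bool), 0, -1)
    Ak = np.eye(n, dtype=int)
    for k in range(1, n + 1):
        Ak = Ak @ A
        newly_reached = (Ak != 0) & (D == -1)
        D[newly_reached] = k
    return D

# Path graph 1 - 2 - 3
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
print(distance_matrix(A))
```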

It is interesting to see the distribution of distances

Most of the graphs we study are too big to draw, so we need tools to understand them without looking at them

Random walks

We will move around the graph choosing random neighbors

Our position is represented by a vector \(𝐮\) with \(n\) elements

\(𝐮_i=1\) if we are at node \(i,\) and 0 otherwise

Transition matrix

We make a new matrix \(P\) by dividing each row of \(A\) by its out-degree and transposing the result \[P_{ij} = \frac{A_{ji}}{\sum_k A_{jk}}\]

Here \(P_{ij}\) represents the probability that we arrive at \(i\) if we were at \(j,\) so each column of \(P\) sums to 1
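A numpy sketch of one common column-stochastic convention (normalize each row of \(A\) by its out-degree, then transpose, so that column \(j\) holds the probabilities of moving out of node \(j\)):

```python
import numpy as np

# Undirected graph from the degree example
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]])

# Row-normalize by out-degree, then transpose:
# column j of P gives the probabilities of moving from j to each neighbor
out_degree = A.sum(axis=1, keepdims=True)
P = (A / out_degree).T

print(P.sum(axis=0))  # each column sums to 1
```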

Moving around

We do not have any initial preference, so we start on any node with the same probability \[𝐮_i =1/n\qquad\text{ for all }i\]

After one step our position will be \(𝐯_{(1)}=P 𝐮\)

After \(k\) steps our position will be \(𝐯_{(k)}=P^k 𝐮\)

When \(k\) is large enough, \(𝐯_{(k)}\) stabilizes

Another centrality index

Let’s call \[𝐯_{(∞)}=\lim_{k\to∞}P^k 𝐮\]

Each element \(i\) of the vector \(𝐯_{(∞)}\) represents the probability of being on node \(i\) in a random walk

This is called eigen-centrality and is another way to define the importance of each node
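The whole random-walk construction fits in a few lines of numpy. On the undirected example graph, the limit vector is proportional to the degrees (a known property of random walks on connected, non-bipartite undirected graphs); the iteration count below is an arbitrary choice that is more than enough for convergence here:

```python
import numpy as np

# Undirected graph from the degree example
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 1],
              [0, 1, 1, 0]])

# Column-stochastic transition matrix
P = (A / A.sum(axis=1, keepdims=True)).T

n = len(A)
u = np.full(n, 1 / n)   # start on any node with the same probability
for _ in range(1000):   # iterate v = P v until it stabilizes
    u = P @ u

print(u)  # for this graph: degree / (2 * number of edges)
```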