For nineteen years he had lived as one in a dream:
When he fell, he became unconscious; when he came to, the present was almost intolerable in its richness and sharpness
We, at one glance, can perceive three glasses on a table
Funes, all the leaves and tendrils and fruit that make up a grapevine
He knew by heart the forms of the southern clouds at dawn
and could compare them in his memory with the mottled streaks on a book in Spanish binding he had only seen once
With no effort, he had learned English, French, Portuguese and Latin.
I suspect, however, that he was not very capable of thought.
To think is to forget differences, generalize, make abstractions.
In the teeming world of Funes, there were only details, almost immediate in their presence.
Computers have very good memory, like Funes
An Idea is the essence of an object
We only see shadows
\[3+5 = 5+3\]
\[9+2 = 2+9\] And then \[x + y = y + x\]
Algebra is a higher level of abstraction
Forget differences to find common identity
The “New Oxford American Dictionary” defines a cluster as:
- a group of similar objects growing closely together: clusters of grapes.
- a group of people or similar objects positioned or occurring close together: a cluster of antique shops.
- a natural subgroup of a population, used for statistical sampling or analysis.
cellular organism; Eukaryota; Metazoa; Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; Amniota; Mammalia; Primates; Hominoidea; Hominidae; Homininae; Homo; H.sapiens; Latinamerican; chilean
Tree of Life
- a group of similar objects growing closely together
Let us put a number to measure similarity
Here \(x\),\(y\),\(z\) are real numbers, positive or negative.
If \(\mathrm{dist}(x,y)=(x-y)^2\) then:
So this is a valid distance
Exercise: prove it
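A numerical check is not a proof, but it can suggest what to prove. The sketch below assumes the properties in question are non-negativity, zero distance only for identical values, and symmetry; that list is an assumption, since it is not spelled out here. The helper name `sq_dist` and the test values are made up for illustration. The last line shows that the squared difference does not satisfy the triangle inequality, so it is a dissimilarity measure rather than a metric in the strict mathematical sense.

```r
# Squared difference between two real numbers, as defined above.
sq_dist <- function(x, y) (x - y)^2

# A small grid of test values, positive and negative (arbitrary choice).
vals <- c(-2, -0.5, 0, 1, 3)
pairs <- expand.grid(x = vals, y = vals)
d <- sq_dist(pairs$x, pairs$y)

non_negative <- all(d >= 0)                            # never negative
identity     <- all((d == 0) == (pairs$x == pairs$y))  # zero exactly when x equals y
symmetric    <- all(d == sq_dist(pairs$y, pairs$x))    # order does not matter

# Triangle inequality: dist(x, z) <= dist(x, y) + dist(y, z)
# With x = 0, y = 1, z = 2 this asks whether 4 <= 1 + 1.
triangle <- sq_dist(0, 2) <= sq_dist(0, 1) + sq_dist(1, 2)

c(non_negative, identity, symmetric, triangle)
```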
bottom up: joining one by one
How to measure distance between \(x\) and \(C\)?
How to measure distance between cluster \(C_1\) and \(C_2\)?
\[\mathrm{dist}(x, C)=\mathrm{mean} (\mathrm{dist}(x, y): y \in C)\] \[\mathrm{dist}(C_1, C_2)=\mathrm{mean} (\mathrm{dist}(x, y): x \in C_1, y \in C_2)\] Distance between two clusters is the average distance between their elements
\[\mathrm{dist}(x, C)=\min(\mathrm{dist}(x, y): y \in C)\] \[\mathrm{dist}(C_1, C_2)=\min(\mathrm{dist}(x, y): x \in C_1, y \in C_2)\] Distance between two clusters is the smallest distance between their elements
\[\mathrm{dist}(x, C)=\max(\mathrm{dist}(x, y): y \in C)\] \[\mathrm{dist}(C_1, C_2)=\max(\mathrm{dist}(x, y): x \in C_1, y \in C_2)\] Distance between two clusters is the maximal distance between their elements
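The three definitions above differ only in how the pairwise distances are summarized. A minimal sketch, assuming one-dimensional data and the squared difference as the element distance; the names `pair_dist` and `cluster_dist` and the two clusters are hypothetical, chosen only for illustration.

```r
# All pairwise squared differences between elements of c1 and elements of c2.
pair_dist <- function(c1, c2) outer(c1, c2, function(x, y) (x - y)^2)

# Cluster-to-cluster distance: summarize the pairwise distances with `agg`,
# which can be mean, min or max.
cluster_dist <- function(c1, c2, agg = mean) agg(pair_dist(c1, c2))

c1 <- c(1, 2)   # made-up cluster
c2 <- c(5, 9)   # made-up cluster

d_mean <- cluster_dist(c1, c2, mean)  # 34.5, average of 16, 64, 9, 49
d_min  <- cluster_dist(c1, c2, min)   # 9, closest pair (2, 5)
d_max  <- cluster_dist(c1, c2, max)   # 64, farthest pair (1, 9)
```

A single element is just a cluster of size one, so `cluster_dist(3, c2, min)` gives the distance between the point 3 and the cluster `c2`.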
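library(GEOquery)  # provides getGEO()
library(Biobase)   # provides exprs(), pData(), fData()
dir.create("geo-data", showWarnings = FALSE)  # make sure the download folder exists
se <- getGEO(GEO = "GSE3541", destdir = "geo-data")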
[1] 1
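se <- se[[1]]         # take the first ExpressionSet from the list
expr <- exprs(se)     # expression matrix
pheno <- pData(se)    # sample annotations
feature <- fData(se)  # feature annotations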
d <- dist(expr)                         # Euclidean distance between rows of expr
tree <- hclust(d, method = "complete")  # complete linkage (maximal distance)
plot(tree, labels = FALSE)
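The same `dist` and `hclust` calls can be tried without downloading anything. The sketch below uses a made-up matrix of five points and adds `cutree`, which is not used above, to turn the tree into a fixed number of groups; the choice of three groups is an assumption for illustration only.

```r
# Five made-up points in the plane: two tight pairs and one isolated point.
m <- rbind(a = c(0, 0), b = c(0, 1),
           c = c(5, 5), d = c(5, 6),
           e = c(10, 0))

toy_d <- dist(m)                                 # Euclidean distance between rows
toy_tree <- hclust(toy_d, method = "complete")   # bottom up, complete linkage

# Cut the tree so that exactly three groups remain.
groups <- cutree(toy_tree, k = 3)
groups

plot(toy_tree)   # dendrogram with the row names as leaf labels
```

Here `a` and `b` end up in one group, `c` and `d` in another, and `e` stays alone. Replacing `"complete"` with `"single"` or `"average"` selects the minimum or the mean linkage.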
If \(x\) and \(y\) are vectors of length \(n\), then \[\mathrm{dist}_2(x,y)=\sqrt{(x_1-y_1)^2+\cdots +(x_n-y_n)^2}\]
Sum of absolute values \[\mathrm{dist}_1(x,y)=\vert x_1-y_1\vert +\cdots +\vert x_n-y_n\vert\] Different geometrical meaning
\[\mathrm{dist}_\infty(x,y) = \max(\vert x_1-y_1\vert ,\ldots,\vert x_n-y_n\vert )\] Only the biggest one matters
\[X = (0,0), Y = (100,1)\] \[\mathrm{dist}_1(X,Y) = 101\] \[\mathrm{dist}_2(X,Y) = \sqrt{10001} \approx 100.005\] \[\mathrm{dist}_\infty(X,Y) = 100\]
\[X = (10,1), Y = (100,1)\] \[\mathrm{dist}_1(X,Y) = 90\] \[\mathrm{dist}_2(X,Y) = 90\] \[\mathrm{dist}_\infty(X,Y) = 90\]
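These numbers can be reproduced with the `method` argument of `dist`, which R calls `"manhattan"`, `"euclidean"` and `"maximum"` for the three distances above. A minimal sketch; the variable names are chosen here for illustration.

```r
# First example: the two points differ in both coordinates.
pts <- rbind(X = c(0, 0), Y = c(100, 1))

d1   <- as.numeric(dist(pts, method = "manhattan"))  # sum of absolute differences
d2   <- as.numeric(dist(pts, method = "euclidean"))  # square root of sum of squares
dinf <- as.numeric(dist(pts, method = "maximum"))    # largest absolute difference

c(d1, d2, dinf)

# Second example: the points differ in one coordinate only,
# so all three distances agree.
pts2 <- rbind(X = c(10, 1), Y = c(100, 1))
same <- sapply(c("manhattan", "euclidean", "maximum"),
               function(m) as.numeric(dist(pts2, method = m)))
same
```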
We will start analyzing genomic sequences.
Prepare slides to explain
They are explained in Wikipedia and NCBI website.
