Blog of Andrés Aravena


07 November 2017

Matrices with a single column are called column vectors, or simply vectors. They represent the state of the system at any given moment. We can visualize a vector graphically, at least when it has dimension 2.

If \(x=[x_1\; x_2]^T\) we can draw it in the Cartesian plane, at the point with coordinates \((x_1,x_2).\)

Norm of a vector

The “size” of the vector \(x\) is the distance between the origin and the point \((x_1,x_2).\) This magnitude is called the norm of the vector \(x\) and is written \(\lVert x\rVert.\)

Figure 1: Pythagoras says that \(c^2=a^2+b^2.\)

According to the Pythagorean theorem, we have \[\lVert x\rVert^2=x_1^2+x_2^2\] (the square of the hypotenuse is the sum of the squares of the other two sides). If we work with \(n\) dimensions, we have \[\lVert x\rVert^2=\sum_{k=1}^nx_k^2\]

In particular when the dimension is 1, that is, when the vector is just a number, then the norm of \(x\) is its absolute value: \[\lVert x\rVert=\lvert x\rvert\]
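To make the formula concrete, here is a minimal sketch in plain Python. The function name `norm` and the use of Python lists to hold the components are our own choices for illustration, not part of the original text.

```python
import math

def norm(x):
    """Euclidean norm: square root of the sum of the squared components."""
    return math.sqrt(sum(xk * xk for xk in x))

print(norm([3.0, 4.0]))       # the classic 3-4-5 right triangle -> 5.0
print(norm([-7.0]))           # dimension 1: the absolute value -> 7.0
print(norm([1.0, 2.0, 2.0]))  # the same formula in dimension 3 -> 3.0
```

The same function works for any dimension because it only sums over whatever components the vector has.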

Norm as matrix multiplication

Vectors are one-column matrices, with dimension \(n\times1\). If we transpose a vector \(x\) we get a one-row matrix, with dimension \(1\times n,\) that we can multiply with a one-column vector. We get \[x^T x = \sum_{k=1}^nx_k^2=\lVert x\rVert^2\]

Dot product

This can be extended to two vectors. If \(x\) and \(y\) are vectors of the same dimension, then \[y^T x= \sum_{k=1}^nx_k y_k= x^Ty\] We can see that this result is always a \(1\times1\) matrix, that is, a single scalar number. This is an important tool, which we call the dot product. It is defined as \[x\cdot y=x^T y\tag{dot product}\]

It is easy to see that \(x\cdot y = y \cdot x\) and that \(x\cdot x=\lVert x\rVert^2.\)
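Both properties can be checked numerically. The following is a small sketch, assuming vectors are stored as Python lists; the helper name `dot` is ours.

```python
def dot(x, y):
    """Dot product: sum of the products of matching components."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(xk * yk for xk, yk in zip(x, y))

x = [1.0, 2.0, 3.0]
y = [4.0, -5.0, 6.0]

print(dot(x, y))  # 4 - 10 + 18 -> 12.0
print(dot(y, x))  # same value, the order does not matter -> 12.0
print(dot(x, x))  # squared norm of x: 1 + 4 + 9 -> 14.0
```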


Let’s call \(\theta\) the angle between sides \(a\) and \(c\) in Figure 1. Following the definitions of the trigonometric functions, we have \[\begin{aligned} \cos(\theta)&=\frac{a}{c}=\frac{x_1}{\lVert x\rVert}\\ \sin(\theta)&=\frac{b}{c}=\frac{x_2}{\lVert x\rVert} \end{aligned}\] Therefore, we can write the vector \(x\) as \[\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \begin{bmatrix} \lVert x\rVert\cos(\theta)\\ \lVert x\rVert\sin(\theta) \end{bmatrix} \] This way of representing vectors is called polar coordinates.
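As an illustration, the conversion to polar coordinates and back can be sketched as follows. The function names `to_polar` and `from_polar` are hypothetical, and using `math.atan2` to recover the angle is our assumption: it extends the triangle definition to vectors in any quadrant.

```python
import math

def to_polar(x):
    """Return (norm, angle in radians) of a 2-dimensional vector."""
    r = math.hypot(x[0], x[1])      # sqrt(x1^2 + x2^2)
    theta = math.atan2(x[1], x[0])  # angle measured from the horizontal axis
    return r, theta

def from_polar(r, theta):
    """Rebuild the vector [r cos(theta), r sin(theta)]."""
    return [r * math.cos(theta), r * math.sin(theta)]

r, theta = to_polar([3.0, 4.0])
back = from_polar(r, theta)
print(r, math.degrees(theta))
print(back)  # equal to [3.0, 4.0] up to rounding error
```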

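Now if we calculate the dot product of two vectors \(x\) and \(y,\) whose polar angles are \(\theta_x\) and \(\theta_y,\) we get \[\begin{aligned} x\cdot y & = x_1 y_1+ x_2 y_2 \\ & = \lVert x\rVert\cos(\theta_x)\lVert y\rVert\cos(\theta_y)+ \lVert x\rVert\sin(\theta_x)\lVert y\rVert\sin(\theta_y)\\ & =\lVert x\rVert\lVert y\rVert(\cos(\theta_x)\cos(\theta_y)+ \sin(\theta_x)\sin(\theta_y)) \end{aligned}\tag{1} \] Here we must remember three results from trigonometry: \[\begin{aligned} \cos(-\phi)&=\cos(\phi)\\ \sin(-\phi)&=-\sin(\phi)\\ \cos(\psi+\phi)&=\cos(\psi)\cos(\phi)-\sin(\psi)\sin(\phi) \end{aligned}\] Combining them gives \(\cos(\psi-\phi)=\cos(\psi)\cos(\phi)+\sin(\psi)\sin(\phi).\)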

Substituting into equation (1) we conclude that \[x\cdot y = \lVert x\rVert\lVert y\rVert\cos(\theta_x-\theta_y)\]

In other words, the dot product is the product of the norms of the two vectors and the cosine of the angle between them.

In particular we recover the previous result for the norm. Since the angle of any vector with itself is 0, the cosine is always 1 and \(x\cdot x=\lVert x\rVert^2.\)
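The geometric formula can be turned around to recover the angle from the dot product. Here is a minimal sketch, assuming both vectors are non-zero; the helper names are ours, and the clamping step is a precaution we add against rounding error, not something the derivation requires.

```python
import math

def dot(x, y):
    return sum(xk * yk for xk, yk in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def angle_between(x, y):
    """Angle in radians, between 0 and pi. Both vectors must be non-zero."""
    c = dot(x, y) / (norm(x) * norm(y))
    c = max(-1.0, min(1.0, c))  # keep the cosine inside [-1, 1] despite rounding
    return math.acos(c)

x = [1.0, 0.0]
y = [1.0, 1.0]
print(math.degrees(angle_between(x, y)))  # close to 45 degrees
print(math.degrees(angle_between(x, x)))  # a vector with itself: 0.0
```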

Being Orthogonal (perpendicular)

One important consequence of the geometric interpretation of the dot product is that its value is 0 when the two vectors are at a right angle. In that case we say that the two vectors are orthogonal. Sometimes we also say perpendicular.

Let’s consider the vectors \(x\) and \(y\) that form an angle \(\theta.\) If the vectors are orthogonal then \(\theta=90^\circ\) and thus \(\cos(\theta)=0,\) making \(x\cdot y=0.\)

Conversely, if \(x\) is non-zero then \(\lVert x\rVert>0,\) and the same applies to \(y.\) So if \(x\cdot y=0\) then necessarily \(\cos(\theta)=0.\) Since \(0^\circ\leq\theta\leq180^\circ,\) we conclude that \(\theta=90^\circ.\)

If one of the vectors has norm zero, then it is exactly equal to \([0\;0]^T.\) In that case we cannot say anything about the angle between vectors. To make life easier, we usually say that the vector \([0\;0]^T\) is orthogonal to all other vectors.

In summary \(x\cdot y=0\) if and only if \(x\) is orthogonal to \(y.\)
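In code, the test for orthogonality is a single dot product. The sketch below uses a hypothetical helper `is_orthogonal`; the tolerance `tol` is our assumption, added because floating-point results are rarely exactly zero.

```python
def dot(x, y):
    return sum(xk * yk for xk, yk in zip(x, y))

def is_orthogonal(x, y, tol=1e-12):
    """True when the dot product is zero, up to a small tolerance."""
    return abs(dot(x, y)) <= tol

print(is_orthogonal([2.0, 1.0], [-1.0, 2.0]))  # -2 + 2 = 0 -> True
print(is_orthogonal([1.0, 1.0], [1.0, 0.0]))   # dot product is 1 -> False
print(is_orthogonal([0.0, 0.0], [3.0, 5.0]))   # the zero vector -> True
```

The last line matches the convention that the zero vector is orthogonal to every other vector.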

Making lines

Let \(x\) be a fixed non-zero vector. We can multiply it by any real number \(\beta\) and get a new vector \(x \beta.\) It is easy to see that, when \(\beta\neq0,\) the new vector is parallel to the original one. One way to prove it is to find the cosine of the angle between them. For that we evaluate \[\cos\theta=\frac{x\cdot (x\beta)}{\lVert x\rVert\,\lVert x\beta\rVert}\] which simplifies to \(\beta/\lvert\beta\rvert,\) which is either +1 or -1. Therefore the angle between the two vectors is either 0 or 180 degrees. In both cases the vectors are parallel.

Now let’s consider the set of vectors made with \(\beta\) taking values in the real numbers. The set \[\{x\beta : \beta\in\mathbb R\}\] includes the vector \(\vec 0.\) All the elements are parallel, and the set can be represented by a straight line passing through the origin. Instead of straight line we usually just say line.

What we have to remember is that any non-zero vector \(x\) defines a straight line that passes through the points \(\vec 0\) and \(x,\) and that any point on this line is equal to \(x\beta\) for some real number \(\beta.\)
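We can check the parallelism numerically for a few values of \(\beta.\) This is only a sketch: the vector and the values of \(\beta\) are arbitrary examples, the helper names are ours, and \(\beta=0\) is left out because the angle with the zero vector is undefined.

```python
import math

def dot(x, y):
    return sum(xk * yk for xk, yk in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def scale(x, beta):
    """The vector x*beta, a point on the line defined by x."""
    return [xk * beta for xk in x]

def cos_angle(x, y):
    return dot(x, y) / (norm(x) * norm(y))

x = [2.0, 1.0]
for beta in (-2.0, 0.5, 3.0):
    p = scale(x, beta)
    # The cosine is +1 for positive beta and -1 for negative beta,
    # up to rounding error.
    print(beta, p, cos_angle(x, p))
```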

The nearest point in the line

Now let’s consider a point \(y\) anywhere. We want to find the nearest point to \(y\) among the points on the line defined by \(x.\) In other words we want to find the real number \(\beta\) that minimizes \(\lVert y - x\beta\rVert.\) To simplify, let’s call \(e=y - x\beta.\) Naturally, different \(\beta\) will correspond to different \(e,\) and we want to find the “smallest” one, that is, the one with the shortest length.

In Figure 2 we can see that the shortest \(e\) is the one perpendicular to the horizontal line, which we call \(e^*.\) Now we want to be sure that the smallest \(e\) is indeed perpendicular to \(x.\)

Figure 2: Different \(e\) for different \(\beta.\)

We can write any vector \(e\) as the sum of a vector \(e_\parallel,\) in the direction parallel to \(x,\) and a vector \(e_\perp\) in the direction perpendicular to \(x.\) Then \(e=e_\parallel+e_\perp\) and, since the two parts are orthogonal, the Pythagorean theorem gives \(\lVert e\rVert^2=\lVert e_\parallel\rVert^2+\lVert e_\perp\rVert^2.\) Since \(e\) depends on \(\beta,\) different \(\beta\) will produce different \(e\) and different \(e_\parallel,\) but \(e_\perp\) will always be the same. The key part is that \(e_\perp\) only depends on \(y\) and \(x,\) and does not depend on \(\beta.\) Therefore the best \(e,\) which we call \(e^*,\) will be the one where \(e_\parallel=0\) and thus \(e^*=e_\perp.\)

Now, to find the point on the line that is nearest to \(y,\) we need to find a value \(\beta^*\) that makes the vector \(e^*=y-x\beta^*\) perpendicular to \(x.\) Remembering that perpendicularity can be expressed with the dot product, we have \[x\cdot e^* = x^T e^*=0\] Replacing \(e^*\) by its definition, we have \[x^T (y-x\beta^*) = x^T y-x^Tx\beta^*=0\] and therefore the best \(\beta\) is a real number such that \[x^T y = x^Tx\beta^*.\] Next time we will see if (or when) that number \(\beta^*\) exists, and some applications of this formula.
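As a preview, here is a minimal sketch of the last equation in plain Python. It assumes that \(x\) is non-zero, so that \(x^Tx>0\) and we can divide by it; the general question of when \(\beta^*\) exists is left for the next post. The function name `nearest_on_line` and the example vectors are ours.

```python
import math

def dot(x, y):
    return sum(xk * yk for xk, yk in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def nearest_on_line(x, y):
    """Return (beta, point, e) for the point x*beta nearest to y.

    Assumes x is non-zero, so that x.x > 0 and the division is valid.
    """
    xx = dot(x, x)
    if xx == 0:
        raise ValueError("x must be a non-zero vector")
    beta = dot(x, y) / xx                      # solves x.y = (x.x) * beta
    point = [xk * beta for xk in x]            # nearest point on the line
    e = [yk - pk for yk, pk in zip(y, point)]  # e* = y - x*beta
    return beta, point, e

x = [2.0, 1.0]
y = [1.0, 3.0]
beta, point, e = nearest_on_line(x, y)
print(beta, point, e)  # -> 1.0 [2.0, 1.0] [-1.0, 2.0]
print(dot(x, e))       # e* is perpendicular to x -> 0.0

# Any other beta gives a longer e.
for other in (0.5, 1.5):
    longer = [yk - xk * other for xk, yk in zip(x, y)]
    print(other, norm(longer) > norm(e))  # -> True in both cases
```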

Originally published at