Track an object's direction using PCA

Iwan Bnlf's Github

Introduction

"It all begins with a simple picture and its subject. The later, lost in the image can't seem to find its direction. As it yells for help, a bold hero emerges from the land of mathematics to rescue him, and his name was PCA."

All jokes aside, mathematics tends to have a beautiful relationship with image processing, and PCA applied to image tracking is one remarkable example. In today's post, we'll try to understand the intuition behind PCA and its application in image processing to find an object's direction.

Isolating the relevant data

Let's set our problem in the middle of a parking lot with one parked red car.

Our goal here is to find this car's direction and position, and further, to track those two values over time (if we take a video of the moving car as an input).

Therefore, our first task is to isolate the pixels that are meaningful to us, here the car's pixels. There exist many methods to do so, like Otsu thresholding, K-means segmentation, the Canny edge detector, or even blob detection. The only issue with these algorithms is that we still have to manually point out which of the isolated parts is the one we want to track. This can easily be overcome with any kind of classification algorithm.

However, if we take as input a video where the background is static, isolating the moving element can easily be done with the approach described below.
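As a rough illustration, here is a minimal sketch of this idea in Python (the `threshold` value and the grayscale-frame format are assumptions made for the example):

```python
import numpy as np

def moving_pixel_mask(frame, background, threshold=30):
    """Return a boolean mask of the pixels that differ from the static background.

    frame, background: (H, W) grayscale images as uint8 NumPy arrays.
    threshold: minimum absolute difference (arbitrary value chosen for the example)
               for a pixel to be considered part of the moving object.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff > threshold
```

The `True` pixels of this mask are the ones we will feed to the PCA step later on.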

What is covariance, and what does it tell us?

Before going any further, we need to introduce and understand some very useful statistical tools.

Variance

Let's say you are looking at the grades students got on their last exam. Those grades are saved in a 1D array, which you can plot on a single line.

Now we ask ourselves the following question: how can we measure how well the students performed on this test? The first idea would be to compute the grades' mean value with the following formula.

$$\large{\bar{x} = \frac{1}{n}\sum_{i = 1}^n x_i}$$

However, this value isn't meaningful enough on its own to show how well the students performed, because the same mean can be obtained by two very different distributions:

So, to really get a picture of the classroom's performance, you also need to compute how the grades are distributed around the mean value: that is the variance.

$$\large{Var(X) = \color{orange}\frac{1}{n} \color{green}\sum_{i = 1}^{n} \color{red} (\bar{x} - x_i)\color{blue}^2}$$

To construct the variance, we compute, for each point, its "error" relative to the mean value. We then square these errors and sum them. Squaring the error of each value first ensures that all the terms are positive (and thus that none of them cancel each other out), but above all, it "penalizes" the biggest errors. Finally, we divide the total by the number of points in the array.
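As a quick sanity check, the formula translates almost word for word into Python (a small sketch with made-up grades):

```python
import numpy as np

grades = np.array([8.0, 12.0, 15.0, 9.0, 16.0])          # made-up grades

mean = grades.sum() / len(grades)                          # x̄ = (1/n) Σ x_i
variance = ((mean - grades) ** 2).sum() / len(grades)     # Var(X) = (1/n) Σ (x̄ - x_i)²

print(variance)    # matches np.var(grades), which uses the same 1/n convention
```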

Covariance

Now, let's say that you are looking at two sets of data that are linked, for instance the age of a person and the amount of money they spend on groceries. These data can be saved in a 2D array, or plotted on a 2D plane.

If variance told you how the values are distributed around their mean, then the covariance of distributions \(X\) and \(Y\) tells you how variations in distribution \(X\) affect distribution \(Y\) (and vice versa).

↓ Draw a distribution below and watch its covariance ↓

The covariance formula doesn't really differ from the variance formula: instead of squaring each error term of a single distribution, we multiply, for each point, the \(X\) error term by the \(Y\) error term.

$$\large{Cov(X,Y) = \frac{1}{n}\sum_{i = 1}^{n}(\bar{x} - x_i)(\bar{y} - y_i)}$$

Silly but important fact: \(Var(X) = Cov(X,X)\)
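The same word-for-word translation works for covariance (again a sketch with made-up data), and passing the same array twice indeed gives back the variance:

```python
import numpy as np

def covariance(x, y):
    """Cov(X, Y) = (1/n) Σ (x̄ - x_i)(ȳ - y_i)."""
    return ((x.mean() - x) * (y.mean() - y)).mean()

age      = np.array([22.0, 35.0, 47.0, 60.0])     # made-up ages
spending = np.array([40.0, 55.0, 70.0, 90.0])     # made-up grocery spending

print(covariance(age, spending))    # positive: spending grows with age
print(covariance(age, age))         # same as np.var(age): Cov(X, X) = Var(X)
```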

Computing the direction

Now, let's go back to our picture of the red car in a parking lot. What we want to compute is the direction and position of this cloud of pixels. Now that we know about variance and covariance, we can consider that cloud of pixels to be two joint distributions (the distribution of the x-coordinates and the distribution of the y-coordinates). Computing their variances and covariance will therefore tell us how a change in the x-axis distribution affects the y-axis distribution (and vice versa), which is another way to formulate a direction!

So now the problem boils down to: how do we compute the direction of our "pixel blob" from variance and covariance alone?

The power of linear algebra

The answer is to "throw" our covariances into a matrix and compute its eigenvectors. If you are not familiar with linear algebra, those words might seem overwhelming; and if you are, you might not see the intuition behind the process. In both cases, the next paragraphs will explain everything.

Matrices & linear maps

A matrix can be written as an array of values. Matrices can be of any shape but the most common and useful ones are square matrices. Here, we will only look at 2x2 matrices.

$$\large{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} }$$ $$\textrm{Example of a simple 2 by 2 square matrix}$$

Now, let's picture a 2D plane where vectors can be drawn. These vectors can be represented as a 2-by-1 array containing their \(x\) and \(y\) coordinates. But most importantly, you can think of each point of the plane as being the vector going from \((0,0)\) to its coordinates.

Now if you want, you can create a function (or linear map) \(f\) that takes as an input a vector \(V\) and gives as an output \(f(V)\):

\begin{gather} \large f : (x,y) \rightarrow \begin{pmatrix} x + y \\ 2x - 3y \end{pmatrix} \end{gather}

While this definition of a linear map works very well, another way to represent it is by building one of its representative matrices.

$$ \text{Let } B = (x_1,x_2) \ \text{where } x_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, x_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix} $$ $$ A = (f(x_1),f(x_2)) = \begin{pmatrix} 1 & 1 \\ 2 & -3 \end{pmatrix} $$

So now, to compute the image of a vector through the linear map \(f\), you only have to multiply the vector by the linear map's representative matrix:

\begin{gather} \large \text{Let } v \in \mathbb{R}^2, \text{ then: } A \cdot v = f(v) \end{gather}

Therefore, matrices can be thought of as transformation operators that can change an object's shape and orientation.
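A quick numerical check of this correspondence, using the map \(f\) and its representative matrix \(A\) defined above (a sketch in NumPy):

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [2.0, -3.0]])              # representative matrix of f

def f(v):
    x, y = v
    return np.array([x + y, 2 * x - 3 * y])

v = np.array([3.0, -1.0])                # any vector of the plane
print(A @ v)                             # [2. 9.]
print(f(v))                              # same result: A·v = f(v)
```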

↓ Change the matrix and watch the shape change ↓

Eigenvectors and diagonalizable matrices

But now, what are eigenvectors? Well, eigenvectors are defined by the following expression:

$$\large{ \begin{gather} A \cdot V = \lambda V \\ V \in \mathbb{R}^n \setminus \{0\}, \quad A \in \mathcal{M}_{n,n}, \quad \lambda \in \mathbb{C} \end{gather}}$$

If \(V\) is a non-zero eigenvector of the matrix \(A\), and \(\lambda\) a real or complex value, then passing \(V\) through \(A\) doesn't change the axis \(V\) lies on: it only scales \(V\) by a factor of \(\lambda\), which we call an eigenvalue.
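We can verify this definition numerically on the matrix \(A\) from the previous section (a sketch using `numpy.linalg.eig`):

```python
import numpy as np

A = np.array([[1.0,  1.0],
              [2.0, -3.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)    # each column is an eigenvector V

for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(A @ v, lam * v))          # True: A·V = λV for every pair
```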

Now, if we take the example above and decide to highlight in red the eigenvectors of the matrix, we can see that they are the only vectors that do not rotate but only stretch.

↓ Change the matrix and watch the eigenvectors ↓

The spectral theorem

One last important fact about square matrices is what we call the spectral theorem. This theorem states: \begin{gather} \text{Every real symmetric matrix is diagonalizable,} \\ \text{and its eigenvectors are pairwise orthogonal.} \end{gather}

To put it in a nutshell, this means that every real symmetric matrix has eigenvectors, that those eigenvectors are orthogonal to each other, and that the matrix transforms the space along those eigenvectors.
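A small numerical illustration of the theorem (a sketch with an arbitrary real symmetric matrix): the eigenvectors are indeed orthogonal to each other.

```python
import numpy as np

S = np.array([[4.0, 2.0],
              [2.0, 3.0]])                      # an arbitrary real symmetric matrix

eigenvalues, eigenvectors = np.linalg.eigh(S)   # eigh is meant for symmetric matrices

v1, v2 = eigenvectors.T
print(np.isclose(v1 @ v2, 0.0))                 # True: the two eigenvectors are orthogonal
```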

Putting it all together

Now, let's go back to our variance and covariance. As we've seen before, variance and covariance tell us how a cloud of points is shaped. Thus, thanks to our understanding of linear algebra, we can define the covariance matrix as follows:

$$ \large{ C_{X,Y} = \begin{pmatrix} Cov(X,X) & Cov(X,Y)\\ Cov(Y,X) & Cov(Y,Y) \end{pmatrix} = \begin{pmatrix} Cov(X,X) & Cov(X,Y)\\ Cov(X,Y) & Cov(Y,Y) \end{pmatrix}} $$

This covariance matrix can be thought of as the transformation that reshapes a circular (isotropic) cloud of points into one shaped like the distribution that defined it (strictly speaking, that reshaping is done by the square root of the covariance matrix, but both share the same eigenvectors, so the axes are the same).

Thus, because \(C_{X,Y}\) is a real symmetric matrix, its eigenvectors give us the axes along which the transformation happens, in other words the directions of our pixel cloud. The eigenvector associated with the largest eigenvalue points along the axis of greatest variance, which is the direction we are after.
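Putting it all together, here is a minimal sketch (assuming we already have the boolean mask of the car's pixels from the first section) that computes the position and direction of the pixel cloud:

```python
import numpy as np

def position_and_direction(mask):
    """Return the centroid and main direction of the True pixels of `mask`.

    mask: (H, W) boolean array marking the object's pixels.
    """
    ys, xs = np.nonzero(mask)                   # coordinates of the object's pixels
    points = np.stack([xs, ys]).astype(float)   # 2 x N array: row 0 = X, row 1 = Y

    centroid = points.mean(axis=1)              # the object's position
    cov = np.cov(points)                        # the 2 x 2 covariance matrix C_{X,Y}

    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    direction = eigenvectors[:, np.argmax(eigenvalues)]   # axis of largest variance

    return centroid, direction
```

The angle of the object can then be read off the direction vector, for example with `np.arctan2(direction[1], direction[0])`, and repeating this on every frame of the video gives the position and direction over time.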

↓ Draw a filled shape to find its direction ↓

Conclusion