* Corrected some typos. Added a brief description explaining why the Eckart-Young theorem is important.
Added details on how to compute the PCA for any given matrix.
Changed the n_components term in the code to r_components, so it is in accord with the definition of the Eckart-Young theorem and doesn't create confusion with the n columns of the matrix.
Added an exercise and its solution at the end (I've never used Sphinx, so not sure if it's done correctly).
* Apply suggestions from code review
Co-authored-by: Humphrey Yang <[email protected]>
* Update svd_intro.md
Removed typos and extra lines that were pointed out by HumphreyYang.
* Update svd_intro.md
Just making sure I committed all changes.
* Update svd_intro.md
* Update svd_intro.md
Corrected the definition of the pseudoinverse of the Sigma matrix in the exercise.
* fix exercise solution syntax
---------
Co-authored-by: Humphrey Yang <[email protected]>
Co-authored-by: mmcky <[email protected]>
* $U$ is an $m \times m$ orthogonal matrix of **left singular vectors** of $X$
-* Columns of $U$ are eigenvectors of $X^\top X$
-* $V$ is an $n \times n$ orthogonal matrix of **right singular values** of $X$
-* Columns of $V$ are eigenvectors of $X X^\top $
+* Columns of $U$ are eigenvectors of $X X^\top $
+* $V$ is an $n \times n$ orthogonal matrix of **right singular vectors** of $X$
+* Columns of $V$ are eigenvectors of $X^\top X$
* $\Sigma$ is an $m \times n$ matrix in which the first $p$ places on its main diagonal are positive numbers $\sigma_1, \sigma_2, \ldots, \sigma_p$ called **singular values**; remaining entries of $\Sigma$ are all zero
* The $p$ singular values are positive square roots of the eigenvalues of the $m \times m$ matrix $X X^\top $ and also of the $n \times n$ matrix $X^\top X$
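
A quick numerical check of the corrected bullets above; this is a minimal sketch using a small random matrix chosen only for illustration, not code from the lecture:

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(5, 3)                    # an arbitrary 5 x 3 example

U, S, VT = np.linalg.svd(X)                  # full SVD: U is 5 x 5, VT is 3 x 3

# singular values are positive square roots of the eigenvalues of X^T X (and of X X^T)
eig_XTX = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
print(np.allclose(S**2, eig_XTX))            # True

# column u_i of U is an eigenvector of X X^T; column v_i of V is an eigenvector of X^T X
i = 0
print(np.allclose((X @ X.T) @ U[:, i], S[i]**2 * U[:, i]))    # True
print(np.allclose((X.T @ X) @ VT[i, :], S[i]**2 * VT[i, :]))  # True
```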
@@ -99,10 +99,10 @@ The matrices $U,\Sigma,V$ entail linear transformations that reshape in vectors

* multiplying vectors by the diagonal matrix $\Sigma$ leaves **angles between vectors** unchanged but **rescales** vectors.

Thus, representation {eq}`eq:SVD101` asserts that multiplying an $n \times 1$ vector $y$ by the $m \times n$ matrix $X$
-amounts to performing the following three multiplcations of $y$ sequentially:
+amounts to performing the following three multiplications of $y$ sequentially:

* **rotating** $y$ by computing $V^\top y$
-* **rescaling** $V^\top y$ by multipying it by $\Sigma$
+* **rescaling** $V^\top y$ by multiplying it by $\Sigma$
* **rotating** $\Sigma V^\top y$ by multiplying it by $U$

This structure of the $m \times n$ matrix $X$ opens the door to constructing systems
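
The three-step reading above is easy to verify numerically; here is a hedged sketch with made-up dimensions, not the lecture's example:

```python
import numpy as np

np.random.seed(1)
m, n = 4, 3
X = np.random.randn(m, n)
y = np.random.randn(n)

U, S, VT = np.linalg.svd(X)        # full SVD
Sigma = np.zeros((m, n))
Sigma[:n, :n] = np.diag(S)         # embed the singular values in an m x n matrix

rotated  = VT @ y                  # rotate: V^T y
rescaled = Sigma @ rotated         # rescale: Sigma V^T y
result   = U @ rescaled            # rotate again: U Sigma V^T y

print(np.allclose(result, X @ y))  # True: the three steps reproduce X y
```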
@@ -144,11 +144,11 @@ matrix $X$ of rank $p$.

* The **column space** of $X$, denoted ${\mathcal C}(X)$, is the span of the columns of $X$, i.e., all vectors $y$ that can be written as linear combinations of columns of $X$. Its dimension is $p$.
* The **null space** of $X$, denoted ${\mathcal N}(X)$ consists of all vectors $y$ that satisfy
-$X y = 0$. Its dimension is $m-p$.
+$X y = 0$. Its dimension is $n-p$.
* The **row space** of $X$, denoted ${\mathcal R}(X)$ is the column space of $X^\top $. It consists of all
vectors $z$ that can be written as linear combinations of rows of $X$. Its dimension is $p$.
* The **left null space** of $X$, denoted ${\mathcal N}(X^\top )$, consist of all vectors $z$ such that
-$X^\top z =0$. Its dimension is $n-p$.
+$X^\top z =0$. Its dimension is $m-p$.

For a full SVD of a matrix $X$, the matrix $U$ of left singular vectors and the matrix $V$ of right singular vectors contain orthogonal bases for all four subspaces.
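
To illustrate how $U$ and $V$ supply orthonormal bases for the four subspaces with the corrected dimensions, here is a small sketch built on a deliberately rank-deficient matrix constructed only for this example:

```python
import numpy as np

np.random.seed(2)
X = np.random.randn(4, 2) @ np.random.randn(2, 3)   # 4 x 3 matrix with rank p = 2
m, n = X.shape
p = np.linalg.matrix_rank(X)

U, S, VT = np.linalg.svd(X)
col_space  = U[:, :p]      # basis of C(X),   dimension p
left_null  = U[:, p:]      # basis of N(X^T), dimension m - p
row_space  = VT[:p, :].T   # basis of R(X),   dimension p
null_space = VT[p:, :].T   # basis of N(X),   dimension n - p

print(np.allclose(X @ null_space, 0))            # X y = 0 on the null space
print(np.allclose(X.T @ left_null, 0))           # X^T z = 0 on the left null space
print(np.allclose(col_space.T @ left_null, 0))   # C(X) is orthogonal to N(X^T)
print(np.allclose(row_space.T @ null_space, 0))  # R(X) is orthogonal to N(X)
```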
-Notice how equations {eq}`eq:orthoortho2` assert that the transformation $X^\top $ maps a pairsof distinct orthonormal vectors $u_i, u_j$ for $i$ and $j$ both less than or equal to the rank $p$ of $X$ into a pair of distinct orthonormal vectors $v_i, v_j$ .
+Notice how equations {eq}`eq:orthoortho2` assert that the transformation $X^\top $ maps a pair of distinct orthonormal vectors $u_i, u_j$ for $i$ and $j$ both less than or equal to the rank $p$ of $X$ into a pair of distinct orthonormal vectors $v_i, v_j$ .

Equations {eq}`eq:Xfour1b` assert that
@@ -250,7 +250,7 @@ $$

-Thus, taken together, the systems of quations {eq}`eq:Xfour1a` and {eq}`eq:Xfour1b`
+Thus, taken together, the systems of equations {eq}`eq:Xfour1a` and {eq}`eq:Xfour1b`
describe the four fundamental subspaces of $X$ in the following ways:

$$
@@ -278,7 +278,7 @@ We have verified the four claims in {eq}`eq:fourspaceSVD` simply by performing

The claims in {eq}`eq:fourspaceSVD` and the fact that $U$ and $V$ are both unitary (i.e, orthonormal) matrices imply
that

-* the column space of $X$ is orthogonal to the null space of of $X^\top $
+* the column space of $X$ is orthogonal to the null space of $X^\top $
* the null space of $X$ is orthogonal to the row space of $X$

Sometimes these properties are described with the following two pairs of orthogonal complement subspaces:
This is a very powerful theorem that says that we can take our $ m \times n $ matrix $X$ that is not full rank, and we can best approximate it by a full rank $p \times p$ matrix through the SVD.

-You can read about the Eckart-Young theorem and some of its uses here <https://en.wikipedia.org/wiki/Low-rank_approximation>.
+Moreover, if some of these $p$ singular values carry more information than others, and if we want to have the most amount of information with the least amount of data, we can take $r$ leading singular values ordered by magnitude.
+
+We'll say more about this later when we present Principal Component Analysis.
+
+You can read about the Eckart-Young theorem and some of its uses [here](https://en.wikipedia.org/wiki/Low-rank_approximation).

We'll make use of this theorem when we discuss principal components analysis (PCA) and also dynamic mode decomposition (DMD).
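
To make the idea of keeping the $r$ leading singular values concrete, here is a hedged sketch of a truncated SVD and the Frobenius-norm error predicted by the Eckart-Young theorem; the matrix is random and only for illustration:

```python
import numpy as np

np.random.seed(3)
X = np.random.randn(8, 5)

U, S, VT = np.linalg.svd(X, full_matrices=False)

def truncate(r):
    """Rank-r approximation built from the r leading singular values."""
    return U[:, :r] @ np.diag(S[:r]) @ VT[:r, :]

for r in range(1, len(S) + 1):
    err = np.linalg.norm(X - truncate(r), 'fro')
    # Eckart-Young: the best rank-r error equals the square root of the
    # sum of the squared discarded singular values
    print(r, np.isclose(err, np.sqrt(np.sum(S[r:]**2))))
```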
@@ -465,7 +470,7 @@ print(f'rank of X = {rr}')

**Properties:**

-* Where $U$ is constructed via a full SVD, $U^\top U = I_{p\times p}$ and $U U^\top = I_{m \times m}$
+* Where $U$ is constructed via a full SVD, $U^\top U = I_{m\times m}$ and $U U^\top = I_{m \times m}$
* Where $\hat U$ is constructed via a reduced SVD, although $\hat U^\top \hat U = I_{p\times p}$, it happens that $\hat U \hat U^\top \neq I_{m \times m}$

We illustrate these properties for our example with the following code cells.
@@ -490,10 +495,10 @@ UhatUhatT, UhatTUhat

**Remarks:**

-The cells above illustrate application of the `fullmatrices=True` and `full-matrices=False` options.
-Using `full-matrices=False` returns a reduced singular value decomposition.
+The cells above illustrate the application of the `full_matrices=True` and `full_matrices=False` options.
+Using `full_matrices=False` returns a reduced singular value decomposition.

-The **full** and **reduced** SVd's both accurately decompose an $m \times n$ matrix $X$
+The **full** and **reduced** SVD's both accurately decompose an $m \times n$ matrix $X$

When we study Dynamic Mode Decompositions below, it will be important for us to remember the preceding properties of full and reduced SVD's in such tall-skinny cases.
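
A minimal sketch of the corrected property, using a made-up tall-skinny matrix rather than the lecture's example:

```python
import numpy as np

np.random.seed(4)
X = np.random.randn(5, 2)     # tall-skinny: m = 5, n = 2

# full SVD: U is m x m, so U^T U = U U^T = I_m
U, S, VT = np.linalg.svd(X, full_matrices=True)
print(np.allclose(U.T @ U, np.eye(5)), np.allclose(U @ U.T, np.eye(5)))   # True True

# reduced SVD: Uhat is m x p, so Uhat^T Uhat = I_p but Uhat Uhat^T != I_m
Uhat, Shat, VhatT = np.linalg.svd(X, full_matrices=False)
print(np.allclose(Uhat.T @ Uhat, np.eye(2)), np.allclose(Uhat @ Uhat.T, np.eye(5)))  # True False
```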
@@ -569,19 +574,90 @@ where for $j = 1, \ldots, n$ the column vector $X_j = \begin{bmatrix}X_{1j}\\X_{

In a **time series** setting, we would think of columns $j$ as indexing different __times__ at which random variables are observed, while rows index different random variables.

-In a **cross section** setting, we would think of columns $j$ as indexing different __individuals__ for which random variables are observed, while rows index different **attributes**.
+In a **cross-section** setting, we would think of columns $j$ as indexing different __individuals__ for which random variables are observed, while rows index different **attributes**.
+
+As we have seen before, the SVD is a way to decompose a matrix into useful components, just like polar decomposition, eigendecomposition, and many others.
+
+PCA, on the other hand, is a method that builds on the SVD to analyze data. The goal is to apply certain steps that help better visualize patterns in data, using statistical tools to capture the most important ones.
+
+**Step 1: Standardize the data:**

-The number of positive singular values equals the rank of matrix $X$.
+Because our data matrix may hold variables of different units and scales, we first need to standardize the data.

-Arrange the singular values in decreasing order.
+We do this first by computing the average of each row of $X$.

-Arrange the positive singular values on the main diagonal of the matrix $\Sigma$ of into a vector $\sigma_R$.
+$$
+\bar{X_j}= \frac{1}{m} \sum_{i = 1}^{m} x_{i,j}
+$$
+
+We then create an average matrix out of these means and subtract it from the original matrix to create a mean-centered matrix:
+
+$$
+B = X - \bar{X}
+$$
+
+**Step 2: Compute the covariance matrix:**
+
+Then because we want to extract the relationships between variables rather than just their magnitude, in other words, we want to know how they can explain each other, we compute the covariance matrix of $B$.
+
+$$
+C = \frac{1}{{n}} B^\top B
+$$
+
+**Step 3: Decompose the covariance matrix and arrange the singular values:**
+
+If the matrix $C$ is diagonalizable, we can eigendecompose it, find its eigenvalues and rearrange the eigenvalue and eigenvector matrices in decreasing order.
+
+If $C$ is not diagonalizable, we can perform an SVD of $C$:
+
+$$
+\begin{align}
+B^T B &= V \Sigma^\top U^\top U \Sigma V^\top \cr
+&= V \Sigma^\top \Sigma V^\top
+\end{align}
+$$
+
+$$
+C = \frac{1}{{n}} V \Sigma^\top \Sigma V^\top
+$$
+
+We can then rearrange the columns in the matrices $V$ and $\Sigma$ so that the singular values are in decreasing order.
+
+**Step 4: Select singular values, (optional) truncate the rest:**
+
+We can now decide how many singular values to pick, based on how much variance we want to retain (e.g., retaining 95% of the total variance).
+
+We can obtain the percentage by calculating the variance contained in the leading $r$ factors divided by the variance in total:
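
The four steps can be sketched directly in NumPy. This is a hedged, stand-alone illustration that follows the formulas as written above on a made-up data matrix; it is not the lecture's class-based implementation, and the 95% threshold is only an example:

```python
import numpy as np

np.random.seed(5)
m, n = 6, 4
X = np.random.randn(m, n)                  # made-up data matrix

# Step 1: demean using the averages defined above
B = X - X.mean(axis=0)

# Step 2: covariance matrix C = (1/n) B^T B
C = B.T @ B / n

# Step 3: SVD of B, so that C = (1/n) V Sigma^T Sigma V^T
U, S, VT = np.linalg.svd(B, full_matrices=False)
print(np.allclose(C, VT.T @ np.diag(S**2) @ VT / n))     # True

# Step 4: share of total variance captured by the r leading singular values
explained = np.cumsum(S**2) / np.sum(S**2)
r = np.searchsorted(explained, 0.95) + 1                 # smallest r retaining 95%
print(explained, r)
```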
# slicing matrices by the number of components to use
-U = self.U[:, :self.n_component]
-Σ = self.Σ[:self.n_component, :self.n_component]
-VT = self.VT[:self.n_component, :]
+U = self.U[:, :self.r_component]
+Σ = self.Σ[:self.r_component, :self.r_component]
+VT = self.VT[:self.r_component, :]

# transform data
self.X_svd = U @ Σ @ VT

-def fit(self, n_component):
+def fit(self, r_component):

# pca
-P = self.P[:, :n_component]
-𝜖 = self.𝜖[:n_component, :]
+P = self.P[:, :r_component]
+𝜖 = self.𝜖[:r_component, :]

# transform data
self.X_pca = P @ 𝜖

# svd
-U = self.U[:, :n_component]
-Σ = self.Σ[:n_component, :n_component]
-VT = self.VT[:n_component, :]
+U = self.U[:, :r_component]
+Σ = self.Σ[:r_component, :r_component]
+VT = self.VT[:r_component, :]

# transform data
self.X_svd = U @ Σ @ VT
@@ -926,6 +1004,58 @@ def compare_pca_svd(da):
    plt.show()
```

+## Exercises
+
+```{exercise}
+:label: svd_ex1
+
+In Ordinary Least Squares (OLS), we learn to compute $ \hat{\beta} = (X^\top X)^{-1} X^\top y $, but there are cases such as when we have collinearity or an underdetermined system: a **short fat** matrix.
+
+In these cases, the $ (X^\top X) $ matrix is not invertible (its determinant is zero) or ill-conditioned (its determinant is very close to zero).
+
+What we can do instead is to create what is called a [pseudoinverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse), a full rank approximation of the inverted matrix so we can compute $ \hat{\beta} $ with it.
+
+Thinking in terms of the Eckart-Young theorem, build the pseudoinverse matrix $ X^{+} $ and use it to compute $ \hat{\beta} $.
+```
+
+```{solution-start} svd_ex1
+:class: dropdown
+```
+
+We can use SVD to compute the pseudoinverse:
+
+$$
+X = U \Sigma V^\top
+$$
+
+Inverting $X$, we have:
+
+$$
+X^{+} = V \Sigma^{+} U^\top
+$$
+
+where:
+
+$$
+\Sigma^{+} = \begin{bmatrix}
+\frac{1}{\sigma_1} & 0 & \cdots & 0 & 0 \\
+0 & \frac{1}{\sigma_2} & \cdots & 0 & 0 \\
+\vdots & \vdots & \ddots & \vdots & \vdots \\
+0 & 0 & \cdots & \frac{1}{\sigma_p} & 0 \\
+0 & 0 & \cdots & 0 & 0 \\
+\end{bmatrix}
+$$
+
+and finally:
+
+$$
+\hat{\beta} = X^{+}y = V \Sigma^{+} U^\top y
+$$
+
+```{solution-end}
+```

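As a companion to the solution above, here is a hedged NumPy sketch with made-up data that builds $X^{+}$ from the SVD and compares it with NumPy's built-in routines:

```python
import numpy as np

np.random.seed(6)
X = np.random.randn(3, 2) @ np.random.randn(2, 5)   # short fat and rank-deficient, so X^T X is singular
y = np.random.randn(3)

U, S, VT = np.linalg.svd(X, full_matrices=False)
tol = max(X.shape) * np.finfo(float).eps * S.max()   # cutoff for numerically zero singular values
S_plus = np.array([1/s if s > tol else 0.0 for s in S])

X_plus = VT.T @ np.diag(S_plus) @ U.T                # X^+ = V Sigma^+ U^T
beta_hat = X_plus @ y                                # beta_hat = X^+ y

print(np.allclose(X_plus, np.linalg.pinv(X)))                       # matches NumPy's pseudoinverse
print(np.allclose(beta_hat, np.linalg.lstsq(X, y, rcond=None)[0]))  # minimum-norm least-squares solution
```
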
For an example PCA applied to analyzing the structure of intelligence tests see this lecture {doc}`Multivariable Normal Distribution <multivariate_normal>`.