lectures/svd_intro.md
Lines changed: 68 additions & 32 deletions
@@ -31,24 +31,28 @@ import pandas as pd
## Overview

-The **singular value decomposition** is a work-horse in applications of least squares projection that
-form foundations for important machine learning methods.
+The **singular value decomposition** (SVD) is a work-horse in applications of least squares projection that
+form foundations for important statistical and machine learning methods.

-This lecture describes the singular value decomposition and two of its uses:
+After defining the SVD, we'll describe how it connects to

-* principal components analysis (PCA)
+* the **four fundamental subspaces** of linear algebra
+* underdetermined and overdetermined **least squares regressions**
+* **principal components analysis** (PCA)
+
+We'll also describe the essential role that the SVD plays in

* dynamic mode decomposition (DMD)

-Each of these can be thought of as a data-reduction procedure designed to capture salient patterns by projecting data onto a limited set of factors.
+Like principal components analysis (PCA), DMD can be thought of as a data-reduction procedure designed to capture salient patterns by projecting data onto a limited set of factors.

-## The Setup
+## The Setting

Let $X$ be an $m \times n$ matrix of rank $p$.

Necessarily, $p \leq \min(m,n)$.

-In this lecture, we'll think of $X$ as a matrix of **data**.
+In much of this lecture, we'll think of $X$ as a matrix of **data**.

* each column is an **individual** -- a time period or person, depending on the application
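As a concrete illustration of this layout (an editorial sketch with made-up numbers, not part of the lecture), a data matrix with columns as individuals and rows as attributes can be set up as follows:

```python
import numpy as np

# Hypothetical data: m = 3 attributes observed for n = 4 individuals.
# Each column is one individual; each row is one attribute.
X = np.array([[1.0, 2.0, 3.0, 4.0],    # attribute 1 across individuals
              [0.5, 0.7, 0.6, 0.9],    # attribute 2
              [10., 12., 11., 13.]])   # attribute 3

m, n = X.shape
p = np.linalg.matrix_rank(X)
print(m, n, p)     # p <= min(m, n) necessarily holds
```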
@@ -89,7 +93,7 @@ VV^T & = I & \quad V^T V = I
\end{aligned}
$$

-where
+and

* $U$ is an $m \times m$ matrix whose columns are eigenvectors of $X X^T$
@@ -99,7 +103,7 @@ where
* The $p$ singular values are positive square roots of the eigenvalues of the $m \times m$ matrix $X X^T$ and the $n \times n$ matrix $X^T X$

-*When $U$ is a complex valued matrix, $U^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $U$, meaning that
+* We adopt a convention that when $U$ is a complex valued matrix, $U^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $U$, meaning that
$U_{ij}^T$ is the complex conjugate of $U_{ji}$.

* Similarly, when $V$ is a complex valued matrix, $V^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $V$
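The connection between singular values and the eigenvalues of $X X^T$ and $X^T X$ is easy to check numerically. The following sketch is an editorial illustration with arbitrary random data, not code from the lecture:

```python
import numpy as np

np.random.seed(0)
m, n = 5, 3                      # arbitrary shapes for illustration
X = np.random.randn(m, n)

U, S, Vt = np.linalg.svd(X, full_matrices=True)   # full SVD: U is (m, m), Vt is (n, n)

# Squared singular values equal the nonzero eigenvalues of X^T X (and of X X^T)
eigs = np.linalg.eigvalsh(X.T @ X)                # ascending order
print(np.allclose(np.sort(S**2), np.sort(eigs)))  # True (up to floating-point error)

# Columns of U associated with nonzero singular values are eigenvectors of X X^T
print(np.allclose((X @ X.T) @ U[:, :n], U[:, :n] * S**2))  # True
```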
@@ -108,35 +112,41 @@ What we have described above is called a **full** SVD.
-Here the shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
+In a **full** SVD, the shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.

Later we'll also describe an **economy** or **reduced** SVD.

-But first we'll say a little more about properties of a **full** SVD.
+But before we study a **reduced** SVD, we'll say a little more about properties of a **full** SVD.

-## Relationship of Full SVD to Four Fundamental Subspaces
+## SVD and Four Fundamental Subspaces

Let's start with a reminder about definitions of the four fundamental subspaces of an $m \times n$
matrix $X$ of rank $p$.

-* The **column space** of $X$, denoted ${\mathcal C}(X)$, is the span of the columns of $X$, i.e., all vectors $y$ that can be written as a linear combination of columns of $X$. Its dimension is $p$.
+* The **column space** of $X$, denoted ${\mathcal C}(X)$, is the span of the columns of $X$, i.e., all vectors $y$ that can be written as linear combinations of columns of $X$. Its dimension is $p$.
* The **null space** of $X$, denoted ${\mathcal N}(X)$, consists of all vectors $y$ that satisfy
$X y = 0$. Its dimension is $n-p$.
* The **row space** of $X$, denoted ${\mathcal R}(X)$, is the column space of $X^T$. It consists of all
-vectors $z$ that can be written as a linear combination of rows of $X$. Its dimension is $p$.
+vectors $z$ that can be written as linear combinations of rows of $X$. Its dimension is $p$.
* The **left null space** of $X$, denoted ${\mathcal N}(X^T)$, consists of all vectors $z$ such that
$X^T z = 0$. Its dimension is $m-p$.

-A full SVD of a matrix $X$ contains orthogonal bases for all four subspaces.
+A full SVD of a matrix $X$ contains orthogonal bases for all four subspaces.
+
+And the subspaces are connected in interesting ways, consisting of two pairs of orthogonal subspaces
+that we'll describe here.
+
+Let $u_i, i = 1, \ldots, m$ be the $m$ column vectors of $U$ and let
+$v_i, i = 1, \ldots, n$ be the $n$ column vectors of $V$.

These matrices are related to the four fundamental subspaces of $X$ in the following ways:

$$
@@ -169,6 +181,21 @@ Collection {eq}`eq:fourspaceSVD` asserts that
* $V_L$ is an orthonormal basis for the row space of $X$
* $U_R$ is an orthonormal basis for the null space of $X^T$

+The four claims in {eq}`eq:fourspaceSVD` can be verified by performing the multiplications called for by the right side of {eq}`eq:fullSVDpartition` and interpreting them.
+
+Although we won't go through the details of that verification here, we will note that the claims in {eq}`eq:fourspaceSVD` and the fact that $U$ and $V$ are both unitary (i.e., orthonormal) matrices immediately imply
+that
+
+* the column space of $X$ is orthogonal to the null space of $X^T$
+* the null space of $X$ is orthogonal to the row space of $X$
+
+Sometimes these properties are described with the following two pairs of orthogonal complement subspaces:
+
+* ${\mathcal C}(X)$ is the orthogonal complement of ${\mathcal N}(X^T)$
+* ${\mathcal R}(X)$ is the orthogonal complement of ${\mathcal N}(X)$

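These subspace and orthogonality claims can be checked numerically. The sketch below is an editorial illustration, not code from the lecture; it assumes the partition in {eq}`eq:fullSVDpartition` puts the first $p$ columns of $U$ and $V$ into $U_L$ and $V_L$:

```python
import numpy as np

np.random.seed(1)
m, n, p = 6, 4, 2
# A rank-p matrix built from random factors (illustrative data only)
X = np.random.randn(m, p) @ np.random.randn(p, n)

U, S, Vt = np.linalg.svd(X, full_matrices=True)
V = Vt.T

# Partition: U_L, V_L hold the first p columns (assumed convention)
U_L, U_R = U[:, :p], U[:, p:]
V_L, V_R = V[:, :p], V[:, p:]

# U_R spans the null space of X^T; V_R spans the null space of X
print(np.allclose(X.T @ U_R, 0), np.allclose(X @ V_R, 0))

# Orthogonal complements: C(X) is orthogonal to N(X^T), R(X) is orthogonal to N(X)
print(np.allclose(U_L.T @ U_R, 0), np.allclose(V_L.T @ V_R, 0))
```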
## Properties of Full and Reduced SVD's
Up to now we have described properties of a **full** SVD in which shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
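As a quick illustration of the shape difference between a full and a reduced (economy) SVD (an editorial sketch with arbitrary data, not code from the lecture), NumPy's `full_matrices` flag switches between the two:

```python
import numpy as np

X = np.random.randn(5, 2)        # m = 5, n = 2

# Full SVD: U is (m, m), Sigma has min(m, n) entries, V^T is (n, n)
U, S, Vt = np.linalg.svd(X, full_matrices=True)
print(U.shape, S.shape, Vt.shape)          # (5, 5) (2,) (2, 2)

# Reduced (economy) SVD: U is (m, min(m, n))
Uh, Sh, Vth = np.linalg.svd(X, full_matrices=False)
print(Uh.shape, Sh.shape, Vth.shape)       # (5, 2) (2,) (2, 2)

# Both reproduce X
print(np.allclose(Uh @ np.diag(Sh) @ Vth, X))   # True
```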
@@ -722,18 +749,9 @@ For an example PCA applied to analyzing the structure of intelligence tests see
Look at parts of that lecture that describe and illustrate the classic factor analysis model.

-## Dynamic Mode Decomposition (DMD)
-
-We turn to the **tall and skinny** case associated with **Dynamic Mode Decomposition**, the case in which $ m >>n $.
-
-Here an $ m \times n $ data matrix $ \tilde X $ contains many more attributes $ m $ than individuals $ n $.
-
-Dynamic mode decomposition was introduced by {cite}`schmid2010`,
-
-You can read more about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
+## Vector Autoregressions

We want to fit a **first-order vector autoregression**

-Here $ ' $ does not indicate matrix transposition but instead is part of the name of the matrix $ X' $.
+Here $ ' $ is part of the name of the matrix $ X' $ and does not indicate matrix transposition.
+
+We continue to use $\cdot^T$ to denote matrix transposition or its extension to complex matrices.

In forming $ X $ and $ X' $, we have in each case dropped a column from $ \tilde X $, the last column in the case of $ X $, and the first column in the case of $ X' $.
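A minimal sketch of this construction (an editorial illustration, not code from the lecture; the variable names and the use of a pseudoinverse here are assumptions) drops the last and first columns of $\tilde X$ and estimates $\hat A$ by least squares:

```python
import numpy as np

np.random.seed(2)
m, n = 10, 6                       # many attributes, few time periods (illustrative sizes)
X_tilde = np.random.randn(m, n)

X  = X_tilde[:, :-1]               # drop the last column
Xp = X_tilde[:, 1:]                # drop the first column ("X prime")

# Least-squares estimate of A in  X' = A X + error, via the pseudoinverse of X
A_hat = Xp @ np.linalg.pinv(X)
print(A_hat.shape)                 # (m, m)
```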
@@ -956,15 +976,31 @@ $$
\hat A = X' V \Sigma^{-1} U^T
$$ (eq:AhatSVDformula)

-We’ll eventually use **dynamic mode decomposition** to compute a rank $ r $ approximation to $ \hat A $,
-where $ r < p $.
+
+## Dynamic Mode Decomposition (DMD)
+
+We turn to the **tall and skinny** case associated with **Dynamic Mode Decomposition**, the case in which $ m >> n $.
+
+Here an $ m \times n $ data matrix $ \tilde X $ contains many more attributes $ m $ than individuals $ n $.
+
+Dynamic mode decomposition was introduced by {cite}`schmid2010`.
+
+You can read more about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
+
+The key idea underlying **Dynamic Mode Decomposition** (DMD) is to compute a rank $ r < p $ approximation to the least squares regression coefficients $ \hat A $ that we described above by formula {eq}`eq:AhatSVDformula`.

-**Remark:** In our Python code, we'll sometimes use a reduced SVD.
+We'll build up gradually to a formulation that is typically used in applications of DMD.

-Next, we describe alternative representations of our first-order linear dynamic system.
+We'll do this by describing three alternative representations of our first-order linear dynamic system, i.e.,
+our vector autoregression.

-**Guide to three representations:** In practice, we'll be interested in Representation 3. We present the first 2 in order to set the stage for some intermediate steps that might help us understand what is under the hood of Representation 3. In applications, we'll use only a small subset of the DMD to approximate dynamics. To do that, we'll want to use the reduced SVD's affiliated with representation 3, not the full SVD's affiliated with representations 1 and 2.
+**Guide to three representations:** In practice, we'll be interested in Representation 3. We present the first 2 in order to set the stage for some intermediate steps that might help us understand what is under the hood of Representation 3. In applications, we'll use only a small subset of the DMD to approximate dynamics. To do that, we'll want to use the **reduced** SVD's affiliated with representation 3, not the **full** SVD's affiliated with representations 1 and 2.
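As a hedged sketch of that key idea (an editorial illustration, not the lecture's code; variable names and sizes are made up), a rank-$r$ approximation to $\hat A$ can be built from a truncated reduced SVD of $X$:

```python
import numpy as np

np.random.seed(3)
m, n, r = 50, 8, 3                  # tall-and-skinny data, illustrative sizes
X_tilde = np.random.randn(m, n)
X, Xp = X_tilde[:, :-1], X_tilde[:, 1:]

# Reduced SVD of X, truncated to the leading r singular values
U, S, Vt = np.linalg.svd(X, full_matrices=False)
Ur, Sr, Vr = U[:, :r], S[:r], Vt[:r, :].T

# Rank-r approximation to A_hat = X' V Sigma^{-1} U^T
A_hat_r = Xp @ Vr @ np.diag(1.0 / Sr) @ Ur.T

# DMD typically works with the small r x r matrix A_tilde = Ur^T A_hat_r Ur instead
A_tilde = Ur.T @ Xp @ Vr @ np.diag(1.0 / Sr)
print(A_hat_r.shape, A_tilde.shape)   # (50, 50) (3, 3)
```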