lectures/svd_intro.md: 42 additions & 24 deletions
@@ -32,7 +32,7 @@ import pandas as pd
## Overview
The **singular value decomposition** (SVD) is a work-horse in applications of least squares projection that
-form foundations for important statistical and machine learning methods.
+form foundations for many statistical and machine learning methods.
After defining the SVD, we'll describe how it connects to
@@ -44,22 +44,22 @@ We'll also tell the essential role that the SVD plays in
* dynamic mode decomposition (DMD)
-Like principal components analysis (PCA), DMD can be thought of as a data-reduction procedure designed to capture salient patterns by projecting data onto a limited set of factors.
+Like principal components analysis (PCA), DMD can be thought of as a data-reduction procedure that represents salient patterns by projecting data onto a limited set of factors.
## The Setting
Let $X$ be an $m \times n$ matrix of rank $p$.
Necessarily, $p \leq \min(m,n)$.
-In much of this lecture, we'll think of $X$ as a matrix of **data**.
+In much of this lecture, we'll think of $X$ as a matrix of **data** in which
* each column is an **individual** -- a time period or person, depending on the application
* each row is a **random variable** describing an attribute of a time period or a person, depending on the application
-We'll be interested in two cases
+We'll be interested in two situations
* A **short and fat** case in which $m << n$, so that there are many more columns (individuals) than rows (attributes).
@@ -68,11 +68,11 @@ We'll be interested in two cases
We'll apply a **singular value decomposition** of $X$ in both situations.
-In the first case in which there are many more individuals $n$ than attributes $m$, we learn sample moments of a joint distribution by taking averages across observations of functions of the observations.
+In the $m << n$ case in which there are many more individuals $n$ than attributes $m$, we can calculate sample moments of a joint distribution by taking averages across observations of functions of the observations.
In this $m << n$ case, we'll look for **patterns** by using a **singular value decomposition** to do a **principal components analysis** (PCA).
-In the $m >> n$ case in which there are many more attributes $m$ than individuals $n$, we'll proceed in a different way.
+In the $m >> n$ case in which there are many more attributes $m$ than individuals $n$ and when we are in a time-series setting in which $n$ equals the number of time periods covered in the data set $X$, we'll proceed in a different way.
We'll again use a **singular value decomposition**, but now to construct a **dynamic mode decomposition** (DMD)
@@ -95,34 +95,43 @@ $$
and
-* $U$ is an $m \times m$ matrix whose columns are eigenvectors of $X^T X$
-
-* $V$ is an $n \times n$ matrix whose columns are eigenvectors of $X X^T$
-
+* $U$ is an $m \times m$ orthogonal matrix of **left singular vectors** of $X$
+* Columns of $U$ are eigenvectors of $X X^T$
+* $V$ is an $n \times n$ orthogonal matrix of **right singular vectors** of $X$
+* Columns of $V$ are eigenvectors of $X^T X$
* $\Sigma$ is an $m \times n$ matrix in which the first $p$ places on its main diagonal are positive numbers $\sigma_1, \sigma_2, \ldots, \sigma_p$ called **singular values**; remaining entries of $\Sigma$ are all zero
-* The $p$ singular values are positive square roots of the eigenvalues of the $m \times m$ matrix $X X^T$ and the $n \times n$ matrix $X^T X$
+* The $p$ singular values are positive square roots of the eigenvalues of the $m \times m$ matrix $X X^T$ and also of the $n \times n$ matrix $X^T X$
* We adopt a convention that when $U$ is a complex valued matrix, $U^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $U$, meaning that
$U_{ij}^T$ is the complex conjugate of $U_{ji}$.
* Similarly, when $V$ is a complex valued matrix, $V^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $V$
-What we have described above is called a **full** SVD.
+
+The matrices $U, \Sigma, V$ entail linear transformations that reshape vectors in the following ways:
+
+* multiplying vectors by the unitary matrices $U$ and $V$ **rotates** them, but leaves **angles between vectors** and **lengths of vectors** unchanged.
+* multiplying vectors by the diagonal matrix $\Sigma$ leaves **angles between vectors** unchanged but **rescales** vectors.
+
+Taken together, the structure that the SVD provides for $X$ opens the door to constructing systems
+of data **encoders** and **decoders**, an idea that we shall apply later in this lecture.
+
+What we have described here is called a **full** SVD.
In a **full** SVD, the shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
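Here is a minimal NumPy sketch that checks these shapes together with the orthogonality and eigenvalue properties listed above; the matrix and all variable names are illustrative and are not part of the lecture's code.

```python
import numpy as np

# illustrative tall-skinny matrix; the shape and names are our own choices
m, n = 5, 2
X = np.random.default_rng(0).normal(size=(m, n))

U, sig, VT = np.linalg.svd(X, full_matrices=True)   # full SVD: X = U @ Sigma @ V.T
print(U.shape, sig.shape, VT.shape)                  # (5, 5), (2,), (2, 2)

# U and V are orthogonal
print(np.allclose(U @ U.T, np.eye(m)), np.allclose(VT @ VT.T, np.eye(n)))

# singular values are positive square roots of the eigenvalues of X X^T (and of X^T X)
eigvals = np.linalg.eigvalsh(X @ X.T)                # ascending eigenvalues of X X^T
print(np.allclose(eigvals[::-1][:min(m, n)], sig**2))
```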
Later we'll also describe an **economy** or **reduced** SVD.
-But before we study a **reduced** SVD we'll say a little more about properties of a **full** SVD.
+Before we study a **reduced** SVD we'll say a little more about properties of a **full** SVD.
## Four Fundamental Subspaces
-Let's start with a reminder about definitions of the four fundamental subspaces of an $m \times n$
+Let's start by recalling the four fundamental subspaces of an $m \times n$
matrix $X$ of rank $p$.
* The **column space** of $X$, denoted ${\mathcal C}(X)$, is the span of the columns of $X$, i.e., all vectors $y$ that can be written as linear combinations of columns of $X$. Its dimension is $p$.
@@ -133,10 +142,10 @@ vectors $z$ that can be written as linear combinations of rows of $X$. Its dime
* The **left null space** of $X$, denoted ${\mathcal N}(X^T)$, consists of all vectors $z$ such that
$X^T z = 0$. Its dimension is $m-p$.
-The $U$ and $V$ factors for a full SVD of a matrix $X$ contain orthogonal bases for all four subspaces.
+For a full SVD of a matrix $X$, the matrix $U$ of left singular vectors and the matrix $V$ of right singular vectors contain orthogonal bases for all four subspaces.
-The subspaces are connected in interesting ways, consisting of two pairs of orthogonal subspaces
-that we'll describe here.
+They form two pairs of orthogonal subspaces
+that we'll describe now.
Let $u_i, i = 1, \ldots, m$ be the $m$ column vectors of $U$ and let
$v_i, i = 1, \ldots, n$ be the $n$ column vectors of $V$.
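As a small numerical check of the claim that $U$ and $V$ contain orthogonal bases for all four subspaces, the sketch below builds an illustrative rank-deficient matrix and verifies that the trailing $m-p$ columns of $U$ lie in the left null space and the trailing $n-p$ columns of $V$ lie in the null space; the example and names are ours, not the lecture's.

```python
import numpy as np

# illustrative rank-deficient matrix: m = 4, n = 3, rank p = 2
rng = np.random.default_rng(1)
X = rng.normal(size=(4, 2)) @ rng.normal(size=(2, 3))
p = np.linalg.matrix_rank(X)

U, sig, VT = np.linalg.svd(X)    # full SVD
V = VT.T

# columns p, ..., m-1 of U lie in the left null space: X^T z = 0
print(np.allclose(X.T @ U[:, p:], 0))
# columns p, ..., n-1 of V lie in the null space: X z = 0
print(np.allclose(X @ V[:, p:], 0))
```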
Up to now we have described properties of a **full** SVD in which shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
-There is also an alternative shape convention called an **economy** or **reduced** SVD in which the shapes of $U, \Sigma$ and $V$ are different from what they are in a full SVD.
+There is an alternative bookkeeping convention called an **economy** or **reduced** SVD in which the shapes of $U, \Sigma$ and $V$ are different from what they are in a full SVD.
Thus, note that because we assume that $X$ has rank $p$, there are only $p$ nonzero singular values, where $p=\textrm{rank}(X)\leq\min\left(m, n\right)$.
@@ -264,7 +273,7 @@ VV^T & = I & \quad V^T V = I
\end{aligned}
$$
-But these properties don't hold for a **reduced** SVD.
+But not all these properties hold for a **reduced** SVD.
Which properties hold depends on whether we are in a **tall-skinny** case or a **short-fat** case.
@@ -286,7 +295,7 @@ VV^T & = I & \quad V^T V \neq I
\end{aligned}
$$
-When we study Dynamic Mode Decomposition below, we shall want to remember this caveat because sometimes we'll be using reduced SVD's to compute key objects.
+When we study Dynamic Mode Decomposition below, we shall want to remember these properties when we use a reduced SVD to compute some DMD representations.
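As a quick numerical preview of the fuller comparison in the exercise below, NumPy's `full_matrices=False` option returns a reduced SVD, and we can check directly which orthogonality identities survive in illustrative tall-skinny and short-fat examples (shapes and names are our own):

```python
import numpy as np

rng = np.random.default_rng(2)

# tall-skinny illustrative case: m >> n
X_tall = rng.normal(size=(8, 3))
U, sig, VT = np.linalg.svd(X_tall, full_matrices=False)   # U has shape (8, 3)
print(np.allclose(U.T @ U, np.eye(3)),    # True:  U^T U = I
      np.allclose(U @ U.T, np.eye(8)))    # False: U U^T != I

# short-fat illustrative case: m << n
X_fat = rng.normal(size=(3, 8))
U, sig, VT = np.linalg.svd(X_fat, full_matrices=False)    # VT has shape (3, 8)
print(np.allclose(VT @ VT.T, np.eye(3)),  # True:  V^T V = I
      np.allclose(VT.T @ VT, np.eye(8)))  # False: V V^T != I
```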
Let's do an exercise to compare **full** and **reduced** SVD's.
@@ -397,7 +406,7 @@ Uhat, Shat, Vhat
rr = np.linalg.matrix_rank(X)
print(f'rank X = {rr}')
```
-## Digression: Polar Decomposition
+## Polar Decomposition
A singular value decomposition (SVD) of $X$ is related to a **polar decomposition** of $X$
@@ -414,7 +423,10 @@ Q & = U V^T
\end{aligned}
$$
-and $S$ is evidently a symmetric matrix and $Q$ is an orthogonal matrix.
+Here
+
+* $S$ is a symmetric matrix
+* $Q$ is an orthogonal matrix
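For a square $X$, one way to assemble these two factors is directly from the SVD, since $X = U \Sigma V^T = (U \Sigma U^T)(U V^T)$; the minimal sketch below uses an illustrative matrix of our own.

```python
import numpy as np

# illustrative square matrix
X = np.random.default_rng(3).normal(size=(4, 4))

U, sig, VT = np.linalg.svd(X)
S = U @ np.diag(sig) @ U.T    # symmetric positive semi-definite factor
Q = U @ VT                    # orthogonal factor

print(np.allclose(X, S @ Q))                                  # X = S Q
print(np.allclose(S, S.T), np.allclose(Q @ Q.T, np.eye(4)))   # S symmetric, Q orthogonal
```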
## Principal Components Analysis (PCA)
@@ -875,6 +887,8 @@ $$ (eq:commonA)
where $X^+$ is the pseudo-inverse of $X$.
+To read about the **Moore-Penrose pseudo-inverse**, please see [Moore-Penrose pseudo-inverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse).
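For a full-rank matrix, the pseudo-inverse can be obtained either from `np.linalg.pinv` or directly from a reduced SVD as $X^+ = V \Sigma^{-1} U^T$; the sketch below, with an illustrative short-fat matrix of our own, checks that the two agree and that $X X^+ = I$ in that case.

```python
import numpy as np

# illustrative short-fat matrix with full row rank
X = np.random.default_rng(4).normal(size=(3, 6))

U, sig, VT = np.linalg.svd(X, full_matrices=False)   # reduced SVD
X_pinv = VT.T @ np.diag(1 / sig) @ U.T               # X^+ = V Sigma^{-1} U^T

print(np.allclose(X_pinv, np.linalg.pinv(X)))        # agrees with NumPy's pinv
print(np.allclose(X @ X_pinv, np.eye(3)))            # right inverse in the short-fat case
```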
Applicable formulas for the pseudo-inverse differ for our two cases.
**Short-Fat Case:**
@@ -1034,7 +1048,7 @@ Here an $ m \times n $ data matrix $ \tilde X $ contains many more attributes $
Dynamic mode decomposition was introduced by {cite}`schmid2010`.
-You can read more about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
+You can read about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
The key idea underlying **Dynamic Mode Decomposition** (DMD) is to compute a rank $r < p$ approximation to the least squares regression coefficients $\hat A$ that we described above by formula {eq}`eq:AhatSVDformula`.
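As a sketch of this rank $r < p$ idea, the snippet below forms lag and lead snapshot matrices in the usual DMD convention, truncates a reduced SVD of $X$ to its leading $r$ singular values, and builds both a rank-$r$ approximation to $\hat A = X' X^+$ and its $r \times r$ projection onto the leading left singular vectors; the data and names here are our own illustrations rather than the lecture's code.

```python
import numpy as np

# illustrative snapshots: n_time observations of an m-vector
rng = np.random.default_rng(5)
m, n_time, r = 10, 6, 3
data = rng.normal(size=(m, n_time))
X, Xprime = data[:, :-1], data[:, 1:]        # lag and lead snapshot matrices

# reduced SVD of X, truncated to rank r
U, sig, VT = np.linalg.svd(X, full_matrices=False)
Ur, sigr, Vr = U[:, :r], sig[:r], VT[:r, :].T

# rank-r approximation to the least squares coefficient matrix A_hat = X' X^+
Ahat_r = Xprime @ Vr @ np.diag(1 / sigr) @ Ur.T       # m x m
# its r x r projection onto the leading left singular vectors of X
Atilde = Ur.T @ Xprime @ Vr @ np.diag(1 / sigr)       # r x r
print(Ahat_r.shape, Atilde.shape)
```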
@@ -1047,6 +1061,10 @@ our vector autoregression.
**Guide to three representations:** In practice, we'll be interested in Representation 3. We present the first 2 in order to set the stage for some intermediate steps that might help us understand what is under the hood of Representation 3. In applications, we'll use only a small subset of DMD modes to approximate dynamics. To do that, we'll want to use the **reduced** SVD's affiliated with Representation 3, not the **full** SVD's affiliated with Representations 1 and 2.
+
+**Guide to the impatient reader:** In our applications, we'll be using Representation 3. You might want to skip
+the stage-setting Representations 1 and 2 on first reading.