
Commit 4562461

Tom's Jan 2 edits of svd lecture
1 parent 37ed3f0 commit 4562461

File tree

1 file changed

+42
-24
lines changed


lectures/svd_intro.md

Lines changed: 42 additions & 24 deletions
Original file line number | Diff line number | Diff line change
@@ -32,7 +32,7 @@ import pandas as pd
3232
## Overview
3333

3434
The **singular value decomposition** (SVD) is a work-horse in applications of least squares projection that
35-
form foundations for important statistical and machine learning methods.
35+
form foundations for many statistical and machine learning methods.
3636

3737
After defining the SVD, we'll describe how it connects to
3838

@@ -44,22 +44,22 @@ We'll also tell the essential role that the SVD plays in
4444

4545
* dynamic mode decomposition (DMD)
4646

47-
Like principal components analysis (PCA), DMD can be thought of as a data-reduction procedure designed to capture salient patterns by projecting data onto a limited set of factors.
47+
Like principal components analysis (PCA), DMD can be thought of as a data-reduction procedure that represents salient patterns by projecting data onto a limited set of factors.
4848

4949
## The Setting
5050

5151
Let $X$ be an $m \times n$ matrix of rank $p$.
5252

5353
Necessarily, $p \leq \min(m,n)$.
5454

55-
In much of this lecture, we'll think of $X$ as a matrix of **data**.
55+
In much of this lecture, we'll think of $X$ as a matrix of **data** in which
5656

5757
* each column is an **individual** -- a time period or person, depending on the application
5858

5959
* each row is a **random variable** describing an attribute of a time period or a person, depending on the application
6060

6161

62-
We'll be interested in two cases
62+
We'll be interested in two situations
6363

6464
* A **short and fat** case in which $m << n$, so that there are many more columns (individuals) than rows (attributes).
6565

@@ -68,11 +68,11 @@ We'll be interested in two cases
6868

6969
We'll apply a **singular value decomposition** of $X$ in both situations.
7070

71-
In the first case in which there are many more individuals $n$ than attributes $m$, we learn sample moments of a joint distribution by taking averages across observations of functions of the observations.
71+
In the $ m < < n$ case in which there are many more individuals $n$ than attributes $m$, we can calculate sample moments of a joint distribution by taking averages across observations of functions of the observations.
7272

7373
In this $ m < < n$ case, we'll look for **patterns** by using a **singular value decomposition** to do a **principal components analysis** (PCA).
7474

75-
In the $m > > n$ case in which there are many more attributes $m$ than individuals $n$, we'll proceed in a different way.
75+
In the $m > > n$ case, in which there are many more attributes $m$ than individuals $n$, and when we are in a time-series setting in which $n$ equals the number of time periods covered in the data set $X$, we'll proceed in a different way.
7676

7777
We'll again use a **singular value decomposition**, but now to construct a **dynamic mode decomposition** (DMD)
7878

@@ -95,34 +95,43 @@ $$
9595

9696
and
9797

98-
* $U$ is an $m \times m$ matrix whose columns are eigenvectors of $X^T X$
99-
100-
* $V$ is an $n \times n$ matrix whose columns are eigenvectors of $X X^T$
101-
98+
* $U$ is an $m \times m$ orthogonal matrix of **left singular vectors** of $X$
99+
    * Columns of $U$ are eigenvectors of $X X^T$
100+
* $V$ is an $n \times n$ orthogonal matrix of **right singular vectors** of $X$
101+
    * Columns of $V$ are eigenvectors of $X^T X$
102102
* $\Sigma$ is an $m \times n$ matrix in which the first $p$ places on its main diagonal are positive numbers $\sigma_1, \sigma_2, \ldots, \sigma_p$ called **singular values**; remaining entries of $\Sigma$ are all zero
103103

104-
* The $p$ singular values are positive square roots of the eigenvalues of the $m \times m$ matrix $X X^T$ and the $n \times n$ matrix $X^T X$
104+
* The $p$ singular values are positive square roots of the nonzero eigenvalues of the $m \times m$ matrix $X X^T$ and also of the $n \times n$ matrix $X^T X$
105105

106106
* We adopt a convention that when $U$ is a complex valued matrix, $U^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $U$, meaning that
107107
$U_{ij}^T$ is the complex conjugate of $U_{ji}$.
108108

109109
* Similarly, when $V$ is a complex valued matrix, $V^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $V$
110110

111-
What we have described above is called a **full** SVD.
111+
112+
The matrices $U,\Sigma,V$ entail linear transformations that reshape vectors in the following ways:
113+
114+
* multiplying vectors by the unitary matrices $U$ and $V$ **rotates** them, but leaves **angles between vectors** and **lengths of vectors** unchanged.
115+
* multiplying vectors by the diagonal matrix $\Sigma$ **rescales** them, stretching or shrinking each coordinate by the corresponding singular value.
116+
117+
Taken together, the structure that the SVD provides for $X$ opens the door to constructing systems
118+
of data **encoders** and **decoders**, an idea that we shall apply later in this lecture.
119+
120+
What we have described here is called a **full** SVD.
112121

113122

114123

115124
In a **full** SVD, the shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
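As a quick numerical check of these shapes and of the properties listed above, here is a minimal sketch (not part of the lecture's own code) that applies `np.linalg.svd` to a small random matrix:

```python
import numpy as np

# minimal check of full-SVD properties on a small random matrix
np.random.seed(0)
m, n = 3, 5
X = np.random.randn(m, n)

U, sigma, Vt = np.linalg.svd(X, full_matrices=True)   # full SVD
V = Vt.T

print(U.shape, sigma.shape, Vt.shape)       # (m, m), (min(m, n),), (n, n)

# U and V are orthogonal
print(np.allclose(U @ U.T, np.eye(m)))      # True
print(np.allclose(V @ V.T, np.eye(n)))      # True

# squared singular values equal the nonzero eigenvalues of X X^T
eigvals = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
print(np.allclose(sigma**2, eigvals))       # True

# multiplying by U preserves lengths of vectors
y = np.random.randn(m)
print(np.allclose(np.linalg.norm(U @ y), np.linalg.norm(y)))   # True
```

Passing `full_matrices=True` (NumPy's default) requests the full SVD.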
116125

117126
Later we'll also describe an **economy** or **reduced** SVD.
118127

119-
But before we study a **reduced** SVD we'll say a little more about properties of a **full** SVD.
128+
Before we study a **reduced** SVD, we'll say a little more about properties of a **full** SVD.
120129

121130

122131
## Four Fundamental Subspaces
123132

124133

125-
Let's start with a reminder about definitions of the four fundamental subspaces of an $m \times n$
134+
Let's start by recalling the four fundamental subspaces of an $m \times n$
126135
matrix $X$ of rank $p$.
127136

128137
* The **column space** of $X$, denoted ${\mathcal C}(X)$, is the span of the columns of $X$, i.e., all vectors $y$ that can be written as linear combinations of columns of $X$. Its dimension is $p$.
@@ -133,10 +142,10 @@ vectors $z$ that can be written as linear combinations of rows of $X$. Its dime
133142
* The **left null space** of $X$, denoted ${\mathcal N}(X^T)$, consists of all vectors $z$ such that
134143
$X^T z =0$. Its dimension is $m-p$.
135144

136-
The $U$ and $V$ factors for a full SVD of a matrix $X$ contain orthogonal bases for all four subspaces.
145+
For a full SVD of a matrix $X$, the matrix $U$ of left singular vectors and the matrix $V$ of right singular vectors contain orthogonal bases for all four subspaces.
137146

138-
The subspaces are connected in interesting ways, consisting of two pairs of orthogonal subspaces
139-
that we'll describe here.
147+
They form two pairs of orthogonal subspaces
148+
that we'll describe now.
140149

141150
Let $u_i, i = 1, \ldots, m$ be the $m$ column vectors of $U$ and let
142151
$v_i, i = 1, \ldots, n$ be the $n$ column vectors of $V$.
@@ -148,7 +157,7 @@ X = \begin{bmatrix} U_L & U_R \end{bmatrix} \begin{bmatrix} \Sigma_p & 0 \cr 0 &
148157
\begin{bmatrix} V_L & V_R \end{bmatrix}^T
149158
$$ (eq:fullSVDpartition)
150159
151-
where
160+
where $ \Sigma_p$ is a $p \times p$ diagonal matrix with the $p$ singular values on the diagonal and
152161
153162
$$
154163
\begin{aligned}
@@ -245,7 +254,7 @@ print("Right null space:\n", null_space.T)
245254
246255
Up to now we have described properties of a **full** SVD in which shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
247256
248-
There is also an alternative shape convention called an **economy** or **reduced** SVD in which the shapes of $U, \Sigma$ and $V$ are different from what they are in a full SVD.
257+
There is an alternative bookkeeping convention called an **economy** or **reduced** SVD in which the shapes of $U, \Sigma$ and $V$ are different from what they are in a full SVD.
249258
250259
Note that because we assume that $X$ has rank $p$, there are only $p$ nonzero singular values, where $p=\textrm{rank}(X)\leq\min\left(m, n\right)$.
251260
@@ -264,7 +273,7 @@ VV^T & = I & \quad V^T V = I
264273
\end{aligned}
265274
$$
266275
267-
But these properties don't hold for a **reduced** SVD.
276+
But not all these properties hold for a **reduced** SVD.
268277
269278
Which properties hold depends on whether we are in a **tall-skinny** case or a **short-fat** case.
270279
@@ -286,7 +295,7 @@ VV^T & = I & \quad V^T V \neq I
286295
\end{aligned}
287296
$$
288297
289-
When we study Dynamic Mode Decomposition below, we shall want to remember this caveat because sometimes we'll be using reduced SVD's to compute key objects.
298+
When we study Dynamic Mode Decomposition below, we shall want to keep these properties in mind, because we'll use reduced SVD's to compute some DMD representations.
290299
291300
292301
Let's do an exercise to compare **full** and **reduced** SVD's.
@@ -397,7 +406,7 @@ Uhat, Shat, Vhat
397406
rr = np.linalg.matrix_rank(X)
398407
print(f'rank X = {rr}')
399408
```
400-
## Digression: Polar Decomposition
409+
## Polar Decomposition
401410
402411
A singular value decomposition (SVD) of $X$ is related to a **polar decomposition** of $X$
403412
@@ -414,7 +423,10 @@ Q & = U V^T
414423
\end{aligned}
415424
$$
416425
417-
and $S$ is evidently a symmetric matrix and $Q$ is an orthogonal matrix.
426+
Here
427+
428+
* $S$ is a symmetric matrix
429+
* $Q$ is an orthogonal matrix
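As a hedged illustration, here is a minimal sketch (for a square random $X$; the rectangular case needs a bit more bookkeeping) that builds $S$ and $Q$ from an SVD and verifies these properties:

```python
import numpy as np

# sketch: build a polar decomposition X = S Q from the SVD of a square matrix
np.random.seed(1)
n = 4
X = np.random.randn(n, n)

U, sigma, Vt = np.linalg.svd(X)
S = U @ np.diag(sigma) @ U.T    # symmetric positive semi-definite factor
Q = U @ Vt                      # orthogonal factor

print(np.allclose(S, S.T))                  # S is symmetric
print(np.allclose(Q @ Q.T, np.eye(n)))      # Q is orthogonal
print(np.allclose(S @ Q, X))                # X = S Q
```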
418430
419431
## Principal Components Analysis (PCA)
420432
@@ -875,6 +887,8 @@ $$ (eq:commonA)
875887
876888
where $X^+$ is the pseudo-inverse of $X$.
877889
890+
To read about the **Moore-Penrose pseudo-inverse**, please see [Moore-Penrose pseudo-inverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse).
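The following minimal sketch (illustrative only, using a random short-fat matrix) computes the pseudo-inverse with `np.linalg.pinv`, checks the defining Moore-Penrose property $X X^+ X = X$, and confirms that the result matches the reduced-SVD construction $V \Sigma^{-1} U^T$:

```python
import numpy as np

# sketch: Moore-Penrose pseudo-inverse via numpy and via a reduced SVD
np.random.seed(2)
m, n = 3, 6                     # a short-fat example
X = np.random.randn(m, n)

X_pinv = np.linalg.pinv(X)

# defining Moore-Penrose property: X X^+ X = X
print(np.allclose(X @ X_pinv @ X, X))       # True

# same object from a reduced SVD: X^+ = V Sigma^{-1} U^T
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
print(np.allclose(X_pinv, Vt.T @ np.diag(1 / sigma) @ U.T))    # True
```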
891+
878892
Applicable formulas for the pseudo-inverse differ for our two cases.
879893
880894
**Short-Fat Case:**
@@ -1034,7 +1048,7 @@ Here an $ m \times n $ data matrix $ \tilde X $ contains many more attributes $
10341048
10351049
Dynamic mode decomposition was introduced by {cite}`schmid2010`.
10361050
1037-
You can read more about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
1051+
You can read about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
10381052
10391053
The key idea underlying **Dynamic Mode Decomposition** (DMD) is to compute a rank $r < p$ approximation to the least squares regression coefficients $\hat A$ that we described above by formula {eq}`eq:AhatSVDformula`.
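To fix ideas, here is a minimal hypothetical sketch (random data; the names `X`, `Xprime`, and the truncation level `r` are chosen purely for illustration) of forming such a rank $r$ approximation by keeping only the leading $r$ singular values and singular vectors in the reduced-SVD formula for $\hat A$:

```python
import numpy as np

# sketch: rank r truncation of Ahat = Xprime V Sigma^{-1} U^T, with X holding
# columns x_1, ..., x_{n-1} and Xprime holding columns x_2, ..., x_n
np.random.seed(3)
m, n, r = 10, 6, 2              # many attributes, few time periods, small r

data = np.random.randn(m, n)
X, Xprime = data[:, :-1], data[:, 1:]

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

# keep only the leading r singular values and singular vectors
Ur, sigr, Vtr = U[:, :r], sigma[:r], Vt[:r, :]

Ahat_r = Xprime @ Vtr.T @ np.diag(1 / sigr) @ Ur.T   # rank r approximation
print(Ahat_r.shape, np.linalg.matrix_rank(Ahat_r))   # (10, 10) 2
```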
10401054
@@ -1047,6 +1061,10 @@ our vector autoregression.
10471061
10481062
**Guide to three representations:** In practice, we'll be interested in Representation 3. We present the first two in order to set the stage for some intermediate steps that might help us understand what is under the hood of Representation 3. In applications, we'll use only a small subset of the DMD modes to approximate dynamics. To do that, we'll want to use the **reduced** SVD's affiliated with Representation 3, not the **full** SVD's affiliated with Representations 1 and 2.
10491063
1064+
1065+
**Guide to the impatient reader:** In our applications, we'll be using Representation 3. You might want to skip
1066+
the stage-setting Representations 1 and 2 on first reading.
1067+
10501068
+++
10511069
10521070
## Representation 1
