
Commit 5dd4646

Tom's second Dec 23 edits of SVD lecture
1 parent fba44cd commit 5dd4646

File tree

1 file changed: +68 −32 lines changed


lectures/svd_intro.md

Lines changed: 68 additions & 32 deletions
@@ -31,24 +31,28 @@ import pandas as pd
 
 ## Overview
 
-The **singular value decomposition** is a work-horse in applications of least squares projection that
-form foundations for important machine learning methods.
+The **singular value decomposition** (SVD) is a work-horse in applications of least squares projection that
+form foundations for important statistical and machine learning methods.
 
-This lecture describes the singular value decomposition and two of its uses:
+After defining the SVD, we'll describe how it connects to
 
-* principal components analysis (PCA)
+* the **four fundamental spaces** of linear algebra
+* underdetermined and over-determined **least squares regressions**
+* **principal components analysis** (PCA)
+
+We'll also describe the essential role that the SVD plays in
 
 * dynamic mode decomposition (DMD)
 
-Each of these can be thought of as a data-reduction procedure designed to capture salient patterns by projecting data onto a limited set of factors.
+Like principal components analysis (PCA), DMD can be thought of as a data-reduction procedure designed to capture salient patterns by projecting data onto a limited set of factors.
 
-## The Setup
+## The Setting
 
 Let $X$ be an $m \times n$ matrix of rank $p$.
 
 Necessarily, $p \leq \min(m,n)$.
 
-In this lecture, we'll think of $X$ as a matrix of **data**.
+In much of this lecture, we'll think of $X$ as a matrix of **data**.
 
 * each column is an **individual** -- a time period or person, depending on the application
 
@@ -89,7 +93,7 @@ VV^T & = I & \quad V^T V = I
 \end{aligned}
 $$
 
-where
+and
 
 * $U$ is an $m \times m$ matrix whose columns are eigenvectors of $X X^T$
 
@@ -99,7 +103,7 @@ where
 
 * The $p$ singular values are positive square roots of the eigenvalues of the $m \times m$ matrix $X X^T$ and the $n \times n$ matrix $X^T X$
 
-* When $U$ is a complex valued matrix, $U^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $U$, meaning that
+* We adopt a convention that when $U$ is a complex valued matrix, $U^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $U$, meaning that
 $U_{ij}^T$ is the complex conjugate of $U_{ji}$.
 
 * Similarly, when $V$ is a complex valued matrix, $V^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $V$
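These facts are easy to check numerically. A minimal sketch, assuming numpy and a small random data matrix (all variable names are illustrative, not the lecture's own code):

```python
# A minimal numerical check of the facts stated above; the matrix X and
# all variable names here are illustrative stand-ins.
import numpy as np

np.random.seed(0)
m, n = 5, 3
X = np.random.randn(m, n)                 # generically full rank, so p = min(m, n)

U, S, Vt = np.linalg.svd(X)               # full SVD: U is (m, m), Vt is (n, n)
print(U.shape, S.shape, Vt.shape)         # (5, 5) (3,) (3, 3)

# The p singular values are positive square roots of the nonzero
# eigenvalues of X^T X (equivalently, of X X^T).
eigs = np.sort(np.linalg.eigvalsh(X.T @ X))[::-1]
print(np.allclose(S**2, eigs))            # True

# Columns of U are eigenvectors of X X^T; columns of V, of X^T X.
print(np.allclose(X @ X.T @ U[:, 0], S[0]**2 * U[:, 0]))         # True
print(np.allclose(X.T @ X @ Vt.T[:, 0], S[0]**2 * Vt.T[:, 0]))   # True
```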
@@ -108,35 +112,41 @@ What we have described above is called a **full** SVD.
 
 
 
-Here the shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
+In a **full** SVD, the shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
 
 Later we'll also describe an **economy** or **reduced** SVD.
 
-But first we'll say a little more about properties of a **full** SVD.
+But before we study a **reduced** SVD we'll say a little more about properties of a **full** SVD.
 
 
-## Relationship of Full SVD to Four Fundamental Subspaces
+## SVD and Four Fundamental Subspaces
 
 
 Let's start with a reminder about definitions of the four fundamental subspaces of an $m \times n$
 matrix $X$ of rank $p$.
 
-* The **column space** of $X$, denoted ${\mathcal C}(X)$, is the span of the columns of $X$, i.e., all vectors $y$ that can be written as a linear combination of columns of $X$. Its dimension is $p$.
+* The **column space** of $X$, denoted ${\mathcal C}(X)$, is the span of the columns of $X$, i.e., all vectors $y$ that can be written as linear combinations of columns of $X$. Its dimension is $p$.
 * The **null space** of $X$, denoted ${\mathcal N}(X)$, consists of all vectors $y$ that satisfy
 $X y = 0$. Its dimension is $n-p$.
 * The **row space** of $X$, denoted ${\mathcal R}(X)$, is the column space of $X^T$. It consists of all
-vectors $z$ that can be written as a linear combination of rows of $X$. Its dimension is $p$.
+vectors $z$ that can be written as linear combinations of rows of $X$. Its dimension is $p$.
 * The **left null space** of $X$, denoted ${\mathcal N}(X^T)$, consists of all vectors $z$ such that
 $X^T z = 0$. Its dimension is $m-p$.
 
-A full SVD of a matrix $X$ contains orthogonal bases for all four subspaces.
+A full SVD of a matrix $X$ contains orthogonal bases for all four subspaces.
+
+And the subspaces are connected in interesting ways, consisting of two pairs of orthogonal subspaces
+that we'll describe here.
+
+Let $u_i, i = 1, \ldots, m$ be the $m$ column vectors of $U$ and let
+$v_i, i = 1, \ldots, n$ be the $n$ column vectors of $V$.
 
 Let's write the full SVD of X as
 
 $$
 X = \begin{bmatrix} U_L & U_R \end{bmatrix} \begin{bmatrix} \Sigma_p & 0 \cr 0 & 0 \end{bmatrix}
 \begin{bmatrix} V_L & V_R \end{bmatrix}^T
-$$
+$$ (eq:fullSVDpartition)
 
 where
@@ -148,6 +158,8 @@ V_L & = \begin{bmatrix}v_1 & \cdots & v_p \end{bmatrix} , \quad U_R = \begin{b
 $$
 
+
+
 These matrices are related to the four fundamental subspaces of $X$ in the following ways:
 
 $$
@@ -169,6 +181,21 @@ Collection {eq}`eq:fourspaceSVD` asserts that
 
 * $V_L$ is an orthonormal basis for the row space of $X$
 * $U_R$ is an orthonormal basis for the null space of $X^T$
 
+The four claims in {eq}`eq:fourspaceSVD` can be verified by performing the multiplications called for by the right side of {eq}`eq:fullSVDpartition` and interpreting them.
+
+Although we won't go through the details of that verification here, we will note that the claims in {eq}`eq:fourspaceSVD` and the fact that $U$ and $V$ are both unitary (i.e., orthonormal) matrices immediately imply
+that
+
+* the column space of $X$ is orthogonal to the null space of $X^T$
+* the row space of $X$ is orthogonal to the null space of $X$
+
+Sometimes these properties are described with the following two pairs of orthogonal complement subspaces:
+
+* ${\mathcal C}(X)$ is the orthogonal complement of ${\mathcal N}(X^T)$
+* ${\mathcal R}(X)$ is the orthogonal complement of ${\mathcal N}(X)$
+
+
 ## Properties of Full and Reduced SVD's
 
 Up to now we have described properties of a **full** SVD in which shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
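The orthogonality claims above can likewise be confirmed numerically. A sketch, assuming numpy and a deliberately rank-deficient $X$ (all names are illustrative):

```python
# Check that the blocks of a full SVD span the four fundamental
# subspaces of a rank-p matrix X. Names and data are illustrative.
import numpy as np

np.random.seed(1)
m, n, p = 6, 4, 2
X = np.random.randn(m, p) @ np.random.randn(p, n)   # rank p by construction

U, S, Vt = np.linalg.svd(X)
V = Vt.T
U_L, U_R = U[:, :p], U[:, p:]       # bases for C(X) and N(X^T)
V_L, V_R = V[:, :p], V[:, p:]       # bases for R(X) and N(X)

print(np.allclose(X.T @ U_R, 0))    # columns of U_R lie in N(X^T)
print(np.allclose(X @ V_R, 0))      # columns of V_R lie in N(X)
print(np.allclose(U_L.T @ U_R, 0))  # C(X) is orthogonal to N(X^T)
print(np.allclose(V_L.T @ V_R, 0))  # R(X) is orthogonal to N(X)
```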
@@ -722,18 +749,9 @@ For an example PCA applied to analyzing the structure of intelligence tests see
 
 Look at parts of that lecture that describe and illustrate the classic factor analysis model.
 
-## Dynamic Mode Decomposition (DMD)
-
-
-
-We turn to the **tall and skinny** case associated with **Dynamic Mode Decomposition**, the case in which $ m >> n $.
-
-Here an $ m \times n $ data matrix $ \tilde X $ contains many more attributes $ m $ than individuals $ n $.
-
-Dynamic mode decomposition was introduced by {cite}`schmid2010`,
-
-You can read more about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
+
+## Vector Autoregressions
 
 
 We want to fit a **first-order vector autoregression**
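As a concrete preview, here is a hedged sketch of what fitting such a first-order VAR by least squares amounts to; the simulated data and every name are illustrative assumptions, not the lecture's code:

```python
# Fit a first-order VAR X_{t+1} ≈ A X_t by least squares on simulated
# data. A_true and all names here are ours, purely for illustration.
import numpy as np

np.random.seed(2)
m, T = 3, 50
A_true = 0.5 * np.eye(m)
X = np.zeros((m, T))
X[:, 0] = np.random.randn(m)
for t in range(T - 1):
    X[:, t + 1] = A_true @ X[:, t] + 0.1 * np.random.randn(m)

# Regress each X_{t+1} on X_t: A_hat minimizes the Frobenius norm of
# the one-step-ahead residuals.
A_hat = X[:, 1:] @ np.linalg.pinv(X[:, :-1])
print(np.round(A_hat, 2))    # close to A_true
```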
@@ -785,7 +803,9 @@ $$
 X' = \begin{bmatrix} X_2 \mid X_3 \mid \cdots \mid X_{n+1}\end{bmatrix}
 $$
 
-Here $ ' $ does not indicate matrix transposition but instead is part of the name of the matrix $ X' $.
+Here $ ' $ is part of the name of the matrix $ X' $ and does not indicate matrix transposition.
+
+We continue to use $\cdot^T$ to denote matrix transposition or its extension to complex matrices.
 
 In forming $ X $ and $ X' $, we have in each case dropped a column from $ \tilde X $, the last column in the case of $ X $, and the first column in the case of $ X' $.
@@ -956,15 +976,31 @@ $$
 \hat A = X' V \Sigma^{-1} U^T
 $$ (eq:AhatSVDformula)
 
-We'll eventually use **dynamic mode decomposition** to compute a rank $ r $ approximation to $ \hat A $,
-where $ r < p $.
+
+
+## Dynamic Mode Decomposition (DMD)
+
+
+
+We turn to the **tall and skinny** case associated with **Dynamic Mode Decomposition**, the case in which $ m >> n $.
+
+Here an $ m \times n $ data matrix $ \tilde X $ contains many more attributes $ m $ than individuals $ n $.
+
+Dynamic mode decomposition was introduced by {cite}`schmid2010`.
+
+You can read more about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
+
+The key idea underlying **Dynamic Mode Decomposition** (DMD) is to compute a rank $ r < p $ approximation to the least squares regression coefficients $ \hat A $ that we described above by formula {eq}`eq:AhatSVDformula`.
 
-**Remark:** In our Python code, we'll sometimes use a reduced SVD.
+We'll build up gradually to a formulation that is typically used in applications of DMD.
 
-Next, we describe alternative representations of our first-order linear dynamic system.
+We'll do this by describing three alternative representations of our first-order linear dynamic system, i.e.,
+our vector autoregression.
 
-**Guide to three representations:** In practice, we'll be interested in Representation 3. We present the first 2 in order to set the stage for some intermediate steps that might help us understand what is under the hood of Representation 3. In applications, we'll use only a small subset of the DMD to approximate dynamics. To do that, we'll want to use the reduced SVD's affiliated with representation 3, not the full SVD's affiliated with representations 1 and 2.
+**Guide to three representations:** In practice, we'll be interested in Representation 3. We present the first 2 in order to set the stage for some intermediate steps that might help us understand what is under the hood of Representation 3. In applications, we'll use only a small subset of the DMD to approximate dynamics. To do that, we'll want to use the **reduced** SVD's affiliated with representation 3, not the **full** SVD's affiliated with representations 1 and 2.
 
 +++
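A sketch that makes {eq}`eq:AhatSVDformula` and the rank-$r$ idea concrete; the names are ours, and the projected operator $\tilde A$ follows the reduced-SVD construction commonly used in the DMD literature (e.g., {cite}`DMD_book`):

```python
# Least-squares coefficients via a reduced SVD, plus a rank-r projected
# operator in the spirit of DMD. All names and data are illustrative.
import numpy as np

np.random.seed(3)
m, n, r = 20, 6, 2                        # tall and skinny: m >> n
X_tilde = np.random.randn(m, n + 1)
X, X_prime = X_tilde[:, :-1], X_tilde[:, 1:]

# A_hat = X' V Sigma^{-1} U^T, computed from a reduced SVD of X
U, S, Vt = np.linalg.svd(X, full_matrices=False)
A_hat = X_prime @ Vt.T @ np.diag(1 / S) @ U.T
print(np.allclose(A_hat, X_prime @ np.linalg.pinv(X)))   # True

# Rank-r approximation: keep only the r largest singular pairs and
# project onto the leading left singular subspace of X.
U_r, S_r, Vt_r = U[:, :r], S[:r], Vt[:r, :]
A_tilde = U_r.T @ X_prime @ Vt_r.T @ np.diag(1 / S_r)    # shape (r, r)
print(A_tilde.shape)
```

The payoff is size: $\tilde A$ is only $r \times r$, while $\hat A$ is $m \times m$.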
