lectures/svd_intro.md: 26 additions & 19 deletions
@@ -13,7 +13,7 @@ kernelspec:
# Singular Value Decomposition (SVD)

- In addition to regular packages contained in Anaconda by default, this notebook also requires:
+ In addition to regular packages contained in Anaconda by default, this lecture also requires:

```{code-cell} ipython3
:tags: [hide-output]
@@ -46,7 +46,7 @@ This lecture describes the singular value decomposition and two of its uses:
Let $X$ be an $m \times n$ matrix of rank $r$.
- In this notebook, we'll think of $X$ as a matrix of **data**.
+ In this lecture, we'll think of $X$ as a matrix of **data**.

* each column is an **individual** -- a time period or person, depending on the application
@@ -55,21 +55,24 @@ In this notebook, we'll think of $X$ as a matrix of **data**.
We'll be interested in two distinct cases

- * The **short and fat** case in which $m << n$, so that there are many more columns than rows.
+ * A **short and fat** case in which $m << n$, so that there are many more columns than rows.

- * The **tall and skinny** case in which $m >> n$, so that there are many more rows than columns.
+ * A **tall and skinny** case in which $m >> n$, so that there are many more rows than columns.

We'll apply a **singular value decomposition** of $X$ in both situations.
- In the first case in which there are many more observations $n$ than there are random variables $m$, we learn about the joint distribution of the random variables by taking averages across observations of functions of the observations. Here we'll look for **patterns** by using a **singular value decomosition** to do a **principal components analysis** (PCA).
+ In the first case in which there are many more observations $n$ than random variables $m$, we learn about the joint distribution of the random variables by taking averages across observations of functions of the observations.
+
+ Here we'll look for **patterns** by using a **singular value decomposition** to do a **principal components analysis** (PCA).

In the second case in which there are many more random variables $m$ than observations $n$, we'll proceed in a different way.
We'll again use a **singular value decomposition**, but now to do a **dynamic mode decomposition** (DMD).
## Singular Value Decomposition

- The **singular value decomposition** of an $m \times n$ matrix $X$ of rank $r \leq \min(m,n)$ is
+ A **singular value decomposition** of an $m \times n$ matrix $X$ of rank $r \leq \min(m,n)$ is

$$
X = U \Sigma V^T
@@ -93,15 +96,17 @@ where
* The $r$ singular values are the square roots of the nonzero eigenvalues of the $m \times m$ matrix $X X^T$ and of the $n \times n$ matrix $X^T X$
* When $U$ is a complex valued matrix, $U^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $U$, meaning that
- $U_{ij}^T$ is the complex conjugate of $U_{ji}$. Similarly, when $V$ is a complex valued matrix, $V^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $V$
+ $U_{ij}^T$ is the complex conjugate of $U_{ji}$.
+
+ * Similarly, when $V$ is a complex valued matrix, $V^T$ denotes the **conjugate-transpose** or **Hermitian-transpose** of $V$

The shapes of $U$, $\Sigma$, and $V$ are $\left(m, m\right)$, $\left(m, n\right)$, $\left(n, n\right)$, respectively.
Below, we shall assume these shapes.

- However, there is an alternative shape convention that we could have used, though we chose not to.
+ However, though we chose not to, there is an alternative shape convention that we could have used.

- Thus, note that because we assume that $A$ has rank $r$, there are only $r $ nonzero singular values, where $r=rank(A)\leq\min\left(m, n\right)$.
+ Thus, note that because we assume that $X$ has rank $r$, there are only $r$ nonzero singular values, where $r=\textrm{rank}(X)\leq\min\left(m, n\right)$.

Therefore, we could also write $U$, $\Sigma$, and $V$ as matrices with shapes $\left(m, r\right)$, $\left(r, r\right)$, $\left(r, n\right)$.
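
To make the two shape conventions concrete, here is a minimal NumPy sketch (an editorial illustration, not part of the diff); note that `np.linalg.svd` with `full_matrices=False` uses $p = \min(m, n)$, which equals $r$ only when $X$ has full rank:

```python
import numpy as np

# Editorial sketch: compare the full and reduced SVD shape conventions.
m, n = 5, 2
rng = np.random.default_rng(1)
X = rng.normal(size=(m, n))          # almost surely of full rank r = min(m, n)

U, s, Vt = np.linalg.svd(X, full_matrices=True)               # shapes (m, m), (min(m, n),), (n, n)
U_hat, s_hat, Vt_hat = np.linalg.svd(X, full_matrices=False)  # shapes (m, r), (r,), (r, n)

# Both conventions reconstruct X
Sigma = np.zeros((m, n))
np.fill_diagonal(Sigma, s)
assert np.allclose(X, U @ Sigma @ Vt)
assert np.allclose(X, U_hat @ np.diag(s_hat) @ Vt_hat)

print(U.shape, Sigma.shape, Vt.shape)           # (5, 5) (5, 2) (2, 2)
print(U_hat.shape, s_hat.shape, Vt_hat.shape)   # (5, 2) (2,) (2, 2)
```
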
@@ -124,11 +129,11 @@ Q & = U V^T
where $S$ is evidently a symmetric matrix and $Q$ is an orthogonal matrix.
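
The hunk header above shows $Q = U V^T$; the following sketch (editorial, and assuming the companion definition $S = U \Sigma U^T$ with a square $X$, which this excerpt does not show) checks that $S$ is symmetric, $Q$ is orthogonal, and $X = S Q$:

```python
import numpy as np

# Editorial sketch of a polar decomposition built from the SVD.
# Assumption not visible in this hunk: S = U Sigma U^T alongside Q = U V^T,
# for a square X, so that S Q = U Sigma V^T = X.
n = 4
rng = np.random.default_rng(2)
X = rng.normal(size=(n, n))

U, s, Vt = np.linalg.svd(X)
S = U @ np.diag(s) @ U.T      # symmetric (indeed positive semidefinite)
Q = U @ Vt                    # orthogonal

assert np.allclose(S, S.T)                # S is symmetric
assert np.allclose(Q @ Q.T, np.eye(n))    # Q is orthogonal
assert np.allclose(X, S @ Q)              # the product recovers X
```
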
- ## Principle Componenents Analysis (PCA)
+ ## Principal Components Analysis (PCA)

- Let's begin with the case in which $n >> m$, so that we have many more observations $n$ than random variables $m$.
+ Let's begin with a case in which $n >> m$, so that we have many more observations $n$ than random variables $m$.

- The data matrix $X$ is **short and fat** in the $n >> m$ case as opposed to a **tall and skinny** case with $m > > n $ to be discussed later in this notebook.
+ The data matrix $X$ is **short and fat** in an $n >> m$ case as opposed to a **tall and skinny** case with $m >> n$ to be discussed later in this lecture.

We regard $X$ as an $m \times n$ matrix of **data**:
@@ -140,7 +145,7 @@ where for $j = 1, \ldots, n$ the column vector $X_j = \begin{bmatrix}X_{1j}\\X_{
In a **time series** setting, we would think of columns $j$ as indexing different __times__ at which random variables are observed, while rows index different random variables.

- In a **cross section** setting, we would think of columns $j$ as indexing different __individuals__ for which random variables are observed, while rows index different random variables.
+ In a **cross section** setting, we would think of columns $j$ as indexing different __individuals__ for which random variables are observed, while rows index different **random variables**.

The number of singular values equals the rank of matrix $X$.
@@ -187,7 +192,7 @@ is a vector of loadings of variables $X_i$ on the $k$th principle component, $i
* $\sigma_k$ for each $k=1, \ldots, r$ is the strength of the $k$th **principal component**
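
A hedged sketch of the PCA just described (editorial, not part of the diff); the demeaning step and all variable names are assumptions rather than details taken from the lecture:

```python
import numpy as np

# Editorial sketch: PCA via the SVD for a short-and-fat data matrix X
# (m random variables in rows, n >> m observations in columns).
# Demeaning each row is an assumed preprocessing step, not taken from the lecture.
m, n = 3, 1000
rng = np.random.default_rng(3)
X = rng.normal(size=(m, n))

X_tilde = X - X.mean(axis=1, keepdims=True)    # center each variable (row)

U, sigma, Vt = np.linalg.svd(X_tilde, full_matrices=False)

# Column k of U holds the loadings of the m variables on the k-th principal
# component; sigma[k] measures that component's strength.
scores = np.diag(sigma) @ Vt                   # principal components across observations

explained_share = sigma**2 / np.sum(sigma**2)  # share of variance per component
print(explained_share)
```
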

- ## Digression: Reduced (or Economy) Versus Full SVD
0 commit comments