
Commit 52a9195

Tom's March 12 edit of var_dmd lecture
1 parent 3f8ac5e commit 52a9195


lectures/var_dmd.md

Lines changed: 41 additions & 52 deletions
@@ -82,7 +82,7 @@ Evidently, $ X $ and $ X' $ are both $ m \times n $ matrices.

We denote the rank of $ X $ as $ p \leq \min(m, n) $.

-Two possible cases are
+Two cases that interest us are

* $ n > > m$, so that we have many more time series observations $n$ than variables $m$
* $m > > n$, so that we have many more variables $m $ than time series observations $n$
@@ -121,9 +121,9 @@ $$
$$ (eq:Ahatform101)

-This formula for least-squares regression coefficients widely used in econometrics.
+This formula for least-squares regression coefficients is widely used in econometrics.

-For example, it is used to estimate vector autorgressions.
+It is used to estimate vector autoregressions.

The right side of formula {eq}`eq:Ahatform101` is proportional to the empirical cross second moment matrix of $X_{t+1}$ and $X_t$ times the inverse
of the second moment matrix of $X_t$.
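
As a purely illustrative check of formula {eq}`eq:Ahatform101`, the following sketch simulates a small three-variable VAR and recovers $\hat A$ as the cross second moment matrix of $X_{t+1}$ and $X_t$ times the inverse of the second moment matrix of $X_t$; the matrix `A_true`, the noise scale, and the sample size are arbitrary choices made only for this example.

```python
import numpy as np

# Illustrative sketch of formula {eq}`eq:Ahatform101` on simulated data.
# A_true, the noise scale, and the sample size are arbitrary choices.
rng = np.random.default_rng(0)
m, n = 3, 500                        # n >> m: many observations, few variables
A_true = np.array([[0.8, 0.1, 0.0],
                   [0.0, 0.7, 0.2],
                   [0.1, 0.0, 0.9]])
X_full = np.empty((m, n + 1))
X_full[:, 0] = rng.standard_normal(m)
for t in range(n):
    X_full[:, t + 1] = A_true @ X_full[:, t] + 0.1 * rng.standard_normal(m)

X, X_prime = X_full[:, :-1], X_full[:, 1:]           # snapshots X_t and X_{t+1}

# cross second moment of (X_{t+1}, X_t) times inverse second moment of X_t
A_hat = (X_prime @ X.T) @ np.linalg.inv(X @ X.T)
print(np.max(np.abs(A_hat - A_true)))                # small estimation error
```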
@@ -149,7 +149,7 @@ $$ (eq:hatAversion0)

Please compare formulas {eq}`eq:Ahatform101` and {eq}`eq:hatAversion0` for $\hat A$.

-Here we are interested in formula {eq}`eq:hatAversion0`.
+Here we are especially interested in formula {eq}`eq:hatAversion0`.

The $ i $th row of $ \hat A $ is an $ m \times 1 $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.

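To illustrate this row-by-row interpretation, here is a minimal sketch with random stand-in matrices `X` and `X_prime` (chosen with $n > m$ so the regressions are overdetermined): the $i$th row of $\hat A = X' X^+$ coincides with the least-squares coefficients from regressing the single series $X_{i,t+1}$ on the full vector $X_t$.

```python
import numpy as np

# Sketch: row i of A_hat collects the least-squares coefficients from
# regressing the scalar series X_{i,t+1} on the full vector X_t.
# X and X_prime are random m x n stand-ins used only for illustration.
rng = np.random.default_rng(8)
m, n = 4, 200
X = rng.standard_normal((m, n))
X_prime = rng.standard_normal((m, n))

A_hat = X_prime @ np.linalg.pinv(X)                        # m x m coefficient matrix

i = 2
row_i, *_ = np.linalg.lstsq(X.T, X_prime[i], rcond=None)   # regress X_{i,t+1} on X_t
print(np.allclose(A_hat[i], row_i))                        # True
```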
@@ -162,14 +162,20 @@ $$

so that the regression equation **fits perfectly**.

-This is the usual outcome in an **underdetermined least-squares** model.
+This is a typical outcome in an **underdetermined least-squares** model.

To reiterate, in the **tall-skinny** case (described in {doc}`Singular Value Decomposition <svd_intro>`) in which we have a number $n$ of observations that is small relative to the number $m$ of
attributes that appear in the vector $X_t$, we want to fit equation {eq}`eq:VARfirstorder`.

+We confront the facts that the least squares estimator is underdetermined and that the regression equation fits perfectly.

-To offer ideas about how we can efficiently calculate the pseudo-inverse $X^+$, as our estimator $\hat A$ of $A$ we form an $m \times m$ matrix that solves the least-squares best-fit problem
+To proceed, we'll want to calculate the pseudo-inverse $X^+$ efficiently.
+
+The pseudo-inverse $X^+$ will be a component of our estimator of $A$.
+
+As our estimator $\hat A$ of $A$, we want to form an $m \times m$ matrix that solves the least-squares best-fit problem

$$
\hat A = \textrm{argmin}_{\check A} || X' - \check A X ||_F
@@ -195,9 +201,12 @@ where the (possibly huge) $ n \times m $ matrix $ X^{+} = (X^\top X)^{-1} X^\to

-For some situations that we are interested in, $X^\top X $ can be close to singular, a situation that can make some numerical algorithms be error-prone.
+For some situations that we are interested in, $X^\top X $ can be close to singular, a situation that makes some numerical algorithms inaccurate.

-To acknowledge that possibility, we'll use efficient algorithms for computing and for constructing reduced rank approximations of $\hat A$ in formula {eq}`eq:hatAversion0`.
+To acknowledge that possibility, we'll use efficient algorithms to construct
+a **reduced-rank approximation** of $\hat A$ in formula {eq}`eq:hatAversion0`.
+
+Such an approximation to our vector autoregression will no longer fit perfectly.

The $ i $th row of $ \hat A $ is an $ m \times 1 $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.
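
A minimal sketch of such a reduced-rank approximation, with random $m \gg n$ stand-ins for the snapshot matrices `X` and `X_prime` and an arbitrary truncation level `r`: we truncate a reduced SVD of $X$ at rank $r$ and form the corresponding low-rank counterpart of $\hat A = X' X^+$.

```python
import numpy as np

# Sketch of a reduced-rank approximation to A_hat = X' X^+ via a truncated SVD.
# X and X_prime are random m x n stand-ins; r is an arbitrary truncation level.
rng = np.random.default_rng(1)
m, n, r = 30, 10, 3                  # m >> n: many variables, few observations
X = rng.standard_normal((m, n))
X_prime = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)      # reduced SVD of X
U_r, s_r, V_r = U[:, :r], s[:r], Vt[:r, :].T          # keep the r largest singular values

A_hat = X_prime @ np.linalg.pinv(X)                   # full-rank least-squares estimate
A_hat_r = X_prime @ V_r @ np.diag(1.0 / s_r) @ U_r.T  # rank-r counterpart

print(np.linalg.matrix_rank(A_hat), np.linalg.matrix_rank(A_hat_r))   # p versus r
```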
@@ -262,20 +271,21 @@ Dynamic mode decomposition was introduced by {cite}`schmid2010`,

You can read about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).

-**Dynamic Mode Decomposition** (DMD) computes a rank $ r < p $ approximation to the least square regression coefficients $ \hat A $ that we described above by formula {eq}`eq:AhatSVDformula`.
+**Dynamic Mode Decomposition** (DMD) computes a rank $ r < p $ approximation to the least squares regression coefficients $ \hat A $ described by formula {eq}`eq:AhatSVDformula`.

We'll build up gradually to a formulation that is useful in applications.

-We'll do this by describing three alternative representations of our first-order linear dynamic system, i.e.,
-our vector autoregression.
+We'll do this by describing three alternative representations of our first-order linear dynamic system, i.e., our vector autoregression.
+
+**Guide to three representations:** In practice, we'll mainly be interested in Representation 3.

-**Guide to three representations:** In practice, we'll be interested in Representation 3.
+We use the first two representations to present some useful intermediate steps that help us to appreciate what is under the hood of Representation 3.

-We present the first 2 in order to set the stage for some intermediate steps that might help us understand what is under the hood of Representation 3.
+In applications, we'll use only a small subset of **DMD modes** to approximate dynamics.

-In applications, we'll use only a small subset of the DMD to approximate dynamics.
+We use such a small subset of DMD modes to construct a reduced-rank approximation to $A$.

To do that, we'll want to use the **reduced** SVD's affiliated with representation 3, not the **full** SVD's affiliated with representations 1 and 2.

@@ -337,9 +347,7 @@ $$
\tilde b_{t+1} = \tilde A \tilde b_t
$$

-To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders
-(i.e., rotators) to both sides of this
-equation and deduce
+To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders (i.e., rotators) to both sides of this equation and deduce

$$
\overline X_{t+1} = U \tilde A^t U^\top X_1
@@ -363,15 +371,15 @@ As with Representation 1, we continue to

-As we observed and illustrated earlier in this lecture
+As we observed and illustrated in a lecture about the {doc}`Singular Value Decomposition <svd_intro>`

* (a) for a full SVD $U U^\top = I_{m \times m} $ and $U^\top U = I_{p \times p}$ are both identity matrices

* (b) for a reduced SVD of $X$, $U U^\top $ is not an identity matrix.

As we shall see later, a full SVD is too confining for what we ultimately want to do, namely, cope with situations in which $U U^\top$ is **not** an identity matrix because we use a reduced SVD of $X$.

-But for now, let's proceed under the assumption that we are using a full SVD so that both of the preceding two requirements (a) and (b) are satisfied.
+But for now, let's proceed under the assumption that we are using a full SVD so that requirements (a) and (b) are both satisfied.

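Here is a quick numerical check of these identity properties; the shape of `X` is an arbitrary tall-skinny choice used only for illustration.

```python
import numpy as np

# Identity properties of U for full versus reduced SVDs of a tall-skinny X.
rng = np.random.default_rng(4)
m, n = 8, 4                                            # tall-skinny X with rank p = 4
X = rng.standard_normal((m, n))

U_full, _, _ = np.linalg.svd(X)                        # full SVD: U_full is m x m
U_red, _, _ = np.linalg.svd(X, full_matrices=False)    # reduced SVD: U_red is m x p

print(np.allclose(U_full @ U_full.T, np.eye(m)))       # True:  U U^T = I for a full SVD
print(np.allclose(U_full.T @ U_full, np.eye(m)))       # True:  U^T U = I for a full SVD
print(np.allclose(U_red.T @ U_red, np.eye(n)))         # True:  reduced U has orthonormal columns
print(np.allclose(U_red @ U_red.T, np.eye(m)))         # False: U U^T is not I for a reduced SVD
```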
@@ -391,8 +399,7 @@ $$
\hat A = U \tilde A U^\top = U W \Lambda W^{-1} U^\top
$$ (eq:eqeigAhat)

-According to equation {eq}`eq:eqeigAhat`, the diagonal matrix $\Lambda$ contains eigenvalues of
-$\hat A$ and corresponding eigenvectors of $\hat A$ are columns of the matrix $UW$.
+According to equation {eq}`eq:eqeigAhat`, the diagonal matrix $\Lambda$ contains eigenvalues of $\hat A$, and corresponding eigenvectors of $\hat A$ are columns of the matrix $UW$.

It follows that the systematic (i.e., not random) parts of the $X_t$ dynamics captured by our first-order vector autoregressions are described by

@@ -467,16 +474,9 @@ $$

is a matrix of regression coefficients of the $m \times n$ matrix $X$ on the $m \times p$ matrix $\Phi_s$.

-We'll say more about this interpretation in a related context when we discuss representation 3.
-We turn next to an alternative representation suggested by Tu et al. {cite}`tu_Rowley`.
+We'll say more about this interpretation in a related context when we discuss representation 3, which was suggested by Tu et al. {cite}`tu_Rowley`.

-It is more appropriate to use this alternative representation when, as is typically the case in practice, we use a reduced SVD.
+It is more appropriate to use representation 3 when, as is often the case in practice, we want to use a reduced SVD.

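A short sketch of this regression interpretation, with a random full-column-rank matrix `Phi_s` standing in for $\Phi_s$ and a random `X`: the matrix $(\Phi_s^\top \Phi_s)^{-1} \Phi_s^\top X$ reproduces the coefficients that column-by-column least squares computes.

```python
import numpy as np

# (Phi_s^T Phi_s)^{-1} Phi_s^T X equals the least-squares coefficients of X on Phi_s.
# Phi_s and X are random stand-ins used only for illustration.
rng = np.random.default_rng(5)
m, n, p = 12, 7, 4
Phi_s = rng.standard_normal((m, p))
X = rng.standard_normal((m, n))

coeffs_formula = np.linalg.inv(Phi_s.T @ Phi_s) @ Phi_s.T @ X   # p x n matrix
coeffs_lstsq, *_ = np.linalg.lstsq(Phi_s, X, rcond=None)        # column-by-column least squares
print(np.allclose(coeffs_formula, coeffs_lstsq))                # True
```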
@@ -523,8 +523,7 @@ $$ (eq:tildeAverify)

-Next, we'll just compute the regression coefficients in a projection of $\hat A$ on $\tilde U$ using the
-standard least-square formula
+Next, we'll just compute the regression coefficients in a projection of $\hat A$ on $\tilde U$ using a standard least-squares formula

$$
(\tilde U^\top \tilde U)^{-1} \tilde U^\top \hat A = (\tilde U^\top \tilde U)^{-1} \tilde U^\top X' \tilde V \tilde \Sigma^{-1} \tilde U^\top =
@@ -534,7 +533,7 @@ $$

-Note that because we are now working with a reduced SVD, $\tilde U \tilde U^\top \neq I$.
+Note that because we are using a reduced SVD, $\tilde U \tilde U^\top \neq I$.

Consequently,

@@ -585,8 +584,7 @@ $$ (eq:Phiformula)

It turns out that columns of $\Phi$ **are** eigenvectors of $\hat A$.

-This is
-a consequence of a result established by Tu et al. {cite}`tu_Rowley`, which we now present.
+This is a consequence of a result established by Tu et al. {cite}`tu_Rowley`, which we now present.

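Before turning to the argument, here is a numerical sketch of the claim, with random snapshot matrices `X` and `X_prime` used only for illustration: building $\Phi = X' \tilde V \tilde \Sigma^{-1} W$ from a reduced SVD of $X$ and an eigendecomposition $\tilde A W = W \Lambda$ yields columns that are eigenvectors of $\hat A$.

```python
import numpy as np

# Numerical sketch of the Tu et al. result: columns of Phi = X' V Sigma^{-1} W
# are eigenvectors of A_hat = X' V Sigma^{-1} U^T.  X, X_prime are random stand-ins.
rng = np.random.default_rng(3)
m, n = 20, 8
X = rng.standard_normal((m, n))
X_prime = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)       # reduced SVD of X
V, Sig_inv = Vt.T, np.diag(1.0 / s)

A_tilde = U.T @ X_prime @ V @ Sig_inv                  # p x p matrix tilde A
Lam, W = np.linalg.eig(A_tilde)                        # tilde A W = W Lambda

Phi = X_prime @ V @ Sig_inv @ W                        # DMD modes
A_hat = X_prime @ V @ Sig_inv @ U.T                    # least-squares estimate of A

print(np.allclose(A_hat @ Phi, Phi @ np.diag(Lam)))    # True: A_hat Phi = Phi Lambda
```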
@@ -693,15 +691,13 @@ $$
X = \Phi \check b + \epsilon
$$ (eq:Xbcheck)

-where $\epsilon$ is an $m \times n$ matrix of least squares errors satisfying the least squares
-orthogonality conditions $\epsilon^\top \Phi =0 $ or
+where $\epsilon$ is an $m \times n$ matrix of least squares errors satisfying the least squares orthogonality conditions $\epsilon^\top \Phi =0 $ or

$$
(X - \Phi \check b)^\top \Phi = 0_{n \times p}
$$ (eq:orthls)

-Rearranging the orthogonality conditions {eq}`eq:orthls` gives $X^\top \Phi = \check b \Phi^\top \Phi$,
-which implies formula {eq}`eq:checkbform`.
+Rearranging the orthogonality conditions {eq}`eq:orthls` gives $X^\top \Phi = \check b^\top \Phi^\top \Phi$, which implies formula {eq}`eq:checkbform`.

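For concreteness, a small sketch with a random full-column-rank stand-in for $\Phi$: the least-squares amplitudes $\check b = \Phi^+ X$ produce residuals that satisfy the orthogonality conditions {eq}`eq:orthls`.

```python
import numpy as np

# Least-squares amplitudes check_b solving X ≈ Phi check_b, and a check of the
# orthogonality condition (X - Phi check_b)^T Phi = 0.
# Phi and X here are random stand-ins used only for illustration.
rng = np.random.default_rng(6)
m, n, p = 10, 6, 3
Phi = rng.standard_normal((m, p))
X = rng.standard_normal((m, n))

check_b = np.linalg.pinv(Phi) @ X                      # p x n matrix of amplitudes
eps = X - Phi @ check_b                                # least-squares residuals
print(np.allclose(eps.T @ Phi, np.zeros((n, p))))      # True: orthogonality conditions hold
```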
@@ -711,11 +707,9 @@ which implies formula {eq}`eq:checkbform`.

-There is a useful way to approximate the $p \times 1$ vector $\check b_t$ instead of using formula
-{eq}`eq:decoder102`.
+There is a useful way to approximate the $p \times 1$ vector $\check b_t$ instead of using formula {eq}`eq:decoder102`.

-In particular, the following argument adapted from {cite}`DDSE_book` (page 240) provides a computationally efficient way
-to approximate $\check b_t$.
+In particular, the following argument adapted from {cite}`DDSE_book` (page 240) provides a computationally efficient way to approximate $\check b_t$.

For convenience, we'll do this first for time $t=1$.

@@ -747,8 +741,7 @@ $$
$$

-Replacing the error term $U^\top \epsilon_1$ by zero, and replacing $U$ from a full SVD of $X$ with
-$\tilde U$ from a reduced SVD, we obtain an approximation $\hat b_1$ to $\tilde b_1$:
+Replacing the error term $U^\top \epsilon_1$ by zero, and replacing $U$ from a full SVD of $X$ with $\tilde U$ from a reduced SVD, we obtain an approximation $\hat b_1$ to $\tilde b_1$:

@@ -794,8 +787,7 @@ $$ (eq:bphieqn)

(To highlight that {eq}`eq:beqnsmall` is an approximation, users of DMD sometimes call components of the basis vector $\check b_t = \Phi^+ X_t $ the **exact** DMD modes.)

-Conditional on $X_t$, we can compute our decoded $\check X_{t+j}, j = 1, 2, \ldots $ from
-either
+Conditional on $X_t$, we can compute our decoded $\check X_{t+j}, j = 1, 2, \ldots $ from either

$$
\check X_{t+j} = \Phi \Lambda^j \Phi^{+} X_t
@@ -816,15 +808,12 @@ We can then use $\check X_{t+j}$ or $\hat X_{t+j}$ to forecast $X_{t+j}$.

In applications, we'll actually use only a few modes, often three or less.

-Some of the preceding formulas assume that we have retained all $p$ modes associated with the positive
-singular values of $X$.
+Some of the preceding formulas assume that we have retained all $p$ modes associated with singular values of $X$.

We can adjust our formulas to describe a situation in which we instead retain only
the $r < p$ largest singular values.

-In that case, we simply replace $\tilde \Sigma$ with the appropriate $r\times r$ matrix of singular values,
-$\tilde U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values,
-and $\tilde V$ with the $n \times r$ matrix whose columns correspond to the $r$ largest singular values.
+In that case, we simply replace $\tilde \Sigma$ with the appropriate $r\times r$ matrix of singular values, $\tilde U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values, and $\tilde V$ with the $n \times r$ matrix whose columns correspond to the $r$ largest singular values.

Counterparts of all of the salient formulas above then apply.

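Putting the pieces together, here is a sketch of those adjustments end to end: simulate snapshots, retain only the $r$ largest singular values, form $\tilde A$, $\Lambda$, $W$, and $\Phi$, and forecast with $\check X_{t+j} = \Phi \Lambda^j \Phi^{+} X_t$. The dimensions, noise scale, and choice of $r$ are arbitrary and used only for illustration.

```python
import numpy as np

# End-to-end sketch: rank-r DMD built from the r largest singular values of X,
# then a j-step forecast check X_{t+j} = Phi Lambda^j Phi^+ X_t.
rng = np.random.default_rng(7)
m, n, r = 25, 12, 3
A_true = 0.9 * np.eye(m) + 0.05 * rng.standard_normal((m, m)) / np.sqrt(m)
X_full = np.empty((m, n + 1))
X_full[:, 0] = rng.standard_normal(m)
for t in range(n):
    X_full[:, t + 1] = A_true @ X_full[:, t] + 0.01 * rng.standard_normal(m)
X, X_prime = X_full[:, :-1], X_full[:, 1:]

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U_r, s_r, V_r = U[:, :r], s[:r], Vt[:r, :].T          # keep the r largest singular values

A_tilde = U_r.T @ X_prime @ V_r @ np.diag(1.0 / s_r)  # r x r matrix tilde A
Lam, W = np.linalg.eig(A_tilde)
Phi = X_prime @ V_r @ np.diag(1.0 / s_r) @ W          # m x r matrix of DMD modes

j = 4
X_check = Phi @ np.diag(Lam**j) @ np.linalg.pinv(Phi) @ X[:, -1]   # forecast of X_{n+j}
print(X_check.real[:5])                                # first few forecasted variables
```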
