lectures/var_dmd.md: 41 additions & 52 deletions
@@ -82,7 +82,7 @@ Evidently, $ X $ and $ X' $ are both $ m \times n $ matrices.
We denote the rank of $ X $ as $ p \leq \min(m, n) $.
- Two possible cases are
+ Two cases that interest us are
* $ n > > m$, so that we have many more time series observations $n$ than variables $m$
* $m > > n$, so that we have many more variables $m $ than time series observations $n$
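As a concrete illustration (a hedged sketch; the variable names, the simulated VAR, and the toy dimensions are ours, not the lecture's), $X$ and $X'$ can be built by lagging the columns of an $m \times (n+1)$ data matrix:

```python
import numpy as np

# A hypothetical simulated panel: m variables observed at n + 1 dates
m, n = 5, 200                           # here n >> m; swap the sizes for the tall-skinny case
rng = np.random.default_rng(0)
A_true = 0.9 * np.eye(m) + 0.02 * rng.standard_normal((m, m))
data = np.empty((m, n + 1))
data[:, 0] = rng.standard_normal(m)
for t in range(n):                      # simulate X_{t+1} = A X_t + noise
    data[:, t + 1] = A_true @ data[:, t] + 0.1 * rng.standard_normal(m)

X, X_prime = data[:, :-1], data[:, 1:]  # both are m x n
print(X.shape, X_prime.shape)
```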
@@ -121,9 +121,9 @@ $$
$$ (eq:Ahatform101)
- This formula for least-squares regression coefficients widely used in econometrics.
+ This formula for least-squares regression coefficients is widely used in econometrics.
- For example, it is used to estimate vector autorgressions.
+ It is used to estimate vector autoregressions.
The right side of formula {eq}`eq:Ahatform101` is proportional to the empirical cross second moment matrix of $X_{t+1}$ and $X_t$ times the inverse of the second moment matrix of $X_t$.
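For readers who like to see the algebra in code, here is a minimal sketch, assuming {eq}`eq:Ahatform101` is the familiar moment-based formula $\hat A = X' X^\top (X X^\top)^{-1}$ and reusing the toy `X`, `X_prime` built above:

```python
# hat A = (cross second moment of X_{t+1}, X_t) times (inverse second moment of X_t);
# the 1/n scaling cancels between the two factors, so we can omit it
A_hat = (X_prime @ X.T) @ np.linalg.inv(X @ X.T)

# Numerically safer equivalent: solve the normal equations rather than inverting
A_hat_solve = np.linalg.solve((X @ X.T).T, (X_prime @ X.T).T).T
print(np.allclose(A_hat, A_hat_solve))
```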
@@ -149,7 +149,7 @@ $$ (eq:hatAversion0)
Please compare formulas {eq}`eq:Ahatform101` and {eq}`eq:hatAversion0` for $\hat A$.
- Here we are interested in formula {eq}`eq:hatAversion0`.
+ Here we are especially interested in formula {eq}`eq:hatAversion0`.
The $ i $th row of $ \hat A $ is an $ m \times 1 $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.
@@ -162,14 +162,20 @@ $$
so that the regression equation **fits perfectly**.
- This is the usual outcome in an **underdetermined least-squares** model.
+ This is a typical outcome in an **underdetermined least-squares** model.
To reiterate, in the **tall-skinny** case (described in {doc}`Singular Value Decomposition <svd_intro>`) in which we have a number $n$ of observations that is small relative to the number $m$ of attributes that appear in the vector $X_t$, we want to fit equation {eq}`eq:VARfirstorder`.
+ We confront the facts that the least squares estimator is underdetermined and that the regression equation fits perfectly.
- To offer ideas about how we can efficiently calculate the pseudo-inverse $X^+$, as our estimator $\hat A$ of $A$ we form an $m \times m$ matrix that solves the least-squares best-fit problem
+ To proceed, we'll want to calculate the pseudo-inverse $X^+$ efficiently.
+ The pseudo-inverse $X^+$ will be a component of our estimator of $A$.
+ As our estimator $\hat A$ of $A$ we want to form an $m \times m$ matrix that solves the least-squares best-fit problem
$$
\hat A = \textrm{argmin}_{\check A} || X' - \check A X ||_F
@@ -195,9 +201,12 @@ where the (possibly huge) $ n \times m $ matrix $ X^{+} = (X^\top X)^{-1} X^\to
- For some situations that we are interested in, $X^\top X $ can be close to singular, a situation that can make some numerical algorithms be error-prone.
+ For some situations that we are interested in, $X^\top X $ can be close to singular, a situation that makes some numerical algorithms inaccurate.
- To acknowledge that possibility, we'll use efficient algorithms for computing and for constructing reduced rank approximations of $\hat A$ in formula {eq}`eq:hatAversion0`.
+ To acknowledge that possibility, we'll use efficient algorithms to construct a **reduced-rank approximation** of $\hat A$ in formula {eq}`eq:hatAversion0`.
+ Such an approximation to our vector autoregression will no longer fit perfectly.
The $ i $th row of $ \hat A $ is an $ m \times 1 $ vector of regression coefficients of $ X_{i,t+1} $ on $ X_{j,t}, j = 1, \ldots, m $.
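A minimal sketch of the pseudo-inverse route, under the assumption that {eq}`eq:hatAversion0` is $\hat A = X' X^{+}$, with $X^{+}$ the pseudo-inverse described just above: `numpy` computes $X^{+}$ through an SVD, and truncating that SVD yields a reduced-rank version of $\hat A$.

```python
# hat A = X' X^+, with X^+ the Moore-Penrose pseudo-inverse of X
A_hat_pinv = X_prime @ np.linalg.pinv(X)

# Reduced-rank variant: keep only the r largest singular values of X
r = 3                                       # illustrative choice
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_pinv_r = Vt[:r].T @ np.diag(1 / S[:r]) @ U[:, :r].T
A_hat_r = X_prime @ X_pinv_r                # rank-r approximation of hat A
print(np.linalg.matrix_rank(A_hat_r))       # at most r
```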
@@ -262,20 +271,21 @@ Dynamic mode decomposition was introduced by {cite}`schmid2010`,
You can read about Dynamic Mode Decomposition here {cite}`DMD_book` and here [[BK19](https://python.quantecon.org/zreferences.html#id25)] (section 7.2).
- **Dynamic Mode Decomposition** (DMD) computes a rank $ r < p $ approximation to the least square regression coefficients $ \hat A $ that we described above by formula {eq}`eq:AhatSVDformula`.
+ **Dynamic Mode Decomposition** (DMD) computes a rank $ r < p $ approximation to the least squares regression coefficients $ \hat A $ described by formula {eq}`eq:AhatSVDformula`.
We'll build up gradually to a formulation that is useful in applications.
- We'll do this by describing three alternative representations of our first-order linear dynamic system, i.e.,
- our vector autoregression.
+ We'll do this by describing three alternative representations of our first-order linear dynamic system, i.e., our vector autoregression.
+ **Guide to three representations:** In practice, we'll mainly be interested in Representation 3.
- **Guide to three representations:** In practice, we'll be interested in Representation 3.
+ We use the first two representations to present some useful intermediate steps that help us to appreciate what is under the hood of Representation 3.
- We present the first 2 in order to set the stage for some intermediate steps that might help us understand what is under the hood of Representation 3.
+ In applications, we'll use only a small subset of **DMD modes** to approximate dynamics.
- In applications, we'll use only a small subset of the DMD to approximate dynamics.
+ We use such a small subset of DMD modes to construct a reduced-rank approximation to $A$.
To do that, we'll want to use the **reduced** SVD's affiliated with representation 3, not the **full** SVD's affiliated with representations 1 and 2.
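In `numpy`, the full and reduced SVDs differ only in the `full_matrices` flag; a quick sketch of the shapes involved, reusing the toy `X` from above:

```python
# Full SVD: U is m x m and V^T is n x n; the reduced SVD keeps only min(m, n) columns/rows
U_full, S_full, Vt_full = np.linalg.svd(X, full_matrices=True)
U_red, S_red, Vt_red = np.linalg.svd(X, full_matrices=False)
print(U_full.shape, Vt_full.shape)   # (m, m), (n, n)
print(U_red.shape, Vt_red.shape)     # (m, min(m, n)), (min(m, n), n)
```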
@@ -337,9 +347,7 @@ $$
\tilde b_{t+1} = \tilde A \tilde b_t
$$
- To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders
- (i.e., rotators) to both sides of this
- equation and deduce
+ To construct forecasts $\overline X_t$ of future values of $X_t$ conditional on $X_1$, we can apply decoders (i.e., rotators) to both sides of this equation and deduce
$$
\overline X_{t+1} = U \tilde A^t U^\top X_1
@@ -363,15 +371,15 @@ As with Representation 1, we continue to
- As we observed and illustrated earlier in this lecture
+ As we observed and illustrated in a lecture about the {doc}`Singular Value Decomposition <svd_intro>`
* (a) for a full SVD $U U^\top = I_{m \times m} $ and $U^\top U = I_{p \times p}$ are both identity matrices
* (b) for a reduced SVD of $X$, $U^\top U $ is not an identity matrix.
As we shall see later, a full SVD is too confining for what we ultimately want to do, namely, cope with situations in which $U^\top U$ is **not** an identity matrix because we use a reduced SVD of $X$.
- But for now, let's proceed under the assumption that we are using a full SVD so that both of the preceding two requirements (a) and (b) are satisfied.
+ But for now, let's proceed under the assumption that we are using a full SVD so that requirements (a) and (b) are both satisfied.
@@ -391,8 +399,7 @@ $$
\hat A = U \tilde A U^\top = U W \Lambda W^{-1} U^\top
$$ (eq:eqeigAhat)
- According to equation {eq}`eq:eqeigAhat`, the diagonal matrix $\Lambda$ contains eigenvalues of
- $\hat A$ and corresponding eigenvectors of $\hat A$ are columns of the matrix $UW$.
+ According to equation {eq}`eq:eqeigAhat`, the diagonal matrix $\Lambda$ contains eigenvalues of $\hat A$ and corresponding eigenvectors of $\hat A$ are columns of the matrix $UW$.
It follows that the systematic (i.e., not random) parts of the $X_t$ dynamics captured by our first-order vector autoregressions are described by
@@ -467,16 +474,9 @@ $$
is a matrix of regression coefficients of the $m \times n$ matrix $X$ on the $m \times p$ matrix $\Phi_s$.
- We'll say more about this interpretation in a related context when we discuss representation 3.
- We turn next to an alternative representation suggested by Tu et al. {cite}`tu_Rowley`.
+ We'll say more about this interpretation in a related context when we discuss representation 3, which was suggested by Tu et al. {cite}`tu_Rowley`.
- It is more appropriate to use this alternative representation when, as is typically the case in practice, we use a reduced SVD.
+ It is more appropriate to use representation 3 when, as is often the case in practice, we want to use a reduced SVD.
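As a preview of representation 3, here is a hedged sketch (not the lecture's code) that assumes the standard DMD construction $\tilde A = \tilde U^\top \hat A \tilde U = \tilde U^\top X' \tilde V \tilde \Sigma^{-1}$ for a reduced SVD $X = \tilde U \tilde \Sigma \tilde V^\top$, again reusing the toy `X`, `X_prime`:

```python
# Reduced SVD of X and the projected transition matrix tilde A
U_t, S_t, Vt_t = np.linalg.svd(X, full_matrices=False)    # tilde U, tilde Sigma, tilde V^T
A_tilde = U_t.T @ X_prime @ Vt_t.T @ np.diag(1 / S_t)

# tilde A equals tilde U^T (hat A) tilde U when hat A = X' X^+
print(np.allclose(A_tilde, U_t.T @ (X_prime @ np.linalg.pinv(X)) @ U_t))
```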
@@ -523,8 +523,7 @@ $$ (eq:tildeAverify)
- Next, we'll just compute the regression coefficients in a projection of $\hat A$ on $\tilde U$ using the
- standard least-square formula
+ Next, we'll just compute the regression coefficients in a projection of $\hat A$ on $\tilde U$ using a standard least-squares formula
- Note that because we are now working with a reduced SVD, $\tilde U \tilde U^\top \neq I$.
+ Note that because we are using a reduced SVD, $\tilde U \tilde U^\top \neq I$.
Consequently,
@@ -585,8 +584,7 @@ $$ (eq:Phiformula)
It turns out that columns of $\Phi$ **are** eigenvectors of $\hat A$.
- This is
- a consequence of a result established by Tu et al. {cite}`tu_Rowley`, which we now present.
+ This is a consequence of a result established by Tu et al. {cite}`tu_Rowley`, which we now present.
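A numerical check of this claim, under the assumption that {eq}`eq:Phiformula` is the usual $\Phi = X' \tilde V \tilde \Sigma^{-1} W$, where $W$ and $\Lambda$ come from the eigendecomposition $\tilde A = W \Lambda W^{-1}$ computed in the sketch above:

```python
# Eigendecomposition of the small matrix tilde A, then the DMD modes Phi
eigvals, W = np.linalg.eig(A_tilde)                 # Lambda (as a vector) and W
Phi = X_prime @ Vt_t.T @ np.diag(1 / S_t) @ W       # m x p matrix of DMD modes

# Columns of Phi should be eigenvectors of hat A, with eigenvalues in Lambda
A_hat = X_prime @ np.linalg.pinv(X)
print(np.allclose(A_hat @ Phi, Phi * eigvals))      # hat A Phi = Phi Lambda
```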
@@ -693,15 +691,13 @@ $$
X = \Phi \check b + \epsilon
$$ (eq:Xbcheck)
- where $\epsilon$ is an $m \times n$ matrix of least squares errors satisfying the least squares
- orthogonality conditions $\epsilon^\top \Phi =0 $ or
+ where $\epsilon$ is an $m \times n$ matrix of least squares errors satisfying the least squares orthogonality conditions $\epsilon^\top \Phi =0 $ or
$$
(X - \Phi \check b)^\top \Phi = 0_{n \times p}
$$ (eq:orthls)
- Rearranging the orthogonality conditions {eq}`eq:orthls` gives $X^\top \Phi = \check b \Phi^\top \Phi$,
- which implies formula {eq}`eq:checkbform`.
+ Rearranging the orthogonality conditions {eq}`eq:orthls` gives $X^\top \Phi = \check b^\top \Phi^\top \Phi$, which implies formula {eq}`eq:checkbform`.
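In code, the implied regression is just a pseudo-inverse, assuming {eq}`eq:checkbform` is $\check b = (\Phi^\top \Phi)^{-1} \Phi^\top X = \Phi^{+} X$; a sketch continuing the example above:

```python
# Regression of X on Phi: check b = Phi^+ X; residuals are orthogonal to the columns of Phi
b_check = np.linalg.pinv(Phi) @ X           # p x n matrix of regression coefficients
eps = X - Phi @ b_check                     # m x n matrix of least squares errors
# conjugate transpose, since Phi is generally complex
print(np.allclose(Phi.conj().T @ eps, 0, atol=1e-8))
```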
@@ -711,11 +707,9 @@ which implies formula {eq}`eq:checkbform`.
- There is a useful way to approximate the $p \times 1$ vector $\check b_t$ instead of using formula
- {eq}`eq:decoder102`.
+ There is a useful way to approximate the $p \times 1$ vector $\check b_t$ instead of using formula {eq}`eq:decoder102`.
- In particular, the following argument adapted from {cite}`DDSE_book` (page 240) provides a computationally efficient way
- to approximate $\check b_t$.
+ In particular, the following argument adapted from {cite}`DDSE_book` (page 240) provides a computationally efficient way to approximate $\check b_t$.
For convenience, we'll do this first for time $t=1$.
@@ -747,8 +741,7 @@ $$
$$
- Replacing the error term $U^\top \epsilon_1$ by zero, and replacing $U$ from a full SVD of $X$ with
- $\tilde U$ from a reduced SVD, we obtain an approximation $\hat b_1$ to $\tilde b_1$:
+ Replacing the error term $U^\top \epsilon_1$ by zero, and replacing $U$ from a full SVD of $X$ with $\tilde U$ from a reduced SVD, we obtain an approximation $\hat b_1$ to $\tilde b_1$:
@@ -794,8 +787,7 @@ $$ (eq:bphieqn)
(To highlight that {eq}`eq:beqnsmall` is an approximation, users of DMD sometimes call components of the basis vector $\check b_t = \Phi^+ X_t $ the **exact** DMD modes.)
- Conditional on $X_t$, we can compute our decoded $\check X_{t+j}, j = 1, 2, \ldots $ from
- either
+ Conditional on $X_t$, we can compute our decoded $\check X_{t+j}, j = 1, 2, \ldots $ from either
$$
\check X_{t+j} = \Phi \Lambda^j \Phi^{+} X_t
@@ -816,15 +808,12 @@ We can then use $\check X_{t+j}$ or $\hat X_{t+j}$ to forecast $X_{t+j}$.
In applications, we'll actually use only a few modes, often three or less.
- Some of the preceding formulas assume that we have retained all $p$ modes associated with the positive
- singular values of $X$.
+ Some of the preceding formulas assume that we have retained all $p$ modes associated with singular values of $X$.
We can adjust our formulas to describe a situation in which we instead retain only the $r < p$ largest singular values.
- In that case, we simply replace $\tilde \Sigma$ with the appropriate $r\times r$ matrix of singular values,
- $\tilde U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values,
- and $\tilde V$ with the $n \times r$ matrix whose columns correspond to the $r$ largest singular values.
+ In that case, we simply replace $\tilde \Sigma$ with the appropriate $r\times r$ matrix of singular values, $\tilde U$ with the $m \times r$ matrix whose columns correspond to the $r$ largest singular values, and $\tilde V$ with the $n \times r$ matrix whose columns correspond to the $r$ largest singular values.
Counterparts of all of the salient formulas above then apply.
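Putting the pieces together, here is a hedged end-to-end sketch of the truncated ($r < p$) version, using the toy `X`, `X_prime` from the earlier sketches and the forecast formula $\check X_{t+j} = \Phi \Lambda^j \Phi^{+} X_t$; the function name and the choice $r = 3$ are illustrative:

```python
def dmd_forecast(X, X_prime, X_t, r, j):
    """Rank-r DMD approximation and a j-step-ahead forecast of X_{t+j} given X_t."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    Ur, Sr, Vr = U[:, :r], S[:r], Vt[:r].T              # keep the r largest singular values
    A_tilde = Ur.conj().T @ X_prime @ Vr / Sr           # r x r projected transition matrix
    eigvals, W = np.linalg.eig(A_tilde)                 # Lambda (as a vector) and W
    Phi = X_prime @ Vr @ np.diag(1 / Sr) @ W            # m x r matrix of DMD modes
    b_t = np.linalg.pinv(Phi) @ X_t                     # amplitudes conditional on X_t
    return (Phi @ np.diag(eigvals**j) @ b_t).real       # check X_{t+j} = Phi Lambda^j Phi^+ X_t

print(dmd_forecast(X, X_prime, X[:, -1], r=3, j=5))
```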