Commit fc2139e

ENH: minor edits of new lecture mix-model (#285)

* ENH: minor edits of new lecture mix-model
* fix equation references
* upload latex execution errors
* Install numpyro[cuda] when installing jax
* fix typo from editor
* adjust scroll and surrounding text
1 parent 8df4052 commit fc2139e

2 files changed (+31, -70 lines)

.github/workflows/ci.yml

Lines changed: 8 additions & 1 deletion
````diff
@@ -43,6 +43,7 @@ jobs:
         shell: bash -l {0}
         run: |
           pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
+          pip install --upgrade "numpyro[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
           nvidia-smi
       - name: Install latex dependencies
         shell: bash -l {0}
@@ -84,12 +85,18 @@ jobs:
           jb build lectures --builder pdflatex --path-output ./ -n -W --keep-going
           mkdir _build/html/_pdf
           cp -u _build/latex/*.pdf _build/html/_pdf
+      - name: Upload Execution Reports (LaTeX)
+        uses: actions/upload-artifact@v2
+        if: failure()
+        with:
+          name: execution-reports
+          path: _build/latex/reports
       # Final Build of HTML
       - name: Build HTML
         shell: bash -l {0}
         run: |
           jb build lectures --path-output ./ -n -W --keep-going
-      - name: Upload Execution Reports
+      - name: Upload Execution Reports (HTML)
         uses: actions/upload-artifact@v2
         if: failure()
         with:
````

lectures/mix_model.md

Lines changed: 23 additions & 69 deletions
````diff
@@ -11,30 +11,24 @@ kernelspec:
   name: python3
 ---
 
-
-<a id='likelihood-ratio-process'></a>
-
-+++
-
+(likelihood-ratio-process)=
 # Incorrect Models
 
 ## Overview
 
-
-This is a sequel to {doc}`this quantecon lecture <likelihood_bayes>`.
+This is a sequel to {doc}`this quantecon lecture <likelihood_bayes>`.
 
 We discuss two ways to create compound lottery and their consequences.
 
 A compound lottery can be said to create a _mixture distribution_.
 
 Our two ways of constructing a compound lottery will differ in their **timing**.
 
-* in one, mixing between two possible probability distributions will occur once and all at the beginning of time
-
-* in the other, mixing between the same two possible possible probability distributions will occur each period
+* in one, mixing between two possible probability distributions will occur once and all at the beginning of time
 
-The statistical setting is close but not identical to the problem studied in that quantecon lecture.
+* in the other, mixing between the same two possible possible probability distributions will occur each period
 
+The statistical setting is close but not identical to the problem studied in that quantecon lecture.
 
 In that lecture, there were two i.i.d. processes that could possibly govern successive draws of a non-negative random variable $W$.
 
````
````diff
@@ -53,8 +47,6 @@
 \pi_t = E [ \textrm{nature chose distribution} f | w^t] , \quad t = 0, 1, 2, \ldots
 $$
 
-
-
 However, in the setting of this lecture, that rule imputes to the agent an incorrect model.
 
 The reason is that now the wage sequence is actually described by a different statistical model.
````
````diff
@@ -70,7 +62,6 @@
 H(w ) = \alpha F(w) + (1-\alpha) G(w), \quad \alpha \in (0,1)
 $$
 
-
 We'll study two agents who try to learn about the wage process, but who use different statistical models.
 
 Both types of agent know $f$ and $g$ but neither knows $\alpha$.
````
````diff
@@ -100,7 +91,6 @@ is in a special sense *closest* to the $h$ that actually generates the data.
 
 We'll tell the sense in which it is closest.
 
-
 Our second type of agent understands that nature mixes between $f$ and $g$ each period with a fixed mixing
 probability $\alpha$.
 
````
````diff
@@ -111,17 +101,14 @@ The agent sets out to learn $\alpha$ using Bayes' law applied to his model.
 His model is correct in the sense that
 it includes the actual data generating process $h$ as a possible distribution.
 
-
 In this lecture, we'll learn about
 
 * how nature can *mix* between two distributions $f$ and $g$ to create a new distribution $h$.
 
-* The Kullback-Leibler statistical divergence <https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence> that governs statistical learning under an incorrect statistical model
+* The Kullback-Leibler statistical divergence <https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence> that governs statistical learning under an incorrect statistical model
 
 * A useful Python function `numpy.searchsorted` that, in conjunction with a uniform random number generator, can be used to sample from an arbitrary distribution
 
-
-
 As usual, we'll start by importing some Python tools.
 
 ```{code-cell} ipython3
````
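As background for the `numpy.searchsorted` bullet in this hunk: the function supports inverse-transform sampling from a discretized CDF. A minimal sketch of that idea follows; it is not part of the commit, and the grid size and the Beta(3, 1.2) density are illustrative placeholders rather than the lecture's exact parameters.

```python
import numpy as np
from scipy.stats import beta

# Illustrative target density evaluated on a grid; the lecture's f and g are
# Beta densities, but these parameter values are placeholders.
grid = np.linspace(1e-6, 1 - 1e-6, 1_000)
pdf_vals = beta.pdf(grid, 3, 1.2)

# Discretized CDF, normalized so that its last entry equals 1.
cdf = np.cumsum(pdf_vals)
cdf /= cdf[-1]

# Inverse-transform sampling: for each uniform draw u, searchsorted returns
# the first grid index whose CDF value is >= u.
u = np.random.uniform(size=10_000)
samples = grid[np.searchsorted(cdf, u)]
```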
````diff
@@ -214,14 +201,11 @@ l_seq_f = np.cumprod(l_arr_f, axis=1)
 
 ## Sampling from Compound Lottery $H$
 
-
 We implement two methods to draw samples from
 our mixture model $\alpha F + (1-\alpha) G$.
 
 We'll generate samples using each of them and verify that they match well.
 
-
-
 Here is pseudo code for a direct "method 1" for drawing from our compound lottery:
 
 * Step one:
````
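The pseudo code truncated in this hunk describes the direct method: first decide, with probability $\alpha$, which component distribution applies, then draw from that component. A minimal sketch under assumed Beta components (the names `F_a`, `F_b`, `G_a`, `G_b` and the values of $\alpha$ and $N$ are placeholders, not the lecture's code):

```python
import numpy as np
from scipy.stats import beta

# Assumed Beta components and mixing weight; all values are illustrative.
F_a, F_b = 1, 1
G_a, G_b = 3, 1.2
α, N = 0.8, 100_000

# Method 1: flip an α-weighted coin for every draw, then keep the draw
# from F when the coin selects F and the draw from G otherwise.
# (Drawing both arrays and selecting is wasteful but keeps the sketch short.)
coin_is_F = np.random.uniform(size=N) < α
draws = np.where(coin_is_F,
                 beta.rvs(F_a, F_b, size=N),   # draws from F
                 beta.rvs(G_a, G_b, size=N))   # draws from G
```

The lecture's second method instead samples from the mixture's CDF with `np.searchsorted`, as sketched after the previous hunk.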
````diff
@@ -314,8 +298,6 @@ plt.show()
 
 **Note:** With numba acceleration the first method is actually only slightly slower than the second when we generated 1,000,000 samples.
 
-+++
-
 ## Type 1 Agent
 
 We'll now study what our type 1 agent learns
````
````diff
@@ -326,7 +308,6 @@ The type 1 agent thus uses the learning algorithm studied in {doc}`this quantec
 
 We'll briefly review that learning algorithm now.
 
-
 Let $ \pi_t $ be a Bayesian posterior defined as
 
 $$
@@ -338,17 +319,15 @@ of the posterior probability $ \pi_t $, an instance of **Bayes’ Law**.
 
 Bayes’ law implies that $ \{\pi_t\} $ obeys the recursion
 
-
-<a id='equation-eq-recur1'></a>
-$$
-\pi_t=\frac{\pi_{t-1} l_t(w_t)}{\pi_{t-1} l_t(w_t)+1-\pi_{t-1}} \tag{56.1}
 $$
+\pi_t=\frac{\pi_{t-1} l_t(w_t)}{\pi_{t-1} l_t(w_t)+1-\pi_{t-1}}
+$$ (equation-eq-recur1)
 
 with $ \pi_{0} $ being a Bayesian prior probability that $ q = f $,
 i.e., a personal or subjective belief about $ q $ based on our having seen no data.
 
 Below we define a Python function that updates belief $ \pi $ using
-likelihood ratio $ \ell $ according to recursion [(56.1)](#equation-eq-recur1)
+likelihood ratio $ \ell $ according to recursion {eq}`equation-eq-recur1`
 
 ```{code-cell} ipython3
 :hide-output: false
@@ -363,7 +342,7 @@ def update(π, l):
     return π
 ```
 
-Formula [(56.1)](#equation-eq-recur1) can be generalized by iterating on it and thereby deriving an
+Formula {eq}`equation-eq-recur1` can be generalized by iterating on it and thereby deriving an
 expression for the time $ t $ posterior $ \pi_{t+1} $ as a function
 of the time $ 0 $ prior $ \pi_0 $ and the likelihood ratio process
 $ L(w^{t+1}) $ at time $ t $.
````
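To make the link between the one-step recursion above and the closed-form expression reconstructed in the next hunk concrete, here is a small self-contained check. It is not the lecture's code; the lognormal draws stand in for the likelihood ratios $l_t(w_t) = f(w_t)/g(w_t)$ and are synthetic placeholders.

```python
import numpy as np

def update(π, l):
    "One step of the posterior recursion shown above."
    return π * l / (π * l + 1 - π)

# Synthetic one-period likelihood ratios; in the lecture these come from
# evaluating f and g at observed wages.
rng = np.random.default_rng(0)
l_seq = rng.lognormal(mean=0.0, sigma=0.5, size=50)

π0 = 0.5
π = π0
for l in l_seq:           # iterate the one-step recursion
    π = update(π, l)

# Closed form: π_{t+1} as a function of π_0 and the cumulative ratio L(w^{t+1})
L = np.prod(l_seq)
assert np.isclose(π, π0 * L / (π0 * L + 1 - π0))
```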
````diff
@@ -413,24 +392,20 @@ After rearranging the preceding equation, we can express $ \pi_{t+1} $ as a
 function of $ L\left(w^{t+1}\right) $, the likelihood ratio process at $ t+1 $,
 and the initial prior $ \pi_{0} $
 
-
-<a id='equation-eq-bayeslaw103'></a>
-$$
-\pi_{t+1}=\frac{\pi_{0}L\left(w^{t+1}\right)}{\pi_{0}L\left(w^{t+1}\right)+1-\pi_{0}} . \tag{56.2}
 $$
+\pi_{t+1}=\frac{\pi_{0}L\left(w^{t+1}\right)}{\pi_{0}L\left(w^{t+1}\right)+1-\pi_{0}}.
+$$ (equation-eq-bayeslaw103)
 
-Formula [(56.2)](#equation-eq-bayeslaw103) generalizes formula [(56.1)](#equation-eq-recur1).
+Formula {eq}`equation-eq-bayeslaw103` generalizes formula {eq}`equation-eq-recur1`.
 
-Formula [(56.2)](#equation-eq-bayeslaw103) can be regarded as a one step revision of prior probability $ \pi_0 $ after seeing
+Formula {eq}`equation-eq-bayeslaw103` can be regarded as a one step revision of prior probability $ \pi_0 $ after seeing
 the batch of data $ \left\{ w_{i}\right\} _{i=1}^{t+1} $.
 
-+++
-
 ## What a type 1 Agent Learns when Mixture $H$ Generates Data
 
 We now study what happens when the mixture distribution $h;\alpha$ truly generated the data each period.
 
-A submartingale or supermartingale continues to describe $\pi_t$
+A submartingale or supermartingale continues to describe $\pi_t$
 
 It raises its ugly head and causes $\pi_t$ to converge either to $0$ or to $1$.
 
````
````diff
@@ -439,10 +414,8 @@ This is true even though in truth nature always mixes between $f$ and $g$.
 After verifying that claim about possible limit points of $\pi_t$ sequences, we'll drill down and study
 what fundamental force determines the limiting value of $\pi_t$.
 
-
 Let's set a value of $\alpha$ and then watch how $\pi_t$ evolves.
 
-
 ```{code-cell} ipython3
 def simulate_mixed(α, T=50, N=500):
     """
````
````diff
@@ -506,13 +479,8 @@ plot_π_seq(α = 0.2)
 
 Evidently, $\alpha$ is having a big effect on the destination of $\pi_t$ as $t \rightarrow + \infty$
 
-
-
-+++
-
 ## Kullback-Leibler Divergence Governs Limit of $\pi_t$
 
-
 To understand what determines whether the limit point of $\pi_t$ is $0$ or $1$ and how the answer depends on the true value of the mixing probability $\alpha \in (0,1) $ that generates
 
 $$ h(w) \equiv h(w | \alpha) = \alpha f(w) + (1-\alpha) g(w) $$
````
````diff
@@ -525,7 +493,9 @@
 
 and
 
-$$ KL_f (\alpha) = \int \log\left(\frac{f(w)}{h(w)}\right) h(w) d w $$
+$$
+KL_f (\alpha) = \int \log\left(\frac{f(w)}{h(w)}\right) h(w) d w
+$$
 
 We shall plot both of these functions against $\alpha$ as we use $\alpha$ to vary
 $h(w) = h(w|\alpha)$.
````
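The two integrals can be approximated numerically. The sketch below is not part of the commit; it assumes Beta densities for $f$ and $g$ with placeholder parameters and evaluates the integrands exactly as they are written in the diff.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import beta

# Assumed Beta densities for f and g; the parameter values are placeholders.
f = lambda w: beta.pdf(w, 1, 1)
g = lambda w: beta.pdf(w, 3, 1.2)

def KL(density, α):
    "Integral of log(density / h) against h, with h = α f + (1 - α) g."
    h = lambda w: α * f(w) + (1 - α) * g(w)
    integrand = lambda w: np.log(density(w) / h(w)) * h(w)
    value, _ = quad(integrand, 1e-8, 1 - 1e-8)
    return value

for α in (0.2, 0.5, 0.8):
    print(f"α = {α}:  KL_f = {KL(f, α):.4f},  KL_g = {KL(g, α):.4f}")
```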
````diff
@@ -662,19 +632,18 @@ plt.show()
 Evidently, our type 1 learner who applies Bayes' law to his misspecified set of statistical models eventually learns an approximating model that is as close as possible to the true model, as measured by its
 Kullback-Leibler divergence.
 
-+++
-
 ## Type 2 Agent
 
 We now describe how our type 2 agent formulates his learning problem and what he eventually learns.
 
 Our type 2 agent understands the correct statistical model but acknowledges does not know $\alpha$.
 
-
 We apply Bayes law to deduce an algorithm for learning $\alpha$ under the assumption
 that the agent knows that
 
-$$ h(w) = h(w| \alpha)$$
+$$
+h(w) = h(w| \alpha)
+$$
 
 but does not know $\alpha$.
 
````
````diff
@@ -690,25 +659,16 @@ Bayes' law now takes the form
 $$
 \pi_{t+1}(\alpha) = \frac {h(w_{t+1} | \alpha) \pi_t(\alpha)}
 { \int h(w_{t+1} | \hat \alpha) \pi_t(\hat \alpha) d \hat \alpha }
-$$
-
+$$
 
 We'll use numpyro to approximate this equation.
 
-
 We'll create graphs of the posterior $\pi_t(\alpha)$ as
 $t \rightarrow +\infty$ corresponding to ones presented in the quantecon lecture <https://python.quantecon.org/bayes_nonconj.html>.
 
 We anticipate that a posterior distribution will collapse around the true $\alpha$ as
 $t \rightarrow + \infty$.
 
-
-
-
-
-
-+++
-
 Let us try a uniform prior first.
 
 We use the `Mixture` class in Numpyro to construct the likelihood function.
````
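For orientation, here is a minimal sketch of what a `Mixture`-based likelihood with a uniform prior on $\alpha$ can look like in numpyro. This is an assumed reconstruction, not the commit's code: the Beta parameters, the sampler settings, and the exact `MCMC_run` signature are guesses.

```python
import jax.numpy as jnp
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

# Assumed Beta components for f and g; the parameter values are placeholders.
F_a, F_b = 1.0, 1.0
G_a, G_b = 3.0, 1.2

def model(w):
    # Uniform prior over the mixing probability α
    α = numpyro.sample('α', dist.Uniform(0.0, 1.0))
    # Likelihood: each observation is drawn from α F + (1 - α) G
    mixture = dist.Mixture(dist.Categorical(probs=jnp.array([α, 1 - α])),
                           [dist.Beta(F_a, F_b), dist.Beta(G_a, G_b)])
    numpyro.sample('w', mixture, obs=w)

def MCMC_run(ws, num_warmup=1000, num_samples=2000):
    "Posterior draws of α given an array of observed wages ws."
    mcmc = MCMC(NUTS(model), num_warmup=num_warmup,
                num_samples=num_samples, progress_bar=False)
    mcmc.run(random.PRNGKey(0), w=jnp.asarray(ws))
    return mcmc.get_samples()['α']
```

Running `MCMC_run` on simulated draws from the mixture should return posterior draws of $\alpha$ that concentrate near the true value as the history grows, which is what the posterior plots discussed below display.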
````diff
@@ -737,11 +697,9 @@ def MCMC_run(ws):
     return sample['α']
 ```
 
-After the following code, please scroll down all the way to the end to see the graph that displays Bayesian posteriors for $\alpha$
-at various history lengths.
+The following code displays Bayesian posteriors for $\alpha$ at various history lengths.
 
 ```{code-cell} ipython3
-:tags: ["output_scroll"]
 
 fig, ax = plt.subplots(figsize=(10, 6))
````
````diff
@@ -757,11 +715,8 @@ ax.set_xlabel('$\\alpha$')
 plt.show()
 ```
 
-Again, please scroll down to the end of the output to see a graph of posteriors at different history lengths.
-
 It shows how the Bayesian posterior narrows in on the true value $\alpha = .8$ of the mixing parameter as the length of a history of observations grows.
 
-
 ## Concluding Remarks
 
 Our type 1 person deploys an incorrect statistical model.
````
````diff
@@ -789,5 +744,4 @@ $s(x | \theta)$ to infer $\theta$ from $x$.
 
 But the scientist's model is misspecified, being only an approximation to an unknown model $h$ that nature uses to generate $X$.
 
-
 If the scientist uses Bayes' law or a related likelihood-based method to infer $\theta$, it occurs quite generally that for large sample sizes the inverse problem infers a $\theta$ that minimizes the KL divergence of the scientist's model $s$ relative to nature's model $h$.
````
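In symbols (our notation, not text from the lecture), the concluding claim is that for large samples the posterior concentrates on the pseudo-true parameter

$$
\theta^* = \arg\min_{\theta} \int \log\left(\frac{h(x)}{s(x|\theta)}\right) h(x) \, dx ,
$$

that is, on the $\theta$ whose model $s(\cdot \mid \theta)$ minimizes the Kullback-Leibler divergence from nature's model $h$.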
