* ENH: minor edits of new lecture mix-model
* fix equation references
* upload latex execution errors
* Install numpyro[cuda] when installing jax
* fix typo from editor
* adjust scroll and surrounding text
We'll study two agents who try to learn about the wage process, but who use different statistical models.
Both types of agent know $f$ and $g$ but neither knows $\alpha$.
By applying Bayes' law to his misspecified statistical model, our first type of agent eventually comes to believe in a model that is in a special sense *closest* to the $h$ that actually generates the data.

We'll tell the sense in which it is closest.
Our second type of agent understands that nature mixes between $f$ and $g$ each period with a fixed mixing
probability $\alpha$.

The agent sets out to learn $\alpha$ using Bayes' law applied to his model.
His model is correct in the sense that
it includes the actual data generating process $h$ as a possible distribution.
In this lecture, we'll learn about
* How nature can *mix* between two distributions $f$ and $g$ to create a new distribution $h$.
* The [Kullback-Leibler statistical divergence](https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) that governs statistical learning under an incorrect statistical model
* A useful Python function `numpy.searchsorted` that, in conjunction with a uniform random number generator, can be used to sample from an arbitrary distribution
As usual, we'll start by importing some Python tools.
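A minimal, self-contained sketch of such imports, together with a demonstration of the `numpy.searchsorted` sampling trick flagged above, might look like the following (the Beta parameters chosen for $f$ and $g$ and the value of $\alpha$ are illustrative assumptions, not necessarily the lecture's):

```{code-cell} ipython3
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

# Illustrative Beta densities standing in for f and g (assumed parameters)
f = beta(1, 1).pdf
g = beta(3, 1.2).pdf

def sample_from_density(pdf, grid, size, rng):
    """
    Draw `size` samples from an arbitrary density on `grid` by inverting
    its discretized CDF: np.searchsorted applied to uniform random numbers.
    """
    cdf = np.cumsum(pdf(grid))
    cdf /= cdf[-1]                       # normalized discrete CDF
    u = rng.random(size)                 # uniform draws on [0, 1)
    return grid[np.searchsorted(cdf, u)] # invert the CDF

rng = np.random.default_rng(0)
grid = np.linspace(1e-4, 1 - 1e-4, 10_000)
α = 0.8                                  # assumed mixing probability
h = lambda w: α * f(w) + (1 - α) * g(w)  # the mixture density
print(sample_from_density(h, grid, size=5, rng=rng))
```
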
Formula {eq}`equation-eq-bayeslaw103` generalizes formula {eq}`equation-eq-recur1`.
Formula {eq}`equation-eq-bayeslaw103` can be regarded as a one-step revision of prior probability $\pi_0$ after seeing
the batch of data $ \left\{ w_{i}\right\} _{i=1}^{t+1} $.
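
For concreteness, the batch formula presumably takes the familiar likelihood-ratio form from the companion lecture (a hedged reconstruction, since the displayed equations are not reproduced in this excerpt):

$$
\pi_{t+1}
= \frac{\pi_0 \prod_{i=1}^{t+1} l\left(w_{i}\right)}
       {\pi_0 \prod_{i=1}^{t+1} l\left(w_{i}\right) + \left(1-\pi_0\right)},
\qquad
l(w) \equiv \frac{f(w)}{g(w)},
$$

so that the whole batch revises the prior $\pi_0$ in one step rather than one observation at a time.
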
## What a Type 1 Agent Learns When Mixture $H$ Generates Data
We now study what happens when the mixture distribution $h(w|\alpha)$ truly generates the data each period.
A submartingale or supermartingale continues to describe $\pi_t$.
It raises its ugly head and causes $\pi_t$ to converge either to $0$ or to $1$.

This is true even though in truth nature always mixes between $f$ and $g$.
After verifying that claim about possible limit points of $\pi_t$ sequences, we'll drill down and study
what fundamental force determines the limiting value of $\pi_t$.
Let's set a value of $\alpha$ and then watch how $\pi_t$ evolves.
```{code-cell} ipython3
def simulate_mixed(α, T=50, N=500):
    """
    Generate N simulated paths of π_t of length T when the data are
    truly drawn from the mixture h(w) = α f(w) + (1 - α) g(w).
    """
```
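Because the body of `simulate_mixed` and the plotting helper it feeds (called as `plot_π_seq(α = 0.2)`) are elided in this excerpt, here is a self-contained sketch of what such a simulation could look like; the Beta specifications for $f$ and $g$, the initial prior $\pi_0 = 0.5$, and the helper names `draw_from_h` and `simulate_mixed_sketch` are assumptions made for illustration:

```{code-cell} ipython3
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

# Assumed Beta specifications for f and g (illustrative, not the lecture's values)
f = beta(1, 1).pdf
g = beta(3, 1.2).pdf

def draw_from_h(α, size, rng):
    """Sample from h = α f + (1 - α) g by first choosing the component."""
    use_f = rng.random(size) < α
    return np.where(use_f, rng.beta(1, 1, size), rng.beta(3, 1.2, size))

def simulate_mixed_sketch(α, T=50, N=500, π0=0.5, seed=0):
    """N paths of the type 1 agent's posterior π_t when data come from h(·; α)."""
    rng = np.random.default_rng(seed)
    paths = np.empty((N, T + 1))
    paths[:, 0] = π0
    π = np.full(N, π0)
    for t in range(T):
        w = draw_from_h(α, N, rng)
        l = f(w) / g(w)                  # likelihood ratio f/g at each draw
        π = π * l / (π * l + 1 - π)      # Bayes' law update for the type 1 agent
        paths[:, t + 1] = π
    return paths

# Plot a handful of paths for one value of α
for path in simulate_mixed_sketch(α=0.2)[:20]:
    plt.plot(path, color='C0', alpha=0.4)
plt.xlabel('$t$')
plt.ylabel('$\\pi_t$')
plt.show()
```
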
Evidently, $\alpha$ is having a big effect on the destination of $\pi_t$ as $t \rightarrow + \infty$.
## Kullback-Leibler Divergence Governs Limit of $\pi_t$
To understand what determines whether the limit point of $\pi_t$ is $0$ or $1$ and how the answer depends on the true value of the mixing probability $\alpha \in (0,1)$ that generates $h(w) = h(w|\alpha)$, we shall compute the following two Kullback-Leibler divergences

$$
KL_g (\alpha) = \int \log\left(\frac{g(w)}{h(w)}\right) h(w) d w
$$

and

$$
KL_f (\alpha) = \int \log\left(\frac{f(w)}{h(w)}\right) h(w) d w
$$

We shall plot both of these functions against $\alpha$ as we use $\alpha$ to vary
$h(w) = h(w|\alpha)$.
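
Here is a rough, self-contained sketch of how the two divergences can be computed by quadrature and plotted against $\alpha$ (again with illustrative Beta choices for $f$ and $g$; the grid and parameter values are assumptions):

```{code-cell} ipython3
import numpy as np
from scipy.stats import beta
import matplotlib.pyplot as plt

# Illustrative Beta densities standing in for f and g (assumed parameters)
f = beta(1, 1).pdf
g = beta(3, 1.2).pdf

w = np.linspace(1e-4, 1 - 1e-4, 10_000)   # quadrature grid on (0, 1)
dw = w[1] - w[0]

def KL(num_pdf, α):
    """∫ log(num(w) / h(w)) h(w) dw, approximated by a Riemann sum."""
    h = α * f(w) + (1 - α) * g(w)
    return np.sum(np.log(num_pdf(w) / h) * h) * dw

α_grid = np.linspace(0.01, 0.99, 99)
plt.plot(α_grid, [KL(f, α) for α in α_grid], label='$KL_f(\\alpha)$')
plt.plot(α_grid, [KL(g, α) for α in α_grid], label='$KL_g(\\alpha)$')
plt.xlabel('$\\alpha$')
plt.legend()
plt.show()
```
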
Evidently, our type 1 learner who applies Bayes' law to his misspecified set of statistical models eventually learns an approximating model that is as close as possible to the true model, as measured by its
Kullback-Leibler divergence.
## Type 2 Agent
We now describe how our type 2 agent formulates his learning problem and what he eventually learns.
Our type 2 agent understands the correct statistical model but acknowledges that he does not know $\alpha$.
We apply Bayes' law to deduce an algorithm for learning $\alpha$ under the assumption
that the agent knows that

$$
h(w) = h(w| \alpha)
$$

but does not know $\alpha$.
Bayes' law now takes the form

$$
\pi_{t+1}(\alpha) = \frac{ h(w_{t+1} | \alpha) \, \pi_t(\alpha) }
                         { \int h(w_{t+1} | \hat{\alpha}) \, \pi_t(\hat{\alpha}) \, d \hat{\alpha} }
$$

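A minimal discretized sketch of this update replaces the integral by a sum over a grid of $\alpha$ values (the grid, the uniform prior, and the Beta choices for $f$ and $g$ below are illustrative assumptions):

```{code-cell} ipython3
import numpy as np
from scipy.stats import beta

# Illustrative Beta densities standing in for f and g (assumed parameters)
f = beta(1, 1).pdf
g = beta(3, 1.2).pdf

def posterior_α_grid(ws, n_grid=200):
    """
    Discretized Bayes' law for α: starting from a uniform prior on a grid,
    update after each observation w drawn from h(w|α) = α f(w) + (1 - α) g(w).
    """
    α_grid = np.linspace(0.001, 0.999, n_grid)
    post = np.ones(n_grid) / n_grid                  # uniform prior over the grid
    for w in ws:
        like = α_grid * f(w) + (1 - α_grid) * g(w)   # h(w | α) on the grid
        post = like * post
        post /= post.sum()                           # grid analogue of the integral
    return α_grid, post

# Simulate a history from the mixture with an assumed true α = 0.8, then update
rng = np.random.default_rng(0)
use_f = rng.random(200) < 0.8
ws = np.where(use_f, rng.beta(1, 1, 200), rng.beta(3, 1.2, 200))
α_grid, post = posterior_α_grid(ws)
print(α_grid[np.argmax(post)])   # grid point carrying the most posterior weight
```
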
We'll create graphs of the posterior $\pi_t(\alpha)$ as
$t \rightarrow +\infty$ corresponding to ones presented in [this quantecon lecture](https://python.quantecon.org/bayes_nonconj.html).
We anticipate that a posterior distribution will collapse around the true $\alpha$ as
$t \rightarrow + \infty$.
Let us try a uniform prior first.
We use the `Mixture` class in NumPyro to construct the likelihood function.
```{code-cell} ipython3
def MCMC_run(ws):
    # sampler construction elided in this excerpt
    ...
    return sample['α']
```
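Because the model construction and most of `MCMC_run` are elided above, here is a self-contained sketch of how one might build the mixture likelihood with NumPyro's `Mixture` distribution and sample $\alpha$ with NUTS; the Beta component parameters, the uniform prior, and the name `MCMC_run_sketch` are assumptions made for illustration:

```{code-cell} ipython3
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(ws):
    # Uniform prior over the mixing probability α
    α = numpyro.sample('α', dist.Uniform(0., 1.))
    mixing_dist = dist.Categorical(probs=jnp.array([α, 1 - α]))
    components = [dist.Beta(1., 1.), dist.Beta(3., 1.2)]   # assumed f and g
    numpyro.sample('w', dist.Mixture(mixing_dist, components), obs=ws)

def MCMC_run_sketch(ws, num_warmup=1000, num_samples=2000, seed=0):
    """Return posterior draws of α given an array of observations ws."""
    mcmc = MCMC(NUTS(model), num_warmup=num_warmup,
                num_samples=num_samples, progress_bar=False)
    mcmc.run(random.PRNGKey(seed), ws=jnp.asarray(ws))
    return mcmc.get_samples()['α']
```

Calling `MCMC_run_sketch` on simulated histories of increasing length and histogramming the returned draws would produce posteriors like the ones the next cell plots with the lecture's own `MCMC_run`.
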
The following code displays Bayesian posteriors for $\alpha$ at various history lengths.
```{code-cell} ipython3
fig, ax = plt.subplots(figsize=(10, 6))

# plotting of the posteriors at several history lengths elided in this excerpt

ax.set_xlabel('$\\alpha$')
plt.show()
```
The graph shows how the Bayesian posterior narrows in on the true value $\alpha = 0.8$ of the mixing parameter as the length of a history of observations grows.
## Concluding Remarks
Our type 1 person deploys an incorrect statistical model.

More generally, consider a scientist who confronts data $X$ generated by nature and who uses a statistical model $s(x | \theta)$ to infer $\theta$ from $x$.
But the scientist's model is misspecified, being only an approximation to an unknown model $h$ that nature uses to generate $X$.
If the scientist uses Bayes' law or a related likelihood-based method to infer $\theta$, it occurs quite generally that for large sample sizes the inverse problem infers a $\theta$ that minimizes the KL divergence of the scientist's model $s$ relative to nature's model $h$.