lectures/prob_meaning.md (46 additions, 35 deletions)
@@ -21,9 +21,9 @@ This lecture illustrates two distinct interpretations of a **probability distribution**

* A frequentist interpretation as **relative frequencies** anticipated to occur in a large i.i.d. sample

- * A Bayesian interpretation as a **personal probability** (about a parameter or list of parameters) after seeing a collection of observations
+ * A Bayesian interpretation as a **personal opinion** (about a parameter or list of parameters) after seeing a collection of observations

- We recommend watching this video about **hypothesis testing**according to the frequentist approach
+ We recommend watching this video about **hypothesis testing** within the frequentist approach

```{youtube} 8JIe_cz6qGA
```
@@ -33,7 +33,7 @@ After you watch that video, please watch the following video on the Bayesian approach

```{youtube} Pahyv9i_X2k
```

- After you are familiar with the above material, this lecture uses the Socratic method to to help consolidate your understanding of the different questions that are answered by
+ After you are familiar with the material in these videos, this lecture uses the Socratic method to help consolidate your understanding of the different questions that are answered by

* a frequentist confidence interval
@@ -49,7 +49,7 @@ We provide our own answers as the lecture unfolds, but you'll learn more if you

**Code for answering questions:**

- In addition to what’s in Anaconda, this lecture will need the following libraries:
+ In addition to what’s in Anaconda, this lecture will deploy the following library:

```{code-cell} ipython3
:tags: [hide-output]
@@ -83,26 +83,26 @@ $$

where the fixed parameter $\theta \in (0,1)$.

- This is called the the __binomial distribution__.
+ This is called the __binomial distribution__.

Here

* $\theta$ is the probability that one toss of a coin will be a head, an outcome that we encode as $Y = 1$.

* $1 -\theta$ is the probability that one toss of the coin will be a tail, an outcome that we denote $Y = 0$.

- * $X$ is the total number of heads that come up after flipping the coin $n$ times.
+ * $X$ is the total number of heads that came up after flipping the coin $n$ times.

Consider the following experiment:

- Take **$I$ independent sequences of $n$ independent flips of the coin**
+ Take $I$ **independent** sequences of $n$ **independent** flips of the coin

Notice the repeated use of the adjective **independent**:

* we use it once to describe that we are drawing $n$ independent times from a **Bernoulli** distribution with parameter $\theta$ to arrive at one draw from a **Binomial** distribution with parameters $\theta,n$.

- * we use it again to describe that we are then drawing $I$ such sequences of $n$ coin draws.
+ * we use it again to describe that we are then drawing $I$ sequences of $n$ coin draws.

Let $y_h^i \in \{0, 1\}$ be the realized value of $Y$ on the $h$th flip during the $i$th sequence of flips.
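As a concrete illustration of this experiment, here is a minimal sketch (our own, not the lecture's code; the helper name `draw_freq` and the parameter values are hypothetical, and we assume `numpy` and `scipy` are available) that compares the observed fraction of sequences with exactly $k$ heads to the theoretical binomial probability:

```python
import numpy as np
from scipy.stats import binom

def draw_freq(θ=0.5, n=20, k=10, I=100_000, seed=0):
    """Simulate I independent sequences of n independent Bernoulli(θ) flips."""
    rng = np.random.default_rng(seed)
    heads = rng.binomial(n, θ, size=I)   # number of heads in each sequence
    freq = (heads == k).mean()           # observed relative frequency of exactly k heads
    return freq, binom.pmf(k, n, θ)      # compare with the theoretical probability

print(draw_freq())   # the two numbers should be close; binom.pmf(10, 20, 0.5) ≈ 0.176
```

Under the frequentist interpretation, the first number approximates the second ever more closely as $I$ grows.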
@@ -210,10 +210,10 @@ Let's do some more calculations.

Now we fix

$$
- n=20, k=10, I=1,000,000
+ n=20, k=10, I=1,000,000
$$

- and vary $\theta$ from $0.01$ to $0.99$.
+ We'll vary $\theta$ from $0.01$ to $0.99$ and plot outcomes against $\theta$.

```{code-cell} ipython3
θ_low, θ_high, npt = 0.01, 0.99, 50
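# --- Editor's sketch: the rest of this cell is truncated in the diff. One plausible
# --- continuation (our own, assuming numpy as np is already imported in the lecture)
# --- compares simulated frequencies with the exact binomial probabilities:
from scipy.stats import binom            # if not already imported earlier

θs = np.linspace(θ_low, θ_high, npt)
n, k, I = 20, 10, 1_000_000
sim = [(np.random.binomial(n, θ, size=I) == k).mean() for θ in θs]   # observed fractions
exact = [binom.pmf(k, n, θ) for θ in θs]                             # theoretical Prob(X = k)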
@@ -246,6 +246,8 @@ plt.show()

Now we fix $\theta=0.7, k=10, I=1,000,000$ and vary $n$ from $1$ to $100$.

+ Then we'll plot outcomes.
+
```{code-cell} ipython3
n_low, n_high, nn = 1, 100, 50
ns = np.linspace(n_low, n_high, nn, dtype='int')
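# --- Editor's sketch: the remainder of this cell is truncated in the diff. A plausible
# --- continuation (our own, assuming numpy as np is already imported in the lecture):
from scipy.stats import binom            # if not already imported earlier

θ, k, I = 0.7, 10, 1_000_000
sim = [(np.random.binomial(n, θ, size=I) == k).mean() for n in ns]   # observed fraction of k heads
exact = [binom.pmf(k, n, θ) for n in ns]                             # theoretical Prob(X = k) for each n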
@@ -309,7 +311,8 @@ From the above graphs, we can see that **$I$, the number of independent sequence

When $I$ becomes larger, the difference between theoretical probability and frequentist estimate becomes smaller.

- Also, as long as $I$ is large enough, changing $\theta$ or $n$ does not significantly change the accuracy of frequentist estimation.
+ Also, as long as $I$ is large enough, changing $\theta$ or $n$ does not substantially change the accuracy of the observed fraction as an approximation of $\theta$.

The Law of Large Numbers is at work here.
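To make the role of $I$ concrete, here is a small sketch (our own illustration, with arbitrarily chosen values, assuming `numpy` and `scipy`) comparing the estimation error for a small and a large number of sequences:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)
n, k, θ = 20, 10, 0.5
exact = binom.pmf(k, n, θ)              # theoretical Prob(X = k)

for I in (1_000, 1_000_000):
    freq = (rng.binomial(n, θ, size=I) == k).mean()
    print(I, abs(freq - exact))         # the error typically shrinks roughly like 1/sqrt(I)
```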
@@ -330,23 +333,29 @@ as $I$ goes to infinity.

## Bayesian Interpretation

- Consider again a binomial distribution above, but now assume that the parameter $\theta$ is a fixed number.
+ We again consider a binomial distribution.
+
+ But now we don't regard $\theta$ as being a fixed number.
+
+ Instead, we think of it as a **random variable**.

- Instead, we think of it as a **random variable** in the sense that it is itself described by a probability
- distribution.
+ $\theta$ is described by a probability distribution.

- But now this probability distribution means something different than relative frequency that we anticipate in a large i.i.d. sample.
+ But now this probability distribution means something different than a relative frequency that we can anticipate to occur in a large i.i.d. sample.

- Instead, the probability distribution for the parameter $\theta$ is now a summary of our views about the likely values of $\theta$ before we have seen any data, or any more data.
+ Instead, the probability distribution of $\theta$ is now a summary of our views about likely values of $\theta$ either
+
+ * **before** we have seen **any** data at all, or
+ * **before** we have seen **more** data, after we have seen **some** data

Thus, suppose that, before seeing any data, you have a personal prior probability distribution saying that
- As n increases, we can see that Bayesian coverage intervals narrow and move toward $0.4$.
+ As $n$ increases, we can see that Bayesian coverage intervals narrow and move toward $0.4$.

**g)** Please tell what question a Bayesian coverage interval answers.
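For concreteness, a Bayesian coverage interval of the kind plotted above can be read off the posterior's quantiles. Here is a minimal sketch (our own, using a flat Beta(1, 1) prior and illustrative data; these numbers are not the lecture's):

```python
from scipy.stats import beta

α0, β0 = 1, 1                        # a flat prior, chosen only for illustration
N, k = 100, 40                       # hypothetical data consistent with θ = 0.4
posterior = beta(α0 + k, β0 + N - k)

# central 95% coverage interval: the posterior assigns probability 0.95
# to θ lying between these two quantiles
print(posterior.ppf(0.025), posterior.ppf(0.975))
```

In this sense, the interval answers a question about the decision maker's posterior beliefs: given the prior and the data seen so far, between which two values does he believe $\theta$ lies with the stated posterior probability?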
@@ -543,25 +552,25 @@ ax.set_xlabel('Number of Observations', fontsize=11)

plt.show()
```

- Notice that in the graph above the posterior probabililty that $\theta \in [.45, .55]$ exhibits a hump shape (in general) as n increases.
+ Notice that in the graph above the posterior probability that $\theta \in [.45, .55]$ typically exhibits a hump shape as $n$ increases.

Two opposing forces are at work.

The first force is that the individual adjusts his belief as he observes new outcomes, so his posterior probability distribution becomes more and more realistic, which explains the rise of the posterior probability.

- However, $[.45, .55]$ actually excludes the true $\theta$ that generates the data (which is 0.4).
+ However, $[.45, .55]$ actually excludes the true $\theta = .4$ that generates the data.

As a result, the posterior probability drops as larger and larger samples refine his posterior probability distribution of $\theta$.

The descent seems precipitous only because of the scale of the graph that has the number of observations increasing disproportionately.

- When the number of observations becomes large enough, our Bayesian becomes so confident about $\theta$ that he considers $\theta \in [.45, .55]$ unlikely.
+ When the number of observations becomes large enough, our Bayesian becomes so confident about $\theta$ that he considers $\theta \in [.45, .55]$ very unlikely.

That is why we see a nearly horizontal line when the number of observations exceeds 500.
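The hump can also be computed directly from the posteriors. A short sketch (ours, assuming, as the surrounding cells suggest, that `Bay_stat.posterior_list` holds frozen `scipy.stats.beta` distributions):

```python
# posterior probability assigned to θ ∈ [.45, .55] after each batch of observations
probs = [p.cdf(0.55) - p.cdf(0.45) for p in Bay_stat.posterior_list]
print(probs)   # typically rises at first, then falls toward zero as n grows
```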
**i)** Please use your Python class to study what happens to the posterior distribution as $n \rightarrow + \infty$, again assuming that the true value of $\theta = .4$, though it is unknown to the person doing the updating via Bayes' Law.

- Using the Python class we made above, we can see the evolution of posterior distributions as n approaches infinity.
+ Using the Python class we made above, we can see the evolution of posterior distributions as $n$ approaches infinity.

```{code-cell} ipython3
fig, ax = plt.subplots(figsize=(10, 6))
@@ -579,11 +588,11 @@ ax.legend(fontsize=11)

plt.show()
```

- As $n$ increases, we can see that the probability density functions 'concentrate' on $0.4$, which is the true value of $\theta$.
+ As $n$ increases, we can see that the probability density functions _concentrate_ on $0.4$, the true value of $\theta$.

- Correspondingly, posterior means converge to $0.4$ while posterior standard deviations drop to $0$.
+ Here the posterior mean converges to $0.4$ while the posterior standard deviation converges to $0$ from above.

- To show this, we explicitly compute these statistics of the posterior distributions.
+ To show this, we compute the means and variances of the posterior distributions.

```{code-cell} ipython3
mean_list = [ii.mean() for ii in Bay_stat.posterior_list]
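# --- Editor's sketch: the rest of this cell is truncated in the diff. The matching spread
# --- statistics could be collected the same way (assuming posterior_list holds frozen
# --- scipy distributions, so .std() and .var() are available just like .mean()):
std_list = [ii.std() for ii in Bay_stat.posterior_list]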
@@ -608,9 +617,9 @@ plt.show()

How shall we interpret the patterns above?

- The answer is encoded in the Bayesian updating formulas
+ The answer is encoded in the Bayesian updating formulas.

- It is natural to extend the one-step Bayesian update to n-step Bayesian update.
+ It is natural to extend the one-step Bayesian update to an $n$-step Bayesian update.

$$
@@ -629,7 +638,7 @@
={Beta}(\alpha + k, \beta+N-k)
$$

- A Beta Distribution with $\alpha$ and $\beta$ has the following mean and variance.
+ A beta distribution with parameters $\alpha$ and $\beta$ has the following mean and variance.

The mean is $\frac{\alpha}{\alpha + \beta}$
@@ -639,11 +648,13 @@ The variance is $\frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$

* $\beta$ can be viewed as the number of failures

- The random variables $k$ and $N-k$ are governed by Binomial Distribution with $\theta=0.4$ (that we call true data generation process).
+ The random variables $k$ and $N-k$ are governed by a binomial distribution with $\theta=0.4$.
+
+ Call this the true data generating process.

- According to the Law of Large Numbers, for a large number of observations, observed frequencies of $k$ and $N-k$ will be described by the true data generation process, i.e., the population probability distribution that we assumed when generating the observations on the computer. (See Exercise $1$).
+ According to the Law of Large Numbers, for a large number of observations, observed frequencies of $k$ and $N-k$ will be described by the true data generating process, i.e., the population probability distribution that we assumed when generating the observations on the computer. (See Exercise $1$).

- Consequently, the mean of the posterior distribution converges to $0.4$ and the variance withers toward zero.
+ Consequently, the mean of the posterior distribution converges to $0.4$ and the variance withers to zero.
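A quick way to see this convergence (our own check, combining the mean and variance formulas above with $k \approx 0.4\,N$ for large $N$, as the Law of Large Numbers suggests) is

$$
E\left[\theta \mid k\right] = \frac{\alpha + k}{\alpha + \beta + N} \approx \frac{0.4\,N}{N} \rightarrow 0.4,
\qquad
\mathrm{Var}\left[\theta \mid k\right] = \frac{(\alpha + k)(\beta + N - k)}{(\alpha + \beta + N)^2 (\alpha + \beta + N + 1)} \approx \frac{0.4 \times 0.6}{N} \rightarrow 0 .
$$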
```{code-cell} ipython3
upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list]
@@ -664,7 +675,7 @@ plt.show()

After observing a large number of outcomes, the posterior distribution collapses around $0.4$.

- Thus, he comes to believe that $\theta$ is near $.4$.
+ Thus, the Bayesian statistician comes to believe that $\theta$ is near $.4$.

As shown in the figure above, as the number of observations grows, the Bayesian coverage intervals (BCIs) become narrower and narrower around $0.4$.