
Commit c30411f

Tom's Feb 9 edit of two senses of probability lecture
1 parent 9ce3dc0 commit c30411f

1 file changed (+46, −35 lines changed)

lectures/prob_meaning.md

Lines changed: 46 additions & 35 deletions
@@ -21,9 +21,9 @@ This lecture illustrates two distinct interpretations of a **probability distr

* A frequentist interpretation as **relative frequencies** anticipated to occur in a large i.i.d. sample

-* A Bayesian interpretation as a **personal probability** (about a parameter or list of parameters) after seeing a collection of observations
+* A Bayesian interpretation as a **personal opinion** (about a parameter or list of parameters) after seeing a collection of observations

-We recommend watching this video about **hypothesis testing** according to the frequentist approach
+We recommend watching this video about **hypothesis testing** within the frequentist approach

```{youtube} 8JIe_cz6qGA
```
@@ -33,7 +33,7 @@ After you watch that video, please watch the following video on the Bayesian app
```{youtube} Pahyv9i_X2k
```

-After you are familiar with the above material, this lecture uses the Socratic method to to help consolidate your understanding of the different questions that are answered by
+After you are familiar with the material in these videos, this lecture uses the Socratic method to help consolidate your understanding of the different questions that are answered by

* a frequentist confidence interval

@@ -49,7 +49,7 @@ We provide our own answers as the lecture unfolds, but you'll learn more if you
**Code for answering questions:**


-In addition to what’s in Anaconda, this lecture will need the following libraries:
+In addition to what’s in Anaconda, this lecture will deploy the following library:

```{code-cell} ipython3
:tags: [hide-output]
@@ -83,26 +83,26 @@ $$

where the fixed parameter $\theta \in (0,1)$.

-This is called the the __binomial distribution__.
+This is called the __binomial distribution__.

Here

* $\theta$ is the probability that one toss of a coin will be a head, an outcome that we encode as $Y = 1$.

* $1 -\theta$ is the probability that one toss of the coin will be a tail, an outcome that we denote $Y = 0$.

-* $X$ is the total number of heads that come up after flipping the coin $n$ times.
+* $X$ is the total number of heads that came up after flipping the coin $n$ times.

Consider the following experiment:

-Take **$I$ independent sequences of $n$ independent flips of the coin**
+Take $I$ **independent** sequences of $n$ **independent** flips of the coin

Notice the repeated use of the adjective **independent**:

* we use it once to describe that we are drawing $n$ independent times from a **Bernoulli** distribution with parameter $\theta$ to arrive at one draw from a **Binomial** distribution with parameters $\theta,n$.

-* we use it again to describe that we are then drawing $I$ such sequences of $n$ coin draws.
+* we use it again to describe that we are then drawing $I$ sequences of $n$ coin draws.

Let $y_h^i \in \{0, 1\}$ be the realized value of $Y$ on the $h$th flip during the $i$th sequence of flips.

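The experiment described in this hunk can be sketched in a few lines of NumPy. This is an illustration with values of $\theta$, $n$, and $I$ that we chose ourselves, not the lecture's own code:

```python
import numpy as np
from math import comb

# Illustration (assumed values, not the lecture's code): I independent
# sequences of n independent Bernoulli(theta) flips; X[i] counts heads
# in sequence i, so each X[i] is one draw from Binomial(n, theta).
rng = np.random.default_rng(0)
theta, n, I = 0.4, 20, 100_000

flips = rng.random((I, n)) < theta      # I-by-n array of coin flips
X = flips.sum(axis=1)                   # heads per sequence

# The fraction of sequences with exactly k heads approximates the
# binomial probability C(n, k) * theta**k * (1 - theta)**(n - k).
k = 10
freq = (X == k).mean()
prob = comb(n, k) * theta**k * (1 - theta)**(n - k)
```

With $I$ this large, `freq` and `prob` agree to roughly two decimal places.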
@@ -210,10 +210,10 @@ Let's do some more calculations.
Now we fix

$$
n=20, k=10, I=1,000,000
$$

-and vary $\theta$ from $0.01$ to $0.99$.
+We'll vary $\theta$ from $0.01$ to $0.99$ and plot outcomes against $\theta$.

```{code-cell} ipython3
θ_low, θ_high, npt = 0.01, 0.99, 50
@@ -246,6 +246,8 @@ plt.show()

Now we fix $\theta=0.7, k=10, I=1,000,000$ and vary $n$ from $1$ to $100$.

+Then we'll plot outcomes.
+
```{code-cell} ipython3
n_low, n_high, nn = 1, 100, 50
ns = np.linspace(n_low, n_high, nn, dtype='int')
@@ -309,7 +311,8 @@ From the above graphs, we can see that **$I$, the number of independent sequence

When $I$ becomes larger, the difference between theoretical probability and frequentist estimate becomes smaller.

-Also, as long as $I$ is large enough, changing $\theta$ or $n$ does not significantly change the accuracy of frequentist estimation.
+Also, as long as $I$ is large enough, changing $\theta$ or $n$ does not substantially change the accuracy of the observed fraction as an approximation of $\theta$.

The Law of Large Numbers is at work here.

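The Law of Large Numbers claim above can be checked with a quick simulation. The parameter values and the use of `numpy.random.Generator.binomial` are our own illustrative choices:

```python
import numpy as np
from math import comb

# Sketch (assumed values): the fraction of I simulated sequences with
# exactly k heads approaches the theoretical binomial probability as
# I grows, as the Law of Large Numbers predicts.
rng = np.random.default_rng(1)
theta, n, k = 0.7, 20, 10
p_true = comb(n, k) * theta**k * (1 - theta)**(n - k)

errors = {}
for I in (100, 10_000, 1_000_000):
    X = rng.binomial(n, theta, size=I)    # I draws of the head count
    errors[I] = abs((X == k).mean() - p_true)
```

At $I = 1{,}000{,}000$ the absolute error is on the order of $10^{-4}$, far smaller than typical errors at $I = 100$.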
@@ -330,23 +333,29 @@ as $I$ goes to infinity.

## Bayesian Interpretation

-Consider again a binomial distribution above, but now assume that the parameter $\theta$ is a fixed number.
+We consider again a binomial distribution.
+
+But now we don't regard $\theta$ as being a fixed number.
+
+Instead, we think of it as a **random variable**.

-Instead, we think of it as a **random variable** in the sense that it is itself described by a probability
-distribution.
+$\theta$ is described by a probability distribution.

-But now this probability distribution means something different than relative frequency that we anticipate in a large i.i.d. sample.
+But now this probability distribution means something different than a relative frequency that we can anticipate to occur in a large i.i.d. sample.

-Instead, the probability distribution for the parameter $\theta$ is now a summary of our views about the likely values of $\theta$ before we have seen any data, or any more data.
+Instead, the probability distribution of $\theta$ is now a summary of our views about likely values of $\theta$ either
+
+* **before** we have seen **any** data at all, or
+
+* **before** we have seen **more** data, after we have seen **some** data

Thus, suppose that, before seeing any data, you have a personal prior probability distribution saying that

$$
P(\theta) = \frac{\theta^{\alpha-1}(1-\theta)^{\beta -1}}{B(\alpha, \beta)}
$$

-where $B(\alpha, \beta)$ the **beta function** so that $P(\theta)$ is
-a beta distribution with parameters $\alpha, \beta$.
+where $B(\alpha, \beta)$ is the **beta function**, so that $P(\theta)$ is
+a **beta distribution** with parameters $\alpha, \beta$.
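A beta prior like this is easy to inspect numerically. The sketch below uses illustrative parameters $\alpha = \beta = 2$ of our own choosing:

```python
import numpy as np
from scipy.stats import beta

# Sketch (illustrative parameters): evaluate the beta prior density
# P(theta) on a grid and confirm it behaves like a proper density.
a, b = 2, 2
prior = beta(a, b)

grid = np.linspace(0, 1, 1001)
density = prior.pdf(grid)      # theta^(a-1) (1-theta)^(b-1) / B(a, b)

# A Beta(2, 2) prior is symmetric about 0.5, and its density
# integrates (approximately, on this grid) to one.
area = density.sum() * (grid[1] - grid[0])
```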

**Exercise 2:**

@@ -358,7 +367,7 @@ a beta distribution with parameters $\alpha, \beta$.

**d)** Please write a Python class to simulate this person's personal posterior distribution for $\theta$ for a _single_ sequence of $n$ draws.

-**e)** Please plot the posterior distribution for $\theta$ as a function of $\theta$ as $n$ grows from $1, 2, \ldots$.
+**e)** Please plot the posterior distribution for $\theta$ as a function of $\theta$ as $n$ grows through $1, 2, \ldots$.

**f)** For various $n$'s, please describe and compute a Bayesian coverage interval for the interval $[.45, .55]$.

@@ -389,7 +398,7 @@ $$
\textrm{Prob}(\theta) = \frac{\theta^{\alpha - 1} (1 - \theta)^{\beta - 1}}{B(\alpha, \beta)}
$$

-We can derive the posterior distribution for $\theta$ by
+We can derive the posterior distribution for $\theta$ via

\begin{align*}
\textrm{Prob}(\theta | Y) &= \frac{\textrm{Prob}(Y | \theta) \textrm{Prob}(\theta)}{\textrm{Prob}(Y)} \\
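Bayes' Law here leads to a conjugate update. A minimal sketch, with prior parameters and data counts we picked for illustration:

```python
from scipy.stats import beta

# Sketch of the conjugate beta-binomial update the derivation leads
# to: a Beta(a0, b0) prior combined with k heads in n flips gives a
# Beta(a0 + k, b0 + n - k) posterior.
# (a0, b0, n, k below are illustrative values, not the lecture's.)
a0, b0 = 1, 1        # flat prior
n, k = 20, 10        # one observed sequence with 10 heads

posterior = beta(a0 + k, b0 + n - k)

# Posterior mean is (a0 + k) / (a0 + b0 + n)
post_mean = posterior.mean()
```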
@@ -511,7 +520,7 @@ interval_df = interval_df.T
interval_df
```

-As n increases, we can see that Bayesian coverage intervals narrow and move toward $0.4$.
+As $n$ increases, we can see that Bayesian coverage intervals narrow and move toward $0.4$.

**g)** Please tell what question a Bayesian coverage interval answers.

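One way to see what the interval is, mechanically, is by construction: a 95% Bayesian coverage interval is the central interval holding 95% of the posterior probability mass. A sketch with assumed prior and data:

```python
from scipy.stats import beta

# Sketch (assumed prior and data): the central 95% Bayesian coverage
# interval for theta holds 95% of the posterior probability mass.
a0, b0 = 1, 1
n, k = 100, 40                      # e.g. 40 heads in 100 flips

posterior = beta(a0 + k, b0 + n - k)
low, high = posterior.ppf(0.025), posterior.ppf(0.975)

# By construction the posterior probability of [low, high] is 0.95.
mass = posterior.cdf(high) - posterior.cdf(low)
```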
@@ -543,25 +552,25 @@ ax.set_xlabel('Number of Observations', fontsize=11)
plt.show()
```

-Notice that in the graph above the posterior probabililty that $\theta \in [.45, .55]$ exhibits a hump shape (in general) as n increases.
+Notice that in the graph above the posterior probability that $\theta \in [.45, .55]$ typically exhibits a hump shape as $n$ increases.

Two opposing forces are at work.

The first force is that the individual adjusts his belief as he observes new outcomes, so his posterior probability distribution becomes more and more realistic, which explains the rise of the posterior probability.

-However, $[.45, .55]$ actually excludes the true $\theta$ that generates the data (which is 0.4).
+However, $[.45, .55]$ actually excludes the true $\theta = .4$ that generates the data.

As a result, the posterior probability drops as larger and larger samples refine his posterior probability distribution of $\theta$.

The descent seems precipitous only because of the scale of the graph that has the number of observations increasing disproportionately.

-When the number of observations becomes large enough, our Bayesian becomes so confident about $\theta$ that he considers $\theta \in [.45, .55]$ unlikely.
+When the number of observations becomes large enough, our Bayesian becomes so confident about $\theta$ that he considers $\theta \in [.45, .55]$ very unlikely.

That is why we see a nearly horizontal line when the number of observations exceeds 500.

**i)** Please use your Python class to study what happens to the posterior distribution as $n \rightarrow + \infty$, again assuming that the true value of $\theta = .4$, though it is unknown to the person doing the updating via Bayes' Law.

-Using the Python class we made above, we can see the evolution of posterior distributions as n approaches infinity.
+Using the Python class we made above, we can see the evolution of posterior distributions as $n$ approaches infinity.

```{code-cell} ipython3
fig, ax = plt.subplots(figsize=(10, 6))
fig, ax = plt.subplots(figsize=(10, 6))
@@ -579,11 +588,11 @@ ax.legend(fontsize=11)
plt.show()
```

-As $n$ increases, we can see that the probability density functions 'concentrate' on $0.4$, which is the true value of $\theta$.
+As $n$ increases, we can see that the probability density functions _concentrate_ on $0.4$, the true value of $\theta$.

-Correspondingly, posterior means converge to $0.4$ while posterior standard deviations drop to $0$.
+Here the posterior mean converges to $0.4$ while the posterior standard deviation converges to $0$ from above.

-To show this, we explicitly compute these statistics of the posterior distributions.
+To show this, we compute the means and variances of the posterior distributions.

```{code-cell} ipython3
mean_list = [ii.mean() for ii in Bay_stat.posterior_list]
@@ -608,9 +617,9 @@ plt.show()

How shall we interpret the patterns above?

-The answer is encoded in the Bayesian updating formulas
+The answer is encoded in the Bayesian updating formulas.

-It is natural to extend the one-step Bayesian update to n-step Bayesian update.
+It is natural to extend the one-step Bayesian update to an $n$-step Bayesian update.


$$
@@ -629,7 +638,7 @@
={Beta}(\alpha + k, \beta+N-k)
$$

-A Beta Distribution with $\alpha$ and $\beta$ has the following mean and variance.
+A beta distribution with parameters $\alpha$ and $\beta$ has the following mean and variance.

The mean is $\frac{\alpha}{\alpha + \beta}$

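These moment formulas for the updated beta distribution can be verified numerically. A sketch; the prior parameters and data counts below are our own illustrative picks:

```python
from scipy.stats import beta

# Sketch (illustrative numbers): check that Beta(a + k, b + N - k)
# has mean a_n / (a_n + b_n) and variance
# a_n * b_n / ((a_n + b_n)**2 * (a_n + b_n + 1)).
a, b = 2, 3
N, k = 100, 40                     # e.g. 40 successes in 100 trials

post = beta(a + k, b + N - k)
a_n, b_n = a + k, b + N - k        # posterior parameters

mean_formula = a_n / (a_n + b_n)
var_formula = a_n * b_n / ((a_n + b_n)**2 * (a_n + b_n + 1))
```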
@@ -639,11 +648,13 @@ The variance is $\frac{\alpha \beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$

* $\beta$ can be viewed as the number of failures

-The random variables $k$ and $N-k$ are governed by Binomial Distribution with $\theta=0.4$ (that we call true data generation process).
+The random variables $k$ and $N-k$ are governed by a binomial distribution with $\theta=0.4$.
+
+Call this the true data generating process.

-According to the Law of Large Numbers, for a large number of observations, observed frequencies of $k$ and $N-k$ will be described by the true data generation process, i.e., the population probability distribution that we assumed when generating the observations on the computer. (See Exercise $1$).
+According to the Law of Large Numbers, for a large number of observations, observed frequencies of $k$ and $N-k$ will be described by the true data generating process, i.e., the population probability distribution that we assumed when generating the observations on the computer. (See Exercise $1$.)

-Consequently, the mean of the posterior distribution converges to $0.4$ and the variance withers toward zero.
+Consequently, the mean of the posterior distribution converges to $0.4$ and the variance withers to zero.

```{code-cell} ipython3
upper_bound = [ii.ppf(0.95) for ii in Bay_stat.posterior_list]
664675

665676
After observing a large number of outcomes, the posterior distribution collapses around $0.4$.
666677

667-
Thus, he comes to believe that $\theta$ is near $.4$.
678+
Thus, the Bayesian statististian comes to believe that $\theta$ is near $.4$.
668679

669680
As shown in the figure above, as the number of observations grows, the Bayesian coverage intervals (BCIs) become narrower and narrower around $0.4$.
670681
