lectures/prob_dist.md (+33 −31)
@@ -46,18 +46,22 @@ Let's start with discrete distributions.

A discrete distribution is defined by a set of numbers $S = \{x_1, \ldots, x_n\}$ and a **probability mass function** (PMF) on $S$, which is a function $p$ from $S$ to $[0,1]$ with the property

$$
\sum_{i=1}^n p(x_i) = 1
$$

We say that a random variable $X$ **has distribution** $p$ if $X$ takes value $x_i$ with probability $p(x_i)$.

Variance is also called the *second central moment* of the distribution.

The **cumulative distribution function** (CDF) of $X$ is defined by

$$
F(x) = \mathbb{P}\{X \leq x\}
= \sum_{i=1}^n \mathbb 1\{x_i \leq x\} p(x_i)
$$

Here $\mathbb 1\{ \textrm{statement} \} = 1$ if "statement" is true and zero otherwise.
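As a quick sanity check of these definitions, here is a small sketch using a made-up three-point distribution (the values and probabilities are illustrative, not from the lecture): it verifies that the PMF sums to one and evaluates the CDF by summing $p(x_i)$ over the points with $x_i \leq x$.

```python
# A toy discrete distribution on S = {1, 2, 3} (values chosen for illustration)
S = [1, 2, 3]
p = {1: 0.2, 2: 0.5, 3: 0.3}

# A valid PMF must sum to one
assert abs(sum(p.values()) - 1.0) < 1e-12

def F(x):
    """CDF: sum p(x_i) over all x_i in S with x_i <= x."""
    return sum(p[xi] for xi in S if xi <= x)

print(F(0), F(1.5), F(2), F(10))  # 0 0.2 0.7 1.0
```

The CDF is a step function: it is flat between the points of $S$ and jumps by $p(x_i)$ at each $x_i$.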
@@ -115,7 +119,6 @@ u.pmf(1)
u.pmf(2)
```

Here's a plot of the probability mass function:

```{code-cell} ipython3
@@ -129,7 +132,6 @@ ax.set_ylabel('PMF')
plt.show()
```

Here's a plot of the CDF:

```{code-cell} ipython3
@@ -143,10 +145,8 @@ ax.set_ylabel('CDF')
plt.show()
```

The CDF jumps up by $p(x_i)$ at $x_i$.

```{exercise}
:label: prob_ex1
@@ -179,7 +179,7 @@ We can import the Bernoulli distribution on $S = \{0,1\}$ from SciPy like so:

```{code-cell} ipython3
θ = 0.4
u = scipy.stats.bernoulli(θ)
```

Here's the mean and variance at $\theta=0.4$:
@@ -201,7 +201,7 @@ u.pmf(1)

Another useful (and more interesting) distribution is the **binomial distribution** on $S=\{0, \ldots, n\}$, which has PMF

$$
p(i) = \binom{n}{i} \theta^i (1-\theta)^{n-i}
$$

Again, $\theta \in [0,1]$ is a parameter.
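To connect the formula with the SciPy interface used elsewhere in the lecture, here is a sketch (with illustrative values $n=10$, $\theta=0.4$, chosen for the example) checking that evaluating the binomial PMF directly from the formula matches `scipy.stats.binom`:

```python
from math import comb
import scipy.stats

n, θ = 10, 0.4          # illustrative parameter values
u = scipy.stats.binom(n, θ)

# Evaluate the binomial PMF directly from the formula above
def pmf(i):
    return comb(n, i) * θ**i * (1 - θ)**(n - i)

# The hand-computed PMF agrees with SciPy at every point of S
for i in range(n + 1):
    assert abs(pmf(i) - u.pmf(i)) < 1e-12
```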
@@ -299,7 +299,7 @@ We can see that the output graph is the same as the one above.

The geometric distribution has infinite support $S = \{0, 1, 2, \ldots\}$ and its PMF is given by

$$
p(i) = (1 - \theta)^i \theta
$$

where $\theta \in [0,1]$ is a parameter.
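One way to sanity-check this PMF (using an illustrative value $\theta = 0.3$) is to compare it with `scipy.stats.geom`. Note that SciPy's geometric distribution is supported on $\{1, 2, \ldots\}$ (it counts trials until the first success), while the PMF above counts failures before the first success, so the two agree after a shift of one:

```python
import scipy.stats

θ = 0.3                      # illustrative parameter value
u = scipy.stats.geom(θ)      # SciPy's geometric is supported on {1, 2, ...}

# The PMF p(i) = (1 - θ)**i * θ on {0, 1, 2, ...} counts failures
# before the first success, so it equals SciPy's PMF at i + 1
for i in range(20):
    assert abs((1 - θ)**i * θ - u.pmf(i + 1)) < 1e-12

# The probabilities sum to one (checked up to a truncation point)
total = sum((1 - θ)**i * θ for i in range(200))
print(round(total, 10))  # 1.0
```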
@@ -338,7 +338,7 @@ plt.show()

The Poisson distribution on $S = \{0, 1, \ldots\}$ with parameter $\lambda > 0$ has PMF

$$
p(i) = \frac{\lambda^i}{i!} e^{-\lambda}
$$

The interpretation of $p(i)$ is: the probability of $i$ events in a fixed time interval, where the events occur independently at a constant rate $\lambda$.
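As with the binomial case, the formula can be checked against SciPy. The sketch below (with an illustrative rate $\lambda = 2.5$) evaluates the Poisson PMF by hand and compares it with `scipy.stats.poisson`:

```python
from math import exp, factorial
import scipy.stats

λ = 2.5                      # illustrative rate
u = scipy.stats.poisson(λ)

# Evaluate the Poisson PMF directly from the formula above
def pmf(i):
    return λ**i / factorial(i) * exp(-λ)

# The hand-computed PMF agrees with SciPy
for i in range(15):
    assert abs(pmf(i) - u.pmf(i)) < 1e-12
```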
@@ -376,12 +376,14 @@ plt.show()

A continuous distribution is represented by a **probability density function**, which is a function $p$ over $\mathbb R$ (the set of all real numbers) such that $p(x) \geq 0$ for all $x$ and

$$
\int_{-\infty}^\infty p(x) dx = 1
$$

We say that random variable $X$ has distribution $p$ if

$$
\mathbb P\{a < X < b\} = \int_a^b p(x) dx
$$

for all $a \leq b$.
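These two conditions can be checked numerically. Here is a sketch using an illustrative density, $p(x) = 2x$ on $(0, 1)$ and zero elsewhere (not a distribution from the lecture), together with `scipy.integrate.quad`:

```python
from scipy.integrate import quad

# An illustrative density: p(x) = 2x on (0, 1), zero elsewhere
def p(x):
    return 2 * x if 0 < x < 1 else 0.0

# The density integrates to one over its support
total, _ = quad(p, 0, 1)
assert abs(total - 1) < 1e-8

# P{a < X < b} is the integral of p over (a, b);
# here the exact value is 0.5**2 - 0.25**2 = 0.1875
prob, _ = quad(p, 0.25, 0.5)
print(round(prob, 6))  # 0.1875
```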
@@ -391,14 +393,14 @@ The definition of the mean and variance of a random variable $X$ with distributi

For example, the mean of $X$ is

$$
\mathbb{E}[X] = \int_{-\infty}^\infty x p(x) dx
$$

The **cumulative distribution function** (CDF) of $X$ is defined by

$$
F(x) = \mathbb P\{X \leq x\}
= \int_{-\infty}^x p(t) dt
$$
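Both the mean and the CDF can be computed by numerical integration. The sketch below uses an illustrative density, $p(x) = 2x$ on $(0, 1)$ (for which the mean is $2/3$ and $F(x) = x^2$ on the support):

```python
from scipy.integrate import quad

# An illustrative density: p(x) = 2x on (0, 1), zero elsewhere
def p(x):
    return 2 * x if 0 < x < 1 else 0.0

# Mean: integral of x * p(x); exact value is 2/3
mean, _ = quad(lambda x: x * p(x), 0, 1)
assert abs(mean - 2/3) < 1e-8

# CDF at x: integrate the density up to x; here F(x) = x**2 on (0, 1)
def F(x):
    return quad(p, 0, x)[0]

print(round(F(0.5), 6))  # 0.25
```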
@@ -407,8 +409,8 @@ $$

Perhaps the most famous distribution is the **normal distribution**, which has density

$$
p(x) = \frac{1}{\sqrt{2\pi}\sigma}
\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
$$

This distribution has two parameters, $\mu \in \mathbb R$ and $\sigma \in (0, \infty)$.
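The density formula can be verified against `scipy.stats.norm` (the parameter values $\mu = 1$, $\sigma = 2$ below are illustrative):

```python
from math import sqrt, pi, exp
import scipy.stats

μ, σ = 1.0, 2.0              # illustrative parameter values

# Evaluate the normal density directly from the formula above
def p(x):
    return 1 / (sqrt(2 * pi) * σ) * exp(-(x - μ)**2 / (2 * σ**2))

# SciPy's norm takes the mean and standard deviation directly
u = scipy.stats.norm(μ, σ)
for x in [-3.0, 0.0, 1.0, 4.5]:
    assert abs(p(x) - u.pdf(x)) < 1e-12
```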
@@ -468,8 +470,8 @@ plt.show()

The **lognormal distribution** is a distribution on $\left(0, \infty\right)$ with density

$$
p(x) = \frac{1}{\sigma x \sqrt{2\pi}}
\exp \left(- \frac{\left(\log x - \mu\right)^2}{2 \sigma^2} \right)
$$

This distribution has two parameters, $\mu$ and $\sigma$.
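A common gotcha is SciPy's parameterization: `scipy.stats.lognorm` takes the shape parameter `s` $= \sigma$ and `scale` $= e^\mu$. The sketch below (with illustrative values $\mu = 0.5$, $\sigma = 0.8$) checks that this matches the density above:

```python
import numpy as np
import scipy.stats

μ, σ = 0.5, 0.8              # illustrative parameter values

# SciPy's lognorm(s=σ, scale=exp(μ)) corresponds to the density above
u = scipy.stats.lognorm(s=σ, scale=np.exp(μ))

# Evaluate the lognormal density directly from the formula
def p(x):
    return (1 / (σ * x * np.sqrt(2 * np.pi))
            * np.exp(-(np.log(x) - μ)**2 / (2 * σ**2)))

for x in [0.1, 1.0, 2.5, 10.0]:
    assert abs(p(x) - u.pdf(x)) < 1e-12
```

The `scale` $= e^\mu$ choice reflects the fact that the lognormal is the distribution of $e^Z$ when $Z$ is normal with mean $\mu$ and standard deviation $\sigma$.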
@@ -530,8 +532,8 @@ plt.show()

The **exponential distribution** is a distribution supported on $\left(0, \infty\right)$ with density

$$
p(x) = \lambda \exp \left( - \lambda x \right)
\qquad (x > 0)
$$

This distribution has one parameter $\lambda$.
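Here too SciPy's parameterization differs from the formula: `scipy.stats.expon` is parameterized by `scale` $= 1/\lambda$ rather than by the rate. A quick check (with an illustrative rate $\lambda = 0.5$):

```python
import numpy as np
import scipy.stats

λ = 0.5                      # illustrative rate

# SciPy parameterizes the exponential by scale = 1/λ, not by the rate
u = scipy.stats.expon(scale=1/λ)

# Evaluate the exponential density directly from the formula above
def p(x):
    return λ * np.exp(-λ * x)

for x in [0.1, 1.0, 3.0]:
    assert abs(p(x) - u.pdf(x)) < 1e-12

# The mean of the exponential is 1/λ
print(u.mean())  # 2.0
```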
@@ -586,8 +588,8 @@ plt.show()

The **beta distribution** is a distribution on $(0, 1)$ with density