|
32 | 32 | "\n",
|
33 | 33 | "* *child variable* are variables that are affected by other variables, i.e. are the subject of parent variables. \n",
|
34 | 34 | "\n",
|
35 |
| - "A wariables can be both a parents and child. For example, consider the PyMC code below." |
| 35 | + "A variable can be both a parent and child. For example, consider the PyMC code below." |
36 | 36 | ]
|
37 | 37 | },
|
38 | 38 | {
|
|
814 | 814 | "The special case when $N = 1$ corresponds to the Bernoulli distribution. If $ X\\ \\sim \\text{Ber}(p)$, then $X$ is 1 with probability $p$ and 0 with probability $1-p$.\n",
|
815 | 815 | "The Bernoulli distribution is useful for indicators, e.g. $Y = X\\alpha + (1-X)\\beta$ is $\\alpha$ with probability $p$ and $\\beta$ with probability $1-p$. \n",
|
816 | 816 | "\n",
|
817 |
| - "There is another connection between Bernoulli and Binomial random variables. If we have $X_1, X_2, ... , X_N$ Benoulli random variables with the same $p$, then $Z = X_1 + X_2 + ... + X_N \\sim \\text{Binomial}(N, p )$.\n", |
| 817 | + "There is another connection between Bernoulli and Binomial random variables. If we have $X_1, X_2, ... , X_N$ Bernoulli random variables with the same $p$, then $Z = X_1 + X_2 + ... + X_N \\sim \\text{Binomial}(N, p )$.\n", |
818 | 818 | "\n",
|
819 | 819 | "The expected value of a Bernoulli random variable is $p$. This can be seen by noting the the more general Binomial random variable has expected value $Np$ and setting $N=1$."
|
820 | 820 | ]
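To make the Bernoulli-Binomial connection above concrete, here is a minimal simulation sketch (the variable names are illustrative, not from the text):

    import numpy as np

    p, N = 0.3, 10
    # draw 10000 realizations of Z = X_1 + ... + X_N, with each X_i ~ Ber(p)
    bernoulli_draws = np.random.binomial(1, p, size=(10000, N))
    Z = bernoulli_draws.sum(axis=1)
    # Z behaves like a Binomial(N, p) variable: its sample mean is close to N*p = 3
    print(Z.mean())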
|
|
831 | 831 | "\n",
|
832 | 832 | "> In the interview process for each student, the student flips a coin, hidden from the interviewer. The student agrees to answer honestly if the coin comes up heads. Otherwise, if the coin comes up tails, the student (secretly) flips the coin again, and answers \"Yes, I did cheat\" if the coin flip lands heads, and \"No, I did not cheat\", if the coin flip lands tails. This way, the interviewer does not know if a \"Yes\" was the result of a guilty plea, or a Heads on a second coin toss. Thus privacy is preserved and the researchers receive honest answers. \n",
|
833 | 833 | "\n",
|
834 |
| - "I call this the Privacy Algorithm. One could of course argue that the interviewers are still receiving false data since some \"Yes\"'s are not confessions but instead randomness, but an alternative perspective is that the researchers are discarding approximately half of there original dataset since half of the responses will by noise. But they have gained a systematic data generation process that can be modeled. Furthermore, they do not have to incorporate (perhaps somewhat naively) the possibility of deceitful answers. We can use PyMC to dig through this noisy model, and find a posterior distribution for the true frequency of liars. " |
| 834 | + "I call this the Privacy Algorithm. One could of course argue that the interviewers are still receiving false data since some \"Yes\"'s are not confessions but instead randomness, but an alternative perspective is that the researchers are discarding approximately half of their original dataset since half of the responses will be noise. But they have gained a systematic data generation process that can be modeled. Furthermore, they do not have to incorporate (perhaps somewhat naively) the possibility of deceitful answers. We can use PyMC to dig through this noisy model, and find a posterior distribution for the true frequency of liars. " |
835 | 835 | ]
|
836 | 836 | },
|
837 | 837 | {
|
838 | 838 | "cell_type": "markdown",
|
839 | 839 | "metadata": {},
|
840 | 840 | "source": [
|
841 |
| - "Suppose 100 students are being surveyed for cheating, and we wish to find $p$, the proportion of cheaters. There a few ways we can model this in PyMC. I'll demonstrate the most explict way, and later show a simplified version. Both versions arrive at the same inference. In our data-generation model, we sample $p$, the true proportion of cheaters, from a prior. Since we are quite ignorrant about $p$, we will assign it a $\\text{Uniform}(0,1)$ prior." |
| 841 | + "Suppose 100 students are being surveyed for cheating, and we wish to find $p$, the proportion of cheaters. There a few ways we can model this in PyMC. I'll demonstrate the most explict way, and later show a simplified version. Both versions arrive at the same inference. In our data-generation model, we sample $p$, the true proportion of cheaters, from a prior. Since we are quite ignorant about $p$, we will assign it a $\\text{Uniform}(0,1)$ prior." |
842 | 842 | ]
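In PyMC's syntax, that prior might look like the following sketch (the variable name `freq_cheating` is an assumption for illustration):

    import pymc as mc

    # an uninformative Uniform(0, 1) prior on the true proportion of cheaters
    p = mc.Uniform("freq_cheating", 0, 1)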
|
843 | 843 | },
|
844 | 844 | {
|
|
955 | 955 | "cell_type": "markdown",
|
956 | 956 | "metadata": {},
|
957 | 957 | "source": [
|
958 |
| - "The line `t_a*fc + (1-fc)*sc` contains the heart of the Privacy algorithm. Elements in this array are 1 *if and only if* the first toss is i) the first toss is heads and the student cheated or ii) the first toss is tails, and the second is heads, and are 0 else. Summing this vector and dividing by `float(N)` produces a propotion. " |
| 958 | + "The line `t_a*fc + (1-fc)*sc` contains the heart of the Privacy algorithm. Elements in this array are 1 *if and only if* i) the first toss is heads and the student cheated or ii) the first toss is tails, and the second is heads, and are 0 else. Summing this vector and dividing by `float(N)` produces a propotion. " |
959 | 959 | ]
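As a plain NumPy illustration of that indicator logic, outside of PyMC and with made-up arrays:

    import numpy as np

    fc  = np.array([1, 1, 0, 0])   # first coin:  1 = heads, 0 = tails
    t_a = np.array([1, 0, 1, 0])   # true answer: 1 = cheated, 0 = did not cheat
    sc  = np.array([0, 1, 1, 0])   # second coin: 1 = heads, 0 = tails
    # heads on the first toss reveals the true answer;
    # tails on the first toss reveals the second toss instead
    observed = t_a*fc + (1 - fc)*sc
    print(observed)                          # [1 0 1 0]
    print(observed.sum() / float(len(fc)))  # the observed proportion, 0.5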
|
960 | 960 | },
|
961 | 961 | {
|
|
981 | 981 | "cell_type": "markdown",
|
982 | 982 | "metadata": {},
|
983 | 983 | "source": [
|
984 |
| - "Next we need a dataset. After performing our coin-flipped interviews the researchers received 35 \"Yes\" responses. To put this into a relative perspective, if there truely were no cheaters, we should expect to see on average 1/4 of all responses being a \"Yes\" (half chance of having first coin land Tails, and another half chance of having second coin land Heads), so about 25 responses in a cheat-free world. On the other hand, if *all students cheated*, we should expected to see on approximately 3/4 of all response be \"Yes\". \n", |
| 984 | + "Next we need a dataset. After performing our coin-flipped interviews the researchers received 35 \"Yes\" responses. To put this into a relative perspective, if there truly were no cheaters, we should expect to see on average 1/4 of all responses being a \"Yes\" (half chance of having first coin land Tails, and another half chance of having second coin land Heads), so about 25 responses in a cheat-free world. On the other hand, if *all students cheated*, we should expected to see on approximately 3/4 of all response be \"Yes\". \n", |
985 | 985 | "\n",
|
986 | 986 | "The researchers observe a Binomial random variable, with `N = 100` and `p = observed_proportion` with `value = 35`: "
|
987 | 987 | ]
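In PyMC that observed variable could be written along these lines (a sketch; `observed_proportion` is the deterministic variable built earlier, and the name `obs` is mine):

    X = 35   # number of "Yes" responses
    N = 100  # number of students surveyed
    # the observed count, modeled as Binomial and fixed at its observed value
    observations = mc.Binomial("obs", N, observed_proportion, observed=True, value=X)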
|
|
1067 | 1067 | "source": [
|
1068 | 1068 | "With regards to the above plot, we are still pretty uncertain about what the true frequency of cheaters might be, but we have narrowed it down to a range between 0.05 to 0.35 (marked by the dashed lines). This is pretty good, as *a priori* we had no idea how many students might have cheated (hence the uniform distribution for our prior). On the other hand, it is also pretty bad since there is a .3 length window the true value most likely lives in. Have we even gained anything, or are we still too uncertain about the true frequency? \n",
|
1069 | 1069 | "\n",
|
1070 |
| - "I would argue, yes, we have discovered something. It is implausbile, according to our posterior that there are *no cheaters*, i.e. the posterior assigns low probability probability to $p=0$. Since we started with a uniform prior, treating all values of $p$ as equally plausible, but the data ruled out $p=0$ as a possibility. We can be confident that there were cheaters. \n", |
| 1070 | + "I would argue, yes, we have discovered something. It is implausible, according to our posterior, that there are *no cheaters*, i.e. the posterior assigns low probability to $p=0$. Since we started with an uniform prior, treating all values of $p$ as equally plausible, but the data ruled out $p=0$ as a possibility, we can be confident that there were cheaters. \n", |
1071 | 1071 | "\n",
|
1072 |
| - "This kind of algorithm can be used to gather private information from users and be *reasonablly* confident that the data, though noisy, is truthful. \n", |
| 1072 | + "This kind of algorithm can be used to gather private information from users and be *reasonably* confident that the data, though noisy, is truthful. \n", |
1073 | 1073 | "\n"
|
1074 | 1074 | ]
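One way to quantify how implausible $p=0$ is under the posterior is to inspect the MCMC samples directly; a sketch, assuming the trace of $p$ is available as an array `p_trace`:

    # fraction of posterior samples of p below 0.05;
    # a tiny fraction means a cheat-free world is implausible
    print((p_trace < 0.05).mean())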
|
1075 | 1075 | },
|
|
1197 | 1197 | "\n",
|
1198 | 1198 | "#### Protip: *Lighter* deterministic variables with `Lambda` class\n",
|
1199 | 1199 | "\n",
|
1200 |
| - "Sometimes writing a deterministic function using the `@mc.deterministic` decorator can seem like a chore, especially for a small function. I have already mentioned that elementary math operations *can* produce deterministic variables implicitly, but what about operations like indexing or slicing? Built-in `Lambda` functions can handle this with elegance and simplicity required. For example, \n", |
| 1200 | + "Sometimes writing a deterministic function using the `@mc.deterministic` decorator can seem like a chore, especially for a small function. I have already mentioned that elementary math operations *can* produce deterministic variables implicitly, but what about operations like indexing or slicing? Built-in `Lambda` functions can handle this with the elegance and simplicity required. For example, \n", |
1201 | 1201 | "\n",
|
1202 | 1202 | "    beta = mc.Normal( \"coefficients\", 0, 1, size=(N,1) )\n",
|
1203 | 1203 | "    x = np.random.randn( N, 1 )\n",
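The snippet might then continue with a one-line deterministic variable (a sketch; the name `lc` and the exact lambda are assumptions, not taken from this excerpt):

    # wrap a dot product into a deterministic variable without a decorator
    linear_combination = mc.Lambda("lc", lambda x=x, beta=beta: np.dot(x.T, beta))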
|
|
1400 | 1400 | "source": [
|
1401 | 1401 | "Adding a constant term $\\alpha$ amounts to shifting the curve left or right (hence why it is called a *bias*. )\n",
|
1402 | 1402 | "\n",
|
1403 |
| - "Let's start modeling this in PyMC. The $\\beta, \\alpha$ paramters have no reason to be positive, bounded or relativly large, so they are best modeled by a *Normal random variable*, introduced next." |
| 1403 | + "Let's start modeling this in PyMC. The $\\beta, \\alpha$ paramters have no reason to be positive, bounded or relatively large, so they are best modeled by a *Normal random variable*, introduced next." |
1404 | 1404 | ]
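For reference, a logistic function with a bias term can be sketched as below (the sign convention inside the exponential is one common choice, not necessarily the one used later):

    import numpy as np

    def logistic(x, beta, alpha=0.0):
        # alpha shifts the curve left or right; beta controls its steepness
        return 1.0 / (1.0 + np.exp(beta * x + alpha))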
|
1405 | 1405 | },
|
1406 | 1406 | {
|
|
1409 | 1409 | "source": [
|
1410 | 1410 | "### Normal distributions\n",
|
1411 | 1411 | "\n",
|
1412 |
| - "A Normal random variable, denoted $X \\sim N(\\mu, 1/\\tau)$, has a distribution with two parameters: the mean, $\\mu$, and the *precision*, $\\tau$. Those familar with the Normal distribution already have probably seen $\\sigma^2$ instead of $\\tau$. They are infact reciprocals of each other. The change was motivated by simplier mathematical analysis and is an artifact of older Bayesian methods. Just remember: The smaller $\\tau$, the larger the spread of the distribution (i.e. we are more uncertain); the larger $\\tau$, the tighter the distribution (i.e. we are more certain). Regardless, $\\tau$ is always positive. \n", |
| 1412 | + "A Normal random variable, denoted $X \\sim N(\\mu, 1/\\tau)$, has a distribution with two parameters: the mean, $\\mu$, and the *precision*, $\\tau$. Those familar with the Normal distribution already have probably seen $\\sigma^2$ instead of $\\tau$. They are in fact reciprocals of each other. The change was motivated by simpler mathematical analysis and is an artifact of older Bayesian methods. Just remember: The smaller $\\tau$, the larger the spread of the distribution (i.e. we are more uncertain); the larger $\\tau$, the tighter the distribution (i.e. we are more certain). Regardless, $\\tau$ is always positive. \n", |
1413 | 1413 | "\n",
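To keep the parameterization straight, a quick sketch of the correspondence between precision and standard deviation (names are mine):

    import numpy as np

    tau = 0.5
    sigma = 1.0 / np.sqrt(tau)  # tau = 1/sigma**2, so a small tau means a wide spread
    print(sigma)                # ~1.414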
|
1414 | 1414 | "The probability density function of a $N( \\mu, 1/\\tau)$ random variable is:\n",
|
1415 | 1415 | "\n",
|
|
1739 | 1739 | "source": [
|
1740 | 1740 | "### Is our model appropriate?\n",
|
1741 | 1741 | "\n",
|
1742 |
| - "The skeptical reader will say \"You delibrately chose the logistic function for $p(t)$ and the specific priors. Perhaps other functions or priors will give different results. How do I know I have chosen a good model?\" This is absolutely true. To consider an extreme situation, what if I had chosen the function $p(t) = 1,\\; \\forall t$, which garuntees a defect always occuring: I would have again predicted disaster on January 28th. Yet this is clearly a poorly chosen model. On the other hand, if I did choose the logistic function for $p(t)$, but specificed all my priors to be very tight around 0, likely we would have very different posterior distributions. How do we know our model is an expression of the data? This encourages us to measure the model's **goodness of fit**.\n", |
| 1742 | + "The skeptical reader will say \"You delibrately chose the logistic function for $p(t)$ and the specific priors. Perhaps other functions or priors will give different results. How do I know I have chosen a good model?\" This is absolutely true. To consider an extreme situation, what if I had chosen the function $p(t) = 1,\\; \\forall t$, which guarantees a defect always occuring: I would have again predicted disaster on January 28th. Yet this is clearly a poorly chosen model. On the other hand, if I did choose the logistic function for $p(t)$, but specificed all my priors to be very tight around 0, likely we would have very different posterior distributions. How do we know our model is an expression of the data? This encourages us to measure the model's **goodness of fit**.\n", |
1743 | 1743 | "\n",
|
1744 | 1744 | "We can think: *how can we test whether our model is a bad fit?* An idea is to compare observed data (which if we recall is a *fixed* stochastic variable) with artifical dataset which we can simulate. The rational is that if the simulated dataset does not appear similar, statistically, to the observed dataset, then likely our model is not accurately represented the observed data. \n",
|
1745 | 1745 | "\n",
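In PyMC, simulating such artificial datasets amounts to adding an *unobserved* copy of the data's distribution to the model; a sketch, assuming a posterior probability variable `p` as in the Challenger model:

    # an un-observed Bernoulli sharing the parent p with the observed data;
    # after MCMC, its trace contains simulated datasets drawn from the posterior
    simulated_data = mc.Bernoulli("simulation", p)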
|
|
1978 | 1978 | "\n",
|
1979 | 1979 | "The black vertical line is the expected number of defects we should observe, given this model. This allows the user to see how the total number of events predicted by the model compares to the actual number of events in the data.\n",
|
1980 | 1980 | "\n",
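That expected count is simply the sum of the per-observation posterior probabilities; a sketch, where `posterior_prob` is an assumed array holding the model's mean posterior probability of a defect for each data point:

    # summing per-observation defect probabilities gives the
    # expected total number of defects under the model
    expected_defects = posterior_prob.sum()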
|
1981 |
| - "Much more informative is it to compare this to separation plots for other models. Below we compare our model (top) versus three others:\n", |
| 1981 | + "It is much more informative to compare this to separation plots for other models. Below we compare our model (top) versus three others:\n", |
1982 | 1982 | "\n",
|
1983 | 1983 | "1. the perfect model, which predicts the posterior probability to be equal 1 if a defect did occur.\n",
|
1984 | 1984 | "2. a completely random model, which predicts random probabilities regardless of temperature.\n",
|
|