fixed two minor grammar errors and a python typo #168

Merged
merged 2 commits into from
Jan 14, 2014
4 changes: 2 additions & 2 deletions Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb
@@ -489,7 +489,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to determine a prior on the upvote ratio is that look at the historical distribution of upvote ratios. This can be accomplished by scraping Reddit's comments and determining a distribution. There are a few problems with this technique though:\n",
"One way to determine a prior on the upvote ratio is to look at the historical distribution of upvote ratios. This can be accomplished by scraping Reddit's comments and determining a distribution. There are a few problems with this technique though:\n",
"\n",
"1. Skewed data: The vast majority of comments have very few votes, hence there will be many comments with ratios near the extremes (see the \"triangular plot\" in the above Kaggle dataset), effectively skewing our distribution to the extremes. One could try to only use comments with votes greater than some threshold. Again, problems are encountered. There is a tradeoff between number of comments available to use and a higher threshold with associated ratio precision. \n",
"2. Biased data: Reddit is composed of different subpages, called subreddits. Two examples are *r/aww*, which posts pics of cute animals, and *r/politics*. It is very likely that the user behaviour towards comments of these two subreddits are very different: visitors are likely friend and affectionate in the former, and would therefore upvote comments more, compared to the latter, where comments are likely to be controversial and disagreed upon. Therefore not all comments are the same. \n",
@@ -674,7 +674,7 @@
"\n",
"### Sorting!\n",
"\n",
"We have been ignoring the goal of this exercise: how do we sort the comments from *best to worst*? Of course, we cannot sort distributions, we must sort scalar numbers. There are many ways to distill a distribution down to a scalar: expressing the distribution through its expected value, or mean, is one way. Choosing the mean bad choice though. This is because the mean does not take into account the uncertainty of distributions.\n",
"We have been ignoring the goal of this exercise: how do we sort the comments from *best to worst*? Of course, we cannot sort distributions, we must sort scalar numbers. There are many ways to distill a distribution down to a scalar: expressing the distribution through its expected value, or mean, is one way. Choosing the mean is a bad choice though. This is because the mean does not take into account the uncertainty of distributions.\n",
"\n",
"I suggest using the *95% least plausible value*, defined as the value such that there is only a 5% chance the true parameter is lower (think of the lower bound on the 95% credible region). Below are the posterior distributions with the 95% least-plausible value plotted:"
]
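The reworded cell above proposes sorting comments by the *95% least plausible value*. As a minimal sketch of how that scalar could be computed from a posterior (the Beta parameters and sample count below are made up for illustration, not taken from the notebook):

```python
import numpy as np

# Hypothetical posterior samples of one comment's true upvote ratio;
# in the notebook these would come from the MCMC trace.
posterior_samples = np.random.beta(30, 10, size=20000)

# 95% least plausible value: the point with only a 5% chance that the
# true ratio lies below it, i.e. the 5th percentile of the posterior.
least_plausible = np.percentile(posterior_samples, 5)
print(least_plausible)
```

Sorting every comment by this quantity rewards posteriors that are both high and tight, since wide, uncertain posteriors get dragged down by their lower tail.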
4 changes: 2 additions & 2 deletions Chapter5_LossFunctions/LossFunctions.ipynb
@@ -119,7 +119,7 @@
"\n",
"Notice that measuring your loss via an *expected value* uses more information from the distribution than the MAP estimate which, if you recall, will only find the maximum value of the distribution and ignore the shape of the distribution. Ignoring information can over-expose yourself to tail risks, like the unlikely hurricane, and leaves your estimate ignorant of how ignorant you really are about the parameter.\n",
"\n",
"Similarly, compare this with frequentist methods, that traditionally only aim to minimize the error, and not considering the *loss associated with the result of that error*. Compound this with the fact that frequentist methods are almost guaranteed to never be absolutely accurate. Bayesian point estimates fix this by planning ahead: your estimate is going to be wrong, you might as well err on the right side of wrong."
"Similarly, compare this with frequentist methods, that traditionally only aim to minimize the error, and do not consider the *loss associated with the result of that error*. Compound this with the fact that frequentist methods are almost guaranteed to never be absolutely accurate. Bayesian point estimates fix this by planning ahead: your estimate is going to be wrong, you might as well err on the right side of wrong."
]
},
{
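The edited sentence above contrasts expected-loss point estimates with estimates that ignore the loss attached to being wrong. A rough sketch of choosing an estimate by minimizing expected loss over posterior samples (the Gaussian samples and squared loss here are illustrative assumptions, not the notebook's model):

```python
import numpy as np

# Illustrative posterior samples for an unknown parameter (in the notebook
# these would be MCMC trace samples).
samples = np.random.normal(loc=0.05, scale=0.1, size=50000)

def squared_loss(true_value, estimate):
    return (true_value - estimate) ** 2

def expected_loss(estimate, samples, loss=squared_loss):
    # Average the loss over the whole posterior, so uncertainty and tail
    # outcomes contribute -- unlike a MAP estimate, which only uses the
    # location of the posterior's peak.
    return loss(samples, estimate).mean()

# Choose the candidate estimate with the smallest expected loss.
candidates = np.linspace(-0.2, 0.3, 101)
best = min(candidates, key=lambda c: expected_loss(c, samples))
print("estimate minimizing expected loss:", best)
```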
@@ -583,7 +583,7 @@
"def stock_loss( true_return, yhat, alpha = 100. ):\n",
" if true_return*yhat < 0:\n",
" #opposite signs, not good\n",
" return alpha*yhat**2 - sign( true_return )*yhat \\\n",
" return alpha*yhat**2 - np.sign( true_return )*yhat \\\n",
" + abs( true_return ) \n",
" else:\n",
" return abs( true_return - yhat )\n",
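For reference, a self-contained version of the corrected function, with the same logic as the notebook's cell lightly reformatted (the diff above only swaps `sign` for `np.sign`), plus a quick check of the asymmetric penalty:

```python
import numpy as np

def stock_loss(true_return, yhat, alpha=100.):
    if true_return * yhat < 0:
        # Opposite signs: the prediction got the direction wrong, so the
        # penalty includes a term quadratic in the size of the prediction.
        return alpha * yhat ** 2 - np.sign(true_return) * yhat \
               + abs(true_return)
    else:
        return abs(true_return - yhat)

print(stock_loss(0.05, -0.04))  # wrong direction: ~0.25
print(stock_loss(0.05, 0.04))   # right direction: ~0.01
```

Without the `np.` prefix, `sign` would only resolve if it had been pulled into the namespace elsewhere (for example by a star import), which is the Python typo this pull request fixes.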