
Commit edc6f8e

Merge pull request #168 from mdiephuis/master
fixed two minor grammar errors and a python typo
2 parents 28f639d + 8e4a64c commit edc6f8e


2 files changed: +4 -4 lines changed


Chapter4_TheGreatestTheoremNeverTold/LawOfLargeNumbers.ipynb
+2 -2 lines changed

@@ -489,7 +489,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"One way to determine a prior on the upvote ratio is that look at the historical distribution of upvote ratios. This can be accomplished by scraping Reddit's comments and determining a distribution. There are a few problems with this technique though:\n",
+"One way to determine a prior on the upvote ratio is to look at the historical distribution of upvote ratios. This can be accomplished by scraping Reddit's comments and determining a distribution. There are a few problems with this technique though:\n",
 "\n",
 "1. Skewed data: The vast majority of comments have very few votes, hence there will be many comments with ratios near the extremes (see the \"triangular plot\" in the above Kaggle dataset), effectively skewing our distribution to the extremes. One could try to only use comments with votes greater than some threshold. Again, problems are encountered. There is a tradeoff between number of comments available to use and a higher threshold with associated ratio precision. \n",
 "2. Biased data: Reddit is composed of different subpages, called subreddits. Two examples are *r/aww*, which posts pics of cute animals, and *r/politics*. It is very likely that the user behaviour towards comments of these two subreddits are very different: visitors are likely friend and affectionate in the former, and would therefore upvote comments more, compared to the latter, where comments are likely to be controversial and disagreed upon. Therefore not all comments are the same. \n",
@@ -674,7 +674,7 @@
 "\n",
 "### Sorting!\n",
 "\n",
-"We have been ignoring the goal of this exercise: how do we sort the comments from *best to worst*? Of course, we cannot sort distributions, we must sort scalar numbers. There are many ways to distill a distribution down to a scalar: expressing the distribution through its expected value, or mean, is one way. Choosing the mean bad choice though. This is because the mean does not take into account the uncertainty of distributions.\n",
+"We have been ignoring the goal of this exercise: how do we sort the comments from *best to worst*? Of course, we cannot sort distributions, we must sort scalar numbers. There are many ways to distill a distribution down to a scalar: expressing the distribution through its expected value, or mean, is one way. Choosing the mean is a bad choice though. This is because the mean does not take into account the uncertainty of distributions.\n",
 "\n",
 "I suggest using the *95% least plausible value*, defined as the value such that there is only a 5% chance the true parameter is lower (think of the lower bound on the 95% credible region). Below are the posterior distributions with the 95% least-plausible value plotted:"
 ]
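For readers skimming the diff, the "95% least plausible value" mentioned in the hunk above reduces each comment's posterior to its 5th percentile, i.e. the lower bound of the 95% credible region. A minimal sketch of that idea, assuming posterior samples are already available as NumPy arrays (the comment names and Beta draws below are made up for illustration, not taken from the notebook):

```python
import numpy as np

# Hypothetical posterior samples of the upvote ratio for two comments;
# in the notebook these would come from the fitted model instead.
posterior_samples = {
    "comment_a": np.random.beta(30, 10, size=20_000),  # many votes, tight posterior
    "comment_b": np.random.beta(3, 1, size=20_000),    # few votes, wide posterior
}

def least_plausible_value(samples, q=5):
    # 5th percentile = lower bound of the 95% credible region
    return np.percentile(samples, q)

# Sort comments from best to worst by their 95% least plausible value.
ranked = sorted(posterior_samples,
                key=lambda c: least_plausible_value(posterior_samples[c]),
                reverse=True)
print(ranked)
```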

Chapter5_LossFunctions/LossFunctions.ipynb
+2 -2 lines changed

@@ -119,7 +119,7 @@
 "\n",
 "Notice that measuring your loss via an *expected value* uses more information from the distribution than the MAP estimate which, if you recall, will only find the maximum value of the distribution and ignore the shape of the distribution. Ignoring information can over-expose yourself to tail risks, like the unlikely hurricane, and leaves your estimate ignorant of how ignorant you really are about the parameter.\n",
 "\n",
-"Similarly, compare this with frequentist methods, that traditionally only aim to minimize the error, and not considering the *loss associated with the result of that error*. Compound this with the fact that frequentist methods are almost guaranteed to never be absolutely accurate. Bayesian point estimates fix this by planning ahead: your estimate is going to be wrong, you might as well err on the right side of wrong."
+"Similarly, compare this with frequentist methods, that traditionally only aim to minimize the error, and do not consider the *loss associated with the result of that error*. Compound this with the fact that frequentist methods are almost guaranteed to never be absolutely accurate. Bayesian point estimates fix this by planning ahead: your estimate is going to be wrong, you might as well err on the right side of wrong."
 ]
 },
 {
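As a side note on the hunk above: picking the point estimate that minimizes the *expected loss* over posterior samples, rather than taking the MAP, can be sketched in a few lines. Everything here (the toy posterior, the asymmetric loss function) is illustrative and not from the notebook:

```python
import numpy as np

# Toy posterior draws for some unknown parameter (stand-in for real MCMC samples).
samples = np.random.normal(0.05, 0.2, size=50_000)

def loss(true_value, estimate):
    # Asymmetric toy loss: overestimating costs twice as much as underestimating.
    err = estimate - true_value
    return np.where(err > 0, 2.0 * err, -err)

# Bayesian point estimate: the candidate minimizing the expected loss under the
# posterior, which uses the whole distribution rather than only its peak.
candidates = np.linspace(-1, 1, 201)
expected_loss = [loss(samples, c).mean() for c in candidates]
best = candidates[np.argmin(expected_loss)]
print("estimate minimizing expected loss:", round(best, 3))
```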
@@ -583,7 +583,7 @@
 "def stock_loss( true_return, yhat, alpha = 100. ):\n",
 "    if true_return*yhat < 0:\n",
 "        #opposite signs, not good\n",
-"        return alpha*yhat**2 - sign( true_return )*yhat \\\n",
+"        return alpha*yhat**2 - np.sign( true_return )*yhat \\\n",
 "                + abs( true_return ) \n",
 "    else:\n",
 "        return abs( true_return - yhat )\n",

0 commit comments
