|
2 | 2 | "cells": [
|
3 | 3 | {
|
4 | 4 | "cell_type": "code",
|
5 |
| - "execution_count": 1, |
| 5 | + "execution_count": null, |
6 | 6 | "metadata": {
|
7 | 7 | "collapsed": false
|
8 | 8 | },
|
9 |
| - "outputs": [ |
10 |
| - { |
11 |
| - "name": "stderr", |
12 |
| - "output_type": "stream", |
13 |
| - "text": [ |
14 |
| - ":0: FutureWarning: IPython widgets are experimental and may change in the future.\n" |
15 |
| - ] |
16 |
| - } |
17 |
| - ], |
| 9 | + "outputs": [], |
18 | 10 | "source": [
|
19 | 11 | "%matplotlib inline\n",
|
20 | 12 | "from numpy import linspace, log, exp\n",
|
21 | 13 | "from numpy.random import normal\n",
|
22 |
| - "from scipy import stats\n", |
23 | 14 | "from lmfit import Model\n",
|
24 | 15 | "import seaborn as sns\n",
|
25 | 16 | "sns.set_style('ticks')\n",
|
|
30 | 21 | "cell_type": "markdown",
|
31 | 22 | "metadata": {},
|
32 | 23 | "source": [
|
33 |
| - "The likelihood ratio test works like this (based on [this](http://www.stat.sc.edu/~habing/courses/703/GLRTExample.pdf)):\n", |
| 24 | + "# The likelihood ratio test: Python implementation" |
| 25 | + ] |
| 26 | + }, |
| 27 | + { |
| 28 | + "cell_type": "markdown", |
| 29 | + "metadata": {}, |
| 30 | + "source": [ |
| 31 | + "For two models, one nested in the other (meaning that the nested model estimated parameters are a subset of the nesting model), the test statistic $D$ is (based on [this](http://www.stat.sc.edu/~habing/courses/703/GLRTExample.pdf)):\n", |
34 | 32 | "\n",
|
35 | 33 | "$$\n",
|
36 |
| - "D = -2 log \\Lambda = -2 log ( \\Big(\\frac{\\sum{(X_i - \\hat{X_i}(\\theta_1))^2}}{\\sum{(X_i - \\hat{X_i}(\\theta_0))^2}}\\Big)^{n/2} ) \\\\\n", |
37 |
| - "lim_{n \\to \\infty} D = \\chi^2_{df=\\Delta}\n", |
| 34 | + "\\Lambda = \\Big( \\Big(\\frac{\\sum{(X_i - \\hat{X_i}(\\theta_1))^2}}{\\sum{(X_i - \\hat{X_i}(\\theta_0))^2}}\\Big)^{n/2} \\Big) \\\\\n", |
| 35 | + "D = -2 log \\Lambda \\\\\n", |
| 36 | + "lim_{n \\to \\infty} D \\sim \\chi^2_{df=\\Delta}\n", |
38 | 37 | "$$\n",
|
39 | 38 | "\n",
|
40 |
| - "where $\\Lambda$ is the likelihood ratio, $D$ is the statistic, $X_{i}$ are the data points, $X_i(\\theta)$ is the model prediction with parameters $\\theta$, $\\theta_i$ is the parameters estimation for model $i$, $n$ is the number of data points and $\\Delta$ is the difference in number of parameters between the models." |
| 39 | + "where $\\Lambda$ is the likelihood ratio, $D$ is the statistic, $X_{i}$ are the data points, $\\hat{X_i}(\\theta)$ is the model prediction with parameters $\\theta$, $\\theta_i$ is the parameters estimation for model $i$, $n$ is the number of data points and $\\Delta$ is the difference in number of parameters between the models." |
| 40 | + ] |
| 41 | + }, |
| 42 | + { |
| 43 | + "cell_type": "markdown", |
| 44 | + "metadata": {}, |
| 45 | + "source": [ |
| 46 | + "The python implementation below compares between two `lmfit.ModelFit` objects. These are the results of fitting models to the same data set using the [`lmfit` package](lmfit.github.io/lmfit-py/). \n", |
| 47 | + "\n", |
| 48 | + "The function compares between model fit `m0` and `m1` and assumes that `m0` is nested in `m1`, meaning that the set of varying parameters of `m0` is a subset of the varying parameters of `m1`. The property `chisqr` of the `ModelFit` objects is the sum of the square of the residuals of the fit. `ndata` is the number of data points. `nvarys` is the number of varying parameters." |
41 | 49 | ]
|
42 | 50 | },
|
43 | 51 | {
|
|
66 | 74 | " return stats.chisqprob(D, ddf), D, ddf"
|
67 | 75 | ]
|
68 | 76 | },
|
| 77 | + { |
| 78 | + "cell_type": "markdown", |
| 79 | + "metadata": {}, |
| 80 | + "source": [ |
| 81 | + "## Test on a simple model\n", |
| 82 | + "\n", |
| 83 | + "We test the function on data generated from the equation:\n", |
| 84 | + "\n", |
| 85 | + "$$\n", |
| 86 | + "y = b + e^{-a t} + N(0, \\sigma^2)\n", |
| 87 | + "$$\n", |
| 88 | + "\n", |
| 89 | + "where $a$ and $b$ are the parameters, $t$ is the variable, and $N$ is the normal distribution.\n", |
| 90 | + "\n", |
| 91 | + "We fit a nesting model `model_fit1` (the alternative hypothesis of the test). This model estimates both $a$ and $b$.\n", |
| 92 | + "We also fit a nested model `model_fit0` (the nul hypothesis of the test) in which either $a$ or $b$ is fixed at an initial value.\n", |
| 93 | + "We than plot both model fits, print the estimated parameters, the test statistic and the p-value of the test." |
| 94 | + ] |
| 95 | + }, |
69 | 96 | {
|
70 | 97 | "cell_type": "code",
|
71 | 98 | "execution_count": 4,
|
|
5134 | 5161 | "source": [
|
5135 | 5162 | "## Compare with R\n",
|
5136 | 5163 | "\n",
|
| 5164 | + "Here we compare the Python implementation with the R implementation - `lmtest::lrtest`. The data is generated from a quadratic polynomial:\n", |
| 5165 | + "\n", |
| 5166 | + "$$\n", |
| 5167 | + "y = at^2 + bt + c + N(0,\\sigma^2)\n", |
| 5168 | + "$$\n", |
| 5169 | + "\n", |
| 5170 | + "and we fit two models - the nested model is a linear function and the nesting model is a quadratic polynomial.\n", |
| 5171 | + "\n", |
| 5172 | + "### Note\n", |
| 5173 | + "\n", |
5137 | 5174 | "In order to get this to work you will need to make sure the `lmtest` package is installed (don't install via RStudio - use regular command line R) and that the %R_HOME% and %R_USER% are set to where R is running from the command line and where the packages are installed to when running from the command line."
|
5138 | 5175 | ]
|
5139 | 5176 | },
|
|
5232 | 5269 | " %Rpull pvalue\n",
|
5233 | 5270 | " print pvalue[0]"
|
5234 | 5271 | ]
|
| 5272 | + }, |
| 5273 | + { |
| 5274 | + "cell_type": "markdown", |
| 5275 | + "metadata": {}, |
| 5276 | + "source": [ |
| 5277 | + "## License\n", |
| 5278 | + "\n", |
| 5279 | + "This notebook was written by [Yoav Ram](http://www.yoavram.com) and [Uri Obolski](https://sites.google.com/site/hadanylab/people/uri-obolski). The latest version can be found at [ipython.yoavram.com](http://nbviewer.ipython.org/github/yoavram/ipython-notebooks/blob/master/likelihood%20ratio%20test.ipynb). \n", |
| 5280 | + "\n", |
| 5281 | + "The code is released with a CC-BY-SA 3.0 license." |
| 5282 | + ] |
5235 | 5283 | }
|
5236 | 5284 | ],
|
5237 | 5285 | "metadata": {
|
|
0 commit comments