fix equation, added docs
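
For reference, the statistic documented in this commit can be sketched in plain Python. This is a minimal sketch, not the notebook's own function: `lr_test_from_rss` and `chi2_sf` are hypothetical names, the inputs stand in for the `chisqr`, `ndata` and `nvarys` values of two fits, and the p-value comes from the closed-form chi-square survival function for integer degrees of freedom rather than from a SciPy call.

```python
import math


def chi2_sf(x, df):
    """Survival function P(X > x) of a chi-square variable with integer df >= 1.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with y = x / 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)            # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test_from_rss(rss0, rss1, ndata, nvarys0, nvarys1):
    """Likelihood ratio test from residual sums of squares (hypothetical helper).

    rss0, nvarys0 describe the nested (null) fit; rss1, nvarys1 the nesting fit.
    Returns (p-value, D, difference in degrees of freedom).
    """
    if rss0 <= 0 or rss1 <= 0:
        raise ValueError("residual sums of squares must be positive")
    ddf = nvarys1 - nvarys0
    if ddf < 1:
        raise ValueError("the nesting model must have more varying parameters")
    # Lambda = (rss1 / rss0) ** (n / 2), so D = -2 log(Lambda) = n * log(rss0 / rss1)
    D = ndata * math.log(rss0 / rss1)
    return chi2_sf(D, ddf), D, ddf
```

For example, `lr_test_from_rss(20.0, 10.0, 10, 1, 2)` gives `D = 10 * log(2)`, about 6.93, with one degree of freedom.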

Yoav Ram · Yoav Ram · commit a734a053b6f0 · 2015-04-01T08:31:26.000+03:00
diff --git a/likelihood ratio test.ipynb b/likelihood ratio test.ipynb
@@ -2,24 +2,15 @@
  "cells": [
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": null,
    "metadata": {
     "collapsed": false
    },
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      ":0: FutureWarning: IPython widgets are experimental and may change in the future.\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "%matplotlib inline\n",
     "from numpy import linspace, log, exp\n",
     "from numpy.random import normal\n",
-    "from scipy import stats\n",
     "from lmfit import Model\n",
     "import seaborn as sns\n",
     "sns.set_style('ticks')\n",
@@ -30,14 +21,31 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The likelihood ratio test works like this (based on [this](http://www.stat.sc.edu/~habing/courses/703/GLRTExample.pdf)):\n",
+    "# The likelihood ratio test: Python implementation"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "For two models, one nested in the other (meaning that the estimated parameters of the nested model are a subset of those of the nesting model), the test statistic $D$ is (based on [this example](http://www.stat.sc.edu/~habing/courses/703/GLRTExample.pdf)):\n",
     "\n",
     "$$\n",
-    "D = -2 log \\Lambda = -2 log ( \\Big(\\frac{\\sum{(X_i - \\hat{X_i}(\\theta_1))^2}}{\\sum{(X_i - \\hat{X_i}(\\theta_0))^2}}\\Big)^{n/2} ) \\\\\n",
-    "lim_{n \\to \\infty} D = \\chi^2_{df=\\Delta}\n",
+    "\\Lambda = \\Big(\\frac{\\sum{(X_i - \\hat{X_i}(\\theta_1))^2}}{\\sum{(X_i - \\hat{X_i}(\\theta_0))^2}}\\Big)^{n/2} \\\\\n",
+    "D = -2 \\log \\Lambda \\\\\n",
+    "\\lim_{n \\to \\infty} D \\sim \\chi^2_{df=\\Delta}\n",
     "$$\n",
     "\n",
-    "where $\\Lambda$ is the likelihood ratio, $D$ is the statistic, $X_{i}$ are the data points, $X_i(\\theta)$ is the model prediction with parameters $\\theta$, $\\theta_i$ is the parameters estimation for model $i$, $n$ is the number of data points and $\\Delta$ is the difference in number of parameters between the models."
+    "where $\\Lambda$ is the likelihood ratio, $D$ is the test statistic, $X_{i}$ are the data points, $\\hat{X_i}(\\theta)$ is the model prediction with parameters $\\theta$, $\\theta_i$ is the vector of estimated parameters of model $i$, $n$ is the number of data points and $\\Delta$ is the difference in the number of estimated parameters between the models."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The Python implementation below compares two `lmfit.ModelFit` objects. These are the results of fitting models to the same data set using the [`lmfit` package](http://lmfit.github.io/lmfit-py/).\n",
+    "\n",
+    "The function compares the model fits `m0` and `m1` and assumes that `m0` is nested in `m1`, meaning that the set of varying parameters of `m0` is a subset of the varying parameters of `m1`. The `chisqr` property of a `ModelFit` object is the sum of the squared residuals of the fit, `ndata` is the number of data points, and `nvarys` is the number of varying parameters."
    ]
   },
   {
@@ -66,6 +74,25 @@
     "    return stats.chisqprob(D, ddf), D, ddf"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Test on a simple model\n",
+    "\n",
+    "We test the function on data generated from the equation:\n",
+    "\n",
+    "$$\n",
+    "y = b + e^{-a t} + N(0, \\sigma^2)\n",
+    "$$\n",
+    "\n",
+    "where $a$ and $b$ are the parameters, $t$ is the variable, and $N$ is the normal distribution.\n",
+    "\n",
+    "We fit a nesting model `model_fit1` (the alternative hypothesis of the test). This model estimates both $a$ and $b$.\n",
+    "We also fit a nested model `model_fit0` (the null hypothesis of the test) in which either $a$ or $b$ is fixed at an initial value.\n",
+    "We then plot both model fits and print the estimated parameters, the test statistic and the p-value of the test."
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -5134,6 +5161,16 @@
    "source": [
     "## Compare with R\n",
     "\n",
+    "Here we compare the Python implementation with the R implementation - `lmtest::lrtest`. The data is generated from a quadratic polynomial:\n",
+    "\n",
+    "$$\n",
+    "y = at^2 + bt + c + N(0,\\sigma^2)\n",
+    "$$\n",
+    "\n",
+    "and we fit two models - the nested model is a linear function and the nesting model is a quadratic polynomial.\n",
+    "\n",
+    "### Note\n",
+    "\n",
     "In order to get this to work you will need to make sure the `lmtest` package is installed (don't install via RStudio - use regular command line R) and that the %R_HOME% and %R_USER% are set to where R is running from the command line and where the packages are installed to when running from the command line."
    ]
   },
@@ -5232,6 +5269,17 @@
     "    %Rpull pvalue\n",
     "    print pvalue[0]"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## License\n",
+    "\n",
+    "This notebook was written by [Yoav Ram](http://www.yoavram.com) and [Uri Obolski](https://sites.google.com/site/hadanylab/people/uri-obolski). The latest version can be found at [ipython.yoavram.com](http://nbviewer.ipython.org/github/yoavram/ipython-notebooks/blob/master/likelihood%20ratio%20test.ipynb). \n",
+    "\n",
+    "The code is released under a CC-BY-SA 3.0 license."
+   ]
   }
  ],
  "metadata": {