where we use $f$ to denote a generic probability density.
The statistical model {eq}`eq:themodel`-{eq}`eq:themodel_2` implies
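the following conditional and initial distributions (a sketch, assuming {eq}`eq:themodel` is the AR(1) law of motion $y_{t+1} = \rho y_{t} + \sigma_x \epsilon_{t+1}$ with $\epsilon_{t+1} \sim {\mathcal N}(0, 1)$ and {eq}`eq:themodel_2` is the initial condition $y_0 \sim {\mathcal N}(\mu_0, \sigma_0^2)$):

$$
\begin{aligned}
y_t \mid y_{t-1} &\sim {\mathcal N}\left(\rho y_{t-1}, \sigma_x^2\right), \qquad t = 1, \ldots, T \\
y_0 &\sim {\mathcal N}\left(\mu_0, \sigma_0^2\right)
\end{aligned}
$$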
We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$ depend on what is assumed about the distribution of the initial value $y_0$.
Below, we study two widely used alternative assumptions:
- $(\mu_0,\sigma_0) = (y_0, 0)$ which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$; in effect, we are **conditioning on an observed initial value**.
- $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.
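In the second case, those functions can be written out explicitly (a short derivation, under the AR(1) law of motion assumed above): stationarity requires $\operatorname{Var}(y_0) = \rho^2 \operatorname{Var}(y_0) + \sigma_x^2$, so

$$
\mu_0 = 0, \qquad \sigma_0^2 = \frac{\sigma_x^2}{1-\rho^2} .
$$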
The unknown parameters are $\rho, \sigma_x$.
We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.
The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. Both libraries support NUTS samplers, which we use to generate draws from the posterior in a chain.
NUTS is a form of Markov Chain Monte Carlo (MCMC) algorithm that bypasses random walk behaviour and allows for faster convergence to a target distribution. This not only has the advantage of speed, but also allows complex models to be fitted without requiring specialised knowledge of the theory underlying those fitting methods.
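To make this concrete, here is a minimal `pymc` sketch of a posterior sampler for the case in which we condition on the observed $y_0$; the data array, prior families, and sampler settings are illustrative placeholders rather than the notebook's exact choices.

```python
import numpy as np
import pymc as pmc   # pymc version 4+ is imported as `pymc`

# placeholder sample {y_t}_{t=0}^T; in the notebook this comes from the AR(1) simulation
y = np.random.default_rng(0).normal(size=101)

with pmc.Model() as AR1_model:
    # illustrative priors for the unknown parameters
    rho = pmc.Uniform('rho', lower=-1., upper=1.)
    sigma_x = pmc.HalfNormal('sigma_x', sigma=10.)

    # likelihood of y_1, ..., y_T given the previous observation;
    # conditioning on y_0 means we never model f(y_0)
    y_like = pmc.Normal('y_obs', mu=rho * y[:-1], sigma=sigma_x, observed=y[1:])

    # pmc.sample uses a NUTS sampler by default for continuous parameters
    trace = pmc.sample(draws=1000, tune=1000, chains=2, random_seed=1)
```

Under the alternative assumption, one would additionally place a distribution on $y_0$ inside the same model block.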
Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:
- A first procedure is to condition on whatever value of $y_0$ is observed. This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.
- A second procedure assumes that $y_0$ is drawn from the stationary distribution of the process described by {eq}`eq:themodel`, so that $y_0 \sim {\cal N} \left(0, \frac{\sigma_x^2}{1-\rho^2} \right)$
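In terms of the generic density $f$ introduced earlier, the two procedures evaluate different likelihoods (a sketch of the factorization):

$$
\begin{aligned}
\text{first procedure:} \quad & f(y_T, y_{T-1}, \ldots, y_1 \mid y_0) = \prod_{t=1}^{T} f(y_t \mid y_{t-1}) \\
\text{second procedure:} \quad & f(y_T, y_{T-1}, \ldots, y_1, y_0) = f(y_0) \prod_{t=1}^{T} f(y_t \mid y_{t-1})
\end{aligned}
$$

where in the second line $f(y_0)$ is the density of the stationary distribution ${\mathcal N}\left(0, \frac{\sigma_x^2}{1-\rho^2}\right)$; conditioning in the first line amounts to setting $f(y_0) = 1$.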
When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.
Basically, when $y_0$ happens to be in a tail of the stationary distribution and we **don't condition on $y_0$**, the likelihood function for $\{y_t\}_{t=0}^T$ adjusts the posterior distribution of the parameter pair $\rho, \sigma_x $ to make the observed value of $y_0$ more likely than it really is under the stationary distribution, thereby adversely twisting the posterior in short samples.
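One way to see the mechanism is to look at the term that $f(y_0)$ contributes to the log likelihood when $y_0$ is assigned the stationary distribution ${\mathcal N}\left(0, \frac{\sigma_x^2}{1-\rho^2}\right)$ (a sketch):

$$
\log f(y_0) = -\frac{1}{2}\log\left(2\pi \frac{\sigma_x^2}{1-\rho^2}\right) - \frac{(1-\rho^2)\, y_0^2}{2 \sigma_x^2} .
$$

When $y_0^2$ is large relative to $\frac{\sigma_x^2}{1-\rho^2}$, this term is increased by moving $\rho$ toward $1$ or raising $\sigma_x$, i.e., by parameter values that spread the stationary distribution out enough to make such a $y_0$ plausible; that is the pull on the posterior described above.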
We begin by solving a **direct problem** that simulates an AR(1) process.
How we select the initial value $y_0$ matters.
* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$. Why? Because $y_0$ contains information about $\rho, \sigma_x$.
* If we suspect that $y_0$ is far in the tails of the stationary distribution -- so that variation in early observations in the sample has a significant **transient component** -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.
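As a concrete illustration of the direct problem, here is a minimal simulation sketch; the parameter values and the deliberately extreme initial condition are illustrative assumptions, not the notebook's settings.

```python
import numpy as np

def simulate_ar1(rho, sigma_x, y0, T, seed=0):
    """Simulate y_{t+1} = rho * y_t + sigma_x * eps_{t+1}, t = 0, ..., T-1, from a given y_0."""
    rng = np.random.default_rng(seed)
    y = np.empty(T + 1)
    y[0] = y0
    for t in range(T):
        y[t + 1] = rho * y[t] + sigma_x * rng.standard_normal()
    return y

rho, sigma_x, T = 0.5, 1.0, 100
sigma_stat = sigma_x / np.sqrt(1 - rho**2)      # std. dev. of the stationary distribution

# y_0 drawn from the stationary distribution N(0, sigma_stat**2) ...
y_stationary_start = simulate_ar1(rho, sigma_x, np.random.default_rng(1).normal(0, sigma_stat), T)

# ... versus y_0 placed far out in a tail of that distribution
y_tail_start = simulate_ar1(rho, sigma_x, 10 * sigma_stat, T)
```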
Please note how the posterior for $\rho$ has shifted to the right relative to when we conditioned on $y_0$ instead of assuming that $y_0$ is drawn from the stationary distribution.
Think about why this happens.
```{hint}
It is connected to how Bayes' Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values that make the observed sample more likely.
```