Skip to content

Commit ab31e74

Browse files
committed
improve random values generation, improve display of math equations, replace univariate_ordered with ordered, filter future warnings
1 parent 670657a commit ab31e74

File tree

6 files changed

+882
-807
lines changed

6 files changed

+882
-807
lines changed

examples/case_studies/nyc_bym.ipynb

-1
Original file line numberDiff line numberDiff line change
@@ -634,7 +634,6 @@
634634
"outputs": [],
635635
"source": [
636636
"with pm.Model(coords=coords) as BYM_model:\n",
637-
"\n",
638637
" # intercept\n",
639638
" beta0 = pm.Normal(\"beta0\", 0, 1)\n",
640639
"\n",

examples/case_studies/nyc_bym.myst.md

-1
Original file line numberDiff line numberDiff line change
@@ -319,7 +319,6 @@ Lastly, we'll use a Poisson outcome distribution. The number of traffic accident
319319

320320
```{code-cell} ipython3
321321
with pm.Model(coords=coords) as BYM_model:
322-
323322
# intercept
324323
beta0 = pm.Normal("beta0", 0, 1)
325324

examples/generalized_linear_models/GLM-ordinal-regression.ipynb

+866-786
Large diffs are not rendered by default.

examples/generalized_linear_models/GLM-ordinal-regression.myst.md

+16-16
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,10 @@ kernelspec:
2020
:::
2121

2222
```{code-cell} ipython3
23+
import warnings
24+
25+
warnings.simplefilter(action="ignore", category=FutureWarning)
26+
2327
import arviz as az
2428
import matplotlib.pyplot as plt
2529
import numpy as np
@@ -28,7 +32,6 @@ import pymc as pm
2832
import pytensor.tensor as pt
2933
import statsmodels.api as sm
3034
31-
from scipy.stats import bernoulli
3235
from statsmodels.miscmodels.ordinal_model import OrderedModel
3336
```
3437

@@ -52,14 +55,11 @@ Ordinal Regression is a statistical technique designed to **model** these kinds
5255

5356
```{code-cell} ipython3
5457
def make_data():
55-
np.random.seed(100)
56-
salary = np.random.normal(40, 10, 500)
57-
work_sat = np.random.beta(1, 0.4, 500)
58-
work_from_home = bernoulli.rvs(0.7, size=500)
58+
salary = rng.normal(40, 10, 500)
59+
work_sat = rng.beta(1, 0.4, 500)
60+
work_from_home = rng.binomial(n=1, p=0.7, size=500)
5961
work_from_home_calc = np.where(work_from_home, 1.4 * work_from_home, work_from_home)
60-
latent_rating = (
61-
0.08423 * salary + 0.2 * work_sat + work_from_home_calc + np.random.normal(0, 1, 500)
62-
)
62+
latent_rating = 0.08423 * salary + 0.2 * work_sat + work_from_home_calc + rng.normal(0, 1, 500)
6363
explicit_rating = np.round(latent_rating, 0)
6464
df = pd.DataFrame(
6565
{
@@ -154,19 +154,19 @@ In the data set above we've explicitly specified the relationship, and in the fo
154154

155155
The model specification for ordinal regression models typically makes use of the the logit transformation and the cumulative probabilities implied. For $c$ outcome categories with probabilities $\pi_1, .... \pi_n$ the *cumulative logits* are defined:
156156

157-
$$ logit[P(Y \leq j)] = log \frac{P(Y \leq j)}{1 - p(Y \leq j)} = log \frac{\pi_1 + ... + \pi_j}{\pi_{j+1} + ... + \pi_n} \text{ where j = 1, ..., c-1} $$
157+
$$ \text{logit}[P(Y \leq j)] = \log \frac{P(Y \leq j)}{1 - p(Y \leq j)} = \log \frac{\pi_1 + ... + \pi_j}{\pi_{j+1} + ... + \pi_n} \text{ where j = 1, ..., c-1} $$
158158

159159
This gets employed in a regression context where we specify the factors which determine our latent outcome in a linear fashion:
160160

161-
$$ logit[P(Y \leq j)] = \alpha_{j} + \beta'x $$
161+
$$ \text{logit}[P(Y \leq j)] = \alpha_{j} + \beta'x $$
162162

163163
which implies that:
164164

165-
$$ P(Y \leq j) = \frac{exp(\alpha_{j} + \beta'x)}{1 + exp(\alpha_{j} + \beta'x)} $$
165+
$$ P(Y \leq j) = \frac{\exp(\alpha_{j} + \beta'x)}{1 + \exp(\alpha_{j} + \beta'x)} $$
166166

167167
and that the probability for belonging within a particular category $j$ is determined by the probability of being in the cell defined by:
168168

169-
$$ P(Y = j) = \frac{exp(\alpha_{j} + \beta'x)}{1 + exp(\alpha_{j} + \beta'x)} - \frac{exp(\alpha_{j-1} + \beta'x)}{1 + exp(\alpha_{j-1} + \beta'x)} $$
169+
$$ P(Y = j) = \frac{\exp(\alpha_{j} + \beta'x)}{1 + \exp(\alpha_{j} + \beta'x)} - \frac{\exp(\alpha_{j-1} + \beta'x)}{1 + \exp(\alpha_{j-1} + \beta'x)} $$
170170

171171
One nice feature of ordinal regressions specified in this fashion is that the interpretation of the coefficients on the beta terms remain the same across each interval on the latent space. The interpretaiton of the model parameters is typical: a unit increase in $x_{k}$ corresponds to an increase in $Y_{latent}$ of $\beta_{k}$ Similar interpretation holds for probit regression specification too. However we must be careful about comparing the interpretation of coefficients across different model specifications with different variables. The above coefficient interpretation makes sense as conditional interpretation based on holding fixed precisely the variables in the model. Adding or removing variables changes the conditionalisation which breaks the comparability of the models due the phenomena of non-collapsability. We'll show below how it's better to compare the models on their predictive implications using the posterior predictive distribution.
172172

@@ -207,7 +207,7 @@ def make_model(priors, model_spec=1, constrained_uniform=False, logit=True):
207207
"cutpoints",
208208
mu=priors["mu"],
209209
sigma=sigma,
210-
transform=pm.distributions.transforms.univariate_ordered,
210+
transform=pm.distributions.transforms.ordered,
211211
)
212212
213213
if model_spec == 1:
@@ -421,7 +421,7 @@ calc_not_wfh = [
421421
+ 0 * betas_posterior[2, :]
422422
for i in range(500)
423423
]
424-
sal = np.random.normal(25, 5, 500)
424+
sal = rng.normal(25, 5, 500)
425425
calc_wfh_and_low_sal = [
426426
sal[i] * betas_posterior[0, :]
427427
+ df.iloc[i]["work_sat"] * betas_posterior[1, :]
@@ -505,8 +505,6 @@ def constrainedUniform(N, group, min=0, max=1):
505505
We will fit this data with both an ordinal model and as a metric. This will show how the ordinal fit is subtantially more compelling.
506506

507507
```{code-cell} ipython3
508-
:tags: [hide-output]
509-
510508
K = 5
511509
movies_by_rating = movies_by_rating[movies_by_rating["movie_id"].isin([1, 2, 3, 4, 5, 6])]
512510
indx, unique = pd.factorize(movies_by_rating["movie_id"])
@@ -641,7 +639,9 @@ In this notebook we've seen how to build ordinal regression models with PyMC and
641639
+++
642640

643641
## Authors
642+
644643
- Authored by [Nathaniel Forde](https://github.com/NathanielF) in June 2023
644+
- Updated by [Miha Gazvoda](https://mihagazvoda.com) in September 2023
645645

646646
+++
647647

examples/howto/model_builder.ipynb

-2
Original file line numberDiff line numberDiff line change
@@ -46,7 +46,6 @@
4646
"metadata": {},
4747
"outputs": [],
4848
"source": [
49-
5049
"from typing import Dict, List, Optional, Tuple, Union\n",
5150
"\n",
5251
"import arviz as az\n",
@@ -206,7 +205,6 @@
206205
" self._generate_and_preprocess_model_data(X_values, y_values)\n",
207206
"\n",
208207
" with pm.Model(coords=self.model_coords) as self.model:\n",
209-
"\n",
210208
" # Create mutable data containers\n",
211209
" x_data = pm.MutableData(\"x_data\", X_values)\n",
212210
" y_data = pm.MutableData(\"y_data\", y_values)\n",

examples/howto/model_builder.myst.md

-1
Original file line numberDiff line numberDiff line change
@@ -119,7 +119,6 @@ class LinearModel(ModelBuilder):
119119
self._generate_and_preprocess_model_data(X_values, y_values)
120120
121121
with pm.Model(coords=self.model_coords) as self.model:
122-
123122
# Create mutable data containers
124123
x_data = pm.MutableData("x_data", X_values)
125124
y_data = pm.MutableData("y_data", y_values)

0 commit comments

Comments
 (0)