(moderation_analysis)=
:::{post} March, 2022
:tags: moderation, path analysis
:category: beginner
:author: Benjamin T. Vincent
:::
This notebook covers Bayesian moderation analysis. This is appropriate when we believe that one predictor variable (the moderator) may influence the linear relationship between another predictor variable and an outcome. Here we consider an example examining the relationship between hours of training and muscle mass, where it may be that age (the moderating variable) affects this relationship.
This is not intended as a one-stop solution to a wide variety of data analysis problems; rather, it is intended as an educational exposition to show how moderation analysis works and how to conduct Bayesian parameter estimation in PyMC.
Note that this is sometimes confused with mediation analysis. Mediation analysis is appropriate when we believe the effect of a predictor variable upon an outcome variable is (partially, or fully) mediated through a 3rd mediating variable. Readers are referred to the textbook by {cite:t}`hayes2017introduction` as a comprehensive (albeit frequentist) guide to moderation and related models, as well as the PyMC example {ref}`mediation_analysis`.
import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import xarray as xr
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize
az.style.use("arviz-darkgrid")
%config InlineBackend.figure_format = 'retina'
First, in the (hidden) code cell below, we define some helper functions for plotting that we will use later.
:tags: [hide-input]
def make_scalarMap(m):
"""Create a Matplotlib `ScalarMappable` so we can use a consistent colormap across both data points and posterior predictive lines. We can use `scalarMap.cmap` to use as a colormap, and `scalarMap.to_rgba(moderator_value)` to grab a colour for a given moderator value."""
return ScalarMappable(norm=Normalize(vmin=np.min(m), vmax=np.max(m)), cmap="viridis")
def plot_data(x, moderator, y, ax=None):
if ax is None:
fig, ax = plt.subplots(1, 1)
else:
fig = plt.gcf()
h = ax.scatter(x, y, c=moderator, cmap=scalarMap.cmap)
ax.set(xlabel="x", ylabel="y")
# colourbar for moderator
cbar = fig.colorbar(h)
cbar.ax.set_ylabel("moderator")
return ax
def posterior_prediction_plot(result, x, moderator, m_quantiles, ax=None):
"""Plot posterior predicted `y`"""
if ax is None:
fig, ax = plt.subplots(1, 1)
post = result.posterior.stack(sample=("chain", "draw"))
xi = xr.DataArray(np.linspace(np.min(x), np.max(x), 20), dims=["x_plot"])
m_levels = result.constant_data["m"].quantile(m_quantiles).rename({"quantile": "m_level"})
for p, m in zip(m_quantiles, m_levels):
y = post.β0 + post.β1 * xi + post.β2 * xi * m + post.β3 * m
region = y.quantile([0.025, 0.5, 0.975], dim="sample")
ax.fill_between(
xi,
region.sel(quantile=0.025),
region.sel(quantile=0.975),
alpha=0.2,
color=scalarMap.to_rgba(m),
edgecolor="w",
)
ax.plot(
xi,
region.sel(quantile=0.5),
color=scalarMap.to_rgba(m),
linewidth=2,
label=f"{p*100}th percentile of moderator",
)
ax.legend(fontsize=9)
ax.set(xlabel="x", ylabel="y")
return ax
def plot_moderation_effect(m, m_quantiles, ax=None):
"""Spotlight graph"""
if ax is None:
fig, ax = plt.subplots(1, 1)
post = result.posterior.stack(sample=("chain", "draw"))
# calculate 95% CI region and median
xi = xr.DataArray(np.linspace(np.min(m), np.max(m), 20), dims=["x_plot"])
rate = post.β1 + post.β2 * xi
region = rate.quantile([0.025, 0.5, 0.975], dim="sample")
ax.fill_between(
xi,
region.sel(quantile=0.025),
region.sel(quantile=0.975),
alpha=0.2,
color="k",
edgecolor="w",
)
ax.plot(xi, region.sel(quantile=0.5), color="k", linewidth=2)
# plot points at each percentile of m
percentile_list = np.array(m_quantiles) * 100
m_levels = np.percentile(m, percentile_list)
for p, m in zip(percentile_list, m_levels):
ax.plot(
m,
np.mean(post.β1) + np.mean(post.β2) * m,
"o",
c=scalarMap.to_rgba(m),
markersize=10,
label=f"{p}th percentile of moderator",
)
ax.legend(fontsize=9)
ax.set(
title="Spotlight graph",
xlabel="$moderator$",
ylabel=r"$\beta_1 + \beta_2 \cdot moderator$",
)
I've taken inspiration from a blog post {cite:t}`vandenbergSPSS` which examines whether age influences (moderates) the effect of training on muscle percentage. We might speculate that more training results in higher muscle mass, at least for younger people. But it might be the case that the relationship between training and muscle mass changes with age - perhaps training is less effective at increasing muscle mass in older age?
In the schematic box-and-arrow notation often used to represent moderation, the moderating variable is shown by an arrow pointing to the line between a predictor and an outcome variable.
It can be useful to use consistent notation, so we will define:

- $x$ as the main predictor variable. In this example it is training.
- $y$ as the outcome variable. In this example it is muscle percentage.
- $m$ as the moderator. In this example it is age.
While the visual schematic (above) is a useful shorthand to understand complex models when you already know what moderation is, you can't derive it from the diagram alone. So let us formally specify the moderation model - it defines an outcome variable $y$ as:

$$y \sim \mathrm{Normal}(\beta_0 + \beta_1 \cdot x + \beta_2 \cdot x \cdot m + \beta_3 \cdot m, \sigma^2)$$

where

- $\beta_0$ is the intercept; its value does not have much importance in the interpretation of this model.
- $\beta_1$ is the rate at which $y$ (muscle percentage) increases per unit of $x$ (training hours).
- $\beta_2$ is the coefficient for the interaction term $x \cdot m$.
- $\beta_3$ is the rate at which $y$ (muscle percentage) increases per unit of $m$ (age).
- $\sigma$ is the standard deviation of the observation noise.

We can see that the mean of $y$ is simply a multiple linear regression with an interaction term between the two predictors, $x$ and $m$.

We can get some insight into why this is the case by thinking about this as a multiple linear regression with $x$ and $m$ as predictor variables, but where the regression coefficient for $x$ is itself a function of the moderator:

$$y \sim \mathrm{Normal}(\beta_0 + f(m) \cdot x + \beta_3 \cdot m, \sigma^2)$$

and if we define that as a linear function, $f(m) = \beta_1 + \beta_2 \cdot m$, we get

$$y \sim \mathrm{Normal}(\beta_0 + (\beta_1 + \beta_2 \cdot m) \cdot x + \beta_3 \cdot m, \sigma^2)$$

We can use $f(m) = \beta_1 + \beta_2 \cdot m$ later to visualise the moderation effect.
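As a quick sanity check of the algebra above, here is a minimal NumPy sketch (with made-up, purely illustrative coefficient values) confirming that the interaction form and the conditional-slope form give identical means:

# made-up coefficient values and data, purely illustrative
demo_β0, demo_β1, demo_β2, demo_β3 = 1.0, 0.5, -0.02, 0.1
x_demo = np.array([2.0, 5.0])
m_demo = np.array([20.0, 60.0])
# interaction form: β0 + β1*x + β2*x*m + β3*m
mu_interaction = demo_β0 + demo_β1 * x_demo + demo_β2 * x_demo * m_demo + demo_β3 * m_demo
# conditional-slope form: β0 + (β1 + β2*m)*x + β3*m
mu_conditional = demo_β0 + (demo_β1 + demo_β2 * m_demo) * x_demo + demo_β3 * m_demo
assert np.allclose(mu_interaction, mu_conditional)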
+++
First, we will load up our example data and do some basic data visualisation. The dataset is taken from {cite:t}`vandenbergSPSS` but it is unclear if this corresponds to real life research data or whether it was simulated.
def load_data():
    try:
        df = pd.read_csv("../data/muscle-percent-males-interaction.csv")
    except FileNotFoundError:
        df = pd.read_csv(pm.get_data("muscle-percent-males-interaction.csv"))
    x = df["thours"].values
    m = df["age"].values
    y = df["mperc"].values
    return (x, y, m)
x, y, m = load_data()
# Make a scalar colour map for this dataset (just for plotting; nothing to do with inference)
scalarMap = make_scalarMap(m)
fig, ax = plt.subplots(1, 3, figsize=(14, 3))
ax[0].hist(x, alpha=0.5)
ax[0].set(xlabel="training, $x$")
ax[1].hist(m, alpha=0.5)
ax[1].set(xlabel="age, $m$")
ax[2].hist(y, alpha=0.5)
ax[2].set(xlabel="muscle percentage, $y$");
def model_factory(x, m, y):
    with pm.Model() as model:
        x = pm.ConstantData("x", x)
        m = pm.ConstantData("m", m)
        # priors
        β0 = pm.Normal("β0", mu=0, sigma=10)
        β1 = pm.Normal("β1", mu=0, sigma=10)
        β2 = pm.Normal("β2", mu=0, sigma=10)
        β3 = pm.Normal("β3", mu=0, sigma=10)
        σ = pm.HalfCauchy("σ", 1)
        # likelihood
        y = pm.Normal("y", mu=β0 + (β1 * x) + (β2 * x * m) + (β3 * m), sigma=σ, observed=y)
    return model
model = model_factory(x, m, y)
Plot the model graph to confirm it is as intended.
pm.model_to_graphviz(model)
with model:
    result = pm.sample(draws=1000, tune=1000, random_seed=42, target_accept=0.9)
Visualise the trace to check for convergence.
az.plot_trace(result);
We have good chain mixing and the posteriors for each chain look very similar, so no problems in that regard.
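If we want numerical diagnostics to back up the visual check, a one-line summary table (standard ArviZ usage; not part of the original analysis) reports $\hat{R}$ and effective sample sizes:

az.summary(result, var_names=["β0", "β1", "β2", "β3", "σ"])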
+++
First we will use a pair plot to look at joint posterior distributions. This might help us identify any estimation issues with the interaction term (see the discussion below about multicollinearity).
az.plot_pair(
result,
marginals=True,
point_estimate="median",
figsize=(12, 12),
scatter_kwargs={"alpha": 0.01},
);
And just for the sake of completeness, we can plot the posterior distributions for each of the $\beta$ parameters.
az.plot_posterior(result, var_names=["β1", "β2", "β3"], figsize=(14, 4));
For example, from an estimation (in contrast to a hypothesis testing) perspective, we could look at the posterior over $\beta_2$ and assess how credible it is that the moderation effect differs from zero.
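As a concrete sketch of that estimation perspective (standard xarray operations on the trace; the variable name `p_negative` is our own), we can compute the posterior probability that $\beta_2$ is negative directly from the samples:

# posterior probability that the moderation coefficient β2 is below zero
p_negative = (result.posterior["β2"] < 0).mean().item()
print(f"P(β2 < 0 | data) ≈ {p_negative:.3f}")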
+++
Define a set of quantiles of the moderator $m$ that we are interested in visualising.
m_quantiles = [0.025, 0.25, 0.5, 0.75, 0.975]
Here we will plot the data alongside model posterior predictive checks. This can be a useful visual method of comparing the model predictions against the data.
fig, ax = plt.subplots(figsize=(10, 6))
ax = plot_data(x, m, y, ax=ax)
posterior_prediction_plot(result, x, m, m_quantiles, ax=ax)
ax.set_title("Data and posterior prediction");
We can also visualise the moderation effect by plotting the conditional slope $\beta_1 + \beta_2 \cdot m$ as a function of the moderator $m$. This is sometimes called a spotlight graph; see {cite:t}`spiller2013spotlights` and {cite:t}`mcclelland2017multicollinearity`.
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
plot_moderation_effect(m, m_quantiles, ax[0])
az.plot_posterior(result, var_names="β2", ax=ax[1]);
The expression $\beta_1 + \beta_2 \cdot m$ defines the rate at which $y$ (muscle percentage) changes per unit of $x$ (training hours) for a given value of the moderator $m$ (age). The spotlight graph above shows how this rate changes across the observed range of ages.
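To read numbers off the spotlight graph rather than eyeballing it, here is a small sketch (standard xarray operations; the illustrative ages are our own choice) summarising the posterior of $\beta_1 + \beta_2 \cdot m$ at a few moderator values:

# posterior of the conditional slope f(m) = β1 + β2*m at illustrative ages
post = result.posterior
for age in [20, 40, 60]:  # illustrative moderator values
    slope = post["β1"] + post["β2"] * age
    lower, upper = slope.quantile([0.025, 0.975]).values
    print(f"age {age}: median slope = {slope.median().item():.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")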
+++
Readers should be aware that there are issues around mean-centering and multicollinearity. The original SPSS moderation regression tutorial {cite:t}`vandenbergSPSS` did mean-centre the predictor variables $x$ and $m$.
One effect of mean-centering is to change the interpretation of the parameter estimates. In this notebook we did not mean-centre the variables, which affects the parameter estimates and their interpretation. It is not that one approach is correct or incorrect, but we must be cognisant of how mean-centering (or not) affects the interpretation of parameter estimates. Readers are again directed to {cite:t}`hayes2017introduction` for a more in-depth consideration of mean-centering in moderation analyses.
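If you want to explore the effect of mean-centering yourself, re-fitting with centred predictors takes only a couple of lines. This is a hedged sketch reusing the `model_factory` defined above; the variable names are our own:

# re-fit the same model with mean-centred predictors (illustrative)
x_centred = x - np.mean(x)
m_centred = m - np.mean(m)
with model_factory(x_centred, m_centred, y):
    result_centred = pm.sample(draws=1000, tune=1000, random_seed=42, target_accept=0.9)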
Another issue, particularly relevant to moderation analysis, is multicollinearity, where one predictor variable is well-described as a linear combination of other predictors. This is clearly a concern in moderation analysis, as the interaction term $x \cdot m$ is by construction correlated with both $x$ and $m$.
{cite:t}`iacobucci2016mean` explored the issues of mean-centering and multicollinearity and concluded:
> When all is said and done, should a researcher mean center the X1 and X2 variables before computing a product term X1X2 to include in a moderated multiple regression? It depends. Mean centering is advisable when: (1) the predictor variables are measured on scales with arbitrary zeros and the researcher seeks to enhance the interpretation of the regression results vis-à-vis the variables’ means rather than the arbitrary zero points, or (2) the research questions involve testing the main effect terms in addition to the interaction term and the researcher seeks to obtain these statistical tests without the interference of the so-called nonessential multicollinearity. On the other hand, mean centering may be bypassed when: (1) the research question involves primarily the test of the interaction term, with no regard for the lower order main effect terms, or (2) the research question involves primarily the assessment of the overall fit of the model, the R2, with no interest in apportioning the explained variability across the predictors, main effects or interaction alike.
This was critiqued, however, by {cite:t}`mcclelland2017multicollinearity`, who claimed that {cite:t}`iacobucci2016mean` made a number of errors and that multicollinearity is a red herring:
> Multicollinearity is irrelevant to the search for moderator variables, contrary to the implications of Iacobucci, Schneider, Popovich, and Bakamitsos (Behavior Research Methods, 2016, this issue). Multicollinearity is like the red herring in a mystery novel that distracts the statistical detective from the pursuit of a true moderator relationship.
They state:
> Researchers using MMR [moderated multiple regression] need not compute any multicollinearity diagnostics nor worry about it at all. They need not use mean-centering or the orthogonal transformation or do anything else to avoid the purported problems of multicollinearity. The only purpose of those transformations is to facilitate understanding of MMR models.
Bearing in mind that {cite:t}`mcclelland2017multicollinearity` took a frequentist hypothesis-testing (not a Bayesian) approach, their take-home points can be paraphrased as:

- Fit the regression model, $y \sim \mathrm{Normal}(\beta_0 + \beta_1 \cdot x + \beta_2 \cdot x \cdot m + \beta_3 \cdot m, \sigma^2)$, with original (not mean-centred) data.
- If the main interest is in the moderation effect, then focus upon $\beta_2$.
- Transformations are useful if conditional relationships are to be highlighted.
- "... researchers who wish to examine all possible conditional relationships or to help their readers who might want to consider other conditional relationships, should construct the [spotlight] graph..."
But readers are strongly encouraged to read {cite:t}`mcclelland2017multicollinearity` for more details, as well as the reply from {cite:t}`iacobucci2017mean`. Readers should also be aware that there are conflicting opinions and recommendations about mean-centering etc. in textbooks (see Further Reading below), some of which were published before 2017. None of these textbooks explicitly cite {cite:t}`mcclelland2017multicollinearity`, so it is unclear whether the textbook authors are unaware of, agree with, or disagree with {cite:t}`mcclelland2017multicollinearity`.
- Further information about the 'moderation effect', or what {cite:t}`mcclelland2017multicollinearity` call a spotlight graph, can be found in {cite:t}`bauer2005probing` and {cite:t}`spiller2013spotlights`, although these papers take a frequentist (not Bayesian) perspective.
- {cite:t}`zhang2017moderation` compare maximum likelihood and Bayesian methods for moderation analysis with missing predictor variables.
- Multicollinearity, data centering, and linear models with interaction terms are also discussed in a number of prominent Bayesian textbooks {cite:p}`gelman2013bayesian, gelman2020regression, kruschke2014doing, mcelreath2018statistical`.
+++
- Authored by Benjamin T. Vincent in June 2021
- Updated by Benjamin T. Vincent in March 2022
+++
:::{bibliography}
:filter: docname in docnames
:::
+++
%load_ext watermark
%watermark -n -u -v -iv -w -p aesara,aeppl,xarray
:::{include} ../page_footer.md
:::