Moderation analysis example #142

drbenvincent · 2021-04-06T12:12:07Z

Hi all
Here's a pull request adding an example notebook covering moderation analysis. I wasn't quite sure on the policy of where to store an embedded image, so I've just included one in the directory that the notebook is in. Just let me know if you need any changes made.

review-notebook-app · 2021-04-06T12:12:11Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

review-notebook-app · 2021-04-06T12:34:06Z

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2021-04-06T12:34:05Z
----------------------------------------------------------------

Perhaps add a note at the end summarizing how mediation differs from moderation, instead of mentioning it here without further context?

drbenvincent commented on 2021-04-06T18:27:14Z
----------------------------------------------------------------

I've added some more info about mediation analysis now

review-notebook-app · 2021-04-06T12:34:06Z

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2021-04-06T12:34:06Z
----------------------------------------------------------------

Is this a real dataset? If so maybe just mention in 1 or 2 lines its source. If it's fake it's useful to mention that as well

drbenvincent commented on 2021-04-06T18:27:49Z
----------------------------------------------------------------

I've clarified the source of the data and that it is unclear if it was from a real research study or if it was simulated.

review-notebook-app · 2021-04-06T12:34:07Z

View / edit / reply to this conversation on ReviewNB

ricardoV94 commented on 2021-04-06T12:34:06Z
----------------------------------------------------------------

If you happen to know of any, it could be useful to point to examples of more advanced moderation analysis in a bayesian setting.

drbenvincent commented on 2021-04-06T18:28:13Z
----------------------------------------------------------------

Done. Added a bit more to the discussion part, as well as added a Further Reading section

ricardoV94 · 2021-04-06T12:35:34Z

This is really neat! The visualizations with the moderator variable really speak for themselves.

I left some minor comments if you want to address them :)

drbenvincent · 2021-04-06T12:39:15Z

Thanks @ricardoV94, I should be able to deal with these points soon.

fonnesbeck · 2021-04-06T15:37:03Z

This is great, thank you. The data import should probably use pm.get_data, rather than a relative path string.

review-notebook-app · 2021-04-06T17:18:46Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-06T17:18:45Z
----------------------------------------------------------------

This should be splitted in two, with random seed and style in the 2nd cell. It also should not use numpy global random seed but use the new [Generator](https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator) instead. We want to follow the recommendations on [numpy docs](https://numpy.org/doc/stable/reference/random/index.html#quick-start) about random number generation.

drbenvincent commented on 2021-04-06T18:50:45Z
----------------------------------------------------------------

If I'm not actually generating random numbers in the notebook, and just want to use a seed to get consistent inference results, then I guess I can remove this and just pass in random_seed=42 as a kwarg to pm.sample()?

(Thanks for all the comments - I'll work through them)

OriolAbril commented on 2021-04-06T21:57:31Z
----------------------------------------------------------------

Yes, passing the kwarg to sample is the ideal practice for that.

drbenvincent commented on 2021-04-07T15:44:43Z
----------------------------------------------------------------

Done

review-notebook-app · 2021-04-06T17:18:46Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-06T17:18:46Z
----------------------------------------------------------------

As Chris was saying, data loading should use a try except. First try to load the local file if present (which will be if user cloned the pymc-examples repo), otherwise download it from github (pm.get_data does that). There is one example of that in https://nbviewer.jupyter.org/github/pymc-devs/pymc-examples/blob/main/examples/case_studies/rugby_analytics.ipynb

drbenvincent commented on 2021-04-07T15:45:20Z
----------------------------------------------------------------

Done. I've followed the example, so believe I'm doing this correctly. Let me know if not.

OriolAbril commented on 2021-04-07T17:35:29Z
----------------------------------------------------------------

Looks good, thanks!

review-notebook-app · 2021-04-06T17:18:47Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-06T17:18:46Z
----------------------------------------------------------------

As a general comment, I personally prefer having this code together with the output instead of writing functions that are only called once. I don't know how other people feel about that. I probably have a higher tendency to read (and like) plotting code than the average and therefore like having both code and output as close as possible and not having to worry about local and global variables.

Regarding the model itself, I would use pm.Data for x and m in order to have them as variables in the resulting inferencedata and converted to xarray automatically. It will make plotting and postprocessing much easier. Option 1:

with pm.Model(coords={"obs_id": np.arange(len(x))}) as model:
        x_ = pm.Data("x", x, dims="obs_id")
        m_ = pm.Data("m", m, dims="obs_id")
        # priors
        β0 = pm.Normal("β0", mu=0, sd=10)
        β1 = pm.Normal("β1", mu=0, sd=10)
        β2 = pm.Normal("β2", mu=0, sd=10)
        β3 = pm.Normal("β3", mu=0, sd=10)
        σ = pm.HalfCauchy("σ", 1)
        # likelihood
        y = pm.Normal("y", mu=β0 + (β1 * x_) + (β2 * x_ * m_) + (β3 * m_), sd=σ, observed=y, dims="obs_id")

option 2: leave the model as is and modify:

result = pm.sample(1000, tune=2000, target_accept=0.9, return_inferencedata=True, idata_kwargs={"dims": {"x": ["obs_id"], "m": ["obs_id"], "y": ["obs_id"]}})

drbenvincent commented on 2021-04-07T14:48:40Z
----------------------------------------------------------------

Not having it in a function results in y getting over-written as a pymc3.model.ObservedRV variable which causes a problem when you try to use y in posterior prediction plots later on. So you have to name it y_ and it starts to get unclear. Personally I prefer keeping all the PyMC3 variables out of the global scope - but I'll follow this if it's the house style.

drbenvincent commented on 2021-04-07T15:46:31Z
----------------------------------------------------------------

Partially completed. Working through the coords={"obs_id": np.arange(len(x))} part to make sure I understand it properly.

OriolAbril commented on 2021-04-07T17:48:52Z
----------------------------------------------------------------

Not having it in a function results in y getting over-written as a pymc3.model.ObservedRV variable which causes a problem when you try to use y in posterior prediction plots later on. So you have to name it y_ and it starts to get unclear

Makes sense, the model creation can be kept inside a function, and move only the call to sample outside. We call that approach the model_factory one, in that the function takes data as input and returns the model. Something like:

def model_factory(x, m, y):
    # model
    ...
model = model_factory(x, m, y)

with model:

result = pm.sample()

This has some interesting advantages when it comes to posterior predictive sampling, generating predictions or doing cross-validation, and should keep the variables out of the global scope. As a general comment, for the observed variable, if you are not planning on using it anywhere (same for deterministics that are not used afterwards and only used to track a variable) you can use _ = pm.RandomVar(...) to avoid having a variable storing it.

Regarding coordinates, these are used at two specific moments only. When building the model and using dims instead of shape, pymc3 will look into the coords dict and replace the dims by the corresponding shape, so for that we don't really care about the coordinate values, only about it's length. When converting to InferenceData, both dims and coords are passed to xarray so they persist and are stored into the resulting InferenceData object. You can then use xarray to take care of broadcasting and alignment of all consequent computation, similar to pandas, but in n-d, not only for tabular data.

drbenvincent commented on 2021-04-13T15:22:18Z
----------------------------------------------------------------

I've opted for the factory pattern, which I quite like.

So far I've held off adding the obs_id stuff until comment down below about xarray is decided on. Without doing that, it seems like an addition that might confuse a beginner perhaps.

drbenvincent commented on 2021-05-02T12:34:11Z
----------------------------------------------------------------

I've now implemented option 2

review-notebook-app · 2021-04-06T17:18:47Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-06T17:18:47Z
----------------------------------------------------------------

I think that

param_names = ["β1", "β2", "β3"]
az.plot_posterior(result, var_names=param_names, figsize=(14, 4));

will give the same result. if it doesn't, this will

param_names = ["β1", "β2", "β3"]
fig, ax = plt.subplots(1, len(param_names), figsize=(14, 4))
az.plot_posterior(result, var_names=param_names, ax=ax);

drbenvincent commented on 2021-05-05T14:53:53Z
----------------------------------------------------------------

this was done a while back

review-notebook-app · 2021-04-06T17:18:48Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-06T17:18:47Z
----------------------------------------------------------------

I think this function should either be removed completely or reduced to subset the data to use only some random draws.

drbenvincent commented on 2021-05-05T14:53:38Z
----------------------------------------------------------------

This was removed in the last commit

review-notebook-app · 2021-04-06T17:18:48Z

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2021-04-06T17:18:48Z
----------------------------------------------------------------

Provided the model is modified as indicate above:

post = result.posterior
xi = xr.DataArray(np.linspace(np.min(x), np.max(x), 20), dims=["x_plot"])
m_levels = result.constant_data["m"].quantile(quantile_list).rename({"quantile": "m_level"})
_y = post.β0 + post.β1 * xi + post.β2 * xi * m_levels + post.β3 * m

region = _y.quanitile([.025, .5, .95], dim=("chain", "draw"))

for q in quantile_list:

region_q = region.sel(m_level=q)

ax.fill_between(

xi,

region_q.sel(quantile=.025)  # or directly region.sel(m_level=q, quantile=.025),

...

xarray will take care of all the broadcasting and alignment of arrays and we can forget about np.newaxis and the like. A similar thing is probably possible below, I'll try to get there with more specific guidance whenever I have more time. Also, I have not tried the code above, there may be some minor changes needed but I'd say the probability it runs as is is high.

side note, even if it's in a function defined inside a notebook that is only called once I don't think we should use mutable objects like lists as default arguments.

drbenvincent commented on 2021-04-13T15:19:19Z
----------------------------------------------------------------

So I had a go with this. I don't think it's as simple as it might seem.

After updating the model in the way suggested above, the broadcasting in _y = post.β0 + post.β1 * xi + post.β2 * xi * m_levels + post.β3 * m

doesn't work the way you want it to work. The parameters result.posterior.β0 etc are shape (n_chains, s_samples) and they need to be squeezed. So the apparently unsightly code in extract_samples might not be escapable. Then again, I am new to xarray, so maybe there's some nice way to do it. But I don't think extract_samples is that horrific to be honest. Let me know what you think.

Currently I've left things as they are, but I can persist with this approach if you think it's a deal breaker.

OriolAbril commented on 2021-04-13T17:57:23Z
----------------------------------------------------------------

If the chain and draw dimensions need to be flattened you can use post = result.posterior.stack(sample=("chain", "draw")) which you can then subsample with random indexes with post.isel(sample=rng....), independently of: the number of variables in the post tdataset, the dimensions and order of the dimensions in each variable in post or in whatever dataarray/dataset generated in postprocessing. Note that you can get the length of the dimension also by name withlen(post.sample)

I think that precisely for begginers, converting extract_samples to two lines of code that explicitly use the dimension names and don't need np.newaxis nor : will be extremely beneficial. I love np.newaxis now and can go to great lengths with advanced indexing, einsums and whatever is needed to avoid loops, but I don't think this is begginer friendly at all, it is only friendly to people comfortable with numpy/scipy/pandas but don't know pymc3. I even have difficulties understanding my own code years or months later in some of these loop avoiding cases.

drbenvincent · 2021-04-06T18:27:15Z

I've added some more info about mediation analysis now

Moderation analysis example #142

Moderation analysis example #142

Conversation

drbenvincent commented Apr 6, 2021

review-notebook-app bot commented Apr 6, 2021

review-notebook-app bot commented Apr 6, 2021 • edited Loading

review-notebook-app bot commented Apr 6, 2021 • edited Loading

review-notebook-app bot commented Apr 6, 2021 • edited Loading

ricardoV94 commented Apr 6, 2021

drbenvincent commented Apr 6, 2021

fonnesbeck commented Apr 6, 2021

review-notebook-app bot commented Apr 6, 2021 • edited Loading

review-notebook-app bot commented Apr 6, 2021 • edited Loading

review-notebook-app bot commented Apr 6, 2021 • edited Loading

review-notebook-app bot commented Apr 6, 2021 • edited Loading

review-notebook-app bot commented Apr 6, 2021 • edited Loading

review-notebook-app bot commented Apr 6, 2021 • edited Loading

drbenvincent commented Apr 6, 2021

drbenvincent commented Apr 6, 2021

drbenvincent commented Apr 6, 2021

drbenvincent commented Apr 6, 2021

OriolAbril commented Apr 6, 2021

review-notebook-app bot commented Apr 7, 2021 • edited Loading

review-notebook-app bot commented Apr 7, 2021 • edited Loading

drbenvincent commented Apr 7, 2021

drbenvincent commented Apr 7, 2021

drbenvincent commented Apr 7, 2021

drbenvincent commented Apr 7, 2021

drbenvincent commented Apr 7, 2021

drbenvincent commented Apr 7, 2021

OriolAbril commented Apr 7, 2021

OriolAbril commented Apr 7, 2021

drbenvincent commented Apr 13, 2021

drbenvincent commented Apr 13, 2021

OriolAbril commented Apr 13, 2021

drbenvincent commented May 2, 2021

drbenvincent commented May 2, 2021

review-notebook-app bot commented May 4, 2021 • edited Loading

review-notebook-app bot commented May 4, 2021 • edited Loading

drbenvincent commented May 5, 2021

drbenvincent commented May 5, 2021

drbenvincent commented May 5, 2021

drbenvincent commented May 5, 2021

drbenvincent commented May 5, 2021

OriolAbril left a comment

Choose a reason for hiding this comment

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 6, 2021 •

edited

Loading

review-notebook-app bot commented Apr 7, 2021 •

edited

Loading

review-notebook-app bot commented Apr 7, 2021 •

edited

Loading

review-notebook-app bot commented May 4, 2021 •

edited

Loading

review-notebook-app bot commented May 4, 2021 •

edited

Loading