-
-
Notifications
You must be signed in to change notification settings - Fork 268
Moderation analysis example #142
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
View / edit / reply to this conversation on ReviewNB ricardoV94 commented on 2021-04-06T12:34:05Z Perhaps add a note at the end summarizing how mediation differs from moderation, instead of mentioning it here without further context? drbenvincent commented on 2021-04-06T18:27:14Z I've added some more info about mediation analysis now |
View / edit / reply to this conversation on ReviewNB ricardoV94 commented on 2021-04-06T12:34:06Z Is this a real dataset? If so maybe just mention in 1 or 2 lines its source. If it's fake it's useful to mention that as well drbenvincent commented on 2021-04-06T18:27:49Z I've clarified the source of the data and that it is unclear if it was from a real research study or if it was simulated. |
View / edit / reply to this conversation on ReviewNB ricardoV94 commented on 2021-04-06T12:34:06Z If you happen to know of any, it could be useful to point to examples of more advanced moderation analysis in a bayesian setting.
drbenvincent commented on 2021-04-06T18:28:13Z Done. Added a bit more to the discussion part, as well as added a Further Reading section |
This is really neat! The visualizations with the moderator variable really speak for themselves. I left some minor comments if you want to address them :) |
Thanks @ricardoV94, I should be able to deal with these points soon. |
This is great, thank you. The data import should probably use |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-04-06T17:18:45Z This should be splitted in two, with random seed and style in the 2nd cell. It also should not use numpy global random seed but use the new [Generator](https://numpy.org/doc/stable/reference/random/generator.html#numpy.random.Generator) instead. We want to follow the recommendations on [numpy docs](https://numpy.org/doc/stable/reference/random/index.html#quick-start) about random number generation. drbenvincent commented on 2021-04-06T18:50:45Z If I'm not actually generating random numbers in the notebook, and just want to use a seed to get consistent inference results, then I guess I can remove this and just pass in
(Thanks for all the comments - I'll work through them)
OriolAbril commented on 2021-04-06T21:57:31Z Yes, passing the kwarg to sample is the ideal practice for that. drbenvincent commented on 2021-04-07T15:44:43Z Done |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-04-06T17:18:46Z As Chris was saying, data loading should use a drbenvincent commented on 2021-04-07T15:45:20Z Done. I've followed the example, so believe I'm doing this correctly. Let me know if not. OriolAbril commented on 2021-04-07T17:35:29Z Looks good, thanks! |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-04-06T17:18:46Z As a general comment, I personally prefer having this code together with the output instead of writing functions that are only called once. I don't know how other people feel about that. I probably have a higher tendency to read (and like) plotting code than the average and therefore like having both code and output as close as possible and not having to worry about local and global variables.
Regarding the model itself, I would use
with pm.Model(coords={"obs_id": np.arange(len(x))}) as model: x_ = pm.Data("x", x, dims="obs_id") m_ = pm.Data("m", m, dims="obs_id") # priors β0 = pm.Normal("β0", mu=0, sd=10) β1 = pm.Normal("β1", mu=0, sd=10) β2 = pm.Normal("β2", mu=0, sd=10) β3 = pm.Normal("β3", mu=0, sd=10) σ = pm.HalfCauchy("σ", 1) # likelihood y = pm.Normal("y", mu=β0 + (β1 * x_) + (β2 * x_ * m_) + (β3 * m_), sd=σ, observed=y, dims="obs_id") option 2: leave the model as is and modify:
result = pm.sample(1000, tune=2000, target_accept=0.9, return_inferencedata=True, idata_kwargs={"dims": {"x": ["obs_id"], "m": ["obs_id"], "y": ["obs_id"]}}) drbenvincent commented on 2021-04-07T14:48:40Z Not having it in a function results in drbenvincent commented on 2021-04-07T15:46:31Z Partially completed. Working through the OriolAbril commented on 2021-04-07T17:48:52Z Not having it in a function results in
Makes sense, the model creation can be kept inside a function, and move only the call to sample outside. We call that approach the
def model_factory(x, m, y): # model ...
This has some interesting advantages when it comes to posterior predictive sampling, generating predictions or doing cross-validation, and should keep the variables out of the global scope. As a general comment, for the observed variable, if you are not planning on using it anywhere (same for deterministics that are not used afterwards and only used to track a variable) you can use
Regarding coordinates, these are used at two specific moments only. When building the model and using drbenvincent commented on 2021-04-13T15:22:18Z I've opted for the factory pattern, which I quite like.
So far I've held off adding the drbenvincent commented on 2021-05-02T12:34:11Z I've now implemented option 2 |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-04-06T17:18:47Z I think that
param_names = ["β1", "β2", "β3"] az.plot_posterior(result, var_names=param_names, figsize=(14, 4));
will give the same result. if it doesn't, this will
param_names = ["β1", "β2", "β3"] fig, ax = plt.subplots(1, len(param_names), figsize=(14, 4)) az.plot_posterior(result, var_names=param_names, ax=ax); drbenvincent commented on 2021-05-05T14:53:53Z this was done a while back |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-04-06T17:18:47Z I think this function should either be removed completely or reduced to subset the data to use only some random draws. drbenvincent commented on 2021-05-05T14:53:38Z This was removed in the last commit |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-04-06T17:18:48Z Provided the model is modified as indicate above:
post = result.posterior xi = xr.DataArray(np.linspace(np.min(x), np.max(x), 20), dims=["x_plot"]) m_levels = result.constant_data["m"].quantile(quantile_list).rename({"quantile": "m_level"})
xarray will take care of all the broadcasting and alignment of arrays and we can forget about
side note, even if it's in a function defined inside a notebook that is only called once I don't think we should use mutable objects like lists as default arguments. drbenvincent commented on 2021-04-13T15:19:19Z So I had a go with this. I don't think it's as simple as it might seem. After updating the model in the way suggested above, the broadcasting in doesn't work the way you want it to work. The parameters
Currently I've left things as they are, but I can persist with this approach if you think it's a deal breaker. OriolAbril commented on 2021-04-13T17:57:23Z If the chain and draw dimensions need to be flattened you can use
I think that precisely for begginers, converting
|
I've added some more info about mediation analysis now View entire conversation on ReviewNB |
I've clarified the source of the data and that it is unclear if it was from a real research study or if it was simulated. View entire conversation on ReviewNB |
Done. Added a bit more to the discussion part, as well as added a Further Reading section View entire conversation on ReviewNB |
If I'm not actually generating random numbers in the notebook, and just want to use a seed to get consistent inference results, then I guess I can remove this and just pass in
(Thanks for all the comments - I'll work through them)
View entire conversation on ReviewNB |
Yes, passing the kwarg to sample is the ideal practice for that. View entire conversation on ReviewNB |
View / edit / reply to this conversation on ReviewNB ricardoV94 commented on 2021-04-07T14:13:17Z In the first paragraph, shouldn't it be "(the moderator)" and "(the moderating variable)"? drbenvincent commented on 2021-04-07T15:44:27Z Whoops! Fixed.
|
View / edit / reply to this conversation on ReviewNB ricardoV94 commented on 2021-04-07T14:13:18Z Perhaps you can refer here to the floodlight graph by name since it appears in the discussion. drbenvincent commented on 2021-04-07T15:48:25Z Turns out there are internal inconsistencies in McClelland et al (2016). Looks like it is actually a spotlight graph. But have implemented the suggestion. Although this refers to the plot below, not the main x, y plot, so the edits are below this point. |
Not having it in a function results in View entire conversation on ReviewNB |
McClelland is inconsistent, leading to confusion. It's a spotlight graph, not a floodlight graph.
Whoops! Fixed.
View entire conversation on ReviewNB |
Done View entire conversation on ReviewNB |
Done. I've followed the example, so believe I'm doing this correctly. Let me know if not. View entire conversation on ReviewNB |
Partially completed. Working through the View entire conversation on ReviewNB |
Turns out there are internal inconsistencies in McClelland et al (2016). Looks like it is actually a spotlight graph. But have implemented the suggestion. Although this refers to the plot below, not the main x, y plot, so the edits are below this point. View entire conversation on ReviewNB |
Looks good, thanks! View entire conversation on ReviewNB |
Not having it in a function results in
Makes sense, the model creation can be kept inside a function, and move only the call to sample outside. We call that approach the
def model_factory(x, m, y): # model ...
This has some interesting advantages when it comes to posterior predictive sampling, generating predictions or doing cross-validation, and should keep the variables out of the global scope. As a general comment, for the observed variable, if you are not planning on using it anywhere (same for deterministics that are not used afterwards and only used to track a variable) you can use
Regarding coordinates, these are used at two specific moments only. When building the model and using View entire conversation on ReviewNB |
So I had a go with this. I don't think it's as simple as it might seem. After updating the model in the way suggested above, the broadcasting in doesn't work the way you want it to work. The parameters
Currently I've left things as they are, but I can persist with this approach if you think it's a deal breaker. View entire conversation on ReviewNB |
I've opted for the factory pattern, which I quite like.
So far I've held off adding the View entire conversation on ReviewNB |
If the chain and draw dimensions need to be flattened you can use
I think that precisely for begginers, converting
View entire conversation on ReviewNB |
I've now implemented option 2 View entire conversation on ReviewNB |
Excuse the slight delay... I had to get a couple of papers submitted. I believe I've not implemented all the changes asked for. The last changes were to provide |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-05-04T11:37:50Z The name (as string passed to the distribution) of drbenvincent commented on 2021-05-05T15:23:34Z Done :) |
View / edit / reply to this conversation on ReviewNB OriolAbril commented on 2021-05-04T11:37:51Z Can you also add
drbenvincent commented on 2021-05-05T15:23:41Z Done :) |
This was removed in the last commit View entire conversation on ReviewNB |
this was done a while back View entire conversation on ReviewNB |
Done :) View entire conversation on ReviewNB |
1 similar comment
Done :) View entire conversation on ReviewNB |
Thanks for those points @OriolAbril. Unless anyone has any further things, I think that might be it. 🤞🏼 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I'll happily add this notebook directly to the "Best Practices" column
Hi all
Here's a pull request adding an example notebook covering moderation analysis. I wasn't quite sure on the policy of where to store an embedded image, so I've just included one in the directory that the notebook is in. Just let me know if you need any changes made.