Skip to content

update Gaussian Mixture Model example with pm.NormalMixture #310

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Apr 17, 2022

Conversation

drbenvincent
Copy link
Contributor

Following discussion in #102, this is a revamp of the GMM example. The model/notebook is now considerably simpler as it uses the new pm.NormalMixture distribution.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@drbenvincent drbenvincent requested a review from OriolAbril April 13, 2022 13:07
@review-notebook-app
Copy link

review-notebook-app bot commented Apr 13, 2022

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2022-04-13T15:18:48Z
----------------------------------------------------------------

I think it could be good to add a classification tag, I assume there will be some more notebooks with mixtures or bart even that cover that.


drbenvincent commented on 2022-04-16T13:18:43Z
----------------------------------------------------------------

done

@review-notebook-app
Copy link

review-notebook-app bot commented Apr 13, 2022

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2022-04-13T15:18:49Z
----------------------------------------------------------------

You can use xarray-einstats here:

from xarray_einstats.stats import XrContinuousRV

xi = np.linspace(-7, 7, ni)
post = idata.posterior
pdf_components = XrContinuousRV(norm, post["mu"], post["σ"]).pdf(xi) * post["w"]
pdf = pdf_components.sum("cluster")

...

x[1].plot(xi, pdf)
...

(pdf_components / pdf).plot.line(hue="cluster", ax=ax[2])
...

More details on https://einstats.python.arviz.org/en/latest/tutorials/stats_tutorial.html#probability-distributions if this looks interesting. Main highlights are that XrContinuousRV wraps the scipy distributions so they accept DataArrays as inputs, yet, the pdf/ppf/cdf... methods can still take array_like as input (it is internally converted to xarray using point or quantile as dimension name).



drbenvincent commented on 2022-04-16T13:39:35Z
----------------------------------------------------------------

I've played about with this a bit, but the plotting is not working. I'm leaning towards keeping the existing code because it is very transparent about what is happening, and use of XrContinuousRV would only save a couple of lines of code.

@review-notebook-app
Copy link

review-notebook-app bot commented Apr 13, 2022

View / edit / reply to this conversation on ReviewNB

OriolAbril commented on 2022-04-13T15:18:50Z
----------------------------------------------------------------

The verbs should stick to the https://docs.pymc.io/en/latest/contributing/jupyter_style.html#authorship-and-attribution, the dates and pr links can be skipped for old and hard to track contributions. We discussed having a more graded approach for "updated" type with "extended" or "re-written" too for example but eventually decided on only using "re-executed" and "updated" (with the definition/line on updated being very vague)


drbenvincent commented on 2022-04-16T13:25:14Z
----------------------------------------------------------------

Have updated the verbs

Copy link
Contributor Author

done


View entire conversation on ReviewNB

Copy link
Contributor Author

Have updated the verbs


View entire conversation on ReviewNB

Copy link
Contributor Author

I've played about with this a bit, but the plotting is not working. I'm leaning towards keeping the existing code because it is very transparent about what is happening, and use of XrContinuousRV would only save a couple of lines of code.


View entire conversation on ReviewNB

@drbenvincent
Copy link
Contributor Author

Think I've addressed all the points - although let me know about xarray-einstats. I don't think it really it's ultra compelling to use in this simple example where the plot code is pretty concise to begin with

@OriolAbril
Copy link
Member

I forgot the means in the xarray code, so now the plot is getting a 4d array. The difference is in how the operations are done, mean first vs mean later. Ending up with some more or some less lines of code is not an issue for me, the goal is not having less lines of code, if xarray plotting is less clear let's keep the raw matplotlib plotting. I would like to use the mean later approach though, and I think einstats helps with that (it is basically a collection of things that have helped me do that throughout the last couple years)

@drbenvincent
Copy link
Contributor Author

Ok. Tried harder with plotting through xarray and got it to work

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for taking a look at the mean later approach and getting it to work.

@drbenvincent
Copy link
Contributor Author

Ok. Think that's done now?

@OriolAbril OriolAbril merged commit 4cecae5 into pymc-devs:main Apr 17, 2022
@drbenvincent drbenvincent deleted the gmm branch April 17, 2022 15:21
OriolAbril added a commit that referenced this pull request Apr 24, 2022
* updated notebook

* run wit latest v2 version

* fixed graphviz redering

* partial resolution of pr comments

* Add guide on how to wrap a JAX function in a Aesara Op (#302)

* Add guide on how to wrap a JAX function in a Aesara Op

* Fix typos and reorder last sections

* Fix reference paths

* Simplify model and be more verbose in Op creation

* Address review comments

* Address more review comments

* Fix more Aesara docs references

* Use reference to Aesara index page

* Update myst_nbs/case_studies/wrapping_jax_function.myst.md

Co-authored-by: Oriol Abril-Pla <[email protected]>

* update twitter link (#314)

* update Gaussian Mixture Model example with `pm.NormalMixture` (#310)

* create truncated regression example

* delete truncated regression example from main branch

* create truncated regression example

* delete truncated regression example from main branch

* create truncated regression example

* delete truncated regression example from main branch

* initial commit

* update link to pull request in Authors section

* add tag `classification`

* update authorship verbs

* plot through xarray, using XrContinuousRV

* add x axis labels

* add xarray_einstats to watermark, and fix 'classification' as a tag, not a category

Co-authored-by: Benjamin T. Vincent <[email protected]>

* add `*.DS_Store` to `.gitignore` (#315)

Add *.DS_Store to .gitignore

Co-authored-by: Benjamin T. Vincent <[email protected]>

* updated notebook

* run wit latest v2 version

* fixed graphviz redering

* partial resolution of pr comments

* reverted plot labels

* restored plt labels

* fixed arviz labels

* fixed sampling warning

* fixed authors

Co-authored-by: Ricardo Vieira <[email protected]>
Co-authored-by: Oriol Abril-Pla <[email protected]>
Co-authored-by: Abhishek K M <[email protected]>
Co-authored-by: Benjamin T. Vincent <[email protected]>
Co-authored-by: Benjamin T. Vincent <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants