-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Add sample plot to Interpolated Distribution docs #3859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
This looks fun, I'll take it on. Looks like this is meant to interpolate a set of points on a grid and constrict a distribution out of those points. The other plots show distributions under various parameterizations, which I don't think this is amenable to. Would it be sufficient to construct a lattice, sample from it, and then plot a histogram and lattice points? Here is an example of what I'm thinking |
Sounds good to me :) |
Hi, can I take this issue? Thanks |
so, im a little confused about something, the input into the interpolated function - lattice points and their pdf, are those supposed to be sampled from a pymc3 distribution(intentionally picking them from uneven intervals), or they can be any set of monotonically increasing random points. also i'm still trying to make sense of what would pdf at a point mean, as in wouldnt probability distribution at that point be a better measure. i might be absolutely wrong, but i guess probability density function is sort of measured around a point right, so maybe distribution would make more sense, its just a lil ambiguous in my head. also is it any different from the scipy interpolation thing, or is it the same thing. |
okay got the density thing, that was my bad. still confused about some of it |
@Rishabh261998 Since you were first you're welcome to work on this, although it looks like @almostmeenal is interested as well. If you're still interested could you comment in the next 3 days indicating your interest |
I couldn't figure out a way to plot points which aren't a part of a pymc3 distribution in arviz, so i just did matplotlib to get an idea, on a small sample(100) , it was terrible lol, so I'm still wondering if it makes sense to sample a pymc3 distribution randomly as input for the interpolation or not, because it needs a static input.
|
I think this example plot needs to be significantly more complicated than the plots for other distributions. The plots for all the other distributions show the pdf (probability density function) for several values of the input parameters. The plot for the normal distribution for example shows the pdf of 4 different normal distributions, each with different mean and standard deviation. In the case of the Interpolated distribution, we don't have the pdf, so there is no way to reproduce these kind of plots, we have to generate a completely different plot customized to the Interpolated case. The main use of the interpolated distribution is to be able to use distributions whose pdf is not known in PyMC3 models. Instead of knowing the pdf like we do for the other cases, here we only have available some evaluations of the pdf on a grid (not necessarily uniform). With this info we can create an Interpolated distribution that will behave like this unknown distribution of which we only know the pdf at some given points. In order to showcase this conceptual difference, I think we can follow these steps:
Note that the Interpolated distribution defines the logp and the random method that should both approximate the original and unknown pdf, so I think it is a good idea to have the following elements in the plot:
The example image provided above by Dpananos has elements 2 and 4 only. We could also use ArviZ for 4 and plot a KDE instead of a histogram, but I think it is an overkill and its better to keep the example with minimal requirements. Implementation proposals
|
okay yeah that makes sense, so essentially its line fitting, except the linear limitation is removed, I'll see if I can get it done today, also I guess it's cool if I play around with the input as long as the final figure makes sense? I mean I was trying random stuff and often the interpolation wasn't that good for some randomized points, so yeah. |
Play with it as much as you want, I added this toy example because I figured it would help in making the comment and idea on how to move forward clear, not as the final example to be used in the docs |
like ofc the plot isnt very nice and I think playing with other distributions would be better, I'll fine tune the plot later, but as a rough idea of what we're trying to do does this make sense |
This looks good, I will add some comments here, but I'd encourage you to already submit a PR with these changes as it will be better to review and comment on specific parts of the code. You can add
This is great, using the ppf is clever to get the right limits for the linspace. I only have two very minor comments. First, using 10000 points is probably a bit too much, all the other plots use 1000 and do already look perfectly smooth. Second, I'd define
I strongly discourage using a dictionary for this. All the code above can be reduced to two steps: defining the rough grid
Moreover, I'd personally not use a random grid, here is why. It is true that
The values used for plotting should not be
As for general aesthetics of the plot, I think it would be great to try to keep the style similar to the other plots of the section. The black dots and grey histogram are already great as is, for the other two lines, I'd use |
Try using I am also now seeing that the histogram changes significantly between runs, this could be due to having too many bins, how many bins do you get with
It is definitely your place, and I think it's a great idea. I though that |
Okayy, I'm totally sorry I forgot to update the grid, your point is fair, documentation should have the ideal scenario, because random grids might not (always) work. And as for color, I believe seaborn has a style - seaborn-colorblind I don't know much about arviz styles but I'll have a look, and bins default works better. also maybe its a silly question but to make a uniform grid should i pick points from x at uniform intervals or actually ensure the numerical values of x are uniformly increasing. and numpy sort won't work because then the pairwise x and pdf relation would be distorted we need to preserve that i guess, then every point in x would be mapped to a different pdf |
Any of
When doing:
the x<->pdf relation is defined after sorting, each position of the pdf array will correspond to the pdf evaluated at the same position always and independently of how |
okayy got it my bad, I'm sorry for it, I assumed we were doing that after mapping, so I'll do the uniform grid and add a PR, for now I'm using the seaborn-darkgrid, later we can change it to seaborn-colorblind maybe |
arviz-darkgrid should be very similar to seaborn-darkgrid but uses a colorblind friendly pallette, I did not know about seaborn-colorblind, I guess both should work fine and will be a great improvement to the documentation. Out of curiosity, are there any sample plots to see how seaborn-colorblind looks like? Some examples of arviz-darkgrid are available here: https://arviz-devs.github.io/arviz/examples/styles.html |
Looks great yeah, thanks |
okay so i tried seaborn and matplotlib palettes and themes, I think using sns theme seaborn-darkgrid from matplotlib doesn't leave us an option to use darkgrid with colorblind colors with |
but i understand that would create an additional dependency of installing seaborn, whereas now we can leverage sns functions through matplotlib itself
|
We can use
or
seaborn is a dependency on some example notebooks which is fine, but we should not make it a dependency to generate the api docs, I think the first alternative will achieve the same result, after all, seaborn is defining custom matplotlib styles that became popular and were then included to the main matplotlib library with the ArviZ is already a dependency on pymc3 so using it does not alter the doc generating dependencies. The example notebooks use |
yeah example docs use arviz-darkgrid and seaborn darkgrid both, it definitely makes sense to not use sns. If you just check it once, the palette used in plt.style.use("seaborn-colorblind") is different from sns.color_palette("colorblind") , probably because sns changed their palette and matplotlib did not make the updates. I found a few options to go about:
I'm sorry for making so many comments, I learned a bit about color palettes too today, finally I'm going for this figure, lemme know if I should use a different palette, I'll make the PR tonight, so before that it'd be great if you'd tell me which one to go for. Also, when i went through plots in continuous.py , none of them imported pymc3 and instead used scipy or something, so should I also use scipy spline instead of the actual interpolated function for the plot? |
@almostmeenal Dont worry too much about the comments, my recommendation at this point is to pick what you think is best and open a PR! The reason I say this is with a PR its much easier for us to see what the specific changes and effects in a precise way, and then we can also clone the code locally and test it out ourselves. Issues comments are great for general discussion (and youve been using them fantastically) but they do leave some ambiguity. It is also not a requirement to have a PR "perfect" before opening one, Quite often we open PRs that that are partially done to get feedback and advice from others. So I should ask, any hesitation in opening a PR and moving the discussion there? If so just let us know and we can keep conversing here as well. Whatever works for you :) |
don't worry about that at all
Glad it was useful, plot styling can easily get you down an unending path of color palettes and colormaps.
Would there be any drawbacks to using `az.style.use("arviz-darkgrid"). After your comment on seaborn colorblind not being updated, I have seen that matplotlib styles have not been updated in ~5 years, so using ArviZ would give us better control and the ability to easily update the style if needed.
this is not affecting the plot unluckily, the plot is using the default matplotlib palette which i think is not colorblind friendly. I also agree with Ravin on encouraging you to open a PR, we'll review it and we can comment directly on the code, it will probably keep the conversation more structured. |
thanks! I was skeptical of a PR because it looked far from done to me and so I was unsure. |
okayy, yeah the whole color cycle and colormap thing was confusing, i was trying color cycles initially but that needed another import (from cycle import cycle) so i dropped that. yes, using arviz makes sense, that would mean changing inference data to arviz format right? also I've made a preliminary draft PR |
Fixed via #4557 |
Description of your problem
Interpolated Docs are missing sample plot. One should be added
https://docs.pymc.io/api/distributions/continuous.html#pymc3.distributions.continuous.Interpolated
Please provide any additional information below.
See example from Normal plot for how how this can be done
https://github.com/pymc-devs/pymc3/blob/master/pymc3/distributions/continuous.py#L433
The text was updated successfully, but these errors were encountered: