Add scatterplot function #2861

agustinaarroyuelo · 2018-02-15T14:43:14Z

This function plots scatter matrices of the sampled parameters, or a subset of them. Additionally, it can display divergences. I updated the "Diagnosing biased Inference with Divergences" notebook with examples using this new feature. I am looking forward to receiving feedback.

fonnesbeck · 2018-02-15T14:57:08Z

pymc3/plots/__init__.py

@@ -6,3 +6,4 @@
 from .traceplot import traceplot
 from .energyplot import energyplot
 from .densityplot import densityplot
+from .scatterplot import scatterplot


Add carriage return at the end of the line

fonnesbeck · 2018-02-15T14:57:22Z

pymc3/plots/artists.py

+                    traces['_'.join([v, str(i)])] = vi
+            else:
+                traces[v] = vals
+        return traces


Add carriage return at end of line

junpenglao · 2018-02-15T14:59:10Z

This looks quite nice.
I think scatterplot might be confused with the native matplotlib function. Maybe a more meaningful name such as pairvarplot, since it plots pairs of variables?

fonnesbeck · 2018-02-15T14:59:49Z

pymc3/plots/scatterplot.py

+                ax[j, 0].tick_params(labelsize=text_size)
+
+    plt.tight_layout()  
+    return ax      


Add carriage return

fonnesbeck · 2018-02-15T15:02:40Z

This looks really good.

junpenglao · 2018-02-15T15:13:04Z

pymc3/plots/scatterplot.py

+
+    if varnames is None:
+        if plot_transformed:
+            trace_dict = get_trace_dict(trace, get_default_varnames(trace.varnames, True))


I think it is better to plot only the free_RVs if plot_transformed=True, otherwise you will have a plot showing only the transformation (e.g., tau and tau_log__, which would essentially be redundant).

ColCarroll

Code looks really nice! Do you have an example you can post in the comments?

I made a few suggestions for simplifying the code, but matplotlib can be inscrutable, so there's a good chance they are bad suggestions.

ColCarroll · 2018-02-15T15:10:31Z

pymc3/plots/scatterplot.py

+                if np.any(divergent):
+                    ax.scatter(trace_dict[varnames[0]][divergent == 1], trace_dict[varnames[1]][divergent == 1], **kwargs_divergence)
+                else:
+                    print('No divergences were found.')


(this whole comment also applies to the similar block below)

could you remove this print statement?

I think this block might be nicer as just

    try:
        divergent = trace['diverging']
    except KeyError:
        warnings.warn(...)
        return ax
    diverge = (divergent == 1)
    ax.scatter(trace_dict[varnames[0]][diverge], trace_dict[varnames[1]][diverge], **kwargs_divergence)

The scatter should then just be empty if there are no divergences, which I think is fine (the other alternative would be something like putting text on the plot saying "there are no divergences found").
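
A minimal sketch of the pattern suggested above, using plain Python lists and a dict as a stand-in for the trace so that it runs without PyMC3 or matplotlib. The helper name `select_divergent` and the toy data are assumptions for illustration only; in the real function the selected points would be handed to `ax.scatter`.

```python
import warnings


def select_divergent(trace, xs, ys):
    """Return the (x, y) pairs flagged as divergent, or None if the
    trace carries no divergence information."""
    try:
        divergent = trace['diverging']
    except KeyError:
        warnings.warn('There is no divergence information in the passed trace.')
        return None
    # An empty result is fine: it simply means nothing gets drawn.
    return [(x, y) for x, y, d in zip(xs, ys, divergent) if d == 1]


# Hypothetical sampler output: the second and fourth draws diverged.
toy_trace = {'diverging': [0, 1, 0, 1]}
points = select_divergent(toy_trace, [0.1, 0.2, 0.3, 0.4], [1.0, 2.0, 3.0, 4.0])
print(points)
```

With no divergences the helper returns an empty list instead of printing a message, which mirrors the "empty scatter" behaviour described above.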

ColCarroll · 2018-02-15T15:12:42Z

pymc3/plots/scatterplot.py

+                warnings.warn('There is no divergence information in the passed trace.')
+                return ax
+
+    if ax is None:


creating default axes should be above, right after defining numvars. Then you also would not have to special case numvars==2.

ColCarroll · 2018-02-15T15:15:16Z

pymc3/plots/scatterplot.py

+                return ax
+
+    if ax is None:
+        _, ax = plt.subplots(nrows=numvars, ncols=numvars, figsize=figsize)


if you do nrows=numvars - 1, ncols=numvars - 1, then the two loops below would be

    for i in range(numvars):
        for j in range(i + 1, numvars):
            ...

and you can remove the if i == j block

(this might be totally off base, if axes.remove() does not do what I think it does).
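
To make the indexing concrete, here is a small sketch that only enumerates the variable pairs and the grid cell each one could occupy in a (numvars - 1) x (numvars - 1) layout. The helper name `pair_cells` and the placement rule (pair (i, j) goes to row j - 1, column i) are assumptions; the actual plotting code may place panels differently.

```python
def pair_cells(numvars):
    """Map each pair (i, j) with i < j to a (row, col) cell of a
    (numvars - 1) x (numvars - 1) grid, filling the lower triangle."""
    cells = {}
    for i in range(numvars):
        for j in range(i + 1, numvars):
            cells[(i, j)] = (j - 1, i)
    return cells


cells = pair_cells(3)
for pair, cell in sorted(cells.items()):
    print(pair, '->', cell)
```

Under this placement the diagonal case never arises, but cells above the grid's diagonal stay unused, so they would still need to be hidden or removed.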

junpenglao · 2018-02-15T15:24:36Z

Looking at the example it seems you need to set plot_transformed=True when plotting transformed RVs even when you specify the name in sub_varnames.

pm.scatterplot(short_trace,
               sub_varnames=['theta_0', 'tau_log__'], 
               divergences=True, 
               plot_transformed=True,
               color='red', figsize=(15, 10), kwargs_divergence={'color':'green'})

Is there a way to streamline this? plot_transformed=True is a bit redundant when you already supply the varname.

Also somewhat related: while I think names like theta_0 are nice for non-scalar RVs, they are not always intuitive for users. When users specify a varname, they usually don't need to think about which index to use, since pymc3 handles that internally. In fact, if you currently pass sub_varnames=['theta', 'tau'] as an argument, the function doesn't work. I wonder if there is a better way to handle this?
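
One possible direction, sketched under assumptions: expand a base name such as `theta` into the flattened names that are actually present. The helper `expand_varnames` is hypothetical (it is not part of this PR), and it only handles one-dimensional variables flattened as `name_index`.

```python
import re


def expand_varnames(requested, available):
    """Expand base names such as 'theta' into the flattened names
    ('theta_0', 'theta_1', ...) found in `available`."""
    expanded = []
    for name in requested:
        if name in available:
            expanded.append(name)
            continue
        # Match only '<name>_<digits>' so 'tau' never picks up 'tau_log__'.
        pattern = re.compile(r'^' + re.escape(name) + r'_\d+$')
        matches = [v for v in available if pattern.match(v)]
        if not matches:
            raise KeyError('Unknown variable: {}'.format(name))
        expanded.extend(matches)
    return expanded


# Hypothetical flattened names, as a trace dict might expose them.
available = ['theta_0', 'theta_1', 'tau', 'tau_log__']
print(expand_varnames(['theta', 'tau'], available))
```

Exact names are tried first, so passing `tau_log__` explicitly would still select the transformed variable without a separate flag.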

aloctavodia · 2018-02-16T17:41:19Z

> I think scatterplot might be confused with the native matplotlib function. Maybe a more meaningful name such as pairvarplot, since it plots pairs of variables?

I agree, pairvarplot is ok. Maybe something shorter would be better, like pairplot.

fonnesbeck · 2018-02-16T18:25:53Z

I'm not keen on pairvarplot. It's not a tremendously clear name. I don't think the name collision is a big deal -- that's what namespaces are for.

agustinaarroyuelo · 2018-02-16T19:04:48Z

Thanks everyone for your insightful comments. I am taking every suggestion into account for my next commits.

I think pairplot is a better-suited name than scatterplot, because this function includes a hexbin plot, which is not strictly a scatter plot.

@ColCarroll Here are some examples:

aseyboldt

I like this :-)

aseyboldt · 2018-02-16T22:40:28Z

pymc3/plots/scatterplot.py

+
+        if divergences:
+            try:
+                divergent = trace['diverging']


It would be cleaner to use trace.get_sampler_stats, just in case there is a var named 'diverging'.
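
The collision can be illustrated with a toy stand-in. `ToyTrace` below is hypothetical and is not PyMC3's trace class; it only mimics the call shape, and the lookup order in `__getitem__` (model variables first) is an assumption made for the example.

```python
class ToyTrace:
    """Hypothetical stand-in for a trace object, for illustration only."""

    def __init__(self, variables, sampler_stats):
        self._variables = variables
        self._sampler_stats = sampler_stats

    def __getitem__(self, name):
        # Assumed lookup order: model variables shadow sampler statistics.
        if name in self._variables:
            return self._variables[name]
        return self._sampler_stats[name]

    def get_sampler_stats(self, name):
        # Always reads from the sampler statistics, never from variables.
        return self._sampler_stats[name]


trace = ToyTrace(
    variables={'diverging': [0.3, 0.7, 0.1]},  # a model variable sharing the name
    sampler_stats={'diverging': [False, True, False]},
)
print(trace['diverging'])
print(trace.get_sampler_stats('diverging'))
```

Indexing returns the model variable here, while the explicit accessor returns the divergence flags, which is the ambiguity the comment above is guarding against.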

aseyboldt · 2018-02-16T22:42:54Z

pymc3/plots/scatterplot.py

+                            if np.any(divergent):
+                                ax[j, i].scatter(var1[divergent == 1], var2[divergent == 1], **kwargs_divergence)
+                            else:
+                                print('No divergences were found.')


no print. Do we even need to do anything here?

aseyboldt

Somehow GitHub lost my comments the first time.
In general, you can also try to limit the line length to 80 characters.

aseyboldt · 2018-02-16T22:48:46Z

pymc3/plots/scatterplot.py

+        If True draws an hexbin plot
+    plot_transformed : bool
+        Flag for plotting automatically transformed variables in addition to
+        original variables (defaults to False). Applies when varnames/sub_varnames = None.


line lengths

aseyboldt · 2018-02-16T22:50:25Z

pymc3/plots/scatterplot.py

+                        try:
+                            divergent = trace['diverging']
+                            if np.any(divergent):
+                                ax[j, i].scatter(var1[divergent == 1], var2[divergent == 1], **kwargs_divergence)


You could use the actual position of the divergence here. They are buried in the warnings, see
https://github.com/pymc-devs/pymc3/blob/master/docs/source/notebooks/Diagnosing_biased_Inference_with_Divergences.ipynb
I don't want to expose this to users, but using it here seems fine.

agustinaarroyuelo · 2018-02-21T19:43:56Z

Thank you for your suggestions. I applied them in every case where it was possible.

twiecki · 2018-02-22T11:47:38Z

Looks like you need to rebase.

aloctavodia · 2018-02-23T12:24:51Z

pymc3/plots/pairplot.py

@@ -0,0 +1,162 @@
+import warnings
+import matplotlib.gridspec as gridspec


This line should be inside the try-except block below. Additionally you can change it to from matplotlib import gridspec
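
A sketch of what that could look like. The body of the `except` branch is an assumption, since the existing try-except block is not shown in this thread.

```python
try:
    import matplotlib.pyplot as plt
    from matplotlib import gridspec
except ImportError:
    # Assumed fallback: leave the names defined so the module still imports.
    plt = None
    gridspec = None

# Plotting functions can check this flag before trying to draw anything.
HAS_MATPLOTLIB = plt is not None
print(HAS_MATPLOTLIB)
```

Keeping both imports in the same block means the module imports cleanly whether or not matplotlib is installed.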

aloctavodia · 2018-02-23T12:32:48Z

pymc3/plots/pairplot.py

+        Text size for labels
+    gs : Grid spec
+        Matplotlib Grid spec.
+    ax: axes


Passing both ax and gs could be confusing. Could we just use gs? I see that ax is there for the special case of plotting a two-variable pairplot, but the same effect can be achieved using just gs, right?

aloctavodia · 2018-02-23T12:33:49Z

pymc3/plots/pairplot.py

+    Returns
+    -------
+
+    ax : matplotlib axes


The function should return a gridspec object

aloctavodia · 2018-02-23T12:34:44Z

pymc3/plots/pairplot.py

+    ax : matplotlib axes
+
+    """    
+


Remove the blank line. Try running autopep8 to fix all these style issues.

aloctavodia · 2018-02-26T14:12:15Z

LGTM!

ColCarroll · 2018-02-26T14:34:11Z

Thanks, @agustinaarroyuelo!

fonnesbeck · 2018-02-26T17:03:23Z

Thanks for the contribution, Agustina!

agustinaarroyuelo · 2018-02-26T18:37:25Z

resolves #2745

add new scatterplot function

3486b36

fonnesbeck reviewed Feb 15, 2018

junpenglao reviewed Feb 15, 2018

ColCarroll reviewed Feb 15, 2018

aseyboldt reviewed Feb 16, 2018

followed sugestions made in pymc-devs#2861

be39d3d

agustinaarroyuelo force-pushed the scatterplot branch from ad4c616 to be39d3d Compare February 21, 2018 19:22

run Divergences notebook with examples

a60234d

agustinaarroyuelo force-pushed the scatterplot branch from 141dc98 to a60234d Compare February 21, 2018 19:33

Merge branch 'master' into scatterplot

1786209

agustinaarroyuelo added 2 commits February 21, 2018 17:10

add carriage return

e688425

Merge branch 'scatterplot' of https://github.com/agustinaarroyuelo/pymc3 into scatterplot

6c31e67

agustinaarroyuelo added 2 commits February 22, 2018 17:52

fix plot_transformed argument

12d7afe

fix fig_size

72dab1c

aloctavodia reviewed Feb 23, 2018

fix gridspec import error and minor issues

3ea30c4

agustinaarroyuelo changed the title ~~Add new scatterplot function~~ Add scatterplot function Feb 24, 2018

agustinaarroyuelo added 2 commits February 25, 2018 19:26

remove unused module import line

05ed875

Merge branch 'master' into scatterplot

dcc551b

ColCarroll merged commit 553f057 into pymc-devs:master Feb 26, 2018

agustinaarroyuelo deleted the scatterplot branch February 26, 2018 15:14
