Skip to content

Fix scatter norm keyword #45966

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Mar 6, 2022
Merged

Conversation

Andy-Grigg
Copy link
Contributor

@Andy-Grigg Andy-Grigg commented Feb 12, 2022

This PR addresses a bug where the norm parameter is used for a scatter plot where the c color is a non-categorical type (e.g. a numeric type). This combination of inputs would result in the norm parameter being specified both as an explicit keyword argument and in self.kwds.

The behavior has changed such that norm is only ever specified once, and in the scenario described above the value specified by the user will be honored.

I tried to test that the specific norm value ended up in the plot, but there doesn't seem to be a good way of comparing norm._scale objects without comparing the individual attributes. I'm happy to add this in if required.

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will also need an entry in doc/source/whatsnew/v1.5.0.rst in the bug fix / visualization section

@jreback jreback added this to the 1.5 milestone Feb 27, 2022
@jreback
Copy link
Contributor

jreback commented Feb 27, 2022

small comment & if you can merge master; ping on green

ax = df.plot.scatter(x="a", y="b", c="c")
color_min_max = (df.c.min(), df.c.max())
default_norm = mpl.colors.Normalize(*color_min_max)
assert all(df.c.apply(lambda x: ax.collections[0].norm(x) == default_norm(x)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done without apply? It will be easier to debug if this assertion doesn't go through necessary code paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the better approach? Just a loop?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah the old check you had prior was good. The data doesn't have to be random as well if that helps shorten your loop and allows looping over less values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll revert to the previous check. I think looping over the values in the dataframe makes sense; it's replicating what's going on in the actual plot, and there's no real benefit to including values that are out of bounds - that's matplotlib normalize functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes made

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm over to you @mroeschke

@mroeschke mroeschke merged commit 1cad74e into pandas-dev:main Mar 6, 2022
@mroeschke
Copy link
Member

Great PR @Andy-Grigg, thanks.

@@ -421,7 +421,7 @@ Plotting
- Bug in :meth:`DataFrame.plot.box` that prevented labeling the x-axis (:issue:`45463`)
- Bug in :meth:`DataFrame.boxplot` that prevented passing in ``xlabel`` and ``ylabel`` (:issue:`45463`)
- Bug in :meth:`DataFrame.boxplot` that prevented specifying ``vert=False`` (:issue:`36918`)
-
- Bug in :meth:`DataFrame.scatter` that prevented specifying ``norm`` (:issue:`45809`)
Copy link
Contributor

@tdy tdy Mar 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see this PR has now been merged, but I think this is supposed to be `DataFrame.plot.scatter`. Are the changelogs worth fixing?

@jreback
Copy link
Contributor

jreback commented Mar 6, 2022

sure can do another PR to update

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
* TST: add test for norm scatter plot parameter (pandas-dev#45809)

* BUG: don't duplicate norm parameter for scatter plots (pandas-dev#45809)

* TST: Add github issue numbers (GH45809)

* TST: remove dependence on private attributes (pandas-dev#45809)

* DOC: add entry to visualization bug fixes (pandas-dev#45809)

* TST: reduce number of norm comparisons (pandas-dev#45809)

* TST: simplify test (pandas-dev#45809)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: df.plot.scatter() with norm keyword fails with 'multiple values for keyword argument' TypeError
4 participants