-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: groupby.hist legend should use group keys #33493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the PR! @rhshadrach you still interested in working on this? Mind trying to fix up the tests if so? |
@alimcmaster1 Yep - this is still active, thanks for the response. I'd be happy to improve the tests, but I'm not sure how. Do you have any recommendations? |
@alimcmaster1 I've improved the tests in test_hist_method. test_groupby on the other hand seems to only make sure that the groupby method works. I'm guessing the idea is to make sure groupby works there, and then leave the details to the other plotting tests. With this, I've left that test as-is (aside from minor formatting changes). But let me know if you think that test should be improved somehow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the PR.
you also need a whatsnew note in 1.1
The introduced test fails on an older version of matplotlib. The exact line it fails on is
This issue is fixed in matplotlib master by converting the label to a string
with this, I feel pretty comfortable changing the test labels to string to get them to pass. |
@charlesdong1991 Thanks for the review, changes made and checks pass. For the whatsnew, it seemed to me to be a toss up as whether this is a bug or enhancement; I went with enhancement. I also added one additional test for SeriesGroupby.hist. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pandas/plotting/_matplotlib/hist.py
Outdated
kwargs : dict, keyword arguments passed to matplotlib.Axes.hist | ||
|
||
Returns | ||
------- | ||
collection of Matplotlib Axes | ||
""" | ||
|
||
if legend and "label" not in kwargs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why someone would pass legend
and label
together? Should this not just raise?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should just raise instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to me that we would then be raising on the only case where label has any effect - or am I missing another case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't understand the reason why we would raise here when not raising makes it easier for the end user have different names appear in the legend. What does raising gain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So my thought is that If a user wants particular labels they should just provide that to the grouper; adding this here is just another way of doing things so just API clout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would completely agree if this was implemented within pandas. However it's being passed through to matplotlib via kwargs. Disabling certain kwargs from working and allowing others to go through seems hazardous to user expectations. But even if this is unconvincing, I think we should strive to be consistent with various plots:
index = 3 * ['1'] + 3 * ['2']
df = DataFrame([(3, 4)]*3 + [(5, 6)]*3, index=index, columns=['a', 'b'])
df.index.names = ['c']
ret = df.plot(y=['a', 'b'], label=['Nice Name 1', 'Nice Name 2'])
Perhaps an issue could be raised as to how labels should be treated across all plots?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well shoot. As a counter-argument to what I just said. DataFrame.plot.hist ignores label altogether. I suppose if there is one plotting function we should be consistent with here, it's that one. What do you think about not raising, but simply ignoring any label kwarg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should raise instead of ignoring; ignoring leads to confusion or surprising behavior, so always better to be explicit
@WillAyd, @charlesdong1991 Thanks for the reviews - changes made and checks pass. |
pandas/plotting/_matplotlib/hist.py
Outdated
kwargs : dict, keyword arguments passed to matplotlib.Axes.hist | ||
|
||
Returns | ||
------- | ||
collection of Matplotlib Axes | ||
""" | ||
|
||
if legend and "label" not in kwargs: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should just raise instead
@WillAyd changes made, use of label and legend now raises. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. have a merge conflict and just a few things to clean up
Yep that’s what I’m referring to
…Sent from my iPhone
On Jun 4, 2020, at 3:02 PM, rhshadrach ***@***.***> wrote:
@rhshadrach commented on this pull request.
In pandas/plotting/_matplotlib/hist.py:
> kwargs : dict, keyword arguments passed to matplotlib.Axes.hist
Returns
-------
collection of Matplotlib Axes
"""
+ if legend:
+ if isinstance(data, ABCDataFrame):
+ if column is None:
+ kwargs["label"] = data.columns
+ else:
When you say branch - do you mean just the else clause?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
@WillAyd Changes made and checks pass. In addition to the splitting out the tests, I also improve the logic involving setting the label. Instead of using isinstance, it now uses ndim. Also the logic is flat instead of nested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comments otherwise lgtm. @TomAugspurger or @jorisvandenbossche if you care to look
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice overall.
One question about the expected behavior though: why don't we use a user-provided label
when legend=True
?
kwargs : dict, keyword arguments passed to matplotlib.Axes.hist | ||
|
||
Returns | ||
------- | ||
collection of Matplotlib Axes | ||
""" | ||
if legend: | ||
assert "label" not in kwargs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pandas the only one calling this, or could users call it with labels in kwargs?
Users shouldn't see bare asserts like this. It should be a ValueError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And just to verify the expected behavior, if the user provides a label
why wouldn't we want to use that? Can we just do kwargs.setdefault("label", data.name)
(or the other cases)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the first question, this function is only called from hist_frame and hist_series, each of which raise a ValueError on this condition. This assert is to maintain this behavior if in the future any other code paths call it.
For the second question, see #33493 (comment)
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -291,6 +291,7 @@ Other enhancements | |||
- :meth:`groupby.transform` now allows ``func`` to be ``pad``, ``backfill`` and ``cumcount`` (:issue:`31269`). | |||
- :meth:`~pandas.io.json.read_json` now accepts `nrows` parameter. (:issue:`33916`). | |||
- :meth `~pandas.io.gbq.read_gbq` now allows to disable progress bar (:issue:`33360`). | |||
- :meth:`DataFrame.hist`, :meth:`Series.hist`, :meth:`DataFrameGroupby.hist`, and :meth:`SeriesGroupby.hist` have gained the ``legend`` argument. Set to True to show a legend in the histogram. (:issue:`6279`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrameGroupBy and SeriesGroupBy aren't in the top-level. It should be something like core.groupby.DataFrameGroupBy
. See doc/source/referenece.groupby.rst
for the right path..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - thanks!
Talked about this starting here: #33493 (comment) so I specifically asked to have it raise. Happy to be overridden if you think better otherwise |
@TomAugspurger: I've fixed the issues you pointed out in the whatsnew. What's still uncertain is how to handle label when legend=True. From #33493 (comment), I was for supporting and not overwriting label whereas @WillAyd was opposed to supporting because there was another way to set the labels (namely, changing the column names of the dataframe). Would like to get your thoughts here. |
@TomAugspurger friendly ping. Just want to let you know I'll be unavailable starting the 24th until July 5th in case we want to get this into 1.1. No issue with it waiting until 1.2 on my end if the timing doesn't work out. |
Don't have strong thoughts. Let's just keep it as on this PR and adjust based on user feedback if it arises. |
Thanks @rhshadrach |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
cc @TomAugspurger
Added argument "legend" to histogram backend. This adds the ability to display a legend even when not using a groupby.
produces
I went off the tests that already existed in test_hist_method and test_groupby, but perhaps more stringent tests should be added. I've been manually checking with the following code: