BUG: groupby.hist legend should use group keys #33493

rhshadrach · 2020-04-12T13:55:43Z

closes BUG/VIS: groupby.hist/plot() should pass group keys as labels #6279
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Added argument "legend" to histogram backend. This adds the ability to display a legend even when not using a groupby.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])
df['C'] = 15 * ['a'] + 15 * ['b']
df = df.set_index('C')
df.groupby('C')['A'].hist(legend=True)

produces

I went off the tests that already existed in test_hist_method and test_groupby, but perhaps more stringent tests should be added. I've been manually checking with the following code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])
df['C'] = 15 * ['a'] + 15 * ['b']
df = df.set_index('C')

df.groupby('C')['A'].hist(legend=False)
plt.show()
df.groupby('C')['A'].hist(legend=True)
plt.show()
df.A.hist(by='C', legend=False)
plt.show()
df.A.hist(by='C', legend=True)
plt.show()
df.hist(by='C', legend=False)
plt.show()
df.hist(by='C', legend=True)
plt.show()
df.hist(by='C', column='B', legend=False)
plt.show()
df.hist(by='C', column='B', legend=True)
plt.show()

df.groupby('C')['A'].hist(label='D', legend=False)
plt.show()
df.groupby('C')['A'].hist(label='D', legend=True)
plt.show()
df.A.hist(by='C', label='D', legend=False)
plt.show()
df.A.hist(by='C', label='D', legend=True)
plt.show()
df.hist(by='C', label='D', legend=False)
plt.show()
df.hist(by='C', label='D', legend=True)
plt.show()
df.hist(by='C', column='B', label='D', legend=False)
plt.show()
df.hist(by='C', column='B', label='D', legend=True)
plt.show()

alimcmaster1 · 2020-05-02T23:48:16Z

Thanks for the PR!

@rhshadrach you still interested in working on this? Mind trying to fix up the tests if so?

rhshadrach · 2020-05-03T00:07:00Z

@alimcmaster1 Yep - this is still active, thanks for the response. I'd be happy to improve the tests, but I'm not sure how. Do you have any recommendations?

rhshadrach · 2020-05-10T14:18:33Z

@alimcmaster1 I've improved the tests in test_hist_method. test_groupby on the other hand seems to only make sure that the groupby method works. I'm guessing the idea is to make sure groupby works there, and then leave the details to the other plotting tests. With this, I've left that test as-is (aside from minor formatting changes). But let me know if you think that test should be improved somehow.

charlesdong1991

thanks for the PR.

you also need a whatsnew note in 1.1

pandas/plotting/_core.py

pandas/tests/plotting/test_groupby.py

pandas/tests/plotting/test_hist_method.py

rhshadrach · 2020-05-18T22:22:44Z

The introduced test fails on an older version of matplotlib. The exact line it fails on is

labels = [six.text_type(lab) for lab in label]

This issue is fixed in matplotlib master by converting the label to a string

 labels = [] if label is None else np.atleast_1d(np.asarray(label, str))

Given this, I feel pretty comfortable changing the test labels to strings so that the tests pass.
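To make the effect of the newer matplotlib expression concrete, here is a minimal sketch using only NumPy. The wrapper name `normalize_labels` is hypothetical; it simply wraps the expression quoted above from matplotlib master so its effect on non-string labels can be inspected.

```python
import numpy as np


def normalize_labels(label):
    """Hypothetical wrapper around the expression quoted from matplotlib master."""
    # None means "no labels"; anything else is coerced to a 1-D array of strings.
    return [] if label is None else np.atleast_1d(np.asarray(label, str))


# Integer labels (for example, integer column names) are converted to strings.
int_labels = normalize_labels([1, 2])

# A scalar label becomes a length-1 array.
scalar_label = normalize_labels("abc")
```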

rhshadrach · 2020-05-19T23:13:41Z

@charlesdong1991 Thanks for the review, changes made and checks pass. For the whatsnew, it seemed to me to be a toss-up as to whether this is a bug or an enhancement; I went with enhancement. I also added one additional test for SeriesGroupby.hist.

pandas/plotting/_matplotlib/hist.py

pandas/tests/plotting/test_hist_method.py

charlesdong1991

looks really good! @rhshadrach

only two nitpicks, otherwise LGTM!

cc @jreback @WillAyd for reviews

pandas/tests/plotting/test_groupby.py

pandas/plotting/_core.py

WillAyd · 2020-05-21T15:55:36Z

pandas/plotting/_matplotlib/hist.py

    kwargs : dict, keyword arguments passed to matplotlib.Axes.hist

    Returns
    -------
    collection of Matplotlib Axes
    """

+    if legend and "label" not in kwargs:


Is there a reason why someone would pass legend and label together? Should this not just raise?

The only reason I think is to rename for output:

index = 3 * ['1'] + 3 * ['2']
df = DataFrame([(3, 4)]*3 + [(5, 6)]*3, index=index, columns=['a', 'b'])
df.index.names = ['c']
ret = df.hist(by='c', legend=True, label=['Nice Name 1', 'Nice Name 2'])

I think this should just raise instead

It seems to me that we would then be raising on the only case where label has any effect - or am I missing another case?

Also, I don't understand why we would raise here when not raising makes it easier for the end user to have different names appear in the legend. What does raising gain?

So my thought is that if a user wants particular labels, they should just provide them to the grouper; adding this here is just another way of doing things, so it's just API clutter.

I would completely agree if this was implemented within pandas. However it's being passed through to matplotlib via kwargs. Disabling certain kwargs from working and allowing others to go through seems hazardous to user expectations. But even if this is unconvincing, I think we should strive to be consistent with various plots:

index = 3 * ['1'] + 3 * ['2']
df = DataFrame([(3, 4)]*3 + [(5, 6)]*3, index=index, columns=['a', 'b'])
df.index.names = ['c']
ret = df.plot(y=['a', 'b'], label=['Nice Name 1', 'Nice Name 2'])

Perhaps an issue could be raised as to how labels should be treated across all plots?

Well, shoot. As a counter-argument to what I just said, DataFrame.plot.hist ignores label altogether. I suppose if there is one plotting function we should be consistent with here, it's that one. What do you think about not raising, but simply ignoring any label kwarg?

I still think we should raise instead of ignoring; ignoring leads to confusion or surprising behavior, so always better to be explicit

rhshadrach · 2020-05-22T14:11:01Z

@WillAyd, @charlesdong1991 Thanks for the reviews - changes made and checks pass.

charlesdong1991 · 2020-05-23T08:29:21Z

nice job, lgtm

cc @jreback @WillAyd

WillAyd · 2020-05-27T04:35:03Z

pandas/plotting/_matplotlib/hist.py

    kwargs : dict, keyword arguments passed to matplotlib.Axes.hist

    Returns
    -------
    collection of Matplotlib Axes
    """

+    if legend and "label" not in kwargs:


I think this should just raise instead

rhshadrach · 2020-06-03T03:42:27Z

@WillAyd changes made, use of label and legend now raises.
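A minimal sketch of the behavior described here, assuming the check lives in a small helper. The name `check_legend_and_label` and the exact error message are illustrative, not copied from the PR diff.

```python
def check_legend_and_label(legend, kwargs):
    """Reject the ambiguous combination of legend=True and an explicit label."""
    if legend and "label" in kwargs:
        # Raising makes the conflict explicit instead of silently ignoring label.
        raise ValueError("Cannot use both legend and label")


# Neither of these calls raises.
check_legend_and_label(legend=True, kwargs={"bins": 10})
check_legend_and_label(legend=False, kwargs={"label": "D"})

# Passing both legend=True and a label is rejected.
try:
    check_legend_and_label(legend=True, kwargs={"label": "D"})
except ValueError as err:
    message = str(err)
```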

WillAyd

looks good. have a merge conflict and just a few things to clean up

pandas/tests/plotting/test_hist_method.py

pandas/tests/plotting/test_groupby.py

pandas/plotting/_matplotlib/hist.py

WillAyd · 2020-06-04T22:16:49Z

Yep that’s what I’m referring to

On Jun 4, 2020, at 3:02 PM, rhshadrach wrote:

pandas/plotting/_matplotlib/hist.py

    kwargs : dict, keyword arguments passed to matplotlib.Axes.hist

    Returns
    -------
    collection of Matplotlib Axes
    """
+    if legend:
+        if isinstance(data, ABCDataFrame):
+            if column is None:
+                kwargs["label"] = data.columns
+            else:

When you say branch - do you mean just the else clause?

rhshadrach · 2020-06-05T00:27:36Z

@WillAyd Changes made and checks pass. In addition to splitting out the tests, I also improved the logic that sets the label. Instead of using isinstance, it now uses ndim. The logic is also flat instead of nested.
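The flat, ndim-based branching described here might look roughly like the sketch below. The helper name `pick_hist_label` is hypothetical, and the branches are inferred from the nested isinstance snippet quoted earlier in this thread rather than copied from the final diff.

```python
import pandas as pd


def pick_hist_label(data, column=None):
    """Choose legend labels for a histogram (hypothetical helper)."""
    if data.ndim == 1:
        # Series: label the histogram with the series name.
        return data.name
    if column is None:
        # DataFrame without a column selection: one label per column.
        return data.columns
    # DataFrame with an explicit column selection.
    return column


df = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 4.0]})
series_label = pick_hist_label(df["A"])
frame_labels = list(pick_hist_label(df))
column_label = pick_hist_label(df, column="B")
```

Checking `ndim` avoids importing the abstract base classes for the isinstance test, and the early returns keep each case at one indentation level.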

WillAyd

minor comments otherwise lgtm. @TomAugspurger or @jorisvandenbossche if you care to look

pandas/plotting/_core.py

TomAugspurger

Looks nice overall.

One question about the expected behavior though: why don't we use a user-provided label when legend=True?

TomAugspurger · 2020-06-08T13:45:42Z

pandas/plotting/_matplotlib/hist.py

    kwargs : dict, keyword arguments passed to matplotlib.Axes.hist

    Returns
    -------
    collection of Matplotlib Axes
    """
+    if legend:
+        assert "label" not in kwargs


Is pandas the only one calling this, or could users call it with labels in kwargs?

Users shouldn't see bare asserts like this. It should be a ValueError.

And just to verify the expected behavior, if the user provides a label why wouldn't we want to use that? Can we just do kwargs.setdefault("label", data.name) (or the other cases)?

For the first question, this function is only called from hist_frame and hist_series, each of which raise a ValueError on this condition. This assert is to maintain this behavior if in the future any other code paths call it.

For the second question, see #33493 (comment)
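The `setdefault` alternative raised in this thread can be sketched with a plain dict standing in for `kwargs`. This only illustrates the difference in semantics. It is not the behavior the PR adopted, which raises instead, and the helper name `label_with_setdefault` is hypothetical.

```python
def label_with_setdefault(kwargs, default_label):
    """Keep a user-supplied label; fall back to the default otherwise."""
    kwargs = dict(kwargs)  # copy so the caller's dict is not mutated
    kwargs.setdefault("label", default_label)
    return kwargs


# A user-supplied label survives.
user_supplied = label_with_setdefault({"label": "Nice Name"}, default_label="A")

# Without a user label, the default is filled in.
defaulted = label_with_setdefault({}, default_label="A")
```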

TomAugspurger · 2020-06-08T13:48:29Z

doc/source/whatsnew/v1.1.0.rst

@@ -291,6 +291,7 @@ Other enhancements
 - :meth:`groupby.transform` now allows ``func`` to be ``pad``, ``backfill`` and ``cumcount`` (:issue:`31269`).
 - :meth:`~pandas.io.json.read_json` now accepts `nrows` parameter. (:issue:`33916`).
 - :meth `~pandas.io.gbq.read_gbq` now allows to disable progress bar (:issue:`33360`).
+- :meth:`DataFrame.hist`, :meth:`Series.hist`, :meth:`DataFrameGroupby.hist`, and :meth:`SeriesGroupby.hist` have gained the ``legend`` argument. Set to True to show a legend in the histogram. (:issue:`6279`)


DataFrameGroupBy and SeriesGroupBy aren't in the top-level namespace. It should be something like core.groupby.DataFrameGroupBy. See doc/source/reference/groupby.rst for the right path.

Ah - thanks!

WillAyd · 2020-06-10T03:09:19Z

One question about the expected behavior though: why don't we use a user-provided label when legend=True?

Talked about this starting here: #33493 (comment) so I specifically asked to have it raise. Happy to be overridden if you think better otherwise

rhshadrach · 2020-06-13T19:25:01Z

@TomAugspurger: I've fixed the issues you pointed out in the whatsnew. What's still uncertain is how to handle label when legend=True. From #33493 (comment), I was for supporting and not overwriting label whereas @WillAyd was opposed to supporting because there was another way to set the labels (namely, changing the column names of the dataframe). Would like to get your thoughts here.

rhshadrach · 2020-06-19T00:14:58Z

@TomAugspurger friendly ping. Just want to let you know I'll be unavailable starting the 24th until July 5th in case we want to get this into 1.1. No issue with it waiting until 1.2 on my end if the timing doesn't work out.

TomAugspurger · 2020-06-22T14:27:29Z

I don't have strong thoughts. Let's just keep it as it is in this PR and adjust based on user feedback if it arises.

WillAyd · 2020-06-22T15:31:02Z

Thanks @rhshadrach

rhshadrach force-pushed the hist_legend branch from ec9989e to ef5f2af on May 10, 2020 14:06

BUG: groupby.hist legend should use group keys

bab7491

rhshadrach force-pushed the hist_legend branch from ef5f2af to bab7491 on May 10, 2020 14:14

charlesdong1991 suggested changes May 17, 2020

Some changes

e0c9466

charlesdong1991 added the Visualization plotting label May 18, 2020

rhshadrach added 4 commits May 18, 2020 16:57

Requested changes

61693c6

whatsnew

cbfc167

Merge remote-tracking branch 'upstream/master' into hist_legend

a5f5ba1

Minor tweaks

726d147

rhshadrach added 4 commits May 18, 2020 18:25

Changed test to use strings

3e4925d

Merge remote-tracking branch 'upstream/master' into hist_legend

8db5247

Corrected test modification

a46609a

Cleanup

c36cae2

charlesdong1991 suggested changes May 20, 2020

pandas/plotting/_matplotlib/hist.py (outdated, resolved)

charlesdong1991 reviewed May 20, 2020

pandas/tests/plotting/test_hist_method.py (outdated, resolved)

charlesdong1991 suggested changes May 20, 2020

pandas/tests/plotting/test_groupby.py (outdated, resolved)

pandas/tests/plotting/test_groupby.py (outdated, resolved)

Test refinements

c01c7ab

rhshadrach force-pushed the hist_legend branch from cff88a6 to c01c7ab on May 21, 2020 15:54

WillAyd reviewed May 21, 2020

rhshadrach added 3 commits May 21, 2020 14:19

Updated type-hint to Label

c207ca2

Merge remote-tracking branch 'upstream/master' into hist_legend

a1a7e27

Changed type-hint to Label

5b9cae7

WillAyd requested changes May 27, 2020

rhshadrach added 2 commits June 2, 2020 22:22

Use of label now raises

1db3b35

Merge remote-tracking branch 'upstream/master' into hist_legend

0b0dbdd

WillAyd reviewed Jun 3, 2020

pandas/tests/plotting/test_hist_method.py (outdated, resolved)

pandas/tests/plotting/test_groupby.py (outdated, resolved)

pandas/plotting/_matplotlib/hist.py (outdated, resolved)

rhshadrach added 2 commits June 4, 2020 18:28

Refactored tests, improved label logic in _grouped_hist

0dba5d9

Merge remote-tracking branch 'upstream/master' into hist_legend

4f24547

WillAyd reviewed Jun 5, 2020

pandas/plotting/_core.py (outdated, resolved)

pandas/plotting/_core.py (outdated, resolved)

Moved legend kwarg behind backend

1c9cef5

WillAyd approved these changes Jun 5, 2020

TomAugspurger reviewed Jun 8, 2020

Fixed whatsnew

11cf3f8

Merge branch 'master' into hist_legend

c5749cb

TomAugspurger approved these changes Jun 22, 2020

WillAyd added this to the 1.1 milestone Jun 22, 2020

WillAyd merged commit 506eb54 into pandas-dev:master Jun 22, 2020

rhshadrach deleted the hist_legend branch July 11, 2020 16:02
