Skip to content

ENH: Allow column grouping in DataFrame.plot #29944

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 62 commits into from
Jun 9, 2022

Conversation

NicolasHug
Copy link
Contributor

@NicolasHug NicolasHug commented Dec 1, 2019

This is an attempt at #29688 i.e. group some columns on the same ax when calling df.plot

See this gist for illustration of different supported plots

Thanks!

@jbrockmendel jbrockmendel added the Visualization plotting label Dec 3, 2019
@WillAyd
Copy link
Member

WillAyd commented Jan 16, 2020

@NicolasHug is this still active? Looks like a merge conflict and CI failure need to be addressed

@NicolasHug
Copy link
Contributor Author

I'm still happy to work on this. I just haven't received initial feedback yet (and don't want to push either, I understand this isn't necessarily a high priority)

Before working more on this I'd appreciate some feedback to make sure this is going the in right direction, and that this is actually a desired feature.
Feel free to close otherwise, or LMK if you need to see more to decide :)

@WillAyd
Copy link
Member

WillAyd commented Jan 16, 2020

Understood. I doubt anyone has reviewed since CI has been red, so definitely want to get that green to start

@NicolasHug
Copy link
Contributor Author

NicolasHug commented Jan 16, 2020

Will do

@WillAyd
Copy link
Member

WillAyd commented Mar 14, 2020

@charlesdong1991 any thoughts on this?

Copy link
Member

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've noticed this feature request a while ago and personally really like this feature and find it quite potentially useful ^^ @WillAyd but I think the behaviour of this needs to be well defined.

Many thanks for the PR, it's very nice! @NicolasHug some minor comments:

  1. As mentioned in review, the input for subplots needs to be carefully defined, thus only checking if it is Iterable or bool might not be sufficient.
  2. And it would be very nice if you could add some detailed docstring for subplots parameter because this is a behaviour change, i think somewhere in PlotAccessor in _core.py.
  3. And some more tests would be appreciated (see comments below)
  4. Nicer to have a docstring for _col_idx_to_axis_idx although i think it is quite straightforward
  5. Nicer to have a whatsnew note in 1.1.0 rst !

@NicolasHug NicolasHug changed the title WIP: Allow column grouping in DataFrame.plot MRG: Allow column grouping in DataFrame.plot Mar 30, 2020
@NicolasHug
Copy link
Contributor Author

Thanks a lot for the review @charlesdong1991 , I addressed your comments and added a bunch of validation / tests.

It looks like the code works for most kinds of plots. You can see a few illustrations in this gist.

Looks like the subplot titles could be improved in some cases, I'll see what I can do. But otherwise I think this is ready for another round ;)

@NicolasHug
Copy link
Contributor Author

Regarding the docstring, I'm a bit confused because it seems that subplots have been removed from most docstrings? i.e. git grep "subplots :" only gives one hit, for boxplot_frame_groupby(). I can't find it for PlotAccessor.

Copy link
Member

@charlesdong1991 charlesdong1991 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice! some minor comments

Could you also please make some docs for subplots? I think you could add it in PlotAcessor, there are some missing docs for parameters, so very nice this one is found out! ^^

"kde",
"density",
"area",
"pie",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at your impressive examples, maybe we should exclude pie chart here? doesnt seem to fit as good as other charts here? any thoughts on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it doesn't look good, but it seems to be consistent (i.e. equally bad) as when subplots=True. I updated the notebook at the end where you can see how it looks. So maybe it can be kept? No strong opinion!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have no strong opinion either, my only concern is for pie chart, if they share the same category, they will have the same color in the pie, but different sizes depending on the corresponding values.

but no strong opinion

)

if any(
not isinstance(group, Iterable) or isinstance(group, str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should only not isinstance(group, Iterable) be enough here? isinstance(group, str) also falls into this, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is, we want group to be an iterable but we don't want group to be a str, which is also iterable.

Only using if any(not isinstance(group, Iterable) would allow strings and we don't want that

@MarcoGorelli MarcoGorelli self-requested a review June 8, 2021 08:06
Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Pending CI checks.

for group_idx, group in enumerate(self.subplots):
if col_idx in group:
return group_idx
else:
Copy link
Member

@ivanovmg ivanovmg Jun 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose, that once you added return type, mypy started complaining.
Inside the loop you return only in case if col_idx in group, this is the reason for the mypy error.
I suggest that you modify it:

for group_idx, group in enumerate(self.subplots):
    if col_idx in group:
        return group_idx
raise KeyError(f"Failed to find {col_idx} in {self.subplots}")

Or something like this.

@jreback
Copy link
Contributor

jreback commented Oct 4, 2021

@NicolasHug was looking good. if you want to continue, pls merge master and can relook.

@jreback
Copy link
Contributor

jreback commented Jan 16, 2022

sorry for the delay here

@NicolasHug was looking good. if you want to continue, pls merge master and can relook.

@NicolasHug
Copy link
Contributor Author

PR is up to date with main @jreback

@jreback
Copy link
Contributor

jreback commented Mar 6, 2022

@NicolasHug can you merge master and we'll see if can get in

@mroeschke mroeschke added this to the 1.5 milestone Jun 3, 2022
Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sticking with this @NicolasHug. I merged master and did a small refactor to validate the subplot groups in one pass.

I think this looks pretty good to merge. I believe it requires @jreback approval to move forward.

@jreback jreback merged commit 297c59a into pandas-dev:main Jun 9, 2022
@jreback
Copy link
Contributor

jreback commented Jun 9, 2022

thanks @NicolasHug very nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Group some columns in subplots with DataFrame.plot