ENH: When using another plotting backend, minimize pre-processing #28647

Merged
merged 4 commits into from
Nov 19, 2019

Conversation

jsignell
Contributor

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

I ran into this while implementing the hvplot backend. In hvplot you can do:

df.hvplot.hist(y='y', by='category')

but with the pandas version

pd.options.plotting.backend = 'holoviews'
df.plot.hist(y='y', by='category')

will fail because data = data[y] is called before the plot call is handed off to the backend, so the 'category' column that by refers to is already gone by the time the backend sees the data.

Basically it seems like backend writers should be free to get the passed pandas objects with as little interference as possible.
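The failure mode described above can be reproduced without hvplot. The following is a minimal sketch, assuming a hypothetical backend function `toy_backend_hist` (not part of pandas or hvplot) that needs both the `y` column and the `by` column. It only illustrates why selecting `y` ahead of time breaks grouping; it is not the actual pandas code path.

```python
import pandas as pd


def toy_backend_hist(data, y, by):
    """Hypothetical backend: groups the values of `y` by the `by` column."""
    if not isinstance(data, pd.DataFrame) or by not in data.columns:
        raise KeyError(f"grouping column {by!r} is not available to the backend")
    return {key: group[y].tolist() for key, group in data.groupby(by)}


df = pd.DataFrame({"y": [1, 2, 3, 4], "category": ["a", "a", "b", "b"]})

# Handing over the full DataFrame lets the backend group by 'category'.
full = toy_backend_hist(df, y="y", by="category")

# Selecting y first (the effect of `data = data[y]`) yields a Series,
# so the backend can no longer reach the 'category' column.
try:
    toy_backend_hist(df["y"], y="y", by="category")
    failed = False
except KeyError:
    failed = True

print(full)
print(failed)
```

The stand-in backend raises `KeyError` by design here; a real backend may fail differently, but the missing column is the common cause.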

@jsignell
Contributor Author

jsignell commented Oct 1, 2019

@datapythonista does this seem reasonable to you?

Member

@datapythonista datapythonista left a comment

I agree with the problem, but I'm -1 on the fix. If this is wrong for hvplot, it's also wrong for the matplotlib backend, and we should change the matplotlib backend instead.

Is this related to #9274 or #15079?

@jsignell
Contributor Author

jsignell commented Oct 1, 2019

I guess it could be perceived as related to those issues if you think the best fix would be to try to know all the options that can accept column names and then reduce the data to only those columns. That will never be sufficient for hvplot though because of options like hover_cols which matplotlib will never have.

I think the more essential issue is that pandas.plotting should do minimal data transforming before handing off the data to the plotting backend. I just wasn't sure whether folks from other backends would agree that we don't want processing done ahead of time. Maybe @jakevdp can comment on what would be best for altair?

@datapythonista
Member

I'm happy with that. The way it is now is based on how the matplotlib backend was implemented when it was still coupled to pandas.

Besides being simpler on the pandas side, I see advantages in passing the whole dataframe, such as backends being able to use other columns for hover and similar features.
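One possible shape of such a handoff is sketched below in plain Python. Everything here is an assumption for illustration: `dispatch_plot`, `toy_backend_plot`, and the name-based check for the default backend are hypothetical and are not taken from the PR diff. A dict of column lists stands in for the DataFrame so the sketch runs without any plotting library.

```python
from types import SimpleNamespace

# Assumed identifier for the default backend module (illustrative only).
MATPLOTLIB_BACKEND_NAME = "pandas.plotting._matplotlib"


def toy_backend_plot(data, kind, **kwargs):
    """Hypothetical third-party entry point: reports what it received."""
    return {"kind": kind, "columns": sorted(data), "options": kwargs}


# Stand-in for an imported backend module exposing a top-level `plot`.
toy_backend = SimpleNamespace(__name__="toy_backend", plot=toy_backend_plot)


def dispatch_plot(backend, data, kind, x=None, y=None, **kwargs):
    """Pass the untouched data to any backend other than the default one."""
    if backend.__name__ != MATPLOTLIB_BACKEND_NAME:
        # No column selection or reshaping: the backend decides what it needs.
        return backend.plot(data, x=x, y=y, kind=kind, **kwargs)
    raise NotImplementedError("matplotlib-specific preprocessing omitted in this sketch")


data = {"y": [1, 2, 3], "category": ["a", "a", "b"], "label": ["p", "q", "r"]}
result = dispatch_plot(
    toy_backend, data, kind="hist", y="y", by="category", hover_cols=["label"]
)
print(result)
```

Because nothing is dropped before the call, backend-specific options such as `hover_cols` can refer to any column, and pandas does not need to know which options take column names.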

I think making the changes shouldn't be hard.

@TomAugspurger what do you think?

Contributor

@TomAugspurger TomAugspurger left a comment

> I think the more essential issue is that pandas.plotting should do minimal data transforming before handing off the data to the plotting backend.

I agree with this goal.

Looking through the stuff that's skipped, ideally it could / would all be deleted. Any interest in working on that @jsignell?

@TomAugspurger TomAugspurger added the Visualization plotting label Nov 5, 2019
@TomAugspurger TomAugspurger added this to the 1.0 milestone Nov 5, 2019
Contributor

@TomAugspurger TomAugspurger left a comment

Pushed a small test.

While I think that removing all this preprocessing before passing off to the backend for all kinds is valuable, I don't think we need to hold up this PR for that. I'll open up a followup issue.

#29412

Member

@datapythonista datapythonista left a comment

Fair enough.

May be worth adding a TODO with the issue number, so we can identify in the code this is a temporary solution.

@WillAyd WillAyd merged commit d134b47 into pandas-dev:master Nov 19, 2019
@WillAyd
Member

WillAyd commented Nov 19, 2019

Thanks @jsignell
