DEPR: Clean up of pandas.plotting #28177

Open
datapythonista opened this issue Aug 27, 2019 · 10 comments
Labels
Deprecate (functionality to remove in pandas), Needs Discussion (requires discussion from core team before further action), Visualization (plotting)

Comments

@datapythonista
Member

xref: #26747, #28159

The current plotting API feels inconsistent. I think it's the one we have for historical reasons, not the one we want. I propose the following changes:

  1. Leave the current API based on the .plot accessor as is (e.g. Series.plot.hist, Series.plot(kind='box')). In the future we should consider two things:
     • Whether we want backends to be able to add plots that we don't define
     • Moving all the matplotlib-specific parameters to **kwargs
  2. Remove all duplicate functions:
     • hist_series (Series.hist -> Series.plot.hist)
     • hist_frame (DataFrame.hist -> DataFrame.plot.hist)
     • boxplot (pandas.plotting.boxplot -> DataFrame.plot.box)
     • boxplot_frame (DataFrame.boxplot -> DataFrame.plot.box)
  3. Move the matplotlib backend to a separate project (pandas.plotting._matplotlib -> pandas_matplotlib)
  4. Move the non-accessor plotting functions to the matplotlib backend:
     • andrews_curves (pandas.plotting.andrews_curves -> pandas_matplotlib.andrews_curves)
     • autocorrelation_plot
     • bootstrap_plot
     • lag_plot
     • parallel_coordinates
     • radviz
     • scatter_matrix
     • table
  5. Move the register/unregister of the converters to the matplotlib backend (pandas.plotting.register -> pandas_matplotlib.register)
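To make the open question in item 1 concrete, here is a minimal sketch of how a third-party backend plugs into the .plot accessor. It assumes the behavior of recent pandas releases, where a backend is any importable module exposing a top-level plot(data, kind, **kwargs) function and can be selected with the backend= keyword. The module name toy_backend and the kind "ridgeline" are hypothetical and exist only for this illustration; nothing is drawn.

```python
import sys
import types

import pandas as pd

# Hypothetical backend module, built in memory purely for illustration.
toy_backend = types.ModuleType("toy_backend")


def plot(data, kind, **kwargs):
    """Entry point that pandas calls for every .plot request."""
    # Describe the request instead of drawing anything.
    return {"kind": kind, "n_points": len(data), "options": kwargs}


toy_backend.plot = plot
sys.modules["toy_backend"] = toy_backend  # make the name importable

s = pd.Series([1.0, 2.0, 2.5, 4.0])

# A kind that pandas itself defines.
hist_request = s.plot(kind="hist", backend="toy_backend")

# A kind that pandas does not define (hypothetical name); with a
# non-matplotlib backend the request is handed over without validation.
custom_request = s.plot(kind="ridgeline", backend="toy_backend")
```

If this reading of the dispatch is right, backends can already receive kinds that pandas does not define when called through Series.plot(kind=...), so the remaining question is mostly whether that should be a supported, documented path.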

CC: @pandas-dev/pandas-core @jakevdp

@charlesdong1991
Member

May I have a try at this once the maintainers reach some agreement?

@datapythonista added the Needs Discussion label Aug 27, 2019
@datapythonista
Member Author

I'll be creating new issues for the tasks that result from this issue, and you're certainly welcome to work on those.

@Jeitan

Jeitan commented Sep 9, 2019

I just ran across this and see that it is fairly recent, hooray! Might I make an observation of something to consider as this rework is done? Grouped histograms. There are currently 8 native ways to do this, counting the "redundant" .hist and .plot.hist and depending on what kind of object you're calling them from. Almost none of them behave the same, and some don't behave in any expected way (no grouping).

I am concerned because if the plan is to drop Series.hist and DataFrame.hist (which is fine by me; I don't like API redundancy), it is worth noting that grouping with the by= keyword does not work in Series.plot.hist or DataFrame.plot.hist. I've compiled all the behaviors in this spreadsheet (file should be available here).

Is this the right place to bring it up or should I make a separate issue on grouping behavior?

@TomAugspurger
Contributor

TomAugspurger commented Sep 9, 2019 via email

@Jeitan

Jeitan commented Sep 9, 2019

@TomAugspurger Well, it looks like that's along the right track, for sure: .hist and .plot.hist do very different things. I don't see any mention of the by= keyword, though, which is what I'm concerned about. For example, the following:

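import numpy as np
import pandas as pd

np.random.seed(159753)

# Two numeric columns plus a categorical column to group by
df = pd.DataFrame(np.random.randn(30, 2), columns=['A', 'B'])
df['C'] = np.random.choice(['a', 'b', 'c'], 30)

# (1) DataFrame.hist: one subplot per value of 'C'
df.hist(column='A', by='C')
# (2) DataFrame.plot.hist: same arguments through the .plot accessor
df.plot.hist(column='A', by='C')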

Yield these two plots:
(1) [image: pdh_df_hist_by]
(2) [image: pdh_df_plothist_by]

As you can see, the first one uses subplots, but the second one just plots the whole histogram of 'A' with no grouping whatsoever.
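For reference, a minimal sketch of one way to get the grouped result while going only through Series.plot.hist: group explicitly and draw each group on its own axes. This is a workaround under the assumption that matplotlib is installed, not a proposal for how by= should be implemented; the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, assumed only for this sketch

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(159753)

df = pd.DataFrame(np.random.randn(30, 2), columns=["A", "B"])
df["C"] = np.random.choice(["a", "b", "c"], 30)

# Group column 'A' by the values of 'C'
groups = df.groupby("C")["A"]

# One axes per group, laid out side by side
fig, axes = plt.subplots(1, groups.ngroups, figsize=(9, 3), sharey=True)

for ax, (label, values) in zip(np.atleast_1d(axes), groups):
    # Each group is a plain Series, so the accessor API is enough
    values.plot.hist(ax=ax)
    ax.set_title(str(label))
```

This reproduces the one-subplot-per-group layout of df.hist(column='A', by='C'), at the cost of managing the figure by hand.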

@Jeitan

Jeitan commented Sep 9, 2019

It looks like one way of achieving the .hist type result with .plot.hist is to make the subplots keyword functional (right now it doesn't do anything in this particular situation).
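A rough sketch of that idea with what exists today: subplots=True already splits DataFrame.plot.hist by column, so reshaping the data so that each group becomes a column gives one histogram per group. This assumes matplotlib is installed and that plot.hist skips the NaN values the reshape introduces, which I believe recent pandas versions do; the variable name wide is just for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, assumed only for this sketch

import numpy as np
import pandas as pd

np.random.seed(159753)

df = pd.DataFrame(np.random.randn(30, 2), columns=["A", "B"])
df["C"] = np.random.choice(["a", "b", "c"], 30)

# One column per value of 'C'; rows that belong to other groups become NaN
wide = df.pivot(columns="C", values="A")

# subplots=True draws each column (here: each group) on its own axes
axes = wide.plot.hist(subplots=True, sharex=True)
```

Note that all subplots share the same bin edges here, since the bins are computed over the whole frame, which differs from what df.hist(by=...) does per group.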

@Jeitan

Jeitan commented Sep 9, 2019

Actually, I just found issue #15079, which I think is the most closely related. Sorry for cluttering the space here. However, since actually implementing by in the plot.hist pathway was kicked down the road at that point over a year ago, now seems a good time to get it done if hist is really going to be deprecated.

@jbrockmendel
Member

@datapythonista would it make sense to make checkboxes in the top post to clarify what the status of this issue is?

@datapythonista
Member Author

I opened this to have a discussion and see if people were happy with my proposed changes. But I don't think there has been any discussion or progress on this, so it's probably not worth having the checkboxes for now.

@stelios-c

Hello, is there any planned change to pandas.plotting? I see this issue is from 2019 but is still open.

6 participants