Skip to content

PLOT: Split matplotlib specific code from pandas plotting #26414

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Jun 8, 2019
Merged

PLOT: Split matplotlib specific code from pandas plotting #26414

merged 34 commits into from
Jun 8, 2019

Conversation

datapythonista
Copy link
Member

In order to be able to change pandas plotting backend, I think we need 3 main things:

  1. Decouple the current matplotlib backend from the pandas plotting system, and just call it using a defined public API
  2. Implement an option to select the backend (e.g. pandas.set_option('plotting.backend', 'pandas.matplotlib'))
  3. Have other alternative backends which define the same API as the current matplotlib one

This PR is splitting most of the matplotlib backend code into a new _matplotlib.py file. Some matplotlib code still lives in plotting/_core.py, but would be nice to have early feedback on whether everybody is happy with this approach before moving forward. CC: @jreback @TomAugspurger

@datapythonista datapythonista added Refactor Internal refactoring of code Visualization plotting labels May 15, 2019
@TomAugspurger
Copy link
Contributor

Thanks, will take a look later. In case it's useful now, I started TomAugspurger@b07aba2, which adds a config option for plotting, but never got to submitting a PR.

@jbrockmendel
Copy link
Member

Totally on-board with the concept. What other backends do you expect (either implemented here or downstream)? My understanding is many of the other graphics libraries wrap matplotlib themselves.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented May 15, 2019

I know of two libraries that have reimplemented the pd.DataFrame.plot API

@datapythonista
Copy link
Member Author

I think Phillip started hvplot with the discussions of this issue, but as we didn't finish this work, it ended monkeypatching pandas.

The goal here is that projects like the ones Tom mentioned, and others like https://github.com/PatrikHlobil/Pandas-Bokeh can be implemented simpler and in a standard way, so we make the life of devs and users easier.

Personally, I'd at some point split the matplotlib backend from pandas and have it as a third party package. But that's for the long term, and unrelated to this PR.

@datapythonista
Copy link
Member Author

For what I could see in the code, the API that pandas plotting backends need to provide are:

  • LinePlot
  • BarPlot
  • BarhPlot
  • HistPlot
  • BoxPlot
  • KdePlot
  • AreaPlot
  • PiePlot (only for pandas.Series)
  • ScatterPlot (only for pandas.DataFrame)
  • HexBinPlot (only for pandas.DataFrame)
  • hist_series
  • hist_frame
  • boxplot_frame
  • boxplot_frame_groupby

The implementation in this PR moves all the matplotlib code into pandas.plotting.matplotlib. And keeps all the public documentation in pandas (outside of the matplotlib backend). In a follow up PR should be simple to add an option to change the backend to another respecting the same API as the matplotlib one.

Any comment appreciated.

CC: @tacaswell, @jakevdp, @philippjfr, @PatrikHlobil

@jakevdp
Copy link
Contributor

jakevdp commented May 16, 2019

This is great - thanks for working on it!

@codecov
Copy link

codecov bot commented May 16, 2019

Codecov Report

Merging #26414 into master will decrease coverage by 49.94%.
The diff coverage is 17%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #26414       +/-   ##
===========================================
- Coverage   91.69%   41.74%   -49.95%     
===========================================
  Files         174      178        +4     
  Lines       50739    50739               
===========================================
- Hits        46524    21181    -25343     
- Misses       4215    29558    +25343
Flag Coverage Δ
#multiple ?
#single 41.74% <17%> (+0.4%) ⬆️
Impacted Files Coverage Δ
pandas/plotting/matplotlib/__init__.py 100% <100%> (ø)
pandas/plotting/matplotlib/boxplot.py 15.02% <15.02%> (ø)
pandas/plotting/matplotlib/main.py 15.19% <15.19%> (ø)
pandas/plotting/matplotlib/hist.py 21.38% <21.38%> (ø)
pandas/plotting/_core.py 44.3% <27.9%> (-39.6%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
... and 140 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff4437e...0e45c2a. Read the comment docs.

@codecov
Copy link

codecov bot commented May 16, 2019

Codecov Report

Merging #26414 into master will decrease coverage by 0.09%.
The diff coverage is 75.96%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master   #26414     +/-   ##
=========================================
- Coverage   91.88%   91.79%   -0.1%     
=========================================
  Files         174      179      +5     
  Lines       50699    50763     +64     
=========================================
+ Hits        46587    46600     +13     
- Misses       4112     4163     +51
Flag Coverage Δ
#multiple 90.33% <75.96%> (-0.09%) ⬇️
#single 41.31% <3.43%> (-0.7%) ⬇️
Impacted Files Coverage Δ
pandas/plotting/_matplotlib/converter.py 58.43% <ø> (ø)
pandas/plotting/_matplotlib/compat.py 84.61% <ø> (ø)
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/plotting/_matplotlib/style.py 85.96% <100%> (ø)
pandas/tseries/plotting.py 100% <100%> (ø) ⬆️
pandas/plotting/_matplotlib/tools.py 78.4% <100%> (ø)
pandas/core/groupby/generic.py 89.08% <100%> (ø) ⬆️
pandas/core/series.py 93.61% <100%> (ø) ⬆️
pandas/core/frame.py 97% <100%> (-0.12%) ⬇️
pandas/plotting/_matplotlib/__init__.py 100% <100%> (ø)
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5b10e31...ceedd3b. Read the comment docs.

@@ -0,0 +1,8 @@
from .boxplot import BoxPlot, boxplot_frame, boxplot_frame_groupby
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this module to pandas/plotting/_matplotlib

@@ -0,0 +1,1374 @@
import re
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "core" instead of "main"? I typically associate main with __main__ / running from the command line.

@tacaswell
Copy link
Contributor

I am too 💤 right now to have much sensible comments on the code.

If you are interested in pulling this whole thing out and putting it in matplotlib/matplotlib-pandas I am open to that and would happily make sure anyone from the pandas side who needs / wants access will have it. That is probably something to look at down the road after sorting out what you want the interface to be (mostly to avoid obligatory coordinated releases).

If there is interest in more aggressive re-thinking of the API DistrictDataLabs/yellowbrick#826 , matplotlib/matplotlib#14058, and https://paper.dropbox.com/doc/Matplotlib-4.0--AdQMb0ugRPWeqq3U4dEAHuSDAQ-WTYwd0NQaSHTjtLUZwkNx may be of interest.

Marc Garcia added 2 commits May 17, 2019 14:34
@pep8speaks
Copy link

pep8speaks commented May 17, 2019

Hello @datapythonista! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-06-08 22:28:02 UTC

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this PR just moving things (and fixing the imports). would greatly appreciate that. Then can followon with a PR to change code. It is impossible to see what you are actually changing otherwise.

@datapythonista
Copy link
Member Author

100% agree, there is no functional change, and no public API change in this PR. Just moving all the code specific to matplotlib from pandas/plotting/* to pandas/plotting/_matplotlib/*. All the calls to the "public" functions in _matplotlib are done by importing the module in a lazy way (that fails when matplotlib is not installed in the same was as it fails now).

Ideally would like to avoid touching any tests to guarantee that no unintentional functional change happens. In practice I need to update the import paths of some tests, since they import private functions that have been moved.

Still work in progress, the change is much bigger than what I originally thought, but will provide more details on what the PR involves when it's ready to be merged.

@jreback
Copy link
Contributor

jreback commented May 18, 2019

thanks @datapythonista yeah also happy to do this in pieces if its easier.

@TomAugspurger
Copy link
Contributor

@datapythonista are you able to reproduce the CI failures like https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=12231 locally? If not, I can try taking a look.

@datapythonista
Copy link
Member Author

No, I'm taking a look, but very confused on what's going on, even the mypy error. The CI was green before merging master, which confuses me even more. If you can take a look too, that would be great, let me know if you find something.

@WillAyd
Copy link
Member

WillAyd commented Jun 3, 2019

@datapythonista haven't had the chance to look too deep but interesting that boxplot_frame throws a mypy error while hist_frame the line above doesn't. What's different between those two is the Appender - can you see if removing that maybe silences the mypy error?

@datapythonista
Copy link
Member Author

thanks @WillAyd, I also saw that, and removing the Appender does fix the problem, but I think there is a root error, Appender is being used eveywhere... :\

@WillAyd
Copy link
Member

WillAyd commented Jun 3, 2019

Yea not sure why it's appearing now. Probably just need to annotate Appender for proper inference.

I don't think it's critical for this PR so if that's the only hangup feel free to add type: ignore and we can address in a follow up

@datapythonista
Copy link
Member Author

I tried annotating Appender but didn't work. I saw a mypy issue where this error was raised for unrelated problems. One of the test failures (happens just with couple of builds) is related to boxplot too. I'll see if I can find the problem for that, as it's difficult to debug the mypy error. With some luck all the problems are caused by the same. Thanks for the help.

@WillAyd
Copy link
Member

WillAyd commented Jun 3, 2019

No problem. As mentioned I think OK to mark as type: ignore for now so it doesn't hold this up. I've opened #26629 to address separately

@datapythonista
Copy link
Member Author

No idea what's going on here. The 3 errors seem unrelated. One of them is a failure on the qt backend of matplotlib because can't find the environment variable DISPLAY (I don't think this PR affects that variable, I guess the problem is something else causing that).

The other error is caused by a plot losing the tick information in a plot. None of them reproduce locally with the dependencies of the build (one is the mac build, may be someone with a mac can try to reproduce).

The mypy error seems to be nonsense too.

@TomAugspurger
Copy link
Contributor

I tried on a Mac, and wasn't able to reproduce the error (will try in an exact replica of that env later). I will note that the warning shows up under pytest under your branch, whereas it didn't before. I'm not sure why that would be.

@datapythonista
Copy link
Member Author

I fixed the errors, not sure why they were caused, but importing matplotlib to see if it was installed before registering the converters was causing the problems (including the mypy one which is weird).

CI is green, and it's generating the warning reported by @TomAugspurger like in master. Should be ready to be merged.

@jreback
Copy link
Contributor

jreback commented Jun 8, 2019

merge master one more time, ping on green.


def _make_plot(self):
if self.subplots:
from pandas.core.series import Series
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

later on I suppose could move all imports to the top as this is now a delayed import generally

@jreback
Copy link
Contributor

jreback commented Jun 8, 2019

@datapythonista after merge if you can create any needed followup issues, and issues for splitting tests and moving imports

@datapythonista
Copy link
Member Author

@jreback this is green.

Will open PRs for the immediate follow up things, and create issues for the rest once merged.

@jreback jreback merged commit c7748ca into pandas-dev:master Jun 8, 2019
@jreback
Copy link
Contributor

jreback commented Jun 8, 2019

thanks @datapythonista very nice!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Refactor Internal refactoring of code Visualization plotting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants