PERF: improve plotting performance by not stringifying all x data #18373

Merged

Conversation

jorisvandenbossche
Member

@jorisvandenbossche jorisvandenbossche commented Nov 19, 2017

Closes #18236

Currently, when plotting, all x / index data are converted to strings, while you typically only need a few tick labels. So when you have a lot of data, this causes the pandas plotter to be much slower than needed (and much slower than a pure matplotlib one).

On master:

[ 33.33%] ··· Running plotting.Plotting.time_frame_plot                        4.43s
[ 50.00%] ··· Running plotting.Plotting.time_series_plot                       4.19s
[ 66.67%] ··· Running plotting.TimeseriesPlotting.time_plot_irregular          72.2±0.2ms
[ 83.33%] ··· Running plotting.TimeseriesPlotting.time_plot_regular            108±0.5ms
[100.00%] ··· Running plotting.TimeseriesPlotting.time_plot_regular_compat     71.5±0.8ms

with this branch:

[ 33.33%] ··· Running plotting.Plotting.time_frame_plot                        132±20ms
[ 50.00%] ··· Running plotting.Plotting.time_series_plot                       71.4±30ms
[ 66.67%] ··· Running plotting.TimeseriesPlotting.time_plot_irregular          58.7±0.5ms
[ 83.33%] ··· Running plotting.TimeseriesPlotting.time_plot_regular            96.9±2ms
[100.00%] ··· Running plotting.TimeseriesPlotting.time_plot_regular_compat     57.4±0.8ms

So a very simple plot went from about 4s to ca. 100ms (which is much closer to the pure matplotlib performance).
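To illustrate the idea described above, here is a minimal sketch in plain Python. It is not the code from this PR or from `pandas/plotting/_core.py`; the helper names `eager_labels` and `lazy_labels` and the sample index are hypothetical, chosen only to contrast stringifying every x value with stringifying just the values at the tick positions.

```python
from datetime import date, timedelta


def eager_labels(index):
    """Stringify every x value up front (the slow pattern described above)."""
    return [str(value) for value in index]


def lazy_labels(index, tick_positions):
    """Stringify only the values at the tick positions that get drawn.

    Hypothetical helper: positions outside the index are skipped.
    """
    return {
        pos: str(index[pos])
        for pos in tick_positions
        if 0 <= pos < len(index)
    }


# Assumed sample data: 100,000 consecutive days, with 5 tick positions.
index = [date(2000, 1, 1) + timedelta(days=i) for i in range(100_000)]
ticks = range(0, len(index), 20_000)

all_labels = eager_labels(index)        # 100,000 string conversions
few_labels = lazy_labels(index, ticks)  # 5 string conversions
```

Both approaches give the same label at every tick position; the second simply avoids the conversions that are never displayed, which is the kind of saving the benchmarks above reflect.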

@jorisvandenbossche jorisvandenbossche added the Performance (Memory or execution speed performance) and Visualization (plotting) labels Nov 19, 2017
@codecov

codecov bot commented Nov 20, 2017

Codecov Report

Merging #18373 into master will decrease coverage by 0.01%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master   #18373      +/-   ##
==========================================
- Coverage   91.38%   91.36%   -0.02%     
==========================================
  Files         164      164              
  Lines       49797    49800       +3     
==========================================
- Hits        45508    45502       -6     
- Misses       4289     4298       +9
Flag Coverage Δ
#multiple 89.17% <85.71%> (ø) ⬆️
#single 39.55% <0%> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/plotting/_core.py 82.49% <85.71%> (+0.03%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a172ff9...449c296. Read the comment docs.

@codecov

codecov bot commented Nov 20, 2017

Codecov Report

Merging #18373 into master will decrease coverage by 0.03%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master   #18373      +/-   ##
==========================================
- Coverage   91.38%   91.34%   -0.04%     
==========================================
  Files         164      164              
  Lines       49797    49721      -76     
==========================================
- Hits        45508    45420      -88     
- Misses       4289     4301      +12
Flag Coverage Δ
#multiple 89.14% <85.71%> (-0.03%) ⬇️
#single 39.61% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/plotting/_core.py 82.49% <85.71%> (+0.03%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/indexes/interval.py 92.52% <0%> (-0.34%) ⬇️
pandas/core/indexes/category.py 97.2% <0%> (-0.26%) ⬇️
pandas/tseries/offsets.py 96.7% <0%> (-0.23%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/base.py 96.42% <0%> (ø) ⬆️
pandas/core/indexes/multi.py 96.4% <0%> (+0.02%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a172ff9...6124860. Read the comment docs.

@jreback jreback added this to the 0.22.0 milestone Nov 20, 2017
@jreback
Contributor

jreback commented Nov 20, 2017

lgtm. needs a whatsnew, prob ok for 0.21.1

@jreback jreback removed this from the 0.22.0 milestone Nov 20, 2017
@jreback
Contributor

jreback commented Nov 20, 2017

is there an associated issue?

@TomAugspurger
Contributor

#18236

@jorisvandenbossche jorisvandenbossche added this to the 0.21.1 milestone Nov 20, 2017
@jorisvandenbossche
Member Author

Ah, yes, didn't see that one, but from the description this PR is exactly fixing that :-)

@jorisvandenbossche
Member Author

@TomAugspurger any comments on the content of the PR?

Contributor

@TomAugspurger TomAugspurger left a comment

Looks good.

@jorisvandenbossche jorisvandenbossche merged commit 8d04daf into pandas-dev:master Nov 20, 2017
@jorisvandenbossche jorisvandenbossche deleted the perf-plotting branch November 20, 2017 18:21
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017
…ndas-dev#18373)

* add benchmark with basic default plotting

(cherry picked from commit 8d04daf)
TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017
…8373)

* add benchmark with basic default plotting

(cherry picked from commit 8d04daf)
Labels
Performance (Memory or execution speed performance), Visualization (plotting)
Development

Successfully merging this pull request may close these issues.

df.plot() very slow compared to explicit matplotlib on large dataframes
3 participants