PERF: improve plotting performance by not stringifying all x data #18373

jorisvandenbossche · 2017-11-19T20:52:05Z

Currently, when plotting, all x / index data are converted to strings, although you typically only need a few tick labels. So when you have a lot of data, this causes the pandas plotter to be much slower than needed (and much slower than a pure matplotlib one).
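To make the idea concrete, here is a minimal sketch in plain Python. It is not the actual change in `pandas/plotting/_core.py`; the helper names `labels_eager` and `labels_lazy` and the `n_ticks` parameter are hypothetical and only illustrate that formatting a handful of tick positions is much cheaper than formatting every x value.

```python
import timeit


def labels_eager(xs):
    # Hypothetical "before" behaviour: format every x value up front,
    # even though only a few of them end up as tick labels.
    return [str(x) for x in xs]


def labels_lazy(xs, n_ticks=10):
    # Hypothetical "after" behaviour: format only the values that sit at
    # a small number of evenly spaced tick positions.
    step = max(1, len(xs) // n_ticks)
    return {i: str(xs[i]) for i in range(0, len(xs), step)}


xs = [i * 0.001 for i in range(200_000)]

eager = labels_eager(xs)
lazy = labels_lazy(xs)

# Rough timing comparison; absolute numbers depend on the machine.
t_eager = timeit.timeit(lambda: labels_eager(xs), number=3)
t_lazy = timeit.timeit(lambda: labels_lazy(xs), number=3)
print(f"eager: {t_eager:.4f}s  lazy: {t_lazy:.4f}s")
```

Both helpers produce the same label for any position they share; the lazy variant just does the string conversion for 10 values instead of 200,000.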

On master:

[ 33.33%] ··· Running plotting.Plotting.time_frame_plot                        4.43s
[ 50.00%] ··· Running plotting.Plotting.time_series_plot                       4.19s
[ 66.67%] ··· Running plotting.TimeseriesPlotting.time_plot_irregular          72.2±0.2ms
[ 83.33%] ··· Running plotting.TimeseriesPlotting.time_plot_regular            108±0.5ms
[100.00%] ··· Running plotting.TimeseriesPlotting.time_plot_regular_compat     71.5±0.8ms

with this branch:

[ 33.33%] ··· Running plotting.Plotting.time_frame_plot                        132±20ms
[ 50.00%] ··· Running plotting.Plotting.time_series_plot                       71.4±30ms
[ 66.67%] ··· Running plotting.TimeseriesPlotting.time_plot_irregular          58.7±0.5ms
[ 83.33%] ··· Running plotting.TimeseriesPlotting.time_plot_regular            96.9±2ms
[100.00%] ··· Running plotting.TimeseriesPlotting.time_plot_regular_compat     57.4±0.8ms

So for a very simple plot, the time went from about 4s to roughly 100ms (which is much closer to the pure matplotlib performance).
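For reference, the speedups implied by the two runs above can be worked out with a few lines of Python. This is only a sketch: the numbers are the mean timings copied from the asv output (converted to milliseconds, ignoring the ± spread), and the dictionary names are just for illustration.

```python
# Mean timings in milliseconds, copied from the asv output above.
master_ms = {
    "plotting.Plotting.time_frame_plot": 4430.0,
    "plotting.Plotting.time_series_plot": 4190.0,
    "plotting.TimeseriesPlotting.time_plot_irregular": 72.2,
    "plotting.TimeseriesPlotting.time_plot_regular": 108.0,
    "plotting.TimeseriesPlotting.time_plot_regular_compat": 71.5,
}
branch_ms = {
    "plotting.Plotting.time_frame_plot": 132.0,
    "plotting.Plotting.time_series_plot": 71.4,
    "plotting.TimeseriesPlotting.time_plot_irregular": 58.7,
    "plotting.TimeseriesPlotting.time_plot_regular": 96.9,
    "plotting.TimeseriesPlotting.time_plot_regular_compat": 57.4,
}

# Speedup factor = time on master / time on this branch.
speedup = {name: master_ms[name] / branch_ms[name] for name in master_ms}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x")
```

The two basic plotting benchmarks come out at roughly 34x and 59x faster, while the timeseries benchmarks improve by a more modest 10 to 25%.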

codecov · 2017-11-20T00:38:31Z

Codecov Report

Merging #18373 into master will decrease coverage by 0.01%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master   #18373      +/-   ##
==========================================
- Coverage   91.38%   91.36%   -0.02%     
==========================================
  Files         164      164              
  Lines       49797    49800       +3     
==========================================
- Hits        45508    45502       -6     
- Misses       4289     4298       +9

| Flag | Coverage Δ | |
| --- | --- | --- |
| #multiple | `89.17% <85.71%> (ø)` | ⬆️ |
| #single | `39.55% <0%> (-0.07%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| pandas/plotting/_core.py | `82.49% <85.71%> (+0.03%)` | ⬆️ |
| pandas/io/gbq.py | `25% <0%> (-58.34%)` | ⬇️ |
| pandas/core/frame.py | `97.8% <0%> (-0.1%)` | ⬇️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a172ff9...449c296.

codecov · 2017-11-20T00:38:37Z

Codecov Report

Merging #18373 into master will decrease coverage by 0.03%.
The diff coverage is 85.71%.

@@            Coverage Diff             @@
##           master   #18373      +/-   ##
==========================================
- Coverage   91.38%   91.34%   -0.04%     
==========================================
  Files         164      164              
  Lines       49797    49721      -76     
==========================================
- Hits        45508    45420      -88     
- Misses       4289     4301      +12

| Flag | Coverage Δ | |
| --- | --- | --- |
| #multiple | `89.14% <85.71%> (-0.03%)` | ⬇️ |
| #single | `39.61% <0%> (-0.01%)` | ⬇️ |

| Impacted Files | Coverage Δ | |
| --- | --- | --- |
| pandas/plotting/_core.py | `82.49% <85.71%> (+0.03%)` | ⬆️ |
| pandas/io/gbq.py | `25% <0%> (-58.34%)` | ⬇️ |
| pandas/core/indexes/interval.py | `92.52% <0%> (-0.34%)` | ⬇️ |
| pandas/core/indexes/category.py | `97.2% <0%> (-0.26%)` | ⬇️ |
| pandas/tseries/offsets.py | `96.7% <0%> (-0.23%)` | ⬇️ |
| pandas/core/frame.py | `97.8% <0%> (-0.1%)` | ⬇️ |
| pandas/core/indexes/base.py | `96.42% <0%> (ø)` | ⬆️ |
| pandas/core/indexes/multi.py | `96.4% <0%> (+0.02%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a172ff9...6124860.

jreback · 2017-11-20T11:13:34Z

lgtm. needs a whatsnew, prob ok for 0.21.1

jreback · 2017-11-20T11:13:53Z

is there an associated issue?

TomAugspurger · 2017-11-20T13:27:32Z

#18236

jorisvandenbossche · 2017-11-20T13:29:52Z

Ah, yes, didn't see that one, but from the description this PR is exactly fixing that :-)

jorisvandenbossche · 2017-11-20T15:06:43Z

@TomAugspurger any comments on the content of the PR?

TomAugspurger

Looks good.

jorisvandenbossche added 2 commits November 19, 2017 18:11

PERF: improve plotting performance by not stringifying all x data

a74c837

add benchmark with basic default plotting

449c296

jorisvandenbossche added the Performance (memory or execution speed performance) and Visualization (plotting) labels Nov 19, 2017

jreback added this to the 0.22.0 milestone Nov 20, 2017

jreback removed this from the 0.22.0 milestone Nov 20, 2017

jorisvandenbossche added this to the 0.21.1 milestone Nov 20, 2017

add whatsnew note

6124860

jorisvandenbossche mentioned this pull request Nov 20, 2017

df.plot() very slow compared to explicit matplotlib on large dataframes #18236

Closed

TomAugspurger approved these changes Nov 20, 2017

jorisvandenbossche merged commit 8d04daf into pandas-dev:master Nov 20, 2017

jorisvandenbossche deleted the perf-plotting branch November 20, 2017 18:21

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Dec 8, 2017

PERF: improve plotting performance by not stringifying all x data (pa…

3a1eeef

…ndas-dev#18373) * add benchmark with basic default plotting (cherry picked from commit 8d04daf)

TomAugspurger pushed a commit that referenced this pull request Dec 11, 2017

PERF: improve plotting performance by not stringifying all x data (#1…

018ab9a

…8373) * add benchmark with basic default plotting (cherry picked from commit 8d04daf)
