PERF: cache plotting date locators for DatetimeIndex plotting #58992

Merged: 2 commits merged into pandas-dev:main on Jun 13, 2024

Conversation

mroeschke
Member

@mroeschke mroeschke commented Jun 12, 2024

Using the last plotting example from the issue:

In [1]: import time
   ...: import pandas as pd
   ...: import matplotlib.pyplot as plt

In [2]: ts_index = pd.date_range('2020-01-01 00:00', periods=500_000, freq='10min')
   ...: 
   ...: s_a = pd.Series(data=1, index=ts_index)
   ...: s_b = pd.Series(data=2, index=ts_index)

In [3]: fig, ax = plt.subplots()
   ...: t_s4 = time.time()
   ...: s_a.plot(ax=ax)
   ...: print("Time elapsed Series.plot() on existing plt axis without matplotlib line plot: "
   ...:       f"{(time.time()-t_s4):.2f} seconds")
Time elapsed Series.plot() on existing plt axis without matplotlib line plot: 8.96 seconds  # main
Time elapsed Series.plot() on existing plt axis without matplotlib line plot: 1.34 seconds  # PR

@mroeschke mroeschke added the Visualization (plotting) and Performance (memory or execution speed performance) labels Jun 12, 2024
@mroeschke mroeschke added this to the 2.2.3 milestone Jun 12, 2024
@@ -826,6 +828,7 @@ def _monthly_finder(vmin, vmax, freq: BaseOffset) -> np.ndarray:
     return info


+@functools.cache
Member

No chance of getting non-hashable inputs?

Member Author

I think vmax/vmin are always floats. I'll type these so we have some validation
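
For readers following the hashability question, here is a minimal sketch of how `functools.cache` behaves with hashable and unhashable arguments. The `finder` function is a hypothetical stand-in, not the pandas finder, and the list argument is only an assumed example of an unhashable input. Note that type annotations alone are not checked at runtime, so they help static checkers rather than validate inputs.

```python
import functools


@functools.cache
def finder(vmin: float, vmax: float) -> tuple[float, float]:
    # Hypothetical stand-in for a cached finder; not the pandas function.
    return (vmin, vmax)


# Floats are hashable, so the second identical call is served from the cache.
finder(0.0, 10.0)
finder(0.0, 10.0)
hits = finder.cache_info().hits

# An unhashable argument (a list here) fails when the cache builds its key,
# before the function body runs. The annotations above do not prevent this.
try:
    finder([0.0], 10.0)
    error = ""
except TypeError as exc:
    error = str(exc)

print(hits)
print(error)
```

If an unhashable value could ever reach such a function, the call would raise `TypeError` rather than silently skip the cache.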

@jbrockmendel
Member

Nice!

@jbrockmendel jbrockmendel merged commit 566e592 into pandas-dev:main Jun 13, 2024
47 checks passed
@jbrockmendel
Member

Thanks @mroeschke

@tacaswell
Contributor

A slightly late comment: doing the caching this way means multiple axes may share the same result. I have not chased through but I am not 100% sure that the returned object is not mutable (so you are opening up a bunch of spooky action-at-a-distance bugs).

It may be better to do the caching on a per-converter basis.

I am a bit curious why mpl is calling _get_default_locs many times though...
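
To illustrate the concern above, a rough sketch of the aliasing risk with a module-level `functools.cache`, next to one possible per-instance alternative. Everything here is hypothetical: `shared_locs` and `Converter` are illustrative names, and a plain list stands in for the mutable array a finder might return. This is not how pandas or matplotlib implement it.

```python
import functools


@functools.cache
def shared_locs(vmin: float, vmax: float) -> list[float]:
    # Hypothetical finder; a list stands in for a mutable ndarray result.
    step = (vmax - vmin) / 4
    return [vmin + i * step for i in range(5)]


# Every caller with the same arguments receives the same object, so an
# in-place change made by one caller is visible to all later callers.
first = shared_locs(0.0, 4.0)
first[0] = -1.0
second = shared_locs(0.0, 4.0)
leaked = second is first and second[0] == -1.0


class Converter:
    """Hypothetical converter that keeps its own cache (one per axis)."""

    def __init__(self) -> None:
        self._cache: dict[tuple[float, float], list[float]] = {}

    def locs(self, vmin: float, vmax: float) -> list[float]:
        key = (vmin, vmax)
        if key not in self._cache:
            step = (vmax - vmin) / 4
            self._cache[key] = [vmin + i * step for i in range(5)]
        return self._cache[key]


# Mutating one converter's result does not affect another converter.
a, b = Converter(), Converter()
a.locs(0.0, 4.0)[0] = -1.0
isolated = b.locs(0.0, 4.0)[0] == 0.0

print(leaked, isolated)
```

Another option under the same assumptions would be to keep the shared cache but return an immutable or read-only result, which trades isolation for lower memory use.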

mroeschke added a commit that referenced this pull request Jun 13, 2024
…s for DatetimeIndex plotting) (#59002)

Backport PR #58992: PERF: cache plotting date locators for DatetimeIndex plotting

Co-authored-by: Matthew Roeschke
Labels
Performance Memory or execution speed performance Visualization plotting
Development

Successfully merging this pull request may close these issues.

PERF: Slow Series.plot() for DatetimeIndex for pandas 2.2.0 compared to 2.1.4
3 participants