Pandas .plot() on regularly spaced timeseries result in slow plotting/interaction #31074

Closed
rhkarls opened this issue Jan 16, 2020 · 3 comments
Labels
Performance Memory or execution speed performance Visualization plotting

Comments

@rhkarls
Contributor

rhkarls commented Jan 16, 2020

Code Sample, a copy-pastable example if possible

import datetime as dt

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 100000
y = np.linspace(0, 10, N)

# Regularly spaced timestamps, one per minute
base_dt = dt.datetime(2000, 1, 1)
dt_range = [base_dt + dt.timedelta(minutes=x) for x in range(N)]

# 1) pyplot with datetimes: responsive
plt.figure('pyplot timeseries')
plt.plot(dt_range, y)

# 2) pandas with a datetime index: slow to interact with
s = pd.Series(index=dt_range, data=y)
plt.figure('Pandas plot timeseries')
s.plot()

# 3) pandas with a default integer index: responsive
s2 = pd.Series(data=y)
plt.figure('Pandas plot series')
s2.plot()

plt.show()

Problem description

I noticed that plotting time series with .plot() sometimes produces very slow, unresponsive figures that are difficult to interact with (e.g. to pan). I think this happens with regularly spaced time series, where either a frequency is defined or pandas is able to infer one. Perhaps it has something to do with the tick labels on the x (time) axis? It does not happen with irregular time series, where pandas does not style the ticks and tick labels.
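A quick way to check whether an index falls into the "regular" case described above is to ask pandas whether it can infer a frequency. This is only a minimal sketch: the names regular and irregular are illustrative, and it assumes (as the issue suspects, without confirming it here) that an inferable frequency is what triggers the styled time-series ticks.

```python
import datetime as dt

import pandas as pd

base_dt = dt.datetime(2000, 1, 1)

# Built from a plain list of datetimes, so no frequency is attached explicitly.
regular = pd.DatetimeIndex([base_dt + dt.timedelta(minutes=i) for i in range(10)])

# Dropping one timestamp leaves a gap, so the spacing is no longer uniform.
irregular = regular.delete(3)

print(regular.freq)               # None: nothing was set explicitly
print(pd.infer_freq(regular))     # a minute alias (exact string depends on the pandas version)
print(pd.infer_freq(irregular))   # None: no single frequency fits
```

If pd.infer_freq returns a frequency for the index of the series being plotted, the series is a candidate for the slow behavior reported here.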

In the example code, the first and third plots are smooth to interact with, while the second lags terribly. The screenshot shows the second plot with the pandas-styled tick labels:
[image: second plot with pandas-styled tick labels]

Changing dt_range from minute frequency to hour frequency (replace minutes with hours) makes pandas.plot() much smoother to interact with, even though the series is the same size. I think this is because there are far fewer ticks and labels:
[image: same plot at hourly frequency, with fewer ticks and labels]

So it might be a combination of the size of the plotted series and how the ticks are drawn and updated?
Is there a way to disable the pandas-styled ticks? I also notice that pyplot.plot() and pandas.plot() convert timestamps to very different numeric x values. Is there a way to disable that behavior in pandas as well, so that it is compatible with pyplot?
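The difference in numeric x values can be seen by comparing the axis limits of the two kinds of plot. The following is a rough sketch, not a description of pandas internals: it uses a shorter series than the example above, builds the index with pd.date_range for brevity, and selects the non-interactive Agg backend so that it runs without a display. The variable names are illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run, no GUI needed for this check

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 1000
y = np.linspace(0, 10, N)
index = pd.date_range("2000-01-01", periods=N, freq="min")

fig, (ax_mpl, ax_pd) = plt.subplots(2, 1)

# pyplot path: datetimes are converted to matplotlib date numbers (days)
ax_mpl.plot(index.to_pydatetime(), y)

# pandas path: a regular DatetimeIndex gets pandas' own x-axis handling
pd.Series(y, index=index).plot(ax=ax_pd)

mpl_xlim = ax_mpl.get_xlim()
pd_xlim = ax_pd.get_xlim()
print("pyplot x-limits:", mpl_xlim)
print("pandas x-limits:", pd_xlim)
```

The two sets of limits are on very different scales, which is why lines drawn with pyplot.plot() and Series.plot() on the same axes may not line up.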

Expected Output

Equally responsive plotting as with pyplot.plot()

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 0.25.3
numpy : 1.17.4
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 44.0.0.post20200106
Cython : 0.29.14
pytest : 5.3.2
hypothesis : 4.54.2
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.2
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.2
sqlalchemy : 1.3.12
tables : 3.6.1
xarray : 0.14.1
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7

Edit: This performance issue might be related to #15071

@charlesdong1991
Member

charlesdong1991 commented Jan 16, 2020

maybe you want pd.plotting.plot_params["x_compat"] = False? or df.plot(x_compat=False)?

@rhkarls
Contributor Author

rhkarls commented Jan 16, 2020

maybe you want pd.plotting.plot_params["x_compat"] = False? or df.plot(x_compat=False)?

Thanks! Yes, this (with x_compat=True rather than False) works around the problem for single-axis plots that need more responsive interaction, because it suppresses the tick adjustments. It does not work that well with subplots and sharex plots, though. For example, plotting two series with the same index on two different axes with sharex=True and x_compat=True results in very messy ticks:
[image: sharex subplots with x_compat=True showing messy ticks]
The same sharex=True subplot created with pyplot.plot() does not have that problem. I can post example code showing this, but it feels a bit off topic for this issue.
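For reference, the x_compat workaround discussed here can be applied either per call or through the plot_params context manager. This is a minimal single-axis sketch under a few assumptions: it uses the headless Agg backend, a shorter series than the original example, and illustrative variable names. It does not attempt to reproduce the sharex problem.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run, no GUI needed for this check

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

N = 1000
index = pd.date_range("2000-01-01", periods=N, freq="min")
s = pd.Series(np.linspace(0, 10, N), index=index)

fig, (ax_kwarg, ax_ctx) = plt.subplots(2, 1)

# Option 1: pass x_compat for a single call
s.plot(ax=ax_kwarg, x_compat=True)

# Option 2: set it temporarily for every plot inside the block
with pd.plotting.plot_params.use("x_compat", True):
    s.plot(ax=ax_ctx)

kwarg_xlim = ax_kwarg.get_xlim()
ctx_xlim = ax_ctx.get_xlim()
print("x_compat keyword x-limits:", kwarg_xlim)
print("x_compat context x-limits:", ctx_xlim)
```

With x_compat=True the x values are on the matplotlib date scale rather than the much larger scale pandas uses by default, so lines added afterwards with pyplot.plot() should share the same x coordinates.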

@jbrockmendel jbrockmendel added the Visualization plotting label Feb 25, 2020
jgehrcke added a commit to jgehrcke/ci-analysis that referenced this issue Dec 7, 2020
There's a lot of magic going on between how the datetime64 values
actually encode datetime in plots. Sharing an axis across (sub)plots is
brittle w.r.t. these differences. Work around this, here: make it so that
individual timestamps have a non-zero value for seconds, by simply adding
one second, shifting the whole data set by one second to the left. That
prevents, I guess, an optimization from kicking in that would detect that
individual timestamps fall on the full hour or on integer multiples of 30
or 15 minutes. Also see

 pandas-dev/pandas#15874
 pandas-dev/pandas#15071
 pandas-dev/pandas#31074
 pandas-dev/pandas#29705
 pandas-dev/pandas#29719
 pandas-dev/pandas#18571
 pandas-dev/pandas#11574
 pandas-dev/pandas#22586
@mroeschke mroeschke added the Performance Memory or execution speed performance label Jul 27, 2021
@mroeschke
Member

I think this is a duplicate (core issue) of #10578, so closing in favor of that issue.
