MemoryError when plotting two DataFrames, one with odd DatetimeIndex #10578

Open
mgabel-stratoscale opened this issue Jul 15, 2015 · 7 comments
Labels
Performance Memory or execution speed performance Visualization plotting

Comments

@mgabel-stratoscale

I was trying to upsample a DataFrame by a non-integer factor and then compare the original with the result. When plotting the second DataFrame, pandas tries to allocate a large amount of memory and throws a MemoryError after a few seconds.

Minimal working example

import pandas as pd

# Create data for 288 seconds
index = pd.date_range(start='2015-07-13 12:18:47', freq='S', periods=288)
df = pd.DataFrame(range(288), index=index)
# Upsample to 500 samples
td = (df.index[-1] - df.index[0])/499
# Pandas does not allow interpolation when upsampling, so resort to bfill :(
df2 = df.resample(td, fill_method='bfill')
# Let's compare them!
ax = df.plot()
df2.plot(ax=ax) # BOOM! Allocates too much memory and crashes

Using pandas 0.16.2 in WinPython 2.7.10.1 x64. Output of pd.show_versions():

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 58 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US

pandas: 0.16.2
nose: 1.3.7
Cython: 0.22.1
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.2.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.4
bottleneck: None
tables: 3.2.0
numexpr: 2.4.3
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None

(I should also note that upsampling to an arbitrary time index seems hopeless in pandas, as it does not allow interpolation between values near the sampling point, but that's a different issue.)
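For readers landing here on a recent pandas release, one possible way to upsample onto an arbitrary time grid with interpolation is sketched below. It is not taken from this thread: it assumes a pandas version where `Series.interpolate(method="time")` and `date_range(..., periods=...)` are available, and the names `s`, `target` and `upsampled` are illustrative only.

```python
import pandas as pd

# Same 288 one-second samples as in the report (values equal elapsed seconds).
index = pd.date_range("2015-07-13 12:18:47", periods=288,
                      freq=pd.Timedelta(seconds=1))
s = pd.Series(range(288), index=index, dtype="float64")

# 500 evenly spaced target timestamps covering the same interval.
target = pd.date_range(index[0], index[-1], periods=500)

# Insert the target timestamps as NaN rows, interpolate against the time
# axis, then keep only the target timestamps.
upsampled = (
    s.reindex(s.index.union(target))
     .interpolate(method="time")
     .reindex(target)
)
```

The union step matters: interpolating directly on the target index would discard the original samples before they can be used as interpolation anchors.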

@mgabel-stratoscale mgabel-stratoscale changed the title MemoryError when plotting two DataFrames, one with DatetimeIndex MemoryError when plotting two DataFrames, one with odd DatetimeIndex Jul 15, 2015
@sinhrks sinhrks added the Visualization plotting label Jul 15, 2015
@sinhrks
Member

sinhrks commented Jul 15, 2015

Related to #9814, but it can't handle this example because the `Period` adjustment results in frequency segments that are too tiny. Maybe we should have an upper limit for the `Period` adjustment and fall back to normal datetime plotting.
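As a back-of-the-envelope illustration of why the segments could become tiny (this is plain arithmetic, not a trace of what pandas does internally), consider the step from the report. The sketch assumes the step is truncated to whole nanoseconds and that both series would have to share one regular grid; all variable names are made up for the example.

```python
from math import gcd

NS_PER_SECOND = 10**9

# 288 one-second samples span 287 s; the report divides that span by 499.
span_ns = 287 * NS_PER_SECOND
step_ns, remainder = divmod(span_ns, 499)

# Coarsest grid (in ns) on which both a 1 s step and the new step land.
common_ns = gcd(NS_PER_SECOND, step_ns)

# Number of grid points needed to cover the whole span at that resolution.
grid_points = span_ns // common_ns + 1

print(step_ns, remainder, common_ns, grid_points)
```

Under these assumptions the step is 575150300 ns, the shared grid is 100 ns wide, and covering 287 s takes roughly 2.87 billion points, which would be consistent with the allocation failure described above.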

In my environment, specifying x_compat=True shows plot without problem.

...
ax = df.plot(x_compat=True)
df2.plot(ax=ax, x_compat=True) 
plt.show()

@mgabel-stratoscale
Author

I can confirm that x_compat=True works for me as a workaround.
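A self-contained version of the workaround might look like the sketch below. It assumes a recent pandas where `resample(...).bfill()` replaces the `fill_method` argument used in the original report, plus a working matplotlib install; the column names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import pandas as pd

index = pd.date_range("2015-07-13 12:18:47", periods=288,
                      freq=pd.Timedelta(seconds=1))
df = pd.DataFrame({"original": range(288)}, index=index)

# Upsample to roughly 500 samples, back-filling as in the report.
td = (df.index[-1] - df.index[0]) / 499
df2 = df.resample(td).bfill().rename(columns={"original": "upsampled"})

# x_compat=True makes pandas skip its time-series tick adjustment and
# plot the timestamps as ordinary matplotlib dates.
ax = df.plot(x_compat=True)
df2.plot(ax=ax, x_compat=True)
```

To view the result, save it with `ax.figure.savefig(...)` or drop the `matplotlib.use("Agg")` line and call `plt.show()` as in the snippet above.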

@alealv

alealv commented Jun 26, 2019

Hi, I'm having the same issue, but when downsampling. x_compat=True does work, but so does saving the DataFrame to a file and loading it back again.

@rivasd

rivasd commented Jul 19, 2019

I have the same issue on 0.24.2. I am trying to upsample and interpolate a DataFrame to a fairly high frequency (35Hz) from a nearby but irregular sampling rate. Calling plot on the original DataFrame works, but after interpolating with my_dataframe.resample("31.25ms").mean().interpolate(), plot causes an out-of-memory error.

using x_compat=True fixes the problem but is that the expected behavior?
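For anyone trying to reproduce this pattern, a minimal sketch with synthetic data follows. The sampling gaps, the random signal and the `signal` column are invented for the example; only the `resample("31.25ms").mean().interpolate()` chain comes from the comment above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up irregular sampling: gaps between 25 ms and 35 ms.
gaps_ms = rng.uniform(25.0, 35.0, size=200)
stamps = pd.Timestamp("2019-07-19") + pd.to_timedelta(np.cumsum(gaps_ms), unit="ms")
my_dataframe = pd.DataFrame({"signal": rng.normal(size=200)}, index=stamps)

# Put the data on a regular 31.25 ms grid; bins that received no sample
# come out as NaN from mean() and are then filled by linear interpolation.
regular = my_dataframe.resample("31.25ms").mean().interpolate()
```

Per the workaround discussed in this thread, plotting the result with `regular.plot(x_compat=True)` avoids the time-series tick adjustment.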

@TomAugspurger
Contributor

using x_compat=True fixes the problem but is that the expected behavior?

It's not expected that DataFrame.plot should error.

@rivasd

rivasd commented Jul 19, 2019

Then should I start a new issue? My problem is not specific to DatetimeIndex, as in the original issue, but seems to be caused by any time index that has a fast freq.

@TomAugspurger
Contributor

TomAugspurger commented Jul 19, 2019 via email
