
BUG: Plotting with DatetimeIndex with specific time step results in MemoryError #53684


Closed
3 tasks done
ketakopter opened this issue Jun 15, 2023 · 5 comments
Labels: Bug, Needs Triage, Visualization

Comments

@ketakopter

ketakopter commented Jun 15, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np

# Three timestamps with nanosecond precision
index = [
    np.datetime64('2023-06-12T13:59:57.223000000'),
    np.datetime64('2023-06-12T13:59:58.889333333'),
    np.datetime64('2023-06-12T14:00:00.555666666'),
]
d = pd.DataFrame({'testcol': pd.Series([12, 13, 14], index=index)})

d.plot()  # -> MemoryError (traceback below)

Issue Description

Plotting DataFrames with certain timestamps raises a MemoryError:

MemoryError: Unable to allocate 24.8 GiB for an array with shape (3332666625,) and data type int64

Traceback (most recent call last):
  File "mypath/lib/python3.10/site-packages/matplotlib/backends/backend_qt.py", line 468, in _draw_idle
    self.draw()
  File "mypath/lib/python3.10/site-packages/matplotlib/backends/backend_agg.py", line 400, in draw
    self.figure.draw(self.renderer)
  File "mypath/lib/python3.10/site-packages/matplotlib/artist.py", line 95, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
  File "mypath/lib/python3.10/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "mypath/lib/python3.10/site-packages/matplotlib/figure.py", line 3140, in draw
    mimage._draw_list_compositing_images(
  File "mypath/lib/python3.10/site-packages/matplotlib/image.py", line 131, in _draw_list_compositing_images
    a.draw(renderer)
  File "mypath/lib/python3.10/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "mypath/lib/python3.10/site-packages/matplotlib/axes/_base.py", line 3064, in draw
    mimage._draw_list_compositing_images(
  File "mypath/lib/python3.10/site-packages/matplotlib/image.py", line 131, in _draw_list_compositing_images
    a.draw(renderer)
  File "mypath/lib/python3.10/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
  File "mypath/lib/python3.10/site-packages/matplotlib/axis.py", line 1376, in draw
    ticks_to_draw = self._update_ticks()
  File "mypath/lib/python3.10/site-packages/matplotlib/axis.py", line 1262, in _update_ticks
    major_locs = self.get_majorticklocs()
  File "mypath/lib/python3.10/site-packages/matplotlib/axis.py", line 1484, in get_majorticklocs
    return self.major.locator()
  File "mypath/lib/python3.10/site-packages/pandas/plotting/_matplotlib/converter.py", line 982, in __call__
    locs = self._get_default_locs(vmin, vmax)
  File "mypath/lib/python3.10/site-packages/pandas/plotting/_matplotlib/converter.py", line 962, in _get_default_locs
    self.plot_obj.date_axis_info = self.finder(vmin, vmax, self.freq)
  File "mypath/lib/python3.10/site-packages/pandas/plotting/_matplotlib/converter.py", line 581, in _daily_finder
    dates_ = period_range(start=vmin, end=vmax, freq=freq)
  File "mypath/lib/python3.10/site-packages/pandas/core/indexes/period.py", line 545, in period_range
    data, freq = PeriodArray._generate_range(start, end, periods, freq, fields={})
  File "mypath/lib/python3.10/site-packages/pandas/core/arrays/period.py", line 313, in _generate_range
    subarr, freq = _get_ordinal_range(start, end, periods, freq)
  File "mypath/lib/python3.10/site-packages/pandas/core/arrays/period.py", line 1079, in _get_ordinal_range
    data = np.arange(start.ordinal, end.ordinal + 1, mult, dtype=np.int64)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 24.8 GiB for an array with shape (3332666625,) and data type int64
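The size in the error message can be checked with a little arithmetic. The plain-Python sketch below uses only numbers from the report: the reported array length needs about 24.8 GiB as int64, and it is almost exactly the width of the data range measured in nanoseconds (14:00:00.555666666 minus 13:59:57.223000000). Reading this as "one array element per nanosecond of the x-axis range" is an inference from the numbers, consistent with the locator building a period range at a 1-nanosecond step; the traceback itself does not show the frequency.

```python
# Numbers taken from the report above.
reported_shape = 3_332_666_625   # length of the array np.arange tried to build
span_ns = 3_332_666_666          # 14:00:00.555666666 minus 13:59:57.223000000, in ns

bytes_needed = reported_shape * 8   # one int64 is 8 bytes
gib = bytes_needed / 2**30          # 1 GiB = 2**30 bytes

print(f"{gib:.1f} GiB")             # 24.8 GiB
print(reported_shape / span_ns)     # very close to 1.0: about one element per ns
```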

I'm not sure why this is happening: the DataFrame has only three rows, so its size should not be a problem.

If I shift any of the timestamps slightly, the error does not occur.

It might be related to issue #20575, although this DataFrame carries no frequency information.
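One plausible explanation, offered as a hypothesis rather than a confirmed diagnosis: the three timestamps are exactly evenly spaced, 1,666,333,333 ns apart, so pandas can infer a regular frequency from the index (for example through `DatetimeIndex.inferred_freq`) even though none was set explicitly. If the plotting code picks up that inferred frequency, it would take the period-based time-series path seen in the traceback (`_daily_finder` calling `period_range`), and moving any single timestamp by even 1 ns would break the regular spacing, which matches the observation above. `to_ns` and `gaps` are hypothetical helpers written only for this check.

```python
def to_ns(h: int, m: int, s: int, ns: int) -> int:
    """Hypothetical helper: time of day as nanoseconds since midnight."""
    return ((h * 60 + m) * 60 + s) * 1_000_000_000 + ns


def gaps(stamps: list[int]) -> list[int]:
    """Differences between consecutive timestamps, in nanoseconds."""
    return [b - a for a, b in zip(stamps, stamps[1:])]


original = [
    to_ns(13, 59, 57, 223_000_000),
    to_ns(13, 59, 58, 889_333_333),
    to_ns(14, 0, 0, 555_666_666),
]
nudged = original[:2] + [original[2] + 1]  # last timestamp moved by 1 ns

print(gaps(original))  # [1666333333, 1666333333] -> evenly spaced
print(gaps(nudged))    # [1666333333, 1666333334] -> no single step
```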

Expected Behavior

It should plot the data.

Installed Versions

INSTALLED VERSIONS

commit : 965ceca
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-195-default
Version : #1 SMP Tue May 7 10:55:11 UTC 2019 (8fba516)
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.2
numpy : 1.24.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2.2
Cython : None
pytest : 7.3.1
hypothesis : None
sphinx : 7.0.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.13.2
pandas_datareader: None
bs4 : 4.12.2
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

ketakopter added the Bug and Needs Triage labels on Jun 15, 2023
@ketakopter
Author

Does anyone know a workaround? I cannot have my application failing unpredictably depending on the input data.
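One possible workaround, offered as an untested sketch rather than an official recommendation: draw the line with matplotlib directly and pass plain NumPy arrays, so the pandas locator from `converter.py` seen in the traceback is never installed on the axis. It assumes matplotlib is installed and uses the non-interactive Agg backend; `fig.canvas.draw()` forces the tick computation, which is where the report failed. pandas also documents an `x_compat=True` plotting option that suppresses its automatic time-series tick adjustment and may avoid the same code path; it is not exercised here.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

index = pd.DatetimeIndex(
    [
        np.datetime64("2023-06-12T13:59:57.223000000"),
        np.datetime64("2023-06-12T13:59:58.889333333"),
        np.datetime64("2023-06-12T14:00:00.555666666"),
    ]
)
d = pd.DataFrame({"testcol": [12, 13, 14]}, index=index)

# Hand plain NumPy arrays to matplotlib so its own date handling is used
# instead of pandas' frequency-aware time-series tick locator.
fig, ax = plt.subplots()
ax.plot(d.index.to_numpy(), d["testcol"].to_numpy())
fig.canvas.draw()  # tick locations are computed here
```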

@ketakopter
Author

I just tested the development version; the bug is still there.

INSTALLED VERSIONS
------------------
commit : 0bc16da
python : 3.10.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.12.14-195-default
Version : #1 SMP Tue May 7 10:55:11 UTC 2019 (8fba516)
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.1.0.dev0+977.g0bc16da1e5
numpy : 2.0.0.dev0+84.g828fba29e
pytz : 2023.3
dateutil : 2.8.2
setuptools : 63.2.0
pip : 22.2.2
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.14.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@XU-99

XU-99 commented Jun 21, 2023

I also ran into this problem. I tried testing with only one column of data, and it still raised this error (in my case from DataFrame.resample rather than from plotting):

Traceback (most recent call last):
    SearchDbDataPatientMySQL()
  File "D:\Simulation\DumpAnalaysis\src\search_db_data_patient_MySQL.py", line 85, in __init__
    self.DB_data_processing()
  File "D:\Simulation\DumpAnalaysis\src\search_db_data_patient_MySQL.py", line 181, in DB_data_processing
    rob_data = rob_data.resample(self.resample_freq,closed='right').first() ## Resample if choise
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\frame.py", line 10994, in resample
    return super().resample(
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\generic.py", line 8888, in resample
    return get_resampler(
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\resample.py", line 1523, in get_resampler
    return tg._get_resampler(obj, kind=kind)
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\resample.py", line 1687, in _get_resampler
    return DatetimeIndexResampler(
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\resample.py", line 169, in __init__
    self.binner, self.grouper = self._get_binner()
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\resample.py", line 231, in _get_binner
    binner, bins, binlabels = self._get_binner_for_time()
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\resample.py", line 1258, in _get_binner_for_time
    return self._timegrouper._get_time_bins(self.ax)
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\resample.py", line 1753, in _get_time_bins
    binner = labels = date_range(
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\indexes\datetimes.py", line 945, in date_range
    dtarr = DatetimeArray._generate_range(
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\arrays\datetimes.py", line 446, in _generate_range
    i8values = generate_regular_range(start, end, periods, freq, unit=unit)
  File "C:\Users\junhao.xu\AppData\Local\Programs\Python\Python310\lib\site-packages\pandas\core\arrays\_ranges.py", line 84, in generate_regular_range
    values = np.arange(b, e, stride, dtype=np.int64)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 126. GiB for an array with shape (16865492973,) and data type int64
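This traceback fails in a different place (`date_range` inside `DataFrame.resample`), but the failing call has the same shape: `np.arange(b, e, stride)` allocates one int64 per step, so its length is the ceiling of `(e - b) / stride`, and a stride that is tiny relative to the data's time span produces an enormous array. The plain-Python sketch below uses hypothetical helpers: `arange_len` mimics the length rule of `np.arange` for a positive integer step, and `int64_gib` gives the int64 memory cost. The actual value of `self.resample_freq` is not shown in the comment, so the strides here are illustrative, not the reporter's.

```python
def arange_len(start: int, stop: int, step: int) -> int:
    """Number of elements np.arange(start, stop, step) yields for step > 0."""
    return max(0, -(-(stop - start) // step))  # integer ceiling division


def int64_gib(n: int) -> float:
    """Memory needed for n int64 values, in GiB."""
    return n * 8 / 2**30


one_second_ns = 1_000_000_000

# One second of data split at a 1-nanosecond stride vs. a 1-millisecond stride
print(arange_len(0, one_second_ns, 1))          # 1000000000
print(arange_len(0, one_second_ns, 1_000_000))  # 1000

# The shape reported in the traceback above, expressed in GiB
print(round(int64_gib(16_865_492_973)))         # 126
```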

@azjps

azjps commented Dec 26, 2023

This is related to #52895 and is fixed by the same (pending) fix there, although there are likely variants of this problem that could still occur.

@mroeschke
Member

Closing as an outcome of #52895

5 participants