
Plotting 128Hz timeseries crashes #20575


Open
alexlouden opened this issue Apr 2, 2018 · 9 comments

@alexlouden

Code Sample

import numpy as np
import pandas as pd

start = 0
freq = 128
samples = freq * 10  # 10 seconds
data = np.random.random(samples)
index = pd.date_range(start, periods=samples, freq='{}S'.format(1/freq))
ts = pd.Series(data=data, index=index, name='High freq')
ts.plot()

Problem description

Plotting a 10 second, 128Hz timeseries uses up all my RAM and then crashes. 100Hz works fine; 128Hz and 150Hz (6666666N) crash.

[Screenshot attached: screen shot 2018-04-02 at 1:43:25 pm]

Expected Output

Ideally, a plot of my data. I wouldn't expect plotting 1280 points to require > 100GB of RAM! 😄
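A possible workaround while this is open, assuming the goal is just to get the data on screen: bypass pandas' period-based axis handling, either by plotting through matplotlib directly or by passing x_compat=True. This is a sketch of the workaround, not a fix for the underlying bug; Nano(7812500) is used in place of the '{}S'.format(1/freq) string for the same 1/128 s period.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

freq = 128
samples = freq * 10  # 10 seconds
data = np.random.random(samples)
# Nano(7812500) is the same 1/128 s period as '0.0078125S'
index = pd.date_range(0, periods=samples,
                      freq=pd.tseries.offsets.Nano(7812500))
ts = pd.Series(data=data, index=index, name='High freq')

fig, ax = plt.subplots()
# Workaround 1: plot through matplotlib directly, skipping pandas'
# period conversion entirely
ax.plot(ts.index, ts.values)

# Workaround 2: pandas' x_compat option also skips the period-based
# tick locators and falls back to matplotlib's date handling
ts.plot(x_compat=True, ax=ax)
```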

Output of pd.show_versions()

INSTALLED VERSIONS ------------------ commit: None python: 3.6.4.final.0 python-bits: 64 OS: Darwin OS-release: 17.4.0 machine: x86_64 processor: i386 byteorder: little LC_ALL: en_US.UTF-8 LANG: en_US.UTF-8 LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.3
setuptools: 39.0.1
Cython: None
numpy: 1.14.0
scipy: 1.0.1
pyarrow: None
xarray: None
IPython: 6.2.1
matplotlib: 2.2.2

@jorisvandenbossche
Member

For me it luckily raises a MemoryError directly, without crashing ...

The reason for the large memory usage is that it tries to generate a huge arange:

-> 1110         data = np.arange(start.ordinal, end.ordinal + 1, mult, dtype=np.int64)
   1111 
   1112     return data, freq

ipdb> start.ordinal
-499609375
ipdb> end.ordinal
10491796875
ipdb> mult
1

What seems off in this case is the mult of 1. The reason it needs to work at nanosecond resolution is the 'strange' frequency:

In [64]: ts.index.freq
Out[64]: <7812500 * Nanos>

But then it should generate a range with a step of 7812500 instead of 1. So somewhere in the plotting code, the multiplier of the freq is lost.
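For scale, the ordinals from the ipdb session above imply an arange of roughly 11 billion int64 values, on the order of 88GB, which matches the reported memory blow-up. A back-of-the-envelope sketch:

```python
# Ordinals taken from the ipdb session above (nanosecond resolution)
start_ordinal = -499609375
end_ordinal = 10491796875
mult = 1

n_elements = (end_ordinal + 1 - start_ordinal) // mult
print(n_elements)            # 10991406251 elements
print(n_elements * 8 / 1e9)  # ~87.9 GB of int64 data
```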

@alexlouden
Author

Thanks for the update, and for having a look into it!

@fredrik-1

I tried to look into this but I didn't get very far. I believe it might be a consequence of how the handling of the time index (x-labels, tick labels, etc.) is implemented.

I don't think it is possible for mult in
data = np.arange(start.ordinal, end.ordinal + 1, mult, dtype=np.int64)
to be 7812500. The actual frequency of the series is simply not available in that part of the code. A frequency of 100Hz also results in mult=1 and data ten times longer than necessary.

So I believe the implementation might be correct, but it doesn't work in practice when the calculations have to be done in nanoseconds while the actual sampling period is much larger.
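For context on why the two rates behave so differently, assuming nothing beyond basic arithmetic: a 1/128 s period only lands exactly on the nanosecond grid, while 1/100 s reduces to whole milliseconds, so 100Hz merely produces a somewhat oversized range instead of a catastrophic one:

```python
# 1/128 s is exactly 0.0078125 s = 7_812_500 ns, so the index
# frequency becomes <7812500 * Nanos> and ordinals are in nanoseconds
print(int((1 / 128) * 1e9))    # 7812500

# 1/100 s is 10 ms, so ordinals are in milliseconds; with mult=1
# a 10 s plot range is only 10_000 steps (10x the samples, harmless)
print(round((1 / 100) * 1e3))  # 10
```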

@jorisvandenbossche
Member

@fredrik-1 Thanks for looking into it. I think you are right that this is just due to how it is currently implemented: if the freq is given in nanos, the frequency used for the plot axis is 1 nano, regardless of how many nanos the actual frequency spans.
The consequence is that plotting does not work with a freq of such a huge number of nanos.

@jorisvandenbossche
Member

The question then is: how can this be solved?
We could look into whether it is possible to actually take the multiplier in the freq into account, but I'm not sure what the consequences of that would be in general.

@fredrik-1

I tried to look a little more at the code.

The problem (or feature) seems to be that TimeSeries_DateLocator (I don't know where it is called from, implicitly from matplotlib?) uses a freq variable that is a plain string carrying only the nano information. freq is then turned into a pd.tseries.offsets.Nano object with n=1, whereas the Nano object on the actual data has n=7812500.
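The n=1 versus n=7812500 mismatch can be seen on the offset objects themselves. A minimal sketch (the exact alias spelling varies by pandas version):

```python
import pandas as pd
from pandas.tseries.offsets import Nano

# The index's own offset keeps the multiplier
idx = pd.date_range(0, periods=4, freq=Nano(7812500))
print(idx.freq.n)  # 7812500

# Rebuilding an offset from only the base alias, as the plotting
# code effectively does, yields n=1
print(Nano().n)    # 1
```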

@TomAugspurger
Contributor

@fredrik-1 I haven't read this issue closely, but if it helps: TimeSeries_DateLocator (defined as class TimeSeries_DateLocator(Locator):) is registered as a units converter with matplotlib.

> I don't know from where it is called, implicitly from matplotlib

Correct. I believe it's called every time the figure is drawn (so if the plot is interactively modified, it's called each time).

@fredrik-1

fredrik-1 commented Apr 6, 2018

I tried to debug some more. It seems that the actual frequency multiplier (the 100 in front of milli, for example) is thrown away several times in the code by the use of
freq = freq.rule_code
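The loss can be reproduced on any multiplied offset; for example, with the 100ms sampling period of a 10Hz series (a sketch; the alias string returned differs across pandas versions):

```python
from pandas.tseries.offsets import Milli

off = Milli(100)      # the 100 ms sampling period of a 10Hz series
print(off.n)          # 100
print(off.rule_code)  # base alias only: the 100 multiplier is gone
```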

I also found that .to_period() doesn't work on data with a "special" frequency.

import pandas as pd
import numpy as np

freq = 10
samples = freq * 1
data = np.random.random(samples)
index1 = pd.date_range(0, periods=samples, freq='{}S'.format(1/freq))
ts1 = pd.Series(data=data, index=index1, name='High freq')
index2 = pd.period_range(2000, periods=samples, freq='{}S'.format(1/freq))
ts2 = pd.Series(data=data, index=index2, name='High freq')
print(ts1.index)
print(ts2.index)    

works as expected, but
ts1.to_period()
throws an error because "100L" is not in
_offset_to_period_map in pandas/_libs/tslibs/offsets.pyx.

The first call below works. The second one also runs, but the frequency of the resulting index does not match the actual frequency of the data.

tsPeriod1 = ts1.to_period(freq='100L')
print(tsPeriod1.index)
tsPeriod2 = ts1.to_period(freq='L')
print(tsPeriod2.index)

Series with a DatetimeIndex or PeriodIndex whose frequency is not one of the standard frequencies (1 second, 1 millisecond, 1 day, etc.) don't seem to be supported by all functions.

@TomAugspurger
Contributor

TomAugspurger commented Apr 12, 2018 via email
