Memory Performance Regression in 0.13+ #6329

dhirschfeld · 2014-02-12T12:51:45Z

The following snippet runs fine in pandas 0.12 on a machine with 8GB of RAM but throws a MemoryError on a machine with 16GB of RAM when using pandas 0.13:

import numpy as np
import pandas as pd

date_index = pd.date_range('01-Jan-2012', '23-Jan-2013', freq='T')
daily_dates = date_index.to_period('D').to_timestamp('S','S')
fracofday = date_index.view(np.ndarray) - daily_dates.view(np.ndarray)
fracofday = fracofday.astype('timedelta64[ns]').astype(np.float64)/864e11
fracofday = pd.TimeSeries(fracofday, daily_dates)
index = pd.date_range(date_index.min().to_period('A').to_timestamp('D','S'),
                      date_index.max().to_period('A').to_timestamp('D','E'),
                      freq='D')
temp = pd.TimeSeries(1.0, index)
fracofday *= temp[fracofday.index]


jreback · 2014-02-12T13:26:53Z

This is a 'bug' that surfaced because the join of the non-unique indexes is being done even though they are equal. It should be straightforward to fix (and that is why the memory is blowing up: it is an n^2 memory operation, I think).

You can also do this in a frame, which would be better IMHO (e.g. you are guaranteed a unique index if the values are the columns).

Non-unique indexes are very tricky and problematic, so avoid them if possible.
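
To illustrate why a join over duplicated labels grows quadratically, here is a minimal sketch using current pandas. It uses `DataFrame.merge` on a hypothetical `key` column rather than the internal index join from this issue, so it shows the general many-to-many pairing behaviour, not the exact code path that regressed. The frame and column names are made up for the example.

```python
import pandas as pd

# Hypothetical stand-in data: each key appears 3 times on both sides.
left = pd.DataFrame({"key": ["a"] * 3 + ["b"] * 3, "x": 1.0})
right = pd.DataFrame({"key": ["a"] * 3 + ["b"] * 3, "y": 2.0})

# A join on duplicated keys pairs every left row with every matching
# right row, so each key contributes 3 * 3 rows instead of 3.
merged = left.merge(right, on="key", how="inner")

rows_in = len(left)
rows_out = len(merged)
print(rows_in, rows_out)
```

In the snippet from the report, each daily label is repeated 1440 times, so the same pairing would produce 1440 * 1440 rows per day.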

dhirschfeld · 2014-02-12T14:15:51Z

Thanks for having a look.

I'm not sure how I could reformulate it as a DataFrame. I want to multiply each day (1440 minutes/values) in fracofday by the corresponding daily value in temp such that each temp value is repeated 1440 times.

Since I don't actually care about the indexes after the temp array has been aligned with fracofday (by reindexing) I can work around it by dropping down to numpy.
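
A rough sketch of what such a NumPy workaround could look like, written against current pandas (`pd.Series` instead of `pd.TimeSeries`). The two-day data set and the names `minutes`, `days`, `pos` and `result` are assumptions for illustration and not the code actually used. The idea is to look up each minute's day in the unique daily index and then multiply the raw arrays, so no index join happens.

```python
import numpy as np
import pandas as pd

# Hypothetical small stand-in: two days of minute data.
minutes = pd.date_range("2012-01-01", periods=2 * 1440, freq="min")
days = minutes.normalize()  # midnight of the day each minute belongs to

# Fraction of the day elapsed, indexed by the (non-unique) daily label.
frac_values = np.asarray((minutes - days) / pd.Timedelta(days=1), dtype=np.float64)
fracofday = pd.Series(frac_values, index=days)

# One value per day, on a unique daily index.
temp = pd.Series([10.0, 20.0],
                 index=pd.date_range("2012-01-01", periods=2, freq="D"))

# Position of each minute's day in the unique daily index (-1 means missing).
pos = temp.index.get_indexer(days)
assert (pos >= 0).all()

# Plain array arithmetic: each daily value is repeated for its 1440 minutes.
result = fracofday.to_numpy() * temp.to_numpy()[pos]
```

The result is a bare array, which matches the case above where the indexes are not needed after alignment.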

jreback · 2014-02-12T14:20:14Z

Actually, you could just make this a resample, which I think is much easier / better.

jreback · 2014-02-12T14:36:53Z

(the assignment doesn't matter in this case for testing)

See #6330. You could make the change in your local copy if you want; it is all Python code and pretty small.

In [9]: %timeit fracofday * temp[fracofday.index]
100 loops, best of 3: 6.97 ms per loop

In [10]: %memit fracofday * temp[fracofday.index]
maximum of 1: 82.671875 MB per loop

jreback added Bug labels Feb 12, 2014

jreback added this to the 0.14.0 milestone Feb 12, 2014

jreback mentioned this issue Feb 12, 2014

BUG: Regression in join of non_unique_indexes (GH6329) #6330

Merged

jreback closed this as completed in #6330 Feb 12, 2014
