Memory Performance Regression in 0.13+ #6329

Closed
dhirschfeld opened this issue Feb 12, 2014 · 4 comments · Fixed by #6330
Labels
Bug · Regression (functionality that used to work in a prior pandas version) · Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

@dhirschfeld
Contributor

The following snippet runs fine in pandas 0.12 on a machine with 8GB of RAM but throws a MemoryError on a machine with 16GB of RAM when using pandas 0.13:

import numpy as np
import pandas as pd

date_index = pd.date_range('01-Jan-2012', '23-Jan-2013', freq='T')
daily_dates = date_index.to_period('D').to_timestamp('S','S')
fracofday = date_index.view(np.ndarray) - daily_dates.view(np.ndarray)
fracofday = fracofday.astype('timedelta64[ns]').astype(np.float64)/864e11
fracofday = pd.TimeSeries(fracofday, daily_dates)
index = pd.date_range(date_index.min().to_period('A').to_timestamp('D','S'),
                      date_index.max().to_period('A').to_timestamp('D','E'),
                      freq='D')
temp = pd.TimeSeries(1.0, index)
fracofday *= temp[fracofday.index]
@jreback
Contributor

jreback commented Feb 12, 2014

This is a 'bug' that surfaced because the join of the non-unique indexes is being performed even though they are equal. It should be straightforward to fix (and that's why the memory is blowing up: it is an n^2 memory operation, I think).

You can also do this in a frame, which would be better IMHO (e.g. you are guaranteed a unique index if the values are the columns).

Non-unique indexes are very tricky and problematic, so avoid them if possible.
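
The quadratic growth described in this comment can be reproduced on a small scale. The following is a minimal sketch, assuming a recent pandas where `pd.Series` takes the place of `pd.TimeSeries`; the names `k`, `left`, and `right` are illustrative and do not come from the issue. It uses two indexes that differ, so that the join is actually performed: every occurrence of a duplicated label on one side is paired with every occurrence on the other.

```python
import pandas as pd

k = 4  # how many times the label "a" is repeated on each side (illustrative)

# Both indexes are non-unique, and they are not equal (right has an extra "b"),
# so pandas has to join them instead of reusing one index as-is.
left = pd.Series(1.0, index=["a"] * k)
right = pd.Series(2.0, index=["a"] * k + ["b"])

aligned_left, aligned_right = left.align(right, join="outer")

# Number of rows after alignment; compare it with k to see the growth.
n_rows = len(aligned_left)
print(k, n_rows)
```

In the snippet from the issue each day appears 1440 times in the index, so if the same pairing applies, a single day would contribute 1440 * 1440 = 2,073,600 rows to the joined index.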

@jreback added this to the 0.14.0 milestone on Feb 12, 2014
@dhirschfeld
Contributor Author

Thanks for having a look.

I'm not sure how I could reformulate it as a DataFrame. I want to multiply each day (1440 minutes/values) in fracofday by the corresponding daily value in temp such that each temp value is repeated 1440 times.

Since I don't actually care about the indexes after the temp array has been aligned with fracofday (by reindexing) I can work around it by dropping down to numpy.
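
The comment does not show the NumPy workaround, so the following is only a sketch of one way it might look, assuming a recent pandas (`pd.Series` instead of `pd.TimeSeries`, `freq="min"` instead of `'T'`) and a three-day range instead of the full year. The names `minute_index`, `days`, `pos`, and `result` are illustrative. The idea is to look up each minute's day in the unique daily index once, then multiply plain arrays so that no index join happens.

```python
import numpy as np
import pandas as pd

# Three days of minute data (a small stand-in for the range in the issue).
minute_index = pd.date_range("2012-01-01", "2012-01-03 23:59", freq="min")
daily_dates = minute_index.normalize()  # midnight of each minute's day (non-unique)

# Fraction of the day elapsed at each minute, indexed by the repeated day.
fracofday = pd.Series(
    np.asarray((minute_index - daily_dates) / pd.Timedelta(days=1)),
    index=daily_dates,
)

# One value per day; this index is unique.
days = pd.date_range("2012-01-01", "2012-12-31", freq="D")
temp = pd.Series(np.arange(1.0, len(days) + 1), index=days)

# Position of each minute's day in the unique daily index (-1 means missing).
pos = temp.index.get_indexer(fracofday.index)
assert (pos >= 0).all(), "every day in fracofday must exist in temp"

# Multiply raw arrays: each daily value is repeated for its 1440 minutes.
result = fracofday.to_numpy() * temp.to_numpy()[pos]
```

The result is a plain array; wrapping it as `pd.Series(result, index=fracofday.index)` would restore the original labels if they are needed afterwards.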

@jreback
Contributor

jreback commented Feb 12, 2014

Actually, you could just make this a resample, which I think is much easier / better.
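
The comment does not spell out the call, so this is one possible reading of the resample suggestion rather than the intended code. It assumes a recent pandas (`resample(...).ffill()`, `freq="min"`) and a short illustrative date range; `temp_by_minute`, `frac`, and `scaled` are hypothetical names. The daily series is upsampled to minute frequency, so both operands carry unique minute-level indexes.

```python
import numpy as np
import pandas as pd

# One value per day (illustrative data).
days = pd.date_range("2012-01-01", "2012-01-05", freq="D")
temp = pd.Series(np.arange(1.0, len(days) + 1), index=days)

# Upsample to one row per minute; each minute takes the value of its day.
temp_by_minute = temp.resample("min").ffill()

# Fraction of the day elapsed, indexed by the minute timestamps themselves.
minute_index = pd.date_range("2012-01-01", "2012-01-03 23:59", freq="min")
frac = pd.Series(
    np.asarray((minute_index - minute_index.normalize()) / pd.Timedelta(days=1)),
    index=minute_index,
)

# Both sides have unique indexes, so the multiplication aligns one-to-one.
scaled = frac * temp_by_minute.reindex(frac.index)
```

Unlike the snippet in the issue, the result here is indexed by minute timestamps rather than by repeated daily dates.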

@jreback
Contributor

jreback commented Feb 12, 2014

(the assignment doesn't matter in this case for testing)

See #6330. You could make the change in your local copy if you want; it is all Python code and pretty small.

In [9]: %timeit fracofday * temp[fracofday.index]
100 loops, best of 3: 6.97 ms per loop

In [10]: %memit fracofday * temp[fracofday.index]
maximum of 1: 82.671875 MB per loop

2 participants