
Suggestion: inplace=True option in drop() and dropna() would really help #2325


Closed
bluefir opened this issue Nov 22, 2012 · 6 comments
Labels: Duplicate Report (duplicate issue or pull request), Enhancement

Comments


bluefir commented Nov 22, 2012

You can have a large frame in memory and want to drop a small percentage of labels with NA data just to clean things up. If you use drop() or dropna(), you will get a new object and effectively double the memory footprint. Favoring immutability is great, but under memory constraints this can be a killer.
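The copy described above can be seen in a small sketch. The frame, its column names, and the positions of the missing values are made up for illustration; only `DataFrame.dropna()` and the inspection calls are pandas API.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 1,000 rows, three of which contain a missing value.
df = pd.DataFrame({"a": np.arange(1000, dtype="float64"),
                   "b": np.ones(1000)})
df.loc[[3, 10, 500], "a"] = np.nan

cleaned = df.dropna()  # returns a new DataFrame; df itself is untouched

print(cleaned is df)          # False
print(len(df), len(cleaned))  # 1000 997

# The surviving rows live in freshly allocated memory, not in df's buffers.
shares = np.shares_memory(df["a"].to_numpy(), cleaned["a"].to_numpy())
print(shares)                 # False
```

While both `df` and `cleaned` are referenced, the process holds close to two full copies of the data, even though only three rows were removed.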

Member

wesm commented Nov 22, 2012

Agreed. Avoiding the "2x problem" is actually pretty tricky with NumPy arrays under the hood. I have some ideas, but it won't be simple.
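A rough illustration of why NumPy makes this hard, using a made-up ten-element array: removing arbitrary elements cannot be expressed as a view of the original buffer, so both boolean indexing and `np.delete` allocate a new array. A contiguous slice, by contrast, is a view.

```python
import numpy as np

data = np.arange(10, dtype="float64")

# Mark positions 2 and 5 for removal.
keep = np.ones(data.shape[0], dtype=bool)
keep[[2, 5]] = False

filtered = data[keep]              # boolean indexing allocates a new array
deleted = np.delete(data, [2, 5])  # np.delete also returns a new array
window = data[:8]                  # a plain slice is only a view

print(np.shares_memory(data, filtered))  # False
print(np.shares_memory(data, deleted))   # False
print(np.shares_memory(data, window))    # True
print(data.size)                         # 10, the original is unchanged
```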

Author

bluefir commented Nov 23, 2012

How about lazy deletion? Have an object attached to an Index that holds a sequence of index positions that were dropped. Most pandas objects will have it as None, and all the current interfaces will work fine. Checking that it is None is also quite cheap, I believe. When some labels are dropped in place, no data is actually deleted, but no data is created either. The drop only takes effect on data retrieval (take?), including when new objects are created from the data. In addition, it will follow the spirit of data immutability because you can always "undrop".
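A minimal sketch of the lazy-deletion idea, written as a wrapper around a DataFrame rather than as a change to `Index`. `LazyDropFrame` and its methods `drop`, `undrop`, and `materialize` are hypothetical names invented here, not pandas API, and the sketch tracks labels where the proposal tracks positions.

```python
import pandas as pd


class LazyDropFrame:
    """Hypothetical wrapper that records dropped labels instead of copying."""

    def __init__(self, frame):
        self._frame = frame
        self._dropped = None  # None means "nothing dropped", a cheap check

    def drop(self, labels):
        # Record the labels only; the underlying data is left alone.
        if self._dropped is None:
            self._dropped = set()
        self._dropped.update(labels)

    def undrop(self, labels=None):
        # Restore some labels, or all of them when labels is None.
        if self._dropped is None:
            return
        if labels is None:
            self._dropped = None
            return
        self._dropped.difference_update(labels)
        if not self._dropped:
            self._dropped = None

    def materialize(self):
        # The drop takes effect only here, at retrieval time.
        if self._dropped is None:
            return self._frame
        mask = ~self._frame.index.isin(list(self._dropped))
        return self._frame.loc[mask]


df = pd.DataFrame({"x": [1.0, None, 3.0, None]}, index=list("abcd"))
lazy = LazyDropFrame(df)
lazy.drop(df.index[df["x"].isna().to_numpy()])

print(list(lazy.materialize().index))  # ['a', 'c']
lazy.undrop()
print(len(lazy.materialize()))         # 4
```

Note that `materialize()` still allocates the filtered rows in this sketch, so the saving lasts only until the data is retrieved. Making every internal retrieval path honor such a mask is the harder part.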

Member

wesm commented Nov 26, 2012

What you're describing would be really great (and definitely something I've thought about), but it's a very large and difficult problem and not something that could be easily bolted on (I don't think).

Author

bluefir commented Nov 27, 2012

You certainly know much better than I do! :-) pandas is a great product and it's very fast. That being said, people who want to save on memory should be willing to take a hit, even a significant one, on performance along the classic time-versus-space tradeoff. Memory bottlenecks, just like performance bottlenecks, are usually concentrated in just a few places in the code. If you are willing to be less hard-core about performance for in-place drops than you are in other areas, it might be less difficult to implement. Just my two cents. Not another word from me on this topic, I promise :-)


bburan commented Jul 2, 2013

Possible duplicate of #1960?

Contributor

jreback commented Jul 2, 2013

Yep, closing this one (as it's the later one).


4 participants