Suggestion: inplace=True option in drop() and dropna() would really help #2325
Comments
Agreed. Avoiding the "2x problem" is actually pretty tricky with NumPy arrays under the hood. I have some ideas, but it won't be simple.
How about lazy deletion? Have an object attached to an Index that holds a sequence of the index positions that were dropped. Most pandas objects will have it as None, and all the current interfaces will work fine. Checking that it is None is also quite cheap, I believe. When some labels are dropped in place, no data is actually deleted, but no data is created either. The drop only takes effect on data retrieval (take?), including when new objects are created from the data. In addition, it follows the spirit of data immutability because you can always "undrop".
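To make the idea concrete, here is a minimal sketch of lazy deletion in plain Python. It is not how pandas works internally; the class `LazyDropColumn` and its methods `drop`, `undrop`, and `take` are hypothetical names used only for illustration.

```python
class LazyDropColumn:
    """Hypothetical sketch: values plus a lazily applied set of dropped positions."""

    def __init__(self, values):
        self._values = list(values)  # underlying data, never modified by drop()
        self._dropped = None         # None means "nothing dropped" (cheap to check)

    def drop(self, positions):
        # Record the positions only; no data is copied or deleted.
        if self._dropped is None:
            self._dropped = set()
        self._dropped.update(positions)

    def undrop(self):
        # Forget all recorded drops; the original data is still intact.
        self._dropped = None

    def take(self):
        # The recorded drops take effect only when data is retrieved.
        if self._dropped is None:
            return list(self._values)
        return [v for i, v in enumerate(self._values) if i not in self._dropped]


col = LazyDropColumn([1.0, float("nan"), 3.0, 4.0])
col.drop([1])                   # mark position 1 as dropped
visible = col.take()            # retrieval skips the dropped position
col.undrop()
restored_len = len(col.take())  # all four values are visible again
```

The trade-off is that every retrieval pays for the filtering step, which is the time-versus-space exchange this suggestion accepts.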
What you're describing would be really great (and definitely something I've thought about), but it's a very large and difficult problem and not something that could be easily bolted on (I don't think).
You certainly know much better than I do! :-) pandas is a great product and it's very fast. That being said, people who want to save on memory should be willing to take a hit, even a significant one, on performance along the classic time-versus-space tradeoff. Memory bottlenecks, just as performance bottlenecks, are usually concentrated in just a few places in the code. If you consider not being that hard core on performance with inplace drops as you are in other areas, it might make it less difficult to implement. Just my two cents. Not another word from me on this topic, I promise :-)
Possible duplicate of #1960?
Yep, closing this one (as it's the later one).
You can have a large frame in memory and want to drop a small percentage of labels with NA data just to clean things up. If you use drop() or dropna(), you will get a new object and effectively double the memory footprint. Favoring immutability is great, but under memory constraints this can be a killer.
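A small illustration of the behavior described, assuming a toy frame in place of a large one (the column names `a` and `b` are made up for the example): `dropna()` hands back a new object and leaves the original alone, so both exist in memory at the same time.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a large frame; one row contains NA data.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0, 4.0],
                   "b": [10.0, 20.0, 30.0, 40.0]})

cleaned = df.dropna()  # returns a new DataFrame; df itself is unchanged

rows_before = len(df)               # original still holds every row
rows_after = len(cleaned)           # the row with NA is gone in the copy
is_new_object = cleaned is not df   # two separate objects are alive
```

Rebinding with `df = df.dropna()` lets the original be freed afterwards, but the new frame is built while the old one still exists, so peak memory use is still roughly doubled.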