ENH: drop_duplicates(consecutive=True) to drop only consecutive duplicates #10540
Comments
For numeric data it is probably fastest to use |
You could find the duplicates using |
You can do:
I would prefer this to be in the cookbook rather than a new method or option, since users may want something more flexible, e.g. drop 3 consecutive duplicates.
Any updates on this? I think it's a good idea, not sure if it makes more sense to extend |
DataFrame.drop_duplicates can be useful to 'sparsify' a frame, requiring less
memory/storage. However, it doesn't handle the case where a value later reverts
to an earlier value. For example:
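A minimal sketch of that case, using made-up data (the column name `value` and the numbers are assumptions chosen for illustration):

```python
import pandas as pd

# Hypothetical data: the value goes 1 -> 2 -> 1.
df = pd.DataFrame({"value": [1, 1, 2, 2, 1, 1]})

# drop_duplicates keeps only the first occurrence of each distinct row,
# so the later return to 1 (index 4) is dropped as a duplicate of index 0.
sparse = df.drop_duplicates()
print(sparse)
```

Only the rows at index 0 and 2 survive, so the sparsified frame no longer records that the value went back to 1.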
It would be ideal to be able to do something like `drop_duplicates(consecutive=True)`.
This should also be a much faster operation, since you only have to compare each
row with its successor, rather than with all other rows.
You can achieve something like this with some shift trickery:
But this is somewhat cumbersome, and allocating the intermediate shifted
frame can be slow (particularly if done via a groupby with a lot of groups).
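One way the shift approach might look, as a sketch on made-up data rather than the exact snippet from the issue (the frame and the names `changed` and `deduped` are assumptions):

```python
import pandas as pd

# Hypothetical data: the value goes 1 -> 2 -> 1.
df = pd.DataFrame({"value": [1, 1, 2, 2, 1, 1]})

# df.shift() moves every row down by one, so this compares each row with
# its predecessor. A row is kept when it differs in at least one column.
# The first row is compared against NaN and is therefore always kept.
# Caveat: NaN != NaN, so consecutive all-NaN rows are not collapsed.
changed = (df != df.shift()).any(axis=1)
deduped = df[changed]
print(deduped)
```

Here the rows at index 0, 2 and 4 are kept, so the return to 1 is preserved. The intermediate `df.shift()` frame is the extra allocation mentioned above.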