ENH: A new GroupBy method to slice rows preserving index and order #42864
Comments
.nth already does this
As I pointed out, grouped.nth() does not preserve the order or index of the original df, particularly if it has a MultiIndex. So it is not compatible with head() or tail(), and it takes a list rather than a slice argument. Also, the run time of grouped.nth(list_of_ints) grows with the length of the list. It is over 10 times slower than alternative 5 when working with a large slice.
I appreciate that this has not yet been triaged, but I can propose a solution for GroupBy.iloc that addresses this issue. So I would like to take this.
ok, would be fine with .iloc as long as it's clear how this differs from nth, head and tail, e.g. the use cases are clear in the docs and API
take
I have implemented #42947 and submitted a pull request. I'm not sure what happens next (this is my first contribution...)
Update to my #42947. I decided that the syntax and behaviour of my index was too different from DataFrame.iloc to use the same name. I implemented it as GroupBy.rows. I do understand that we are trying to reduce attributes rather than add to them, but I believe that my code adds useful functionality that is not otherwise available. It also resolves multiple requests for GroupBy.head and tail to handle negative arguments.
Is your feature request related to a problem?
Pandas provides DataFrameGroupBy.head() and tail(), which efficiently slice the beginning and end of each group while preserving the order and index. I would like to be able to do a general row slice with the same properties. DataFrame has head(), tail() and iloc that behave in a compatible way. There is no corresponding DataFrameGroupBy.iloc.
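To make the "preserving the order and index" property concrete, here is a small illustration. The frame, its column names and its index labels are made up for this example and do not come from the original report.

```python
import pandas as pd

# Toy frame (an assumption for illustration): two interleaved groups
# and a non-default integer index.
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "b"], "val": [10, 20, 30, 40, 50, 60]},
    index=[100, 101, 102, 103, 104, 105],
)
grouped = df.groupby("key")

# First two rows of each group; rows stay in their original order
# and keep their original index labels.
first_two = grouped.head(2)

# Last row of each group, again with the original labels.
last_one = grouped.tail(1)

print(first_two.index.tolist())  # [100, 101, 102, 103]
print(last_one.index.tolist())   # [104, 105]
```

The result is a plain subset of df rather than a frame regrouped by key, which is the behaviour a general per-group row slice would need to match.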
Describe the solution you'd like
Provide a new DataFrameGroupBy method to slice rows per group
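As a rough sketch of the semantics being asked for, the helper below slices each group by position using cumcount. The function name group_rows, its signature and the toy data are hypothetical. This is not the implementation from the pull request, only one way the requested behaviour could be emulated with existing pandas calls.

```python
import pandas as pd


def group_rows(df, by, start=None, stop=None):
    """Hypothetical helper: keep rows start:stop of each group.

    Follows Python slice rules (step 1 only), including negative
    bounds, and returns rows in their original order and index.
    """
    g = df.groupby(by, sort=False)
    pos = g.cumcount()                      # position counted from group start
    from_end = g.cumcount(ascending=False)  # position counted from group end
    size = pos + from_end + 1               # group size, aligned to each row

    mask = pos >= 0                         # all True to begin with
    if start is not None:
        mask &= pos >= (start if start >= 0 else size + start)
    if stop is not None:
        mask &= pos < (stop if stop >= 0 else size + stop)
    return df[mask]


# Toy data (an assumption for illustration).
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "b", "a"], "val": list(range(7))},
    index=list("tuvwxyz"),
)

second_and_third = group_rows(df, "key", 1, 3)
all_but_last = group_rows(df, "key", stop=-1)
last_only = group_rows(df, "key", start=-1)

print(second_and_third.index.tolist())  # ['v', 'w', 'x', 'y']
print(all_but_last.index.tolist())      # ['t', 'u', 'v', 'w', 'x']
print(last_only.index.tolist())         # ['y', 'z']
```

The negative-bound cases correspond to the head and tail requests with negative arguments mentioned in the comments.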
API breaking implications
None
Describe alternatives you've considered
The following are existing ways to extract, say, the second and third entry of each group, assuming that there are a large number of rows in each group (~10000):
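The original list of alternatives did not survive on this page. For orientation only, two approaches of this kind might look like the sketch below. The data is invented, and these are not necessarily the alternatives the author listed or benchmarked.

```python
import pandas as pd

# Toy data (an assumption for illustration); the index must be unique
# for the drop-based approach below to be safe.
df = pd.DataFrame(
    {"key": ["a", "b", "a", "b", "a", "b", "a"], "val": list(range(7))},
    index=list("tuvwxyz"),
)
grouped = df.groupby("key")

# Approach 1: nth with a list of positions. The index and row order of
# the result differ between pandas versions, so no output is shown here.
via_nth = grouped.nth([1, 2])

# Approach 2: take the first three rows of each group, then drop the
# first row of each group. Original order and index are kept.
via_head = grouped.head(3).drop(grouped.head(1).index)

print(via_head.index.tolist())  # ['v', 'w', 'x', 'y']
```

Approach 2 relies only on head(), so it inherits its order-preserving behaviour, but it needs two passes and a unique index.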
Additional context
There are three options: