ENH: A new GroupBy method to slice rows preserving index and order #42864

johnzangwill · 2021-08-03T12:47:30Z

Is your feature request related to a problem?

Pandas provides DataFrameGroupBy.head() and tail(), which efficiently slice the beginning and end of each group while preserving the order and index. I would like to be able to do a general row slice with the same properties. DataFrame has head(), tail() and iloc that behave in a compatible way. There is no corresponding DataFrameGroupBy.iloc.
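The head()/tail() behaviour described here can be illustrated with a small example. The frame, the column names g and v, and the letter index are made up for illustration. Both methods act as filters on the original frame, so each group's rows come back with their original index labels and in their original row order. DataFrame.iloc gives an arbitrary positional slice of the whole frame, but there is no per-group counterpart.

```python
import pandas as pd

# Illustrative data: two interleaved groups "a" and "b" with a non-default index.
df = pd.DataFrame(
    {"g": ["a", "b", "a", "b", "a", "b", "a"], "v": range(7)},
    index=list("pqrstuv"),
)
grouped = df.groupby("g")

# head/tail filter the original frame: labels and row order are kept.
first_two = grouped.head(2)  # first 2 rows of each group
last_one = grouped.tail(1)   # last row of each group

# DataFrame.iloc slices the whole frame by position; there is no
# per-group equivalent such as "rows 1:3 of every group".
whole_frame_slice = df.iloc[1:3]

print(first_two.index.tolist())  # ['p', 'q', 'r', 's']
print(last_one.index.tolist())   # ['u', 'v']
```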

Describe the solution you'd like

Provide a new DataFrameGroupBy method to slice the rows of each group, preserving the original index and order as head() and tail() do.

API breaking implications

None

Describe alternatives you've considered

The following are existing ways to extract, say, the second and third entries of each group, assuming that each group has a large number of rows (~10000):

1. grouped.apply(lambda x: x.iloc[1:3, :]) - Extremely slow. Does not preserve the order or indexing.
2. grouped.take([1, 2]) - Extremely slow. Does not preserve the order or indexing.
3. grouped.nth([1, 2]) - Quite fast for a small list. Does not preserve the order or indexing.
4. grouped.head(3).groupby('...').tail(2) - Quite fast. Preserves the index and ordering.
5. grouped._selected_obj[mask], where mask is built from grouped.cumcount() - Very fast. Preserves the index and ordering, but uses a private attribute of DataFrameGroupBy and takes several lines of code.
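The cumcount-mask approach and the head/tail chain from the list above can be sketched as follows. This is a minimal illustration on made-up data, assuming the grouping is by a column of df with no column selection, so that df itself can stand in for the private grouped._selected_obj. Both versions keep the second and third rows of each group with their original labels and order.

```python
import pandas as pd

# Illustrative data: groups "a" and "b" interleaved, default RangeIndex.
df = pd.DataFrame({"g": ["a", "b", "a", "b", "a", "b", "a"], "v": range(7)})
grouped = df.groupby("g")

# Position of each row within its group (0, 1, 2, ...), aligned to df's index.
pos = grouped.cumcount()

# Keep positions 1 and 2 of each group (the second and third rows).
# df is used directly instead of the private grouped._selected_obj.
via_mask = df[(pos >= 1) & (pos < 3)]

# The head/tail chain: first 3 rows per group, then the last 2 of those.
via_head_tail = grouped.head(3).groupby("g").tail(2)

print(via_mask.index.tolist())  # [2, 3, 4, 5]
```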

Additional context

There are three options:

1. Add an option to an existing method to force it to preserve the index and order. But take() is very slow and nth() is quite slow. Neither accepts a slice argument, so a list of positions has to be provided.
2. Easiest: add a new method that takes a slice as its argument and implements it as in alternative 5 above.
3. Most logical and complete: add a new iloc attribute analogous to DataFrame.iloc.
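The "easiest" option, a new method that takes a slice and is built on cumcount, could look roughly like this. slice_groups is a hypothetical name chosen for illustration, not a pandas API. The sketch follows Python slice semantics for start and stop (negative values count from the end of each group), handles only step=None or step=1, and assumes the grouping keys contain no NaN.

```python
import numpy as np
import pandas as pd


def slice_groups(df: pd.DataFrame, by, sel: slice) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose position within their group is in `sel`.

    start/stop follow Python slice semantics, so negative values count from
    the end of each group. Only step=None or step=1 is handled, and the
    grouping keys are assumed to contain no NaN.
    """
    if sel.step not in (None, 1):
        raise ValueError("this sketch only supports step=None or step=1")
    grouped = df.groupby(by, sort=False)
    # 0-based position counted from the start of each group.
    pos = grouped.cumcount().to_numpy()
    # 1-based position counted from the end of each group (1 = last row).
    from_end = grouped.cumcount(ascending=False).to_numpy() + 1

    mask = np.ones(len(df), dtype=bool)
    if sel.start is not None:
        k = sel.start
        mask &= (pos >= k) if k >= 0 else (from_end <= -k)
    if sel.stop is not None:
        k = sel.stop
        mask &= (pos < k) if k >= 0 else (from_end > -k)
    # Boolean filtering keeps the original index labels and row order.
    return df[mask]


df = pd.DataFrame({"g": ["a", "b", "a", "b", "a", "b", "a"], "v": range(7)})
print(slice_groups(df, "g", slice(1, 3)).index.tolist())     # [2, 3, 4, 5]
print(slice_groups(df, "g", slice(-2, None)).index.tolist())  # [3, 4, 5, 6]
```

Computing one mask over the whole frame, rather than slicing each group separately, is what keeps this approach fast and order-preserving.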

jreback · 2021-08-03T12:48:45Z

.nth already does this

johnzangwill · 2021-08-03T12:55:33Z

As I pointed out, grouped.nth() does not preserve the order or index of the original df, particularly if it has a MultiIndex. So it is not compatible with head() or tail(), and it takes a list rather than a slice argument.

Also, the run time of grouped.nth(list_of_ints) grows with the length of the list. It is over 10 times slower than alternative 5 when working with a large slice.

johnzangwill · 2021-08-05T13:34:23Z

I appreciate that this has not yet been triaged, but I can propose a solution for GroupBy.iloc that addresses this issue. So I would like to take this.

jreback · 2021-08-05T13:55:40Z

OK, would be OK with .iloc as long as it's clear how this differs from nth, head and tail, e.g. the use cases are clear in the docs and API.

johnzangwill · 2021-08-05T15:36:42Z

take

johnzangwill · 2021-08-10T08:42:57Z

I have implemented this and submitted pull request #42947. I'm not sure what happens next (this is my first contribution...)

johnzangwill · 2021-09-02T13:20:13Z

An update on my PR #42947: I decided that the syntax and behaviour of my indexer were too different from DataFrame.iloc to use the same name, so I implemented it as GroupBy.rows. I do understand that we are trying to reduce the number of attributes rather than add to them, but I believe that my code adds useful functionality that is not otherwise available. It also resolves multiple requests for GroupBy.head and tail to handle negative arguments.
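The negative-argument semantics referred to here are those of DataFrame.head(-n) and DataFrame.tail(-n), which drop the last or first n rows of the frame. A per-group equivalent can be sketched with cumcount. This is an illustration on made-up data, not the implementation in #42947.

```python
import pandas as pd

# Illustrative data: groups "a" and "b" interleaved, default RangeIndex.
df = pd.DataFrame({"g": ["a", "b", "a", "b", "a", "b", "a"], "v": range(7)})
grouped = df.groupby("g")
n = 1

# On a plain DataFrame, head(-n) drops the last n rows and tail(-n) the first n.
frame_head = df.head(-n)
frame_tail = df.tail(-n)

# Per-group analogues that keep the original index and order:
# drop the last n rows of every group ...
group_head_neg = df[grouped.cumcount(ascending=False) >= n]
# ... or drop the first n rows of every group.
group_tail_neg = df[grouped.cumcount() >= n]

print(group_head_neg.index.tolist())  # [0, 1, 2, 3, 4]
print(group_tail_neg.index.tolist())  # [2, 3, 4, 5, 6]
```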

johnzangwill added the Enhancement and Needs Triage labels Aug 3, 2021

rhshadrach added the Groupby label Aug 4, 2021

johnzangwill changed the title from "ENH: A new DataFrameGroupBy method to slice rows preserving index and order" to "ENH: A new GroupBy method to slice rows preserving index and order" Aug 4, 2021

github-actions bot assigned johnzangwill Aug 5, 2021

johnzangwill mentioned this issue in #42947 (ENH: A new GroupBy method to slice rows preserving index and order; Merged) Aug 10, 2021

mroeschke added the Indexing and Needs Discussion labels and removed the Needs Triage label Aug 21, 2021

jreback added this to the 1.4 milestone Oct 15, 2021

jreback closed this as completed in #42947 Oct 15, 2021
