I work with df.groupby.apply() very often, and many aspects of its output are confusing (and sometimes wrong), particularly what happens to the index of the result. v0.23 cleaned up large parts of the apply API, but a lot remains to be done.

Ideally, the documentation would contain a matrix along the following lines (not necessarily in exactly this form), and the API would implement it:

For as_index=True:

function output   |  result type  |  (multi-)index levels |  groupby-cols  |  columns
--------------------------------------------------------------------------------------------
scalar            |    Series     |    groupby-columns    |      n/a       |  none
Series            |   DataFrame   |    groupby-columns    |     dropped    |  index (union) of Series
DataFrame         |   DataFrame   |   gb-cols + df.index  |     dropped    |  columns (union) of DFs
np.ndarray 1-dim  |   DataFrame   |  to discuss / raise?  |      n/a       |  to discuss / raise?
np.ndarray 2-dim  |   DataFrame   |  to discuss / raise?  |      n/a       |  to discuss / raise?
Index             |  MultiIndex?  |   gb-cols + output    |      n/a       |  n/a
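
To make the first three rows concrete, here is a minimal sketch of what "scalar", "Series" and "DataFrame" function outputs could look like. The frame `df`, the column names `key` and `val`, and the lambdas are made up for illustration. The value column is selected explicitly (`[["val"]]`) so that the grouping column is never part of the group passed to the function; this sidesteps the "groupby-cols" question rather than answering it.

```python
import pandas as pd

# Hypothetical sample data: two groups with two rows each.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})

# Select the value column explicitly so the grouping column is not
# part of the groups handed to the applied function.
gb = df.groupby("key")[["val"]]

# "scalar" row: one value per group.
scalar_out = gb.apply(lambda g: g["val"].sum())

# "Series" row: every group returns a Series with the same index.
series_out = gb.apply(
    lambda g: pd.Series({"lo": g["val"].min(), "hi": g["val"].max()})
)

# "DataFrame" row: every group returns a DataFrame (here its first row).
frame_out = gb.apply(lambda g: g.head(1))

for label, out in [("scalar", scalar_out),
                   ("Series", series_out),
                   ("DataFrame", frame_out)]:
    columns = list(out.columns) if isinstance(out, pd.DataFrame) else None
    print(label, type(out).__name__, list(out.index.names), columns)
```

For these three cases the result types and index levels should match the proposed rows; the remaining rows (np.ndarray, Index) are the ones still up for discussion.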

For as_index=False:

function output   |  result type  |  (multi-)index levels |  groupby-cols  |  columns
--------------------------------------------------------------------------------------------
scalar            |   DataFrame?  |      RangeIndex       |      n/a       |  gb-cols + output?
Series            |   DataFrame   |      RangeIndex       |      kept      |  gb-cols + index of Series?
DataFrame         |   DataFrame   |  to discuss / raise?  |      kept      |  gb-cols + columns of DFs
np.ndarray 1-dim  |   DataFrame   |  to discuss / raise?  |      n/a       |  to discuss / raise?
np.ndarray 2-dim  |   DataFrame   |  to discuss / raise?  |      n/a       |  to discuss / raise?
Index             |    Series?    |  to discuss / raise?  |      n/a       |  n/a
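
The shape proposed for the scalar and Series rows (RangeIndex, groupby columns kept as regular columns) can already be produced by applying with the default as_index=True and calling reset_index() afterwards. The sketch below only illustrates that target shape; it is not a claim about how as_index=False is or should be implemented, and `df`, `key`, `val` and the column name `total` are made up.

```python
import pandas as pd

# Hypothetical sample data: two groups with two rows each.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})
gb = df.groupby("key")[["val"]]

# Scalar per group: the Series result becomes a two-column DataFrame.
# The output column needs a name, here the arbitrary "total".
scalar_flat = gb.apply(lambda g: g["val"].sum()).reset_index(name="total")

# Series per group: the groupby column moves from the index to a column.
series_flat = gb.apply(
    lambda g: pd.Series({"lo": g["val"].min(), "hi": g["val"].max()})
).reset_index()

print(scalar_flat)
print(series_flat)
```

Note that the scalar case needs a name for the output column, which is one of the open questions marked in the table.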

Currently, the behaviour is much, much more complicated / inconsistent / wrong. I'm trying to fill in the corresponding tables with the current behaviour and some issue xrefs, but they are far from complete:

For as_index=True:

function output   |  result type  |  (multi-)index levels |  groupby-cols  |  columns
--------------------------------------------------------------------------------------------
scalar            |    Series     |    groupby-columns    |      n/a       |  none
Series (same idx) |   DataFrame   |    groupby-columns    |     kept?!     |  index of Series
Series (diff idx) |    Series?!   |  gb-cols + output.idx |      n/a       |  none?!
group as-is       |   DataFrame   |    original index?!   |     kept?!     |  original columns
group selection   |   DataFrame   |  gb-cols + output.idx |     kept?!     |  original columns
DataFrame         |   DataFrame   |  gb-cols + output.idx |      n/a       |  columns (union) of DFs
np.ndarray 1-dim  |    Series?!   |   groupby-columns     |      n/a       |  none
np.ndarray 2-dim  |    Series?!   |   groupby-columns     |      n/a       |  none
Index             |    Series?!   |   groupby-columns     |      n/a       |  none #22541

For as_index=False:

function output   |  result type  |  (multi-)index levels |  groupby-cols  |  columns
--------------------------------------------------------------------------------------------
scalar            |    Series     |      RangeIndex       |      n/a       |  none
Series (same idx) |   DataFrame   |      RangeIndex       |     kept       |  index of Series
Series (diff idx) |    Series?!   | RngIdx + output.idx?! |      n/a       |  none?!
group as-is       |   DataFrame   |    original index?!   |     kept       |  original columns
group selection   |   DataFrame   | RngIdx + output.idx?! |     kept       |  original columns
DataFrame         |   DataFrame   | RngIdx + output.idx?! |      n/a       |  columns (union) of DFs
np.ndarray 1-dim  |    Series?!   |      RangeIndex       |      n/a       |  none
np.ndarray 2-dim  |    Series?!   |      RangeIndex       |      n/a       |  none
Index             |    Series?!   |      RangeIndex       |      n/a       |  none #22541

Some xrefs: #20420, #22541, #22542, #22546
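
Since the entries in the "current behaviour" tables depend on the pandas version, a small probe script can help to re-check them. The following is a rough sketch: `describe`, `probe` and the `CASES` functions are hypothetical helpers written for this purpose, they cover only some of the rows, and no particular output is implied for any version.

```python
import warnings

import pandas as pd

def describe(result):
    """Summarise one apply() result in the terms used by the tables."""
    is_frame = isinstance(result, pd.DataFrame)
    return {
        "type": type(result).__name__,
        "index": type(result.index).__name__,
        "index_names": list(result.index.names),
        "columns": list(result.columns) if is_frame else None,
    }

# Hypothetical sample functions, one per "function output" row.
CASES = {
    "scalar": lambda g: g["val"].sum(),
    "Series (same idx)": lambda g: pd.Series(
        {"lo": g["val"].min(), "hi": g["val"].max()}
    ),
    "group as-is": lambda g: g,
    "group selection": lambda g: g.head(1),
    "np.ndarray 1-dim": lambda g: g["val"].values,
}

def probe(df, by, as_index):
    """Run every case and record what apply() returns (or raises)."""
    rows = {}
    for label, func in CASES.items():
        with warnings.catch_warnings():
            # Deprecation warnings differ between versions; ignore them here.
            warnings.simplefilter("ignore")
            try:
                result = df.groupby(by, as_index=as_index).apply(func)
                rows[label] = describe(result)
            except Exception as exc:
                rows[label] = {"error": type(exc).__name__}
    return rows

# Hypothetical sample data: two groups with two rows each.
df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [1, 2, 3, 4]})

for as_index in (True, False):
    print("as_index =", as_index, "| pandas", pd.__version__)
    for label, row in probe(df, "key", as_index).items():
        print("  ", label, row)
```

Running this on different versions and comparing the printed rows should show which table entries have changed.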

API/DOC: clean up DataFrame.groupby.apply #22545
