API: SeriesGroupBy.product with numeric_only and empty non-numeric data #41291

Closed
jbrockmendel opened this issue May 3, 2021 · 2 comments · Fixed by #41706
Labels: API - Consistency, API Design, Bug, Groupby, Nuisance Columns
Comments

@jbrockmendel
Member

GroupBy reductions default to numeric_only=True. But when there are no numeric columns and we raise DataError, the fallback path ignores numeric_only, so we end up with inconsistent results.

Below I think that gb1 raising is correct, gb3 should raise, gb2 and gb4 should return empty DataFrames. cc @jreback @jorisvandenbossche @TomAugspurger @WillAyd

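import pandas as pd

dti = pd.date_range("2016-01-01", periods=3)
ser = pd.Series(dti)
df = ser.to_frame()

gb1 = ser.groupby([0, 0, 0])   # non-empty SeriesGroupBy, datetime64 values
gb2 = df.groupby(gb1.grouper)  # non-empty DataFrameGroupBy, same grouping
gb3 = ser[:0].groupby([])      # empty SeriesGroupBy
gb4 = df[:0].groupby([])       # empty DataFrameGroupBy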

>>> gb1.prod()
TypeError: 'DatetimeArray' does not implement reduction 'prod'

>>> gb2.prod()  # <- float64 result
    0
0 NaN

>>> gb3.prod()
Series([], dtype: datetime64[ns])

>>> gb4.prod()  # <- column stays dt64
Empty DataFrame
Columns: [0]
Index: []
@jbrockmendel added the Bug, Needs Triage, API - Consistency, API Design, Groupby, and Nuisance Columns labels and removed the Needs Triage label on May 3, 2021.
@jorisvandenbossche
Member

> Below I think that gb1 raising is correct, gb3 should raise, gb2 and gb4 should return empty DataFrames

That sounds correct to me, at first sight.
This is also consistent with non-grouped Series/DataFrame (ser.prod() raises, df.prod() gives an empty Series).
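The non-grouped comparison can be checked with a short script. This is only a sketch: it passes numeric_only=True to df.prod() explicitly, on the assumption that the default for this keyword is not the same in every pandas version, and it only records whether ser.prod() raises TypeError rather than matching the message text.

```python
import pandas as pd

dti = pd.date_range("2016-01-01", periods=3)
ser = pd.Series(dti)
df = ser.to_frame()

# Series: there is no column to drop, so the reduction itself has to fail.
try:
    ser.prod()
    series_raised = False
except TypeError:
    series_raised = True

# DataFrame: with numeric_only=True the datetime column is excluded up front,
# which leaves nothing to reduce.
frame_result = df.prod(numeric_only=True)

print(series_raised)
print(len(frame_result))
```

The length of frame_result is the quantity of interest here: an empty result means the datetime column was dropped instead of causing an error.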

@jbrockmendel
Member Author

Fixing this segues into another issue: NDFrame reductions have the numeric_only kwarg default to None, and for Series, numeric_only=True (i.e. the user explicitly passed it) raises NotImplementedError. GroupBy reductions have numeric_only default to True and don't have a None option.

So for SeriesGroupBy, if we match the Series behavior without changing the default kwarg, then SeriesGroupBy[dt64].prod() will raise NotImplementedError. I think it is better to change the default, but (for now) only for SeriesGroupBy, so that we only get NotImplementedError if numeric_only is explicitly passed, and TypeError otherwise.
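To make the proposed distinction concrete, here is a pure-Python sketch of how a sentinel default could separate "not passed" from "explicitly passed" for the dt64 case. The names _NO_DEFAULT and expected_error are hypothetical and exist only for this illustration; this is not pandas code, and the explicit numeric_only=False branch is an assumption, since the comment above does not spell it out.

```python
# Hypothetical sentinel meaning "the caller did not pass numeric_only".
_NO_DEFAULT = object()


def expected_error(numeric_only=_NO_DEFAULT):
    """Exception type the proposal implies for SeriesGroupBy[dt64].prod()."""
    if numeric_only is _NO_DEFAULT:
        # Keyword not passed: the reduction itself fails on datetime data.
        return TypeError
    if numeric_only:
        # Explicit numeric_only=True: match the Series behavior.
        return NotImplementedError
    # Assumption: explicit numeric_only=False behaves like the default case.
    return TypeError


print(expected_error().__name__)
print(expected_error(numeric_only=True).__name__)
```

A sentinel is needed because a plain default of True cannot tell whether the user typed numeric_only=True or left the keyword out.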
