WIP/BUG: Correct results for groupby(...).transform with null keys #45839

rhshadrach · 2022-02-05T19:06:54Z

closes #17093: Ambiguous behaviour when transform groupby with NaNs
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

Fixes a number of issues with groupby(...).transform when the group keys contain null values. Still needs tests for SeriesGroupBy. Happy to break this up if requested, but I'm stuck on one part; see the comment below.

Note that this changes some tested behavior, but only to make it conform to what the docs specify for DataFrameGroupBy.transform:

Call function producing a like-indexed DataFrame on each group and return a DataFrame having the same indexes as the original object filled with the transformed values.
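A minimal sketch of that contract, assuming pandas 1.5 or later (to our understanding, the release in which transform with dropna=True began returning NaN for rows with null keys). The result keeps the input's index, and the row whose key is NaN receives NaN because it belongs to no group. The column names and data are illustrative.

```python
import numpy as np
import pandas as pd

# One key is null; with dropna=True that row belongs to no group.
df = pd.DataFrame({"key": [1.0, np.nan, 1.0, 2.0], "val": [10, 20, 30, 40]})

result = df.groupby("key", dropna=True)["val"].transform("sum")

# Like-indexed: same length and index as the input, NaN for the null-key row.
print(result)
```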


rhshadrach · 2022-02-05T19:12:44Z

pandas/core/groupby/groupby.py

+        # return [self.indices.get(name, []) for name in names]
+        # self.indices is a dict and doesn't handle looking up nulls in the groups
+        from pandas import (
+            Index,
+            Series,
+        )
+
+        index = Index(self.indices.keys(), tupleize_cols=False)
+        indices = Series(self.indices.values(), index=index)
+        result = [indices[name] for name in names]
+        return result


Here, self.indices is a dict mapping each group name to the positions at which it occurs. The problem arises when a name contains null values (e.g. np.nan, (1, np.nan), etc.): the lookup fails, and in general one shouldn't use null values as dict keys. I know Series gets around this by using codes, hence the solution above.
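A stdlib-only sketch (not pandas code) of why the plain dict lookup is fragile: NaN never compares equal to itself, so a lookup succeeds only when the query is the very same NaN object used as the key. The toy `indices` dict below stands in for self.indices.

```python
# Two distinct NaN objects: the same "value", but never == to each other.
nan_a = float("nan")
nan_b = float("nan")

# Toy stand-in for self.indices: group name -> positions where it occurs.
indices = {1.0: [0, 2], nan_a: [1, 3]}

# Succeeds: CPython dicts check identity before falling back to ==.
hit = indices.get(nan_a)

# Misses: an equivalent but distinct NaN is not found.
miss = indices.get(nan_b)

# Tuple keys containing NaN inherit the same problem.
tuple_indices = {(1, nan_a): [0]}
tuple_miss = tuple_indices.get((1, nan_b))
```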

Constructing the Index separately is needed to handle odd cases such as self.indices being {(1,): np.array([0, 2]), (1, 2): np.array([1, 3])}; this comes up in one particular test.
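A quick sketch of what tupleize_cols=False buys here: by default pandas attempts to build a MultiIndex from a list of tuples, whereas tupleize_cols=False keeps each tuple as one opaque label, so ragged keys like (1,) and (1, 2) can coexist and be looked up whole. The keys mirror the example above.

```python
import pandas as pd

keys = [(1,), (1, 2)]

# Keep each tuple as a single label instead of splitting it into levels.
flat = pd.Index(keys, tupleize_cols=False)

# Position of the whole-tuple label (1, 2) in the flat index.
loc = flat.get_loc((1, 2))
```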

cc @jbrockmendel @phofl in case you have ideas for solving this more directly, without using a Series.

Here the lookup fails, and in general one shouldn't have null values as keys in a dict

Not sure if this is relevant, but possibly related to #43943?

I think in index.pyx we have some code for canonicalizing NA values, e.g. float("nan") -> np.nan, so that dict lookups are better behaved. Could it be adapted/reused when defining indices?
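The canonicalization idea can be sketched in plain Python: map every float NaN (including NaNs nested in tuple keys) to one canonical object before inserting into or querying the dict, so equivalent null keys collapse to one entry. The helpers `canonicalize` and `build_indices` are hypothetical, not the code in index.pyx; `math.nan` stands in for np.nan, and other null types such as None or pd.NA are deliberately left alone.

```python
import math


def canonicalize(key):
    """Hypothetical helper: replace any float NaN with the single math.nan object."""
    if isinstance(key, float) and math.isnan(key):
        return math.nan
    if isinstance(key, tuple):
        # Recurse so keys like (1, nan) are normalized element by element.
        return tuple(canonicalize(k) for k in key)
    return key


def build_indices(names):
    """Hypothetical helper: map each canonicalized name to its positions."""
    indices = {}
    for pos, name in enumerate(names):
        indices.setdefault(canonicalize(name), []).append(pos)
    return indices


names = [1.0, float("nan"), (1, float("nan")), float("nan"), (1, float("nan"))]
indices = build_indices(names)

# Fresh NaN objects now hit, because both sides are canonicalized.
nan_positions = indices[canonicalize(float("nan"))]
tuple_positions = indices[canonicalize((1, float("nan")))]
```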

Thanks, this is precisely what I was looking for. However, I found a simpler way that uses the codes directly; I've put up #45953.
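The actual change lives in #45953; as a rough illustration of working with codes instead of dict lookups, the sketch below factorizes the keys (pd.factorize assigns null keys the code -1 by default), sums each group with np.bincount, and broadcasts the group results back to every row, leaving NaN for null-key rows. `transform_sum_by_codes` is hypothetical, assumes numeric keys, and is not taken from #45953.

```python
import numpy as np
import pandas as pd


def transform_sum_by_codes(keys, values):
    """Hypothetical sketch: a like-indexed groupby sum that skips null keys via codes."""
    # codes[i] is the group number of row i; null keys get -1.
    codes, uniques = pd.factorize(np.asarray(keys))
    values = np.asarray(values, dtype=float)

    valid = codes >= 0
    # Per-group sums over rows whose key is not null.
    sums = np.bincount(codes[valid], weights=values[valid], minlength=len(uniques))

    # Broadcast each group's sum back to its rows; null-key rows stay NaN.
    out = np.full(len(values), np.nan)
    out[valid] = sums[codes[valid]]
    return out


result = transform_sum_by_codes([1.0, np.nan, 1.0, 2.0], [10, 20, 30, 40])
```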

jbrockmendel · 2022-02-06T20:17:27Z

pandas/core/groupby/groupby.py

+            Series,
+        )
+
+        index = Index(self.indices.keys(), tupleize_cols=False)


Can you add a comment explaining why this is two lines instead of just indices = Series(self.indices)? I assume it's to avoid a MultiIndex.
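A small sketch of the difference being asked about, using toy labels with scalar values (the real self.indices maps to position arrays): passing a dict with equal-length tuple keys straight to Series yields a MultiIndex, whereas building the Index first with tupleize_cols=False keeps a single level of tuple labels.

```python
import pandas as pd

groups = {(1, 2): "a", (3, 4): "b"}

# One line: pandas splits the tuple keys into the levels of a MultiIndex.
direct = pd.Series(groups)

# Two lines: the tuples stay whole as labels of a flat object Index.
index = pd.Index(list(groups.keys()), tupleize_cols=False)
two_step = pd.Series(list(groups.values()), index=index)

# Whole-tuple lookup on the flat index.
value = two_step.iloc[two_step.index.get_loc((3, 4))]
```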


rhshadrach added 2 commits February 5, 2022 13:59

BUG: Correct results for groupby(...).transform with null keys

2143c59

Merge branch 'main' of https://github.com/pandas-dev/pandas into tran…

deb3edb

…sform_dropna

rhshadrach added the Bug, Groupby, Apply, and API - Consistency labels Feb 5, 2022

rhshadrach commented Feb 5, 2022


jbrockmendel reviewed Feb 6, 2022


rhshadrach added 2 commits February 10, 2022 16:46

Remove reliance on Series

ad7bf3e

Merge branch 'main' of https://github.com/pandas-dev/pandas into tran…

39e1438

…sform_dropna

rhshadrach mentioned this pull request Feb 12, 2022

BUG: Fix some cases of groupby(...).transform with dropna=True #45953

Merged


rhshadrach closed this Feb 12, 2022

rhshadrach mentioned this pull request Mar 3, 2022

BUG: Fix some cases of groupby(...).transform with dropna=True #46209

Merged

