CLN: _wrap_applied_output #35412

rhshadrach · 2020-07-25T20:03:20Z

The majority of this change is the result of three operations:

Move code that returns up top, to avoid the heavily nested structure.
Move variable creation as close as possible to where the variables are used. Previously, in certain cases, computations were done and their results went unused.
Combine duplicated code.
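
As a rough illustration of these three operations (not the actual `_wrap_applied_output` code), here is a minimal sketch in plain Python. The functions `wrap_nested` and `wrap_flat` and their string results are hypothetical and exist only to show the shape of the change: the short returning case moves to the top, the variable is created only once it is needed, and the duplicated string-building is combined.

```python
def wrap_nested(values):
    # "Before" shape: label is computed up front even when it goes unused,
    # and the short returning case sits in an else-branch at the bottom.
    label = "n=" + str(len(values))
    if len(values) > 0:
        if all(isinstance(v, int) for v in values):
            return label + " ints"
        else:
            return label + " mixed"
    else:
        return "empty"


def wrap_flat(values):
    # "After" shape: return early so the rest of the body is not nested.
    if len(values) == 0:
        return "empty"

    # Create the variable right before it is used.
    label = "n=" + str(len(values))

    # Combine the two duplicated return statements into one.
    kind = "ints" if all(isinstance(v, int) for v in values) else "mixed"
    return label + " " + kind
```

Both functions are meant to return the same result for any list; only the structure differs.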

There are two sections that I was able to entirely remove:

ping = self.grouper.groupings[0]
if len(keys) == ping.ngroups:
    key_index = ping.group_index
    key_index.name = key_names[0]

    key_lookup = Index(keys)
    indexer = key_lookup.get_indexer(key_index)

    # reorder the values
    values = [values[i] for i in indexer]

    # update due to the potential reorder
    first_not_none = next(com.not_none(*values), None)

and

# GH5788 instead of stacking; concat gets the
# dtypes correct
from pandas.core.reshape.concat import concat

result = concat(
    values,
    keys=key_index,
    names=key_index.names,
    axis=self.axis,
).unstack()
result.columns = index
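
For context on what the first removed section was doing, the reorder step can be sketched in plain Python. This is only an illustration under stated assumptions: `get_indexer` below is a hypothetical stand-in that mimics the behavior of `Index.get_indexer` (position of each target in the lookup, -1 when absent), and the sample `keys`, `values`, and `key_index` are made up.

```python
def get_indexer(lookup, targets):
    """Hypothetical stand-in for Index.get_indexer on unique keys."""
    positions = {key: i for i, key in enumerate(lookup)}
    return [positions.get(target, -1) for target in targets]


keys = ["b", "a", "c"]        # order in which the results were produced
values = [20, 10, 30]         # values[i] belongs to keys[i]
key_index = ["a", "b", "c"]   # order reported by the grouping

# Find where each entry of key_index sits in keys ...
indexer = get_indexer(keys, key_index)

# ... and reorder the values to match key_index.
reordered = [values[i] for i in indexer]
```

Here `indexer` is `[1, 0, 2]` and `reordered` is `[10, 20, 30]`.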

The single test that I touched was checking that the categorical dtype of an index was dropped after a groupby. I don't believe that is the correct behavior; I think the categorical dtype should remain. I checked the groupby/categorical bugs and didn't find any issues that this closes.

WillAyd · 2020-08-04T15:23:39Z

pandas/core/groupby/generic.py

-                    # update due to the potential reorder
-                    first_not_none = next(com.not_none(*values), None)
-                else:
+        if not isinstance(v, (np.ndarray, Index, Series)):


Can this just be an else statement? Or are there more types we handle than these + NDFrame?

I think you're suggesting something like:

if isinstance(v, (np.ndarray, Index, Series)):
    ...
else:
    ...

The reason I have opted not to do this is that the if-block is exceedingly long, whereas the else-block is quite short. Writing it as if/else would result in a more nested structure rather than a flat one.

Ack - sorry, I see what you're saying now. Ignore my previous response, will investigate.

WillAyd

Thanks for refactoring this. These code paths, as I'm sure you've noticed, are thorny, so cleanups here are much appreciated.

pandas/core/indexes/api.py

WillAyd · 2020-08-04T23:09:42Z

pandas/tests/groupby/test_apply.py

@@ -868,13 +868,15 @@ def test_apply_multi_level_name(category):
    b = [1, 2] * 5
    if category:
        b = pd.Categorical(b, categories=[1, 2, 3])
+        expected_index = pd.CategoricalIndex([1, 2], categories=[1, 2, 3], name="B")


Is there an open issue for this?

I don't believe so. There is only one issue tagged with categorical, groupby, and apply, and it is not relevant. I also looked through the issues tagged with categorical and groupby and didn't see anything there either.

Can you add a whatsnew for this? Something along the lines of groupby apply will now maintain a CategoricalIndex (assuming that is now the case)

rhshadrach · 2020-08-14T15:48:36Z

Codecov is finding a lot in this method that isn't hit. I'm going to see what tests can be added or branches that can be removed.

WillAyd

Cool, I think this is a nice refactor. @jreback

WillAyd · 2020-08-18T14:36:41Z

Can you add a whatsnew for this? Something along the lines of groupby apply will now maintain a CategoricalIndex (assuming that is now the case)

jbrockmendel · 2020-08-18T17:55:22Z

The majority of this change is the result of three operations:

To the extent that you can split this into independent pieces, it will be easier to review.

CLN: Clean/Simplify _wrap_applied_output

aa02b83

rhshadrach added the Apply, Clean, and Groupby labels Jul 30, 2020

WillAyd reviewed Aug 4, 2020

rhshadrach added 2 commits August 4, 2020 18:04

Refactored if-else.

21e1fca

Merge branch 'master' of https://github.com/pandas-dev/pandas into apply_output_cleanup

0e16de3

WillAyd reviewed Aug 4, 2020

Merge remote-tracking branch 'upstream/master' into apply_output_cleanup

bb032ea

rhshadrach marked this pull request as draft August 14, 2020 15:48

rhshadrach force-pushed the apply_output_cleanup branch 2 times, most recently from c871a9f to 5efa563, August 15, 2020 12:35

Reworked logic for non-NDFrame cases

059405c

rhshadrach force-pushed the apply_output_cleanup branch from 5efa563 to 059405c, August 17, 2020 20:21

WillAyd reviewed Aug 18, 2020

rhshadrach closed this Aug 18, 2020

rhshadrach mentioned this pull request Aug 18, 2020

CLN/BUG: Clean/Simplify _wrap_applied_output #35792

Merged

5 tasks

rhshadrach mentioned this pull request Sep 1, 2020

CLN: _wrap_applied_output #36053

Merged

5 tasks

rhshadrach deleted the apply_output_cleanup branch September 10, 2020 00:07
