CLN: _wrap_applied_output #35412
pandas/core/groupby/generic.py
# update due to the potential reorder
first_not_none = next(com.not_none(*values), None)
else:
if not isinstance(v, (np.ndarray, Index, Series)):
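The `first_not_none` line above calls `next` with a default, so a list of all-`None` results yields `None` instead of raising `StopIteration`. Below is a minimal standalone sketch. It assumes `com.not_none(*values)` behaves like a generator that skips `None` entries. The `not_none` copy and the `first_not_none_of` helper are illustrative, not pandas APIs.

```python
from typing import Any, Iterator, Optional


def not_none(*args: Any) -> Iterator[Any]:
    # Assumed equivalent of pandas' internal com.not_none: lazily yield non-None args.
    return (arg for arg in args if arg is not None)


def first_not_none_of(values: list) -> Optional[Any]:
    # Hypothetical helper: next() with a default returns None when every entry
    # is None (or the list is empty) instead of raising StopIteration.
    return next(not_none(*values), None)


print(first_not_none_of([None, None, 3, 4]))  # 3
print(first_not_none_of([None, None]))        # None
print(first_not_none_of([None, 0]))           # 0 (falsy values are kept; only None is skipped)
```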
Can this just be an else statement? Or are there more types we handle than these + NDFrame?
I think you're suggesting something like:
if isinstance(v, (np.ndarray, Index, Series)):
...
else:
...
The reason I opted not to do this is that the if-block is exceedingly long, whereas the else-block is quite short. Flipping the condition would produce a more nested structure rather than a flat one.
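The toy sketch below illustrates the trade-off described in the reply. It is not the pandas code. The function names, return strings, and the `ARRAY_LIKE` tuple (`list`/`tuple` standing in for `np.ndarray`, `Index`, `Series`) are all hypothetical simplifications. The flat version handles the short case first and returns early, which keeps the long branch at the top indentation level.

```python
from typing import Any

# Hypothetical stand-ins for (np.ndarray, Index, Series) in this sketch.
ARRAY_LIKE = (list, tuple)


def wrap_nested(v: Any) -> str:
    # Nested: the long array-like branch lives inside the if-block,
    # and the short scalar case trails behind it in the else-block.
    if isinstance(v, ARRAY_LIKE):
        parts = [str(x) for x in v]  # ...imagine many more lines here...
        return "array:" + ",".join(parts)
    else:
        return "scalar:" + str(v)


def wrap_flat(v: Any) -> str:
    # Flat: handle the short scalar case first and return early, so the
    # long array-like handling stays at the function's top indentation level.
    if not isinstance(v, ARRAY_LIKE):
        return "scalar:" + str(v)
    parts = [str(x) for x in v]  # ...imagine many more lines here...
    return "array:" + ",".join(parts)


print(wrap_flat([1, 2]), wrap_flat(5))  # array:1,2 scalar:5
```

Both functions behave identically. The only difference is how much of the long branch ends up indented.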
Ack, sorry, I see what you're saying now. Ignore my previous response; I'll investigate.
Thanks for refactoring this. As I'm sure you've noticed, these code paths are thorny, so cleanups here are much appreciated.
@@ -868,13 +868,15 @@ def test_apply_multi_level_name(category):
b = [1, 2] * 5
if category:
b = pd.Categorical(b, categories=[1, 2, 3])
expected_index = pd.CategoricalIndex([1, 2], categories=[1, 2, 3], name="B")
Is there an open issue for this?
I don't believe so. There is only one issue tagged with categorical, groupby, and apply, and it is not relevant. I also looked through the issues tagged categorical and groupby and didn't see anything there either.
Can you add a whatsnew for this? Something along the lines of "groupby apply will now maintain a CategoricalIndex" (assuming that is now the case).
Codecov reports a lot of lines in this method that aren't hit. I'm going to see which tests can be added and which branches can be removed.
Branch updated: c871a9f to 5efa563, then 5efa563 to 059405c.
Cool, I think this is a nice refactor. @jreback
To the extent that you can split this into independent pieces, it will be easier to review.
The majority of this change is the result of three operations:
There are two sections that I was able to entirely remove:
and
The single test I touched was checking that the categorical dtype of an index was dropped after a groupby. I don't believe that is the correct behavior; the categorical dtype should be preserved. I checked the groupby/categorical bugs and didn't find any issues this closes.
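The minimal sketch below demonstrates the behavior in question, using the data from the `test_apply_multi_level_name` hunk. Passing `observed=True` is an assumption added here and is not part of the original test. It restricts the output to the categories present in the data and avoids version-dependent defaults. The exact result may vary across pandas versions. The expectation is that, with this change, the result index stays a `CategoricalIndex` rather than being converted to a plain `Index`.

```python
import numpy as np
import pandas as pd

# Same data as test_apply_multi_level_name with category=True.
b = pd.Categorical([1, 2] * 5, categories=[1, 2, 3])
df = pd.DataFrame(
    {"A": np.arange(10), "B": b, "C": list(range(10)), "D": list(range(10))}
).set_index(["A", "B"])

# Group by the categorical index level "B" and sum each group's columns.
# observed=True is an addition for this sketch (not in the original test).
result = df.groupby("B", observed=True).apply(lambda x: x.sum())

print(type(result.index).__name__)  # expected to be CategoricalIndex after this change
print(list(result.index))           # [1, 2]
print(result["C"].tolist())         # [20, 25]
```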