-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REGR: Bug fix for ExtensionArray groupby aggregation on non-numeric types #38982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: Bug fix for ExtensionArray groupby aggregation on non-numeric types #38982
Conversation
BryanCutler
commented
Jan 5, 2021
•
edited
Loading
edited
- closes REGR: ExtensionArray aggregation on non-numeric types fails #38980
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
Exception handling that checks for this particular message in order to fallback to non-cython impl is here https://github.com/pandas-dev/pandas/blob/master/pandas/core/groupby/groupby.py#L1047 I can work on adding a unit test for this |
need tests |
@BryanCutler this is a small example that I was testing that worked before and is failing now, so you can use that as a base for writing tests if you like
Specifying Also with the decimal test extension type that is included in our tests for such testing purposes, it can be reproduced:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep pls add tests
Thanks for the code samples @jorisvandenbossche. I tried to add a base extension test to cover more types and looks good for all except decimal and json, which give the following error:
Not sure if it's a problem with wrapping back up the extension array, or maybe these tests need a specialization. I'll look into it a little later.. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like failing on 32-bit
e3e02a4
to
ec4bdc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Two small suggestions
result = df.groupby("A").agg({"B": "first"}).B.array | ||
|
||
expected = df["B"].iloc[[0, 2, 4, 7]].array | ||
|
||
self.assert_extension_array_equal(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result = df.groupby("A").agg({"B": "first"}).B.array | |
expected = df["B"].iloc[[0, 2, 4, 7]].array | |
self.assert_extension_array_equal(result, expected) | |
expected = df["B"].iloc[[0, 2, 4, 7]].array | |
result = df.groupby("A").agg({"B": "first"}).B.array | |
self.assert_extension_array_equal(result, expected) | |
result = df.groupby("A").agg("first").B.array | |
self.assert_extension_array_equal(result, expected) | |
result = df.groupby("A").first().B.array | |
self.assert_extension_array_equal(result, expected) |
Those different ways apparently take a somewhat different code path, given that the two added ones actually still worked (but so worth testing those different ways)
dd77772
to
6ba43dc
Compare
@jreback I addressed the feeback and tests have passed. Please take another look when you can, thanks! |
thanks @BryanCutler very nice! |
@meeseeksdev backport 1.2.x |
…y aggregation on non-numeric types
Thanks @BryanCutler ! |
…ion on non-numeric types (#39145) Co-authored-by: Bryan Cutler <[email protected]>
Thank you all for reviewing! |