-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
add uint64 support for some libgroupby funcs #28931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you add tests for these as well? Would be nice to have something parametrized in groupby |
I'll take a look and see if we have tests that get at these functions directly (as opposed to via df.groupby.whatever). I can attest that these are hit by the python code for which this adds a comment. |
Following this and #28934 one more obvious cleanup is to move all of groupby_helper into groupby.pyx, since we are no longer using the templating |
@WillAyd it looks like we don't have any dedicated testing that gets directly a the functions in this file. |
else: | ||
out[i, j] = NAN | ||
else: | ||
out[i, j] = resx[i, j] | ||
|
||
if runtime_error: | ||
# We cannot raise directly above because that is within a nogil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where are these caught?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this would get caught by one of the except Exception
s in the groupby code that i'm trying to make more specific.
Any major concerns? This is a blocker for getting at the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can add to the parametrization of this test here:
pandas/pandas/tests/groupby/test_function.py
Line 394 in 18a9e4c
def test_groupby_non_arithmetic_agg_types(dtype, method, data): |
pandas/_libs/groupby_helper.pxi.in
Outdated
@@ -439,6 +466,9 @@ def group_max(groupby_t[:, :] out, | |||
# Note: evaluated at compile-time | |||
maxx[:] = -_int64_max | |||
nan_val = NPY_NAT | |||
elif groupby_t is uint64_t: | |||
maxx[:] = 0 | |||
nan_val = 0 # We better not need this! TODO: Can we use e.g. NULL? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify what this is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have to define a value to treat as NA, and it has to be of the appropriate dtype. But for uint64 there is no such value, so defining it here is just as a placeholder.
I'll clarify the comment in the code.
@WillAyd added uint64 to this parametrization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @jreback
I think a whatsnew for function support would be nice as well |
Only user-facing effect should be perf, which I haven't measured yet. Would prefer to do measurements in one thorough pass after the last |
@@ -378,7 +378,7 @@ def test_median_empty_bins(observed): | |||
|
|||
|
|||
@pytest.mark.parametrize( | |||
"dtype", ["int8", "int16", "int32", "int64", "float32", "float64"] | |||
"dtype", ["int8", "int16", "int32", "int64", "float32", "float64", "uint64"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe use any_real_type fixture here (can be followup)
thanks |
cc @WillAyd I hope you agree the runtime_error thing here is easier to implement/review with fused type than it would be with tempita.