-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REF: avoid try/except in wrapping in cython_agg_blocks #38164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
return result | ||
|
||
def py_fallback(bvalues: ArrayLike) -> ArrayLike: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a general note, inline functions are really hard to follow (of course in-line really lone if/else are even harder).....request would be to move this code to module level functions (later is totally fine)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah, there are a few follow-ups in the works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(later is totally fine)
yah lets do it in a follow-up rather than clog the CI
…f-cython_agg_blocks
…f-cython_agg_blocks
…f-cython_agg_blocks
looks fine, ping on green. |
@jreback green |
# Cast back if feasible | ||
result = type(values)._from_sequence( | ||
result.ravel(), dtype=values.dtype | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this no longer needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because its only relevant for Categorical, and never raises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the comment that values could be IntegerArray wasn't correct anymore?
And why is cast_agg_result
no longer be called with other extension arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IntegerArray is now handled correctly within _cython_operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see, thanks for the explanation
But so that means that for a generic EA (not implemented by ourselves), we won't try to cast back? (not sure if there are good use cases for that, but still)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how we can avoid that in general.
I'm not clear what this is referring to.
That's potentially a regression? What's the reason for this? (now, I don't know how much of the groupby algos support object dtype, or would actually be faster than the python versions if they would)
We fall back to the SeriesGroupBy implementation so there shouldn't be a regression.
Many of these are ordinal-only, so using something like values_for_argsort is likely to be better than casting to object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to your "i am trying to get rid of that altogether"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was referring to your "i am trying to get rid of that altogether"
Thanks.
For the functions that go through _cython_operation, there are only a handful of them and we can identify which are dtype-preserving and which are not. Some custom logic is in dtypes.cast.maybe_cast_result_dtype
, and the rest at this point is in _ea_wrap_cython_operation
. Basically anything that is "custom logic" represents a way we are treating pandas-internal EAs different from 3rd-party, which is something we want to avoid. Identifying and isolating that custom logic will make it easier to de-custom.
The only other place where the "try to cast back" that i am looking to get rid of altogether is used is in _aggregate_series_pure_python
, and #38254 removes that (buggy) usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want no custom handing of our own EAs at all?
For example what you wrote now in _ea_wrap_cython_operation
, are this things you want to further remove?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if we want no custom handing of our own EAs at all?
I mean, in the sense that we also want world peace and unlimited ice cream, yes. For the foreseeable future we're going to have some non-zero amount of custom handling.
For example what you wrote now in _ea_wrap_cython_operation, are this things you want to further remove?
In the short run I'm adding handling for FloatingArray and improving the handling for IntegerArray, so not removing at all.
No description provided.