-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Add skipna to groupby.first and groupby.last #57102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Add skipna to groupby.first and groupby.last #57102
Conversation
if is_extension_array_dtype(any_real_nullable_dtype): | ||
na_value = Series(dtype=any_real_nullable_dtype).dtype.na_value | ||
else: | ||
na_value = np.nan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if there is any better way to get the NA value for a dtype when the code needs to span NumPy and EAs (both masked and pyarrow). This is the reason why I went with the string aliases in the any_real_nullable_dtype
fixture for pyarrow; this is at odds with some of the other fixtures but the code to the NA value was much worse with pyarrow dtype objects.
cc @mroeschke if you have any suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use can use pandas_dtype
to get a dtype object from the string and na_value_for_dtype
to get the na value from the dtype object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think we've been using isinstance(..., ExtensionDtype)
instead of is_extension_array_dtype
if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use can use pandas_dtype to get a dtype object from the string
Makes sense - but I'd still need to have ALL_REAL_NULLABLE_DTYPES
contain the string alias for pyarrow dtypes whereas most other lists of dtypes in _testing.__init__
use the pyarrow dtype objects. I just don't want to introduce an inconsistency here (pyarrow dtype objects vs string alias) if it's avoidable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't want to introduce an inconsistency here (pyarrow dtype objects vs string alias) if it's avoidable.
Yeah I don't think it's avoidable as of now, so I'm okay the way you have it in this PR
Thanks @rhshadrach |
…and groupby.last) (#57141) Backport PR #57102: ENH: Add skipna to groupby.first and groupby.last Co-authored-by: Richard Shadrach <[email protected]>
* ENH: Add skipna to groupby.first and groupby.last * resample & tests * Improve test * Fixups * fixup test * Rework na_value determination
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.There is a case to put this in 2.2.1 as the ability to agg the first element from every group including NA values was removed by changing the behavior of nth in pandas 2.0.0. But also okay to put this in 3.0.