Skip to content

ENH: Add skipna to groupby.first and groupby.last #57102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 30, 2024

Conversation

rhshadrach
Copy link
Member

There is a case to put this in 2.2.1 as the ability to agg the first element from every group including NA values was removed by changing the behavior of nth in pandas 2.0.0. But also okay to put this in 3.0.

@rhshadrach rhshadrach added Enhancement Groupby Regression Functionality that used to work in a prior pandas version Reduction Operations sum, mean, min, max, etc. labels Jan 27, 2024
@rhshadrach rhshadrach added this to the 2.2.1 milestone Jan 27, 2024
@rhshadrach rhshadrach requested a review from WillAyd as a code owner January 27, 2024 04:49
Comment on lines 397 to 400
if is_extension_array_dtype(any_real_nullable_dtype):
na_value = Series(dtype=any_real_nullable_dtype).dtype.na_value
else:
na_value = np.nan
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if there is any better way to get the NA value for a dtype when the code needs to span NumPy and EAs (both masked and pyarrow). This is the reason why I went with the string aliases in the any_real_nullable_dtype fixture for pyarrow; this is at odds with some of the other fixtures but the code to the NA value was much worse with pyarrow dtype objects.

cc @mroeschke if you have any suggestions

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use can use pandas_dtype to get a dtype object from the string and na_value_for_dtype to get the na value from the dtype object

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I think we've been using isinstance(..., ExtensionDtype) instead of is_extension_array_dtype if possible

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use can use pandas_dtype to get a dtype object from the string

Makes sense - but I'd still need to have ALL_REAL_NULLABLE_DTYPES contain the string alias for pyarrow dtypes whereas most other lists of dtypes in _testing.__init__ use the pyarrow dtype objects. I just don't want to introduce an inconsistency here (pyarrow dtype objects vs string alias) if it's avoidable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just don't want to introduce an inconsistency here (pyarrow dtype objects vs string alias) if it's avoidable.

Yeah I don't think it's avoidable as of now, so I'm okay the way you have it in this PR

@rhshadrach rhshadrach requested a review from mroeschke January 30, 2024 02:09
@mroeschke mroeschke merged commit ab3d4bf into pandas-dev:main Jan 30, 2024
@mroeschke
Copy link
Member

Thanks @rhshadrach

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 30, 2024
@rhshadrach rhshadrach deleted the enh_groupby_first_skipna branch January 30, 2024 03:14
mroeschke pushed a commit that referenced this pull request Jan 30, 2024
…and groupby.last) (#57141)

Backport PR #57102: ENH: Add skipna to groupby.first and groupby.last

Co-authored-by: Richard Shadrach <[email protected]>
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
* ENH: Add skipna to groupby.first and groupby.last

* resample & tests

* Improve test

* Fixups

* fixup test

* Rework na_value determination
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Groupby Reduction Operations sum, mean, min, max, etc. Regression Functionality that used to work in a prior pandas version
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Add skipna to groupby.first and groupby.last
2 participants