ENH: Add skipna to groupby.first and groupby.last #57102

rhshadrach · 2024-01-27T04:49:08Z

closes ENH: Add skipna to groupby.first and groupby.last #57019
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

There is a case for putting this in 2.2.1: the ability to aggregate the first element of every group, including NA values, was removed when the behavior of nth changed in pandas 2.0.0. It would also be okay to put this in 3.0.

rhshadrach · 2024-01-27T17:49:47Z

pandas/tests/groupby/test_reductions.py

+    if is_extension_array_dtype(any_real_nullable_dtype):
+        na_value = Series(dtype=any_real_nullable_dtype).dtype.na_value
+    else:
+        na_value = np.nan


Not sure if there is any better way to get the NA value for a dtype when the code needs to span NumPy and EAs (both masked and pyarrow). This is the reason why I went with the string aliases in the any_real_nullable_dtype fixture for pyarrow; this is at odds with some of the other fixtures, but the code to get the NA value was much worse with pyarrow dtype objects.

cc @mroeschke if you have any suggestions

You can use pandas_dtype to get a dtype object from the string and na_value_for_dtype to get the NA value from the dtype object.

Also, I think we've been using isinstance(..., ExtensionDtype) instead of is_extension_array_dtype where possible.
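A rough sketch of that suggestion, not the code that landed in the PR. It assumes `na_value_for_dtype` is imported from the internal module `pandas.core.dtypes.missing`, which is not public API and may move. The helper name `na_value_for` is hypothetical. Only float dtypes are shown, and pyarrow aliases are left out so the snippet runs without pyarrow installed.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionDtype
from pandas.api.types import pandas_dtype

# Internal import (assumption: location may change between pandas versions).
from pandas.core.dtypes.missing import na_value_for_dtype


def na_value_for(alias):
    """Hypothetical helper: resolve a dtype alias and return its NA value."""
    dtype = pandas_dtype(alias)  # string alias -> dtype object
    return na_value_for_dtype(dtype)


masked_na = na_value_for("Float64")  # masked extension dtype
numpy_na = na_value_for("float64")   # plain NumPy dtype

# isinstance check on the dtype object instead of is_extension_array_dtype.
masked_is_ea = isinstance(pandas_dtype("Float64"), ExtensionDtype)
numpy_is_ea = isinstance(pandas_dtype("float64"), ExtensionDtype)

print(masked_na, numpy_na, masked_is_ea, numpy_is_ea)
```

Going through `pandas_dtype` first means the same helper accepts either a string alias or a dtype object, which is why the string aliases in the fixture remain workable.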

You can use pandas_dtype to get a dtype object from the string

Makes sense - but I'd still need to have ALL_REAL_NULLABLE_DTYPES contain the string alias for pyarrow dtypes whereas most other lists of dtypes in _testing.__init__ use the pyarrow dtype objects. I just don't want to introduce an inconsistency here (pyarrow dtype objects vs string alias) if it's avoidable.

I just don't want to introduce an inconsistency here (pyarrow dtype objects vs string alias) if it's avoidable.

Yeah, I don't think it's avoidable as of now, so I'm okay with the way you have it in this PR.

mroeschke · 2024-01-30T02:14:32Z

Thanks @rhshadrach

ENH: Add skipna to groupby.first and groupby.last

54830d9

rhshadrach added the Enhancement, Groupby, Regression (functionality that used to work in a prior pandas version), and Reduction Operations (sum, mean, min, max, etc.) labels Jan 27, 2024

rhshadrach added this to the 2.2.1 milestone Jan 27, 2024

rhshadrach requested a review from WillAyd as a code owner January 27, 2024 04:49

rhshadrach added 3 commits January 27, 2024 12:35

resample & tests

b12541b

Improve test

207be12

Fixups

b207337

rhshadrach commented Jan 27, 2024

rhshadrach added 3 commits January 28, 2024 09:17

fixup test

a634d44

Rework na_value determination

b3bd9bb

Merge remote-tracking branch 'upstream/main' into enh_groupby_first_skipna

a796dfd

rhshadrach requested a review from mroeschke January 30, 2024 02:09

mroeschke approved these changes Jan 30, 2024

mroeschke merged commit ab3d4bf into pandas-dev:main Jan 30, 2024

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jan 30, 2024

Backport PR pandas-dev#57102: ENH: Add skipna to groupby.first and groupby.last

f1beec4

meeseeksmachine mentioned this pull request Jan 30, 2024

Backport PR #57102 on branch 2.2.x (ENH: Add skipna to groupby.first and groupby.last) #57141

Merged

rhshadrach deleted the enh_groupby_first_skipna branch January 30, 2024 03:14

mroeschke mentioned this pull request Feb 26, 2024

pandas 2.2.1 introduce API changes, for example for groupby().first, groupby().last and series.argsort #57631

Closed
