Skip to content

ENH: Enable .mode to sort with NA values #60702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 13, 2025

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach commented Jan 12, 2025

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

We sort values in groupby with NAs using safe_sort, we can do the same for mode. This resolves an inconsistency between Python and PyArrow backed strings as PyArrow's mode can sort the NA values.

The other option here is to not sort the result in all cases. This would be my preference as users can sort the result if they like but they cannot unsort it. But it would be a breaking change.

cc @WillAyd @jorisvandenbossche @mroeschke

@rhshadrach rhshadrach added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Needs Discussion Requires discussion from core team before further action Enhancement labels Jan 12, 2025
@rhshadrach rhshadrach marked this pull request as draft January 12, 2025 15:14
@WillAyd WillAyd added this to the 2.3 milestone Jan 13, 2025
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@WillAyd
Copy link
Member

WillAyd commented Jan 13, 2025

lgtm as well. I see marked for 2.3 but should this not wait until 3?

@rhshadrach
Copy link
Member Author

This is fixing an inconsistency between str dtypes, which is why I think we'd want it in 2.3. I am comfortable calling this inconsistency a bug. Do you think it is too breaking of a change?

@WillAyd
Copy link
Member

WillAyd commented Jan 13, 2025

Ah OK - that makes sense for 2.3 then. Can you take this out of draft mode?

@WillAyd WillAyd marked this pull request as ready for review January 13, 2025 22:27
@WillAyd WillAyd merged commit 1708e90 into pandas-dev:main Jan 13, 2025
61 of 63 checks passed
Copy link

lumberbot-app bot commented Jan 13, 2025

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout 2.3.x
git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 1708e9020c418e91fae430cf6a7a6ec09c466429
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am 'Backport PR #60702: ENH: Enable .mode to sort with NA values'
  1. Push to a named branch:
git push YOURFORK 2.3.x:auto-backport-of-pr-60702-on-2.3.x
  1. Create a PR against branch 2.3.x, I would have named this PR:

"Backport PR #60702 on branch 2.3.x (ENH: Enable .mode to sort with NA values)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

@rhshadrach
Copy link
Member Author

I put it in draft mode because it wasn't ready to merge - needs a what's new. No worries at all here, can do a followup, but just as a general ways of working can I request not having others take my PRs out of draft.

@WillAyd
Copy link
Member

WillAyd commented Jan 13, 2025

Sure no problem

@jorisvandenbossche
Copy link
Member

Manual backport -> #60754

jorisvandenbossche added a commit that referenced this pull request Jan 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants