ENH: Support na_position for sort_index and sortlevel #51672

Merged
merged 4 commits into from
Mar 8, 2023

Conversation

phofl
Member

@phofl phofl commented Feb 27, 2023

I think the special-casing in sortlevel was done for performance reasons. Avoiding the use of Categorical solves this: both branches were equally fast in my timings.

%timeit midx.sortlevel(level=[0, 1], ascending=[True, True])
58.8 ms ± 674 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit midx.sortlevel(level=[0, 1], ascending=True)
60.7 ms ± 278 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

@mroeschke mroeschke added this to the 2.1 milestone Mar 8, 2023
@mroeschke mroeschke merged commit 161f762 into pandas-dev:main Mar 8, 2023
@mroeschke
Member

Thanks @phofl

@phofl phofl deleted the 51612 branch March 8, 2023 00:31
level=None,
ascending: bool | list[bool] = True,
sort_remaining=None,
na_position: str_t = "first",
Member

@phofl - any reason to default to first here? With a quick grep I'm seeing we default to last everywhere else.
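For readers unfamiliar with the option under discussion, here is a minimal pure-Python sketch of what `na_position="first"` versus `"last"` means for a sort. The helper `sort_with_na` is hypothetical and is not how pandas implements sorting; it only illustrates that missing values are placed as a group at one end, independent of the `ascending` flag.

```python
import math


def sort_with_na(values, ascending=True, na_position="last"):
    """Hypothetical helper: sort values, grouping missing entries first or last."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")

    def is_na(v):
        # Treat None and float NaN as missing.
        return v is None or (isinstance(v, float) and math.isnan(v))

    present = sorted((v for v in values if not is_na(v)), reverse=not ascending)
    missing = [v for v in values if is_na(v)]
    # Missing values go to the requested end regardless of sort direction.
    return missing + present if na_position == "first" else present + missing


print(sort_with_na([2, None, 1]))                       # [1, 2, None]
print(sort_with_na([2, None, 1], na_position="first"))  # [None, 1, 2]
```

The question in this thread is only which of these two placements `sortlevel` should use when the caller does not pass `na_position`.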

Member Author

Pretty sure that this was done for backwards compatibility reasons. Does this break anything?

Member

Nope - makes sense. I'll experiment with changing the default and see what breaks; depending on that, I may put up an issue to deprecate it.

Member Author

No objections to deprecating. IIRC this is used internally a bunch, so deprecating might be tricky.

Member

Ah - I meant just deprecating the default and changing it to "last" to be consistent with other sorting methods. We can still specify "first" internally, so hopefully it will be straightforward unless I'm missing something.
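One common way to deprecate only a default value is a sentinel default plus a `FutureWarning`, sketched below in plain Python. This is an illustration under assumptions, not the change that was made in pandas: `_NO_DEFAULT` and `sortlevel_sketch` are hypothetical names, and the warning text is invented for the example.

```python
import warnings

_NO_DEFAULT = object()  # hypothetical sentinel meaning "caller passed nothing"


def sortlevel_sketch(values, na_position=_NO_DEFAULT):
    """Hypothetical function whose default na_position is being deprecated."""
    if na_position is _NO_DEFAULT:
        # Warn only when the caller relied on the default.
        warnings.warn(
            "The default na_position will change from 'first' to 'last' "
            "in a future version; pass na_position explicitly to silence this.",
            FutureWarning,
            stacklevel=2,
        )
        na_position = "first"  # keep the old behaviour for now
    present = sorted(v for v in values if v is not None)
    missing = [v for v in values if v is None]
    return missing + present if na_position == "first" else present + missing


# Internal callers pass the value explicitly and never trigger the warning.
print(sortlevel_sketch([2, None, 1], na_position="first"))  # [None, 1, 2]
```

Because explicit callers are unaffected, internal code that needs "first" keeps working unchanged while external users who rely on the default are warned.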

Member Author

Ah, got you. No, that makes sense.

@lukemanley lukemanley mentioned this pull request Oct 4, 2023
3 tasks
Development

Successfully merging this pull request may close these issues.

BUG: na_position ignored when sorting MultiIndex with level!=None
3 participants