PERF: remove categories #51344

Merged on Feb 13, 2023 (2 commits)

lukemanley
Member

cc @phofl, per #50857 (comment)

I don't think we should revert #50857, as that fixed a few bugs. I think this gets most of the perf back, though.

No whatsnew as this was a slowdown on main only.

from asv_bench.benchmarks.categoricals import RemoveCategories

b = RemoveCategories()
b.setup()

%timeit b.time_remove_categories()

# 66.4 ms ± 1.92 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- main
# 45.2 ms ± 1.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  <- PR
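
For anyone who wants to reproduce the measurement without the asv harness or IPython's `%timeit`, here is a minimal standalone sketch. The data size, the string format of the values, and the choice to remove every other category are assumptions made for illustration; they are not taken from the benchmark class itself, so absolute timings will differ from the numbers above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup (not the asv benchmark's own): a categorical Series of
# strings with a few thousand distinct categories.
rng = np.random.default_rng(0)
n = 50_000
codes = rng.integers(0, n // 10, size=n)
ser = pd.Series([f"s{i:04d}" for i in codes]).astype("category")

# Remove every other category, so many values become missing.
to_remove = ser.cat.categories[::2]
result = ser.cat.remove_categories(to_remove)

# Time a handful of calls and report the mean per call.
repeats = 5
elapsed = timeit.timeit(lambda: ser.cat.remove_categories(to_remove), number=repeats)
print(f"{elapsed / repeats * 1e3:.2f} ms per call")
```

Running the same script on `main` and on the PR branch should show the relative difference, even though the absolute values depend on the machine and the assumed data.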

@lukemanley added the Performance (memory or execution speed performance) and Categorical (categorical data type) labels on Feb 12, 2023
  if not is_list_like(removals):
      removals = [removals]

- removals = {x for x in set(removals) if notna(x)}
+ removals = Index(removals).dropna().unique()
Member

Wouldn't the other way round be faster, e.g. unique().dropna()?

Member Author

Good point. It's not measurable with that benchmark, but I swapped it anyway.
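
To illustrate why the swap is safe, the sketch below checks that both orderings, and the set comprehension they replace, select the same values. The sample `removals` list is made up for this example. Calling `unique()` first means `dropna()` runs on an already deduplicated index, which should in principle be less work when the input has many repeats, though as noted above the difference was not measurable in the benchmark.

```python
import numpy as np
import pandas as pd

# Hypothetical input with duplicates and two kinds of missing values.
removals = ["a", "b", np.nan, "a", None, "b", "c"]

# Approach used before this PR: a set comprehension that filters missing values.
old_style = {x for x in set(removals) if pd.notna(x)}

# First version in this PR: drop missing values, then deduplicate.
dropna_first = pd.Index(removals).dropna().unique()

# Suggested ordering: deduplicate, then drop missing values.
unique_first = pd.Index(removals).unique().dropna()

# All three select the same values; the two Index versions also agree on order.
assert set(dropna_first) == set(unique_first) == old_style
assert list(dropna_first) == list(unique_first)
```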

@phofl added this to the 2.0 milestone on Feb 13, 2023
@phofl merged commit 89b510c into pandas-dev:main on Feb 13, 2023
phofl commented Feb 13, 2023

thx @lukemanley for following up so quickly

phofl commented Feb 14, 2023

looks like we got it back to the original performance, thx

@lukemanley deleted the perf-remove-categories branch on February 23, 2023