PERF: uses bincount instead of hash table in categorical value counts #10874
I think the solution in the issue is faster than this, no?
I get
But this does not check for nulls, and the index is not categorical.
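A minimal sketch of the idea in the PR title, assuming pandas-style categorical codes where `-1` marks a missing value: `np.bincount` counts each integer code directly, so no hash table is needed, and nulls are filtered out (or counted separately) before the call. The function name `categorical_value_counts` is hypothetical; this is not pandas' internal implementation.

```python
import numpy as np


def categorical_value_counts(codes, n_categories, dropna=True):
    """Count each category from integer codes (hypothetical helper).

    Assumes codes 0..n_categories-1 for valid values and -1 for missing.
    Returns one count per category, plus a trailing NaN count if dropna=False.
    """
    codes = np.asarray(codes, dtype=np.intp)
    mask = codes != -1  # the null check: bincount rejects negative values
    # minlength guarantees a slot for categories that never occur
    counts = np.bincount(codes[mask], minlength=n_categories)
    if dropna:
        return counts
    n_missing = int(len(codes) - mask.sum())
    return np.append(counts, n_missing)


print(categorical_value_counts([0, 2, 2, -1, 1, 2, -1], 4))
print(categorical_value_counts([0, 2, 2, -1, 1, 2, -1], 4, dropna=False))
```

Because the counts come back already ordered by category code, the result can be indexed by the categories themselves, which keeps the index categorical.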
Updated from 24f1e3f to 436e96e.
Wow, this does even better!
Ping when green.
Maybe worth adding a benchmark?
I will add a benchmark later today.
Updated from 436e96e to 855b804.
Added the benchmark; all green.
    def time_value_counts(self):
        self.ts.value_counts(dropna=True)
        self.ts.value_counts(dropna=False)
These should have only one action per timing function (so make two functions).
Why should it be only one action?
You get a timing per function. So if you want to track performance of both with dropna True and False, it has to be in two functions.
added separate calls
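A sketch of what "one action per timing function" looks like, assuming an asv-style benchmark class: asv calls `setup()` and then times each `time_*` method on its own, so splitting the two `value_counts` calls yields a separate timing for `dropna=True` and `dropna=False`. The class name, method names, and the data (100,000 values over 10 categories, every 100th value missing) are hypothetical, not the benchmark actually added in the PR.

```python
import numpy as np
import pandas as pd


class CategoricalValueCounts:
    # Hypothetical asv-style benchmark: asv runs setup(), then times every
    # method whose name starts with "time_" as its own benchmark.
    def setup(self):
        # Assumed data: 100,000 values from 10 categories, 1% missing.
        rng = np.random.default_rng(0)
        values = rng.choice(list("abcdefghij"), size=100_000).astype(object)
        values[::100] = np.nan
        self.ts = pd.Series(values).astype("category")

    def time_value_counts_dropna(self):
        # Timed on its own: nulls excluded from the counts.
        self.ts.value_counts(dropna=True)

    def time_value_counts_keepna(self):
        # Timed on its own: nulls reported as an extra count.
        self.ts.value_counts(dropna=False)
```

Keeping both cases in one function would report only their combined time, hiding a regression in either path.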
Updated from 855b804 to c5a47e3.
thank you sir!
closes #10804