Rank categorical perf #15518
Codecov Report
@@            Coverage Diff             @@
##           master   #15518      +/-   ##
==========================================
- Coverage   91.06%   90.36%   -0.71%
==========================================
  Files         136      136
  Lines       49099    49556     +457
==========================================
+ Hits        44714    44783      +69
- Misses       4385     4773     +388
Continue to review full report at Codecov.
Can you add an asv benchmark similar to the one from the issue (ideally testing multiple dtypes as well)?
A whatsnew entry (in the Performance section) would also be great. Code LGTM.
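A benchmark along the lines requested here could look roughly like the sketch below. The class name, data sizes, and parameter values are assumptions made for illustration, not the benchmark that was eventually added to the repository. It relies only on the usual asv conventions: `params`, `param_names`, a `setup` method, and methods prefixed with `time_`.

```python
import numpy as np
import pandas as pd


class RankCategoricalSketch:
    """Hypothetical asv-style benchmark; names and sizes are illustrative."""

    # asv runs every time_* method once per parameter value.
    params = ["str", "int"]
    param_names = ["dtype"]

    def setup(self, dtype):
        n, ncats = 10_000, 100
        rng = np.random.default_rng(0)
        raw = pd.Series(rng.integers(0, ncats, size=n))
        if dtype == "str":
            raw = raw.astype(str)
        self.plain = raw
        self.unordered = raw.astype("category")
        self.ordered = self.unordered.cat.as_ordered()

    def time_rank_plain(self, dtype):
        self.plain.rank()

    def time_rank_unordered(self, dtype):
        self.unordered.rank()

    def time_rank_ordered(self, dtype):
        self.ordered.rank()
```

Timing the plain, unordered, and ordered variants side by side makes it easier to see whether the categorical path is slower than ranking the underlying values.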
Looks good to me!
[1, 2, 3, 4, 5, 6],
).astype('category').cat.set_categories(
    [1, 2, 3, 4, 5, 6],
    ordered=False
You can simplify this creation by using astype('category', categories=[1, 2, 3, 4, 5, 6], ordered=False).
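For readers on newer pandas releases: passing `categories` and `ordered` as keywords to `astype` was deprecated in pandas 0.21, so the one-step form is sketched below with an explicit `CategoricalDtype` instead. The variable names are illustrative, and the two-step form mirrors the snippet under review.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

values = [1, 2, 3, 4, 5, 6]

# Two-step form: convert to category, then set the categories explicitly.
two_step = (
    pd.Series(values)
    .astype("category")
    .cat.set_categories(values, ordered=False)
)

# One-step form: describe categories and ordering in the dtype itself.
dtype = CategoricalDtype(categories=values, ordered=False)
one_step = pd.Series(values).astype(dtype)
```

Both Series end up with the same unordered categorical dtype, so the shorter form can replace the longer one in the test setup.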
pandas/core/categorical.py
Outdated
    values = np.array(self)
else:
    values = np.array(
        self.rename_categories(Series(self.categories).rank())
Can you add a small comment explaining why this is done here?
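As a rough illustration of what the snippet under review appears to do: for unordered, non-numeric categories, each category is replaced by its rank among the categories, so the values become a float array and the final ranking no longer has to compare Python objects. The helper name `values_for_rank_sketch` and the sample data are assumptions for this sketch, not the code that was merged.

```python
import numpy as np
import pandas as pd


def values_for_rank_sketch(cat: pd.Categorical) -> np.ndarray:
    """Hypothetical helper: return values that are cheap to rank."""
    if pd.api.types.is_numeric_dtype(cat.categories.dtype):
        # Numeric categories can be ranked directly.
        return np.asarray(cat)
    # Rank the (few) categories once, then substitute each category
    # with its rank; missing values come out as NaN.
    category_ranks = pd.Series(cat.categories).rank().to_numpy()
    return np.asarray(cat.rename_categories(category_ranks))


cat = pd.Categorical(["b", "a", None, "c", "a"], categories=["a", "b", "c"])
values = values_for_rank_sketch(cat)

# The float array preserves the ordering of the original labels,
# so ranking it gives the ranks of the labels themselves.
ranks = pd.Series(values).rank()
```

The expensive object comparison is limited to the categories, of which there are usually far fewer than values.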
I have made all the minor changes but can't seem to run the asv benchmarks. I tried running them on upstream/master as well, but they fail with the following error:
TypeError: unbound methods must have non-NULL im_class
I used the following command to run the asv benchmarks:
asv continuous -f 1.1 -E virtualenv upstream/master HEAD -b ^categorical
Googling for a solution hasn't helped. If you can point me towards a solution, I will run the benchmarks and push my changes. Here's a complete trace
@ikilledthecat Can you push your changes anyway (regardless of whether you get asv running), so we can already review? Getting the asv benchmarks running is not a blocker for merging this PR in this case. Regarding asv, I don't have experience running it with virtualenv; I have only used it with conda and never saw such an error.
@ikilledthecat Something went wrong with your rebase, as many other commits are now included as well. Doing
should normally always do the right thing.
no need to rename categories where they are already ordered
check for numeric instead of monotonic
no need to rename categories where they are already ordered
2e790d2 to ad38544
@jorisvandenbossche Sorry, I pushed after re-merging upstream/master.
You'll want to rebase on master, as Travis was having some issues.
Thanks @ikilledthecat
closes pandas-dev#15498
Author: Prasanjit Prakash <[email protected]>
Closes pandas-dev#15518 from ikilledthecat/rank_categorical_perf and squashes the following commits:
30b49b9 [Prasanjit Prakash] PERF: GH15498 - pep8 changes
ad38544 [Prasanjit Prakash] PERF: GH15498 - asv tests and whatsnew
1ebdb56 [Prasanjit Prakash] PERF: categorical rank GH#15498
a67cd85 [Prasanjit Prakash] PERF: categorical rank GH#15498
81df7df [Prasanjit Prakash] PERF: categorical rank GH#15498
45dd125 [Prasanjit Prakash] PERF: categorical rank GH#15498
33249b3 [Prasanjit Prakash] PERF: categorical rank GH#15498
git diff upstream/master | flake8 --diff