Performance of get_loc on non-unique MultiIndex #19464
8b777cf to 804af26
Codecov Report
@@            Coverage Diff             @@
##           master   #19464      +/-   ##
==========================================
- Coverage   92.06%   91.82%    -0.24%
==========================================
  Files         169      153      -16
  Lines       50694    49493    -1201
==========================================
- Hits        46671    45448    -1223
- Misses       4023     4045      +22
can you add any asvs that are needed and show the results? (or, if there is already sufficient coverage, just show the results)
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -381,6 +381,7 @@ Performance Improvements
 - :func:`Series` / :func:`DataFrame` tab completion limits to 100 values, for better performance. (:issue:`18587`)
 - Improved performance of :func:`DataFrame.median` with ``axis=1`` when bottleneck is not installed (:issue:`16468`)
 - Improved performance of :func:`MultiIndex.get_loc` for large indexes, at the cost of a reduction in performance for small ones (:issue:`18519`)
+- Improved performance of :func:`MultiIndex.get_loc` for non-unique indexes, which as a consequence does not emit a ``PerformanceWarning`` any more
can you add the issue number (this PR number if no issue).
rebase when you can and pls show perf
Unfortunately I found out that performance drops dramatically in some cases... until I'm able to fix and merge #19539.
keep open then?
definitely... I think they are both significant improvements, I just need to understand what's wrong with 2.7 on Windows
804af26 to cd7fd6b
Rebased, and #19539 did improve things significantly... but results are still not what one expects from a performance fix!
I will have to investigate more.
can you rebase
cd7fd6b to b72f9c5
closing as stale. if you'd like to continue pls ping.
git diff upstream/master -u -- "*.py" | flake8 --diff
The second commit is related only in the sense that the docs mentioned a `PerformanceWarning` due to (non-)sorting, while it was actually due to (non-)uniqueness.
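
For reference, a minimal sketch of the kind of asv-style benchmark asked for in the review could look like the following. It is not taken from the pandas benchmark suite: the class name `NonUniqueMultiIndexGetLoc`, the index size and the key are all assumptions chosen for illustration, and the timing will vary by pandas version and platform.

```python
import timeit

import numpy as np
import pandas as pd


class NonUniqueMultiIndexGetLoc:
    """asv-style benchmark; the name is hypothetical, not from the pandas suite."""

    def setup(self):
        n = 1000
        # Every (first, second) pair appears exactly twice, so the index is
        # lexicographically sorted but non-unique.
        first = np.repeat(np.arange(n), 4)
        second = np.tile(np.array([0, 0, 1, 1]), n)
        self.mi = pd.MultiIndex.from_arrays([first, second])
        self.key = (n // 2, 1)

    def time_get_loc(self):
        # asv times methods whose names start with ``time_``.
        self.mi.get_loc(self.key)


bench = NonUniqueMultiIndexGetLoc()
bench.setup()

# The return type of get_loc on a non-unique index (slice or mask) depends on
# the pandas version, so only the selected entries are inspected here.
loc = bench.mi.get_loc(bench.key)
matches = bench.mi[loc]

elapsed = timeit.timeit(bench.time_get_loc, number=100)
print(f"is_unique={bench.mi.is_unique}, matches={len(matches)}, "
      f"100 calls took {elapsed:.4f}s")
```

Running the same script on `upstream/master` and on the PR branch gives a rough before/after comparison; within the pandas repository, `asv continuous` is the usual way to produce such numbers.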