-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fixed HDFSTore.groups() performance. #21543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed HDFSTore.groups() performance. #21543
Conversation
…rather every group.
I'm not sure if this deserves it's own 'whatsnew' entry... it is pretty simple. |
I believe we have an asv for this, can you run. and yes this does need a whatsnew, you can put in 0.23.2 perf. |
This is my first pull request for pandas. What is "asv"? I've updated the whatsnew. |
Codecov Report
@@ Coverage Diff @@
## master #21543 +/- ##
=======================================
Coverage 91.92% 91.92%
=======================================
Files 153 153
Lines 49563 49563
=======================================
Hits 45559 45559
Misses 4004 4004
Continue to review full report at Codecov.
|
Thanks. I ran the
Which is obviously an issue. Because this doesn't make sense (some of those tests don't even appear to be touching the part of the code that I changed...), I ran it again:
And I get what looks to be a completely different set of tests that are different. A third time gives me:
Which is at least a little better. I think this test is fairly sensitive to what else is going on on my computer. In order to get a more accurate result, I think I'll need to clean boot and run the tests without anything else running. |
yeah there don't appear to be any benchmarks for this (except for |
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -27,7 +27,9 @@ Performance Improvements | |||
- Improved performance of membership checks in :class:`CategoricalIndex` | |||
(i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains` | |||
is likewise much faster (:issue:`21369`) | |||
- | |||
- Improved performance of :meth:`HDFStore.groups` (and dependent functions like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need to say dependent functions, you can just list .keys
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a number of places that this hits (.keys, .info, .iterate, len, items, and those that call those functions.). Should I ignore them, list the ones that I know of, or just drop the "dependent functions" part and keep it simple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on 2nd thought this is ok
This particular performance issue wasn't really a problem till I hit a couple hundred (thousand?) keys in a hdf file, so I don't know if we want to create a dataset to test this. |
I rebased, so ping on green. |
doc/source/whatsnew/v0.23.2.txt
Outdated
@@ -27,7 +27,9 @@ Performance Improvements | |||
- Improved performance of membership checks in :class:`CategoricalIndex` | |||
(i.e. ``x in ci``-style checks are much faster). :meth:`CategoricalIndex.contains` | |||
is likewise much faster (:issue:`21369`) | |||
- | |||
- Improved performance of :meth:`HDFStore.groups` (and dependent functions like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on 2nd thought this is ok
thanks @spott |
No longer walks every node, but rather every group.
git diff upstream/master -u -- "*.py" | flake8 --diff