-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
HDFStore: Fix empty result of keys() method on non-pandas hdf5 file #32401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
roberthdevries
wants to merge
12
commits into
pandas-dev:master
from
roberthdevries:fix-empty-hdfstore-keys
Closed
Changes from all commits
Commits
Show all changes
12 commits
Select commit
Hold shift + click to select a range
b701c5b
HDFStore: Fix empty result of keys() method on non-pandas hdf5 file
roberthdevries bde4c96
Flake8 fixes
roberthdevries e1ce3d5
black reformatter
roberthdevries d3f4f04
Add whatsnew entry
roberthdevries ce4dba1
Correct the order of the keywords in the docstring
roberthdevries 27aac41
Remove unused variables in the test code
roberthdevries 25fa1e4
Minor cleanups
roberthdevries eabed52
Make test code for the non-pandas case a bit more explicit
roberthdevries 76bd80b
Simplify interface by using a fallback scenario
roberthdevries 54d43b1
Update the whatsnew entry to reflect the changed behavior
roberthdevries 9e44d2d
Forgot to remove commented out line of code
roberthdevries a903c6e
Remove redundant check and make mypy happy
roberthdevries File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not a good api.
either:
a) always return all tables (api breaking), or
b) add a keyword include='pandas'|'native'|'all', default would be 'pandas'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note I am not the PR author but reported the issue.
It really depends on what you want to achieve, IMHO better compatibility would be achieved with option (a) which basically reverts #21543 but for
keys()
and leavinggroups()
untouched. Don't know if that breaks anything else, but it might hurt performance in some cases although the original bug #21372 complained about the performance ofgroups()
(it links to #17593 about slowkeys()
). There is also awalk()
method which is very similar togroups()
.(b) would break backwards compatibility of all (future) projects that use that keyword wrt
pandas < 1.0.x
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The latest version is trying to keep some backwards compatibility with pandas 0.23 (plus or minus a minor revision). Apparently there are some tools that depend on the original behavior (hence the issue which this PR is trying to fix). So either we keep breaking backward compatibility or we add a keyword.
I think this is a matter of taste/policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I think this is a trade-off between having backwards compatible behavior with v0.23.x vs. a clean API.
I see one vote from @st-bender for being backwards compatible, and I see one vote from @jreback for the clean API
Note sure whose vote has more weight (:smile: but I can make a gues), but I have no strong feelings either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @roberthdevries for pushing this forward and @jreback thanks for the feedback.
As I see it there is no fully compatible way to change that behaviour back to <=0.23.x, and apparently I am the only one that noticed that when
dask.dataframe.read_hdf()
suddenly failed to read some files which worked before (dask/dask#5934).It seems that
dask.dataframe
is currently the only serious "user" of that interface, and I filed dask/dask#5934 to decide whether it is better handled withindask
instead ofpandas
. The change there would be similarly simple and would probably be more backwards compatible with older pandas versions.