Make pd.read_hdf('data.h5') work when pandas object stored contained categorical columns #13359
…ork when storing a dataframe that contains categorical data.
Current coverage is 84.23%
@@            master   #13359   diff @@
==========================================
  Files        138      138
  Lines      50724    50736    +12
  Methods        0        0
  Messages       0        0
  Branches       0        0
==========================================
+ Hits       42726    42737    +11
- Misses      7998     7999     +1
  Partials       0        0
@@ -4877,13 +4877,26 @@ def test_read_nokey(self):
        df = DataFrame(np.random.rand(4, 5),
                       index=list('abcd'),
                       columns=list('ABCDE'))
        # Categorical dtype not supported for "fixed" format. So no need
blank line
Looks good. I think there is a failing case that needs to be tested.
key = keys[0]
groups = store.groups()
if len(groups) == 0:
    raise ValueError('No dataset in HDF file.')
Not sure that ValueError is the right exception here, but at least it's the same type as the exception raised by 0.18.1. (Although the message for 0.18.1 in this case is "key must be provided when HDF file contains multiple datasets.", which is a bit confusing.) And by the way, the exception raised when trying to do pd.read_hdf('empty.h5', 'some_key') is (sensibly) "KeyError: 'No object named some_key in the file'". But raising KeyError for the case where key=None and we are trying to automatically figure out the (single, valid) key in the file seems wrong to me. Let me know if you can think of a better exception (or set of exceptions) for this.
I think this is ok. The user is explicitly trying to do something which doesn't work. It's the same type of error as when you have multiple keys, so it's consistent (and ok).
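For reference, the behaviour being agreed on here could be sketched roughly as below. This is only an illustration, not the code in this PR: infer_single_key is a hypothetical helper name, and it takes a plain list of key strings rather than an HDFStore. The two messages are the ones quoted in this thread.

```python
def infer_single_key(keys):
    """Return the only key in `keys`, or raise ValueError if there is no
    unambiguous choice.

    Hypothetical helper for illustration; `keys` is assumed to be a plain
    list of dataset names, not a real HDFStore.
    """
    if len(keys) == 0:
        # Empty file: nothing to read, and no key was given.
        raise ValueError('No dataset in HDF file.')
    if len(keys) > 1:
        # Several datasets: same exception type, so callers can catch
        # ValueError for every "could not infer the key" case.
        raise ValueError('key must be provided when HDF file '
                         'contains multiple datasets.')
    return keys[0]
```

With this shape, a caller who omits the key only has to handle ValueError, while KeyError stays reserved for the case where an explicitly named key is missing.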
Thanks @chrish42
git diff upstream/master | flake8 --diff