-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: invalid column names in a HDF5 table format #9057 #10098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -257,6 +257,30 @@ def _tables(): | |||
def to_hdf(path_or_buf, key, value, mode=None, complevel=None, complib=None, | |||
append=None, **kwargs): | |||
""" store this object, close it if we opened it """ | |||
|
|||
# PyTables has some limitations that we need to check for | |||
if kwargs.get('format', None) == 'table': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the right place for this. Should go here: https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py#L1546
Just add on logic to validate. This is an IndexCol
at this point, so should be straightforward.
I'm definitely open to moving this logic to the most appropriate place, and will rely on your judgement as to where that is. As far as I can tell, though, the code path through The calls to validate happen after table_creation (i.e. where the original error comes from). For example:
Should there be a separate validation call somewhere higher in write()? Or am I missing something obvious here? |
Try something like this
|
Hmm, yeah, that works. Can I ask, what's the expected behaviour when a user tries to use unicode column name? Is this test actually testing for the right behaviour or is it irrelevant? |
e.g.
All that said, I What we are actually asking here is if a non-ascii encoded string will work. |
…e more comprehensive
@jreback Thanks for the suggestions. I used your |
|
||
for index in types_should_run: | ||
df = DataFrame(np.random.randn(10, 2), columns=index(2)) | ||
with ensure_clean_path(self.path) as path: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for the ones that should run, I would then select out say the first element and make sure you are getting a non-zero length frame, e.g.
result = pd.read_hdf(path,'df',where="index = [{0}]".format(df.index[0]))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update to my comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm not following. Do you want me address this comment above? I thought that I did so two lines down...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh, didn't see that
pls add a release note in v0.17.0.txt |
4f0325d
to
a01d8c0
Compare
…e more comprehensive
can you rebase and squash, contributing docs are here if you need help |
a01d8c0
to
8273356
Compare
@jreback It should be good to go. |
@@ -73,7 +73,7 @@ Bug Fixes | |||
- Bug in ``Timestamp``'s' ``microsecond``, ``quarter``, ``dayofyear``, ``week`` and ``daysinmonth`` properties return ``np.int`` type, not built-in ``int``. (:issue:`10050`) | |||
- Bug in ``NaT`` raises ``AttributeError`` when accessing to ``daysinmonth``, ``dayofweek`` properties. (:issue:`10096`) | |||
|
|||
|
|||
- Bug in ``DataFrame.to_hdf()`` where table format would raise a seemingly unrelated error for invalid (non-string) column names. This is now explicitly forbidden. (:issue:`9057`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are now going to release 0.16.2, so can you remove from here (it no longer exists) and put there?
8273356
to
56fcc9d
Compare
ok, why don't you rebase; ping when green. |
Have DataFrame.to_hdf() raise an error when using pytables with non-string column types. Fixes pandas-dev#9057
56fcc9d
to
33cfaea
Compare
@jreback This PR should be good to go once again. |
BUG: invalid column names in a HDF5 table format #9057
thanks! |
Have DataFrame.to_hdf() raise an error when using pytables with non-string
column types
closes #9057