Skip to content

str'ified column names in DataFrame.to_hdf(); Fixes #9057 #9902

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jameshiebert
Copy link
Contributor

closes #9057

Converted column names to strings before creating pytables metadata object

This brute force str()ing method might not be the best approach, but AFAIK it hasn't broken any of the tests.

Converted column names to strings before creating pytables metadata object
periods=N,freq='ms'))

self.filename = NamedTemporaryFile(suffix='.h5', delete=False).name
df.to_hdf(self.filename,'df', mode='w', format='table',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these all need to go in io/tests/test_pytables.py

@jreback jreback added Bug IO HDF5 read_hdf, HDFStore labels Apr 14, 2015
@jameshiebert
Copy link
Contributor Author

Yeah, my bad, that makes sense. I moved them over.

@@ -4623,6 +4624,27 @@ def _test_sort(obj):
else:
raise ValueError('type not supported here')

class TestToHdfWithIntegerColumnNames(tm.TestCase):
# GH9057
def setUp(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls test the way everything else is done
eg use ensure_clean for a file

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls give a test with Unicode column names

@jreback
Copy link
Contributor

jreback commented Apr 15, 2015

the reason for not allowing this is that queries become non-obvious and the data does not correctly round trip

so I suppoose this is ok but needs a warning to show that this is losing data

@jreback
Copy link
Contributor

jreback commented Apr 17, 2015

pls do the tests like the others, e.g. use ensure_clean_path which handles the cleanup in the correct way

@jreback jreback added this to the 0.16.1 milestone Apr 17, 2015
@jreback
Copy link
Contributor

jreback commented Apr 17, 2015

pls add a warning when things are stringified as well (as this is generally something which is not obvious and makes querying different).

@jreback jreback modified the milestones: 0.17.0, 0.16.1 Apr 17, 2015
@jameshiebert
Copy link
Contributor Author

Do you think that it's better to just straight-up disallow to_hdf() on DataFrames with integer column names? If there are numerous places in the code where it's assumed that a column name can be mapped to a Python identifier, then simply working around the problem (as in this PR) may just be asking for a world of hurt. Thoughts?

@jreback
Copy link
Contributor

jreback commented Apr 20, 2015

I think it might be better to raise a more informative error message

the point of the Table format is to query and so have integer names though can work is a bit odd
(the actual exception comes from PyTables though as it cannot hold integer names)

@jreback
Copy link
Contributor

jreback commented May 9, 2015

@jameshiebert I think let's just raise an error message. Pls rebase and update.

@jameshiebert
Copy link
Contributor Author

Sounds good. Closing this in favor of #10098

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.17.0, 0.16.2, No action Jun 2, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO HDF5 read_hdf, HDFStore
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: to_hdf failing with integer columns and data_columns=True
3 participants