
Pandas hdf functions should support the hdf5 ExternalLink functionality when reading/writing. #6019


Closed
jasonbrent opened this issue Jan 20, 2014 · 4 comments
Labels
Enhancement IO HDF5 read_hdf, HDFStore

Comments

@jasonbrent

When attempting to use pandas.read_hdf() to read a link from an h5py-created data file containing an ExternalLink to an HDFStore()-created entry, the following traceback is raised:

/Users/xxx/.pyenv/versions/anaconda/lib/python2.7/site-packages/pandas/io/pytables.pyc in select(self, key, where, start, stop, columns, iterator, chunksize, auto_close, **kwargs)
618 # create the storer and axes
619 where = _ensure_term(where)
--> 620 s = self._create_storer(group)
621 s.infer_axes()
622

/Users/xxx/.pyenv/versions/anaconda/lib/python2.7/site-packages/pandas/io/pytables.pyc in _create_storer(self, group, format, value, append, **kwargs)
1119 )
1120
-> 1121 pt = _ensure_decoded(getattr(group._v_attrs, 'pandas_type', None))
1122 tt = _ensure_decoded(getattr(group._v_attrs, 'table_type', None))
1123

/Users/xxx/.pyenv/versions/anaconda/lib/python2.7/site-packages/tables/link.pyc in __getattr__(self, name)
77 def __getattr__(self, name):
78     raise KeyError("you cannot get attributes from this "
---> 79                "%s instance" % self.__class__.__name__)
80
81 def __setattr__(self, name, value):

KeyError: 'you cannot get attributes from this NoAttrs instance'

In this example, 'store.h5' was created with HDFStore() and a single Series was stored in it at the location '/banana'. h5py.File was then used to create 'external.h5' with a single entry for /external that pointed to store.h5:/banana using h5py.ExternalLink.
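The setup described above can be reproduced with a short script (a sketch assuming h5py and pandas are installed; file and key names follow the example in the report):

```python
import os
import tempfile

import h5py
import pandas as pd

tmpdir = tempfile.mkdtemp()
store_path = os.path.join(tmpdir, "store.h5")
external_path = os.path.join(tmpdir, "external.h5")

# Write a single Series at /banana using pandas' HDFStore (PyTables format).
with pd.HDFStore(store_path) as store:
    store["banana"] = pd.Series([1, 2, 3])

# Create external.h5 with h5py; /external is an ExternalLink into store.h5.
with h5py.File(external_path, "w") as f:
    f["external"] = h5py.ExternalLink(store_path, "/banana")

# On pandas 0.13.0 this raised the KeyError shown in the traceback above;
# newer versions may fail differently, but the link is still not followed.
try:
    pd.read_hdf(external_path, "external")
except Exception as exc:
    print(type(exc).__name__, exc)
```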

In my use case, I had written code to parse and store some complex data and associated metadata in hdf5 using h5py. My intention was then to read that raw data in using pandas and then re-store the cooked data using HDFStore.

Unfortunately, pandas HDFStore() did not like the metadata in my h5py written file.

--snip--
/Users/xxx/.pyenv/versions/anaconda/lib/python2.7/site-packages/tables/attributeset.py:294: DataTypeWarning: Unsupported type for attribute 'is_key' in node 'some_node'. Offending HDF5 class: 8
value = self._g_getattr(self._v_node, name)
--snip--

I expected that I could work around this by simply storing the pandas content in a native HDFStore() written file and use ExternalLinks in the parent file. Unfortunately, that did not work properly either.

This is with pandas '0.13.0'.

Thanks!

-jbl

@jreback
Contributor

jreback commented Jan 21, 2014

I am not sure about using links across h5py and PyTables; in theory it should work, but to be honest I think it is overly complicated.

You are welcome to put in some tests for links in PyTables; I have never used them, so the level of support is unclear.

It might be nice to support them.

I'll mark this as an enhancement.

@jreback
Contributor

jreback commented Jan 21, 2014

http://pytables.github.io/usersguide/libref/link_classes.html

There are apparently also some attribute restrictions on links.

There are some required attributes on an HDFStore node,
so that might explain this error/issue (it could probably be dealt with fairly easily, though).
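Until pandas follows external links itself, one workaround is to resolve the link with h5py and hand pandas the target file directly. The helper below is an illustration, not part of either library's API; `read_through_external_link` is a hypothetical name:

```python
import h5py
import pandas as pd


def read_through_external_link(path, key):
    """Follow an HDF5 ExternalLink manually, then read the target with pandas.

    Hypothetical helper: if /key in `path` is an h5py.ExternalLink, read the
    linked file/node with pd.read_hdf; otherwise read `path` directly.
    """
    with h5py.File(path, "r") as f:
        # getlink=True returns the link object itself instead of resolving it.
        link = f.get(key, getlink=True)
        if isinstance(link, h5py.ExternalLink):
            # Note: link.filename may be relative to the linking file's
            # directory; resolve it against os.path.dirname(path) if needed.
            return pd.read_hdf(link.filename, link.path)
    # Not an external link: let pandas handle the node as usual.
    return pd.read_hdf(path, key)
```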

@kcw78

kcw78 commented Dec 18, 2021

Referencing external links still throws an error. (You can create them, you just can't read them in Pandas.) Now that soft/symbolic links work (per Issue #20523 above), any chance external links can be fixed too?
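For reference, external links can also be created directly from PyTables, the library HDFStore is built on (a sketch; the `store.h5:/banana` target follows the original example and need not exist until the link is dereferenced):

```python
import os
import tempfile

import tables

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "external.h5")

with tables.open_file(path, mode="w") as f:
    # External link targets use "file:path" syntax. Soft links, which now
    # resolve through pandas (see #20523), would use create_soft_link instead.
    f.create_external_link("/", "external", "store.h5:/banana")
```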

@jreback
Contributor

jreback commented Dec 18, 2021

pandas is all volunteer - community contributions are welcome
