ENH: Read entire group of an HDF5 file in a single call #6833

Open
VGonPa opened this issue Apr 7, 2014 · 3 comments
Labels
Enhancement IO HDF5 read_hdf, HDFStore

Comments

@VGonPa

VGonPa commented Apr 7, 2014

See this question on Stack Overflow.

I guess that something like store.get_group('df') might do the job:

>>> import numpy as np
>>> import pandas as pd
>>> store = pd.HDFStore('test.h5', mode='w')
>>> store.append('df/foo1', pd.DataFrame(np.random.randn(10, 2)))
>>> store.append('df/foo2', pd.DataFrame(np.random.randn(10, 2)))
>>> store.get_group('df')
...           0         1
... 0 -0.495847 -1.449251
... 1 -0.494721  1.572560
... 2  1.219985  0.280878
... 3 -0.419651  1.975562
... 4 -0.489689 -2.712342
... 5 -0.022466 -0.238129
... 6 -1.195269 -0.028390
... 7 -0.192648  1.220730
... 8  1.331892  0.950508
... 9 -0.790354 -0.743006
... 0 -0.761820  0.847983
... 1 -0.126829  1.304889
... 2  0.667949 -1.481652
... 3  0.030162 -0.111911
... 4 -0.433762 -0.596412
... 5 -1.110968  0.411241
... 6 -0.428930  0.086527
... 7 -0.866701 -1.286884
... 8 -0.649420  0.227999
... 9 -0.100669 -0.205232
... 
... [20 rows x 2 columns]

I have never made a pull request or dug into the pandas code before, but I guess a possible implementation (inspired by this answer) would be something like:

class HDFStore():
    ...
    def get_group(self, group_name):
        return pd.concat([ self.select(node._v_pathname) for node in self.get_node(group_name) ])
    ...
VGonPa changed the title from "[ENH] Read entire group in an HDF5 file in a single call" to "[ENH] Read entire group of an HDF5 file in a single call" on Apr 7, 2014
jreback added this to the 0.15.0 milestone on Apr 7, 2014
jreback changed the title from "[ENH] Read entire group of an HDF5 file in a single call" to "ENH: Read entire group of an HDF5 file in a single call" on Apr 7, 2014
@cpcloud
Member

cpcloud commented Apr 7, 2014

I think this would be better if it returned this:

{'df': {'foo1': store.select('df/foo1'), 'foo2': store.select('df/foo2')}}

or

{'foo1': store.select('df/foo1'), 'foo2': store.select('df/foo2')}

Users can do the concat afterwards if they want to. This shouldn't assume that the frames are the same shape, dtype etc. For example, I have different metrics related to one "thing" all under one key, like

eeg/blocked
eeg/filtered
eye/blocked
eye/samples
eye/events

I don't want to concatenate the result but I often want to get just the eye or just the eeg datasets. This could even be a context manager where the context is the node. Something like:

with store.get_group('df') as group:
    foo1 = group['foo1']
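
The context-manager idea above could be sketched roughly as follows. This is only an illustration, not an existing pandas API: `GroupView` is a hypothetical name, and the sketch assumes `store` behaves like an open `pandas.HDFStore`, where `keys()` lists absolute paths such as '/df/foo1' and `select(key)` loads the object at that path.

```python
class GroupView:
    """Hypothetical lazy view over the objects stored under one group."""

    def __init__(self, store, group_name):
        self._store = store
        # Normalise 'df', '/df' and '/df/' to the same prefix '/df/'.
        self._prefix = "/" + group_name.strip("/") + "/"

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Nothing to release here; the store itself stays open.
        return False

    def keys(self):
        """Names of the stored objects, relative to the group."""
        return [
            key[len(self._prefix):]
            for key in self._store.keys()
            if key.startswith(self._prefix)
        ]

    def __getitem__(self, name):
        # Objects are loaded only when they are asked for.
        return self._store.select(self._prefix + name)
```

Nothing is read until an item is requested, so no assumption is made about the frames sharing a shape or dtype.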

@jreback
Contributor

jreback commented Apr 7, 2014

The issue I have with this is that it's tricky to pass a where clause to it, as you can't assume anything about the tables. I like returning a dict, though.
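
One way to picture the where problem is the sketch below. It is an assumption-laden illustration, not a proposed pandas API: `read_group_where` is a hypothetical helper, and it relies on `HDFStore.get_storer(key)` returning an object with an `is_table` attribute to tell table-format nodes from fixed-format ones. Even for a table, the where expression must refer to the index or to data columns of that particular table, so `select` may still raise.

```python
def read_group_where(store, group_name, where=None):
    """Hypothetical helper: load a group, applying `where` only to tables.

    Returns (loaded, skipped): a dict of loaded objects keyed by their
    name relative to the group, and a list of names that were skipped
    because they are not stored in table format and cannot be queried.
    """
    prefix = "/" + group_name.strip("/") + "/"
    loaded, skipped = {}, []
    for key in store.keys():
        if not key.startswith(prefix):
            continue
        name = key[len(prefix):]
        if where is None:
            loaded[name] = store.select(key)
        elif store.get_storer(key).is_table:
            # Only table-format nodes accept a where clause.
            loaded[name] = store.select(key, where=where)
        else:
            skipped.append(name)
    return loaded, skipped
```

Reporting the skipped names rather than raising is a design choice made for the sketch; a real implementation might prefer to fail loudly.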

@VGonPa
Author

VGonPa commented Apr 8, 2014

I like the idea of returning a dict as well.

What about something like this?:

def get_group(self, group_name):
    # Use self rather than a global store, so the method works on any HDFStore instance.
    return {node._v_name: self.select(node._v_pathname) for node in self.get_node(group_name)}
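
Until something like this exists on HDFStore, a standalone helper can give a similar result. The sketch below is a minimal illustration under stated assumptions: `read_group` is a hypothetical name, and instead of iterating PyTables nodes it filters the absolute paths returned by `store.keys()` (for example '/df/foo1') and loads each match with `store.select`.

```python
def read_group(store, group_name):
    """Hypothetical helper: collect everything stored under a group.

    `store` is expected to behave like an open pandas.HDFStore.
    Returns a dict mapping each key, relative to the group, to the
    object stored there. Nested keys keep their relative path,
    e.g. 'sub/foo3'.
    """
    # Normalise 'df', '/df' and '/df/' to the same prefix '/df/'.
    prefix = "/" + group_name.strip("/") + "/"
    return {
        key[len(prefix):]: store.select(key)
        for key in store.keys()
        if key.startswith(prefix)
    }
```

With the store from the first comment, read_group(store, 'df') would be expected to return a dict with the keys 'foo1' and 'foo2'; passing the dict's values to pd.concat would then give the concatenated frame from the original proposal.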

jreback modified the milestones: 0.16.0, Next Major Release on Mar 3, 2015
mroeschke removed this from the Contributions Welcome milestone on Oct 13, 2022