COMPAT: reading generic PyTables Table format fails with sub-selection #11188
pandas can read a …
My file was created by pytables. That's why I was surprised it didn't work.
Well, you would have to show a specific case then.
The example below reproduces the error. The problem is with the start/stop sub-selection.

```python
import tables
import pandas as pd
import numpy as np

output_fname = 'test.h5'

class LFloat(tables.IsDescription):
    npart = tables.Int32Col(pos=1)   # float id number, starts at 1
    time = tables.Float32Col(pos=2)  # time of the datapoint
    x = tables.Float32Col(pos=3)     # x position
    y = tables.Float32Col(pos=4)     # y position
    z = tables.Float32Col(pos=5)     # z position

dtype = tables.description.dtype_from_descr(LFloat)
nrecs = 10

# openFile/createGroup/createTable/createIndex are the legacy camelCase
# PyTables API (open_file, create_group, etc. in PyTables >= 3.0)
with tables.openFile(output_fname, mode='w', title='Float Data') as h5file:
    group = h5file.createGroup("/", 'floats', 'Float Data')
    table = h5file.createTable(group, 'trajectories', LFloat,
                               "Float Trajectories", expectedrows=nrecs)
    for n in range(nrecs):
        d = np.empty(1, dtype)
        d['npart'] = n
        table.append(d)
    table.cols.npart.createIndex()
    table.flush()

df = pd.read_hdf('test.h5', '/floats/trajectories', start=0, stop=5)  # raises
```
OK, this is a bug in reading these tables. I never had a need to do this, so it's untested. If you want to submit a pull request, a fix would be great. The issue is here, I think.
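For anyone hitting this before a fix lands, one possible workaround (not from the thread; it assumes the same file and node path as the example above) is to do the sub-selection with PyTables directly and build the DataFrame from the resulting structured array:

```python
import pandas as pd
import tables

# Table.read(start, stop) returns a numpy structured array,
# which DataFrame.from_records understands.
with tables.open_file('test.h5', mode='r') as h5file:
    recs = h5file.get_node('/floats/trajectories').read(start=0, stop=5)
df = pd.DataFrame.from_records(recs)
```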
Note that with simple floats, ints, and strings you would be OK, but any other data types will simply fail (e.g. datetimes). You are almost certainly better off saving this using the pandas format (which is readable by PyTables).
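As a sketch of that suggestion (the frame, file name, and key below are hypothetical, mirroring two columns from the example), writing with pandas' own table format keeps start/stop sub-selection working:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'npart': np.arange(10, dtype='int32'),
                   'time': np.zeros(10, dtype='float32')})
# format='table' writes a query-able PyTables Table under the hood
df.to_hdf('test_pandas.h5', 'trajectories', format='table')
out = pd.read_hdf('test_pandas.h5', 'trajectories', start=0, stop=5)
```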
Thanks @jreback! I'll see what I can do. I am not married to pytables for my application; I chose it because it allows me to write hdf files incrementally and thus scales to out-of-core file sizes. If pandas can do that directly, I would rather use pure pandas. But my impression was that you have to first create the whole pandas dataframe in memory and then serialize it to hdf. Correct?
Certainly not. You can create what you need and then stream it to hdf: just create what you need, serialize, repeat. See the docs here.
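A minimal sketch of that pattern (the file name, key, chunk size, and columns below are hypothetical):

```python
import numpy as np
import pandas as pd

with pd.HDFStore('out.h5', mode='w') as store:
    for _ in range(10):  # one out-of-core chunk at a time
        chunk = pd.DataFrame({'x': np.random.randn(1000),
                              'y': np.random.randn(1000)})
        # append() uses the table format, so the file grows incrementally
        store.append('trajectories', chunk)
```

Only one chunk is ever in memory, which is exactly the out-of-core use case described above.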
Just tried your suggestion. I ended up with segfaults when trying to append to my HDFStore. I think this is related to #10672. I guess I need to update pytables to 3.2.1.
3.2 is buggy, yep.
@jreback Could you point me again to the source of the issue? I think the link you provided earlier is no longer valid. I'm also encountering this issue and looking for a solution. In my use case I'm also hitting a MemoryError if the index is bigger than RAM, but I will prepare a new bug report for that.
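As an aside on the memory issue: for pandas table-format files, read_hdf can return an iterator so the whole selection never has to fit in RAM. A sketch ('big.h5', 'trajectories', and the chunk size are hypothetical):

```python
import pandas as pd

# chunksize makes read_hdf yield DataFrames of at most 100000 rows each
for chunk in pd.read_hdf('big.h5', 'trajectories', chunksize=100000):
    print(len(chunk))  # hypothetical per-chunk work
```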
#13267 fixed this for …
I do not quite understand what this means. @rabernat's comment above shows a nice minimal working example demonstrating the problem that he found, that @mchwalisz also found (and that I am hitting right now). Can we modify the example (ideally just the reading part, not the data-generation part) so that it works?
0.17.0 was released shortly after @jreback's comment (Sep 2015), and I think this is the line @jreback was pointing us to: https://github.com/pandas-dev/pandas/blob/v0.17.0/pandas/io/pytables.py#L1660. It is now at line 1822 in 429078b.
I created a file using pytables, and now I would like to read it into pandas. My (naive) expectation was that these two tools were compatible, but I am getting an error when I read it. The file is 44 GB, so I can't really post it; I would be happy to post the `h5dump --head` metadata if that would help. (Cross-post with dask/dask#747.)