BUG: HDF5 Files cannot be read concurrently #14692

Closed
dragonator4 opened this issue Nov 18, 2016 · 1 comment
Labels
Duplicate Report Duplicate issue or pull request IO HDF5 read_hdf, HDFStore

Comments

dragonator4 commented Nov 18, 2016

A small, complete example of the issue

import pandas as pd
import numpy as np
from multiprocessing import Pool
import warnings

# To avoid natural name warnings
warnings.filterwarnings('ignore')

def init(hdf_store):
    global hdf_buff
    hdf_buff = hdf_store

def reader(name):
    df = hdf_buff[name]
    return (name, df)

def main():
    # Creating the store
    with pd.HDFStore('storage.h5', 'w') as store:
        for i in range(100):
            df = pd.DataFrame(np.random.rand(5,3), columns=list('abc'))
            store.append(str(i), df, index=False, expectedrows=5)
    # Reading concurrently with one connection
    with pd.HDFStore('storage.h5', 'r') as store:
        with Pool(4, initializer=init, initargs=(store,)) as p:
            ret = pd.concat(dict(p.map(reader, [str(i) for i in range(100)])))

if __name__ == '__main__':
    main()

The above code either fails loudly with the following error:

tables.exceptions.HDF5ExtError: HDF5 error back trace

  File "H5Dio.c", line 173, in H5Dread
    can't read data
  File "H5Dio.c", line 554, in H5D__read
    can't read data
  File "H5Dchunk.c", line 1856, in H5D__chunk_read
    error looking up chunk address
  File "H5Dchunk.c", line 2441, in H5D__chunk_lookup
    can't query chunk address
  File "H5Dbtree.c", line 998, in H5D__btree_idx_get_addr
    can't get chunk info
  File "H5B.c", line 340, in H5B_find
    unable to load B-tree node
  File "H5AC.c", line 1262, in H5AC_protect
    H5C_protect() failed.
  File "H5C.c", line 3574, in H5C_protect
    can't load entry
  File "H5C.c", line 7954, in H5C_load_entry
    unable to load entry
  File "H5Bcache.c", line 143, in H5B__load
    wrong B-tree signature

End of HDF5 error back trace

Or with the following error:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/kartik/miniconda3/lib/python3.5/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/kartik/miniconda3/lib/python3.5/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/kartik/Documents/Code/Scripts/Benchmarking and optimization stuff/HDF_Concurrent.py", line 13, in reader
    df = hdf_buff[name]
  File "/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py", line 461, in __getitem__
    return self.get(key)
  File "/home/kartik/miniconda3/lib/python3.5/site-packages/pandas/io/pytables.py", line 677, in get
    raise KeyError('No object named %s in the file' % key)
KeyError: 'No object named 7 in the file'
"""

In this case, though, object 7 clearly exists in the file. Any help?
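
One plausible contributor to both failures (an assumption; nothing in this thread diagnoses the cause) is that the pool workers are forked from a process that already holds the open file, so every worker inherits the same underlying file descriptor. On POSIX systems, descriptors inherited through fork share a single file offset, so seeks and reads issued by different processes can interleave. The sketch below uses only the standard library to show that shared offset; the helper name `shared_offset_after_fork` is hypothetical, and it does not involve HDF5 or pandas at all.

```python
import os
import tempfile


def shared_offset_after_fork():
    """Return the parent's file offset after a forked child reads 4 bytes."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"0123456789")
        path = tmp.name
    fd = os.open(path, os.O_RDONLY)
    try:
        pid = os.fork()  # POSIX only
        if pid == 0:
            # Child: read through the inherited descriptor, then exit
            # immediately without running the parent's cleanup code.
            try:
                os.read(fd, 4)
            finally:
                os._exit(0)
        os.waitpid(pid, 0)
        # The parent has not read anything, yet its offset has moved.
        return os.lseek(fd, 0, os.SEEK_CUR)
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print(shared_offset_after_fork())  # 4 on POSIX systems
```

Whether this is what actually corrupts the reads here depends on how the HDF5 library performs its I/O, which this sketch does not examine.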

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0
nose: None
pip: 8.1.2
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.7
blosc: None
bottleneck: 1.1.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: 0.9.3
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None

@jreback jreback added Duplicate Report Duplicate issue or pull request IO HDF5 read_hdf, HDFStore labels Nov 18, 2016
@jreback jreback added this to the No action milestone Nov 18, 2016
Contributor

jreback commented Nov 18, 2016

Duplicate of #12236.

Pull requests to fix this are welcome. It is not actually that hard to fix.
