read_hdf crashes the Python process when used in multithreaded code #14263
Could you make a reproducible example?
files is an array of strings containing the absolute paths of the .h5 files; you will need code like this.
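For reference, one way to build such a `files` list is with `glob`. This is a minimal sketch; the directory and file names here are made up (a temporary directory with two empty placeholder files), since the poster's actual folder isn't given:

```python
import glob
import os
import tempfile

# Hypothetical directory of HDF5 files; in the real case this would be
# the folder containing the .h5 files.
h5_dir = tempfile.mkdtemp()
for name in ('test.hdf', 'test1.hdf'):
    open(os.path.join(h5_dir, name), 'w').close()  # empty placeholders

# Absolute paths to every matching file, in a stable order.
files = sorted(glob.glob(os.path.join(h5_dir, '*.hdf')))
print(len(files))
```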
INSTALLED VERSIONS
commit: None
pandas: 0.18.1
Unfortunately, that still won't work for me since the directory
@TomAugspurger thank you for your reply. Actually, if I hadn't written code to generate small files for you, I wouldn't have noticed this problem. When I created the H5 files:

```python
import numpy as np
import pandas as pd
from multiprocessing.pool import ThreadPool

path = 'test.hdf'
path1 = 'test1.hdf'
files = [path, path1]
num_rows = 100000
num_tasks = 2

def make_df(num_rows=10000):
    df = pd.DataFrame(np.random.rand(num_rows, 5), columns=list('abcde'))
    df['foo'] = 'foo'
    df['bar'] = 'bar'
    df['baz'] = 'baz'
    df['date'] = pd.date_range('20000101 09:00:00',
                               periods=num_rows,
                               freq='s')
    df['int'] = np.arange(num_rows, dtype='int64')
    return df

print("writing df")
df = make_df(num_rows=num_rows)
df.to_hdf(path, 'df', complib='zlib', complevel=9, append=False, mode='w', format='t')
df.to_hdf(path1, 'df', complib='zlib', complevel=9, append=False, mode='a', format='t')

def readjob(x):
    path = x
    return pd.read_hdf(path, "df", mode='r')

pool = ThreadPool(num_tasks)
results = pool.map(readjob, files)
print(results)
```

When I write to path1, I set the mode to append; the code crashes when the pool kicks in.
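The HDF5 C library (and PyTables, which pandas uses underneath) is not thread-safe, so concurrent `read_hdf` calls from multiple threads can crash the interpreter even when they touch different files. A common workaround is to serialize all HDF access behind a single lock. A minimal sketch of that pattern, where `load` is a stand-in for the real `pd.read_hdf(path, 'df', mode='r')` call:

```python
import threading
from multiprocessing.pool import ThreadPool

# One global lock guarding every HDF5 call, since the HDF5 library
# is not thread-safe.
hdf_lock = threading.Lock()

def load(path):
    # Placeholder for pd.read_hdf(path, 'df', mode='r').
    return 'df-from-' + path

def readjob(path):
    with hdf_lock:  # at most one thread inside HDF5 at a time
        return load(path)

pool = ThreadPool(2)
results = pool.map(readjob, ['test.hdf', 'test1.hdf'])
print(results)
```

Note that the lock removes the parallelism of the reads, which is why moving to a process pool (separate interpreters, separate HDF5 state) is often the preferred fix when throughput matters.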
duplicate of #12236
Setting the mode parameter doesn't fix the problem. After testing the code several more times, I found it is random whether it runs to completion.
One of my folders contains multiple h5 files, and I tried to load them into dataframes and then concat those dataframes into one.
The Python process crashes when num_tasks > 1. If I debug thread by thread, it works; in other words, it crashes whenever two threads run at the same time, even though they read different files.
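Since the crash only appears with concurrent reads, the simplest safe version of this load-and-concat step is a plain sequential loop. A sketch, assuming the small in-memory frames below stand in for the per-file results of `pd.read_hdf(f, 'df')`:

```python
import pandas as pd

# Stand-ins for the dataframes read from each .h5 file; in the real code
# this would be [pd.read_hdf(f, 'df') for f in files], read one at a time.
frames = [pd.DataFrame({'a': [i, i + 1]}) for i in range(3)]

# Concatenate into a single dataframe with a fresh, contiguous index.
combined = pd.concat(frames, ignore_index=True)
print(combined.shape)
```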