torch.utils.data.DataLoader
Error when processing h5 files in parallel: single-threaded works fine, parallel raises an error.
#3415
Comments
Maybe hdf5 is not thread safe? Does it work without threads? |
This is an HDF5 issue. The problem is that concurrent reads in HDF5 aren't safe. To actually allow concurrent reads of a file, you have to use the SWMR feature of HDF5: https://support.hdfgroup.org/HDF5/docNewFeatures/NewFeaturesSwmrDocs.html |
Actually, this thread gives proper workarounds as well: https://stackoverflow.com/questions/34906652/does-hdf5-support-concurrent-reads-or-writes-to-different-files I think it helps if you use Python 3 and set the following at the top of your main script (not the dataset), before anything else:
import torch.multiprocessing as mp
mp.set_start_method('spawn') |
@soumith thanks |
@soumith when I add
import torch.multiprocessing as mp
mp.set_start_method('spawn')
at the top of my main script, another error occurs:
Any idea? Thanks.
Well, this error can be fixed by applying DataLoader directly to the dataset rather than wrapping it in an additional function, get_train_valid_loader. But that still does not solve the problem; in my situation, it returns another error:
Another odd thing is that the process does not return, as if it were stuck. |
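Did you solve this problem? @flystarhe |
@zhbbupt No, I gave in and now run it in a single process. |
Same problem here: reading works fine with a single thread, but fails with multiple threads. |
I think you can bypass the runtime error with an exception handler. I am not 100% sure that it will work, but you can try. It's just 3 more lines of code. |
Have you solved this problem? Can an h5 dataset really only be read single-threaded? |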
@RizhaoCai You can read an HDF5 file with multithreading using the SWMR feature in newer versions of the h5py library.
|
Thanks! However, after adding this to my code:
I still got the error:
If I add the code below at the top:
Any ideas? |
I encountered the same problem. Did you solve it? I mean, did you get it to work with num_workers>1? |
I got this to work in my code with h5py.File(file_path, 'r', libver='latest', swmr=True), without setting torch multiprocessing to 'spawn'. |
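For readers who want to try the SWMR open call quoted above, here is a minimal sketch. The helper names open_h5_for_concurrent_reads and read_item, and the key argument, are made up for illustration; only the h5py.File(...) arguments come from the comment. It assumes an h5py build against HDF5 1.10 or newer and, to be safe, a file that was written with libver='latest'.

```python
import h5py


def open_h5_for_concurrent_reads(file_path):
    """Open an HDF5 file read-only with SWMR enabled (hypothetical helper)."""
    # Same arguments as in the comment above: read-only mode, latest file
    # format, and single-writer/multiple-reader access switched on.
    return h5py.File(file_path, "r", libver="latest", swmr=True)


def read_item(file_path, key, index):
    """Read one element of dataset `key` using a short-lived SWMR handle."""
    with open_h5_for_concurrent_reads(file_path) as f:
        return f[key][index]
```

Whether this alone is enough for num_workers>1 seems to depend on the setup; other commenters in this thread report that it was not.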
I couldn't get DataLoader to work for num_workers>1 even with this trick. |
Folks, skip the roundabout stuff. The simplest way is just to add a lock: multiprocessing.Lock |
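One way to read the lock suggestion above is a thin wrapper that serializes every item read. This is only a sketch under assumptions: LockedDataset is a hypothetical name, the wrapped object can be any indexable dataset (for example one backed by an h5 file), and the thread does not confirm that a lock alone resolves the error.

```python
import multiprocessing


class LockedDataset:
    """Wrap an indexable dataset so that only one process reads at a time."""

    def __init__(self, dataset, lock):
        self.dataset = dataset
        # The lock must be created once, in the parent process, so that all
        # worker processes end up sharing the same lock object.
        self.lock = lock

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, index):
        # Hold the lock for the whole read; it is released on exit,
        # even if the underlying read raises.
        with self.lock:
            return self.dataset[index]
```

The trade-off is that reads no longer overlap, so the workers only parallelize whatever happens after the item has been read.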
Do not write
In this case, you will not read one h5 file multiple times in multi-processing |
Has anyone solved this? I have run into the same problem too. |
Just want to add one small point here. When you do what @lumaku suggested (which really works), i.e. open the HDF5 file in each worker process, make sure that you don't have the HDF5 file open in the parent process (or anywhere else in the program). Otherwise it will still throw errors such as "Can't read data (inflate() failed)". |
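The per-worker opening described above could look roughly like the following sketch. The class name LazyH5Dataset and the parameters file_path and key are hypothetical, not taken from @lumaku's comment; the point is only that no handle stays open in the parent process, and each process opens its own handle on first access.

```python
import h5py


class LazyH5Dataset:
    """Map-style dataset that opens its HDF5 file lazily, once per process."""

    def __init__(self, file_path, key):
        self.file_path = file_path
        self.key = key
        self._file = None  # no long-lived handle exists in the parent process
        # Read the length through a short-lived handle that is closed again
        # before any worker process is started.
        with h5py.File(file_path, "r") as f:
            self._length = len(f[key])

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        if self._file is None:
            # First access in this process: open a private read-only handle.
            self._file = h5py.File(self.file_path, "r")
        return self._file[self.key][index]
```

An object like this can be passed to DataLoader as a map-style dataset. Note that indexing it in the parent process before the workers start would open a handle there, which is exactly what the comment above warns against.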
torch.utils.data.DataLoader
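Error when processing h5 files in parallel: single-threaded works fine, parallel raises an error. The code is as follows:
The error is as follows: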