Completed code with bug report for hdf5 dataset. How to fix? #18951

John1231983 · 2019-04-05T14:50:55Z

Hello all, I want to report an issue with PyTorch and an HDF5 data loader. The full source code and the error output are provided below.
The problem occurs when I run test_dataloader.py in two terminals at the same time. The script loads a custom HDF5 dataset (custom_h5_loader). To generate the h5 files, first run convert_to_h5 (Step 1), which creates 100 random h5 files.
To reproduce the error, please follow these steps.

Step 1: Generate the HDF5 files (convert_to_h5)

from __future__ import print_function
import h5py
import numpy as np
import random
import os

if not os.path.exists('./data_h5'):
    os.makedirs('./data_h5')

for index in range(100):
    data = np.random.uniform(0,1, size=(3,128,128))
    data = data[None, ...]
    print (data.shape)
    with h5py.File('./data_h5/' +'%s.h5' % (str(index)), 'w') as f:
        f['data'] = data

Step 2: Create a Python file named dataloader.py (it defines the custom_h5_loader class) and paste the following code

import h5py
import torch.utils.data as data
import glob
import torch
import numpy as np
import os
class custom_h5_loader(data.Dataset):

    def __init__(self, root_path):
        self.hdf5_list = [x for x in glob.glob(os.path.join(root_path, '*.h5'))]
        self.data_list = []
        for ind in range (len(self.hdf5_list)):
            self.h5_file = h5py.File(self.hdf5_list[ind])
            data_i = self.h5_file.get('data')     
            self.data_list.append(data_i)

    def __getitem__(self, index):
        self.data = np.asarray(self.data_list[index])   
        return (torch.from_numpy(self.data).float())

    def __len__(self):
        return len(self.hdf5_list)

Step 3: Create a Python file named test_dataloader.py

from dataloader import custom_h5_loader
import torch
import torchvision.datasets as dsets

train_h5_dataset = custom_h5_loader('./data_h5')
h5_loader = torch.utils.data.DataLoader(dataset=train_h5_dataset, batch_size=2, shuffle=True, num_workers=4)      
for epoch in range(100000):
    for i, data in enumerate(h5_loader):       
        print (data.shape)

Step 4: Open a first terminal and run the script (this works)

python test_dataloader.py

Step 5: While the first run is still going, open a second terminal and run the script again (this fails with the error below)

python test_dataloader.py

The error is

Traceback (most recent call last):
  File "/home/john/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 162, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/john/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 165, in make_fid
    fid = h5f.open(name, h5f.ACC_RDONLY, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_dataloader.py", line 5, in <module>
    train_h5_dataset = custom_h5_loader('./data_h5')
  File "/home/john/test_hdf5/dataloader.py", line 13, in __init__
    self.h5_file = h5py.File(self.hdf5_list[ind])
  File "/home/john/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/john/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 167, in make_fid
    fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 98, in h5py.h5f.create
OSError: Unable to create file (unable to open file: name = './data_h5/47.h5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2)

This is my configuration

HDF5 Version: 1.10.2
Configured on: Wed May  9 23:24:59 UTC 2018
Features:
---------
                  Parallel HDF5: no
             High-level library: yes
                   Threadsafety: yes
print (torch.__version__)
1.0.0.dev20181227


fmassa · 2019-04-05T15:27:13Z

This happens because hdf5 is not thread safe.

Have a look at #3415 for further discussion and a potential solution.
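
For context, the errno 11 ("Resource temporarily unavailable") in the traceback is what a non-blocking advisory file lock returns when another open handle already holds a conflicting lock. The sketch below reproduces that behaviour with the standard library only. It assumes that HDF5 1.10 takes an flock-style lock when it opens a file (exclusive for read-write access, shared for read-only access); that is an assumption about the library internals, not something confirmed in this thread. The helper names try_lock and lock_pair are made up for illustration. It needs a POSIX system.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(fd, exclusive):
    """Try a non-blocking flock; return None on success, else the errno."""
    mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
    try:
        fcntl.flock(fd, mode | fcntl.LOCK_NB)
        return None
    except OSError as exc:
        return exc.errno


def lock_pair(first_exclusive, second_exclusive):
    """Open one temp file twice (as two processes would) and lock each handle."""
    with tempfile.NamedTemporaryFile() as tmp:
        fd_a = os.open(tmp.name, os.O_RDONLY)
        fd_b = os.open(tmp.name, os.O_RDONLY)
        try:
            first = try_lock(fd_a, first_exclusive)
            second = try_lock(fd_b, second_exclusive)
            return first, second
        finally:
            # Closing the descriptors also releases their locks.
            os.close(fd_a)
            os.close(fd_b)


# First handle holds an exclusive lock: the second handle cannot even
# take a shared lock and gets errno.EAGAIN (11 on Linux).
writer_then_reader = lock_pair(True, False)

# Both handles only ask for shared locks: both succeed.
reader_then_reader = lock_pair(False, False)

print(writer_then_reader == (None, errno.EAGAIN))
print(reader_then_reader == (None, None))
```

If that assumption holds, it would explain why the second run fails on both the read-write and the read-only attempt: the first run opened every file without a mode and keeps all of them open for as long as the dataset object exists.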

John1231983 · 2019-04-05T16:46:57Z

@fmassa: It does not solve my problem. I got the following error after adding the solution:

 ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread._local objects
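
One way to avoid pickling open h5py objects at all is to keep only the file paths in the dataset and open each file, read-only, inside __getitem__. The following is a minimal sketch of that idea, not the solution from #3415. The class name LazyH5Dataset is hypothetical; the 'data' key and the ./data_h5 layout come from Step 1. Whether read-only opening is enough to avoid the lock error between two terminals depends on the HDF5 build, so treat that part as an assumption to verify.

```python
import glob
import os

import h5py
import numpy as np
import torch
import torch.utils.data as data


class LazyH5Dataset(data.Dataset):  # hypothetical name, not from the thread
    """Stores only file paths and opens each file read-only on access."""

    def __init__(self, root_path):
        # Plain strings pickle cleanly, so the dataset can be sent to
        # DataLoader workers without carrying open file handles.
        self.hdf5_list = sorted(glob.glob(os.path.join(root_path, '*.h5')))

    def __getitem__(self, index):
        # mode 'r' requests read-only access. The file is closed when the
        # with-block exits, so no handle outlives this call.
        with h5py.File(self.hdf5_list[index], 'r') as f:
            array = f['data'][...].astype(np.float32)
        return torch.from_numpy(array)

    def __len__(self):
        return len(self.hdf5_list)
```

It can be passed to the loader from Step 3 in place of custom_h5_loader, for example torch.utils.data.DataLoader(dataset=LazyH5Dataset('./data_h5'), batch_size=2, shuffle=True, num_workers=4). The trade-off is one open and close per sample, which costs some throughput compared with keeping files open.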

CDitzel · 2019-04-06T12:36:34Z

John, I answered you in your thread on the PyTorch board.

fmassa closed this as completed Apr 5, 2019

fmassa added the duplicate label Apr 5, 2019
