Completed code with bug report for hdf5 dataset. How to fix? #18951

Closed
John1231983 opened this issue Apr 5, 2019 · 3 comments

Hello all, I want to report an issue with PyTorch and an HDF5 data loader. The full source code and the error are provided below.
The problem appears when I run test_dataloader.py in two terminals at the same time. That script loads a custom HDF5 dataset (custom_h5_loader). To create the input, first run convert_to_h5 to generate 100 random h5 files.
To reproduce the error, follow these steps.

Step 1: Generate the HDF5 files

from __future__ import print_function
import os

import h5py
import numpy as np

# Create the output directory on first run.
if not os.path.exists('./data_h5'):
    os.makedirs('./data_h5')

# Write 100 files, each holding one random array of shape (1, 3, 128, 128).
for index in range(100):
    data = np.random.uniform(0, 1, size=(3, 128, 128))
    data = data[None, ...]  # add a leading axis
    print(data.shape)
    with h5py.File('./data_h5/%d.h5' % index, 'w') as f:
        f['data'] = data

Step 2: Create a Python file named dataloader.py (the module that test_dataloader.py imports from) and paste the following code

import h5py
import torch.utils.data as data
import glob
import torch
import numpy as np
import os
class custom_h5_loader(data.Dataset):

    def __init__(self, root_path):
        self.hdf5_list = [x for x in glob.glob(os.path.join(root_path, '*.h5'))]
        self.data_list = []
        for ind in range(len(self.hdf5_list)):
            self.h5_file = h5py.File(self.hdf5_list[ind])  # no mode given; every file stays open
            data_i = self.h5_file.get('data')  # dataset handle, not yet read into memory
            self.data_list.append(data_i)

    def __getitem__(self, index):
        self.data = np.asarray(self.data_list[index])  # read from disk on access
        return (torch.from_numpy(self.data).float())

    def __len__(self):
        return len(self.hdf5_list)

Step 3: Create a Python file named test_dataloader.py with the following code

from dataloader import custom_h5_loader
import torch
import torchvision.datasets as dsets

train_h5_dataset = custom_h5_loader('./data_h5')
h5_loader = torch.utils.data.DataLoader(dataset=train_h5_dataset, batch_size=2, shuffle=True, num_workers=4)
for epoch in range(100000):
    for i, data in enumerate(h5_loader):
        print(data.shape)

Step 4: Open a first terminal and run (this works)

python test_dataloader.py

Step 5: Open a second terminal while the first is still running and run the same command (this fails with the error below)

python test_dataloader.py

The error is

Traceback (most recent call last):
  File "/home/john/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 162, in make_fid
    fid = h5f.open(name, h5f.ACC_RDWR, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/john/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 165, in make_fid
    fid = h5f.open(name, h5f.ACC_RDONLY, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test_dataloader.py", line 5, in <module>
    train_h5_dataset = custom_h5_loader('./data_h5')
  File "/home/john/test_hdf5/dataloader.py", line 13, in __init__
    self.h5_file = h5py.File(self.hdf5_list[ind])
  File "/home/john/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 312, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/home/john/anaconda3/lib/python3.6/site-packages/h5py/_hl/files.py", line 167, in make_fid
    fid = h5f.create(name, h5f.ACC_EXCL, fapl=fapl, fcpl=fcpl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 98, in h5py.h5f.create
OSError: Unable to create file (unable to open file: name = './data_h5/47.h5', errno = 17, error message = 'File exists', flags = 15, o_flags = c2)
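
For readers decoding the traceback: errno 11 is EAGAIN on Linux, the error a non-blocking lock request returns when another open handle already holds a conflicting lock. The sketch below uses only the Python standard library and a throwaway temp file (not h5py) to imitate that behavior. It assumes that HDF5 1.10 takes an exclusive flock() for writable opens and a shared one for read-only opens; the helper name try_lock is made up for illustration.

```python
import errno
import fcntl
import os
import tempfile


def try_lock(fd, exclusive):
    """Try a non-blocking flock(); return None on success, else the errno."""
    flags = (fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH) | fcntl.LOCK_NB
    try:
        fcntl.flock(fd, flags)
        return None
    except OSError as exc:
        return exc.errno


# Throwaway file standing in for one of the .h5 files.
tmp_fd, path = tempfile.mkstemp(suffix='.h5')
os.close(tmp_fd)

# "Terminal 1": a writable open takes an exclusive lock (assumed HDF5 behavior).
writer = os.open(path, os.O_RDWR)
first = try_lock(writer, exclusive=True)

# "Terminal 2": a second, independent open of the same file.
# Neither an exclusive nor a shared lock can be granted now.
other = os.open(path, os.O_RDONLY)
second_rw = try_lock(other, exclusive=True)
second_ro = try_lock(other, exclusive=False)

# Once the exclusive lock is released, shared (read-only) locks coexist.
fcntl.flock(writer, fcntl.LOCK_UN)
shared_a = try_lock(writer, exclusive=False)
shared_b = try_lock(other, exclusive=False)

print(first, second_rw, second_ro, shared_a, shared_b)

os.close(writer)
os.close(other)
os.remove(path)
```

If that assumption holds, it would explain why the second run fails even on its read-only retry, and it suggests opening the files with mode 'r' in every process so that only shared locks are requested.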

This is my configuration

HDF5 Version: 1.10.2
Configured on: Wed May  9 23:24:59 UTC 2018
Features:
---------
                  Parallel HDF5: no
             High-level library: yes
                   Threadsafety: yes
PyTorch version (torch.__version__): 1.0.0.dev20181227

fmassa (Member) commented Apr 5, 2019

This happens because hdf5 is not thread safe.

Have a look at #3415 for further discussion and a potential solution.
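
A workaround often suggested for this kind of setup is to store only the file paths in __init__ and open each file read-only inside __getitem__, so that no HDF5 handle exists before the DataLoader workers start and no process asks for a write lock. The following is a minimal sketch under that assumption, not necessarily the solution discussed in #3415. The class name LazyH5Dataset is hypothetical; the 'data' key and the ./data_h5 layout come from the reproduction steps in this issue.

```python
import glob
import os

import h5py
import numpy as np
import torch
import torch.utils.data as data


class LazyH5Dataset(data.Dataset):
    """Hypothetical variant of custom_h5_loader that opens files on demand."""

    def __init__(self, root_path):
        # Keep only the paths; no HDF5 handle is created here.
        self.hdf5_list = sorted(glob.glob(os.path.join(root_path, '*.h5')))

    def __getitem__(self, index):
        # Open read-only, copy the array into memory, and close the
        # file again before returning.
        with h5py.File(self.hdf5_list[index], 'r') as f:
            array = f['data'][()].astype(np.float32)
        return torch.from_numpy(array)

    def __len__(self):
        return len(self.hdf5_list)
```

It would be used as in Step 3, for example torch.utils.data.DataLoader(LazyH5Dataset('./data_h5'), batch_size=2, shuffle=True, num_workers=4). Opening a file for every item costs some throughput; caching one read-only handle per worker is a possible refinement.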

fmassa closed this as completed Apr 5, 2019

John1231983 (Author) commented

@fmassa: That does not solve my problem. I still get an error after adding that solution:

 ForkingPickler(file, protocol).dump(obj)
TypeError: can't pickle _thread._local objects
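
That TypeError is what pickle raises when an object being sent to another process holds a threading.local instance, directly or through an attribute. The sketch below reproduces it with the standard library only and shows one generic way around it: dropping the unpicklable handle in __getstate__ and recreating it lazily on the receiving side. The class names Holder and PicklableHolder are invented for illustration, and whether this matches the code that was tried here is an assumption, since that code is not shown.

```python
import pickle
import threading


class Holder(object):
    """Keeps a thread-local object as an attribute, so it cannot be pickled."""

    def __init__(self, path):
        self.path = path
        self.handle = threading.local()


class PicklableHolder(Holder):
    """Same data, but the handle is dropped when pickling and rebuilt lazily."""

    def __getstate__(self):
        state = self.__dict__.copy()
        state['handle'] = None  # do not ship the unpicklable handle
        return state

    def get_handle(self):
        if self.handle is None:
            self.handle = threading.local()  # recreate after unpickling
        return self.handle


try:
    pickle.dumps(Holder('./data_h5/0.h5'))
    failed = False
except TypeError as exc:
    failed = True
    print('pickling failed:', exc)

restored = pickle.loads(pickle.dumps(PicklableHolder('./data_h5/0.h5')))
print(restored.path, restored.handle)
```

The same pattern applies to any per-process resource held by a dataset: keep the path in the pickled state and rebuild the handle on first use in the worker.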

CDitzel commented Apr 6, 2019

John, I answered you in your thread on the PyTorch board.
