Suggestion/Discussion: read_hdf #7715

Closed
vfilimonov opened this issue Jul 10, 2014 · 3 comments · Fixed by #8373
Labels
API Design IO HDF5 read_hdf, HDFStore
Milestone

Comments

@vfilimonov
Contributor

At the moment read_hdf is implemented in such a way that it opens an HDFStore connection as soon as it is called. If the file does not exist, it creates an HDF file and then throws a KeyError. There are two side effects:

  • You cannot distinguish the case where the file does not exist from the case where the key does not exist in the file
  • If you have a typo in the filename, an empty file is created, polluting the directory

I suggest checking whether the file exists first and throwing an IOError if it does not.
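Until the behavior changes, the suggested check can be applied on the caller's side. This is a minimal sketch, not pandas code: `read_hdf_checked` and its `reader` argument are hypothetical names, and `reader` stands in for `pandas.read_hdf` so the example does not depend on pandas being installed.

```python
import os


def read_hdf_checked(path, key, reader):
    """Raise IOError for a missing file before any store is opened.

    `read_hdf_checked` and `reader` are hypothetical names for illustration;
    `reader` is expected to be called like pandas.read_hdf(path, key).
    """
    if not os.path.isfile(path):
        # Nothing has been opened yet, so no empty file is left behind.
        raise IOError("File %s does not exist" % path)
    # The file exists, so a KeyError raised from here means the key is missing.
    return reader(path, key)
```

With this wrapper, an IOError signals a bad path and a KeyError signals a bad key, so the two cases can be handled separately.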

@jreback
Contributor

jreback commented Jul 10, 2014

Sounds reasonable. The reason for this behavior is that the default mode is a (append), which avoids certain concurrency issues with some older versions of PyTables. In other words, you can open a file multiple times (even across processes) with no problem, but (even though this seems backward) opening in append mode will always work, while opening in read-only mode can raise if the file is already open in append mode elsewhere.

So you can open with mode='r', and this should raise `IOError` if the file doesn't exist.
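The difference between the two modes mirrors ordinary file handling in Python. As an analogy only (this uses the built-in `open`, not PyTables or HDFStore, and the file name is made up), append mode creates a missing file while read-only mode raises:

```python
import os
import tempfile

# A path that does not exist yet, inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "missing.bin")

# Read-only mode refuses to open a file that is not there.
try:
    open(path, "rb")
    read_raised = False
except IOError:  # IOError is an alias of OSError on Python 3
    read_raised = True

# Append mode silently creates the file instead of failing.
with open(path, "ab"):
    pass
created_by_append = os.path.exists(path)
```

Here `read_raised` and `created_by_append` both end up `True`, which is the same split the thread describes between mode='r' and the default append mode.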

The API change is to do the same even if the mode is append AND it's called from read_hdf AND the file doesn't exist.

pull-requests are welcome!

@jreback jreback added this to the 0.15.0 milestone Jul 10, 2014
@jreback jreback modified the milestones: 0.15.1, 0.15.0 Sep 9, 2014
@jbradish
Contributor

I was looking around the issues and this one popped out as relatively simple to fix. I'm new to pandas dev, so forgive me if I'm misunderstanding. Is the proposed fix simply some kind of file existence check in https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py#L299 before we create an entire HDFStore object? Or is there still a need to create an HDFStore object, but raise an IOError instead of a KeyError? When would we still want to create an HDFStore object, even if read_hdf fails?

@jreback
Contributor

jreback commented Sep 23, 2014

Your first suggestion is right. If the file doesn't exist (and you are in read_hdf), then don't create it, but raise an IOError.
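The agreed behavior could be sketched roughly as follows. This is not the actual pandas implementation: `read_hdf_sketch` is a hypothetical name, and `open_store` is a hypothetical factory standing in for constructing an HDFStore, assumed to return an object that supports `store[key]` and `store.close()`.

```python
import os


def read_hdf_sketch(path_or_buf, key, open_store):
    """Sketch of the agreed behavior; not the actual pandas implementation.

    `open_store` is a hypothetical factory standing in for HDFStore(path).
    The returned store is assumed to support store[key] and store.close().
    """
    if isinstance(path_or_buf, str):
        if not os.path.exists(path_or_buf):
            # Fail before constructing a store, so no file is created.
            raise IOError("File %s does not exist" % path_or_buf)
        store = open_store(path_or_buf)
        try:
            return store[key]
        finally:
            # The store was opened here, so it is closed here as well.
            store.close()
    # Already an open store-like object: its lifetime is left to the caller.
    return path_or_buf[key]
```

The existence check runs only when a path string is passed, so opening a store directly in append mode keeps creating the file as before.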

3 participants