Skip to content

allow HDFStore to remain open when TableIterator is returned from read_hdf #3937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

seanyeh
Copy link

@seanyeh seanyeh commented Jun 17, 2013

Hi,
I'm using a TableIterator from pandas.read_hdf function (with the keyword argument iterator=True), I am unable to retrieve any data due to the error "ClosedNodeError: the node object is closed".

For instance:

pandas.DataFrame({'a':[1,2,3], 'b':[4,5,6]}).to_hdf("test.h5", "test", append=True)
it = pandas.read_hdf("test.h5","test",iterator=True)
iter(it).next()

Traceback (most recent call last):
  File "<ipython-input-22-5634d86698ab>", line 1, in <module>
    iter(it).next()
  File "/usr/local/lib/python2.7/site-packages/pandas/io/pytables.py", line 912, in __iter__
    v = self.func(current, stop)
  ...
  File "/usr/local/lib/python2.7/site-packages/tables/node.py", line 355, in _g_check_open
    raise ClosedNodeError("the node object is closed")
ClosedNodeError: the node object is closed

I looked through source code of panda.io.pytables and found that in the
get_store function, store.close() is always run when read_hdf returns, even if
it returns an TableIterator. My assumption is that store should remain open in
order for TableIterator to work. Can you please let me know if this fix is
acceptable, or is there an easier way to do this?

Thanks,
Sean

@jreback
Copy link
Contributor

jreback commented Jun 17, 2013

its only open/closed automatically when you use read_hdf, otherwise use store as normal. The example uses the more verbose syntax. http://pandas.pydata.org/pandas-docs/dev/io.html#iterator

@jreback
Copy link
Contributor

jreback commented Jun 18, 2013

@seanyeh anything further? otherwise pls close

@seanyeh
Copy link
Author

seanyeh commented Jun 18, 2013

I do find it odd that it allows an option that doesn't work. Thanks anyway

@seanyeh seanyeh closed this Jun 18, 2013
@jreback
Copy link
Contributor

jreback commented Jun 18, 2013

where does the option not work? it works exactly how its supposed to in read_hdf, which provides a context to open/close the file. When used with an already open store it doesn't close it.

How else would you expect it to work?

@jtratner
Copy link
Contributor

@jreback what's the point of passing iterator=True if you can't iterate over the result?

It seems intuitive that this should work, right?

for x in pandas.read_hdf("test.h5", "test", iterator=True):
    print x

but according to the example above, that would raise the closed node error.

Maybe it would make more sense to have TableIterator handle the cleanup if read_hdf is passed a path/string instead of an open store? (so, in __iter__, after control passes out of the while loop, do the cleanup that is necessary to close it up).

@jreback
Copy link
Contributor

jreback commented Jun 18, 2013

@jtratner @seanyeh
originally you always had to open/close stores yourself, read_hdf does this for you; I suppose it could be enabled with iterator/chunksize support like above (in which the context manager knowns to close it, but only after iter is done).

I guess if the context manager is passed an open handle then it shouldn't close it...

@jreback jreback reopened this Jun 18, 2013
@adgaudio
Copy link
Contributor

+1 for enabling read_hdf(..., iterator=True).

Arguably, since pytables automatically closes the h5 file via atexit.register(close_open_files), we don't need to explicitly close it.

@jreback
Copy link
Contributor

jreback commented Jun 18, 2013

@adgaudio I disagree.

These files need explicit open/close; or use with a context manager. ]

This is straightforward to address and will be fixed soon.

Relying on the system to close files is not good from a safety perspective nor good programming practice.

@adgaudio
Copy link
Contributor

Yea, true. Thanks everyone - I'll look forward to using the patch :)

@jtratner
Copy link
Contributor

Agree with @jreback. atexit is fragile and there's no reason not to handle
this explicitly.

On a separate note - @is it problematic to pass (or set) a reference to the
store to the TableIterator? Makes it much cleaner to handle that way...

@adgaudio https://github.com/adgaudio I disagree.

These files need explicit open/close; or use with a context manager. ]

This is straightforward to address and will be fixed soon.

Relying on the system to close files is not good from a safety perspective
nor good programming practice.


Reply to this email directly or view it on
GitHubhttps://github.com//pull/3937#issuecomment-19635061
.

@jreback
Copy link
Contributor

jreback commented Jun 18, 2013

support this in PR #3949, was pretty straightforward; handling exceptions is a bit tricky though,

@jtratner used your suggestion thanks

@seanyeh will be available in 0.11.1

@jreback jreback closed this Jun 18, 2013
@seanyeh
Copy link
Author

seanyeh commented Jun 18, 2013

Okay, this is great! thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants