BUG: properly close files opened by parsers #13940
Conversation
Argh, there are still some
@agraboso looks a lot cleaner! https://travis-ci.org/pydata/pandas/jobs/150768150 you can leave the mmap one (if it's not easy to fix). If you can get the stata ones that would be great!
@@ -972,3 +972,5 @@ Bug Fixes

- Bug in ``Index`` raises ``KeyError`` displaying incorrect column when column is not in the df and columns contains duplicate values (:issue:`13822`)
- Bug in ``Period`` and ``PeriodIndex`` creating wrong dates when frequency has combined offset aliases (:issue:`13874`)

- Bug where files opened by parsers (used by `pd.read_csv`, e.g.) were not closed properly. (:issue:`13940`)
Double backticks around `pd.read_csv`. Put this next to the other csv bug fixes.
say when these were not closed properly (e.g. nrows not None; when else?)
are we sure that on iteration (e.g. chunksize / iterator specified) we are closing properly? (e.g. I am not sure if it closes, then re-opens each iteration).
cc @gfyoung do you know?
It does not re-open after each iteration AFAIK. You can see for yourself by writing to a CSV file, opening it with `read_csv` and passing in `chunksize`. Try then deleting the file in a separate window. You should be able to do so.
Indeed, when getting an iterator (`chunksize` or `iterator` specified) we do not close the file. When the iterator has been consumed and raises `StopIteration`, we close the file. If a user requests an iterator but does not consume it fully, s/he must call `.close()` on it, though we could define a `__del__` and take care of that.
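A minimal sketch of that `__del__` safety net, using a toy chunked reader. The names here are illustrative, not the actual pandas `TextFileReader` internals:

```python
import io

class ChunkReader:
    """Toy stand-in for a chunked parser: holds an open handle and
    yields fixed-size batches of lines until the stream is exhausted."""

    def __init__(self, handle, chunksize=2):
        self.handle = handle
        self.chunksize = chunksize

    def __iter__(self):
        return self

    def __next__(self):
        lines = [self.handle.readline() for _ in range(self.chunksize)]
        lines = [ln for ln in lines if ln]
        if not lines:
            self.close()          # only closed once fully consumed...
            raise StopIteration
        return lines

    def close(self):
        if not self.handle.closed:
            self.handle.close()

    def __del__(self):
        # ...so add a safety net for iterators abandoned mid-stream.
        self.close()

handle = io.StringIO("a\nb\nc\n")
reader = ChunkReader(handle)
first = next(reader)     # partial consumption: file is still open
assert not handle.closed
reader.close()           # explicit close (or rely on __del__ at GC time)
assert handle.closed
```

Relying on `__del__` alone is fragile (finalization timing is interpreter-dependent), which is why an explicit `.close()` remains the primary mechanism here.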
Current coverage is 85.31% (diff: 61.44%)

@@             master   #13940   diff @@
==========================================
  Files           139      139
  Lines         50162    50370   +208
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits          42785    42975   +190
- Misses         7377     7395    +18
  Partials          0        0
            self._engine._reader.close()
        except:
            pass
        self._engine.close()
Why can we get rid of this `try`-`except` block?
I shouldn't have removed it. But this was only used by `CParserWrapper`, so I have put it there now in c7e9c9c.
@jreback I have fixed the
        if read_len <= 0:
            # Iterator has finished, should never be here unless
            # we are reading the file incrementally
            try:
you don't need the try/except over everything here. pls narrow down as much as possible.
I narrowed it down to the few places where an explicit `ValueError` might come from another `StataReader` method. It catches those cases that are in the tests, but it is possible for other lines to raise an error that is not caught, which would leave an open file handler.
I don't want to just wrap things at a high level; these should be as granular as possible. These can be actual errors (or errors in the tests, it's hard to tell). Let uncaught resource warnings show through (and then we can investigate these individually).
Notice that my proposal of wrapping most of
@@ -253,6 +253,9 @@ def __init__(self, filepath_or_buffer, index=None, encoding='ISO-8859-1',

        self._read_header()

    def close(self):
        self.filepath_or_buffer.close()
is this always a filehandle-like?
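If it is not always handle-like (e.g. a plain path string could reach this point), a defensive close might check for the method first. A hedged sketch with a hypothetical helper name, not the actual pandas/SAS reader code:

```python
import io

def maybe_close(filepath_or_buffer):
    # filepath_or_buffer may be an open handle or a plain path string,
    # so only call close() when the object actually provides it.
    close = getattr(filepath_or_buffer, "close", None)
    if callable(close):
        close()
        return True
    return False

assert maybe_close(io.StringIO("data")) is True   # handle-like: closed
assert maybe_close("data.sas7bdat") is False      # plain path: no-op
```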
hah! this turned into a rabbit hole! lmk when you are ready.
Tell me about it! It's possible that other parsers leave open files too, but I'd leave that for another PR. I think this one is ready.
And it's all green!
thanks @agraboso, I merged. left a couple comments that might be relevant for the next PR :) we need to try to fix the remaining
@@ -393,11 +393,15 @@ def _read(filepath_or_buffer, kwds):
            raise NotImplementedError("'nrows' and 'chunksize' cannot be used"
                                      " together yet.")
        elif nrows is not None:
-           return parser.read(nrows)
+           data = parser.read(nrows)
if you do

    try:
        return parser.read(nrows)
    finally:
        parser.close()

I think you can eliminate the `.close()` calls where you raise on an exception (also needs changing at L402), and it's slightly more idiomatic. That way exceptions will bubble up, but we will still close.
git diff upstream/master | flake8 --diff
Addresses most of the `ResourceWarning`s from #13932. There were three tests that opened files but did not close them. The rest were due to parsers (`CParserWrapper` and `PythonParser`) opening files or streams but never closing them. There was one case, the `mmap` one, that was tricky because there was a file handle going into it and immediately reaching a refcount of zero.

I am at a loss about writing new tests for this. Is it even possible?