BUG: properly close files opened by parsers #13940


Closed
wants to merge 14 commits into from

Conversation

agraboso
Contributor

@agraboso agraboso commented Aug 8, 2016

Addresses most of the ResourceWarnings from #13932. There were three tests that opened files but did not close them. The rest were due to parsers (CParserWrapper and PythonParser) opening files or streams but never closing them. There was one case — the mmap one — that was tricky because a file handle went into it and immediately reached a refcount of zero.

I am at a loss about writing new tests for this. Is it even possible?
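One possible approach (a sketch, assuming CPython's prompt finalization of unreferenced file objects; `leaks_file_handle`, `leaky` and `clean` are hypothetical names, not pandas APIs) is to record warnings and force a collection:

```python
import gc
import warnings

def leaks_file_handle(func, *args, **kwargs):
    # Record every warning, run the function, then force finalization so
    # that any file left open emits its ResourceWarning inside this block.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args, **kwargs)
        gc.collect()
    return any(issubclass(w.category, ResourceWarning) for w in caught)

def leaky(path):
    open(path).read()       # file object is never explicitly closed

def clean(path):
    with open(path) as f:   # context manager closes the file
        f.read()
```

On CPython, `leaks_file_handle(leaky, some_path)` returns True and `leaks_file_handle(clean, some_path)` returns False; on interpreters without refcounting the timing of the warning is less predictable.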

@agraboso
Contributor Author

agraboso commented Aug 8, 2016

Argh, there are still some ResourceWarnings left... Let me investigate.

@jreback jreback added IO CSV read_csv, to_csv IO Stata read_stata, to_stata Clean labels Aug 8, 2016
@jreback
Contributor

jreback commented Aug 8, 2016

@agraboso looks a lot cleaner! https://travis-ci.org/pydata/pandas/jobs/150768150

you can leave the mmap one (if it's not easy to fix). If you can get the stata ones, that would be great!

@@ -972,3 +972,5 @@ Bug Fixes

- Bug in ``Index`` raises ``KeyError`` displaying incorrect column when column is not in the df and columns contains duplicate values (:issue:`13822`)
- Bug in ``Period`` and ``PeriodIndex`` creating wrong dates when frequency has combined offset aliases (:issue:`13874`)

- Bug where files opened by parsers (used by `pd.read_csv`, e.g.) were not closed properly. (:issue:`13940`)
Contributor

Use double backticks around pd.read_csv. Put this next to the other CSV bug fixes.

Contributor

say when these were not closed properly (e.g. nrows not None; when else?)

Contributor

are we sure that on iteration (e.g. chunksize / iterator specified) we are closing properly? (e.g. I am not sure if it closes, then re-opens each iteration).

cc @gfyoung do you know?

Member

It does not re-open after each iteration AFAIK. You can see for yourself by writing to a CSV file, opening it with read_csv and passing in chunksize. Try then deleting the file in a separate window. You should be able to do so.

Contributor Author

@agraboso agraboso Aug 9, 2016

Indeed, when getting an iterator (chunksize or iterator specified) we do not close the file. When the iterator has been consumed and raises StopIteration, we close the file. If a user requests an iterator but does not consume it fully, they must call .close() on it — though we could define a __del__ and take care of that.
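The semantics described here can be sketched like this (`ChunkIterator` is a hypothetical stand-in, not the actual pandas reader class):

```python
class ChunkIterator:
    """The file stays open while iterating and is closed on exhaustion;
    a caller who abandons the iterator early must call close() itself."""

    def __init__(self, path, chunksize):
        self._f = open(path)
        self._chunksize = chunksize

    def __iter__(self):
        return self

    def __next__(self):
        chunk = []
        for _ in range(self._chunksize):
            line = self._f.readline()
            if line:
                chunk.append(line)
        if not chunk:
            self.close()        # close only once the data is exhausted
            raise StopIteration
        return chunk

    def close(self):
        if not self._f.closed:
            self._f.close()
```

Fully consuming the iterator (e.g. with `list()`) closes the file; stopping after one `next()` leaves it open until `close()` is called, mirroring the behavior discussed above.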

@codecov-io

codecov-io commented Aug 8, 2016

Current coverage is 85.31% (diff: 61.44%)

Merging #13940 into master will increase coverage by 0.02%

@@             master     #13940   diff @@
==========================================
  Files           139        139          
  Lines         50162      50370   +208   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42785      42975   +190   
- Misses         7377       7395    +18   
  Partials          0          0          

Powered by Codecov. Last update 5df9123...3fa7d25

try:
    self._engine._reader.close()
except:
    pass
self._engine.close()
Member

Why can we get rid of this try-except block?

Contributor Author

@agraboso agraboso Aug 9, 2016

I shouldn't have removed it. But this was only used by CParserWrapper, so I have put it there now in c7e9c9c.

@agraboso
Contributor Author

agraboso commented Aug 9, 2016

@jreback The mmap one is fixed since the first commit — I was just making the point that it took me a while to understand what was wrong.

I have fixed the stata cases too:

  • StataWriter was easy: I moved the opening of the file to the write_file method. The file is thus always opened and closed within it, whether errors are raised or not.
  • StataReader did not close the file when raising an error. I fixed it by wrapping most of the logic in its read method in a try/except block. This even takes care of closing the file when a StopIteration is raised.
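The close-on-error pattern described for StataReader.read can be sketched as follows (`ReaderSketch` is a hypothetical minimal model, not the real class):

```python
class ReaderSketch:
    """A reader that closes its file if reading fails, while still
    letting the error propagate to the caller."""

    def __init__(self, path):
        self._f = open(path)

    def close(self):
        self._f.close()

    def read(self):
        try:
            data = self._f.read()
            if not data:
                # stand-in for an error raised mid-read
                raise ValueError("no data to read")
        except Exception:
            self.close()    # release the file handle...
            raise           # ...but re-raise so the error is still visible
        return data
```

On failure the caller sees the original exception and the file is closed; on success the file stays open for further reads until `close()` is called.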

if read_len <= 0:
    # Iterator has finished, should never be here unless
    # we are reading the file incrementally
    try:
Contributor

@jreback jreback Aug 9, 2016

you don't need the try/except over everything here. Please narrow it down as much as possible.

Contributor Author

I narrowed it down to the few places where an explicit ValueError might come from another StataReader method. It catches the cases that appear in the tests, but other lines could still raise an error that is not caught — which would leave an open file handle.

@jreback
Contributor

jreback commented Aug 9, 2016

I don't want to just wrap things at a high level; these should be as granular as possible. These can be actual errors (or errors in the tests; it's hard to tell). Let uncaught ResourceWarnings show through (and then we can investigate them individually).

@agraboso
Contributor Author

agraboso commented Aug 9, 2016

Notice that my proposal of wrapping most of StataReader.read in a try/except block had both a self.close() and a raise, so that we would still see all warnings and errors while making sure we do close the file.

@@ -253,6 +253,9 @@ def __init__(self, filepath_or_buffer, index=None, encoding='ISO-8859-1',

self._read_header()

def close(self):
    self.filepath_or_buffer.close()
Contributor

is this always a filehandle-like?

Contributor Author

It's either a file handle coming from L244 or a BytesIO object coming from L252. So it is always safe to close it.
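Both objects expose the same close() interface; a minimal check:

```python
import io

# A BytesIO object supports close()/closed just like a real file handle,
# so calling close() unconditionally is safe for either case.
buf = io.BytesIO(b"payload")
buf.close()
assert buf.closed
```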

@jreback
Contributor

jreback commented Aug 10, 2016

hah! this turned into a rabbit hole! lmk when you are ready.

@agraboso
Contributor Author

agraboso commented Aug 11, 2016

hah! this turned into a rabbit hole!

Tell me about it! It's possible that other parsers leave open files too, but I'd leave that for another PR. I think this one is ready.

lmk when you are ready.

And it's all green!

@jreback jreback closed this in 4a80521 Aug 11, 2016
@jreback
Contributor

jreback commented Aug 11, 2016

thanks @agraboso, I merged. Left a couple of comments that might be relevant for the next PR :)

we need to try to fix the remaining ResourceWarnings, then raise if ANY are detected (it's actually pretty easy). In pandas.util.testing.TestCase.setUpClass you can set the warning to error. This way it's not set globally when testing_mode() is called (in case pandas.util.testing is just imported by a user and the test suite isn't actually run).
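A sketch of that suggestion (`ResourceStrictTestCase` is a hypothetical name; the real hook would live on pandas.util.testing.TestCase, and the filter applies process-wide once set):

```python
import unittest
import warnings

class ResourceStrictTestCase(unittest.TestCase):
    # Escalate ResourceWarning to an error only when the test suite
    # actually runs, not merely when the testing module is imported.
    @classmethod
    def setUpClass(cls):
        warnings.simplefilter("error", ResourceWarning)
```

After setUpClass runs, any test that triggers a ResourceWarning fails with an exception instead of printing a warning.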

@jreback jreback added this to the 0.19.0 milestone Aug 11, 2016
@@ -393,11 +393,15 @@ def _read(filepath_or_buffer, kwds):
        raise NotImplementedError("'nrows' and 'chunksize' cannot be used"
                                  " together yet.")
    elif nrows is not None:
-       return parser.read(nrows)
+       data = parser.read(nrows)
Contributor

@jreback jreback Aug 11, 2016

if you do

try:
    return parser.read(nrows)
finally:
    parser.close()

I think you can eliminate the .close() calls where you raise on an exception (this also needs changing at L402), and it's slightly more idiomatic.

That way exceptions will bubble up, but we will still close.
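A runnable sketch of this suggestion, with `DummyParser` as a hypothetical stand-in for the real parser object:

```python
class DummyParser:
    """Hypothetical parser used only to demonstrate the control flow."""

    def __init__(self, rows, fail=False):
        self.rows = rows
        self.fail = fail
        self.closed = False

    def read(self, nrows):
        if self.fail:
            raise ValueError("parse error")
        return self.rows[:nrows]

    def close(self):
        self.closed = True

def read_and_close(parser, nrows):
    try:
        return parser.read(nrows)
    finally:
        parser.close()   # runs whether read succeeds or raises
```

On success the rows are returned and the parser is closed; on failure the exception bubbles up and the parser is still closed, with no explicit close-on-error calls needed.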

Labels
Clean IO CSV read_csv, to_csv IO Stata read_stata, to_stata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CLN: resource warnings
4 participants