
Ensure file is closed promptly in case of error #35587

Merged: 1 commit, Sep 13, 2020
doc/source/whatsnew/v1.2.0.rst (1 addition, 0 deletions)

@@ -289,6 +289,7 @@ MultiIndex
 I/O
 ^^^

+- :func:`read_sas` no longer leaks resources on failure (:issue:`35566`)
 - Bug in :meth:`to_csv` caused a ``ValueError`` when it was called with a filename in combination with ``mode`` containing a ``b`` (:issue:`35058`)
 - In :meth:`read_csv` ``float_precision='round_trip'`` now handles ``decimal`` and ``thousands`` parameters (:issue:`35365`)
 - :meth:`to_pickle` and :meth:`read_pickle` were closing user-provided file objects (:issue:`35679`)
pandas/io/sas/sas7bdat.py (6 additions, 2 deletions)

@@ -142,8 +142,12 @@ def __init__(
             self._path_or_buf = open(self._path_or_buf, "rb")
             self.handle = self._path_or_buf

-        self._get_properties()
-        self._parse_metadata()
+        try:
+            self._get_properties()
+            self._parse_metadata()
+        except Exception:
+            self.close()
+            raise

     def column_data_lengths(self):
         """Return a numpy int64 array of the column data lengths"""
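The change above follows a general pattern worth naming: when `__init__` acquires a resource and a later step fails, the half-constructed object is unreachable to the caller, so the constructor itself must release the resource before re-raising. A minimal sketch (hypothetical `SketchReader`, not the actual pandas code; the `handle` parameter stands in for `open(path, "rb")`):

```python
import io


class SketchReader:
    """Hypothetical reader illustrating close-on-failed-construction."""

    def __init__(self, handle, fail=False):
        self.handle = handle
        try:
            self._parse_metadata(fail)  # may raise on corrupt input
        except Exception:
            self.close()  # release the handle promptly...
            raise         # ...then propagate the original error

    def _parse_metadata(self, fail):
        if fail:
            raise ValueError("corrupt header")

    def close(self):
        self.handle.close()


handle = io.BytesIO(b"\x00")
try:
    SketchReader(handle, fail=True)
except ValueError:
    pass
print(handle.closed)  # True: closed despite the failed __init__
```

Without the `try`/`except` in the constructor, `handle` would stay open until garbage collection, which is exactly the leak the PR title describes.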
pandas/io/sas/sas_xport.py (5 additions, 1 deletion)

@@ -264,7 +264,11 @@ def __init__(
         # should already be opened in binary mode in Python 3.
         self.filepath_or_buffer = filepath_or_buffer

-        self._read_header()
+        try:
+            self._read_header()
+        except Exception:
+            self.close()
+            raise

     def close(self):
         self.filepath_or_buffer.close()
pandas/io/sas/sasreader.py (5 additions, 5 deletions)

@@ -136,8 +136,8 @@ def read_sas(
     if iterator or chunksize:
        return reader

-    data = reader.read()
-
-    if ioargs.should_close:
-        reader.close()
-    return data
+    try:
+        return reader.read()
+    finally:
+        if ioargs.should_close:
+            reader.close()

Review comment (Contributor):

Is it needed to catch the error (and close) at the lower level as well? Maybe only catch here.

Reply (Contributor, Author):

The close at the lower level (sas7bdat.py, sas_xport.py) matches the get_filepath_or_buffer call 10 lines earlier, and the various calls to close scattered around those files.
It is possible (if both lower-level files are only ever used via sasreader.py) that all the lower-level close calls are unnecessary, but I don't think they do any harm, so I would prefer to keep the code symmetrical (open/close) for the moment and potentially remove all the close calls in a separate PR.
What do you think?

Reply (Contributor):

Ok, fair enough.
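The `try`/`finally` rewrite in read_sas is behavior-preserving on the success path because a `finally` block runs after the `return` value has been computed; the difference is that cleanup now also runs when `reader.read()` raises. A small standalone demonstration (hypothetical `read_sas_sketch`, not pandas code):

```python
events = []


def read_sas_sketch(fail=False):
    events.append("open")
    try:
        if fail:
            raise RuntimeError("parse error")
        return "data"  # the value is computed before finally runs
    finally:
        events.append("close")  # runs on both success and failure


print(read_sas_sketch())  # data
try:
    read_sas_sketch(fail=True)
except RuntimeError:
    pass
print(events)  # ['open', 'close', 'open', 'close']
```

On both calls the `close` event fires, which is why the old `read → close → return` sequence could be collapsed into `try: return ... finally: close()`.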
pandas/tests/io/sas/data/corrupt.sas7bdat (binary file added, not shown)
pandas/tests/io/sas/test_sas7bdat.py (8 additions, 0 deletions)

@@ -217,6 +217,14 @@ def test_zero_variables(datapath):
         pd.read_sas(fname)


+def test_corrupt_read(datapath):
+    # We don't really care about the exact failure; the important thing is
+    # that the resource is cleaned up afterwards (BUG #35566)
+    fname = datapath("io", "sas", "data", "corrupt.sas7bdat")
+    with pytest.raises(AttributeError):
+        pd.read_sas(fname)
+
+
 def round_datetime_to_ms(ts):
     if isinstance(ts, datetime):
         return ts.replace(microsecond=int(round(ts.microsecond, -3) / 1000) * 1000)
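The test above only asserts that the corrupt file raises; the cleanup itself is exercised indirectly. One way a leak could be checked directly is to intercept `open()` and assert that every handle handed out during the failing call was closed. A sketch of such a helper (hypothetical `track_open_files`; not part of the pandas test suite, and the `open(os.devnull)` call merely stands in for `pd.read_sas(corrupt_file)`):

```python
import builtins
import os
from contextlib import contextmanager


@contextmanager
def track_open_files():
    """Temporarily wrap builtins.open and record every handle it returns."""
    handles = []
    real_open = builtins.open

    def recording_open(*args, **kwargs):
        f = real_open(*args, **kwargs)
        handles.append(f)
        return f

    builtins.open = recording_open
    try:
        yield handles
    finally:
        builtins.open = real_open  # always restore the real open()


# Usage: any handle still open after the block would signal a leak.
with track_open_files() as handles:
    try:
        open(os.devnull).close()  # stand-in for the failing read_sas call
    except Exception:
        pass
assert all(f.closed for f in handles)
```

A `ResourceWarning` filter (`-W error::ResourceWarning`) is another common way to surface unclosed files, though it depends on garbage-collection timing rather than asserting promptness.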