Commit 00ab182

BUG: Ensure incomplete stata files are deleted
Attempt to delete failed writes and warn if not able to delete
1 parent 122edfc commit 00ab182

3 files changed

+24
-3
lines changed

doc/source/whatsnew/v0.24.0.rst

+2-1
@@ -366,7 +366,7 @@ Other Enhancements
 - :meth:`Index.difference` now has an optional ``sort`` parameter to specify whether the results should be sorted if possible (:issue:`17839`)
 - :meth:`read_excel()` now accepts ``usecols`` as a list of column names or callable (:issue:`18273`)
 - :meth:`MultiIndex.to_flat_index` has been added to flatten multiple levels into a single-level :class:`Index` object.
-- :meth:`DataFrame.to_stata` and :class:` pandas.io.stata.StataWriter117` can write mixed sting columns to Stata strl format (:issue:`23633`)
+- :meth:`DataFrame.to_stata` and :class:`pandas.io.stata.StataWriter117` can write mixed sting columns to Stata strl format (:issue:`23633`)
 - :meth:`DataFrame.between_time` and :meth:`DataFrame.at_time` have gained the an ``axis`` parameter (:issue: `8839`)
 - :class:`IntervalIndex` has gained the :attr:`~IntervalIndex.is_overlapping` attribute to indicate if the ``IntervalIndex`` contains any overlapping intervals (:issue:`23309`)

@@ -1561,6 +1561,7 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
 - :func:`DataFrame.to_string()`, :func:`DataFrame.to_html()`, :func:`DataFrame.to_latex()` will correctly format output when a string is passed as the ``float_format`` argument (:issue:`21625`, :issue:`22270`)
 - Bug in :func:`read_csv` that caused it to raise ``OverflowError`` when trying to use 'inf' as ``na_value`` with integer index column (:issue:`17128`)
 - Bug in :func:`json_normalize` that caused it to raise ``TypeError`` when two consecutive elements of ``record_path`` are dicts (:issue:`22706`)
+- Bug in :meth:`DataFrame.to_stata`, :class:`pandas.io.stata.StataWriter` and :class:`pandas.io.stata.StataWriter117` where a exception would leave a partially written and invalid dta file (:issue:`23573`)
 
 Plotting
 ^^^^^^^^

pandas/io/stata.py

+12-1
@@ -12,6 +12,7 @@
 
 from collections import OrderedDict
 import datetime
+import os
 import struct
 import sys
 import warnings
@@ -2209,7 +2210,17 @@ def write_file(self):
             self._write_value_labels()
             self._write_file_close_tag()
             self._write_map()
-        finally:
+        except Exception as exc:
+            self._close()
+            try:
+                if self._own_file:
+                    os.unlink(self._fname)
+            except Exception:
+                warnings.warn('This save was not successful but {0} could not '
+                              'be deleted. This file is not '
+                              'valid.'.format(self._fname), ResourceWarning)
+            raise exc
+        else:
             self._close()
 
     def _close(self):
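
The hunk above swaps an unconditional `finally` for an `except`/`else` pair, so that a failed write closes the handle, tries to delete the partial file, and re-raises. The same pattern can be sketched with only the standard library. `write_all_or_nothing` and `failing_chunks` are hypothetical names used for illustration and are not part of pandas. Because the helper always opens the file itself, it has no equivalent of the `_own_file` check in the commit.

```python
import os
import tempfile
import warnings


def write_all_or_nothing(path, chunks):
    """Write every chunk to ``path``; remove the file if any write fails.

    Hypothetical helper for illustration only, not a pandas API.
    """
    handle = open(path, "wb")
    try:
        for chunk in chunks:
            handle.write(chunk)
    except Exception:
        # Close before deleting: an open file cannot be unlinked on Windows.
        handle.close()
        try:
            os.unlink(path)
        except OSError:
            warnings.warn(
                "This save was not successful but {0} could not be deleted. "
                "This file is not valid.".format(path),
                ResourceWarning,
            )
        raise  # re-raise the original error after cleanup
    else:
        handle.close()


def failing_chunks():
    """Yield one chunk, then fail, to simulate an error part-way through."""
    yield b"header"
    raise ValueError("simulated failure part-way through the write")


tmp_dir = tempfile.mkdtemp()

# A failed write should propagate the error and leave no file behind.
bad_path = os.path.join(tmp_dir, "bad.bin")
try:
    write_all_or_nothing(bad_path, failing_chunks())
except ValueError:
    pass
file_left_behind = os.path.exists(bad_path)

# A successful write keeps the complete file.
good_path = os.path.join(tmp_dir, "good.bin")
write_all_or_nothing(good_path, [b"header", b"body"])
```

The sketch uses a bare `raise` where the commit uses `raise exc`. Both propagate the original exception, and the bare form is the usual idiom on Python 3.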

pandas/tests/io/test_stata.py

+10-1
@@ -16,7 +16,7 @@
 import pandas as pd
 import pandas.util.testing as tm
 import pandas.compat as compat
-from pandas.compat import iterkeys
+from pandas.compat import iterkeys, PY3
 from pandas.core.dtypes.common import is_categorical_dtype
 from pandas.core.frame import DataFrame, Series
 from pandas.io.parsers import read_csv
@@ -1546,3 +1546,12 @@ def test_all_none_exception(self, version):
                 output.to_stata(path, version=version)
         assert 'Only string-like' in excinfo.value.args[0]
         assert 'Column `none`' in excinfo.value.args[0]
+
+    @pytest.mark.parametrize('version', [114, 117])
+    def test_invalid_file_not_written(self, version):
+        content = 'Here is one __�__ Another one __·__ Another one __½__'
+        df = DataFrame([content], columns=['invalid'])
+        expected_exc = UnicodeEncodeError if PY3 else UnicodeDecodeError
+        with tm.ensure_clean() as path:
+            with pytest.raises(expected_exc):
+                df.to_stata(path)
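
The new test triggers the failure through its test string. As a rough illustration of why that string fails on Python 3, the sketch below assumes the writer encodes string data as latin-1. That is an assumption about the writer at this commit, since the encoding does not appear in the diff. Under that assumption, U+FFFD has no latin-1 representation, while the middle dot and the one-half sign do.

```python
# Same characters as the test string, written as escapes for clarity.
content = 'Here is one __\ufffd__ Another one __\u00b7__ Another one __\u00bd__'

# Assumption: string columns are encoded as latin-1 before being written.
try:
    content.encode('latin-1')
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

# The exception records the slice of the string that could not be encoded.
offending = content[encode_error.start:encode_error.end]

# The other two special characters encode without error.
encodable = '\u00b7\u00bd'.encode('latin-1')
```

Here `offending` is the single replacement character, so it is the only one of the three special characters that cannot be encoded.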
