
Commit ae7b6d6

CLN/FIX/PERF: Don't buffer entire Stata file into memory (#49228)
* CLN: StataReader: refactor repeated struct.unpack/read calls to helpers
* CLN: StataReader: replace string concatenations with f-strings
* CLN: StataReader: prefix internal state with underscore
* FIX: StataReader: defer opening file to when data is required
* FIX: StataReader: don't buffer entire file into memory unless necessary

  Refs #48922

* DOC: Note that StataReaders are context managers
* FIX: StataReader: don't close stream implicitly
* Apply review changes
1 parent 25775ba commit ae7b6d6

4 files changed: +351 / -253 lines

doc/source/user_guide/io.rst

+8
@@ -6033,6 +6033,14 @@ values will have ``object`` data type.
 ``int64`` for all integer types and ``float64`` for floating point data. By default,
 the Stata data types are preserved when importing.
 
+.. note::
+
+   All :class:`~pandas.io.stata.StataReader` objects, whether created by :func:`~pandas.read_stata`
+   (when using ``iterator=True`` or ``chunksize``) or instantiated by hand, must be used as context
+   managers (e.g. the ``with`` statement).
+   While the :meth:`~pandas.io.stata.StataReader.close` method is available, its use is unsupported.
+   It is not part of the public API and will be removed in the future without warning.
+
 .. ipython:: python
    :suppress:
 
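A minimal sketch of the context-manager usage that the note above requires, assuming pandas 2.0 or later. The temporary file, its name `example.dta`, the column names and the chunk size of 4 are illustrative assumptions, not part of this commit.

```python
import os
import tempfile

import pandas as pd

# Illustrative data; column names and values are arbitrary.
df = pd.DataFrame({"x": range(10), "y": [i * 0.5 for i in range(10)]})

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, used only for this example.
    path = os.path.join(tmpdir, "example.dta")
    df.to_stata(path, write_index=False)

    # With ``chunksize`` set, read_stata returns a StataReader rather than
    # a DataFrame. The ``with`` block guarantees the file is closed on exit.
    chunks = []
    with pd.read_stata(path, chunksize=4) as reader:
        for chunk in reader:
            chunks.append(chunk)

# The chunks are ordinary in-memory DataFrames and outlive the reader.
result = pd.concat(chunks, ignore_index=True)
print(len(chunks), result.shape)
```

Because the reader is only guaranteed to be usable inside the `with` block, each chunk is collected there and concatenated afterwards.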

doc/source/whatsnew/v2.0.0.rst

+3
@@ -857,6 +857,7 @@ Deprecations
 - Deprecated :meth:`Series.backfill` in favor of :meth:`Series.bfill` (:issue:`33396`)
 - Deprecated :meth:`DataFrame.pad` in favor of :meth:`DataFrame.ffill` (:issue:`33396`)
 - Deprecated :meth:`DataFrame.backfill` in favor of :meth:`DataFrame.bfill` (:issue:`33396`)
+- Deprecated :meth:`~pandas.io.stata.StataReader.close`. Use :class:`~pandas.io.stata.StataReader` as a context manager instead (:issue:`49228`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_200.prior_deprecations:
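The deprecation entry above asks callers to replace explicit `close()` calls with a context manager. A minimal sketch of that migration for a `StataReader` instantiated by hand, assuming pandas 2.0 or later; the file name `example.dta` and the data are hypothetical.

```python
import os
import tempfile

import pandas as pd
from pandas.io.stata import StataReader

df = pd.DataFrame({"a": [1, 2, 3]})  # illustrative data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.dta")  # hypothetical file name
    df.to_stata(path, write_index=False)

    # Deprecated pattern (explicit close), shown for comparison only:
    #     reader = StataReader(path)
    #     data = reader.read()
    #     reader.close()

    # Supported pattern: the context manager releases the file handle
    # even if read() raises.
    with StataReader(path) as reader:
        data = reader.read()

print(data["a"].tolist())
```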
@@ -1163,6 +1164,8 @@ Performance improvements
 - Fixed a reference leak in :func:`read_hdf` (:issue:`37441`)
 - Fixed a memory leak in :meth:`DataFrame.to_json` and :meth:`Series.to_json` when serializing datetimes and timedeltas (:issue:`40443`)
 - Decreased memory usage in many :class:`DataFrameGroupBy` methods (:issue:`51090`)
+- Memory improvement in :class:`StataReader` when reading seekable files (:issue:`48922`)
+
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_200.bug_fixes:
