
Commit dd1f69e

Danferno, Jesse, and mroeschke authored
ENH: Update DataFrame.to_stata to handle pd.NA and None values in strL columns (#61286)
* ENH: Update DataFrame.to_stata to handle pd.NA and None values in strL columns
* Update pandas/io/stata.py

  Co-authored-by: Matthew Roeschke <[email protected]>
* Moved changelog msg to 3.0.0 and adapted phrasing

---------

Co-authored-by: Jesse <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>
1 parent c38e1f1 commit dd1f69e

3 files changed: +17 -2 lines changed

Diff for: doc/source/whatsnew/v3.0.0.rst

+1
@@ -767,6 +767,7 @@ I/O
 - Bug in :meth:`DataFrame.to_dict` raises unnecessary ``UserWarning`` when columns are not unique and ``orient='tight'``. (:issue:`58281`)
 - Bug in :meth:`DataFrame.to_excel` when writing empty :class:`DataFrame` with :class:`MultiIndex` on both axes (:issue:`57696`)
 - Bug in :meth:`DataFrame.to_excel` where the :class:`MultiIndex` index with a period level was not a date (:issue:`60099`)
+- Bug in :meth:`DataFrame.to_stata` when exporting a column containing both long strings (Stata strL) and :class:`pd.NA` values (:issue:`23633`)
 - Bug in :meth:`DataFrame.to_stata` when writing :class:`DataFrame` and ``byteorder=`big```. (:issue:`58969`)
 - Bug in :meth:`DataFrame.to_stata` when writing more than 32,000 value labels. (:issue:`60107`)
 - Bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)

Diff for: pandas/io/stata.py

+2 -2
@@ -3196,8 +3196,8 @@ def generate_table(self) -> tuple[dict[str, tuple[int, int]], DataFrame]:
         for o, (idx, row) in enumerate(selected.iterrows()):
             for j, (col, v) in enumerate(col_index):
                 val = row[col]
-                # Allow columns with mixed str and None (GH 23633)
-                val = "" if val is None else val
+                # Allow columns with mixed str and None or pd.NA (GH 23633)
+                val = "" if isna(val) else val
                 key = gso_table.get(val, None)
                 if key is None:
                     # Stata prefers human numbers
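
The effect of swapping `val is None` for `isna(val)` can be sketched outside the writer. The snippet below is a minimal illustration, not the pandas implementation: `normalize_strl_value` is a hypothetical helper that mirrors only the changed line, and it uses the public `pd.isna` rather than the internal `isna` import in `pandas/io/stata.py`.

```python
import pandas as pd


def normalize_strl_value(val):
    """Hypothetical helper mirroring the changed line: map missing values to ""."""
    return "" if pd.isna(val) else val


samples = (None, pd.NA, "text")

# The old identity check only matches None; pd.NA is a separate singleton.
old_check = [v is None for v in samples]
# pd.isna treats both None and pd.NA as missing.
new_check = [bool(pd.isna(v)) for v in samples]

print(old_check)  # [True, False, False]
print(new_check)  # [True, True, False]
print([normalize_strl_value(v) for v in samples])  # ['', '', 'text']
```

With the identity check, `pd.NA` passes through unchanged and reaches the strL table as a non-string value; with `pd.isna`, both kinds of missing value are written as empty strings.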

Diff for: pandas/tests/io/test_stata.py

+14
@@ -2587,3 +2587,17 @@ def test_many_strl(temp_file, version):
     lbls = ["".join(v) for v in itertools.product(*([string.ascii_letters] * 3))]
     value_labels = {"col": {i: lbls[i] for i in range(n)}}
     df.to_stata(temp_file, value_labels=value_labels, version=version)
+
+
+@pytest.mark.parametrize("version", [117, 118, 119, None])
+def test_strl_missings(temp_file, version):
+    # GH 23633
+    # Check that strl supports None and pd.NA
+    df = DataFrame(
+        [
+            {"str1": "string" * 500, "number": 0},
+            {"str1": None, "number": 1},
+            {"str1": pd.NA, "number": 1},
+        ]
+    )
+    df.to_stata(temp_file, version=version)
