Commit b91af62

WillAyd authored and TomAugspurger committed
Propagating NaN values when using str.split (#18450) (#18462)
(cherry picked from commit 20f6512)
1 parent 604fcb6 commit b91af62

3 files changed, +21 -1 lines changed


doc/source/whatsnew/v0.21.1.txt

+5 -1
@@ -140,9 +140,13 @@ Categorical
 - ``CategoricalIndex`` can now correctly take a ``pd.api.types.CategoricalDtype`` as its dtype (:issue:`18116`)
 - Bug in ``Categorical.unique()`` returning read-only ``codes`` array when all categories were ``NaN`` (:issue:`18051`)

+String
+^^^^^^
+
+- :meth:`Series.str.split()` will now propagate ``NaN`` values across all expanded columns instead of ``None`` (:issue:`18450`)
+
 Other
 ^^^^^

 -
 -
--

pandas/core/strings.py

+4
@@ -1423,6 +1423,10 @@ def cons_row(x):
                     return [x]

             result = [cons_row(x) for x in result]
+            if result:
+                # propagate nan values to match longest sequence (GH 18450)
+                max_len = max(len(x) for x in result)
+                result = [x * max_len if x[0] is np.nan else x for x in result]

         if not isinstance(expand, bool):
             raise ValueError("expand must be True or False")
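
The padding step in this hunk can be tried in isolation. The following is a minimal sketch using only the standard library: `NAN` stands in for `np.nan`, `propagate_nan` is a hypothetical wrapper name, and `cons_row` is simplified to an `isinstance` check rather than whatever list-like test pandas uses.

```python
import math

# Stand-in for np.nan. The patch uses an identity check (`is`),
# so a single shared object is used here as well.
NAN = float("nan")


def cons_row(x):
    """Wrap scalars in a list (simplified version of the helper in the diff)."""
    return x if isinstance(x, list) else [x]


def propagate_nan(rows):
    """Repeat a row that starts with NAN so it matches the longest row."""
    rows = [cons_row(x) for x in rows]
    if rows:  # max() would raise ValueError on an empty sequence
        max_len = max(len(x) for x in rows)
        rows = [x * max_len if x[0] is NAN else x for x in rows]
    return rows


# A split row next to a missing value, as in the commit's test case.
rows = propagate_nan([["foo", "bar", "baz"], NAN])
print(rows)
```

As in the patch, the `if rows:` guard only protects `max()` from an empty input. A row that is itself an empty list would still fail at `x[0]`, which this sketch does not attempt to handle.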

pandas/tests/test_strings.py

+12
@@ -2086,6 +2086,18 @@ def test_rsplit_to_multiindex_expand(self):
         tm.assert_index_equal(result, exp)
         assert result.nlevels == 2

+    def test_split_nan_expand(self):
+        # gh-18450
+        s = Series(["foo,bar,baz", NA])
+        result = s.str.split(",", expand=True)
+        exp = DataFrame([["foo", "bar", "baz"], [NA, NA, NA]])
+        tm.assert_frame_equal(result, exp)
+
+        # check that these are actually np.nan and not None
+        # TODO see GH 18463
+        # tm.assert_frame_equal does not differentiate
+        assert all(np.isnan(x) for x in result.iloc[1])
+
     def test_split_with_name(self):
         # GH 12617
