
Commit a192650

montanalow authored and TomAugspurger committed
BUG: Don't raise exceptions splitting a blank string (#20067)
* don't raise exceptions splitting a blank string
1 parent 699a48b commit a192650


3 files changed (+16, -1 lines)

doc/source/whatsnew/v0.23.0.txt

+1

@@ -843,6 +843,7 @@ Categorical
   ``self`` but in a different order (:issue:`19551`)
 - Bug in :meth:`Index.astype` with a categorical dtype where the resultant index is not converted to a :class:`CategoricalIndex` for all types of index (:issue:`18630`)
 - Bug in :meth:`Series.astype` and ``Categorical.astype()`` where an existing categorical data does not get updated (:issue:`10696`, :issue:`18593`)
+- Bug in :meth:`Series.str.split` with ``expand=True`` incorrectly raising an IndexError on empty strings (:issue:`20002`).
 - Bug in :class:`Index` constructor with ``dtype=CategoricalDtype(...)`` where ``categories`` and ``ordered`` are not maintained (issue:`19032`)
 - Bug in :class:`Series` constructor with scalar and ``dtype=CategoricalDtype(...)`` where ``categories`` and ``ordered`` are not maintained (issue:`19565`)
 - Bug in ``Categorical.__iter__`` not converting to Python types (:issue:`19909`)
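The empty row behind this IndexError comes from Python's own `str.split` semantics. With no separator, splitting a blank or whitespace-only string returns an empty list, so there is no element 0 to inspect. The following plain-Python illustration does not involve pandas; it only shows the standard-library behavior that the fix has to handle:

```python
# With no separator, str.split() collapses runs of whitespace and drops
# leading/trailing whitespace, so blank input yields no fields at all.
assert "a b c".split() == ["a", "b", "c"]
assert "".split() == []
assert " ".split() == []

# With an explicit separator, the result always has at least one field.
assert "".split(" ") == [""]

# Reading the first field of an empty row is what raised the IndexError.
try:
    [][0]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```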

pandas/core/strings.py

+2 -1

@@ -1749,7 +1749,8 @@ def cons_row(x):
             if result:
                 # propagate nan values to match longest sequence (GH 18450)
                 max_len = max(len(x) for x in result)
-                result = [x * max_len if x[0] is np.nan else x for x in result]
+                result = [x * max_len if len(x) == 0 or x[0] is np.nan
+                          else x for x in result]
 
         if not isinstance(expand, bool):
             raise ValueError("expand must be True or False")
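The fix relies on Python's `or` short-circuiting. When a row is empty, `len(x) == 0` is already true, so `x[0]` is never evaluated, and `[] * max_len` is still `[]`. The sketch below copies the comprehension from before and after the change into two standalone functions so the difference can be seen without pandas. The names `pad_rows_old`, `pad_rows_new` and `NAN` are hypothetical. `NAN` is a single shared float standing in for `np.nan`, so that the identity check behaves as it does in the diff.

```python
# Stand-in for np.nan: one shared float object, so `x[0] is NAN`
# mirrors the identity check `x[0] is np.nan` in the diff.
NAN = float("nan")


def pad_rows_old(result):
    """Pre-fix comprehension: reads x[0] on every row."""
    max_len = max(len(x) for x in result)
    return [x * max_len if x[0] is NAN else x for x in result]


def pad_rows_new(result):
    """Post-fix comprehension: `len(x) == 0` is tested first, so `or`
    short-circuits and x[0] is never read on an empty row."""
    max_len = max(len(x) for x in result)
    return [x * max_len if len(x) == 0 or x[0] is NAN
            else x for x in result]


# Rows as produced by splitting 'a b c', 'a b', '', ' ' plus a missing
# value wrapped in a one-element list (as cons_row does for non-lists).
rows = [["a", "b", "c"], ["a", "b"], [], [], [NAN]]

# A NaN row is repeated to the longest length; empty rows stay empty.
print(pad_rows_new(rows))
# [['a', 'b', 'c'], ['a', 'b'], [], [], [nan, nan, nan]]

try:
    pad_rows_old(rows)
except IndexError as exc:
    print("old logic fails:", exc)
```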

pandas/tests/test_strings.py

+13

@@ -1992,6 +1992,19 @@ def test_rsplit(self):
         exp = Series([['a_b', 'c'], ['c_d', 'e'], NA, ['f_g', 'h']])
         tm.assert_series_equal(result, exp)
 
+    def test_split_blank_string(self):
+        # expand blank split GH 20067
+        values = Series([''], name='test')
+        result = values.str.split(expand=True)
+        exp = DataFrame([[]])
+        tm.assert_frame_equal(result, exp)
+
+        values = Series(['a b c', 'a b', '', ' '], name='test')
+        result = values.str.split(expand=True)
+        exp = DataFrame([['a', 'b', 'c'], ['a', 'b', np.nan],
+                         [np.nan, np.nan, np.nan], [np.nan, np.nan, np.nan]])
+        tm.assert_frame_equal(result, exp)
+
     def test_split_noargs(self):
         # #1859
         s = Series(['Wes McKinney', 'Travis Oliphant'])
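The second test case expects a rectangular result. Every row is as wide as the longest split, and the gaps are missing values. The hypothetical pure-Python helper `expand_split` below sketches that expected shape. It models the test's `exp` rather than pandas' internal code path, and it uses `float("nan")` for the missing values.

```python
def expand_split(values):
    """Split each string on whitespace and pad every row with NaN
    to the width of the longest row (hypothetical helper)."""
    rows = [s.split() for s in values]
    width = max((len(r) for r in rows), default=0)
    return [r + [float("nan")] * (width - len(r)) for r in rows]


for row in expand_split(["a b c", "a b", "", " "]):
    print(row)
# ['a', 'b', 'c']
# ['a', 'b', nan]
# [nan, nan, nan]
# [nan, nan, nan]

# A lone blank string gives one row with zero columns, mirroring the
# DataFrame([[]]) expected in the first test case.
print(expand_split([""]))  # [[]]
```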
