Skip to content

ENH: Make explode work for sets #35637

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Sep 5, 2020
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ Other enhancements

- Added :meth:`~DataFrame.set_flags` for setting table-wide flags on a ``Series`` or ``DataFrame`` (:issue:`28394`)
- :class:`Index` with object dtype supports division and multiplication (:issue:`34160`)
-
- :meth:`DataFrame.explode` and :meth:`Series.explode` now support exploding of sets (:issue:`35614`)
-

.. _whatsnew_120.api_breaking.python:
Expand Down
6 changes: 4 additions & 2 deletions pandas/_libs/reshape.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,8 @@ def explode(ndarray[object] values):
counts = np.zeros(n, dtype='int64')
for i in range(n):
v = values[i]
if c_is_list_like(v, False):

if c_is_list_like(v, True):
if len(v):
counts[i] += len(v)
else:
Expand All @@ -138,8 +139,9 @@ def explode(ndarray[object] values):
for i in range(n):
v = values[i]

if c_is_list_like(v, False):
if c_is_list_like(v, True):
if len(v):
v = list(v)
for j in range(len(v)):
result[count] = v[j]
count += 1
Expand Down
7 changes: 4 additions & 3 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -7091,10 +7091,11 @@ def explode(

Notes
-----
This routine will explode list-likes including lists, tuples,
This routine will explode list-likes including lists, tuples, sets,
Series, and np.ndarray. The result dtype of the subset rows will
be object. Scalars will be returned unchanged. Empty list-likes will
result in a np.nan for that row.
be object. Scalars will be returned unchanged, and empty list-likes will
result in a np.nan for that row. In addition, the ordering of rows in the
output will be non-deterministic when exploding sets.

Examples
--------
Expand Down
7 changes: 4 additions & 3 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -3829,10 +3829,11 @@ def explode(self, ignore_index: bool = False) -> "Series":

Notes
-----
This routine will explode list-likes including lists, tuples,
This routine will explode list-likes including lists, tuples, sets,
Series, and np.ndarray. The result dtype of the subset rows will
be object. Scalars will be returned unchanged. Empty list-likes will
result in a np.nan for that row.
be object. Scalars will be returned unchanged, and empty list-likes will
result in a np.nan for that row. In addition, the ordering of elements in
the output will be non-deterministic when exploding sets.

Examples
--------
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/frame/methods/test_explode.py
Original file line number Diff line number Diff line change
Expand Up @@ -172,3 +172,11 @@ def test_ignore_index():
{"id": [0, 0, 10, 10], "values": list("abcd")}, index=[0, 1, 2, 3]
)
tm.assert_frame_equal(result, expected)


def test_explode_sets():
# https://github.com/pandas-dev/pandas/issues/35614
df = pd.DataFrame({"a": [{"x", "y"}], "b": [1]}, index=[1])
result = df.explode(column="a").sort_values(by="a")
expected = pd.DataFrame({"a": ["x", "y"], "b": [1, 1]}, index=[1, 1])
tm.assert_frame_equal(result, expected)
8 changes: 8 additions & 0 deletions pandas/tests/series/methods/test_explode.py
Original file line number Diff line number Diff line change
Expand Up @@ -126,3 +126,11 @@ def test_ignore_index():
result = s.explode(ignore_index=True)
expected = pd.Series([1, 2, 3, 4], index=[0, 1, 2, 3], dtype=object)
tm.assert_series_equal(result, expected)


def test_explode_sets():
# https://github.com/pandas-dev/pandas/issues/35614
s = pd.Series([{"a", "b", "c"}], index=[1])
result = s.explode().sort_values()
expected = pd.Series(["a", "b", "c"], index=[1, 1, 1])
tm.assert_series_equal(result, expected)