Skip to content

BUG: do not crash on a callable grouper returning tuples #22279

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions doc/source/whatsnew/v0.23.5.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,10 +20,11 @@ and bug fixes. We recommend that all users upgrade to this version.
Fixed Regressions
~~~~~~~~~~~~~~~~~

- Constructing a DataFrame with an index argument that wasn't already an
- Constructing a :class:`DataFrame` with an index argument that wasn't already an
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let' move to 0.24

instance of :class:`~pandas.core.Index` was broken in `4efb39f
<https://github.com/pandas-dev/pandas/commit/4efb39f01f5880122fa38d91e12d217ef70fad9e>`_ (:issue:`22227`).
-
- Passing :meth:`DataFrame.groupby` as grouper a callable or mapping which
returns tuples was broken in 0.21.1 (:issue:`22257`).
-

.. _whatsnew_0235.bug_fixes:
Expand Down
9 changes: 9 additions & 0 deletions pandas/core/groupby/grouper.py
Original file line number Diff line number Diff line change
Expand Up @@ -508,6 +508,15 @@ def _get_grouper(obj, key=None, axis=0, level=None, sort=True,
warnings.warn(msg, FutureWarning, stacklevel=5)
key = list(key)

if callable(key) or isinstance(key, dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the isinstance(key, dict) new here? If so probably want a test case to cover

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(done - expanded the test)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you integrate this with the below (e.g. i/elif maybe enough)?
can add a top-levevel comment on what is going on here

if level is None:
key = group_axis.map(key)
else:
key = group_axis.get_level_values(level=level).map(key)
# If the grouper is a mapping, 'level' is _only_ used to determine
# the mapping input
level = None

if not isinstance(key, list):
keys = [key]
match_axis_length = False
Expand Down
19 changes: 18 additions & 1 deletion pandas/tests/groupby/test_grouping.py
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,23 @@ def test_grouper_creation_bug(self):
expected = s.groupby(level='one').sum()
assert_series_equal(result, expected)

@pytest.mark.parametrize('func', [False, True])
def test_grouper_returning_tuples(self, func):
# GH 22257 , both with dict and with callable
df = pd.DataFrame({'X': ['A', 'B', 'A', 'B'], 'Y': [1, 4, 3, 2]})
mapping = dict(zip(range(4), [('C', 5), ('D', 6)] * 2))

if func:
gb = df.groupby(by=lambda idx: mapping[idx], sort=False)
else:
gb = df.groupby(by=mapping, sort=False)

name, expected = list(gb)[0]
assert name == ('C', 5)
result = gb.get_group(name)

assert_frame_equal(result, expected)

def test_grouper_column_and_index(self):
# GH 14327

Expand Down Expand Up @@ -346,7 +363,7 @@ def test_groupby_grouper_f_sanity_checked(self):
# when the elements are Timestamp.
# the result is Index[0:6], very confusing.

pytest.raises(AssertionError, ts.groupby, lambda key: key[0:6])
pytest.raises(ValueError, ts.groupby, lambda key: key[0:6])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The single callable/mapping case used to follow the same code path as lists of callables/mappings, while it now follows the same code path as e.g. lists of Series. The two happen to check the length of the grouper in different ways (the former should probably be changed to ValueError too, but we might decide to remove it altogether in #22278).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice this test will also change when we remove the absurd DatetimeIndex.map implementation (see #22312 ) but that is presumably not happening in a bugfix release.


def test_grouping_error_on_multidim_input(self, df):
pytest.raises(ValueError,
Expand Down