-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: groupby any/all raising on extension types #40621
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: groupby any/all raising on extension types #40621
Conversation
pandas/core/groupby/groupby.py
Outdated
if is_object_dtype(vals): | ||
vals = np.array([bool(x) for x in vals]) | ||
else: | ||
if isinstance(vals, ExtensionArray): | ||
vals = vals.to_numpy(dtype=bool, na_value=np.nan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in these cases we may want to call the EA method rather than cast to numpy, cc @jorisvandenbossche ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep not sure this is the best approach, was just looking at how masked groupby quantile was handled (IIUC the only masked groupby algo, along with this one). Want to start looking into converting cython algos to better handle EA through taking masks, any advice on a good approach here and elsewhere would be much appreciated.
What do you mean by call the EA method? Won't we eventually need to end up as an ndarray for cython algo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also pls use elif
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/groupby/groupby.py
Outdated
if is_object_dtype(vals): | ||
vals = np.array([bool(x) for x in vals]) | ||
else: | ||
if isinstance(vals, ExtensionArray): | ||
vals = vals.to_numpy(dtype=bool, na_value=np.nan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also pls use elif
pandas/tests/groupby/test_groupby.py
Outdated
@@ -1994,6 +1994,19 @@ def test_bool_aggs_dup_column_labels(bool_agg_func): | |||
tm.assert_frame_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize("bool_agg_func", ["any", "all"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be in test_function.py, next to / part of the test_groupby_bool_aggs which fully parametrizes these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
@mzeitlin11 - Does this PR also impact SeriesGroupBy? |
Thanks for bringing this up! Was also failing for |
@pytest.mark.parametrize("bool_agg_func", ["any", "all"]) | ||
def test_bool_aggs_dup_column_labels(bool_agg_func): | ||
# 21668 | ||
df = DataFrame([[True, True]], columns=["a", "a"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add in skipna as a parameter in these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have added skipna parameterization to test_bool_aggs_dup_column_labels
(this was just a test moved to keep any/all tests together)
For the added nullable dtype test below, problem is that skipna=False
case doesn't currently give the correct output (#37506)
Going to close in favor of a more comprehensive approach for #37506, for which this does not end up a helpful starting point |
Will follow up by looking into #37506 so that the
skipna=False
case can also be handled correctly.