Skip to content

BUG: groupby any/all raising on extension types #40621

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

mzeitlin11
Copy link
Member

Will follow up by looking into #37506 so that the skipna=False case can also be handled correctly.

@mzeitlin11 mzeitlin11 added Bug Groupby NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Mar 24, 2021
if is_object_dtype(vals):
vals = np.array([bool(x) for x in vals])
else:
if isinstance(vals, ExtensionArray):
vals = vals.to_numpy(dtype=bool, na_value=np.nan)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in these cases we may want to call the EA method rather than cast to numpy, cc @jorisvandenbossche ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep not sure this is the best approach, was just looking at how masked groupby quantile was handled (IIUC the only masked groupby algo, along with this one). Want to start looking into converting cython algos to better handle EA through taking masks, any advice on a good approach here and elsewhere would be much appreciated.

What do you mean by call the EA method? Won't we eventually need to end up as an ndarray for cython algo?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also pls use elif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

if is_object_dtype(vals):
vals = np.array([bool(x) for x in vals])
else:
if isinstance(vals, ExtensionArray):
vals = vals.to_numpy(dtype=bool, na_value=np.nan)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also pls use elif

@@ -1994,6 +1994,19 @@ def test_bool_aggs_dup_column_labels(bool_agg_func):
tm.assert_frame_equal(result, expected)


@pytest.mark.parametrize("bool_agg_func", ["any", "all"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be in test_function.py, next to / part of the test_groupby_bool_aggs which fully parametrizes these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

@rhshadrach
Copy link
Member

@mzeitlin11 - Does this PR also impact SeriesGroupBy?

@mzeitlin11
Copy link
Member Author

@mzeitlin11 - Does this PR also impact SeriesGroupBy?

Thanks for bringing this up! Was also failing for SeriesGroupBy, have amended test and whatsnew

@pytest.mark.parametrize("bool_agg_func", ["any", "all"])
def test_bool_aggs_dup_column_labels(bool_agg_func):
# 21668
df = DataFrame([[True, True]], columns=["a", "a"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add in skipna as a parameter in these

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have added skipna parameterization to test_bool_aggs_dup_column_labels (this was just a test moved to keep any/all tests together)

For the added nullable dtype test below, problem is that skipna=False case doesn't currently give the correct output (#37506)

@mzeitlin11
Copy link
Member Author

Going to close in favor of a more comprehensive approach for #37506, for which this does not end up a helpful starting point

@mzeitlin11 mzeitlin11 closed this Apr 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby NA - MaskedArrays Related to pd.NA and nullable extension arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Groupby.any(skipna=True)
4 participants