-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Preserving boolean dtype in Series.any/all function with level keyword #33449 #33543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @KenilMehta! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-04-12 14:15:01 UTC |
pandas/core/generic.py
Outdated
@@ -11222,6 +11222,8 @@ def logical_func(self, axis=0, bool_only=None, skipna=True, level=None, **kwargs | |||
raise NotImplementedError( | |||
"Option bool_only is not implemented with option level." | |||
) | |||
if isinstance(self, pd.Series) and self.dtype.name == "boolean": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pd.Series -> ABCSeries
is_bool_dtype(self.dtype)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel
We need to distinguish between bool and boolean and is_bool_dtype() returns true in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to distinguish between bool and boolean and is_bool_dtype() returns true in both cases.
Might be worth something like is_bool_na_dtype
@jbrockmendel @jorisvandenbossche How do I solve this? |
For the linting, running |
tm.assert_series_equal(s.all(level=0), Series([False, True, False])) | ||
tm.assert_series_equal(s.any(level=0), Series([False, True, True])) | ||
tm.assert_series_equal(s.all(level=0), Series([False, True, False], dtype="boolean")) | ||
tm.assert_series_equal(s.any(level=0), Series([False, True, True], dtype="boolean")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the pattern
result = ...
expected = ...
tm.assert_series_equal(result, expected)
@jreback |
@jreback @jbrockmendel |
method = getattr(type(self), name) | ||
applyf = lambda x: method(x, axis=axis, skipna=skipna, **kwargs) | ||
result = grouped.aggregate(applyf) | ||
if isinstance(self, ABCSeries) and self.dtype.name == "boolean": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you instead trace exactly where this is not getting converted propertly; likely its in EA method itself. We can to handle much lower down rather than a specific case here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pandas/core/groupby/groupby.py file, there is a _get_cythonized_result() function which has takes a pre_processing lambda where the return type of result is decided.
We can have an if/else statement there which checks the dtype of input and decides the dtype of the result.
@@ -912,8 +912,13 @@ def test_all_any_boolean(self): | |||
index=[0, 0, 1, 1, 2, 2], | |||
dtype="boolean", | |||
) | |||
tm.assert_series_equal(s.all(level=0), Series([False, True, False])) | |||
tm.assert_series_equal(s.any(level=0), Series([False, True, True])) | |||
result = s.all(level=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you pull L910 out to a new test. Then parameterize on bool and boolean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay
@KenilMehta if you want to merge master and respond to comments will look |
@KenilMehta can you merge upstream/master to resolve conflict. |
@KenilMehta is this still active? |
Closing for now. @KenilMehta ping us whenever you'd like to continue and we'll reopen! |
@arw2019 |
Note that the issue is also going to be solved indirectly by #40819, which fixes the actual groupby algorithm to work better for boolean dtypes. |
Thanks for the PR @KenilMehta, but does appear that #40819 has solved the original issue. Going to close this PR, but we're always happy to have other PRs tackling other issues. |
No description provided.