Skip to content

Preserving boolean dtype in Series.any/all function with level keyword #33449 #33543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

KenilMehta
Copy link
Contributor

No description provided.

@pep8speaks
Copy link

pep8speaks commented Apr 14, 2020

Hello @KenilMehta! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-12 14:15:01 UTC

@@ -11222,6 +11222,8 @@ def logical_func(self, axis=0, bool_only=None, skipna=True, level=None, **kwargs
raise NotImplementedError(
"Option bool_only is not implemented with option level."
)
if isinstance(self, pd.Series) and self.dtype.name == "boolean":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pd.Series -> ABCSeries

is_bool_dtype(self.dtype)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel
We need to distinguish between bool and boolean and is_bool_dtype() returns true in both cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thanks

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to distinguish between bool and boolean and is_bool_dtype() returns true in both cases.

Might be worth something like is_bool_na_dtype

@KenilMehta
Copy link
Contributor Author

KenilMehta commented Apr 14, 2020

@jbrockmendel @jorisvandenbossche
As you can see in the following image, one of the CI checks is failing.
ss

How do I solve this?

@jbrockmendel
Copy link
Member

For the linting, running black pandas should fix it

tm.assert_series_equal(s.all(level=0), Series([False, True, False]))
tm.assert_series_equal(s.any(level=0), Series([False, True, True]))
tm.assert_series_equal(s.all(level=0), Series([False, True, False], dtype="boolean"))
tm.assert_series_equal(s.any(level=0), Series([False, True, True], dtype="boolean"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the pattern

result = ...
expected = ...
tm.assert_series_equal(result, expected)

@KenilMehta
Copy link
Contributor Author

@jreback
I have made the changes you requested. Can you please review it once again?

@KenilMehta
Copy link
Contributor Author

@jreback @jbrockmendel
Are there any more changes that need to be done?

method = getattr(type(self), name)
applyf = lambda x: method(x, axis=axis, skipna=skipna, **kwargs)
result = grouped.aggregate(applyf)
if isinstance(self, ABCSeries) and self.dtype.name == "boolean":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you instead trace exactly where this is not getting converted propertly; likely its in EA method itself. We can to handle much lower down rather than a specific case here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In pandas/core/groupby/groupby.py file, there is a _get_cythonized_result() function which has takes a pre_processing lambda where the return type of result is decided.
We can have an if/else statement there which checks the dtype of input and decides the dtype of the result.

@@ -912,8 +912,13 @@ def test_all_any_boolean(self):
index=[0, 0, 1, 1, 2, 2],
dtype="boolean",
)
tm.assert_series_equal(s.all(level=0), Series([False, True, False]))
tm.assert_series_equal(s.any(level=0), Series([False, True, True]))
result = s.all(level=0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you pull L910 out to a new test. Then parameterize on bool and boolean.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay

@jreback jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Jun 14, 2020
@jreback
Copy link
Contributor

jreback commented Jul 17, 2020

@KenilMehta if you want to merge master and respond to comments will look

@simonjayhawkins
Copy link
Member

@KenilMehta can you merge upstream/master to resolve conflict.

@dsaxton dsaxton added the Stale label Sep 16, 2020
@jbrockmendel jbrockmendel added the Reduction Operations sum, mean, min, max, etc. label Sep 23, 2020
@WillAyd
Copy link
Member

WillAyd commented Sep 30, 2020

@KenilMehta is this still active?

@arw2019
Copy link
Member

arw2019 commented Nov 6, 2020

Closing for now. @KenilMehta ping us whenever you'd like to continue and we'll reopen!

@arw2019 arw2019 closed this Nov 6, 2020
@KenilMehta
Copy link
Contributor Author

@arw2019
Sorry
I couldn't complete the merge. Can you reopen this issue?
I would like to continue this issue.

@jorisvandenbossche
Copy link
Member

Note that the issue is also going to be solved indirectly by #40819, which fixes the actual groupby algorithm to work better for boolean dtypes.

@mroeschke
Copy link
Member

Thanks for the PR @KenilMehta, but does appear that #40819 has solved the original issue. Going to close this PR, but we're always happy to have other PRs tackling other issues.

@mroeschke mroeschke closed this Apr 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Reduction Operations sum, mean, min, max, etc. Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants