API: disallow div/floordiv/pow operators for BooleanArray ? #41165

Open
jorisvandenbossche opened this issue Apr 26, 2021 · 7 comments
Labels
API - Consistency (Internal Consistency of API/Behavior); NA - MaskedArrays (Related to pd.NA and nullable extension arrays); Numeric Operations (Arithmetic, Comparison, and Logical operations)

Comments

@jorisvandenbossche
Member

jorisvandenbossche commented Apr 26, 2021

Currently, for the plain bool dtype we explicitly check for some operations and raise an error, while those actually work in numpy. For example:

>>> arr = np.array([True, False, True])
>>> arr / True
array([1., 0., 1.])

>>> pd.Series(arr) / True
...
NotImplementedError: operator '/' not implemented for bool dtypes

This is done for the division and power operations (not_allowed={"/", "//", "**"}):

if op_str in not_allowed:
    raise NotImplementedError(
        f"operator {repr(op_str)} not implemented for bool dtypes"
    )
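
As a rough illustration of when such a check fires, the sketch below reimplements the idea outside pandas. The helper name `check_bool_arith` and its signature are hypothetical and are not the pandas internals; the sketch only assumes, as described in this issue, that the error is raised when both operands are boolean and the operator is one of the three listed above.

```python
import numpy as np

# Operators named in this issue as disallowed for bool dtypes.
NOT_ALLOWED = {"/", "//", "**"}


def check_bool_arith(op_str, left, right):
    """Hypothetical stand-in for the check; not the pandas implementation."""
    # np.asarray accepts arrays as well as scalars such as True or 1.
    both_bool = (
        np.asarray(left).dtype.kind == "b"
        and np.asarray(right).dtype.kind == "b"
    )
    if both_bool and op_str in NOT_ALLOWED:
        raise NotImplementedError(
            f"operator {op_str!r} not implemented for bool dtypes"
        )


arr = np.array([True, False, True])

check_bool_arith("/", arr, 1)     # bool / int: passes silently
check_bool_arith("+", arr, True)  # "+" is not in the disallowed set

try:
    check_bool_arith("/", arr, True)  # bool / bool: rejected
except NotImplementedError as err:
    message = str(err)
```

The bool / int case passing is what makes `pd.Series(arr) / 1` work while `pd.Series(arr) / True` raises.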

For the nullable BooleanArray, we currently just rely on the operations as defined by the underlying numpy bool array:

>>> pd.array(arr) / True
<FloatingArray>
[1.0, 0.0, 1.0]
Length: 3, dtype: Float64

That is the behavior of BooleanArray itself. For Series and DataFrame, the check is currently done at the "array_op" level, but because it lives in expressions.py, it is not run for EAs (xref #41161).

So questions:

  • Do we actually want to continue disallowing this operation, while numpy allows it? (also, eg pd.Series(arr) / 1 does work, it's only disallowed if both operands are boolean)
  • If we keep disallowing it, we should probably also disallow it on the BooleanArray level, and not only check it on the DataFrame/Series ops level in array_ops.py ?
jorisvandenbossche added the API - Consistency, NA - MaskedArrays, and Numeric Operations labels and removed the Bug and Needs Triage labels on Apr 26, 2021.

@dsaxton
Member

dsaxton commented May 3, 2021

@jorisvandenbossche What's the argument for disallowing at the moment? To me it seems more natural / Pythonic to allow this operation since booleans are essentially ints.
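
The plain-Python side of this argument can be checked directly. The snippet below is a minimal sketch using only built-in booleans (the variable names are arbitrary); it shows that Python itself treats `bool` as a subclass of `int` and evaluates division and power on booleans numerically.

```python
# In Python, bool is a subclass of int, so arithmetic falls back to int rules.
is_int_subclass = issubclass(bool, int)

true_div = True / True     # true division yields a float
floor_div = True // True   # floor division yields an int
power = True ** False      # power yields an int (1 ** 0)
total = True + True        # addition yields the int 2, not a bool

print(is_int_subclass, true_div, floor_div, power, total)
```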

@jorisvandenbossche
Member Author

I am not sure what the historic reasons are.
Maybe because those operations were regarded as not that useful (although I would say it's up to the user to decide that), or as potentially confusing because they don't have a "boolean" interpretation, but only a numerical one (eg + and * still are a boolean operation resulting in booleans, and eg - raises an error in numpy about not being supported for booleans).
It's only for division and power that the booleans are interpreted as numeric values (and currently in pandas the user needs to be explicit about such casting).
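
The distinction drawn above can be seen in a short NumPy session. This is a minimal sketch with arbitrary example arrays; it covers only NumPy's behavior on plain boolean arrays and makes no claim about what pandas does.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([True, True, False])

added = a + b       # element-wise logical OR, result stays boolean
multiplied = a * b  # element-wise logical AND, result stays boolean
divided = a / True  # booleans are treated as 0/1, result is float64

# Subtraction is rejected by NumPy for boolean arrays.
try:
    a - b
except TypeError:
    subtract_supported = False
else:
    subtract_supported = True

print(added, multiplied, divided, subtract_supported)
```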

@dsaxton
Member

dsaxton commented May 4, 2021

@jorisvandenbossche Yeah, I can see the argument of "why would anyone do this?" but if it's easy to allow and makes for greater consistency with numpy and Python more generally, I personally like the idea of allowing this rather than making a special case here.

@jbrockmendel
Member

Conditional on the Series behavior (which I wouldn't object to deprecating), I lean towards having BooleanArray behave like Series, i.e. raising here.

@jbrockmendel
Member

Current thought here: BooleanArray (and IntegerArray and FloatingArray) ops should be wrappers around their core.ops.array_ops counterparts. This will allow for pushing more logic down from BooleanArray/NumericArray into BaseMaskedArray, which in turn will make it easier to extend BaseMaskedArray to wrap arbitrary dtypes.

@wence-
Contributor

wence- commented Dec 2, 2022

> or as potentially confusing because they don't have a "boolean" interpretation

FWIW, booleans are a canonical identification for GF(2) so I would argue that these operations are all well-defined and have a unique interpretation.
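
For readers unfamiliar with the reference, here is a minimal sketch of GF(2) arithmetic on Python booleans, assuming the usual identification False = 0 and True = 1. The function names are made up for this illustration. Note that GF(2) addition is XOR (addition modulo 2), which differs from NumPy's `+` on boolean arrays, where it acts as logical OR.

```python
def gf2_add(a: bool, b: bool) -> bool:
    """Addition modulo 2, i.e. XOR."""
    return a != b


def gf2_mul(a: bool, b: bool) -> bool:
    """Multiplication modulo 2, i.e. AND."""
    return a and b


def gf2_div(a: bool, b: bool) -> bool:
    """Division in GF(2); only division by 1 (True) is defined."""
    if not b:
        raise ZeroDivisionError("division by zero in GF(2)")
    # The only nonzero element is 1, which is its own inverse.
    return a


print(gf2_add(True, True))   # 1 + 1 wraps around to 0
print(gf2_mul(True, True))
print(gf2_div(False, True))
```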

@jbrockmendel
Member

> GF(2)

We don't have an official policy on this, but in general pandas is more averse to overflows than numpy, which corresponds to not treating arithmetic as modular.

4 participants