Skip to content

API: setting np.nan into Series[bool] #38692

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Dec 25, 2020 · 11 comments · Fixed by #38709
Closed

API: setting np.nan into Series[bool] #38692

jbrockmendel opened this issue Dec 25, 2020 · 11 comments · Fixed by #38709
Labels
API - Consistency Internal Consistency of API/Behavior Dtype Conversions Unexpected or buggy dtype conversions
Milestone

Comments

@jbrockmendel
Copy link
Member

ser = pd.Series([True, False, True, False])
ser[0] = np.nan

>>> ser
0    NaN
1    0.0
2    1.0
3    0.0
dtype: float64

It would be more consistent (and would simplify the code) if this cast to object and did not convert True/False to 1.0/0.0. ATM we have 2 tests that such a change would break.

@jbrockmendel jbrockmendel added Bug Needs Triage Issue that has not been reviewed by a pandas team member API - Consistency Internal Consistency of API/Behavior Dtype Conversions Unexpected or buggy dtype conversions and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 25, 2020
@jreback jreback added this to the 1.3 milestone Dec 27, 2020
@jorisvandenbossche
Copy link
Member

Isn't this actually consistent with how other numpy dtypes work? We generally upcast on setitem-like operations, and for boolean this means upcasting to float (like integers also get upcasted to float)?

@jbrockmendel
Copy link
Member Author

I think it is consistent with how numpy works, but bool-casting is one of the cases where we often behave differently from numpy. Ideally we would consistently use find_common_type:

>>> find_common_type([np.float64, np.bool_])
dtype('O')

@jreback
Copy link
Contributor

jreback commented Dec 28, 2020

i agree with @jbrockmendel here

this is something we should do correctly

numpy does this because it for backwards compatibility not because it's the right thing to do

we want to preserve dtypes whenever possible

bools are a wart when converted to float (ideally we would actually just convert to Boolean though)

@jorisvandenbossche
Copy link
Member

Where is this find_common_type used? Since it are not extension dtypes, for example concat also doesn't follow that:

In [17]: pd.concat([pd.Series([True, False]), pd.Series([0.1, 2])])
Out[17]: 
0    1.0
1    0.0
0    0.1
1    2.0
dtype: float64

@jorisvandenbossche
Copy link
Member

bools are a wart when converted to float

Bools are just as much a wart when converted to object dtype, though.
There is no such thing as "correct" IMO, just choices with trade-offs (and for sure, we should discuss those trade-offs and make a choice, but converting to object dtype being "correct" is not an argument IMO)

@jbrockmendel
Copy link
Member Author

Where is this find_common_type used?

If we were to change this behavior, the change would look like #38709, which uses coerce_to_target_dtype in place of the custom logic currently in Block.setitem. coerce_to_target_dtype now uses find_common_type.

for example concat also doesn't follow that:

I haven't looked at concat all that closely, but based on the example you gave id like that to be object too.

@jreback
Copy link
Contributor

jreback commented Jan 1, 2021

what if we convert directly to Boolean here? that is the long term type we want anyhow.

@jbrockmendel
Copy link
Member Author

what if we convert directly to Boolean here? that is the long term type we want anyhow.

long-term that'll be nice, but its still opt-in. Plus that comes with surprising behavior changes (copy/view semantics, 1D/2D shenanigans)

@jreback
Copy link
Contributor

jreback commented Jan 20, 2021

what if we convert directly to Boolean here? that is the long term type we want anyhow.

long-term that'll be nice, but its still opt-in. Plus that comes with surprising behavior changes (copy/view semantics, 1D/2D shenanigans)

hmm I actually think changing to nullable int would totally be fine here (sure its a breaking change), but so is what you are proposing (convert to object)

@jbrockmendel
Copy link
Member Author

nullable int

-1 while it is still opt-in

@jbrockmendel
Copy link
Member Author

-1 while it is still opt-in

or more specifically, until we have a BooleanArray-backed index, so we can keep the Series/Index behavior consistent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants