ENH: infer_dtype should infer integer-na #27283

Closed
jreback opened this issue Jul 8, 2019 · 7 comments · Fixed by #27392
Labels
Dtype Conversions (unexpected or buggy dtype conversions); ExtensionArray (extending pandas with custom dtypes or arrays); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Comments

@jreback
Contributor

jreback commented Jul 8, 2019

xref #26272
xref #27267

As the first step of moving towards integer-na dtypes as the primary integer type, we need to teach infer_dtype that integer-na is a valid inferred type. Right now:

In [1]: from pandas.api.types import infer_dtype

In [2]: import numpy as np

In [3]: infer_dtype([2, 3, 4], skipna=False)
Out[3]: 'integer'

In [4]: infer_dtype([2, 3, 4, np.nan], skipna=False)
Out[4]: 'mixed-integer-float'

In [5]: infer_dtype([2, 3, 4.2, np.nan], skipna=False)
Out[5]: 'mixed-integer-float'

[4] could instead return 'integer-na', indicating that we might want to infer the Int64 dtype. That would make it distinct from the inferred type of [5], which must become float64.
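To make the proposed distinction concrete, here is a minimal pure-Python sketch of the classification rule. It is not pandas' implementation: the function name `infer_kind` and the fallback labels "mixed", "floating", and "empty" are assumptions chosen for illustration, and only the 'integer', 'integer-na', and 'mixed-integer-float' labels come from this issue.

```python
import math


def infer_kind(values):
    """Toy classifier for a list of Python scalars (hypothetical helper).

    Integers plus NaN are reported as 'integer-na'; integers mixed with
    real floats are reported as 'mixed-integer-float'.
    """
    saw_int = saw_float = saw_nan = False
    for v in values:
        if isinstance(v, bool):
            # bool is a subclass of int; treat it as a different kind here
            return "mixed"
        if isinstance(v, int):
            saw_int = True
        elif isinstance(v, float):
            if math.isnan(v):
                saw_nan = True    # a missing value, not a "real" float
            else:
                saw_float = True  # a genuine non-integer-typed value
        else:
            return "mixed"

    if saw_int and not saw_float:
        # only integers, possibly with missing values
        return "integer-na" if saw_nan else "integer"
    if saw_int and saw_float:
        return "mixed-integer-float"
    if saw_float or saw_nan:
        return "floating"
    return "empty"


print(infer_kind([2, 3, 4]))
print(infer_kind([2, 3, 4, float("nan")]))
print(infer_kind([2, 3, 4.2, float("nan")]))
```

The key design point is that NaN is tracked separately from ordinary floats, so the presence of a missing value alone does not force the "float" branch.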

This would then let us convert integer columns to Int64 when nulls are added, rather than coercing them to float64; this is pretty common in indexing and setting operations.

Secondly, we can then enable .to_numeric to infer integer-na (or unsigned-na) and the corresponding dtypes (#26272).

Finally, we could support coercing object dtypes that hold integers and nulls to Int64 (#27267 for .explode() and .infer_objects()).

This issue by itself is only a very minor user-facing change (infer_dtype itself).

@jreback added the Missing-data, Dtype Conversions, and ExtensionArray labels Jul 8, 2019
@jreback added this to the Contributions Welcome milestone Jul 8, 2019
@TomAugspurger
Contributor

Is this a post-1.0 thing? Right now, my preference is to keep the "numpy by default" data model for 1.0. (I don't fully understand what changing just infer_dtype does for us.)

@jreback
Contributor Author

jreback commented Jul 8, 2019

This is a precursor to actually changing the default int type; as I said above, this is a necessary requirement.

@jorisvandenbossche
Member

But as Tom implied, "actually changing the default int type" is something for sure for after 1.0.
So if we already want to change infer_dtype, we would need to keep mapping 'integer-na' internally to 'mixed-integer-float' or 'float' to keep the same behaviour
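A rough sketch of what such an internal mapping could look like, in plain Python. Everything here is hypothetical: the names `_LEGACY_ALIASES`, `legacy_inferred`, `choose_dtype`, and the `nullable_default` flag are invented for illustration and do not correspond to actual pandas internals.

```python
# Hypothetical alias table: fold the new inferred type back into the old one
# so that existing behaviour is unchanged until the default int type changes.
_LEGACY_ALIASES = {"integer-na": "mixed-integer-float"}


def legacy_inferred(inferred):
    """Map a new-style inferred type to its pre-change equivalent."""
    return _LEGACY_ALIASES.get(inferred, inferred)


def choose_dtype(inferred, nullable_default=False):
    """Pick a dtype name from an inferred-type string (illustrative only)."""
    if not nullable_default:
        # Keep current behaviour: integers plus NaN still end up as float64.
        inferred = legacy_inferred(inferred)

    if inferred == "integer":
        return "int64"
    elif inferred == "integer-na":
        return "Int64"
    elif inferred in ("mixed-integer-float", "floating"):
        return "float64"
    return "object"


print(choose_dtype("integer-na"))
print(choose_dtype("integer-na", nullable_default=True))
```

With an alias step like this, infer_dtype can start reporting 'integer-na' while callers that branch on the result keep producing float64 until the flag is flipped.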

@jreback
Contributor Author

jreback commented Jul 8, 2019

> But as Tom implied, "actually changing the default int type" is something for sure for after 1.0.
> So if we already want to change infer_dtype, we would need to keep mapping 'integer-na' internally to 'mixed-integer-float' or 'float' to keep the same behaviour

Sure, my point is that this is a non-trivial amount of work that can be done without user-facing changes and is a precursor for other things.

@TomAugspurger
Contributor

infer_dtype is public, right?

Do we have an idea of what we'll need to update internally to return integer-na from infer_dtype? IIRC there are a few places where we do an if / elif on the result types.

@jreback
Contributor Author

jreback commented Jul 12, 2019

see #27335 (comment)

this is a relatively small change

@h-vetinari
Contributor

h-vetinari commented Jul 13, 2019

Other relevant xrefs for tackling infer_dtype and/or EAs: #23553, #23554.

And while I haven't filed a separate issue, I'm also hitting the fact that infer_dtype does not correctly infer DatetimeTZDtype in #23833 and #25425.

@jreback modified the milestones: Contributions Welcome, 1.0 Jul 25, 2019

4 participants