
ENH: enable concat for nullable Int with other dtypes #34095


Closed
jorisvandenbossche opened this issue May 9, 2020 · 5 comments · Fixed by #34985
Labels
Enhancement NA - MaskedArrays Related to pd.NA and nullable extension arrays
Milestone
1.1

Comments

@jorisvandenbossche
Member

Following up on #33607, which added the ExtensionDtype._get_common_dtype method to the EA interface to determine concat/append behaviour, we can now use it to refine the concat behaviour for nullable integer and boolean dtypes.

See comment here: #33607 (comment)

For now, that PR only enabled concatenating IntegerDtype objects with each other (any other combination results in object dtype). But we can expand this to combinations of IntegerDtype with numpy int dtypes, Int with boolean, Int with float, etc.
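To make the proposed combinations concrete, here is a minimal pure-Python sketch of a common-dtype rule keyed on dtype names. It is not pandas' `_get_common_dtype` implementation: the function `common_dtype_sketch` and the string-based dtype representation are hypothetical, and it only covers the Int/numpy int/boolean cases listed above. Float is deliberately left on the object fallback here, since how to handle it is discussed further down the thread.

```python
# Hypothetical sketch: NOT the pandas implementation of _get_common_dtype.
# Dtypes are represented by their string names for simplicity.

NULLABLE_INT = {"Int8": 8, "Int16": 16, "Int32": 32, "Int64": 64}
NUMPY_INT = {"int8": 8, "int16": 16, "int32": 32, "int64": 64}


def common_dtype_sketch(dtypes):
    """Return the dtype name a concat of these dtypes could result in."""
    # The rule only applies when at least one nullable Int dtype is involved.
    if not any(d in NULLABLE_INT for d in dtypes):
        raise ValueError("sketch only handles concat involving a nullable Int dtype")

    bits = 0
    for d in dtypes:
        if d in NULLABLE_INT:
            bits = max(bits, NULLABLE_INT[d])
        elif d in NUMPY_INT:
            # A numpy int dtype can be represented by the nullable Int dtype.
            bits = max(bits, NUMPY_INT[d])
        elif d == "boolean":
            # True/False map to 1/0, so they fit in any Int width.
            bits = max(bits, 8)
        else:
            # Any other combination falls back to object dtype.
            return "object"
    return f"Int{bits}"
```

For example, `common_dtype_sketch(["Int8", "int64"])` gives `"Int64"` (the widest width wins), while `["Int64", "float64"]` still gives `"object"`.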

jorisvandenbossche added the Enhancement and NA - MaskedArrays (Related to pd.NA and nullable extension arrays) labels May 9, 2020
jorisvandenbossche added this to the 1.1 milestone May 9, 2020
jorisvandenbossche changed the title from "ENH: enable concat for nullable Int dtypes" to "ENH: enable concat for nullable Int with other dtypes" May 9, 2020
@jorisvandenbossche
Member Author

jorisvandenbossche commented May 11, 2020

As a concrete example, this currently gives object dtype:

In [14]: pd.concat([pd.Series([1, 2, None], dtype="Int64"), pd.Series([4, 5], dtype="int64")])
Out[14]: 
0       1
1       2
2    <NA>
0       4
1       5
dtype: object

while it could perfectly well preserve the nullable Int64 dtype.
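Preserving Int64 in this example is cheap if a nullable integer array is thought of as a values array plus a boolean mask of missing entries. The sketch below models that with plain Python lists; the `MaskedInts` class and `concat_masked` helper are hypothetical illustrations, not pandas internals. The point is that the numpy int64 input simply contributes an all-False mask, so nothing needs to become object.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MaskedInts:
    """Toy stand-in for a nullable integer array: values plus a missing-value mask."""
    values: List[int]
    mask: List[bool]  # True means the entry is missing (<NA>)

    @classmethod
    def from_optional(cls, data: List[Optional[int]]) -> "MaskedInts":
        # Missing entries get a placeholder value of 0 and mask=True.
        return cls([0 if x is None else x for x in data], [x is None for x in data])

    def to_list(self) -> List[Optional[int]]:
        # Masked positions are reported as None (standing in for <NA>).
        return [None if m else v for v, m in zip(self.values, self.mask)]


def concat_masked(left: MaskedInts, right_plain: List[int]) -> MaskedInts:
    """Concatenate a nullable int array with a plain (non-nullable) int array."""
    # A non-nullable array has no missing values, so its mask is all False.
    return MaskedInts(left.values + right_plain,
                      left.mask + [False] * len(right_plain))
```

With the data from the example above, `concat_masked(MaskedInts.from_optional([1, 2, None]), [4, 5]).to_list()` returns `[1, 2, None, 4, 5]`, i.e. the integer values and the single missing entry both survive.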

(cc @dsaxton in case you are interested)

@dsaxton
Member

dsaxton commented May 12, 2020

@jorisvandenbossche Interesting. How do you envision the case of concatenating nullable ints and floats? Try casting the float values to nullable int if possible, else use float? I suppose always converting to the more general float type seems more natural, in that it would depend only on the types and not on the values, but then the NAs would be lost (at least until there's a nullable float dtype).

@jorisvandenbossche
Member Author

We want to move away from any value-dependent concat/cast behaviour (right now we still have a few such cases), so int+float should always result in float.
Yes, for now that means the nullable dtype gets lost, so in theory we could also return object dtype instead. But I don't think it matters much which of the two we choose now, as there will hopefully be a first version of a nullable float dtype soon.
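A minimal sketch of the value-independent rule, in plain Python with `None` standing in for `<NA>` (the helper name `concat_int_float` is hypothetical). The result is always float, even when every float value happens to be integral, and missing entries can only become NaN, which is exactly where the NA information is lost without a nullable float dtype.

```python
import math
from typing import List, Optional


def concat_int_float(ints: List[Optional[int]], floats: List[float]) -> List[float]:
    """Concatenate nullable ints with floats using a dtype-only rule."""
    # The result dtype depends only on the input dtypes (int + float -> float),
    # never on the values, so even integral floats like 4.0 stay float.
    converted = [math.nan if x is None else float(x) for x in ints]
    # Without a nullable float dtype, a missing int can only be represented as NaN.
    return converted + [float(x) for x in floats]
```

For instance, `concat_int_float([1, None], [4.0, 5.5])` yields `[1.0, nan, 4.0, 5.5]`; once the NaN is produced, it can no longer be told apart from a genuine float NaN.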

@TomAugspurger
Contributor

TomAugspurger commented Jun 24, 2020

@jorisvandenbossche what else do you want to do here for 1.1? Currently, concat([Int64, int64]) correctly returns Int64.

I think the remaining issue is for concat([Int64, boolean]) to return Int64? Anything else?

@jorisvandenbossche
Member Author

Indeed, with boolean we can already preserve the nullable dtype. For float we have to wait for the Floating EA PR. Int64+int64 (with the numpy dtype) I already did in a previous PR.
-> #34985 handles the boolean case and adds tests for Int64+int64
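The reason the boolean case can already keep the nullable dtype is that every nullable boolean value, including NA, has an exact nullable-integer counterpart. A minimal sketch in plain Python, with `None` standing in for `<NA>`; the function `concat_int_boolean` is hypothetical and is not the change made in #34985.

```python
from typing import List, Optional


def concat_int_boolean(ints: List[Optional[int]],
                       bools: List[Optional[bool]]) -> List[Optional[int]]:
    """Concatenate nullable Int data with nullable boolean data as nullable Int."""
    # True/False become 1/0 and None (standing in for <NA>) stays missing,
    # so no information is lost and the result can keep an Int dtype.
    as_ints = [None if b is None else int(b) for b in bools]
    return list(ints) + as_ints
```

For example, `concat_int_boolean([1, None], [True, False, None])` gives `[1, None, 1, 0, None]`. Unlike the float case, no value needs a lossy representation, so there is no reason to fall back to object.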


3 participants