
REGR: concat of Sparse with incompatible dtype now gives Sparse[object] instead of object #34336

Closed
jorisvandenbossche opened this issue May 23, 2020 · 1 comment · Fixed by #34338
Labels: Blocker (Blocking issue or pull request for an upcoming release), Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), Sparse (Sparse Data Type)

jorisvandenbossche (Member) commented May 23, 2020

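Regression on master, presumably related to the common_dtype mechanism. On 1.0.3, concatenating a sparse Series with a string categorical gives object dtype in either order: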

In [5]: pd.__version__    
Out[5]: '1.0.3'

In [6]: s1 = pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0)) 

In [7]: s2 = pd.Series(["a", "b", "c"], dtype="category") 

In [8]: pd.concat([s1, s2]) 
Out[8]: 
0    1
1    0
2    2
0    a
1    b
2    c
dtype: object

In [9]: pd.concat([s2, s1]) 
Out[9]: 
0    a
1    b
2    c
0    1
1    0
2    2
dtype: object

(The same happens on v0.25.3.)

But on master, the result depends on the order: one order raises, the other gives Sparse[object, 0]:

# raising in SparseDtype._get_common_dtype
In [3]: pd.concat([s1, s2])  
...
TypeError: data type not understood

In [4]: pd.concat([s2, s1])   
Out[4]: 
0    a
1    b
2    c
0    1
1    0
2    2
dtype: Sparse[object, 0]
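For comparing versions, the two sessions above can be condensed into a small script that concatenates in both orders and records either the resulting dtype or the TypeError message. It is only a sketch using public pandas API (pd.SparseDtype, pd.concat); the results dict and its labels are illustrative. No output is shown because it differs between pandas versions.

```python
import pandas as pd

# Inputs from the report: a sparse integer Series and a string categorical.
s1 = pd.Series([1, 0, 2], dtype=pd.SparseDtype("int64", 0))
s2 = pd.Series(["a", "b", "c"], dtype="category")

cases = {
    "concat([s1, s2])": [s1, s2],
    "concat([s2, s1])": [s2, s1],
}

results = {}
for label, objs in cases.items():
    try:
        results[label] = str(pd.concat(objs).dtype)
    except TypeError as exc:
        # On master at the time of the report, [s1, s2] raised here.
        results[label] = f"TypeError: {exc}"

print("pandas", pd.__version__)
for label, outcome in results.items():
    print(f"{label:18} -> {outcome}")
```

If both lines print the same dtype, the result no longer depends on the concatenation order.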
@jorisvandenbossche jorisvandenbossche added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Sparse Sparse Data Type labels May 23, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone May 23, 2020
@jorisvandenbossche jorisvandenbossche self-assigned this May 23, 2020
jorisvandenbossche (Member, Author)

So the behaviour is actually a bit more complicated.

The following also happens on pandas 1.0.3:

In [9]: s3 = pd.Series(["a", "b", "c"]) 

In [10]: pd.concat([s1, s3]) 
Out[10]: 
0    1
1    0
2    2
0    a
1    b
2    c
dtype: Sparse[object, 0]

So while my original example with a string categorical resulted in object dtype, the same concat with a plain string Series results in Sparse[object].

I am not fully sure whether we should always try to preserve "sparseness" (giving Sparse[object]) or follow the "normal" rule (concatenating incompatible dtypes results in object dtype).
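To make the two options concrete, here is a toy model of the decision. It is not the pandas implementation: dtypes are plain strings, "compatible" simply means identical subtypes (a stand-in for NumPy type promotion), and fill values are ignored (pandas would print e.g. Sparse[object, 0]). The function concat_result_dtype and its preserve_sparse flag are hypothetical. The point it illustrates is that either rule, applied consistently, gives the same answer regardless of input order and regardless of whether the strings arrive as a categorical or as plain object dtype.

```python
from typing import Sequence

# Toy model (NOT pandas internals): dtypes are strings such as "int64",
# "object", "category" or "Sparse[int64]"; fill values are ignored.
SPARSE_PREFIX = "Sparse["


def _is_sparse(dtype: str) -> bool:
    return dtype.startswith(SPARSE_PREFIX) and dtype.endswith("]")


def _subtype(dtype: str) -> str:
    # "Sparse[int64]" -> "int64"; non-sparse dtypes are returned unchanged
    return dtype[len(SPARSE_PREFIX):-1] if _is_sparse(dtype) else dtype


def concat_result_dtype(dtypes: Sequence[str], preserve_sparse: bool) -> str:
    """Hypothetical result dtype for concatenating inputs with `dtypes`.

    preserve_sparse=True:  incompatible inputs give Sparse[object] if any is sparse.
    preserve_sparse=False: incompatible inputs give object (the "normal" rule).
    """
    any_sparse = any(_is_sparse(d) for d in dtypes)
    # Using a set makes the result independent of the input order.
    subtypes = {_subtype(d) for d in dtypes}
    if len(subtypes) == 1:
        # Compatible inputs (toy rule: identical subtypes): in this model both
        # options keep the result sparse if any input was sparse.
        (sub,) = subtypes
        return f"{SPARSE_PREFIX}{sub}]" if any_sparse else sub
    # Incompatible inputs: the only place where the two options differ.
    if preserve_sparse and any_sparse:
        return f"{SPARSE_PREFIX}object]"
    return "object"


for rule in (True, False):
    print(
        rule,
        concat_result_dtype(["Sparse[int64]", "category"], rule),
        concat_result_dtype(["category", "Sparse[int64]"], rule),
        concat_result_dtype(["Sparse[int64]", "object"], rule),
    )
# True Sparse[object] Sparse[object] Sparse[object]
# False object object object
```

Under either flag value, the three cases from this issue (categorical in both orders, plain strings) agree with each other, which is exactly what the current behaviour lacks.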

cc @TomAugspurger

@jorisvandenbossche jorisvandenbossche added the Blocker Blocking issue or pull request for an upcoming release label May 23, 2020