Bug in internals alignment for Series.combine_first with Extension dtype #24147

Closed
TomAugspurger opened this issue Dec 7, 2018 · 0 comments
Labels
Internals Related to non-user accessible pandas implementation
TomAugspurger commented Dec 7, 2018

In [5]: a = pd.Series(pd.Categorical([0, 1, 2], categories=list(range(5))))

In [6]: b = pd.Series(pd.Categorical([2, 3, 4], categories=list(range(5))), index=[2, 3, 4])

In [7]: a.combine_first(b)
Out[7]:
0    0.0
1    1.0
2    2.0
3    NaN
4    NaN
dtype: category
Categories (5, int64): [0, 1, 2, 3, 4]

Compare with the expected result (aside from the dtype):

In [8]: a = pd.Series([0, 1, 2])

In [9]: b = pd.Series([2, 3, 4], index=[2, 3, 4])

In [10]: a.combine_first(b)
Out[10]:
0    0.0
1    1.0
2    2.0
3    3.0
4    4.0
dtype: float64
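
For reference, the behaviour expected of combine_first can be modelled in plain Python: each value comes from the caller where one is present, otherwise from the other Series, over the union of both indexes. This is only an illustrative sketch on dicts, not how pandas implements it; the helper name `combine_first_sketch` is hypothetical.

```python
def combine_first_sketch(left, right):
    """Model combine_first on dicts mapping index label -> value (None = missing)."""
    result = {}
    # Walk the sorted union of both indexes.
    for label in sorted(set(left) | set(right)):
        value = left.get(label)
        # Fall back to the other mapping only when the caller has no value.
        if value is None:
            value = right.get(label)
        result[label] = value
    return result


# Same labels and values as the Series in the report.
a = {0: 0, 1: 1, 2: 2}
b = {2: 2, 3: 3, 4: 4}
combined = combine_first_sketch(a, b)
```

Under this model, labels 3 and 4 are filled from `b`, which is what the plain-integer example does and what the Categorical example fails to do.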

Something is going wrong inside BlockManager.apply at https://github.com/pandas-dev/pandas/blob/master/pandas/core/internals/managers.py#L386-L387

(Pdb) pp b.mgr_locs.indexer
slice(0, 1, 1)
(Pdb) pp self.items[b.mgr_locs.indexer]
Int64Index([0], dtype='int64')

That should be:

(Pdb) pp b.mgr_locs.indexer
slice(0, 5, 1)
(Pdb) pp b_items
Int64Index([0, 1, 2, 3, 4], dtype='int64')
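
The gap between the two indexers can be illustrated without touching pandas internals. A minimal sketch, assuming a plain list stands in for `self.items`; the variable names are illustrative only and do not exist in pandas:

```python
items = [0, 1, 2, 3, 4]  # stand-in for the manager's items (assumption)

observed_indexer = slice(0, 1, 1)  # what the debugger reports
expected_indexer = slice(0, 5, 1)  # what it should report

observed_items = items[observed_indexer]
expected_items = items[expected_indexer]

# Labels the block does not cover when the observed indexer is used.
uncovered = [label for label in expected_items if label not in observed_items]
```

With the observed slice, only the first label is selected, so any alignment that relies on these items sees a single label instead of all five.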

I'm hitting this in the DatetimeArray refactor.

I suspect that this is a symptom of #23023

@TomAugspurger TomAugspurger added the Internals Related to non-user accessible pandas implementation label Dec 7, 2018
@TomAugspurger TomAugspurger added this to the 0.24.0 milestone Dec 7, 2018
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Dec 7, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019