Skip to content

REGR: reindex with sparse data #35286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorisvandenbossche opened this issue Jul 15, 2020 · 3 comments · Fixed by #35287
Closed

REGR: reindex with sparse data #35286

jorisvandenbossche opened this issue Jul 15, 2020 · 3 comments · Fixed by #35287
Labels
Regression Functionality that used to work in a prior pandas version Sparse Sparse Data Type
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jul 15, 2020

See discussion at #34158 (comment) and below. There was some discussion on the PR, but I think we didn't yet create an issue to track this regression.

The PR introduced a regression in reindex when having sparse data. Based on the linked discussion, there doesn't seem to direct / easy solution (but I didn't look into it again). So we might want to revert the PR for 1.1 until we figure out a correct solution.

cc @TomAugspurger

@jorisvandenbossche jorisvandenbossche added Regression Functionality that used to work in a prior pandas version Sparse Sparse Data Type labels Jul 15, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone Jul 15, 2020
@TomAugspurger
Copy link
Contributor

Thanks for following up. I'll look at this a bit today, and revert if I don't find an easy solution.

@TomAugspurger
Copy link
Contributor

I think I'm just going to revert #34158 for now. It looks to have a few issues. Some places in

# sp_indexer may be -1 for two reasons
# 1.) we took for an index of -1 (new)
# 2.) we took a value that was self.fill_value (old)
new_fill_indices = indices == -1
old_fill_indices = (sp_indexer == -1) & ~new_fill_indices
# Fill in two steps.
# Old fill values
# New fill values
# potentially coercing to a new dtype at each stage.
m0 = sp_indexer[old_fill_indices] < 0
m1 = sp_indexer[new_fill_indices] < 0
result_type = taken.dtype
if m0.any():
result_type = np.result_type(result_type, type(self.fill_value))
taken = taken.astype(result_type)
taken[old_fill_indices] = self.fill_value
if m1.any():
result_type = np.result_type(result_type, type(fill_value))
taken = taken.astype(result_type)
taken[new_fill_indices] = fill_value
are likely using self.fill_value rather than fill_value (that's why we get 0s instead of NaNs). A simple change didn't look locally, and I'm not planning to look more today.

@scottgigante
Copy link
Contributor

I might have time to take a stab at this in the next couple weeks (but if anyone else feels the urge to give it a go don't wait for me.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Sparse Sparse Data Type
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants