REGR: reindex with sparse data #35286

jorisvandenbossche · 2020-07-15T13:11:34Z

See discussion at #34158 (comment) and below. There was some discussion on the PR, but I think we didn't yet create an issue to track this regression.

The PR introduced a regression in reindex when having sparse data. Based on the linked discussion, there doesn't seem to be a direct / easy solution (but I didn't look into it again). So we might want to revert the PR for 1.1 until we figure out a correct solution.

cc @TomAugspurger

TomAugspurger · 2020-07-15T13:13:54Z

Thanks for following up. I'll look at this a bit today, and revert if I don't find an easy solution.

TomAugspurger · 2020-07-15T14:06:04Z

I think I'm just going to revert #34158 for now. It looks to have a few issues. Some places in

pandas/pandas/core/arrays/sparse/array.py

Lines 874 to 899 in 492e3e9

    # sp_indexer may be -1 for two reasons
    # 1.) we took for an index of -1 (new)
    # 2.) we took a value that was self.fill_value (old)
    new_fill_indices = indices == -1
    old_fill_indices = (sp_indexer == -1) & ~new_fill_indices

    # Fill in two steps.
    # Old fill values
    # New fill values
    # potentially coercing to a new dtype at each stage.
    m0 = sp_indexer[old_fill_indices] < 0
    m1 = sp_indexer[new_fill_indices] < 0

    result_type = taken.dtype

    if m0.any():
        result_type = np.result_type(result_type, type(self.fill_value))
        taken = taken.astype(result_type)
        taken[old_fill_indices] = self.fill_value

    if m1.any():
        result_type = np.result_type(result_type, type(fill_value))
        taken = taken.astype(result_type)
        taken[new_fill_indices] = fill_value

are likely using self.fill_value rather than fill_value (that's why we get 0s instead of NaNs). A simple change didn't work locally, and I'm not planning to look more today.
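
To make the distinction concrete, here is a minimal pure-Python sketch of the two kinds of fill, not the pandas implementation. The function `sparse_take` and all of its parameters are hypothetical names for illustration. It assumes a sparse array is modelled as explicitly stored values plus their positions, with every other position implicitly holding the array's own fill value. A position requested with index -1 is "new" and should receive the caller's `fill_value`, while a real position that was not stored is "old" and should receive the array's own fill value.

```python
import math


def sparse_take(sp_values, sp_positions, length, own_fill, indices, fill_value):
    """Hypothetical sketch of a take on a sparse array with two fill values.

    sp_values / sp_positions: the explicitly stored values and where they sit.
    own_fill: the value implied at every position that is not stored.
    indices: positions to take; -1 marks a new slot with no source position.
    fill_value: the value to put into those new slots.
    """
    stored = dict(zip(sp_positions, sp_values))
    result = []
    for i in indices:
        if i == -1:
            # New slot (e.g. a label introduced by reindex): caller's fill_value.
            result.append(fill_value)
        elif not 0 <= i < length:
            raise IndexError(f"index {i} is out of bounds for length {length}")
        elif i in stored:
            # Explicitly stored value.
            result.append(stored[i])
        else:
            # Existing slot that was stored implicitly: the array's own fill.
            result.append(own_fill)
    return result


# Dense equivalent of the array is [0.0, 1.0, 0.0, 2.0] with own_fill = 0.0.
result = sparse_take(
    sp_values=[1.0, 2.0],
    sp_positions=[1, 3],
    length=4,
    own_fill=0.0,
    indices=[0, 1, -1, 3],
    fill_value=math.nan,
)
```

In this sketch the slot taken with -1 comes back as NaN and the implicit slot at position 0 comes back as 0.0. Swapping `fill_value` for `own_fill` in the first branch would reproduce the symptom described above, 0s where NaNs are expected.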

scottgigante · 2020-07-15T14:46:15Z

I might have time to take a stab at this in the next couple weeks (but if anyone else feels the urge to give it a go don't wait for me.)

jorisvandenbossche added the Regression (Functionality that used to work in a prior pandas version) and Sparse (Sparse Data Type) labels Jul 15, 2020

jorisvandenbossche added this to the 1.1 milestone Jul 15, 2020

This was referenced Jul 15, 2020

Fix indexing, reindex on all-sparse SparseArray. #35287

Merged

Filtering dataframe with sparse column leads to NAs in sparse column #27781

Closed

TomAugspurger closed this as completed in #35287 Jul 16, 2020
