Skip to content

BUG: incorrect type when indexing sparse dataframe with iterable #34908

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 8, 2020
Merged

Conversation

suvayu
Copy link
Contributor

@suvayu suvayu commented Jun 20, 2020

closes #34526
closes #34540

  • 2 tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

The problem arose when indexing with an iterable. If the result consisted of columns which originally only had sparse values, the dtype was solely inferred from the fill_value, which defaults to 0. Hence, the resulting columns have the dtype Sparse[int64, 0]. This commit changes the type inference logic to use the numpy type promotion rules between the underlying subtype of the SparseArray.dtype and the type of the fill value.

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Sparse Sparse Data Type Testing pandas testing functions or related to the test suite labels Jun 20, 2020
@jreback jreback added this to the 1.1 milestone Jun 20, 2020
@suvayu suvayu marked this pull request as draft June 20, 2020 16:10
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine, can you create a test_sparse.py (in this directory) and move other sparse indexing tests as well (e.g. the one right above, maybe more)

@suvayu
Copy link
Contributor Author

suvayu commented Jun 20, 2020

@jreback Thanks for the review! I'll try fix the bug in the next few days.

@suvayu suvayu changed the title TST: regression tests for indexing sparse dataframe with iterable BUG: regression tests for indexing sparse dataframe with iterable Jun 24, 2020
@suvayu suvayu changed the title BUG: regression tests for indexing sparse dataframe with iterable BUG: incorrect type when indexing sparse dataframe with iterable Jun 24, 2020
@suvayu suvayu marked this pull request as ready for review June 24, 2020 13:37
@suvayu suvayu requested a review from jreback June 24, 2020 14:25
@suvayu
Copy link
Contributor Author

suvayu commented Jun 24, 2020

For the type inference logic, I wasn't sure which to use, SparseArray.dtype.subtype or the sparse values themselves (SparseArray.sp_values). I went with the former because it seemed conceptually cleaner.

@suvayu
Copy link
Contributor Author

suvayu commented Jul 2, 2020

Hi @jreback, just bringing this to your attention. I wasn't sure if you noticed it's ready for review (I used the "request review" option in the side panel). Apologies if this is already on your list.

@TomAugspurger
Copy link
Contributor

@suvayu can you add a whatsnew to 1.1.0.rst in the bug fixes section?

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good, just need a release note. Make sure to include references to both issues fixed.

@suvayu
Copy link
Contributor Author

suvayu commented Jul 7, 2020

Hi @TomAugspurger, I added the the whatsnew entry. There was a merge conflict with master (another entry in the same section), so I had to force push after resolving it.

@TomAugspurger TomAugspurger merged commit 02a6acf into pandas-dev:master Jul 8, 2020
@TomAugspurger
Copy link
Contributor

Thanks @suvayu!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Indexing Related to indexing on series/frames, not to indexes themselves Sparse Sparse Data Type Testing pandas testing functions or related to the test suite
Projects
None yet
3 participants