BUG: pandas.cut(data, ..., include_lowest=True) raises IndexError when data is masked array Float64 dtype #42817

Closed
2 of 3 tasks
hottwaj opened this issue Jul 30, 2021 · 4 comments
Labels: Bug, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

hottwaj commented Jul 30, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import numpy
import pandas

# this works: plain int64 data
pandas.cut(pandas.Series(numpy.arange(10)), pandas.Series([3, 7]), include_lowest=True)

# this fails with IndexError (stack trace below): nullable Float64 data
pandas.cut(pandas.Series(numpy.arange(10)).astype('Float64'), pandas.Series([3, 7]), include_lowest=True)

Software used

tried pandas 1.2.0-1.3.1 (same error in each case)
python 3.8.10
ubuntu 20.04

Stack trace

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_220282/2616585682.py in <module>
----> 1 pandas.cut(pandas.Series(numpy.arange(10)).astype('Float64'), pandas.Series([3, 7]), include_lowest=True)

~/.pyenv/versions/3.8.10/envs/testvenv/lib/python3.8/site-packages/pandas/core/reshape/tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest, duplicates, ordered)
    271             raise ValueError("bins must increase monotonically.")
    272 
--> 273     fac, bins = _bins_to_cuts(
    274         x,
    275         bins,

~/.pyenv/versions/3.8.10/envs/testvenv/lib/python3.8/site-packages/pandas/core/reshape/tile.py in _bins_to_cuts(x, bins, right, labels, precision, include_lowest, dtype, duplicates, ordered)
    408 
    409     if include_lowest:
--> 410         ids[x == bins[0]] = 1
    411 
    412     na_mask = isna(x) | (ids == len(bins)) | (ids == 0)

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

From a quick look at line 410 in pdb, it seems that ids is a numpy array, while x == bins[0] returns a pandas.Series of the nullable boolean dtype, which cannot be used to index ids.
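The dtype mismatch described above can be sketched in isolation. This is an illustration only, not the pandas internals: `ids` here is a hypothetical stand-in for the array built inside `_bins_to_cuts`, and the explicit `to_numpy` conversion is one possible way to get a mask that NumPy accepts, not necessarily how pandas resolved it.

```python
import numpy as np
import pandas as pd

x = pd.Series(np.arange(10)).astype("Float64")
bins = pd.Series([3, 7])

# Hypothetical stand-in for the bin ids computed inside _bins_to_cuts
# (assumed to be a plain NumPy integer array, as observed in pdb).
ids = np.zeros(len(x), dtype=np.int64)

# Comparing a nullable Float64 Series yields a nullable "boolean" Series,
# not a NumPy bool array.
nullable_mask = x == bins[0]
mask_dtype = str(nullable_mask.dtype)

# Convert explicitly to a plain NumPy bool array, treating NA as False,
# so that it is a valid index for a NumPy array.
plain_mask = nullable_mask.to_numpy(dtype=bool, na_value=False)
ids[plain_mask] = 1
```

Whether indexing with the nullable mask directly raises depends on the pandas and NumPy versions in use, so the sketch only shows the explicit conversion.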

Thanks!

hottwaj added the Bug and Needs Triage labels Jul 30, 2021

hottwaj (Author) commented Jul 30, 2021

Just edited my original report: the first working example should not contain the cast to Float64, i.e. it should be:

pandas.cut(pandas.Series(numpy.arange(10)), pandas.Series([3, 7]), include_lowest=True)

Thanks!

debnathshoham (Member) commented

Hi @hottwaj. I am getting the output below on master (for your second line). I hope this is what you expect.

0             NaN
1             NaN
2             NaN
3    (2.999, 7.0]
4    (2.999, 7.0]
5    (2.999, 7.0]
6    (2.999, 7.0]
7    (2.999, 7.0]
8             NaN
9             NaN
dtype: category
Categories (1, interval[float64, right]): [(2.999, 7.0]]

hottwaj (Author) commented Jul 30, 2021

Hi @debnathshoham!

I'm really sorry, it does seem to work in 1.3.1 (and 1.3.0). Maybe I messed up reducing to a simple example that failed, but it seems more likely that I got confused as I was testing multiple recent versions of pandas for another issue I found at the same time.

Will close and sorry for the noise. Thanks!

hottwaj closed this as completed Jul 30, 2021

simonjayhawkins (Member) commented

> tried pandas 1.2.0-1.3.1 (same error in each case)

This also looks to work on 1.3.0 and 1.3.1. Maybe fixed in #40969.
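For anyone pinned to a pandas version where the Float64 call still raises, a possible workaround (an assumption on my part, not something confirmed in this thread) is to convert the nullable data to a plain float64 Series before calling cut, mapping missing values to NaN. The name `values` is illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10)).astype("Float64")

# Convert nullable Float64 to plain float64, turning pd.NA into NaN,
# so that comparisons inside cut produce ordinary NumPy bool masks.
values = pd.Series(s.to_numpy(dtype="float64", na_value=np.nan), index=s.index)

result = pd.cut(values, pd.Series([3, 7]), include_lowest=True)

# True where the value falls inside the single bin, False where it is NaN.
in_bin = result.notna().tolist()
```

This gives up the nullable dtype on the input, which is usually acceptable here because cut returns a categorical either way.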
