
IndexError when indexing numpy array with boolean Series #6168

Closed
stefsmeets opened this issue Jan 29, 2014 · 7 comments
Labels
API Design Duplicate Report Duplicate issue or pull request Indexing Related to indexing on series/frames, not to indexes themselves

Comments

@stefsmeets

Previously (in version 0.11) it was possible to create a boolean Series and use it to index a numpy array. Version 0.13.0 breaks this behaviour and raises "IndexError: unsupported iterator index".

    import numpy as np
    import pandas as pd

    rng = np.arange(5)

    rng[rng > 2]          # works as expected
    # array([3, 4])

    b = pd.Series(rng > 2)
    rng[b]                # no longer works in 0.13.0
    # IndexError: unsupported iterator index
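For anyone stuck on pandas 0.13 with a numpy older than 1.9, a minimal workaround sketch (assuming the Series simply holds a boolean mask aligned with the array) is to hand numpy a real ndarray instead of the Series. np.asarray(b) and b.values both return the underlying boolean array:

```python
import numpy as np
import pandas as pd

rng = np.arange(5)
b = pd.Series(rng > 2)

# Convert the boolean Series back to a plain ndarray before indexing.
# np.asarray() goes through the Series' __array__ method, so numpy
# receives an ordinary boolean array rather than a non-ndarray object.
mask = np.asarray(b)
print(mask.dtype)    # bool
print(rng[mask])     # [3 4]

# b.values gives the same underlying boolean array.
print(rng[b.values]) # [3 4]
```

Because the conversion is explicit, this form does not depend on whether the installed numpy knows how to treat a Series as an index.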
@dsm054
Contributor

dsm054 commented Jan 29, 2014

Same for me with pd 0.13.0-321-gaf73a6f, np 1.9.0.dev-631655e:

    >>> rng[b]
    *** Reference count error detected:
    an attempt was made to deallocate 5 (i) ***
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: unsupported iterator index

@jreback
Contributor

jreback commented Jan 29, 2014

This is a numpy bug exposed by the changes to Series in 0.13 (Series is no longer an ndarray subclass); see here: http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#whatsnew-0130-refactoring

numpy doesn't follow its own protocol here; there is a bug report somewhere.

But to be honest, if you are using pandas there is no need to do this at all.

Just wrap the array in a Series:

    >>> pd.Series(rng)[b]
    3    3
    4    4
    dtype: int64
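To see both halves of this explanation in code, here is a small sketch against a modern pandas: the isinstance check reflects the 0.13 refactoring linked above, and the Series-on-Series selection is the suggested alternative. Note that boolean selection on a Series aligns on index labels, which is why the output keeps the labels 3 and 4:

```python
import numpy as np
import pandas as pd

rng = np.arange(5)
b = pd.Series(rng > 2)

# Since 0.13, Series is no longer an ndarray subclass.
print(isinstance(b, np.ndarray))   # False

# Indexing a Series with a boolean Series aligns on the index labels,
# so the result keeps the original labels 3 and 4.
result = pd.Series(rng)[b]
print(result.index.tolist())       # [3, 4]
print(result.values.tolist())      # [3, 4]
```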

jreback closed this as completed Jan 29, 2014
@jreback
Contributor

jreback commented Jan 29, 2014

@dsm054
If I could somehow have the C API return True for PyArray_Check(obj), then this would work.

In essence, numpy only accepts subclasses of ndarray, not duck-typed objects that actually behave correctly (as Series does), perhaps for performance reasons.

I thought there was a bug report, but maybe this is an enhancement request: to treat a duck-typed object that has-a ndarray the same as one that is-a ndarray.
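To illustrate the is-a versus has-a distinction, here is a sketch using a hypothetical MaskWrapper class (not part of numpy or pandas) that exposes its data through numpy's __array__ protocol without subclassing ndarray. The isinstance check plays the role of the C-level PyArray_Check type test; the explicit np.asarray conversion is the duck-typed path:

```python
import numpy as np


class MaskWrapper:
    """Hypothetical container that *has* an ndarray but is not one."""

    def __init__(self, data):
        self._data = np.asarray(data, dtype=bool)

    def __array__(self, dtype=None, copy=None):
        # numpy calls this when converting the object to an array;
        # dtype/copy are accepted because numpy 2 may pass them.
        if copy:
            return np.array(self._data, dtype=dtype, copy=True)
        return np.asarray(self._data, dtype=dtype)


rng = np.arange(5)
w = MaskWrapper(rng > 2)

print(isinstance(w, np.ndarray))   # False: an is-a check rejects it
print(rng[np.asarray(w)])          # [3 4]: the has-a protocol works
```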

@dsm054
Contributor

dsm054 commented Jan 29, 2014

@jreback: yeah, numpy doesn't play well with others. This is a problem in Sage too, where we wrap integer literals typed in at the console with Integer. Unfortunately, because of how numpy.isscalar works, this breaks array indexing.
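A sketch of the isscalar side of this, using a hypothetical WrappedInt class as a stand-in for an int-like wrapper (it is not Sage's actual Integer). np.isscalar only accepts numpy scalars, built-in scalar types, and numbers.Number instances, so an arbitrary int-like class is rejected; converting with operator.index restores a plain int that numpy indexes normally:

```python
import numbers
import operator

import numpy as np


class WrappedInt:
    """Hypothetical stand-in for an int-like wrapper; not Sage's Integer."""

    def __init__(self, value):
        self._value = int(value)

    def __index__(self):
        # Lets operator.index() obtain a plain Python int.
        return self._value


x = WrappedInt(3)
print(np.isscalar(x))                 # False: not a numpy scalar or numbers.Number
print(isinstance(x, numbers.Number))  # False

rng = np.arange(5)
print(rng[operator.index(x)])         # 3
```

Registering the class as a virtual subclass with numbers.Integral.register(WrappedInt) would make np.isscalar return True for it, since np.isscalar also accepts numbers.Number instances.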

@jreback
Contributor

jreback commented Jan 29, 2014

@dsm054 I created an issue (see above). I think that if they relaxed the type checking (and provided a more duck-typed model), this would work; I'm not sure how much work that is, though.

I have tried to hack around this to get the type checks to pass, but they are in the C API, so there is no easy way. Did I miss anything?

@jreback
Contributor

jreback commented Feb 17, 2014

@Tarlitz @dsm054

Good news!

numpy 1.9 will handle this correctly; you can in fact install numpy master and check it out for yourself.
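A quick way to check this on your own install, as a sketch assuming a numpy version string of the usual "major.minor..." form (the version parsing here is ad hoc, not a numpy API):

```python
import numpy as np
import pandas as pd

# Ad hoc parse of "major.minor" from the installed numpy version string.
major, minor = (int(p) for p in np.__version__.split(".")[:2])

rng = np.arange(5)
b = pd.Series(rng > 2)

if (major, minor) >= (1, 9):
    # numpy >= 1.9 converts the Series itself, so direct indexing works.
    print(rng[b])                # [3 4]
else:
    # Older numpy: fall back to an explicit conversion.
    print(rng[np.asarray(b)])
```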

@stefsmeets
Author

@jreback Cheers mate, I appreciate your efforts :)
