-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Index.get_value implementation for ExtensionArray #29926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: Index.get_value implementation for ExtensionArray #29926
Conversation
# array) the ExtensionArray unnecessary | ||
|
||
def DecimalArray__array__(self, dtype=None): | ||
raise Exception("tried to convert a DecimalArray to a numpy array") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not fully fool proof, as such an Exception could be catched somewhere in the indexing code. But I at least verified that for the s[s > 0.5]
case, this test actually failed as desired before making the fix.
The linked issue mentions densifying, which I think mostly affects SparseArray and Categorical. Would it make sense to test on those? Or is DecimalArray affected too? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need a whatsnew note?
# See https://github.com/pandas-dev/pandas/issues/29708 | ||
# Ensure that indexing operations do not materialize (convert to a numpy | ||
# array) the ExtensionArray unnecessary | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you make this test more generic (e.g. hit more EA arrays)?
For DecimalArray, converting to object array is indeed not a problem (it stores a numpy object array anyway), while for Categorical/Sparse this is more expensive. But it's the mechanism I want to test, and for that it doesn't matter much that it's not expensive for DecimalArray. Making the test more generic (a base test so it is automatically run for all EAs) would be possible, but I am not sure it would be worth it: I would need to add several exceptions (overriding it in several places for certain dtypes to skip it: for sparse it still densifies (which is a separate issue), for numpy ones the test is not valid (it needs to convert to array), for interval it fails for another reason, for boolean is currently still converts to numpy array for the comparison operation, ...). And in the end, I mainly want to test the mechanism in the Index.get_value, which is not dependent on implementation details of the specific EA. |
thanks |
Closes #29708
cc @jreback @jbrockmendel