API: generalized check_array_indexer for validating array-like getitem indexers #31150
Is this repeated on purpose?
repeated from where?
The next check, `is_bool_indexer`, is duplicative.
It's not fully duplicative; see my long explanation at #31150 (comment). It's mainly there to deal with object dtype.
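The object-dtype case matters because NumPy only accepts integer or boolean index arrays: an object-dtype array holding `True`/`False` raises `IndexError` when used directly as an indexer. A minimal sketch of the kind of coercion involved, using a hypothetical helper `coerce_object_bool_mask` (not a pandas function) and assuming missing values should be rejected rather than treated as `False`:

```python
import numpy as np


def coerce_object_bool_mask(key: np.ndarray) -> np.ndarray:
    """Hypothetical helper: turn an object-dtype array of booleans into a bool mask."""
    if key.dtype != object:
        # Non-object arrays are returned unchanged; other checks handle them.
        return key
    # Only genuine booleans (Python bool or np.bool_) are accepted, so a value
    # such as None or np.nan makes the mask invalid instead of being guessed.
    if not all(isinstance(v, (bool, np.bool_)) for v in key):
        raise ValueError("object-dtype mask must contain only booleans")
    return key.astype(bool)


values = np.array([10, 20, 30])
mask = np.array([True, False, True], dtype=object)
print(values[coerce_object_bool_mask(mask)])  # [10 30]
```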
Can you type these at all? Shouldn't `indexer` be renamed to `key` and typed as `Label` (or maybe something more sophisticated)? I'm not necessarily looking to solve this in this PR.
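A rough sketch of what such annotations could look like. The aliases `Label` and `ListLikeKey` are assumptions defined locally for illustration (pandas keeps its own typing aliases in `pandas._typing`, which may differ), and `check_key` is a hypothetical function, not the signature adopted in this PR:

```python
from typing import Any, Hashable, Optional, Sequence, Union

import numpy as np

# Assumed aliases for illustration only; not necessarily pandas' definitions.
Label = Optional[Hashable]
ListLikeKey = Union[Sequence[Any], np.ndarray]


def check_key(key: Union[Label, ListLikeKey]) -> Union[Label, np.ndarray]:
    """Hypothetical: return scalar labels unchanged, convert list-like keys to arrays."""
    if isinstance(key, (list, np.ndarray)):
        # List-like keys are converted so that array-based validation can follow.
        return np.asarray(key)
    # Anything else is treated as a single label (str, int, tuple, None, ...).
    return key
```

Tuples are deliberately left as labels here, since they can be valid keys into a MultiIndex.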
1.0 or 1.1?
1.0 if we're planning to subsume check_bool_array_indexer.
Yes, this is replacing `check_bool_array_indexer`, which is already in 1.0.0, so we should do the replacement also for 1.0.0.
can this be made more specific, e.g. "np.ndarray or EA"?
It's only used to get the length, so made it "array-like" (can in principle also be a Series)
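Because the first argument is only consulted for its length, anything that supports `len()` works, including lists, NumPy arrays and Series. A minimal sketch with a hypothetical `validate_mask_length` helper (the error type and message are assumptions, not necessarily what pandas raises):

```python
from typing import Sized

import numpy as np


def validate_mask_length(array: Sized, mask: np.ndarray) -> np.ndarray:
    """Hypothetical check: a boolean mask must be exactly as long as the indexed object."""
    # Only len(array) is used, so the container type does not matter.
    if len(mask) != len(array):
        raise IndexError(
            f"Boolean index has wrong length: {len(mask)} instead of {len(array)}"
        )
    return mask


validate_mask_length([1, 2, 3], np.array([True, False, True]))  # passes
validate_mask_length(np.zeros(3), np.array([True, False, True]))  # passes
```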
In a few places above, you've done `is_list_like`, but here we require an array (with a dtype). Thoughts on what we want? Requiring an array is certainly easier, so that we don't have to infer the types. But users may be passing arbitrary objects to `__getitem__`.
We actually don't require an array with a dtype. The first thing that this function does is:
to deal with e.g. lists.
So I probably meant to change the description from "array-like" to "list-like".
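Accepting list-likes rather than requiring arrays mostly means converting the key up front and then dispatching on the inferred dtype. The sketch below is one way to do that with plain NumPy; `normalize_indexer` is a hypothetical name, and it deliberately ignores extension arrays and NA handling, which the real function has to deal with:

```python
import numpy as np


def normalize_indexer(key) -> np.ndarray:
    """Hypothetical: convert a list-like key to an ndarray and validate its kind."""
    # Lists, tuples and other list-likes get a dtype inferred by NumPy here.
    arr = np.asarray(key)
    if arr.size == 0:
        # np.asarray([]) is float64, so treat an empty key as an empty positional indexer.
        return np.array([], dtype=np.intp)
    if arr.dtype.kind == "b":
        return arr
    if arr.dtype.kind in "iu":
        # Positional indexers are normalized to the platform integer type.
        return arr.astype(np.intp, copy=False)
    raise IndexError("arrays used as indices must be of integer or boolean type")


print(normalize_indexer([True, False, True]))  # [ True False  True]
print(normalize_indexer((0, 2)).dtype == np.intp)  # True
```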
Does int vs np.int64 vs np.intp matter here? Are there failure modes other than the presence of NAs?
This does matter; indexers are intp.
Yes, that was on my to-do list. I need to figure out the easiest way to convert to a NumPy array while preserving the bit width of the dtype (or can we always convert to intp?)
Will update tomorrow
OK, I went with np.intp. From a quick test, when you index with non-intp integers in NumPy, doing the conversion to intp yourself beforehand is not slower. (Although, while writing this: what happens if you try to index with an int64 value that is too large to fit in int32 on a 32-bit platform?)
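On the 32-bit question: `ndarray.astype` uses unsafe casting by default, so converting an `int64` array to a 32-bit `np.intp` would wrap out-of-range values silently rather than raise. A bounds-checked conversion could look like the sketch below; `to_intp_checked` is a hypothetical helper, not the approach the PR settled on:

```python
import numpy as np


def to_intp_checked(key) -> np.ndarray:
    """Hypothetical: convert an integer indexer to np.intp, refusing values that would wrap."""
    arr = np.asarray(key)
    if arr.dtype.kind not in "iu":
        raise TypeError(f"expected an integer indexer, got dtype {arr.dtype}")
    if arr.size:
        info = np.iinfo(np.intp)
        # Compare as Python ints so uint64 vs. int64 bounds are checked exactly.
        if int(arr.max()) > info.max or int(arr.min()) < info.min:
            raise OverflowError("indexer values do not fit in the platform integer type")
    # astype defaults to unsafe casting; the bounds check above keeps it lossless here.
    return arr.astype(np.intp, copy=False)


print(to_intp_checked(np.array([0, 2], dtype=np.int32)).dtype == np.intp)  # True
```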
`ensure_platform_int` is a well-established pattern.
Do you prefer that I update `ensure_platform_int` to handle extension arrays so I can use it here? (It's basically the same as `np.asarray(.., dtype=np.intp)`; I'm not really sure why the code in `ensure_platform_int` jumps through more hoops, performance I suppose.)
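For comparison, a sketch of a single helper that covers NumPy arrays, extension-array-like objects and plain sequences, with a no-copy path for inputs that are already `intp`. It is hypothetical (`as_platform_int` is not a pandas function), does not reproduce `ensure_platform_int`'s actual implementation, and assumes extension arrays can be handled by duck-typing on a `to_numpy(dtype=...)` method:

```python
import numpy as np


def as_platform_int(values) -> np.ndarray:
    """Hypothetical ensure_platform_int-style helper that also accepts array-likes."""
    if isinstance(values, np.ndarray):
        if values.dtype == np.intp:
            # Fast path: already the right dtype, so return the same object without copying.
            return values
        return values.astype(np.intp)
    if hasattr(values, "to_numpy"):
        # Assumption: extension arrays and Series expose to_numpy(dtype=...).
        return values.to_numpy(dtype=np.intp)
    # Lists and other plain sequences.
    return np.asarray(values, dtype=np.intp)
```

Keeping one such entry point would satisfy the "only one pattern" concern, at the cost of the helper knowing about non-NumPy inputs.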
Either way, but it should be consistent and use only one pattern; `ensure_platform_int` is already used extensively.