-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: add tests for take() on empty arrays #20582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
4d7cc21
aba95d5
3d6fd2e
eacb3d5
ee66d61
c4faf8e
9257203
a3f88ee
b5d357f
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -458,6 +458,15 @@ def take(self, indexer, allow_fill=True, fill_value=None): | |
Fill value to replace -1 values with. If applicable, this should | ||
use the sentinel missing value for this type. | ||
|
||
Returns | ||
------- | ||
ExtensionArray | ||
|
||
Raises | ||
------ | ||
IndexError | ||
When the indexer is out of bounds for the array. | ||
|
||
Notes | ||
----- | ||
This should follow pandas' semantics where -1 indicates missing values. | ||
|
@@ -477,6 +486,16 @@ def take(self, indexer, allow_fill=True, fill_value=None): | |
def take(self, indexer, allow_fill=True, fill_value=None): | ||
indexer = np.asarray(indexer) | ||
mask = indexer == -1 | ||
|
||
# take on empty array not handled as desired by numpy | ||
if not len(self): | ||
# only valid if result is an all-missing array | ||
if mask.all(): | ||
return type(self)([self._na_value] * len(indexer)) | ||
else: | ||
raise IndexError( | ||
"cannot do a non-empty take from an empty array.") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomAugspurger Adding the full example is maybe a bit too much (it defeats a bit the purpose of the small illustrative example)? On the other hand, it is needed for a correct implementation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed... I think that just having a comment noting the correct thing to do for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomAugspurger I added some explanation in the docstring, but I decided to keep the code example as well: it is now a bit shorter + it is actually really needed to get a correct implementation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is this raise condition hit in a test? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, see the single test I added in this PR: it tests both the all missing case and this error |
||
|
||
result = self.data.take(indexer) | ||
result[mask] = np.nan # NA for this type | ||
return type(self)(result) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1740,6 +1740,12 @@ def take_nd(self, indexer, allow_fill=True, fill_value=None): | |
# but is passed thru internally | ||
assert isna(fill_value) | ||
|
||
indexer = np.asarray(indexer) | ||
|
||
# take on empty array only valid if result is an all-missing array | ||
if not len(self) and not (indexer == -1).all(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this should be in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. pls make this change There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Did you see my answer/question the first time you asked this? I gave reasons not to do this, so at least please answer to them. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I already responsded. |
||
raise IndexError("cannot do a non-empty take from an empty array.") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this also needs a test There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is already tested in the extension array tests for categorical (do you want a duplicate test in the categorical tests?) |
||
|
||
codes = take_1d(self._codes, indexer, allow_fill=True, fill_value=-1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For the internal use cases (Categorical at the moment, later maybe more arrays), I could also push this test to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this should not be here rather in take1d (i see you comment) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I simplified the check a bit here (as it only needed to check the empty and not-all -1 case, as when indexer is all -1 @jreback I am not sure if I should move it to |
||
result = self._constructor(codes, categories=self.categories, | ||
ordered=self.ordered, fastpath=True) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,7 @@ | ||
import numpy as np | ||
|
||
import pandas as pd | ||
import pandas.util.testing as tm | ||
|
||
from .base import BaseExtensionTests | ||
|
||
|
@@ -120,3 +121,11 @@ def test_take_sequence(self, data): | |
assert result.iloc[0] == data[0] | ||
assert result.iloc[1] == data[1] | ||
assert result.iloc[2] == data[3] | ||
|
||
def test_take_empty_missing(self, data, na_value, na_cmp): | ||
empty = data[:0] | ||
result = empty.take([-1]) | ||
na_cmp(result[0], na_value) | ||
|
||
with tm.assert_raises_regex(IndexError, "cannot do a non-empty take"): | ||
empty.take([0, 1]) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomAugspurger I didn't find any existing tests for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This seems like the right place for testing a behavior specific to take. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -62,7 +62,7 @@ def __len__(self): | |
return len(self.values) | ||
|
||
def __repr__(self): | ||
return repr(self.values) | ||
return "DecimalArray: " + repr(self.values) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, meant to add this a while ago :) |
||
|
||
@property | ||
def nbytes(self): | ||
|
@@ -78,6 +78,15 @@ def take(self, indexer, allow_fill=True, fill_value=None): | |
indexer = np.asarray(indexer) | ||
mask = indexer == -1 | ||
|
||
# take on empty array | ||
if not len(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. DecimalArray.take does not use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. then you need to ask yourself if this is the correct implementation. pandas implementst take_1d for exactly this reason. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
# only valid if result is an all-missing array | ||
if mask.all(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. writing take is a fair amount of effort on the extension writer. meaning additional complexity and have to handle edge cases. take is a pure indexing operation and so ideally should be implemetned as high up in the stack as possible (meaning in the base class for EA) if the EA writer wants to override then ok, but it should just work generally. All of the machinery is already there. The point of the EA is to make it easy, having to handle all of these edge cases is not easy. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Assuming you have an ndarray. Do we want to provide a default implementation that does an I'm going to make a PR documenting which methods use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It will almost never be what you want I think. It could still serve as an example implementation of course, like we now have one in the docstring. |
||
return type(self)([self._na_value] * len(indexer)) | ||
else: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no else needed. is the raise hit? |
||
raise IndexError( | ||
"cannot do a non-empty take from an empty array.") | ||
|
||
indexer = _ensure_platform_int(indexer) | ||
out = self.values.take(indexer) | ||
out[mask] = self._na_value | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the else, just raise
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the 'else' clause is indeed not essential. I adapted the error message slightly ("from empty axes" -> "from empty array") because I did not find it the best for ExtensionArrays, but given the minor change I can certainly leave it out for code simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls make this change.
also combine the comments, this doesn't read very well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, did it already locally, didn't push yet