-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Accept Generic Array-Like for .where #15414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4732,17 +4732,28 @@ def _where(self, cond, other=np.nan, inplace=False, axis=None, level=None, | |
cond, _ = cond.align(self, join='right', broadcast_axis=1) | ||
else: | ||
if not hasattr(cond, 'shape'): | ||
raise ValueError('where requires an ndarray like object for ' | ||
'its condition') | ||
cond = np.asanyarray(cond) | ||
if cond.shape != self.shape: | ||
raise ValueError('Array conditional must be same shape as ' | ||
'self') | ||
cond = self._constructor(cond, **self._construct_axes_dict()) | ||
|
||
if inplace: | ||
cond = -(cond.fillna(True).astype(bool)) | ||
fill_value = True if inplace else False | ||
cond = cond.fillna(fill_value) | ||
|
||
msg = "Boolean array expected for the condition, not {dtype}" | ||
|
||
if not isinstance(cond, pd.DataFrame): | ||
# This is a single-dimensional object. | ||
if not is_bool_dtype(cond): | ||
raise ValueError(msg.format(dtype=cond.dtype)) | ||
else: | ||
cond = cond.fillna(False).astype(bool) | ||
for dt in cond.dtypes: | ||
if not is_bool_dtype(dt): | ||
raise ValueError(msg.format(dtype=dt)) | ||
|
||
cond = cond.astype(bool, copy=False) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. and you are flipping this again here? (according to inplace). This is really convoluted There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps, but this change is just a refactor of what was there before. Also, test break if we don't do that inversion, as you saw above. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I understand, but your refactor is now much more confusing. can you try to make this simpler? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wait, are you saying the original code on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. well, your new code is VERY confusing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That doesn't really answer my question. |
||
cond = -cond if inplace else cond | ||
|
||
# try to align | ||
try_quick = True | ||
|
@@ -4891,26 +4902,20 @@ def _where(self, cond, other=np.nan, inplace=False, axis=None, level=None, | |
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This belonged to the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I personally found that confusing. I wouldn't have known that from reading the original doc. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the html docs it is slightly clearer. But indeed, for a correct syntax, the empty line should not be there, like
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jorisvandenbossche : PR up to patch (#15509) |
||
Parameters | ||
---------- | ||
cond : boolean %(klass)s, array or callable | ||
cond : boolean %(klass)s, array-like, or callable | ||
If cond is callable, it is computed on the %(klass)s and | ||
should return boolean %(klass)s or array. | ||
The callable must not change input %(klass)s | ||
(though pandas doesn't check it). | ||
should return boolean %(klass)s or array. The callable must | ||
not change input %(klass)s (though pandas doesn't check it). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe add 0.20.0 now supports array-like? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't consider it an enhancement. We generally accept array-like across the board. |
||
|
||
.. versionadded:: 0.18.1 | ||
|
||
A callable can be used as cond. | ||
|
||
other : scalar, %(klass)s, or callable | ||
If other is callable, it is computed on the %(klass)s and | ||
should return scalar or %(klass)s. | ||
The callable must not change input %(klass)s | ||
(though pandas doesn't check it). | ||
should return scalar or %(klass)s. The callable must not | ||
change input %(klass)s (though pandas doesn't check it). | ||
|
||
.. versionadded:: 0.18.1 | ||
|
||
A callable can be used as other. | ||
|
||
inplace : boolean, default False | ||
Whether to perform the operation in place on the data | ||
axis : alignment axis if needed, default None | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2479,6 +2479,95 @@ def _check_set(df, cond, check_dtypes=True): | |
expected = df[df['a'] == 1].reindex(df.index) | ||
assert_frame_equal(result, expected) | ||
|
||
def test_where_array_like(self): | ||
# see gh-15414 | ||
klasses = [list, tuple, np.array] | ||
|
||
df = DataFrame({'a': [1, 2, 3]}) | ||
cond = [[False], [True], [True]] | ||
expected = DataFrame({'a': [np.nan, 2, 3]}) | ||
|
||
for klass in klasses: | ||
result = df.where(klass(cond)) | ||
assert_frame_equal(result, expected) | ||
|
||
df['b'] = 2 | ||
expected['b'] = [2, np.nan, 2] | ||
cond = [[False, True], [True, False], [True, True]] | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you add a test for rejected things (e.g. non-boolean list-like, Timestamp, string, etcc) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @gfyoung everything looks lgtm. Can you add some tests for rejected things, ping on green. |
||
for klass in klasses: | ||
result = df.where(klass(cond)) | ||
assert_frame_equal(result, expected) | ||
|
||
def test_where_invalid_input(self): | ||
# see gh-15414: only boolean arrays accepted | ||
df = DataFrame({'a': [1, 2, 3]}) | ||
msg = "Boolean array expected for the condition" | ||
|
||
conds = [ | ||
[[1], [0], [1]], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you also add some that have an invalid shape? e.g. a scalar, an unalignable Series, a 1-dim ndarray (or is that broadcast)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It depends in fact on the type of input you pass in. If you pass in an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok so this would be good to test then :> There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Will do once this recent commit has been tested on Travis and Appveyor. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback : Tests have been added, and everything looks good. Have a look. |
||
Series([[2], [5], [7]]), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why would you use a Series with a list-like? that is really odd here (even to test), just use a regular alignable, non-boolean series. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is because a list-of-lists is already aligned with |
||
DataFrame({'a': [2, 5, 7]}), | ||
[["True"], ["False"], ["True"]], | ||
[[Timestamp("2017-01-01")], | ||
[pd.NaT], [Timestamp("2017-01-02")]] | ||
] | ||
|
||
for cond in conds: | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
df.where(cond) | ||
|
||
df['b'] = 2 | ||
conds = [ | ||
[[0, 1], [1, 0], [1, 1]], | ||
Series([[0, 2], [5, 0], [4, 7]]), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same answer as above |
||
[["False", "True"], ["True", "False"], | ||
["True", "True"]], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. try at least a DataFrame as well There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
DataFrame({'a': [2, 5, 7], 'b': [4, 8, 9]}), | ||
[[pd.NaT, Timestamp("2017-01-01")], | ||
[Timestamp("2017-01-02"), pd.NaT], | ||
[Timestamp("2017-01-03"), Timestamp("2017-01-03")]] | ||
] | ||
|
||
for cond in conds: | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
df.where(cond) | ||
|
||
def test_where_dataframe_col_match(self): | ||
df = DataFrame([[1, 2, 3], [4, 5, 6]]) | ||
cond = DataFrame([[True, False, True], [False, False, True]]) | ||
|
||
out = df.where(cond) | ||
expected = DataFrame([[1.0, np.nan, 3], [np.nan, np.nan, 6]]) | ||
tm.assert_frame_equal(out, expected) | ||
|
||
cond.columns = ["a", "b", "c"] # Columns no longer match. | ||
msg = "Boolean array expected for the condition" | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
df.where(cond) | ||
|
||
def test_where_ndframe_align(self): | ||
msg = "Array conditional must be same shape as self" | ||
df = DataFrame([[1, 2, 3], [4, 5, 6]]) | ||
|
||
cond = [True] | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
df.where(cond) | ||
|
||
expected = DataFrame([[1, 2, 3], [np.nan, np.nan, np.nan]]) | ||
|
||
out = df.where(Series(cond)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This already worked like this before I think, but since we explicitly added a test for it: how can this result be explained? The Series aligned on the index and broadcasted along the columns? That is not how alignment otherwise works .. (alignment of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But those do not just align (unless you specify the axis on which to align):
There is an axis keyword in So it seems that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know that. The code does it internally. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I don't doubt that you know that, but my initial question was: how do we explain this (to users)? This is not documented, this is IMO inconsistent, and one would expect the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jorisvandenbossche : Actually, no. Regardless of Whether or not this behavior is correct is another story. I can see your point regarding this inconsistency, but mind you that changing it to err would be an API change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I know this behaviour was already like that before your patch, as I mentioned in my initial comment. I just raised this because we where explicitly adding a test for it (while the behaviour could also have a case of 'undefined/uncatched behaviour' that was not explicitly intended). But probably better to discuss in separate issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Other question: what is the purpose of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, I see in the code this is used to align There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Agreed. |
||
tm.assert_frame_equal(out, expected) | ||
|
||
cond = np.array([False, True, False, True]) | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
df.where(cond) | ||
|
||
expected = DataFrame([[np.nan, np.nan, np.nan], [4, 5, 6]]) | ||
|
||
out = df.where(Series(cond)) | ||
tm.assert_frame_equal(out, expected) | ||
|
||
def test_where_bug(self): | ||
|
||
# GH 2793 | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -497,6 +497,18 @@ def test_where(self): | |
result = i.where(cond) | ||
tm.assert_index_equal(result, expected) | ||
|
||
def test_where_array_like(self): | ||
i = self.create_index() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. issue number There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These are just confirmation tests. The behavior was never broken for indices. |
||
|
||
_nan = i._na_value | ||
cond = [False] + [True] * (len(i) - 1) | ||
klasses = [list, tuple, np.array, pd.Series] | ||
expected = pd.Index([_nan] + i[1:].tolist(), dtype=i.dtype) | ||
|
||
for klass in klasses: | ||
result = i.where(klass(cond)) | ||
tm.assert_index_equal(result, expected) | ||
|
||
def test_setops_errorcases(self): | ||
for name, idx in compat.iteritems(self.indices): | ||
# # non-iterable input | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1193,6 +1193,60 @@ def f(): | |
expected = Series(np.nan, index=[9]) | ||
assert_series_equal(result, expected) | ||
|
||
def test_where_array_like(self): | ||
# see gh-15414 | ||
s = Series([1, 2, 3]) | ||
cond = [False, True, True] | ||
expected = Series([np.nan, 2, 3]) | ||
klasses = [list, tuple, np.array, Series] | ||
|
||
for klass in klasses: | ||
result = s.where(klass(cond)) | ||
assert_series_equal(result, expected) | ||
|
||
def test_where_invalid_input(self): | ||
# see gh-15414: only boolean arrays accepted | ||
s = Series([1, 2, 3]) | ||
msg = "Boolean array expected for the condition" | ||
|
||
conds = [ | ||
[1, 0, 1], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe add a DataFrame as input (maybe that works if the names match?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Works if names match. Fails if they do not. This is because we "align" the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep, test was added just now. Just waiting for Travis and Appveyor to approve now. |
||
Series([2, 5, 7]), | ||
["True", "False", "True"], | ||
[Timestamp("2017-01-01"), | ||
pd.NaT, Timestamp("2017-01-02")] | ||
] | ||
|
||
for cond in conds: | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
s.where(cond) | ||
|
||
msg = "Array conditional must be same shape as self" | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
s.where([True]) | ||
|
||
def test_where_ndframe_align(self): | ||
msg = "Array conditional must be same shape as self" | ||
s = Series([1, 2, 3]) | ||
|
||
cond = [True] | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
s.where(cond) | ||
|
||
expected = Series([1, np.nan, np.nan]) | ||
|
||
out = s.where(Series(cond)) | ||
tm.assert_series_equal(out, expected) | ||
|
||
cond = np.array([False, True, False, True]) | ||
with tm.assertRaisesRegexp(ValueError, msg): | ||
s.where(cond) | ||
|
||
expected = Series([np.nan, 2, np.nan]) | ||
|
||
out = s.where(Series(cond)) | ||
tm.assert_series_equal(out, expected) | ||
|
||
def test_where_setitem_invalid(self): | ||
|
||
# GH 2702 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
none of this appears to be tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw your comment above, and I have yet to write those tests. In addition, this branch gets hit all the time for the current
.where
tests.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
true, but not specifically for validation boolean-like inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean here. This branch gets hit whenever you call
Series.where
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these have not tests for the condition you added at all.
In [1]: s = Series([1,2,3])
makes no sense and is not tested
sure this gets hit all the time with expected input. but I suspect is barely tests with unexpected input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm getting to it with the tests for invalid input. That example should break if you pull down my changes at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok great, thanks @gfyoung ping when you want review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback : Tests have been added. Have a look.