Skip to content

BUG: Don't raise for NDFrame.mask with nullable boolean #36201

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 39 commits into from
Closed
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
39 commits
Select commit Hold shift + click to select a range
4197885
BUG: Don't raise for NDFrame.mask with nullable boolean
dsaxton Sep 7, 2020
0ff4f41
Note
dsaxton Sep 7, 2020
d79fcc7
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 7, 2020
6be29fb
Dummy commit
dsaxton Sep 7, 2020
17bdca0
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 8, 2020
aa3c357
Move note
dsaxton Sep 8, 2020
566d0d6
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 9, 2020
d6998dd
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 11, 2020
8d35401
Move note
dsaxton Sep 11, 2020
3169909
Invert in _where
dsaxton Sep 11, 2020
db45e2e
Comment out some tests for now
dsaxton Sep 11, 2020
a1fcf51
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 11, 2020
bd584da
xor
dsaxton Sep 11, 2020
f2e8b7d
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 13, 2020
ea457ea
Doc
dsaxton Sep 13, 2020
346606c
Version added
dsaxton Sep 13, 2020
92ae6bf
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 13, 2020
d4704a0
Update whatsnew
dsaxton Sep 13, 2020
f7a1f64
Use invert
dsaxton Sep 13, 2020
0ac0930
One line
dsaxton Sep 13, 2020
6dd6c1e
Undo
dsaxton Sep 13, 2020
b40243e
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 13, 2020
363560c
Boolean magic
dsaxton Sep 13, 2020
2020fd6
Only comment
dsaxton Sep 13, 2020
1cf5a0c
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 16, 2020
d6fb23a
Add frame indexing test
dsaxton Sep 16, 2020
e292606
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 17, 2020
2f024fe
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 19, 2020
0b90786
Remove arg
dsaxton Sep 19, 2020
054a8cd
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 19, 2020
cd601f6
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 21, 2020
424b6bd
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 23, 2020
8e8cb9d
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 25, 2020
2151cc0
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 25, 2020
59cf6a1
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 28, 2020
dfb7b59
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 28, 2020
a313268
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Sep 29, 2020
ee2849c
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Oct 7, 2020
5a4bacf
Merge remote-tracking branch 'upstream/master' into mask-with-nullabl…
dsaxton Oct 10, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ Fixed regressions
- Regression in :meth:`DataFrame.replace` where a ``TypeError`` would be raised when attempting to replace elements of type :class:`Interval` (:issue:`35931`)
- Fix regression in pickle roundtrip of the ``closed`` attribute of :class:`IntervalIndex` (:issue:`35658`)
- Fixed regression in :meth:`DataFrameGroupBy.agg` where a ``ValueError: buffer source array is read-only`` would be raised when the underlying array is read-only (:issue:`36014`)
- Fixed regression in :meth:`DataFrame.mask` and :meth:`DataFrame.where` raising an ``AseertionError`` when using a nullable boolean mask (:issue:`35429`)
- Fixed regression in :meth:`Series.groupby.rolling` number of levels of :class:`MultiIndex` in input was compressed to one (:issue:`36018`)
-

Expand Down
2 changes: 2 additions & 0 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -3122,6 +3122,8 @@ def _setitem_frame(self, key, value):

self._check_inplace_setting(value)
self._check_setitem_copy()
if not self._indexed_same(key):
key = key.reindex_like(self).fillna(False)
self._where(-key, value, inplace=True)

def _iset_item(self, loc: int, value):
Expand Down
10 changes: 6 additions & 4 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -8816,8 +8816,7 @@ def _where(
cond = self._constructor(cond, **self._construct_axes_dict())

# make sure we are boolean
fill_value = bool(inplace)
cond = cond.fillna(fill_value)
cond = cond.fillna(False)

msg = "Boolean array expected for the condition, not {dtype}"

Expand All @@ -8834,7 +8833,7 @@ def _where(
# GH#21947 we have an empty DataFrame/Series, could be object-dtype
cond = cond.astype(bool)

cond = -cond if inplace else cond
cond = ~cond if inplace else cond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you are goin to change this then let's clean starting on L834.


# try to align with other
try_quick = True
Expand Down Expand Up @@ -9094,9 +9093,12 @@ def mask(
cond = com.apply_if_callable(cond, self)

# see gh-21891
if not hasattr(cond, "__invert__"):
if not hasattr(cond, "__array__"):
cond = np.array(cond)

cond[isna(cond)] = False
cond = cond.astype(bool, copy=False)

return self.where(
~cond,
other=other,
Expand Down
5 changes: 3 additions & 2 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -2933,7 +2933,8 @@ def _extract_bool_array(mask: ArrayLike) -> np.ndarray:
# Except for BooleanArray, this is equivalent to just
# np.asarray(mask, dtype=bool)
mask = mask.to_numpy(dtype=bool, na_value=False)
else:
assert isinstance(mask, np.ndarray), type(mask)
mask = mask.astype(bool, copy=False)

assert isinstance(mask, np.ndarray), type(mask)
assert mask.dtype == bool, mask.dtype
return mask
18 changes: 17 additions & 1 deletion pandas/tests/frame/indexing/test_mask.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,9 @@
"""

import numpy as np
import pytest

from pandas import DataFrame, isna
from pandas import DataFrame, Series, isna
import pandas._testing as tm


Expand Down Expand Up @@ -83,3 +84,18 @@ def test_mask_dtype_conversion(self):
expected = bools.astype(float).mask(mask)
result = bools.mask(mask)
tm.assert_frame_equal(result, expected)

@pytest.mark.parametrize("inplace", [True, False])
def test_mask_nullable_boolean(self, inplace):
# https://github.com/pandas-dev/pandas/issues/35429
df = DataFrame([1, 2, 3])
mask = Series([True, False, None], dtype="boolean")
expected = DataFrame([999, 2, 3])

if inplace:
result = df.copy()
result.mask(mask, 999, inplace=True)
else:
result = df.mask(mask, 999, inplace=False)

tm.assert_frame_equal(result, expected)
19 changes: 17 additions & 2 deletions pandas/tests/frame/indexing/test_where.py
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,7 @@ def test_where_set(self, where_frame, float_string_frame):

def _check_set(df, cond, check_dtypes=True):
dfi = df.copy()
econd = cond.reindex_like(df).fillna(True)
econd = cond.reindex_like(df).fillna(False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test was broken by no longer setting fill_value = bool(inplace) so had to update

expected = dfi.mask(~econd)

return_value = dfi.where(cond, np.nan, inplace=True)
Expand All @@ -169,7 +169,7 @@ def _check_set(df, cond, check_dtypes=True):
# dtypes (and confirm upcasts)x
if check_dtypes:
for k, v in df.dtypes.items():
if issubclass(v.type, np.integer) and not cond[k].all():
if issubclass(v.type, np.integer) and not econd[k].all():
v = np.dtype("float64")
assert dfi[k].dtype == v

Expand Down Expand Up @@ -642,3 +642,18 @@ def test_df_where_with_category(self, kwargs):
expected = Series(A, name="A")

tm.assert_series_equal(result, expected)

@pytest.mark.parametrize("inplace", [True, False])
def test_where_nullable_boolean_mask(self, inplace):
# https://github.com/pandas-dev/pandas/issues/35429
df = DataFrame([1, 2, 3])
mask = Series([True, False, None], dtype="boolean")
expected = DataFrame([1, 999, 999])

if inplace:
result = df.copy()
result.where(mask, 999, inplace=True)
else:
result = df.where(mask, 999, inplace=False)

tm.assert_frame_equal(result, expected)
16 changes: 16 additions & 0 deletions pandas/tests/series/indexing/test_mask.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,3 +63,19 @@ def test_mask_inplace():
rs = s.copy()
rs.mask(cond, -s, inplace=True)
tm.assert_series_equal(rs, s.mask(cond, -s))


@pytest.mark.parametrize("inplace", [True, False])
def test_mask_nullable_boolean(inplace):
# https://github.com/pandas-dev/pandas/issues/35429
ser = Series([1, 2, 3])
mask = Series([True, False, None], dtype="boolean")
expected = Series([999, 2, 3])

if inplace:
result = ser.copy()
result.mask(mask, 999, inplace=True)
else:
result = ser.mask(mask, 999, inplace=False)

tm.assert_series_equal(result, expected)
16 changes: 16 additions & 0 deletions pandas/tests/series/indexing/test_where.py
Original file line number Diff line number Diff line change
Expand Up @@ -452,3 +452,19 @@ def test_where_empty_series_and_empty_cond_having_non_bool_dtypes():
ser = Series([], dtype=float)
result = ser.where([])
tm.assert_series_equal(result, ser)


@pytest.mark.parametrize("inplace", [True, False])
def test_where_nullable_boolean_mask(inplace):
# https://github.com/pandas-dev/pandas/issues/35429
ser = Series([1, 2, 3])
mask = Series([True, False, None], dtype="boolean")
expected = Series([1, 999, 999])

if inplace:
result = ser.copy()
result.where(mask, 999, inplace=True)
else:
result = ser.where(mask, 999, inplace=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following on from previous comment.

>>> ser = pd.Series([1, 2, 3], dtype="Int64")
>>> mask = pd.Series([True, False, None], dtype="boolean")
>>> ser.where(mask, 999)
0      1
1    999
2    999
dtype: Int64

using a nullable mask with a nullable type gives the wrong output?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This output is correct IMO, but the output changes if the operation is inplace:

[ins] In [4]: s.where(mask, 999)
Out[4]: 
0      1
1    999
2    999
dtype: Int64

[ins] In [5]: s.where(mask, 999, inplace=True)

[ins] In [6]: s
Out[6]: 
0      1
1    999
2      3
dtype: Int64


tm.assert_series_equal(result, expected)