ENH: Processing of .mask() for pd.NA #56844 #58730

mafaldam · 2024-05-15T13:07:09Z

closes ENH: Processing of .mask() for pd.NA #56844
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This commit addresses an issue where using NA in conjunction with the mask() method resulted in unexpected behavior. The problem arose when a Series containing NA was used as the condition within mask(), causing inconsistencies in the output. This fix ensures that NA is treated consistently as False in logical operations within mask().

This commit addresses an issue where using pandas.NA in conjunction with the mask() method resulted in unexpected behavior. The problem arose when comparing a Series containing pandas.NA with a condition within mask(), causing inconsistencies in the output. This fix ensures that pandas.NA behaves consistently with False in logical operations within mask().

Co-authored-by: Xiao Yuan <[email protected]>
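
For reference, the behaviour this PR is aiming for can be approximated by filling the condition explicitly. The snippet below is only a sketch of the intended semantics, not the change made in this PR: the Series values and the replacement value `-1` are made up for illustration, and `fillna(False)` is applied by hand so the result does not depend on how a given pandas version treats `pd.NA` inside `mask()`.

```python
import pandas as pd

# Illustrative data; not taken from the PR's test suite.
s = pd.Series([0, 1, 2])
cond = pd.Series([True, False, pd.NA], dtype="boolean")

# Treat NA as False explicitly, so only rows where cond is True are replaced.
result = s.mask(cond.fillna(False), other=-1)

print(result.tolist())
```

With the NA filled as False, only the first row is replaced and the row with the missing condition keeps its original value.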

WillAyd · 2024-05-15T14:47:01Z

pandas/core/generic.py

@@ -9999,6 +9999,9 @@ def mask(
        cond = common.apply_if_callable(cond, self)
        other = common.apply_if_callable(other, self)

+        if isinstance(cond, (ABCDataFrame, ABCSeries)):


Rather than a fillna like this, do you see where the logic is breaking down that causes .mask to treat pd.NA the same way as True? We shouldn't have to mess around with pd.NA equality semantics for this fix.
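
One possible reading of where NA turns into True, offered as an assumption rather than something confirmed in this thread: if `mask(cond)` is evaluated as `where(~cond)` and missing values are only filled with False after the inversion, the NA slot ends up marked for replacement. The sketch below uses plain nullable boolean arrays to show that the order of inversion and filling matters; it does not call into pandas internals.

```python
import pandas as pd

cond = pd.array([True, False, pd.NA], dtype="boolean")

# Kleene logic: inverting NA yields NA, so the missing slot survives "~".
inverted = ~cond

# Filling after the inversion: the NA slot becomes False, which under
# where() semantics means "replace this value".
fill_after = inverted.fillna(False).to_numpy(dtype=bool)

# Filling before the inversion: the NA slot becomes True, which under
# where() semantics means "keep this value".
fill_before = (~cond.fillna(False)).to_numpy(dtype=bool)

print(fill_after.tolist())
print(fill_before.tolist())
```

The two results differ only in the position that held NA, which is the position the linked issue is about.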

For indexing this goes through check_array_indexer which essentially also converts NA to False (but lower in the stack):

pandas/pandas/core/indexers/utils.py

Lines 532 to 534 in 2aa155a

    if is_bool_dtype(dtype):
        if isinstance(dtype, ExtensionDtype):
            indexer = indexer.to_numpy(dtype=bool, na_value=False)
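
The conversion quoted above can be tried directly. This is a small sketch using the public `pandas.api.indexers.check_array_indexer` helper and `to_numpy` on a nullable boolean array; the array contents are invented for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import check_array_indexer

values = np.array([10, 20, 30])
indexer = pd.array([True, pd.NA, False], dtype="boolean")

# Same call as in the quoted lines: NA is mapped to False on conversion.
as_numpy = indexer.to_numpy(dtype=bool, na_value=False)

# The public helper validates the indexer against `values` and returns
# a plain NumPy boolean array.
checked = check_array_indexer(values, indexer)

print(as_numpy.tolist())
print(values[checked].tolist())
```

Both paths yield a plain boolean array in which the NA slot is False, so the element at that position is not selected.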

pandas/tests/frame/indexing/test_mask.py

jbrockmendel · 2024-05-21T16:00:52Z

@phofl @jorisvandenbossche this seems related to part of the ice cream agreement. Are we ready to move forward with that part? I’d expect to do the indexing bits all at once

jorisvandenbossche · 2024-05-22T16:16:47Z

@jbrockmendel I think we can simply treat this as a bug fix of the current implementation. We do handle the nullable boolean dtype specifically in indexing already (considering NA as False, a behaviour which has been discussed in the past already, #31591), so fixing that for mask as well seems like a bug fix to me / fix to make things consistent.
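
The existing indexing behaviour referred to here is visible with a nullable boolean mask: rows where the mask is NA are dropped, as if the mask were False. A short sketch with made-up data:

```python
import pandas as pd

s = pd.Series([10, 20, 30])
mask = pd.array([True, pd.NA, False], dtype="boolean")

# NA in the mask is treated as False, so only the first row is selected.
selected = s[mask]
selected_loc = s.loc[mask]

# To keep the NA rows instead, fill the mask explicitly first.
kept = s[mask.fillna(True)]

print(selected.tolist(), selected_loc.tolist(), kept.tolist())
```

Making `mask()` follow the same rule would mean a row whose condition is NA is left unchanged, in line with what `[]` and `.loc` already do.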

Co-authored-by: William Ayd <[email protected]>

yuanx749 · 2024-06-02T15:35:45Z

pandas/tests/frame/indexing/test_mask.py

+
+def test_mask_with_NA():
+    df = DataFrame({"A": [0, 1, 2]})
+    cond = Series([True, False, pd.NA], dtype=pd.BooleanDtype())


To pass the tests, I think dtype=BooleanDtype() should be used here, and from pandas import BooleanDtype at the top of the file, just like other dtypes.
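
A sketch of how the test setup might look with the suggested import style. Only the construction of the inputs is shown; the expected result is left out because it depends on the fix under review. Importing `NA` from `pandas` is an assumption about how the test module would spell it.

```python
from pandas import (
    NA,
    BooleanDtype,
    DataFrame,
    Series,
)

df = DataFrame({"A": [0, 1, 2]})
# BooleanDtype() is the nullable boolean dtype, equivalent to dtype="boolean".
cond = Series([True, False, NA], dtype=BooleanDtype())

print(cond.dtype)
```

Importing the names directly avoids relying on a `pd` alias that the test module may not define.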

mafaldam · 2024-06-05T12:36:05Z

Hi,
I've addressed the initial feedback on mask() handling pd.NA values. However, I'm unsure about the next steps, particularly regarding pd.NA equality semantics and indexing consistency. Any specific guidance would be appreciated!

github-actions · 2024-07-06T00:05:59Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

mroeschke · 2024-07-22T17:56:06Z

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.

mafaldam and others added 5 commits April 3, 2024 11:40

Update pandas/core/generic.py

a0369c7

Co-authored-by: Xiao Yuan <[email protected]>

Update pandas/core/generic.py

3b20b84

Co-authored-by: Xiao Yuan <[email protected]>

fix: rebase with master

15c74b5

WillAyd requested changes May 15, 2024

pandas/tests/frame/indexing/test_mask.py (outdated, resolved)

mroeschke added the Missing-data label (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) May 15, 2024

Update pandas/tests/frame/indexing/test_mask.py

d005ecc

Co-authored-by: William Ayd <[email protected]>

yuanx749 reviewed Jun 2, 2024

github-actions bot added the Stale label Jul 6, 2024

mroeschke closed this Jul 22, 2024
