BUG: Add fillna at the beginning of _where not to fill NA. #60729 #60772
Open
sanggon6107 wants to merge 34 commits into pandas-dev:main from sanggon6107:add-mask-fillna

Commits
bc9a942 BUG: Add fillna so that cond doesnt contain NA at the beginning of _w… (sanggon6107)
558569f TST: Add tests for mask with NA. (#60729) (sanggon6107)
bbbc720 BUG: Fix _where to make np.ndarray mutable. (#60729) (sanggon6107)
e2f32cb DOC: Add documentation regarding the bug (#60729) (sanggon6107)
6fd8986 Merge branch 'main' into add-mask-fillna (sanggon6107)
d2d5f62 ENH: Optimze test_mask_na() (sanggon6107)
475f2d1 BUG: Fix a bug in test_mask_na() (#60729) (sanggon6107)
db30b58 Update doc/source/whatsnew/v3.0.0.rst (sanggon6107)
55fe420 Merge branch 'main' into add-mask-fillna (sanggon6107)
cb94cf7 Add test arguments for test_mask_na (sanggon6107)
71e442e Fix whatsnew (sanggon6107)
b6bd3af Fix test failures by adding importorskip (sanggon6107)
8bac997 Fill True when tuple or list cond has np.nan/pd.NA (sanggon6107)
89bc1b4 Merge branch 'main' into add-mask-fillna (sanggon6107)
f154cf5 Optimize _where (sanggon6107)
eed6121 Optimize test_mask_na (sanggon6107)
9ac81f0 Add np.array for read-only ndarray (sanggon6107)
7e3fd3a Merge branch 'main' into add-mask-fillna (sanggon6107)
8c5ffff Update generic.py (sanggon6107)
5516517 Revert generic.py (sanggon6107)
2437ce2 Merge branch 'main' into add-mask-fillna (sanggon6107)
9556aa4 Replace np.array with fillna (sanggon6107)
b64b8a7 Correct the unintended deletion (sanggon6107)
9574746 Merge branch 'main' into add-mask-fillna (sanggon6107)
c073c0b Add test for list and ndarray (sanggon6107)
4eea08e Handle list with NA (sanggon6107)
bbc5612 Fix code checks (sanggon6107)
0851593 Fix type (sanggon6107)
98fb602 Optimize operation (sanggon6107)
915b8a7 Prevent a list with np.nan converting to float (sanggon6107)
7611f59 Add test for a list with np.nan (sanggon6107)
88b0530 Merge branch 'main' into add-mask-fillna (sanggon6107)
044d0a9 fillna after constructor (sanggon6107)
601f6c9 Parametrize cond type (sanggon6107)
    @@ -28,6 +28,7 @@

     from pandas._libs import lib
     from pandas._libs.lib import is_range_indexer
    +from pandas._libs.missing import NA
     from pandas._libs.tslibs import (
         Period,
         Timestamp,

    @@ -9702,6 +9703,7 @@ def _where(
             # align the cond to same shape as myself
             cond = common.apply_if_callable(cond, self)
             if isinstance(cond, NDFrame):
    +            cond = cond.fillna(True)
                 # CoW: Make sure reference is not kept alive
                 if cond.ndim == 1 and self.ndim == 2:
                     cond = cond._constructor_expanddim(

    @@ -9712,10 +9714,11 @@ def _where(
                 cond = cond.align(self, join="right")[0]
             else:
                 if not hasattr(cond, "shape"):
    -                cond = np.asanyarray(cond)
    +                cond = np.asanyarray(cond, dtype=object)
                 if cond.shape != self.shape:
                     raise ValueError("Array conditional must be same shape as self")
                 cond = self._constructor(cond, **self._construct_axes_dict(), copy=False)
    +            cond = cond.fillna(True)

             # make sure we are boolean
             fill_value = bool(inplace)

    @@ -10096,7 +10099,17 @@ def mask(

             # see gh-21891
             if not hasattr(cond, "__invert__"):
    -            cond = np.array(cond)
    +            cond = np.array(cond, dtype=object)
    +
    +        if isinstance(cond, np.ndarray):
    +            if all(
    +                x is NA or isinstance(x, (np.bool_, bool)) or x is np.nan
    +                for x in cond.flatten()
    +            ):
    +                if not cond.flags.writeable:
    +                    cond.setflags(write=True)
    +                cond[isna(cond)] = False
    +                cond = cond.astype(bool)

             return self._where(
                 ~cond,

Comment on lines +10105 to +10108:
Can you detail why this is necessary?
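As a rough illustration of what the `mask` hunk seems to be aiming for, here is a minimal sketch that runs outside pandas internals. The helper name `normalize_mask_cond` is hypothetical and not part of the PR; it uses the public `pd.isna` and `pd.NA` in place of the internal `isna` and `NA` imports, and it always copies the input, so the read-only handling via `setflags` in the diff is not reproduced here.

```python
import numpy as np
import pandas as pd


def normalize_mask_cond(cond):
    """Hypothetical helper (not in the PR): coerce a list-like cond to bool."""
    # dtype=object keeps True/False/NA as distinct Python objects. Without it,
    # np.array([True, np.nan]) is upcast to float64 and the booleans are lost.
    arr = np.array(cond, dtype=object)  # np.array copies, so arr is writable
    only_bool_or_missing = all(
        x is pd.NA or x is np.nan or isinstance(x, (bool, np.bool_))
        for x in arr.flatten()
    )
    if only_bool_or_missing:
        arr[pd.isna(arr)] = False  # a missing condition means "do not mask"
        arr = arr.astype(bool)
    return arr


s = pd.Series([1, 2, 3, 4])
cond = normalize_mask_cond([True, pd.NA, False, np.nan])
masked = s.mask(cond, other=-1)
print(cond.tolist())    # [True, False, False, False]
print(masked.tolist())  # [-1, 2, 3, 4]
```

Under these assumptions, only the position where the condition is explicitly `True` is replaced; positions with a missing condition keep their original value.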
Below we `fillna` using the value of `inplace`. Using `True` as done here looks correct; using `inplace` seems to be a bug. On main:

Thanks @rhshadrach,
I think `align` at L9714 sometimes returns a `DataFrame` with NA in it, and those values need to be filled with either True or False depending on the caller. (`__setitem__` requires the former, which is why it calls `_where()` with `inplace=True`.) My suggestion is to add an argument to `_where` so that the caller can decide whether to fill the missing values with `True` after `align`. Could you let me know if this change would be acceptable?
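The point about `align` can be reproduced with public API only. The sketch below is an illustration rather than the code under review: `fill_missing` is a hypothetical helper standing in for the `fillna` call inside `_where`, and `-1` is an arbitrary replacement value. It shows that right-aligning a shorter `cond` introduces a missing entry, and that the choice of fill value decides whether that position keeps its original value.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["a", "b", "c"])
cond = pd.Series([True, False], index=["a", "b"])  # no entry for "c"

# Right-aligning cond to s introduces a missing value for label "c".
aligned = cond.align(s, join="right")[0]
print(aligned.isna().tolist())  # [False, False, True]


def fill_missing(c: pd.Series, value: bool) -> pd.Series:
    """Hypothetical helper: replace missing entries with `value`, as bool dtype."""
    return pd.Series([value if pd.isna(v) else bool(v) for v in c], index=c.index)


keep = s.where(fill_missing(aligned, True), other=-1)   # "c" keeps its value
drop = s.where(fill_missing(aligned, False), other=-1)  # "c" is replaced
print(keep.tolist())  # [10, -1, 30]
print(drop.tolist())  # [10, -1, -1]
```

An extra argument on `_where`, as proposed above, would in effect let each caller pick between these two outcomes.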