-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REGR: RecursionError when attempting to replace np.nan values #46443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -436,15 +436,15 @@ def split_and_operate(self, func, *args, **kwargs) -> list[Block]: | |
# Up/Down-casting | ||
|
||
@final | ||
def coerce_to_target_dtype(self, other) -> Block: | ||
def coerce_to_target_dtype(self, other, strict_na: bool = False) -> Block: | ||
""" | ||
coerce the current block to a dtype compat for other | ||
we will return a block, possibly object, and not raise | ||
|
||
we can also safely try to coerce to the same dtype | ||
and will receive the same block | ||
""" | ||
new_dtype = find_result_type(self.values, other) | ||
new_dtype = find_result_type(self.values, other, strict_na=strict_na) | ||
|
||
return self.astype(new_dtype, copy=False) | ||
|
||
|
@@ -601,7 +601,7 @@ def replace( | |
return blocks | ||
|
||
elif self.ndim == 1 or self.shape[0] == 1: | ||
blk = self.coerce_to_target_dtype(value) | ||
blk = self.coerce_to_target_dtype(value, strict_na=True) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ATM i think rather than add special cases to the lower-level functions, we should keep it isolated here, similar to what you did for #45836 (and ideally down the line combine both fixes into something more elegant) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. sure. from #45725 (comment)
I think the issues arises basically from the difference between _can_hold_element and is_valid_na_for_dtype not being strict (i.e. None and pd.NA are considered valid for float array). I have added a strict flag to pass through to the lower level library casting functions (i've put this in But it probably does make more sense that the flag is passed further down through to is_valid_na_for_dtype and then maybe use _can_hold_element in is_valid_na_for_dtype when the strict flag is True. also from #45725 (comment)
Seems reasonable. I'm not sure if there are any other use cases. I suspect we need some logic that uses is_valid_na_for_dtype in Block.replace for this to work but also suspect that there will also be some duplication with the lower level library functions to pull this off. As mentioned above, would also be quite happy doing this as a refactor after the recursion issue is resolved by the fix suggested here.
I've not yet looked but off the top of your head, is the code added in #45568 to the lower level functions a special case for Interval? The reason I ask is removing that would also fix the recursion error. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Most of the edits in #45568 should be correct for The General Case. The only special case for IntervalDtype is in ensure_dtype_can_hold_na, which I don't think is what you're referring to. |
||
return blk.replace( | ||
to_replace=to_replace, | ||
value=value, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im wary of introducing this here. is there a hurry on this or can i take some time to catch up on the recent PRs that got us here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition here is where the regression occurred fixing an Interval issue. Are you thinking of pushing the fix further down into
is_valid_na_for_dtype
orensure_dtype_can_hold_na
or adding more special casing/code duplication in Block.replace?As I see it, whichever way it's done there is always a risk of changing behavior whereas the fix here keeps the fix for the interval issue and also keeps the existing behavior for replace.
This is on master and I'm quite happy with follow-ups to this that refactor as it's easier to revert a refactor is there is an issue down the line whereas if we need to revert a regression fix we end up re-introducing the regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no hurry (unless there are more refactors to this area of the code which make it more difficult to fix later)