-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REGR: RecursionError when attempting to replace np.nan values #46443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: RecursionError when attempting to replace np.nan values #46443
Conversation
@@ -1454,7 +1454,7 @@ def _ensure_nanosecond_dtype(dtype: DtypeObj) -> DtypeObj: | |||
|
|||
# TODO: other value-dependent functions to standardize here include | |||
# dtypes.concat.cast_to_common_type and Index._find_common_type_compat | |||
def find_result_type(left: ArrayLike, right: Any) -> DtypeObj: | |||
def find_result_type(left: ArrayLike, right: Any, strict_na: bool = False) -> DtypeObj: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im wary of introducing this here. is there a hurry on this or can i take some time to catch up on the recent PRs that got us here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The addition here is where the regression occurred fixing an Interval issue. Are you thinking of pushing the fix further down into is_valid_na_for_dtype
or ensure_dtype_can_hold_na
or adding more special casing/code duplication in Block.replace?
As I see it, whichever way it's done there is always a risk of changing behavior whereas the fix here keeps the fix for the interval issue and also keeps the existing behavior for replace.
This is on master and I'm quite happy with follow-ups to this that refactor as it's easier to revert a refactor is there is an issue down the line whereas if we need to revert a regression fix we end up re-introducing the regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a hurry on this or can i take some time to catch up on the recent PRs that got us here?
no hurry (unless there are more refactors to this area of the code which make it more difficult to fix later)
@jbrockmendel if we want to get the RecursionError fixed quickly without this fix we could revert #45568 and then use the test added here as a regression test to close the issue. We could then look into moving i've hit the revert button on #45568 and got the |
@@ -601,7 +601,7 @@ def replace( | |||
return blocks | |||
|
|||
elif self.ndim == 1 or self.shape[0] == 1: | |||
blk = self.coerce_to_target_dtype(value) | |||
blk = self.coerce_to_target_dtype(value, strict_na=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM i think rather than add special cases to the lower-level functions, we should keep it isolated here, similar to what you did for #45836 (and ideally down the line combine both fixes into something more elegant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure.
from #45725 (comment)
The alternative is to intercept this particular situation somewhere. I think @simonjayhawkins has a branch that intercepts it at the level of find_common_type,
I think the issues arises basically from the difference between _can_hold_element and is_valid_na_for_dtype not being strict (i.e. None and pd.NA are considered valid for float array).
I have added a strict flag to pass through to the lower level library casting functions (i've put this in find_result_type
for now since that's where the code was changed arising in the regression reported here.
But it probably does make more sense that the flag is passed further down through to is_valid_na_for_dtype and then maybe use _can_hold_element in is_valid_na_for_dtype when the strict flag is True.
also from #45725 (comment)
but unless there are other use cases I'm inclined to handle it in Block.replace
Seems reasonable. I'm not sure if there are any other use cases. I suspect we need some logic that uses is_valid_na_for_dtype in Block.replace for this to work but also suspect that there will also be some duplication with the lower level library functions to pull this off.
As mentioned above, would also be quite happy doing this as a refactor after the recursion issue is resolved by the fix suggested here.
rather than add special cases to the lower-level functions
I've not yet looked but off the top of your head, is the code added in #45568 to the lower level functions a special case for Interval? The reason I ask is removing that would also fix the recursion error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not yet looked but off the top of your head, is the code added in #45568 to the lower level functions a special case for Interval? The reason I ask is removing that would also fix the recursion error.
Most of the edits in #45568 should be correct for The General Case. The only special case for IntervalDtype is in ensure_dtype_can_hold_na, which I don't think is what you're referring to.
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
np.nan
values under main branch #45725 (Replace xxxx with the Github issue number)doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.self._can_hold_element(value)
for pd.NA on a float block isFalse
so we need to ensure the block is coerced to a dtype with can hold the value to prevent the RecursionErrorThe
None
case in the issue OP was already fixed by #46404 which short circuits for the list like case and does not split the block for backwards compatibilityThe test added here is NOT parametrized for various null values since this is a targeted regression fix of the recursion error and to maintain the pre-regression behavior and NOT looking to make null value handling consistent for replace.
This targeted regression fix is milestoned 1.5 as the regression that resulted in the RecursionError is only on main.