REGR: RecursionError when attempting to replace np.nan values #46443

simonjayhawkins · 2022-03-20T13:46:33Z

closes BUG: RecursionError when attempting to replace np.nan values under main branch #45725 (Replace xxxx with the Github issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

self._can_hold_element(value) for pd.NA on a float block is False so we need to ensure the block is coerced to a dtype with can hold the value to prevent the RecursionError

The None case in the issue OP was already fixed by #46404 which short circuits for the list like case and does not split the block for backwards compatibility

The test added here is NOT parametrized for various null values since this is a targeted regression fix of the recursion error and to maintain the pre-regression behavior and NOT looking to make null value handling consistent for replace.

This targeted regression fix is milestoned 1.5 as the regression that resulted in the RecursionError is only on main.

jbrockmendel · 2022-03-20T15:56:57Z

pandas/core/dtypes/cast.py

@@ -1454,7 +1454,7 @@ def _ensure_nanosecond_dtype(dtype: DtypeObj) -> DtypeObj:

 # TODO: other value-dependent functions to standardize here include
 #  dtypes.concat.cast_to_common_type and Index._find_common_type_compat
-def find_result_type(left: ArrayLike, right: Any) -> DtypeObj:
+def find_result_type(left: ArrayLike, right: Any, strict_na: bool = False) -> DtypeObj:


im wary of introducing this here. is there a hurry on this or can i take some time to catch up on the recent PRs that got us here?

The addition here is where the regression occurred fixing an Interval issue. Are you thinking of pushing the fix further down into is_valid_na_for_dtype or ensure_dtype_can_hold_na or adding more special casing/code duplication in Block.replace?

As I see it, whichever way it's done there is always a risk of changing behavior whereas the fix here keeps the fix for the interval issue and also keeps the existing behavior for replace.

This is on master and I'm quite happy with follow-ups to this that refactor as it's easier to revert a refactor is there is an issue down the line whereas if we need to revert a regression fix we end up re-introducing the regression.

is there a hurry on this or can i take some time to catch up on the recent PRs that got us here?

no hurry (unless there are more refactors to this area of the code which make it more difficult to fix later)

simonjayhawkins · 2022-03-20T21:02:53Z

@jbrockmendel if we want to get the RecursionError fixed quickly without this fix we could revert #45568 and then use the test added here as a regression test to close the issue.

We could then look into moving None handling into block replace, xref #46404 (comment)

i've hit the revert button on #45568 and got the Sorry, this pull request couldn’t be reverted automatically. It may have already been reverted, or the content may have changed since it was merged. but not looked further whether this is also an option.

jbrockmendel · 2022-03-21T01:08:06Z

pandas/core/internals/blocks.py

@@ -601,7 +601,7 @@ def replace(
            return blocks

        elif self.ndim == 1 or self.shape[0] == 1:
-            blk = self.coerce_to_target_dtype(value)
+            blk = self.coerce_to_target_dtype(value, strict_na=True)


ATM i think rather than add special cases to the lower-level functions, we should keep it isolated here, similar to what you did for #45836 (and ideally down the line combine both fixes into something more elegant)

sure.

from #45725 (comment)

The alternative is to intercept this particular situation somewhere. I think @simonjayhawkins has a branch that intercepts it at the level of find_common_type,

I think the issues arises basically from the difference between _can_hold_element and is_valid_na_for_dtype not being strict (i.e. None and pd.NA are considered valid for float array).

I have added a strict flag to pass through to the lower level library casting functions (i've put this in find_result_type for now since that's where the code was changed arising in the regression reported here.

But it probably does make more sense that the flag is passed further down through to is_valid_na_for_dtype and then maybe use _can_hold_element in is_valid_na_for_dtype when the strict flag is True.

also from #45725 (comment)

but unless there are other use cases I'm inclined to handle it in Block.replace

Seems reasonable. I'm not sure if there are any other use cases. I suspect we need some logic that uses is_valid_na_for_dtype in Block.replace for this to work but also suspect that there will also be some duplication with the lower level library functions to pull this off.

As mentioned above, would also be quite happy doing this as a refactor after the recursion issue is resolved by the fix suggested here.

rather than add special cases to the lower-level functions

I've not yet looked but off the top of your head, is the code added in #45568 to the lower level functions a special case for Interval? The reason I ask is removing that would also fix the recursion error.

I've not yet looked but off the top of your head, is the code added in #45568 to the lower level functions a special case for Interval? The reason I ask is removing that would also fix the recursion error.

Most of the edits in #45568 should be correct for The General Case. The only special case for IntervalDtype is in ensure_dtype_can_hold_na, which I don't think is what you're referring to.

github-actions · 2022-04-21T00:06:08Z

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

REGR: RecursionError when attempting to replace np.nan values

6e1accb

simonjayhawkins added Regression Functionality that used to work in a prior pandas version replace replace method labels Mar 20, 2022

simonjayhawkins added this to the 1.5 milestone Mar 20, 2022

jbrockmendel reviewed Mar 20, 2022

View reviewed changes

jbrockmendel reviewed Mar 21, 2022

View reviewed changes

simonjayhawkins mentioned this pull request Apr 12, 2022

BUG: dtype not being preserved for replace on a CategoricalDtype #46672

Closed

3 tasks

github-actions bot added the Stale label Apr 21, 2022

simonjayhawkins closed this Jul 26, 2022

simonjayhawkins mentioned this pull request Aug 25, 2022

REGR: Fix regression RecursionError when replacing numeric scalar with None #48234

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

REGR: RecursionError when attempting to replace np.nan values #46443

REGR: RecursionError when attempting to replace np.nan values #46443

Uh oh!

simonjayhawkins commented Mar 20, 2022

Uh oh!

jbrockmendel Mar 20, 2022

Uh oh!

simonjayhawkins Mar 20, 2022

Uh oh!

simonjayhawkins Mar 20, 2022

Uh oh!

simonjayhawkins commented Mar 20, 2022 •

edited

Loading

Uh oh!

jbrockmendel Mar 21, 2022

Uh oh!

simonjayhawkins Mar 21, 2022

Uh oh!

jbrockmendel Mar 21, 2022

Uh oh!

github-actions bot commented Apr 21, 2022

Uh oh!

Uh oh!

Uh oh!

REGR: RecursionError when attempting to replace np.nan values #46443

REGR: RecursionError when attempting to replace np.nan values #46443

Uh oh!

Conversation

simonjayhawkins commented Mar 20, 2022

Uh oh!

jbrockmendel Mar 20, 2022

Choose a reason for hiding this comment

Uh oh!

simonjayhawkins Mar 20, 2022

Choose a reason for hiding this comment

Uh oh!

simonjayhawkins Mar 20, 2022

Choose a reason for hiding this comment

Uh oh!

simonjayhawkins commented Mar 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jbrockmendel Mar 21, 2022

Choose a reason for hiding this comment

Uh oh!

simonjayhawkins Mar 21, 2022

Choose a reason for hiding this comment

Uh oh!

jbrockmendel Mar 21, 2022

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Apr 21, 2022

Uh oh!

Uh oh!

simonjayhawkins commented Mar 20, 2022 •

edited

Loading