REGR: Don't ignore compiled patterns in replace #35697

dsaxton · 2020-08-13T02:19:55Z

closes BUG: Different results from Series.replace with compiled regex #35680
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

dsaxton · 2020-08-13T18:37:32Z

Currently failing mypy and not quite sure of the fix. 5006980 did not work.

mypy --version
mypy 0.730
Performing static analysis using mypy
pandas/core/internals/managers.py:1952: error: Module has no attribute "Pattern"  [attr-defined]
Found 1 error in 1 file (checked 1030 source files)
Performing static analysis using mypy DONE
##[error]Process completed with exit code 1.

arw2019

@dsaxton looked into that CI error you're getting.

xref python/typing#23

On my machine adding

from typing import Pattern

at the top of pandas/core/internals/managers.py and then using plain Pattern in the line you modify fixed the mypy complaint.

Hope this helps!
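For illustration only, here is a standalone sketch of the suggestion above (not the pandas source). It assumes that `Pattern` imported from `typing` is used both in the annotation and in the `isinstance` check. The helper name `is_regex_like` is hypothetical.

```python
import re
from typing import Pattern, Union


def is_regex_like(b: Union[str, Pattern]) -> bool:
    """Return True for plain strings and compiled regular expressions."""
    # The unsubscripted typing.Pattern alias is accepted by isinstance,
    # so it can sit next to str in the tuple of allowed types.
    return isinstance(b, (str, Pattern))


print(is_regex_like("a+"))              # plain string
print(is_regex_like(re.compile("a+")))  # compiled pattern
print(is_regex_like(1.5))               # neither a string nor a pattern
```

Importing the name from `typing` sidesteps the attribute lookup on the `re` module that the mypy version in the log above rejects.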

dsaxton · 2020-08-15T13:47:58Z

Awesome thanks @arw2019

simonjayhawkins

Thanks @dsaxton generally lgtm

simonjayhawkins · 2020-08-15T18:04:15Z

pandas/core/internals/managers.py

@@ -1949,7 +1959,7 @@ def _check_comparison_types(
    else:
        op = np.vectorize(
            lambda x: bool(re.search(b, x))
-            if isinstance(x, str) and isinstance(b, str)
+            if isinstance(x, str) and isinstance(b, (str, Pattern))


b is typed as Scalar, where
Scalar = Union[PythonScalar, PandasScalar]

However, b resolves to Union[builtins.str, builtins.float, Any] because of unresolved imports, so mypy stays green.

We should probably update the types and docstring anyway.

Thanks @simonjayhawkins. Would the proper type for b be Union[Scalar, Pattern], or should PythonScalar be extended to include Pattern?

I would keep PythonScalar unchanged for now.

Updated. It's odd that b is typed as Scalar but there's an isinstance check for np.ndarray inside the function; maybe that check isn't needed.

The docstring also implies b could be an array. I guess you could add ArrayLike while here, or leave it. Either works for me.

I'll add it. Actually, a can also be a scalar, so I don't think its type should be only ArrayLike:

In [17]: b
Out[17]: array(['a', 'a', 'a'], dtype='<U1')

In [18]: _compare_or_regex_search("a", b)
Out[18]: array([ True,  True,  True])

it looks like the types were originally this way when added, see also #32890 (comment), but changed in #33598

maybe best to keep the changes here minimal for now as this PR is being backported

(and separate PR for ensuring the types are correct and consistent with docstrings.)
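To make the one-line change under review concrete, here is a pure-Python sketch of the same check. It is a simplification and not the pandas implementation: it uses a list comprehension in place of `np.vectorize`, and the function name `regex_search_mask` and the `accepted` parameter are hypothetical, added only so the old and new behaviour can be compared side by side.

```python
import re
from typing import List, Pattern, Tuple, Union


def regex_search_mask(
    values: List[object],
    b: Union[str, Pattern],
    accepted: Tuple[type, ...] = (str, Pattern),
) -> List[bool]:
    """Element-wise re.search; anything that is not a str counts as no match."""
    return [
        # Mirrors the condition in the diff: search only when x is a string
        # and b is one of the accepted pattern types.
        bool(re.search(b, x)) if isinstance(x, str) and isinstance(b, accepted) else False
        for x in values
    ]


values = ["foo", "bar", None, 3]
compiled = re.compile(r"^f")

# Old check (str only): the compiled pattern is silently ignored.
print(regex_search_mask(values, compiled, accepted=(str,)))
# New check (str or Pattern): the compiled pattern is applied.
print(regex_search_mask(values, compiled))
```

With the `str`-only check every element comes back `False`, which is the symptom reported in #35680; accepting `Pattern` as well restores the match on `"foo"`.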

simonjayhawkins · 2020-08-15T18:06:40Z

I think the 'Module has no attribute "Pattern"' error is a mypy issue that is fixed in later mypy versions, but I can't find a relevant issue on the mypy issue tracker (it was probably fixed by a later typeshed).

simonjayhawkins · 2020-08-16T14:14:10Z

pandas/core/internals/managers.py

@@ -1928,7 +1941,7 @@ def _compare_or_regex_search(
    """

    def _check_comparison_types(
-        result: Union[ArrayLike, bool], a: ArrayLike, b: Scalar,
+        result: Union[ArrayLike, bool], a: ArrayLike, b: Union[Scalar, Pattern],


Is a Pattern ever passed through to _check_comparison_types?

I guess it's just for the error message

simonjayhawkins

Thanks @dsaxton lgtm

simonjayhawkins · 2020-08-16T14:23:03Z

Not a regression, but I guess regex=True shouldn't be necessary when a compiled regex is passed, and explicitly passing regex=False together with a compiled regex should raise an error.
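A minimal sketch of how that follow-up could behave, under stated assumptions: the helper `resolve_regex_flag` is hypothetical, and `None` is used here as a stand-in for "regex was not passed explicitly". This is not how pandas defines the `regex` argument of `replace`.

```python
import re
from typing import Optional, Pattern, Union


def resolve_regex_flag(
    to_replace: Union[str, Pattern], regex: Optional[bool] = None
) -> bool:
    """Decide whether to_replace should be treated as a regular expression."""
    if isinstance(to_replace, Pattern):
        if regex is False:
            # An explicit regex=False contradicts a compiled pattern.
            raise ValueError("regex=False cannot be combined with a compiled pattern")
        # A compiled pattern implies regex matching, even without regex=True.
        return True
    # Plain strings are matched literally unless regex=True is requested.
    return bool(regex)


print(resolve_regex_flag(re.compile(r"a+")))  # compiled pattern, flag omitted
print(resolve_regex_flag("a+"))               # plain string, flag omitted
print(resolve_regex_flag("a+", regex=True))   # plain string, regex requested
```

The sentinel default is the main design choice: without it, an omitted flag and an explicit `regex=False` would be indistinguishable.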

simonjayhawkins · 2020-08-17T10:58:41Z

Thanks @dsaxton. Merging and backporting this, as it addresses the reported regression. Follow-ups can be left for 1.2.

simonjayhawkins · 2020-08-17T11:00:47Z

@meeseeksdev backport 1.1.x

dsaxton added 3 commits August 12, 2020 21:17

REGR: Don't ignore compiled patterns in replace

d21e4f0

Type

5006980

Revert "Type"

40bb67c

This reverts commit 5006980.

simonjayhawkins added the Regression and Strings labels Aug 13, 2020

simonjayhawkins added this to the 1.1.1 milestone Aug 13, 2020

arw2019 reviewed Aug 15, 2020

dsaxton added 2 commits August 15, 2020 08:45

Type

a05f3b2

Merge remote-tracking branch 'upstream/master' into compiled-regex-replace

3d5c167

simonjayhawkins reviewed Aug 15, 2020

dsaxton added 3 commits August 15, 2020 13:50

Type and docstring

01e90cb

Fix

264d17d

Revert "Fix"

ae6f53f

This reverts commit 264d17d.

simonjayhawkins reviewed Aug 16, 2020

simonjayhawkins approved these changes Aug 16, 2020

simonjayhawkins merged commit 934e9f8 into pandas-dev:master Aug 17, 2020

meeseeksmachine mentioned this pull request Aug 17, 2020

Backport PR #35697 on branch 1.1.x (REGR: Don't ignore compiled patterns in replace) #35765

Merged

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Aug 17, 2020

Backport PR pandas-dev#35697: REGR: Don't ignore compiled patterns in replace

ae1b1be

simonjayhawkins added the replace label Aug 17, 2020

simonjayhawkins pushed a commit that referenced this pull request Aug 17, 2020

Backport PR #35697: REGR: Don't ignore compiled patterns in replace (#35765)

Co-authored-by: Daniel Saxton <[email protected]>

ac8845b

dsaxton deleted the compiled-regex-replace branch August 17, 2020 13:03
