Skip to content

BUG: Fix for .str.replace with repl function #15056

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

hzpc-joostk
Copy link

@hzpc-joostk hzpc-joostk commented Jan 4, 2017

.str.replace now accepts a callable (function) as replacement string. It now raises a TypeError when repl is not string like nor a callable. Docstring updated accordingly.

.str.replace now accepts a callable (function) as replacement string. It now raises a TypeError when repl is not string like nor a callable. Docstring updated accordingly.
@chris-b1
Copy link
Contributor

chris-b1 commented Jan 4, 2017

Looks good, can you add tests and a whatsnew entry?

repl : string
Replacement sequence
repl : string or function
Replacement string or a function, which passed the match object and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function -> callable

@hzpc-joostk
Copy link
Author

Thanks, @chris-b1 and @jreback. Where should I put the whatsnew entry? Can I just pick an empty spot in doc/source/whatsnew/v0.20.0.txt?

- Bug in ``.str.replace`` now accepts a callable replacement which is passed to ``re.sub`` (:issue:`15055`)

This maybe a new feature or improvement rather than a bug, right?

@chris-b1
Copy link
Contributor

chris-b1 commented Jan 4, 2017

Yeah, I think new features is fine - I had marked the issue a bug because this would have worked in older versions, although basically undocumented.

@sinhrks sinhrks added Bug Strings String extension data type and string data labels Jan 5, 2017
# GH 15055, callable repl
repl = lambda m: m.group(0).swapcase()
result = values.str.replace('[a-z][A-Z]{2}', repl, n=2)
exp = Series([u('foObaD__baRbaD'), NA])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test where the callable has different than 1 arg (should be error)

test using named field in re (and callable works correctly)

Copy link
Author

@hzpc-joostk hzpc-joostk Jan 5, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jreback . Named field/group is a good idea.

If the callable raises a TypeError because of a wrong number of arguments, this is caught in pandas.core.strings._map, returning a Series of NaNs.

We could overcome this by either inspecting the callable signature: inspect.signature or trying with a dummy Regex Match object. But this might have the side effect that the callable is called once before doing the replace on the array, confusing users with some kind of counter within the callable.

Sorry about the checks going wrong. It seems I am too careless with checking my commits. 😕

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With inspect I could do

n_empty_args = sum(1 for p in inspect.signature(repl).parameters.values() 
                   if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY) 
                   and p.default is p.empty)

# or better readable syntax

signature = inspect.signature(repl)
empty_args = [p for p in signature.parameters.values() 
              if p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD, p.KEYWORD_ONLY) 
              and p.default is p.empty]
n_empty_args = len(empty_args)

# followed by

if n_empty_args > 1:
    raise TypeError("callable should have at most one positional without a default")

n_empty_args != 1 might work, unless the first arg has a default for some reason.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is too complicated, instead maybe you can introspect the actual error message in _map and if its a certain class of errors, just re-raise the TypeError

Joost Kranendonk added 2 commits January 6, 2017 09:20
I copied a previous test as a draft, assumed that I could reuse `values` 
but did not mention it was redefined in the unicode and GH 13438 tests. 
Now corrected. Just redefine `values`. I could have put my test before 
the unicode tests but this keeps the order.
@codecov-io
Copy link

codecov-io commented Jan 6, 2017

Current coverage is 85.98% (diff: 100%)

Merging #15056 into master will increase coverage by 1.22%

@@             master     #15056   diff @@
==========================================
  Files           145        140     -5   
  Lines         51146      51261   +115   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43351      44075   +724   
+ Misses         7795       7186   -609   
  Partials          0          0          

Powered by Codecov. Last update 018414e...90779ce

Joost Kranendonk added 2 commits January 6, 2017 17:11
This influences all StringMethods were a used passes a callable to a StringMethod, 
e.g. `.str.replace`, that expects a wrong number of arguments.

See pandas-dev#15056
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. just some added documenations / minor comments. ping on green.

@@ -435,6 +435,31 @@ def test_replace(self):
for data in (['a', 'b', None], ['a', 'b', 'c', 'ad']):
values = klass(data)
self.assertRaises(TypeError, values.str.replace, 'a', repl)

## GH 15055
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make a separate test

@@ -167,7 +167,14 @@ def _map(f, arr, na_mask=False, na_value=np.nan, dtype=object):
try:
convert = not all(mask)
result = lib.map_infer_mask(arr, f, mask.view(np.uint8), convert)
except (TypeError, AttributeError):
except (TypeError, AttributeError) as e:
re_missing = (r'missing \d+ required (positional|keyword-only) '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a 1-line comment on what types of exceptions you are re-raising here (imagine you are a future reader and are like what?)

@@ -23,6 +23,7 @@ New features
~~~~~~~~~~~~

- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
- ``.str.replace`` now accepts a callable replacement which is passed to ``re.sub`` (:issue:`15055`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

callable, as a replacement, which ....

raise TypeError("repl must be a string")
use_re = not case or len(pat) > 1 or flags
if not (is_string_like(repl) or callable(repl)):
raise TypeError("repl must be a string or function")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function -> callable

Replacement sequence
repl : string or callable
Replacement string or a callable, it's passed the match object and
must return a replacement string to be used. See :func:`re.sub`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the version (0.20.0) after callable

Replacement sequence
repl : string or callable
Replacement string or a callable, it's passed the match object and
must return a replacement string to be used. See :func:`re.sub`.
n : int, default -1 (all)
Number of replacements to make from start
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some examples to the doc-string?

also would be nice to have an example in text.rst

Joost Kranendonk added 2 commits January 23, 2017 09:15
- separate test function
- fix missing assertRaisesRegex(p)
- add inline comments
- add examples with string and callable in __doc__ and text.rst doc
- extend method summary Series.str.replace in text.rst doc
@hzpc-joostk
Copy link
Author

Thanks. I added changes based on your review. Only building with python=2.7 fails. I have little to no experience with Python 2x. I didn't caught the TypeError in Python 2:

@jreback
Copy link
Contributor

jreback commented Jan 23, 2017

so

  • some linting issues, use git diff master | flake8 --diff to show locally
  • setup a venv or conda env with py2.7 to step thru the issues there.

Generally python 3 does the right thing with exceptions, while py2 may not be re-raising exceptions correctly (so your catching may not work here).

So you may need to have a slightly different strategy in py2 and py3 for catching the invalid args.

If you get stuck, ping.

@hzpc-joostk
Copy link
Author

Currently I'm designing new regexes for both py27 and py3k. I'll look into the linting.

@jreback
Copy link
Contributor

jreback commented Jan 23, 2017

@hzpc-joostk

Currently I'm designing new regexes for both py27 and py3k. I'll look into the linting.

what do you mean? you can prob do something very straightforward, you don't even need regexes, you are just trying to find out if the error message if related to the calling of the function (and not the dtype of the data).

Joost Kranendonk added 2 commits January 23, 2017 15:31
The regex filtering TypeErrors with argument number problems is now unified and suitable for Python 2.7
@hzpc-joostk
Copy link
Author

@jreback earlier you reviewed a part of my code:

test where the callable has different than 1 arg (should be error)

In .core.strings._map these string methods are mapped. On both TypeError and AttributeError, NaN is returned instead. Using regex, implemented in commit 27065a2, only TypeErrors regarding wrong number of args are reraised. However, these regexes only worked for Python 3. With e15dcdf, these regexes are unified into one sophisticated (complex) regex and work for Python 2 (tested).

tl;dr the tests should pass, as this now raises an error.

@jreback jreback added this to the 0.20.0 milestone Jan 23, 2017
@jreback
Copy link
Contributor

jreback commented Jan 23, 2017

@hzpc-joostk lgtm. ping on green.

@hzpc-joostk
Copy link
Author

hzpc-joostk commented Jan 23, 2017

Now Done fixing linting issues:

C:\Users\joostk\Documents\git\pandas>git diff 6ecc43d83801bd0337b771390ea5f4063689f275 | flake8 --diff
.\pandas/core/strings.py:313:75: W291 trailing whitespace
.\pandas/core/strings.py:314:70: W291 trailing whitespace
.\pandas/core/strings.py:332:65: W291 trailing whitespace
.\pandas/core/strings.py:341:69: W291 trailing whitespace
.\pandas/core/strings.py:342:71: W291 trailing whitespace
.\pandas/tests/test_strings.py:438:1: W293 blank line contains whitespace
.\pandas/tests/test_strings.py:439:1: D102 Missing docstring in public method
.\pandas/tests/test_strings.py:440:9: E266 too many leading '#' for block comment
.\pandas/tests/test_strings.py:468:1: D102 Missing docstring in public method

- trailing whitespace
- whitespace empty line
- too many '#' in comment
@hzpc-joostk
Copy link
Author

@jreback Thanks for your input and help. All changes made. Waiting for CI... Besides that: all green!

@jreback
Copy link
Contributor

jreback commented Jan 23, 2017

https://travis-ci.org/pandas-dev/pandas/jobs/194493716

@hzpc-joostk

so you can use

if PY2:
    # match py2 excetion
else:
    # match PY3 exception

and as I said you can do a minimal matching, because sometimes these exception messages change in versions (e.g. 3.5 -> 3.6 for example, maybe not in this case, but the wording can change). You are just generically interested in a specific type of error message (and not its specific wording per se)

@jreback jreback closed this in be3f2ae Jan 23, 2017
@jreback
Copy link
Contributor

jreback commented Jan 23, 2017

thanks @hzpc-joostk

nicely done!

@hzpc-joostk
Copy link
Author

Thanks @jreback

Awesome. My first (accepted) PR on GitHub. 🍾 Cheers

jreback pushed a commit that referenced this pull request Jan 24, 2017
xref #15056

Author: Joris Van den Bossche <[email protected]>

Closes #15210 from jorisvandenbossche/doc-replace-callable and squashes the following commits:

811cc34 [Joris Van den Bossche] DOC: clean-up docstring str.replace (GH15056)
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
.str.replace now accepts a callable (function) as replacement string.
It now raises a TypeError when repl is not string like nor a callable.
Docstring updated accordingly.

closes pandas-dev#15055

Author: Joost Kranendonk <[email protected]>
Author: Joost Kranendonk <[email protected]>

Closes pandas-dev#15056 from hzpc-joostk/pandas-GH15055-patch-1 and squashes the following commits:

826730c [Joost Kranendonk] simplify .str.replace TypeError reraising and test
90779ce [Joost Kranendonk] fix linting issues
c2cc13a [Joost Kranendonk] Update v0.20.0.txt
e15dcdf [Joost Kranendonk] fix bug catch TypeError with wrong number of args
40c0d97 [Joost Kranendonk] improve .str.replace with callable
f15ee2a [Joost Kranendonk] improve test .str.replace with callable
14beb21 [Joost Kranendonk] Add test for .str.replace with regex named groups
27065a2 [Joost Kranendonk] Reraise TypeError only with wrong number of args
ae04a3e [Joost Kranendonk] Add whatsnew for .str.replace with callable repl
067a7a8 [Joost Kranendonk] Fix testing bug for .str.replace
30d4727 [Joost Kranendonk] Bug fix in .str.replace type checking done wrong
4baa0a7 [Joost Kranendonk] add tests for .str.replace with callable repl
91c883d [Joost Kranendonk] Update .str.replace docstring
6ecc43d [Joost Kranendonk] BUG: Fix for .str.replace with repl function
AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
xref pandas-dev#15056

Author: Joris Van den Bossche <[email protected]>

Closes pandas-dev#15210 from jorisvandenbossche/doc-replace-callable and squashes the following commits:

811cc34 [Joris Van den Bossche] DOC: clean-up docstring str.replace (GH15056)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.str.replace does not accept a repl function
5 participants