add Series.str.remove(pre|suf)fix #43328

janosh · 2021-08-31T12:24:44Z

closes ENH: add string method remove prefix and suffix, python 3.9 #36944
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Based on work in #39226

alimcmaster1

Looks like an excellent first PR. Could we also add benchmarks as suggested by @simonjayhawkins here: #39226 (comment)

Ref: https://pandas.pydata.org/docs/development/contributing_codebase.html#running-the-performance-test-suite
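
For context, an asv benchmark is a plain class whose `setup` method prepares the data and whose `time_*` methods are timed. A minimal sketch of what a benchmark for the new methods might look like follows. The class name `RemoveAffix` and the sample data are hypothetical and not taken from this PR, and the sketch assumes a pandas version that already ships `Series.str.removeprefix` / `removesuffix` (1.4 or later).

```python
import pandas as pd


class RemoveAffix:
    # Hypothetical benchmark class; asv times every method named time_*.
    params = ["object", "string"]
    param_names = ["dtype"]

    def setup(self, dtype):
        # 10_000 short strings that all share a prefix and a suffix.
        values = [f"pre_{i}_suf" for i in range(10_000)]
        self.ser = pd.Series(values, dtype=dtype)

    def time_removeprefix(self, dtype):
        self.ser.str.removeprefix("pre_")

    def time_removesuffix(self, dtype):
        self.ser.str.removesuffix("_suf")
```

asv calls `setup` once per parameter value before timing, so building the Series is not included in the measurement.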

alimcmaster1 · 2021-08-31T12:52:54Z

pandas/tests/strings/test_strings.py

@@ -535,6 +535,26 @@ def test_strip_lstrip_rstrip_args(any_string_dtype):
    tm.assert_series_equal(result, expected)


+def test_remove_suffix_prefix(any_string_dtype):


Can we parameterize these tests cases using https://docs.pytest.org/en/6.2.x/parametrize.html

You mean like this?

@pytest.mark.parametrize(
    "prefix,expected",
    [
        ("x", ["xABCxx", "x BNSD", "LDFJH xx"]),
        ("xx ", ["xxABCxx", "BNSD", "LDFJH xx"]),
    ],
)
def test_removeprefix(any_string_dtype, prefix, expected):
    ser = Series(["xxABCxx", "xx BNSD", "LDFJH xx"], dtype=any_string_dtype)
    result = ser.str.removeprefix(prefix)
    ser_expected = Series(expected, dtype=any_string_dtype)
    tm.assert_series_equal(result, ser_expected)


@pytest.mark.parametrize(
    "suffix,expected",
    [
        ("x", ["xxABCx", "xx BNSD", "LDFJH x"]),
        (" xx", ["xxABCxx", "xx BNSD", "LDFJH"]),
    ],
)
def test_removesuffix(any_string_dtype, suffix, expected):
    ser = Series(["xxABCxx", "xx BNSD", "LDFJH xx"], dtype=any_string_dtype)
    result = ser.str.removesuffix(suffix)
    ser_expected = Series(expected, dtype=any_string_dtype)
    tm.assert_series_equal(result, ser_expected)

It's actually longer and I find it harder to read. But your call?

Not quite sure why the strings need to be so complex. A minimal case is surely isomorphic:

@pytest.mark.parametrize(
    "prefix, expected",
    [("a", ["b", " b c", "bc"]), ("ab", ["", "a b c", "bc"])],
)
def test_removeprefix(any_string_dtype, prefix, expected):
    ser = Series(["ab", "a b c", "bc"], dtype=any_string_dtype)
    result = ser.str.removeprefix(prefix)
    ser_expected = Series(expected, dtype=any_string_dtype)
    tm.assert_series_equal(result, ser_expected)


@pytest.mark.parametrize(
    "suffix, expected",
    [("c", ["ab", "a b ", "b"]), ("bc", ["ab", "a b c", ""])],
)
def test_removesuffix(any_string_dtype, suffix, expected):
    ser = Series(["ab", "a b c", "bc"], dtype=any_string_dtype)
    result = ser.str.removesuffix(suffix)
    ser_expected = Series(expected, dtype=any_string_dtype)
    tm.assert_series_equal(result, ser_expected)

Not quite sure why the strings need to be so complex.

Don't know either. I just copied those strings from the other test cases.

Let's simplify them then - to what @attack68 suggested above.

Yes, already have. Just waiting for feedback on return value type hints and perf benchmark before pushing.

alimcmaster1 · 2021-08-31T12:56:07Z

pandas/core/strings/object_array.py

@@ -414,6 +414,33 @@ def _str_lstrip(self, to_strip=None):
    def _str_rstrip(self, to_strip=None):
        return self._str_map(lambda x: x.rstrip(to_strip))

+    def _str_removeprefix(self, prefix):


Can you add type hints for these newly added methods?

Just the inputs or return value as well?

Both would be great :)
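
To illustrate what annotating both the inputs and the return value could look like, here is a standalone sketch of the prefix and suffix logic as plain functions that also run on Python versions before 3.9. The free functions `removeprefix` and `removesuffix` are hypothetical stand-ins for illustration only; they are not the `_str_removeprefix` / `_str_removesuffix` methods from this PR, which operate on whole arrays.

```python
def removeprefix(text: str, prefix: str) -> str:
    """Return text without prefix if it starts with it, else unchanged."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


def removesuffix(text: str, suffix: str) -> str:
    """Return text without suffix if it ends with it, else unchanged."""
    # Guard against the empty suffix: text[:-0] would be "" rather than text.
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text
```

The empty-suffix guard matters because every string ends with `""`, and slicing with `-0` is the same as slicing with `0`.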

janosh · 2021-08-31T13:50:23Z

I tried asv continuous -f 1.1 -E virtualenv upstream/master HEAD -b strings but got

· Unknown commit upstream/master

Would be good to add a hint to the docs to run git fetch upstream master in that case.

Also, the following hint in asv_bench/asv.conf.json appears to be wrong:

// The Pythons you'd like to test against. If not provided, defaults
// to the current version of Python used to run asv.
// "pythons": ["2.7", "3.4"],
"pythons": ["3.8"],

I got an error

· No executable found for python 3.8
· No environments selected

Changing to "3.9" and rerunning, I get this error

·· Failure creating environment for virtualenv-py3.9-Cython0.29.21-jinja2-matplotlib-numba-numexpr-numpy-odfpy-openpyxl-pyarrow-scipy-sqlalchemy-tables-xlrd-xlsxwriter-xlwt
Traceback (most recent call last):
  File "/Users/janosh/.venv/py39/bin/asv", line 8, in <module>
    sys.exit(main())
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/main.py", line 38, in main
    result = args.func(args)
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/commands/__init__.py", line 49, in run_from_args
    return cls.run_from_conf_args(conf, args)
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/commands/continuous.py", line 75, in run_from_conf_args
    return cls.run(
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/commands/continuous.py", line 114, in run
    result = Run.run(
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/commands/run.py", line 294, in run
    Setup.perform_setup(environments, parallel=parallel)
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/commands/setup.py", line 89, in perform_setup
    list(map(_create, environments))
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/commands/setup.py", line 21, in _create
    env.create()
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/environment.py", line 704, in create
    self._setup()
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/plugins/virtualenv.py", line 148, in _setup
    self._install_requirements()
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/plugins/virtualenv.py", line 172, in _install_requirements
    self._run_pip(args, timeout=self._install_timeout, env=env)
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/plugins/virtualenv.py", line 177, in _run_pip
    return self.run_executable('python', ['-mpip'] + list(args), **kwargs)
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/environment.py", line 949, in run_executable
    return util.check_output([exe] + args, **kwargs)
  File "/Users/janosh/.venv/py39/lib/python3.9/site-packages/asv/util.py", line 754, in check_output
    raise ProcessError(args, retcode, stdout, stderr)
asv.util.ProcessError: Command '/Users/janosh/Repos/pandas/asv_bench/env/94b401bb918963c7d4d914702f131a3f/bin/python -mpip install -v --upgrade numpy Cython==0.29.21 matplotlib sqlalchemy scipy numba numexpr pyarrow tables openpyxl xlsxwriter xlrd xlwt odfpy jinja2' returned non-zero exit status 1

Does the Python interpreter have to be in a clean virtualenv or something like that?

janosh · 2021-09-04T07:09:54Z

I think I need some help here fixing this error:

+ pytest -m 'not slow and not network and not clipboard' pandas --junitxml=test-data.xml
ImportError while loading conftest '/pandas/pandas/conftest.py'.
pandas/__init__.py:46: in <module>
    from pandas.core.api import (
pandas/core/api.py:29: in <module>
    from pandas.core.arrays import Categorical
pandas/core/arrays/__init__.py:7: in <module>
    from pandas.core.arrays.categorical import Categorical
pandas/core/arrays/categorical.py:113: in <module>
    from pandas.core.strings.object_array import ObjectStringArrayMixin
pandas/core/strings/__init__.py:31: in <module>
    from pandas.core.strings.base import BaseStringArrayMethods
pandas/core/strings/base.py:11: in <module>
    from pandas.core.series import Series
pandas/core/series.py:91: in <module>
    from pandas.core import (
pandas/core/generic.py:111: in <module>
    from pandas.core import (
pandas/core/indexing.py:57: in <module>
    from pandas.core.indexes.api import (
pandas/core/indexes/api.py:11: in <module>
    from pandas.core.indexes.base import (
pandas/core/indexes/base.py:128: in <module>
    from pandas.core.arrays import (
E   ImportError: cannot import name 'Categorical' from partially initialized module 'pandas.core.arrays' (most likely due to a circular import) (/pandas/pandas/core/arrays/__init__.py)

simonjayhawkins · 2021-09-04T08:48:11Z

pandas/core/strings/base.py

@@ -8,6 +8,8 @@

 from pandas._typing import Scalar

+from pandas.core.series import Series


The import of Series is added just for the type annotations and is causing the circular import?

Try:

from typing import TYPE_CHECKING
...
if TYPE_CHECKING:
    from pandas import Series
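
The suggestion works because `typing.TYPE_CHECKING` is `False` at runtime, so the guarded import is only seen by static type checkers and never executes while the package is being imported. A self-contained sketch of the pattern follows; `decimal.Decimal` stands in for `pandas.Series` and the function `double` is hypothetical, chosen only so the example runs without pandas.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by static type checkers, never at runtime,
    # so this import cannot take part in an import cycle.
    from decimal import Decimal


def double(value: "Decimal") -> "Decimal":
    # The quoted annotations are plain strings at runtime, so the name
    # Decimal does not need to exist when this module is imported.
    return value * 2
```

The annotations must be quoted (or the module must use `from __future__ import annotations`), otherwise Python would try to look up the name when the function is defined and fail.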


jreback

can you add to docs:

doc/source/reference/series.rst
doc/source/user_guide/text.rst

merge master and ping on green


janosh · 2021-09-05T20:31:31Z

@jreback 3 tests timed out after 60 min. Is that equivalent to 'green' or is there anything still to do?

jreback

minor comment. ping on greenish

jreback · 2021-09-06T15:26:56Z

doc/source/user_guide/text.rst

@@ -335,6 +335,19 @@ regular expression object will raise a ``ValueError``.
    ---------------------------------------------------------------------------
    ValueError: case and flags cannot be set when pat is a compiled regex

+``removeprefix`` and ``removesuffix`` have the same effect as ``str.removeprefix`` and ``str.removesuffix`` added in Python 3.9


can you add a versionadded 1.4 tag here

i think you need this instead of the one on L349
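
As a quick illustration of the documented behaviour, the sketch below compares the Series method with the built-in `str.removeprefix` and contrasts both with `lstrip`, which strips a set of characters instead of an exact prefix. The sample values are made up for this example, and it assumes Python 3.9+ and a pandas version that includes the new methods.

```python
import pandas as pd

ser = pd.Series(["str_string", "str_bar", "no_prefix"])

# Vectorized removal: only an exact leading "str_" is dropped.
removed = ser.str.removeprefix("str_")

# Same result as applying the built-in str method element by element.
builtin = [value.removeprefix("str_") for value in ser]

# lstrip treats its argument as a set of characters, so it also eats
# the leading "str" of "string".
stripped = ser.str.lstrip("str_")

print(removed.tolist())   # ['string', 'bar', 'no_prefix']
print(stripped.tolist())  # ['ing', 'bar', 'no_prefix']
```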

jreback · 2021-09-06T19:46:19Z

thanks @janosh very nice!

and @erfannariman for the original!

janosh mentioned this pull request Aug 31, 2021

ENH: add string method remove prefix and suffix, python 3.9 #36944

Closed

alimcmaster1 added the labels "Python 3.9", "Enhancement", and "Strings" (String extension data type and string data) Aug 31, 2021

alimcmaster1 reviewed Aug 31, 2021

janosh added 4 commits September 3, 2021 14:47

add Series.str.remove(pre|suf)fix (new in Python 3.9) (#36944)

b179002

add arg type hints

39d5f2a

parametrize test cases

422bc14

change type annotations from np.typing to Series

afd4682

simonjayhawkins reviewed Sep 4, 2021

janosh added 2 commits September 4, 2021 12:05

defer Series type import to type-check time

9583c1f

fix pandas/tests/strings/conftest.py ValueError: not enough values to unpack (expected 3, got 2)

d3e6d88

jreback requested changes Sep 5, 2021

janosh added 2 commits September 5, 2021 08:59

Merge branch 'master' into master

a721a19

add docs in reference/series.rst, user_guide/text.rst

1c20c47

simonjayhawkins reviewed Sep 5, 2021

doc/source/whatsnew/v1.4.0.rst (outdated)

janosh and others added 3 commits September 5, 2021 14:58

add issue # to whatsnew/v1.4.0.rst

107d729

Co-authored-by: Simon Hawkins <[email protected]>

fix _str_removesuffix

5352adf

fix string accessor docs: unknown section "See also", "prefix" requires a space before the colon separating the parameter name and type

f9f5c14

fix doc test

58ad460

jreback requested changes Sep 6, 2021

fix docs versionchange to versionadded

eb618ea

jreback added this to the 1.4 milestone Sep 6, 2021

jreback approved these changes Sep 6, 2021

jreback merged commit 0a9f9ee into pandas-dev:master Sep 6, 2021

feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021

add Series.str.remove(pre|suf)fix (pandas-dev#43328)

821ca1c
