-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
add Series.str.remove(pre|suf)fix #43328
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like an excellent first PR. Could we also add benchmarks as suggested by @simonjayhawkins here: #39226 (comment)
pandas/tests/strings/test_strings.py
Outdated
@@ -535,6 +535,26 @@ def test_strip_lstrip_rstrip_args(any_string_dtype): | |||
tm.assert_series_equal(result, expected) | |||
|
|||
|
|||
def test_remove_suffix_prefix(any_string_dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we parameterize these tests cases using https://docs.pytest.org/en/6.2.x/parametrize.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean like this?
@pytest.mark.parametrize(
"prefix,expected",
[("x", ["xABCxx", "x BNSD", "LDFJH xx"]), ("xx ", ["xxABCxx", "BNSD", "LDFJH xx"])],
)
def test_removeprefix(any_string_dtype, prefix, expected):
ser = Series(["xxABCxx", "xx BNSD", "LDFJH xx"], dtype=any_string_dtype)
result = ser.str.removeprefix(prefix)
ser_expected = Series(expected, dtype=any_string_dtype)
tm.assert_series_equal(result, ser_expected)
@pytest.mark.parametrize(
"suffix,expected",
[("x", ["xxABCx", "xx BNSD", "LDFJH x"]), ("xx ", ["xxABCxx", "xx BNSD", "LDFJH"])],
)
def test_removesuffix(any_string_dtype, suffix, expected):
ser = Series(["xxABCxx", "xx BNSD", "LDFJH xx"], dtype=any_string_dtype)
result = ser.str.removesuffix(suffix)
ser_expected = Series(expected, dtype=any_string_dtype)
tm.assert_series_equal(result, ser_expected)
It's actually longer and I find it harder to read. But your call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure why the strings need to be so complex. A minimal case is surely isomorphic..
@pytest.mark.parametrize(
"prefix, expected", [("a", ["b", " b c", "bc"]), ("ab", ["", "a b c", "bc"])],
)
def test_removeprefix(any_string_dtype, prefix, expected):
ser = Series(["ab", "a b c", "bc], dtype=any_string_dtype)
result = ser.str.removeprefix(prefix)
ser_expected = Series(expected, dtype=any_string_dtype)
tm.assert_series_equal(result, ser_expected)
@pytest.mark.parametrize(
"suffix, expected", [("c", ["ab", "a b ", "b"]), ("bc", ["ab", "a b c", ""])],
)
def test_removesuffix(any_string_dtype, prefix, expected):
ser = Series(["ab", "a b c", "bc"], dtype=any_string_dtype)
result = ser.str.removesuffix(suffix)
ser_expected = Series(expected, dtype=any_string_dtype)
tm.assert_series_equal(result, ser_expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite sure why the strings need to be so complex.
Don't know either. I just copied those strings from the other test cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets simplify them then - to what @attack68 suggested above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, already have. Just waiting for feedback on return value type hints and perf benchmark before pushing.
pandas/core/strings/object_array.py
Outdated
@@ -414,6 +414,33 @@ def _str_lstrip(self, to_strip=None): | |||
def _str_rstrip(self, to_strip=None): | |||
return self._str_map(lambda x: x.rstrip(to_strip)) | |||
|
|||
def _str_removeprefix(self, prefix): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add type hints for these newly added methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the inputs or return value as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both would be great :)
I tried
Would be good to add a hint to the docs to run Also, the following hint in
I got an error
Changing to
Does the Python interpreter have to be in a clean virtualenv or something like that? |
I think I need some help here fixing this error: + pytest -m 'not slow and not network and not clipboard' pandas --junitxml=test-data.xml
ImportError while loading conftest '/pandas/pandas/conftest.py'.
pandas/__init__.py:46: in <module>
from pandas.core.api import (
pandas/core/api.py:29: in <module>
from pandas.core.arrays import Categorical
pandas/core/arrays/__init__.py:7: in <module>
from pandas.core.arrays.categorical import Categorical
pandas/core/arrays/categorical.py:113: in <module>
from pandas.core.strings.object_array import ObjectStringArrayMixin
pandas/core/strings/__init__.py:31: in <module>
from pandas.core.strings.base import BaseStringArrayMethods
pandas/core/strings/base.py:11: in <module>
from pandas.core.series import Series
pandas/core/series.py:91: in <module>
from pandas.core import (
pandas/core/generic.py:111: in <module>
from pandas.core import (
pandas/core/indexing.py:57: in <module>
from pandas.core.indexes.api import (
pandas/core/indexes/api.py:11: in <module>
from pandas.core.indexes.base import (
pandas/core/indexes/base.py:128: in <module>
from pandas.core.arrays import (
E ImportError: cannot import name 'Categorical' from partially initialized module 'pandas.core.arrays' (most likely due to a circular import) (/pandas/pandas/core/arrays/__init__.py) |
pandas/core/strings/base.py
Outdated
@@ -8,6 +8,8 @@ | |||
|
|||
from pandas._typing import Scalar | |||
|
|||
from pandas.core.series import Series |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import of Series is added just for the type annotations and causing the circular import?
try
from typing import TYPE_CHECKING
...
if TYPE_CHECKING:
from pandas import Series
… unpack (expected 3, got 2)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add to docs:
- doc/source/reference/series.rst
- doc/source/user_guide/text.rst
merge master and ping on green
Co-authored-by: Simon Hawkins <[email protected]>
…es a space before the colon separating the parameter name and type
@jreback 3 tests timed out after 60 min. Is that equivalent to 'green' or anything still to do? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment. ping on greenish
@@ -335,6 +335,19 @@ regular expression object will raise a ``ValueError``. | |||
--------------------------------------------------------------------------- | |||
ValueError: case and flags cannot be set when pat is a compiled regex | |||
|
|||
``removeprefix`` and ``removesuffix`` have the same effect as ``str.removeprefix`` and ``str.removesuffix`` added in Python 3.9 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a versionadded 1.4. tag here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think you need this instead of the one on L349
thanks @janosh very nice! and @erfannariman for the original! |
Based on work in #39226