ENH: add string method remove prefix and suffix, python 3.9 #36944

Closed
erfannariman opened this issue Oct 7, 2020 · 10 comments · Fixed by #43328
Labels: Enhancement, Strings

Comments

@erfannariman
Member

erfannariman commented Oct 7, 2020

Opening this issue for discussion:

Since Python 3.9 is out and we have the new string methods removeprefix and removesuffix, it would be nice to add them to the pandas string methods as well.

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'A': ['str_string1', 'str_string2', 'str_string3']})

In [4]: print(df)
             A
0  str_string1
1  str_string2
2  str_string3

In [5]: df['A'].str.removeprefix('str_')
Out[5]: 
0    string1
1    string2
2    string3
Name: A, dtype: object
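
For reference, the Python 3.9 built-ins behind this proposal can be applied element by element to a plain list. This is only a sketch of the intended semantics, assuming Python 3.9 or later; the list `values` is illustrative and the pandas accessor itself is not used here.

```python
# Requires Python 3.9+, where str.removeprefix and str.removesuffix
# were added (PEP 616).
values = ["str_string1", "str_string2", "other_string3"]

# The prefix is stripped only when the string actually starts with it;
# strings without the prefix are returned unchanged.
without_prefix = [v.removeprefix("str_") for v in values]
print(without_prefix)

# removesuffix is the mirror image for the end of the string.
without_suffix = [v.removesuffix("3") for v in values]
print(without_suffix)
```

A Series method with the same name would presumably apply the same rule to each element.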

An argument against adding this is that it's pretty easy to achieve with str.split:

df['A'].str.split('_').str[-1]

@erfannariman added the Enhancement and Needs Triage labels Oct 7, 2020

@rhshadrach
Member

+1

@erfannariman Your alternative to removeprefix will not work on any input, e.g. str_str_string1 or other_str_string1.
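
The two counterexamples can be checked with plain strings. The sketch below mimics `.str.split('_').str[-1]` for a single value; the helper name `split_last` and the `expected` table are illustrative assumptions, with the expected values being what removing the prefix `str_` once from the start should give.

```python
def split_last(value, sep="_"):
    """Mimic df['A'].str.split('_').str[-1] for one string."""
    return value.split(sep)[-1]


# What removing the prefix "str_" (once, and only at the start) should yield.
expected = {
    "str_string1": "string1",
    "str_str_string1": "str_string1",
    "other_str_string1": "other_str_string1",
}

# Collect every input where the split-based shortcut disagrees.
mismatches = {
    value: split_last(value)
    for value, want in expected.items()
    if split_last(value) != want
}
print(mismatches)
```

Both problem inputs collapse to `string1`, because the split keeps only the text after the last underscore.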

@erfannariman
Member Author

> +1
>
> @erfannariman Your alternative to removeprefix will not work on any input, e.g. str_str_string1 or other_str_string1.

True, but you could solve that with: str.split('_', n=1).str[-1]
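
A quick check of the `n=1` variant on plain strings, as a sketch only: Python's `str.split(sep, 1)` is used here as a stand-in for `.str.split('_', n=1)`, and `split_once_last` is a hypothetical helper name.

```python
def split_once_last(value, sep="_"):
    """Mimic .str.split('_', n=1).str[-1] for one string."""
    # maxsplit=1 splits at the first separator only.
    return value.split(sep, 1)[-1]


# Repeated prefix: now handled, only the first "str_" is dropped.
print(split_once_last("str_str_string1"))

# Different leading token: "other_" is dropped even though it is not the prefix.
print(split_once_last("other_str_string1"))
```

Limiting the split fixes the repeated-prefix case, but the result still depends on the first underscore rather than on the prefix itself.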

@ghost

ghost commented Jan 15, 2021

@erfannariman, your updated version won't work if the prefix is not separated by a delimiter.

e.g. for prefix str: strFollowedByOtherThings, strWithDifferentOtherThingsFollowing.

The most robust method is the one documented in PEP 616.

e.g.

prefix = 'str'
df2 = df.copy()
df2.loc[df['A'].str.startswith(prefix), 'A'] = df2.loc[df['A'].str.startswith(prefix), 'A'].str[len(prefix):]
df2

I think this makes a pretty good case that it would be nice to add the new string methods!
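
For comparison, here is a scalar sketch that follows the specification given in PEP 616, written as free functions so it also runs on Python versions before 3.9. The function names mirror the built-in methods, but the free-function form is an assumption made for illustration.

```python
def removeprefix(value: str, prefix: str) -> str:
    """Return value without the leading prefix, or unchanged if absent."""
    if value.startswith(prefix):
        return value[len(prefix):]
    return value


def removesuffix(value: str, suffix: str) -> str:
    """Return value without the trailing suffix, or unchanged if absent."""
    # The emptiness check matters: value[:-0] would be the empty string.
    if suffix and value.endswith(suffix):
        return value[:-len(suffix)]
    return value


# No delimiter is needed between the prefix and the rest of the string.
print(removeprefix("strFollowedByOtherThings", "str"))
# Strings that do not start with the prefix are left alone.
print(removeprefix("other_str_string1", "str"))
```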

@erfannariman
Copy link
Member Author

> @erfannariman, your updated version won't work if the prefix is not separated by a delimiter.
>
> e.g. for prefix str: strFollowedByOtherThings, strWithDifferentOtherThingsFollowing.
>
> The most robust method is the one documented in PEP 616.
>
> e.g.
>
> prefix = 'str'
> df2 = df.copy()
> df2.loc[df['A'].str.startswith(prefix), 'A'] = df2.loc[df['A'].str.startswith(prefix), 'A'].str[len(prefix):]
> df2
>
> I think this makes a pretty good case that it would be nice to add the new string methods!

Yes, I agree.

@jreback
Contributor

jreback commented Jan 16, 2021

+1 on adding this

@rhshadrach removed the Needs Triage label Feb 6, 2021
@rhshadrach added this to the Contributions Welcome milestone Feb 6, 2021
@jreback modified the milestones: Contributions Welcome, 1.3 Feb 16, 2021
@jreback added the Strings label Feb 16, 2021
@simonjayhawkins modified the milestones: 1.3, Contributions Welcome Jun 8, 2021

@janosh
Contributor

janosh commented Aug 29, 2021

@simonjayhawkins Would this be a good first issue? Happy to help in that case.

@simonjayhawkins
Member

Thanks @janosh. Sure, go for it!

@janosh
Contributor

janosh commented Aug 31, 2021

Great. Anything to be aware of? Is it okay to use removeprefix or should the implementation be backwards compatible, i.e. work for people on Python <3.9?

@simonjayhawkins
Member

From the comments here and the stalled PR #39226, I believe that we would want to use the implementation in PEP 616 in order to provide support for Python <3.9.

I think there is an outstanding question on whether we use the python native functions when available #39226 (comment).

@janosh
Contributor

janosh commented Aug 31, 2021

Cool, I opened a draft PR at #43328. Let's discuss details there.

@jreback modified the milestones: Contributions Welcome, 1.4 Sep 6, 2021
6 participants