ENH: note interpretation of pattern for str.contains as regex #44811

Closed
nealmcb opened this issue Dec 8, 2021 · 0 comments · Fixed by #44812

Contributor

nealmcb commented Dec 8, 2021

Is your feature request related to a problem?

I didn't understand this error message:

>>> contests[contests.contest_name.str.contains("Proposition 120 (STATUTORY)")]
UserWarning: This pattern has match groups. To actually get the groups, use str.extract.

Only after fiddling around and searching did it finally dawn on me that the groups it was talking about weren't pandas groupby groups, but regular expression groups. I was using pandas groupby elsewhere in the project, but had no intention of using regular expressions at all. Since the Python __contains__ functionality doesn't use regular expressions, and people who aren't thinking in those terms may have no idea what "match groups" are, I suggest a more helpful message.
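To make the difference concrete, here is a minimal sketch using only the standard library `re` module (not pandas itself). The variable names are illustrative. It shows that the parentheses in the contest name form a regular expression group, so the pattern no longer matches the literal text, while plain Python substring containment does.

```python
import re

name = "Proposition 120 (STATUTORY)"

# Compiled as a regular expression, the parentheses define one match group.
pattern = re.compile(name)
group_count = pattern.groups

# The group consumes no literal "(" or ")", so the pattern does not
# match the very string it was written from.
matches_itself = pattern.search(name) is not None

# It matches the text without the parentheses instead.
matches_without_parens = pattern.search("Proposition 120 STATUTORY") is not None

# Plain substring containment (the `in` operator) treats the text literally.
plain_contains = name in "Contest: Proposition 120 (STATUTORY)"

print(group_count, matches_itself, matches_without_parens, plain_contains)
```

This is why a search that looks like a plain substring test can both raise the "match groups" warning and silently return no rows.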

The fix in my case was easy: either escaping the parentheses so they weren't interpreted as grouping metacharacters, or using regex=False as explained at "How do I select by partial string from a pandas DataFrame".
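Both fixes can be sketched as follows. The `names` Series is a small hypothetical stand-in for the `contests.contest_name` column, which is not shown in this issue, and `re.escape` is one way to do the escaping without writing backslashes by hand.

```python
import re
import warnings

import pandas as pd

# Hypothetical stand-in for contests.contest_name.
names = pd.Series(
    ["Proposition 120 (STATUTORY)", "Proposition 119 (STATUTORY)", "Mayor"]
)
needle = "Proposition 120 (STATUTORY)"

# Option 1: treat the pattern as a literal substring.
literal_mask = names.str.contains(needle, regex=False)

# Option 2: keep regex matching but escape the metacharacters.
escaped_mask = names.str.contains(re.escape(needle))

# For comparison: the unescaped pattern is parsed as a regex with one group.
# The UserWarning is suppressed here only to keep the output clean.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    unescaped_mask = names.str.contains(needle)

print(names[literal_mask].tolist())
print(names[escaped_mask].tolist())
print(names[unescaped_mask].tolist())
```

With `regex=False` the intent is clearest when no regular expression features are needed; escaping is the better fit when the literal text is only part of a larger pattern.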

But I'd like others to figure out the issue more quickly than I did.

Describe the solution you'd like

I suggest e.g. changing the message to:

UserWarning: This pattern is interpreted as a regular expression and has match groups. To actually get the groups, use str.extract.

API breaking implications

This suggestion has no impact on the pandas API.

Additional context

Other error messages related to regular expressions may also benefit from more context and clarity.

nealmcb added the Enhancement and Needs Triage labels Dec 8, 2021
jreback added this to the 1.4 milestone Dec 8, 2021
jreback added the Docs label and removed the Enhancement and Needs Triage labels Dec 8, 2021