ENH: More convenient regex? #4685
Comments
I do hate match objects (is this like a get-groups thing?)... this sounds useful. I think you should be able to extract using a group name (at the moment get is by integer location only); not sure how that would work with multiple groups. I also dislike how findall returns a Series of lists... |
This is how I think single and multiple groups should work.
What are you thinking for group names? |
Would this / does this only work for series? Also, how would you handle |
I think it would only work neatly for Series. As for named match groups, what about this?
As above, if there is no match, the whole row is NaN. If there is an optional group and that group is absent from a given entry, only the absent group is NaN.
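For readers following along, here is a minimal plain-Python sketch of the semantics described above, using only the standard `re` module. The helper name `extract_named` and the sample strings are hypothetical, not part of the proposal, and `None` stands in for NaN.

```python
import re


def extract_named(strings, pattern):
    """Hypothetical helper: return one dict per input string, keyed by group name."""
    regex = re.compile(pattern)
    names = list(regex.groupindex)  # group names, in order of definition
    rows = []
    for s in strings:
        m = regex.search(s)
        if m is None:
            # No match at all: every group in the row is missing.
            rows.append({name: None for name in names})
        else:
            # A group that did not participate in the match yields None.
            rows.append({name: m.group(name) for name in names})
    return rows


# The letter group is optional; the digit group is required.
rows = extract_named(["a1", "b2", "3", "xyz"], r"(?P<letter>[ab])?(?P<digit>\d)")
for row in rows:
    print(row)
```

Here `"3"` matches with only the optional group missing, while `"xyz"` does not match at all, so its whole row is empty.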
|
That's a good way to do it. |
Yep, that was exactly what I was getting at with group names, this would be a great enhancement. The stupid thing I was also thinking was multiple optional things, but actually this doesn't appear to be supported in re anyway (nor does it really work with unnamed stuff):
|
Two examples that work in the PR referenced above. Will write tests out of these, and more.
Notice that, in the column names, I follow the re module's convention of labeling any unnamed groups with 1. It's actually |
Oh sorry, just commented about that same thing. Not sure what's better. Probably should follow their (weird) convention. |
Ha. Saw your comment and changed it. I'm really not sure which is better. |
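For context on the numbering convention under discussion, this is how the standard `re` module numbers groups: group 0 is the whole match, and capture groups count from 1 whether or not they are named. The pattern and input are made up for illustration.

```python
import re

regex = re.compile(r"(?P<letter>[ab])(\d)")
m = regex.match("a1")

whole = m.group(0)    # the entire match
first = m.group(1)    # first capture group, also reachable as m.group("letter")
second = m.group(2)   # unnamed group, reachable only by its number
named = dict(regex.groupindex)  # maps each group name to its 1-based number

print(whole, first, second, named)
```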
How about circumventing annoying match objects by providing an additional, somewhat redundant string method?
as a shortcut for
If the pattern contains multiple groups, a DataFrame should be returned. Thoughts? Maybe @hayd would be into this one.
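A rough plain-Python sketch of the single-group versus multiple-group behaviour suggested here, assuming only the standard `re` module. The function name `extract` is a placeholder, a flat list stands in for a Series, a list of tuples stands in for DataFrame rows, and `None` stands in for NaN.

```python
import re


def extract(strings, pattern):
    """Hypothetical helper: flat values for one group, tuples for several."""
    regex = re.compile(pattern)
    single = regex.groups == 1  # number of capture groups in the pattern
    out = []
    for s in strings:
        m = regex.search(s)
        if m is None:
            # No match: a single missing value, or a full row of them.
            out.append(None if single else (None,) * regex.groups)
        elif single:
            out.append(m.group(1))
        else:
            out.append(m.groups())
    return out


one = extract(["a1", "b2", "c"], r"[ab](\d)")     # one group: flat values
two = extract(["a1", "b2", "c"], r"([ab])(\d)")   # two groups: one tuple per row
print(one)
print(two)
```

Patterns with no capture groups are not handled in any meaningful way by this sketch.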