possible error in documentation #26865
Comments
Thanks. You are probably right (I don't have a computer at hand to double-check). A contribution on this would be welcome. |
From what I tested, the line should be the df.replace call used in the code below. Only then do I get the output the documentation requires:
import numpy as np
import pandas as pd
#pd.show_versions()
d = {'a': list(range(4)), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']}
df = pd.DataFrame(d)
#print(df)
#print("-------------")
print(df.replace([r'\.', r'(a)'], ['dot', 'stuff'], regex=True))
Output:
|
Both versions work, but the original example was meant to show a regex -> regex grouped replacement, so if you just make the string a raw string as you originally suggested, that will fix the error. |
@topper-123
Is it intended that the character 'a' appears in the third and fourth words of the second line of the output? Code:
import numpy as np
import pandas as pd
d = {'a': list(range(4)), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']}
df = pd.DataFrame(d)
print(df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True))
|
In my view the example intends to demonstrate the regex -> regex transformation and, at the same time, show how to use capturing brackets in the regular expression. The original dataframe is
d = {'a': list(range(4)), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']}
df = pd.DataFrame(d)
i.e. the original data frame is
   a  b    c
0  0  a    a
1  1  b    b
2  2  .  NaN
3  3  .    d
With the intended example
df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True)
we would like to replace '.' with 'dot' and also replace 'a' with 'astuff'. Indeed, the above code does exactly this and yields:
   a       b       c
0  0  astuff  astuff
1  1       b       b
2  2     dot     NaN
3  3     dot       d
which is what the example intends to show. |
Agree with @karpanGit on the example's intention. That said, the strings used have no real meaning. If you want to find an example where the strings and regex operations are more meaningful, that could help people understand the example better. |
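As a rough illustration of what a more meaningful example might look like, here is a minimal sketch. The column names and values (id, weight, "12kg", "n/a") are made up for illustration and do not come from the pandas documentation. It uses the same list-of-regexes form of df.replace discussed in this thread, with a raw-string backreference in the replacement.

```python
import numpy as np
import pandas as pd

# Hypothetical data: weights recorded without a space before the unit.
df = pd.DataFrame({
    "id": list(range(4)),
    "weight": ["12kg", "7kg", "n/a", np.nan],
})

# regex -> regex replacement: the capture group (\d+) is reused via r"\1".
# The replacement must be a raw string so that "\1" reaches the regex engine
# as a backreference instead of being turned into a control character.
cleaned = df.replace(
    [r"(\d+)kg", r"n/a"],
    [r"\1 kg", "unknown"],
    regex=True,
)
print(cleaned)
```

Here the captured digits are kept and only the formatting around them changes, which may make the purpose of the group reference easier to see than with 'astuff'.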
I agree with you @topper-123. One naive question: how does it work with improving the documentation? Are users like myself supposed to make concrete proposals or only report observations? Many thanks and apologies for my ignorance on how things work. |
If you are asking specifically about how an issue gets resolved in pandas, it is all done on a volunteer basis, and no one is obliged to fix a bug that you've reported. So in practice the best way to get things fixed is to submit a pull request yourself, including for the documentation :-). And contributions are always welcome, as I mentioned. |
Okay, if that is intended, then the correct line is df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True) |
Yes, that's right. |
Perfect, is the pull request correct the way I did it? |
Yes. |
The page
https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#string-regular-expression-replacement
seems to have an example that can be improved. The page lists
as an example. However, '\1' is not treated as a backreference because the replacement string is not a raw string. I think what you mean is likely
If my understanding is correct, please consider updating the page.
Regards,
Panos Karamertzanis
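The raw-string point can be checked without pandas. The sketch below uses only the standard re module. The variable names (plain, raw, with_plain, with_raw) are illustrative, and it assumes that the pandas regex replacement passes the replacement string through to a regex substitution of this kind.

```python
import re

# Non-raw literal: Python reads '\1' as an octal escape, i.e. chr(1),
# before the regex engine ever sees the string.
plain = '\1stuff'

# Raw literal: a backslash followed by '1', which re.sub interprets
# as a reference to capture group 1.
raw = r'\1stuff'

print(repr(plain), len(plain))
print(repr(raw), len(raw))

# With the non-raw string the match is replaced by chr(1) + 'stuff'.
with_plain = re.sub(r'(a)', plain, 'a')

# With the raw string the captured 'a' is inserted before 'stuff'.
with_raw = re.sub(r'(a)', raw, 'a')

print(repr(with_plain))
print(repr(with_raw))
```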