possible error in documentation #26865

karpanGit · 2019-06-15T09:02:55Z

The page

https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#string-regular-expression-replacement

seems to have an example that can be improved. The page lists

df.replace([r'\.', r'(a)'], ['dot', '\1stuff'], regex=True)

as an example. However, '\1' is not treated as a group reference because the replacement string is not a raw string (in a regular Python string, '\1' is the control character '\x01'). I think what you mean is likely

df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True)

if my understanding is correct please consider updating the page.
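To illustrate the raw-string point, here is a minimal sketch that uses only Python's standard `re` module rather than pandas itself. The variable names `non_raw` and `raw` are purely illustrative.

```python
import re

# In a regular string literal, \1 is an octal escape: the control character chr(1)
non_raw = '\1stuff'
# In a raw string the backslash is kept, so the regex engine sees a group reference
raw = r'\1stuff'

print(repr(non_raw))  # '\x01stuff'
print(repr(raw))      # '\\1stuff'

# The same substitution with each replacement string
result_non_raw = re.sub(r'(a)', non_raw, 'a')
result_raw = re.sub(r'(a)', raw, 'a')

print(repr(result_non_raw))  # '\x01stuff'
print(repr(result_raw))      # 'astuff'
```

The control character is not printable, which is presumably why the non-raw version can look as if the '\1' were simply dropped.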

Regards,

Panos Karamertzanis

topper-123 · 2019-06-15T11:37:03Z

Thanks. You are probably right (I don't have a computer at hand to double-check). A contribution on this would be welcome.

Kischy · 2019-06-16T14:26:49Z

From what I tested, the line should be

df.replace([r'\.', r'(a)'], ['dot', 'stuff'], regex=True)

Only then do I get the output the documentation shows:

import numpy as np
import pandas as pd


#pd.show_versions()


d = {'a': list(range(4)), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']}
df = pd.DataFrame(d)

#print(df)
#print("-------------")
print(df.replace([r'\.', r'(a)'], ['dot', 'stuff'], regex=True))

Output:

   a      b      c
0  0  stuff  stuff
1  1      b      b
2  2    dot    NaN
3  3    dot      d

topper-123 · 2019-06-16T14:52:58Z

Both versions work, but the original example was meant to show a regex -> regex grouped replacement, so if you just make the string a raw string as you originally suggested, that will fix the error.

Kischy · 2019-06-16T15:37:20Z

@topper-123
If I do it as originally suggested, then the output is

   a       b       c
0  0  astuff  astuff
1  1       b       b
2  2     dot     NaN
3  3     dot       d

Is it intended that the character 'a' is kept in the second line of the output, in the third and fourth words ('astuff')?

Code:

import numpy as np
import pandas as pd

d = {'a': list(range(4)), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']}
df = pd.DataFrame(d)

print(df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True))

karpanGit · 2019-06-16T17:00:13Z

In my view the example intends to demonstrate the regex -> regex transformation and at the same time show how to use capturing brackets in the regular expression. The original dataframe is

d = {'a': list(range(4)), 'b': list('ab..'), 'c': ['a', 'b', np.nan, 'd']}
df = pd.DataFrame(d)

i.e. the original data frame is

   a  b    c
0  0  a    a
1  1  b    b
2  2  .  NaN
3  3  .    d

with the intended example

df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True)

we would like to replace '.' with 'dot' and also replace 'a' with 'astuff'. Indeed, the above code does exactly this and yields:

   a       b       c
0  0  astuff  astuff
1  1       b       b
2  2     dot     NaN
3  3     dot       d

that is what the example intends to show.

topper-123 · 2019-06-16T18:19:42Z

Agree with @karpanGit on the example's intention.

Having said that, the strings used have no meaning. If you want to find an example where the strings and the regex operation are more meaningful, that could help people understand the example better.
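As one possible direction, here is a hedged sketch of a more meaningful variant. The `weight` column and its values are hypothetical and are not taken from the pandas documentation. It keeps the same shape as the original example: one plain regex replacement and one that reuses a captured group.

```python
import pandas as pd

# Hypothetical data, chosen only for illustration
df = pd.DataFrame({'weight': ['10kg', '5kg', 'n.a.']})

# r'^n\.a\.$' matches the literal text 'n.a.' and replaces it with 'unknown'
# r'(\d+)kg' captures the number; r'\1 kg' re-inserts it followed by a space
cleaned = df.replace([r'^n\.a\.$', r'(\d+)kg'], ['unknown', r'\1 kg'], regex=True)

print(cleaned)
```

Here the captured group has an obvious purpose (the number has to survive the replacement), which may make the need for the raw string r'\1 kg' easier to see.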

karpanGit · 2019-06-16T18:24:30Z

I agree with you @topper-123.

One naive question: how does it work with improving the documentation? Are users like myself supposed to make concrete proposals or only report observations?

Many thanks and apologies for my ignorance on how things work.

topper-123 · 2019-06-16T18:39:40Z

If you are asking specifically about how an issue gets resolved in pandas, it is all done on a volunteer basis, and no one is obliged to fix a bug that you've reported. So in practice the best way to get things fixed is to submit a pull request yourself, including for the documentation :-). And contributions are always welcome, as I mentioned.

Kischy · 2019-06-16T19:42:44Z

Okay, if that is intended, then the correct line is

df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True)

topper-123 · 2019-06-16T20:15:43Z

Yes, that's right.

Kischy · 2019-06-16T20:17:28Z

Perfect, is the pull request correct the way I did it?

topper-123 · 2019-06-16T20:34:19Z

Yes.

topper-123 added Docs good first issue labels Jun 15, 2019

Kischy mentioned this issue Jun 16, 2019

DOC: Changed string to give correct output (#26865) #26886

Closed

1 task

Kischy mentioned this issue Jun 16, 2019

DOC: Changed string to give intended explanation #26865 #26891

Merged

1 task

topper-123 closed this as completed in #26891 Jun 16, 2019
