Skip to content

Series.replace and DataFrame.replace have same docstring? #13852

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kyleabeauchamp opened this issue Jul 30, 2016 · 19 comments · Fixed by #18100
Closed

Series.replace and DataFrame.replace have same docstring? #13852

kyleabeauchamp opened this issue Jul 30, 2016 · 19 comments · Fixed by #18100

Comments

@kyleabeauchamp
Copy link

kyleabeauchamp commented Jul 30, 2016

In [4]: pd.__version__
Out[4]: u'0.18.1'

The docstring for Series.replace refers to looking up column names, which AFAIK doesn't make sense for the Series version of replace. It seems like the Series function is just blindly inheriting the DataFrame docstring?

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.replace.html?highlight=replace#pandas.Series.replace

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.replace.html?highlight=replace#pandas.DataFrame.replace

For example, the following trivial example of Series.replace() does not agree with the docstring:

import pandas as pd

x = pd.DataFrame([dict(A=1, B=2, C=3), dict(A=2, B=3, C=4), dict(A=3, B=4, C=5)])

print(x.A.replace({1:0}))
print(x.A)

0    0
1    2
2    3
Name: A, dtype: int64
0    1
1    2
2    3
Name: A, dtype: int64
@kyleabeauchamp
Copy link
Author

One other minor nit. Should the following edge case be supported? It currently raises a ValueError if you pass an empty dict.

x.A.replace({})

~/miniconda2/lib/python2.7/site-packages/pandas/core/generic.pyc in replace(self, to_replace, value, inplace, limit, regex, method, axis)
   3334 
   3335             items = list(compat.iteritems(to_replace))
-> 3336             keys, values = zip(*items)
   3337 
   3338             are_mappings = [is_dictlike(v) for v in values]

ValueError: need more than 0 values to unpack

I'm not 100% sure this should even be allowed because of the docstring confusion, though.

@sinhrks
Copy link
Member

sinhrks commented Jul 30, 2016

Thanks for the report. Yeah doc improvement is appreciated. It should be re-defined in Series/DateFrame to show correct signature as docstring. You can find an example here.

-https://github.com/pydata/pandas/blob/master/pandas/core/generic.py#L2219

I think the empty case should be supported to do nothing like other methods.

x.rename({})
#    A  B  C
# 0  1  2  3
# 1  2  3  4
# 2  3  4  5

@sinhrks sinhrks added Docs Error Reporting Incorrect or improved errors from pandas labels Jul 30, 2016
@RaghavBatra
Copy link

Hey, I am interested! How should I go about solving this?

@gfyoung
Copy link
Member

gfyoung commented Aug 9, 2017

@RaghavBatra : I think the comment by @sinhrks above nicely describes what needs to be done.

If you can write a new docstring for Series.replace that properly reflects the behavior of the method, that will be great!

@RaghavBatra
Copy link

@gfyoung, I looked at the counter example @kyleabeauchamp gave above and I don't see any problem with it:
{1: 0} implies you replace the 1 with 0, which the code does!

0 1
1 2
2 3
Name: A, dtype: int64
0 0
1 2
2 3
Name: A, dtype: int64

@gfyoung
Copy link
Member

gfyoung commented Aug 10, 2017

@RaghavBatra : Can you confirm that passing in both {1: 0} and {} don't break anything?

@RaghavBatra
Copy link

@gfyoung, yes it works for the DataFrame counterexample above.

I also tried the Series.replace function. That worked as well!

ser = pd.Series([1, 2, 3, 4])
ser
0 1
1 2
2 3
3 4
dtype: int64
ser.replace({1:0})
0 0
1 2
2 3
3 4
dtype: int64
ser.replace({})
0 1
1 2
2 3
3 4
dtype: int64

@gfyoung
Copy link
Member

gfyoung commented Aug 10, 2017

@sinhrks @jreback : It seems like the behavior matches the docs now. Closing in light of this development.

@gfyoung gfyoung closed this as completed Aug 10, 2017
@gfyoung gfyoung added this to the No action milestone Aug 10, 2017
@gfyoung
Copy link
Member

gfyoung commented Aug 10, 2017

@RaghavBatra : Thanks for looking into this!

@RaghavBatra
Copy link

@gfyoung glad to be of help!
Let me know if there are other beginner level issues to work on!

@gfyoung
Copy link
Member

gfyoung commented Aug 10, 2017

@RaghavBatra : https://github.com/pandas-dev/pandas/issues lists all open issues that you are more than welcome to tackle!

@jorisvandenbossche
Copy link
Member

For me, this issue was about the doctring in general, and not only that specific issue. The docstring of Series.replace still mentions dataframe specific things that do not work for Series (eg the explanation for a dict only mentions nested dicts).
So I would say to either make two separate docstrings (or make separate versions of part of the docstring), or either reword the docstring so it reads fine (and includes explanation) for both DataFrame and Series.
Also the mention of NDFrame should be removed (or replaced by Series / DataFrame appropriately).

@jorisvandenbossche jorisvandenbossche removed this from the No action milestone Aug 10, 2017
@jorisvandenbossche jorisvandenbossche added Difficulty Novice and removed Error Reporting Incorrect or improved errors from pandas labels Aug 10, 2017
@RaghavBatra
Copy link

OK, so I downloaded and built the documentation. I even found the HTML version of the Series.replace function.
My question is how do I go about changing the actual documentation now?
Any help would be appreciated!

@gfyoung
Copy link
Member

gfyoung commented Aug 10, 2017

Do a GitHub search in this repository for a portion of the docstring, and you should be able to figure out where it is.

@RaghavBatra
Copy link

That only seems to bring up the HTML generated doc! Any other tips?

@gfyoung
Copy link
Member

gfyoung commented Aug 11, 2017

The docstring can be found in core/generic.py

@RaghavBatra
Copy link

Nope, still don't see it. :(

@gfyoung
Copy link
Member

gfyoung commented Aug 11, 2017

It's right here (just click the hyperlink)

RaghavBatra added a commit to RaghavBatra/pandas that referenced this issue Aug 11, 2017
Updated documentation so that it matched the functionality of
pd.Series.replace and NOT pd.DataFrame.replace
Fixes pandas-dev#13852
@RaghavBatra
Copy link

Done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants