-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: update the pandas.DataFrame.replace docstring #20271
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: update the pandas.DataFrame.replace docstring #20271
Conversation
pandas/core/generic.py
Outdated
|
||
.. versionchanged:: 0.23.0 | ||
Added to DataFrame | ||
.. versionchanged:: 0.23.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't what you want to do - make sure you keep the versionchanged
directive below the method
argument as that's what was added in v0.23
pandas/core/generic.py
Outdated
regex : bool or same types as ``to_replace``, default False | ||
Whether to interpret ``to_replace`` and/or ``value`` as regular | ||
expressions. If this is ``True`` then ``to_replace`` *must* be a | ||
string. Alternatively, this could be a regular expression or a | ||
list, dict, or array of regular expressions in which case | ||
``to_replace`` must be ``None``. | ||
method : string, optional, {'pad', 'ffill', 'bfill'} | ||
method : string, optional, {'pad', 'ffill', 'bfill'}, default is 'pad' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
method : {'pad', 'ffill', 'bfill', `None`}
pandas/core/generic.py
Outdated
The method to use when for replacement, when ``to_replace`` is a | ||
scalar, list or tuple and ``value`` is None. | ||
axis : None | ||
Deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Warning says this will be removed in v0.13? Woof...I guess OK to document for this change but should have a follow up change to actually go ahead and remove - care to take a stab at that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd where is this warning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of lines into the function definition
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to take a stab at this - always nice when I can remove code too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@math-and-data awesome thanks! Can you open a separate issue for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd I was waiting for this PR to be approved, then I would open a new request where I change the relevant code (remove the 'axis' reference) and edit the documentation accordingly. Is there anything else I had missed in this PR (other than the suggestion of breaking out the DataFrame and Series examples)?
pandas/core/generic.py
Outdated
@@ -4869,6 +4869,10 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None): | |||
_shared_docs['replace'] = (""" | |||
Replace values given in 'to_replace' with 'value'. | |||
|
|||
Values of the DataFrame or a Series are being replaced with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure this extended description is adding much. Better served to make mention of how this can replace values with a dynamic set of inputs like dicts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, thank you for the suggestion
pandas/core/generic.py
Outdated
Values of the DataFrame or a Series are being replaced with | ||
other values. One or several values can be replaced with one | ||
or several values. | ||
|
||
Parameters | ||
---------- | ||
to_replace : str, regex, list, dict, Series, numeric, or None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Say int, float instead of numeric (if float is even valid?)
pandas/core/generic.py
Outdated
the pecularities of the ``to_replace`` parameter. | ||
``s.replace('a', None)`` is actually equivalent to | ||
``s.replace(to_replace='a', value=None, method='pad')``, | ||
because when ``value=None`` and ``to_replace`` is a scalar, list or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting as I was not aware of this behavior. Certainly great to have it documented, though I would move the majority of the writing into the Notes section and shorten the blurb introducing the comparison here.
pandas/core/generic.py
Outdated
``s.replace(to_replace='a', value=None, method='pad')``, | ||
because when ``value=None`` and ``to_replace`` is a scalar, list or | ||
tuple, ``replace`` uses the method parameter to do the replacement. | ||
So this is why the 'a' values are being replaced by 30 in rows 3 and 4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just reinforce that it's the fill
behavior that is really replacing values here
pandas/core/generic.py
Outdated
like the value(s) in the dict are equal to the value parameter. | ||
|
||
>>> s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a']) | ||
>>> print(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Series is simple enough where you don't need to explicitly print it - the constructor shows you everything of interest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I personally have found the visual of inspecting the changes before/after easier for such replacements (both in vertical positions). You have more experience and I'll rely on your suggestion and make the change.
pandas/core/generic.py
Outdated
5 b | ||
6 b | ||
dtype: object | ||
>>> print(s.replace({'a': None})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would put this example first as it is (from my perspective) the behavior most would expect. Having it first makes it a better segue into the nuance that you want to describe with the other example
pandas/core/generic.py
Outdated
when you use a dict as the ``to_replace`` value. In this case, it is | ||
like the value(s) in the dict are equal to the value parameter. | ||
|
||
>>> s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to keep things concise why don't you get rid of 10 and 20 in this example? They don't serve any real purpose but make the documentation longer. Can also replace 30 with 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great suggestion of simplifying.
pandas/core/generic.py
Outdated
The method to use when for replacement, when ``to_replace`` is a | ||
scalar, list or tuple and ``value`` is None. | ||
axis : None | ||
Deprecated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd where is this warning?
pandas/core/generic.py
Outdated
5 b | ||
6 a | ||
dtype: object | ||
>>> print(s.replace('a', None)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the prints, use a blank line between cases. Having an expl for each case is also nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
|
Codecov Report
@@ Coverage Diff @@
## master #20271 +/- ##
==========================================
+ Coverage 91.82% 91.84% +0.02%
==========================================
Files 152 153 +1
Lines 49248 49305 +57
==========================================
+ Hits 45222 45286 +64
+ Misses 4026 4019 -7
Continue to review full report at Codecov.
|
Updated
|
I would personally split this docstring in separate ones for series and dataframe, it's becoming quite a monster :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One very minor edit but otherwise lgtm
pandas/core/generic.py
Outdated
|
||
See Also | ||
-------- | ||
%(klass)s.fillna : Fill NA/NaN values | ||
%(klass)s.fillna : Fill `NaN` values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better as Fill NA values
since it is talking about the concept of missing data and not necessarily the NaN
value itself
minor change as requested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the linting failure. Let's get this merged when that passes.
Thanks @math-and-data! |
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Note: Just did a minor improvement, not a full change!
Still a few verification errors: