DOC: update the pandas.DataFrame.replace docstring #20271
Merged
TomAugspurger merged 8 commits into pandas-dev:master from math-and-data:docstring_series_dataframe_replace on Apr 22, 2018.
Changes from 6 commits

Commits (8):
068127c DOC: changed pandas.DataFrame/Series.replace docstring (math-and-data)
44b7de3 DOC: changed pandas.DataFrame/Series.replace docstring (math-and-data)
b502d34 DOC: pandas.DataFrame.replace - implemented feedback (math-and-data)
80bc7ca DOC: pandas.DataFrame.replace - int/float instead of numeric (math-and-data)
f05d0af DOC: pandas.DataFrame.replace - formatting (math-and-data)
0a081fe Updates. (TomAugspurger)
cf6d655 DOC: update the pandas.DataFrame.replace docstring (math-and-data)
58f6531 Fixed linting (TomAugspurger)
@@ -4867,64 +4867,69 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
limit=limit, downcast=downcast)

_shared_docs['replace'] = ("""
Replace values given in 'to_replace' with 'value'.
Replace values given in `to_replace` with `value`.

Values of the %(klass)s are replaced with other values dynamically.
This differs from updating with ``.loc`` or ``.iloc``, which require
you to specify a location to update with some value.

Parameters
----------
to_replace : str, regex, list, dict, Series, numeric, or None
to_replace : str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.

* numeric, str or regex:

- numeric: numeric values equal to ``to_replace`` will be
replaced with ``value``
- str: string exactly matching ``to_replace`` will be replaced
with ``value``
- regex: regexs matching ``to_replace`` will be replaced with
``value``
- numeric: numeric values equal to `to_replace` will be
replaced with `value`
- str: string exactly matching `to_replace` will be replaced
with `value`
- regex: regexs matching `to_replace` will be replaced with
`value`

* list of str, regex, or numeric:

- First, if ``to_replace`` and ``value`` are both lists, they
- First, if `to_replace` and `value` are both lists, they
**must** be the same length.
- Second, if ``regex=True`` then all of the strings in **both**
lists will be interpreted as regexs otherwise they will match
directly. This doesn't matter much for ``value`` since there
directly. This doesn't matter much for `value` since there
are only a few possible substitution regexes you can use.
- str, regex and numeric rules apply as above.

* dict:

- Dicts can be used to specify different replacement values
for different existing values. For example,
{'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and
'y' with 'z'. To use a dict in this way the ``value``
parameter should be ``None``.
``{'a': 'b', 'y': 'z'}`` replaces the value 'a' with 'b' and
'y' with 'z'. To use a dict in this way the `value`
parameter should be `None`.
- For a DataFrame a dict can specify that different values
should be replaced in different columns. For example,
{'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and
the value 'z' in column 'b' and replaces these values with
whatever is specified in ``value``. The ``value`` parameter
``{'a': 1, 'b': 'z'}`` looks for the value 1 in column 'a'
and the value 'z' in column 'b' and replaces these values
with whatever is specified in `value`. The `value` parameter
should not be ``None`` in this case. You can treat this as a
special case of passing two lists except that you are
specifying the column to search in.
- For a DataFrame nested dictionaries, e.g.,
{'a': {'b': np.nan}}, are read as follows: look in column 'a'
for the value 'b' and replace it with NaN. The ``value``
``{'a': {'b': np.nan}}``, are read as follows: look in column
'a' for the value 'b' and replace it with NaN. The `value`
parameter should be ``None`` to use a nested dict in this
way. You can nest regular expressions as well. Note that
column names (the top-level dictionary keys in a nested
dictionary) **cannot** be regular expressions.

* None:

- This means that the ``regex`` argument must be a string,
compiled regular expression, or list, dict, ndarray or Series
of such elements. If ``value`` is also ``None`` then this
**must** be a nested dictionary or ``Series``.
- This means that the `regex` argument must be a string,
compiled regular expression, or list, dict, ndarray or
Series of such elements. If `value` is also ``None`` then
this **must** be a nested dictionary or Series.

See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
Value to replace any values matching ``to_replace`` with.
Value to replace any values matching `to_replace` with.
For a DataFrame a dict of values can be used to specify which
value to use for each column (columns not in the dict will not be
filled). Regular expressions, strings and lists or dicts of such
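The `to_replace` rules in the hunk above (scalar, paired lists that must have equal length, flat dict, and per-column nested dict) can be modelled with a small pure-Python sketch. This is not pandas' implementation: `replace_values` and `replace_in_columns` are hypothetical helpers that work on plain lists and on a dict of column lists, and they ignore regexes, dtypes and NaN handling. The printed results are chosen to match the docstring's own `s.replace(0, 5)`, `df.replace([0, 1, 2, 3], [4, 3, 2, 1])`, `df.replace({'A': 0, 'B': 5}, 100)` and `df.replace({'A': {0: 100, 4: 400}})` examples.

```python
def replace_values(values, to_replace, value=None):
    """Hypothetical model of Series.replace for scalar, list and dict inputs.

    All replacements are looked up in one mapping, so they are applied
    simultaneously rather than one after another.
    """
    if isinstance(to_replace, dict):
        # Flat dict: keys are the values to find, dict values are their replacements.
        mapping = dict(to_replace)
    elif isinstance(to_replace, list):
        if isinstance(value, list):
            # Two lists are paired element by element and must be the same length.
            if len(to_replace) != len(value):
                raise ValueError("to_replace and value must be the same length")
            mapping = dict(zip(to_replace, value))
        else:
            # A single replacement value is used for every listed value.
            mapping = {old: value for old in to_replace}
    else:
        # Scalar to_replace: exact matches become value.
        mapping = {to_replace: value}
    return [mapping[v] if v in mapping else v for v in values]


def replace_in_columns(table, to_replace, value=None):
    """Hypothetical model of DataFrame.replace with column-keyed dicts.

    ``table`` maps column name -> list of cells; columns not named in
    ``to_replace`` are copied unchanged.
    """
    result = {col: list(cells) for col, cells in table.items()}
    for col, spec in to_replace.items():
        if isinstance(spec, dict):
            # Nested dict {'col': {old: new}}: the inner dict supplies the values.
            result[col] = replace_values(result[col], spec)
        else:
            # Flat dict {'col': old} combined with a separate value.
            result[col] = replace_values(result[col], spec, value)
    return result


s = [0, 1, 2, 3, 4]
print(replace_values(s, 0, 5))                        # [5, 1, 2, 3, 4]
print(replace_values(s, [0, 1, 2, 3], [4, 3, 2, 1]))  # [4, 3, 2, 1, 4]

df = {'A': [0, 1, 2, 3, 4], 'B': [5, 6, 7, 8, 9]}
print(replace_in_columns(df, {'A': 0, 'B': 5}, 100))
# {'A': [100, 1, 2, 3, 4], 'B': [100, 6, 7, 8, 9]}
print(replace_in_columns(df, {'A': {0: 100, 4: 400}}))
# {'A': [100, 1, 2, 3, 400], 'B': [5, 6, 7, 8, 9]}
```

Building one lookup table up front is what keeps `[0, 1, 2, 3] -> [4, 3, 2, 1]` from cascading (a 1 turned into 3 is not then turned back into 1), which is consistent with the docstring output.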
@@ -4934,45 +4939,49 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
other views on this object (e.g. a column from a DataFrame).
Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
regex : bool or same types as ``to_replace``, default False
Whether to interpret ``to_replace`` and/or ``value`` as regular
expressions. If this is ``True`` then ``to_replace`` *must* be a
Maximum size gap to forward or backward fill.
regex : bool or same types as `to_replace`, default False
Whether to interpret `to_replace` and/or `value` as regular
expressions. If this is ``True`` then `to_replace` *must* be a
string. Alternatively, this could be a regular expression or a
list, dict, or array of regular expressions in which case
``to_replace`` must be ``None``.
method : string, optional, {'pad', 'ffill', 'bfill'}
The method to use when for replacement, when ``to_replace`` is a
scalar, list or tuple and ``value`` is None.
`to_replace` must be ``None``.
method : {'pad', 'ffill', 'bfill', `None`}
The method to use when for replacement, when `to_replace` is a
scalar, list or tuple and `value` is ``None``.

.. versionchanged:: 0.23.0
Added to DataFrame
.. versionchanged:: 0.23.0
Added to DataFrame.
axis : None
Deprecated.

See Also
--------
%(klass)s.fillna : Fill NA/NaN values
%(klass)s.fillna : Fill `NaN` values
Review comment: This would be better as …
%(klass)s.where : Replace values based on boolean condition
Series.str.replace : Simple string replacement.

Returns
-------
filled : %(klass)s
%(klass)s
Object after replacement.

Raises
------
AssertionError
* If ``regex`` is not a ``bool`` and ``to_replace`` is not
* If `regex` is not a ``bool`` and `to_replace` is not
``None``.
TypeError
* If ``to_replace`` is a ``dict`` and ``value`` is not a ``list``,
* If `to_replace` is a ``dict`` and `value` is not a ``list``,
``dict``, ``ndarray``, or ``Series``
* If ``to_replace`` is ``None`` and ``regex`` is not compilable
* If `to_replace` is ``None`` and `regex` is not compilable
into a regular expression or is a list, dict, ndarray, or
Series.
* When replacing multiple ``bool`` or ``datetime64`` objects and
the arguments to ``to_replace`` does not match the type of the
the arguments to `to_replace` does not match the type of the
value being replaced
ValueError
* If a ``list`` or an ``ndarray`` is passed to ``to_replace`` and
* If a ``list`` or an ``ndarray`` is passed to `to_replace` and
`value` but they are not the same length.

Notes
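The `method` and `limit` parameters described in the hunk above fill matches from neighbouring values instead of substituting `value`. As a rough model (an assumption about the semantics, not pandas' code), the hypothetical `fill_replace` below treats every match as a hole and fills it forward for 'pad'/'ffill' or backward for 'bfill'; `limit` caps the number of consecutive fills, and leading matches with nothing to copy from are left as they were in this sketch. Its two printed results are chosen to match the docstring's `s.replace([1, 2], method='bfill')` and `s.replace('a', None)` outputs.

```python
def fill_replace(values, to_replace, method="pad", limit=None):
    """Hypothetical model of replace(..., method=...) on a plain list.

    Matches of ``to_replace`` are overwritten with the nearest non-matching
    value in the fill direction: forward for 'pad'/'ffill', backward for
    'bfill'. ``limit`` caps how many consecutive matches get filled.
    """
    targets = set(to_replace) if isinstance(to_replace, (list, tuple)) else {to_replace}
    if method in ("pad", "ffill"):
        order = range(len(values))
    elif method == "bfill":
        order = range(len(values) - 1, -1, -1)
    else:
        raise ValueError(f"unsupported method: {method!r}")

    result = list(values)
    last = None          # most recent non-matching value seen so far
    have_last = False    # leading matches have nothing to copy from
    run = 0              # fills performed since the last non-matching value
    for i in order:
        if values[i] in targets:
            if have_last and (limit is None or run < limit):
                result[i] = last
                run += 1
        else:
            last, have_last, run = values[i], True, 0
    return result


print(fill_replace([0, 1, 2, 3, 4], [1, 2], method="bfill"))  # [0, 3, 3, 3, 4]
print(fill_replace([10, 'a', 'a', 'b', 'a'], 'a'))            # [10, 10, 10, 'b', 'b']
```

Walking the indices in reverse for 'bfill' lets one loop serve both directions.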
@@ -4986,10 +4995,15 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
numbers *are* strings, then you can do this.
* This method has *a lot* of options. You are encouraged to experiment
and play with this method to gain intuition about how it works.
* When dict is used as the `to_replace` value, it is like
key(s) in the dict are the to_replace part and
value(s) in the dict are the value parameter.

Examples
--------

**Scalar `to_replace` and `value`**

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace(0, 5)
0 5
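The Notes bullet added in the hunk above reads a dict `to_replace` as if its keys were the `to_replace` part and its values the `value` part. A minimal sketch of that reading, using two hypothetical helpers on plain lists (not pandas functions): because every key arrives with its own explicit replacement, `None` is substituted directly and no fill method is involved, which gives the same result the docstring shows for `s.replace({'a': None})`.

```python
def split_replace_dict(mapping):
    """Hypothetical helper: split a flat to_replace dict into parallel lists."""
    return list(mapping.keys()), list(mapping.values())


def replace_pairs(values, olds, news):
    """Replace each element equal to olds[i] with news[i], all pairs at once."""
    if len(olds) != len(news):
        raise ValueError("to_replace and value must be the same length")
    lookup = dict(zip(olds, news))
    return [lookup[v] if v in lookup else v for v in values]


s = [10, 'a', 'a', 'b', 'a']
olds, news = split_replace_dict({'a': None})
print(olds, news)                    # ['a'] [None]
print(replace_pairs(s, olds, news))  # [10, None, None, 'b', None]
```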
@@ -4998,6 +5012,7 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
3 3
4 4
dtype: int64

>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
... 'B': [5, 6, 7, 8, 9],
... 'C': ['a', 'b', 'c', 'd', 'e']})
@@ -5009,20 +5024,24 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
3 3 8 d
4 4 9 e

**List-like `to_replace`**

>>> df.replace([0, 1, 2, 3], 4)
A B C
0 4 5 a
1 4 6 b
2 4 7 c
3 4 8 d
4 4 9 e

>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
A B C
0 4 5 a
1 3 6 b
2 2 7 c
3 1 8 d
4 4 9 e

>>> s.replace([1, 2], method='bfill')
0 0
1 3
@@ -5031,20 +5050,24 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
4 4
dtype: int64

**dict-like `to_replace`**

>>> df.replace({0: 10, 1: 100})
A B C
0 10 5 a
1 100 6 b
2 2 7 c
3 3 8 d
4 4 9 e

>>> df.replace({'A': 0, 'B': 5}, 100)
A B C
0 100 100 a
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 e

>>> df.replace({'A': {0: 100, 4: 400}})
A B C
0 100 5 a
@@ -5053,45 +5076,87 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
3 3 8 d
4 400 9 e

**Regular expression `to_replace`**

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
... 'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
A B
0 new abc
1 foo new
2 bait xyz

>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
A B
0 new abc
1 foo bar
2 bait xyz

>>> df.replace(regex=r'^ba.$', value='new')
A B
0 new abc
1 foo new
2 bait xyz

>>> df.replace(regex={r'^ba.$':'new', 'foo':'xyz'})
A B
0 new abc
1 xyz new
2 bait xyz

>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
A B
0 new abc
1 new new
2 bait xyz

Note that when replacing multiple ``bool`` or ``datetime64`` objects,
the data types in the ``to_replace`` parameter must match the data
the data types in the `to_replace` parameter must match the data
type of the value being replaced:

>>> df = pd.DataFrame({'A': [True, False, True],
... 'B': [False, True, False]})
>>> df.replace({'a string': 'new value', True: False}) # raises
Traceback (most recent call last):
...
TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

This raises a ``TypeError`` because one of the ``dict`` keys is not of
the correct type for replacement.

Compare the behavior of ``s.replace({'a': None})`` and
``s.replace('a', None)`` to understand the pecularities
of the `to_replace` parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the `to_replace` value, it is like the
value(s) in the dict are equal to the `value` parameter.
``s.replace({'a': None})`` is equivalent to
``s.replace(to_replace={'a': None}, value=None, method=None)``:

>>> s.replace({'a': None})
0 10
1 None
2 None
3 b
4 None
dtype: object

When ``value=None`` and `to_replace` is a scalar, list or
tuple, `replace` uses the method parameter (default 'pad') to do the
replacement. So this is why the 'a' values are being replaced by 10
in rows 1 and 2 and 'b' in row 4 in this case.
The command ``s.replace('a', None)`` is actually equivalent to
``s.replace(to_replace='a', value=None, method='pad')``:

>>> s.replace('a', None)
0 10
1 10
2 10
3 b
4 b
dtype: object
""")

@Appender(_shared_docs['replace'] % _shared_doc_kwargs)
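The regular-expression examples in the hunk above can be approximated cell by cell with Python's `re` module. The sketch rests on an assumed simplification rather than on pandas' source: only string cells are inspected, a string `value` is substituted for the matched text with `re.sub`, any other `value` replaces the whole cell when the pattern is found, and a list of patterns is applied in order. `regex_replace` is a hypothetical helper. Its printed results are chosen to match columns A and B of `df.replace(to_replace=r'^ba.$', value='new', regex=True)` and column A of `df.replace(regex=[r'^ba.$', 'foo'], value='new')`.

```python
import re


def regex_replace(values, patterns, value):
    """Hypothetical model of regex replacement on a list of cells.

    Only string cells are inspected. A string ``value`` is substituted for
    the matched text via re.sub; any other ``value`` replaces the whole cell
    when the pattern is found. Patterns are applied in order.
    """
    if not isinstance(patterns, (list, tuple)):
        patterns = [patterns]  # a single str or compiled pattern
    compiled = [re.compile(p) for p in patterns]
    out = []
    for cell in values:
        for rx in compiled:
            if not isinstance(cell, str):
                break  # non-strings (or cells already swapped out) are skipped
            if isinstance(value, str):
                cell = rx.sub(value, cell)
            elif rx.search(cell):
                cell = value
        out.append(cell)
    return out


col_a = ['bat', 'foo', 'bait']
col_b = ['abc', 'bar', 'xyz']
print(regex_replace(col_a, r'^ba.$', 'new'))           # ['new', 'foo', 'bait']
print(regex_replace(col_b, r'^ba.$', 'new'))           # ['abc', 'new', 'xyz']
print(regex_replace(col_a, [r'^ba.$', 'foo'], 'new'))  # ['new', 'new', 'bait']
```

Because this sketch uses `re.sub`, an unanchored pattern rewrites only the matched text (pattern `'a'` with value `'o'` turns `'bat'` into `'bot'`); anchoring with `^...$`, as the docstring examples do, makes the whole cell the match.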
Review comment: This isn't what you want to do - make sure you keep the `versionchanged` directive below the `method` argument, as that's what was added in v0.23.