DOC: update the pandas.DataFrame.replace docstring #20271

147 changes: 106 additions & 41 deletions pandas/core/generic.py
@@ -4867,64 +4867,69 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
limit=limit, downcast=downcast)

_shared_docs['replace'] = ("""
Replace values given in 'to_replace' with 'value'.
Replace values given in `to_replace` with `value`.

Values of the %(klass)s are replaced with other values dynamically.
This differs from updating with ``.loc`` or ``.iloc``, which require
you to specify a location to update with some value.

Parameters
----------
to_replace : str, regex, list, dict, Series, numeric, or None
to_replace : str, regex, list, dict, Series, int, float, or None
How to find the values that will be replaced.

* numeric, str or regex:

- numeric: numeric values equal to ``to_replace`` will be
replaced with ``value``
- str: string exactly matching ``to_replace`` will be replaced
with ``value``
- regex: regexs matching ``to_replace`` will be replaced with
``value``
- numeric: numeric values equal to `to_replace` will be
replaced with `value`
- str: string exactly matching `to_replace` will be replaced
with `value`
- regex: regexs matching `to_replace` will be replaced with
`value`

* list of str, regex, or numeric:

- First, if ``to_replace`` and ``value`` are both lists, they
- First, if `to_replace` and `value` are both lists, they
**must** be the same length.
- Second, if ``regex=True`` then all of the strings in **both**
lists will be interpreted as regexs otherwise they will match
directly. This doesn't matter much for ``value`` since there
directly. This doesn't matter much for `value` since there
are only a few possible substitution regexes you can use.
- str, regex and numeric rules apply as above.

* dict:

- Dicts can be used to specify different replacement values
for different existing values. For example,
{'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and
'y' with 'z'. To use a dict in this way the ``value``
parameter should be ``None``.
``{'a': 'b', 'y': 'z'}`` replaces the value 'a' with 'b' and
'y' with 'z'. To use a dict in this way the `value`
parameter should be `None`.
- For a DataFrame a dict can specify that different values
should be replaced in different columns. For example,
{'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and
the value 'z' in column 'b' and replaces these values with
whatever is specified in ``value``. The ``value`` parameter
``{'a': 1, 'b': 'z'}`` looks for the value 1 in column 'a'
and the value 'z' in column 'b' and replaces these values
with whatever is specified in `value`. The `value` parameter
should not be ``None`` in this case. You can treat this as a
special case of passing two lists except that you are
specifying the column to search in.
- For a DataFrame nested dictionaries, e.g.,
{'a': {'b': np.nan}}, are read as follows: look in column 'a'
for the value 'b' and replace it with NaN. The ``value``
``{'a': {'b': np.nan}}``, are read as follows: look in column
'a' for the value 'b' and replace it with NaN. The `value`
parameter should be ``None`` to use a nested dict in this
way. You can nest regular expressions as well. Note that
column names (the top-level dictionary keys in a nested
dictionary) **cannot** be regular expressions.

* None:

- This means that the ``regex`` argument must be a string,
compiled regular expression, or list, dict, ndarray or Series
of such elements. If ``value`` is also ``None`` then this
**must** be a nested dictionary or ``Series``.
- This means that the `regex` argument must be a string,
compiled regular expression, or list, dict, ndarray or
Series of such elements. If `value` is also ``None`` then
this **must** be a nested dictionary or Series.

See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
Value to replace any values matching ``to_replace`` with.
Value to replace any values matching `to_replace` with.
For a DataFrame a dict of values can be used to specify which
value to use for each column (columns not in the dict will not be
filled). Regular expressions, strings and lists or dicts of such
@@ -4934,45 +4939,49 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
other views on this object (e.g. a column from a DataFrame).
Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
regex : bool or same types as ``to_replace``, default False
Whether to interpret ``to_replace`` and/or ``value`` as regular
expressions. If this is ``True`` then ``to_replace`` *must* be a
Maximum size gap to forward or backward fill.
regex : bool or same types as `to_replace`, default False
Whether to interpret `to_replace` and/or `value` as regular
expressions. If this is ``True`` then `to_replace` *must* be a
string. Alternatively, this could be a regular expression or a
list, dict, or array of regular expressions in which case
``to_replace`` must be ``None``.
method : string, optional, {'pad', 'ffill', 'bfill'}
The method to use when for replacement, when ``to_replace`` is a
scalar, list or tuple and ``value`` is None.
`to_replace` must be ``None``.
method : {'pad', 'ffill', 'bfill', `None`}
The method to use for replacement, when `to_replace` is a
scalar, list or tuple and `value` is ``None``.

.. versionchanged:: 0.23.0
Added to DataFrame
.. versionchanged:: 0.23.0
Review comment (Member): This isn't what you want to do. Make sure you keep the versionchanged directive below the method argument, as that's what was added in v0.23.
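
One reading of this comment, sketched on a hypothetical stub (`replace_stub` is not part of pandas): in numpydoc, a directive documents the parameter entry it is indented under, so the `versionchanged` note would sit inside the `method` entry and before `axis`. The indentation shown is an assumption, since the diff above has lost its leading whitespace.

```python
def replace_stub(method=None, axis=None):
    """Hypothetical stub; only the docstring layout matters here.

    Parameters
    ----------
    method : {'pad', 'ffill', 'bfill', None}
        The method to use for replacement.

        .. versionchanged:: 0.23.0
            Added to DataFrame.
    axis : None
        Deprecated.
    """


# Locate each entry in the docstring to confirm the ordering.
doc = replace_stub.__doc__
method_pos = doc.index("method :")
directive_pos = doc.index(".. versionchanged:: 0.23.0")
axis_pos = doc.index("axis :")
```

With this layout the directive falls between the `method` and `axis` entries, so the rendered docs would attach the version note to `method`.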

Added to DataFrame.
axis : None
Deprecated.

See Also
--------
%(klass)s.fillna : Fill NA/NaN values
%(klass)s.fillna : Fill `NaN` values
Review comment (Member): This would be better as "Fill NA values", since it is talking about the concept of missing data and not necessarily the NaN value itself.
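
A small sketch of the distinction the comment draws, assuming a reasonably recent pandas: `fillna` fills whatever pandas treats as missing for the dtype, which includes `None` and `NaT` as well as the float `NaN`. The sample values and the date are illustrative only.

```python
import pandas as pd

# NaN in a float Series.
floats = pd.Series([1.0, float("nan")]).fillna(0.0)

# None in an object Series.
labels = pd.Series(["a", None], dtype=object).fillna("missing")

# NaT in a datetime Series.
start = pd.Timestamp("2018-03-10")
dates = pd.Series([start, pd.NaT]).fillna(start)
```

All three missing markers are filled, which is why "NA" describes the behaviour more accurately than "NaN".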

%(klass)s.where : Replace values based on boolean condition
Series.str.replace : Simple string replacement.

Returns
-------
filled : %(klass)s
%(klass)s
Object after replacement.

Raises
------
AssertionError
* If ``regex`` is not a ``bool`` and ``to_replace`` is not
* If `regex` is not a ``bool`` and `to_replace` is not
``None``.
TypeError
* If ``to_replace`` is a ``dict`` and ``value`` is not a ``list``,
* If `to_replace` is a ``dict`` and `value` is not a ``list``,
``dict``, ``ndarray``, or ``Series``
* If ``to_replace`` is ``None`` and ``regex`` is not compilable
* If `to_replace` is ``None`` and `regex` is not compilable
into a regular expression or is a list, dict, ndarray, or
Series.
* When replacing multiple ``bool`` or ``datetime64`` objects and
the arguments to ``to_replace`` does not match the type of the
the arguments to `to_replace` does not match the type of the
value being replaced
ValueError
* If a ``list`` or an ``ndarray`` is passed to ``to_replace`` and
* If a ``list`` or an ``ndarray`` is passed to `to_replace` and
`value` but they are not the same length.

Notes
@@ -4986,10 +4995,15 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
numbers *are* strings, then you can do this.
* This method has *a lot* of options. You are encouraged to experiment
and play with this method to gain intuition about how it works.
* When a dict is used as the `to_replace` value, the key(s) in the
dict act as the `to_replace` part and the value(s) in the dict
act as the `value` parameter.

Examples
--------

**Scalar `to_replace` and `value`**

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace(0, 5)
0 5
@@ -4998,6 +5012,7 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
3 3
4 4
dtype: int64

>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
... 'B': [5, 6, 7, 8, 9],
... 'C': ['a', 'b', 'c', 'd', 'e']})
@@ -5009,20 +5024,24 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
3 3 8 d
4 4 9 e

**List-like `to_replace`**

>>> df.replace([0, 1, 2, 3], 4)
A B C
0 4 5 a
1 4 6 b
2 4 7 c
3 4 8 d
4 4 9 e

>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
A B C
0 4 5 a
1 3 6 b
2 2 7 c
3 1 8 d
4 4 9 e

>>> s.replace([1, 2], method='bfill')
0 0
1 3
@@ -5031,20 +5050,24 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
4 4
dtype: int64

**dict-like `to_replace`**

>>> df.replace({0: 10, 1: 100})
A B C
0 10 5 a
1 100 6 b
2 2 7 c
3 3 8 d
4 4 9 e

>>> df.replace({'A': 0, 'B': 5}, 100)
A B C
0 100 100 a
1 1 6 b
2 2 7 c
3 3 8 d
4 4 9 e

>>> df.replace({'A': {0: 100, 4: 400}})
A B C
0 100 5 a
@@ -5053,45 +5076,87 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
3 3 8 d
4 400 9 e

**Regular expression `to_replace`**

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
... 'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
A B
0 new abc
1 foo new
2 bait xyz

>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
A B
0 new abc
1 foo bar
2 bait xyz

>>> df.replace(regex=r'^ba.$', value='new')
A B
0 new abc
1 foo new
2 bait xyz

>>> df.replace(regex={r'^ba.$':'new', 'foo':'xyz'})
A B
0 new abc
1 xyz new
2 bait xyz

>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
A B
0 new abc
1 new new
2 bait xyz

Note that when replacing multiple ``bool`` or ``datetime64`` objects,
the data types in the ``to_replace`` parameter must match the data
the data types in the `to_replace` parameter must match the data
type of the value being replaced:

>>> df = pd.DataFrame({'A': [True, False, True],
... 'B': [False, True, False]})
>>> df.replace({'a string': 'new value', True: False}) # raises
Traceback (most recent call last):
...
TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

This raises a ``TypeError`` because one of the ``dict`` keys is not of
the correct type for replacement.

Compare the behavior of ``s.replace({'a': None})`` and
``s.replace('a', None)`` to understand the peculiarities
of the `to_replace` parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the `to_replace` value, it is like the
value(s) in the dict are equal to the `value` parameter.
``s.replace({'a': None})`` is equivalent to
``s.replace(to_replace={'a': None}, value=None, method=None)``:

>>> s.replace({'a': None})
0 10
1 None
2 None
3 b
4 None
dtype: object

When ``value=None`` and `to_replace` is a scalar, list or
tuple, `replace` uses the method parameter (default 'pad') to do the
replacement. So this is why the 'a' values are being replaced by 10
in rows 1 and 2 and 'b' in row 4 in this case.
The command ``s.replace('a', None)`` is actually equivalent to
``s.replace(to_replace='a', value=None, method='pad')``:

>>> s.replace('a', None)
0 10
1 10
2 10
3 b
4 b
dtype: object
""")

@Appender(_shared_docs['replace'] % _shared_doc_kwargs)