
Commit 4de2e9b

math-and-data, TomAugspurger
authored and committed
DOC: update the pandas.DataFrame.replace docstring (#20271)
1 parent 8def649 commit 4de2e9b


1 file changed, +107 -41 lines changed

pandas/core/generic.py

+107 -41
@@ -5454,64 +5454,69 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
54545454
limit=limit, downcast=downcast)
54555455

54565456
_shared_docs['replace'] = ("""
5457-
Replace values given in 'to_replace' with 'value'.
5457+
Replace values given in `to_replace` with `value`.
5458+
5459+
Values of the %(klass)s are replaced with other values dynamically.
5460+
This differs from updating with ``.loc`` or ``.iloc``, which require
5461+
you to specify a location to update with some value.
54585462
54595463
Parameters
54605464
----------
5461-
to_replace : str, regex, list, dict, Series, numeric, or None
5465+
to_replace : str, regex, list, dict, Series, int, float, or None
5466+
How to find the values that will be replaced.
54625467
54635468
* numeric, str or regex:
54645469
5465-
- numeric: numeric values equal to ``to_replace`` will be
5466-
replaced with ``value``
5467-
- str: string exactly matching ``to_replace`` will be replaced
5468-
with ``value``
5469-
- regex: regexs matching ``to_replace`` will be replaced with
5470-
``value``
5470+
- numeric: numeric values equal to `to_replace` will be
5471+
replaced with `value`
5472+
- str: string exactly matching `to_replace` will be replaced
5473+
with `value`
5474+
- regex: strings matching the regex `to_replace` will be replaced with
5475+
`value`
54715476
54725477
* list of str, regex, or numeric:
54735478
5474-
- First, if ``to_replace`` and ``value`` are both lists, they
5479+
- First, if `to_replace` and `value` are both lists, they
54755480
**must** be the same length.
54765481
- Second, if ``regex=True`` then all of the strings in **both**
54775482
lists will be interpreted as regexes, otherwise they will match
5478-
directly. This doesn't matter much for ``value`` since there
5483+
directly. This doesn't matter much for `value` since there
54795484
are only a few possible substitution regexes you can use.
54805485
- str, regex and numeric rules apply as above.
54815486
54825487
* dict:
54835488
54845489
- Dicts can be used to specify different replacement values
54855490
for different existing values. For example,
5486-
{'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and
5487-
'y' with 'z'. To use a dict in this way the ``value``
5488-
parameter should be ``None``.
5491+
``{'a': 'b', 'y': 'z'}`` replaces the value 'a' with 'b' and
5492+
'y' with 'z'. To use a dict in this way the `value`
5493+
parameter should be `None`.
54895494
- For a DataFrame a dict can specify that different values
54905495
should be replaced in different columns. For example,
5491-
{'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and
5492-
the value 'z' in column 'b' and replaces these values with
5493-
whatever is specified in ``value``. The ``value`` parameter
5496+
``{'a': 1, 'b': 'z'}`` looks for the value 1 in column 'a'
5497+
and the value 'z' in column 'b' and replaces these values
5498+
with whatever is specified in `value`. The `value` parameter
54945499
should not be ``None`` in this case. You can treat this as a
54955500
special case of passing two lists except that you are
54965501
specifying the column to search in.
54975502
- For a DataFrame nested dictionaries, e.g.,
5498-
{'a': {'b': np.nan}}, are read as follows: look in column 'a'
5499-
for the value 'b' and replace it with NaN. The ``value``
5503+
``{'a': {'b': np.nan}}``, are read as follows: look in column
5504+
'a' for the value 'b' and replace it with NaN. The `value`
55005505
parameter should be ``None`` to use a nested dict in this
55015506
way. You can nest regular expressions as well. Note that
55025507
column names (the top-level dictionary keys in a nested
55035508
dictionary) **cannot** be regular expressions.
55045509
55055510
* None:
55065511
5507-
- This means that the ``regex`` argument must be a string,
5508-
compiled regular expression, or list, dict, ndarray or Series
5509-
of such elements. If ``value`` is also ``None`` then this
5510-
**must** be a nested dictionary or ``Series``.
5512+
- This means that the `regex` argument must be a string,
5513+
compiled regular expression, or list, dict, ndarray or
5514+
Series of such elements. If `value` is also ``None`` then
5515+
this **must** be a nested dictionary or Series.
55115516
55125517
See the examples section for examples of each of these.
55135518
value : scalar, dict, list, str, regex, default None
5514-
Value to replace any values matching ``to_replace`` with.
5519+
Value to replace any values matching `to_replace` with.
55155520
For a DataFrame a dict of values can be used to specify which
55165521
value to use for each column (columns not in the dict will not be
55175522
filled). Regular expressions, strings and lists or dicts of such
@@ -5521,45 +5526,50 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
55215526
other views on this object (e.g. a column from a DataFrame).
55225527
Returns the caller if this is True.
55235528
limit : int, default None
5524-
Maximum size gap to forward or backward fill
5525-
regex : bool or same types as ``to_replace``, default False
5526-
Whether to interpret ``to_replace`` and/or ``value`` as regular
5527-
expressions. If this is ``True`` then ``to_replace`` *must* be a
5529+
Maximum size gap to forward or backward fill.
5530+
regex : bool or same types as `to_replace`, default False
5531+
Whether to interpret `to_replace` and/or `value` as regular
5532+
expressions. If this is ``True`` then `to_replace` *must* be a
55285533
string. Alternatively, this could be a regular expression or a
55295534
list, dict, or array of regular expressions in which case
5530-
``to_replace`` must be ``None``.
5531-
method : string, optional, {'pad', 'ffill', 'bfill'}
5532-
The method to use when for replacement, when ``to_replace`` is a
5533-
scalar, list or tuple and ``value`` is None.
5535+
`to_replace` must be ``None``.
5536+
method : {'pad', 'ffill', 'bfill', `None`}
5537+
The method to use for replacement, when `to_replace` is a
5538+
scalar, list or tuple and `value` is ``None``.
55345539
5535-
.. versionchanged:: 0.23.0
5536-
Added to DataFrame
5540+
.. versionchanged:: 0.23.0
5541+
Added to DataFrame.
5542+
axis : None
5543+
.. deprecated:: 0.13.0
5544+
Has no effect and will be removed.
55375545
55385546
See Also
55395547
--------
5540-
%(klass)s.fillna : Fill NA/NaN values
5548+
%(klass)s.fillna : Fill NA values
55415549
%(klass)s.where : Replace values based on boolean condition
5550+
Series.str.replace : Simple string replacement.
55425551
55435552
Returns
55445553
-------
5545-
filled : %(klass)s
5554+
%(klass)s
5555+
Object after replacement.
55465556
55475557
Raises
55485558
------
55495559
AssertionError
5550-
* If ``regex`` is not a ``bool`` and ``to_replace`` is not
5560+
* If `regex` is not a ``bool`` and `to_replace` is not
55515561
``None``.
55525562
TypeError
5553-
* If ``to_replace`` is a ``dict`` and ``value`` is not a ``list``,
5563+
* If `to_replace` is a ``dict`` and `value` is not a ``list``,
55545564
``dict``, ``ndarray``, or ``Series``
5555-
* If ``to_replace`` is ``None`` and ``regex`` is not compilable
5565+
* If `to_replace` is ``None`` and `regex` is not compilable
55565566
into a regular expression or is a list, dict, ndarray, or
55575567
Series.
55585568
* When replacing multiple ``bool`` or ``datetime64`` objects and
5559-
the arguments to ``to_replace`` does not match the type of the
5569+
the arguments to `to_replace` do not match the type of the
55605570
value being replaced
55615571
ValueError
5562-
* If a ``list`` or an ``ndarray`` is passed to ``to_replace`` and
5572+
* If a ``list`` or an ``ndarray`` is passed to `to_replace` and
55635573
`value` but they are not the same length.
55645574
55655575
Notes
@@ -5573,10 +5583,15 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
55735583
numbers *are* strings, then you can do this.
55745584
* This method has *a lot* of options. You are encouraged to experiment
55755585
and play with this method to gain intuition about how it works.
5586+
* When a dict is used as the `to_replace` value, the
5587+
key(s) in the dict act as the `to_replace` part and the
5588+
value(s) in the dict act as the `value` parameter.
55765589
55775590
Examples
55785591
--------
55795592
5593+
**Scalar `to_replace` and `value`**
5594+
55805595
>>> s = pd.Series([0, 1, 2, 3, 4])
55815596
>>> s.replace(0, 5)
55825597
0 5
@@ -5585,6 +5600,7 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
55855600
3 3
55865601
4 4
55875602
dtype: int64
5603+
55885604
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
55895605
... 'B': [5, 6, 7, 8, 9],
55905606
... 'C': ['a', 'b', 'c', 'd', 'e']})
@@ -5596,20 +5612,24 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
55965612
3 3 8 d
55975613
4 4 9 e
55985614
5615+
**List-like `to_replace`**
5616+
55995617
>>> df.replace([0, 1, 2, 3], 4)
56005618
A B C
56015619
0 4 5 a
56025620
1 4 6 b
56035621
2 4 7 c
56045622
3 4 8 d
56055623
4 4 9 e
5624+
56065625
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
56075626
A B C
56085627
0 4 5 a
56095628
1 3 6 b
56105629
2 2 7 c
56115630
3 1 8 d
56125631
4 4 9 e
5632+
56135633
>>> s.replace([1, 2], method='bfill')
56145634
0 0
56155635
1 3
@@ -5618,20 +5638,24 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
56185638
4 4
56195639
dtype: int64
56205640
5641+
**dict-like `to_replace`**
5642+
56215643
>>> df.replace({0: 10, 1: 100})
56225644
A B C
56235645
0 10 5 a
56245646
1 100 6 b
56255647
2 2 7 c
56265648
3 3 8 d
56275649
4 4 9 e
5650+
56285651
>>> df.replace({'A': 0, 'B': 5}, 100)
56295652
A B C
56305653
0 100 100 a
56315654
1 1 6 b
56325655
2 2 7 c
56335656
3 3 8 d
56345657
4 4 9 e
5658+
56355659
>>> df.replace({'A': {0: 100, 4: 400}})
56365660
A B C
56375661
0 100 5 a
@@ -5640,45 +5664,87 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
56405664
3 3 8 d
56415665
4 400 9 e
56425666
5667+
**Regular expression `to_replace`**
5668+
56435669
>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
56445670
... 'B': ['abc', 'bar', 'xyz']})
56455671
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
56465672
A B
56475673
0 new abc
56485674
1 foo new
56495675
2 bait xyz
5676+
56505677
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
56515678
A B
56525679
0 new abc
56535680
1 foo bar
56545681
2 bait xyz
5682+
56555683
>>> df.replace(regex=r'^ba.$', value='new')
56565684
A B
56575685
0 new abc
56585686
1 foo new
56595687
2 bait xyz
5688+
56605689
>>> df.replace(regex={r'^ba.$':'new', 'foo':'xyz'})
56615690
A B
56625691
0 new abc
56635692
1 xyz new
56645693
2 bait xyz
5694+
56655695
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
56665696
A B
56675697
0 new abc
56685698
1 new new
56695699
2 bait xyz
56705700
56715701
Note that when replacing multiple ``bool`` or ``datetime64`` objects,
5672-
the data types in the ``to_replace`` parameter must match the data
5702+
the data types in the `to_replace` parameter must match the data
56735703
type of the value being replaced:
56745704
56755705
>>> df = pd.DataFrame({'A': [True, False, True],
56765706
... 'B': [False, True, False]})
56775707
>>> df.replace({'a string': 'new value', True: False}) # raises
5708+
Traceback (most recent call last):
5709+
...
56785710
TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'
56795711
56805712
This raises a ``TypeError`` because one of the ``dict`` keys is not of
56815713
the correct type for replacement.
5714+
5715+
Compare the behavior of ``s.replace({'a': None})`` and
5716+
``s.replace('a', None)`` to understand the peculiarities
5717+
of the `to_replace` parameter:
5718+
5719+
>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])
5720+
5721+
When a dict is used as the `to_replace` value, the value(s)
5722+
in the dict act as the `value` parameter.
5723+
``s.replace({'a': None})`` is equivalent to
5724+
``s.replace(to_replace={'a': None}, value=None, method=None)``:
5725+
5726+
>>> s.replace({'a': None})
5727+
0 10
5728+
1 None
5729+
2 None
5730+
3 b
5731+
4 None
5732+
dtype: object
5733+
5734+
When ``value=None`` and `to_replace` is a scalar, list or
5735+
tuple, `replace` uses the `method` parameter (default 'pad') to do the
5736+
replacement. This is why the 'a' values are replaced by 10
5737+
in rows 1 and 2 and by 'b' in row 4.
5738+
The command ``s.replace('a', None)`` is actually equivalent to
5739+
``s.replace(to_replace='a', value=None, method='pad')``:
5740+
5741+
>>> s.replace('a', None)
5742+
0 10
5743+
1 10
5744+
2 10
5745+
3 b
5746+
4 b
5747+
dtype: object
56825748
""")
56835749

56845750
@Appender(_shared_docs['replace'] % _shared_doc_kwargs)
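The scalar, list and dict forms of `to_replace` in the docstring above all come down to building an old-to-new mapping. Below is a minimal sketch on a plain Python list, assuming hashable cells. `replace_values` is a hypothetical helper written for illustration. It is not pandas' implementation, and it ignores dtypes, regexes and the `method` fill.

```python
def replace_values(values, to_replace, value=None):
    """Hypothetical model of the scalar/list/dict rules of ``replace``.

    Works on a plain list of hashable cells; it is not pandas code and
    ignores dtypes, regexes and the ``method`` fill.
    """
    if isinstance(to_replace, dict):
        # dict form: keys are the values to find, dict values are the
        # replacements (the docstring says ``value`` should be None here)
        mapping = dict(to_replace)
    elif isinstance(to_replace, (list, tuple)):
        if isinstance(value, (list, tuple)):
            # two lists are paired element by element and must match in length
            if len(to_replace) != len(value):
                raise ValueError("to_replace and value must be the same length")
            mapping = dict(zip(to_replace, value))
        else:
            # one list and one scalar: every listed value gets the same scalar
            mapping = {old: value for old in to_replace}
    else:
        # scalar form: cells equal to ``to_replace`` become ``value``
        mapping = {to_replace: value}
    # each cell is looked up once, so one replacement never feeds another
    return [mapping.get(cell, cell) for cell in values]
```

For example, `replace_values([0, 1, 2, 3, 4], [0, 1, 2, 3], [4, 3, 2, 1])` returns `[4, 3, 2, 1, 4]`. This matches column A in the corresponding docstring example.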

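The regular-expression examples rewrite only the string cells that match a pattern. The sketch below models that behavior with these assumptions:

- Each string cell is passed through `re.sub` once per pattern, in order.
- Non-string cells are left alone.

`regex_replace` is a hypothetical helper for illustration, not the pandas code path.

```python
import re


def regex_replace(cells, patterns, value=None):
    """Hypothetical sketch of regex-based replacement on a list of cells.

    ``patterns`` may be a single pattern, a list of patterns (all mapped
    to ``value``) or a dict {pattern: replacement}. String cells are
    rewritten with ``re.sub`` for each pattern in turn; other cells are
    returned unchanged.
    """
    if isinstance(patterns, dict):
        pairs = list(patterns.items())
    elif isinstance(patterns, (list, tuple)):
        pairs = [(p, value) for p in patterns]
    else:
        pairs = [(patterns, value)]
    compiled = [(re.compile(p), repl) for p, repl in pairs]

    out = []
    for cell in cells:
        if isinstance(cell, str):
            for rx, repl in compiled:
                cell = rx.sub(repl, cell)
        out.append(cell)
    return out
```

With the docstring's column A, `regex_replace(['bat', 'foo', 'bait'], r'^ba.$', 'new')` gives `['new', 'foo', 'bait']`. `'bait'` has four characters, so `^ba.$` does not match it.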

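The closing example turns on the rule this commit documents for `value=None`:

- With a scalar, list or tuple `to_replace`, matches are filled from a neighboring entry according to `method`.
- With a dict `to_replace`, the dict supplies its own replacements, `None` included.

The sketch below models only that documented rule on a plain list. `pad_replace` is a hypothetical helper. When there is nothing to copy from, the sketch leaves leading matches unchanged; that is its own assumption. Newer pandas releases may treat an explicit `value=None` differently, so check the documentation for the version in use.

```python
_MISSING = object()  # sentinel: no fill value seen yet


def pad_replace(values, to_replace, value=None, method="pad"):
    """Hypothetical model of the ``value=None`` rule documented in this commit.

    * dict ``to_replace``: the dict's values are the replacements.
    * scalar/list ``to_replace`` with a non-None ``value``: plain replacement.
    * scalar/list ``to_replace`` with ``value=None``: each match is filled
      from the nearest kept entry before it ('pad'/'ffill') or after it
      ('bfill').
    """
    if isinstance(to_replace, dict):
        return [to_replace.get(v, v) for v in values]

    if isinstance(to_replace, (list, tuple)):
        targets = list(to_replace)
    else:
        targets = [to_replace]
    if value is not None:
        return [value if v in targets else v for v in values]

    # walk backwards for 'bfill' so the same forward loop handles both
    backward = method == "bfill"
    seq = list(reversed(values)) if backward else list(values)
    out, last = [], _MISSING
    for v in seq:
        if v in targets:
            # nothing to copy from yet: keep the original (sketch assumption)
            out.append(v if last is _MISSING else last)
        else:
            out.append(v)
            last = v
    return out[::-1] if backward else out
```

Here, `pad_replace([10, 'a', 'a', 'b', 'a'], 'a')` returns `[10, 10, 10, 'b', 'b']`. `pad_replace([10, 'a', 'a', 'b', 'a'], {'a': None})` returns `[10, None, None, 'b', None]`. These match the two outputs shown in the docstring.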