DOC: update the pandas.DataFrame.replace docstring #20271

math-and-data · 2018-03-11T03:15:21Z

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Note: Just did a minor improvement, not a full change!

Still a few verification errors:

Errors in parameters section
- Parameter "to_replace" description should start with capital letter
- Parameter "axis" description should finish with "."
Examples do not pass tests

################################################################################
##################### Docstring (pandas.DataFrame.replace) #####################
################################################################################

Replace values given in 'to_replace' with 'value'.

Values of the DataFrame or a Series are being replaced with
other values. One or several values can be replaced with one
or several values.

Parameters
----------
to_replace : str, regex, list, dict, Series, numeric, or None

    * numeric, str or regex:

        - numeric: numeric values equal to ``to_replace`` will be
          replaced with ``value``
        - str: string exactly matching ``to_replace`` will be replaced
          with ``value``
        - regex: values matching the regex ``to_replace`` will be
          replaced with ``value``

    * list of str, regex, or numeric:

        - First, if ``to_replace`` and ``value`` are both lists, they
          **must** be the same length.
        - Second, if ``regex=True`` then all of the strings in **both**
          lists will be interpreted as regexes; otherwise they will match
          directly. This doesn't matter much for ``value`` since there
          are only a few possible substitution regexes you can use.
        - str, regex and numeric rules apply as above.

    * dict:

        - Dicts can be used to specify different replacement values
          for different existing values. For example,
          {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and
          'y' with 'z'. To use a dict in this way the ``value``
          parameter should be ``None``.
        - For a DataFrame a dict can specify that different values
          should be replaced in different columns. For example,
          {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and
          the value 'z' in column 'b' and replaces these values with
          whatever is specified in ``value``. The ``value`` parameter
          should not be ``None`` in this case. You can treat this as a
          special case of passing two lists except that you are
          specifying the column to search in.
        - For a DataFrame nested dictionaries, e.g.,
          {'a': {'b': np.nan}}, are read as follows: look in column 'a'
          for the value 'b' and replace it with NaN. The ``value``
          parameter should be ``None`` to use a nested dict in this
          way. You can nest regular expressions as well. Note that
          column names (the top-level dictionary keys in a nested
          dictionary) **cannot** be regular expressions.

    * None:

        - This means that the ``regex`` argument must be a string,
          compiled regular expression, or list, dict, ndarray or Series
          of such elements. If ``value`` is also ``None`` then this
          **must** be a nested dictionary or ``Series``.

    See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
    Value to replace any values matching ``to_replace`` with.
    For a DataFrame a dict of values can be used to specify which
    value to use for each column (columns not in the dict will not be
    filled). Regular expressions, strings and lists or dicts of such
    objects are also allowed.
inplace : boolean, default False
    If True, perform the replacement in place. Note: this will modify
    any other views on this object (e.g. a column from a DataFrame).
    Returns ``None`` if this is True.
limit : int, default None
    Maximum size gap to forward or backward fill.
regex : bool or same types as ``to_replace``, default False
    Whether to interpret ``to_replace`` and/or ``value`` as regular
    expressions. If this is ``True`` then ``to_replace`` *must* be a
    string. Alternatively, this could be a regular expression or a
    list, dict, or array of regular expressions in which case
    ``to_replace`` must be ``None``.
method : string, optional, {'pad', 'ffill', 'bfill'}, default is 'pad'
    The method to use for replacement when ``to_replace`` is a
    scalar, list or tuple and ``value`` is None.
axis : None
    Deprecated.

    .. versionchanged:: 0.23.0
        Added to DataFrame

See Also
--------
DataFrame.fillna : Fill NA/NaN values
DataFrame.where : Replace values based on boolean condition

Returns
-------
DataFrame
    Some values have been substituted for new values.

Raises
------
AssertionError
    * If ``regex`` is not a ``bool`` and ``to_replace`` is not
      ``None``.
TypeError
    * If ``to_replace`` is a ``dict`` and ``value`` is not a ``list``,
      ``dict``, ``ndarray``, or ``Series``
    * If ``to_replace`` is ``None`` and ``regex`` is not compilable
      into a regular expression or is a list, dict, ndarray, or
      Series.
    * When replacing multiple ``bool`` or ``datetime64`` objects and
      the arguments to ``to_replace`` do not match the type of the
      value being replaced.
ValueError
    * If a ``list`` or an ``ndarray`` is passed to ``to_replace`` and
      ``value`` but they are not the same length.

Notes
-----
* Regex substitution is performed under the hood with ``re.sub``. The
  rules for substitution for ``re.sub`` are the same.
* Regular expressions will only substitute on strings, meaning you
  cannot provide, for example, a regular expression matching floating
  point numbers and expect the columns in your frame that have a
  numeric dtype to be matched. However, if those floating point
  numbers *are* strings, then you can do this.
* This method has *a lot* of options. You are encouraged to experiment
  and play with this method to gain intuition about how it works.

Examples
--------

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
>>> s.replace([1, 2], method='bfill')
0    0
1    3
2    3
3    3
4    4
dtype: int64

>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$':'new', 'foo':'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz

Note that when replacing multiple ``bool`` or ``datetime64`` objects,
the data types in the ``to_replace`` parameter must match the data
type of the value being replaced:

>>> df = pd.DataFrame({'A': [True, False, True],
...                    'B': [False, True, False]})
>>> df.replace({'a string': 'new value', True: False})  # raises
TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

This raises a ``TypeError`` because one of the ``dict`` keys is not of
the correct type for replacement.

Compare the behavior of
``s.replace('a', None)`` and ``s.replace({'a': None})`` to understand
the peculiarities of the ``to_replace`` parameter.
``s.replace('a', None)`` is actually equivalent to
``s.replace(to_replace='a', value=None, method='pad')``,
because when ``value=None`` and ``to_replace`` is a scalar, list or
tuple, ``replace`` uses the method parameter to do the replacement.
So this is why the 'a' values are being replaced by 30 in rows 3 and 4
and 'b' in row 6 in this case. However, this behaviour does not occur
when you use a dict as the ``to_replace`` value. In this case, the
value(s) in the dict act as the ``value`` parameter.

>>> s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])
>>> print(s)
0    10
1    20
2    30
3     a
4     a
5     b
6     a
dtype: object
>>> print(s.replace('a', None))
0    10
1    20
2    30
3    30
4    30
5     b
6     b
dtype: object
>>> print(s.replace({'a': None}))
0      10
1      20
2      30
3    None
4    None
5       b
6    None
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Errors in parameters section
                Parameter "to_replace" description should start with capital letter
                Parameter "axis" description should finish with "."
        Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 229, in pandas.DataFrame.replace
Failed example:
    df.replace({'a string': 'new value', True: False})  # raises
Exception raised:
    Traceback (most recent call last):
      File "C:\Users\thisi\AppData\Local\conda\conda\envs\pandas_dev\lib\doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest pandas.DataFrame.replace[17]>", line 1, in <module>
        df.replace({'a string': 'new value', True: False})  # raises
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\frame.py", line 3136, in replace
        method=method, axis=axis)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\generic.py", line 5208, in replace
        limit=limit, regex=regex)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\frame.py", line 3136, in replace
        method=method, axis=axis)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\generic.py", line 5257, in replace
        regex=regex)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\internals.py", line 3696, in replace_list
        masks = [comp(s) for i, s in enumerate(src_list)]
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\internals.py", line 3696, in <listcomp>
        masks = [comp(s) for i, s in enumerate(src_list)]
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\internals.py", line 3694, in comp
        return _maybe_compare(values, getattr(s, 'asm8', s), operator.eq)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\internals.py", line 5122, in _maybe_compare
        b=type_names[1]))
    TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'
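The example that fails is the one meant to raise. Python's `doctest` treats expected output as an exception only when that output starts with a `Traceback (most recent call last):` header. The stack frames can be elided with an indented `...`, and only the final `TypeError: ...` line is compared. The minimal sketch below checks both forms with the standard-library `doctest` module. `check_keys` is a hypothetical stand-in that raises the same message, so the check does not depend on the installed pandas version.

```python
import doctest


def check_keys(keys):
    """Hypothetical stand-in for the failing call: raise the same
    TypeError when a str key is mixed in, so the demo needs no pandas."""
    if any(isinstance(k, str) for k in keys):
        raise TypeError("Cannot compare types 'ndarray(dtype=bool)' and 'str'")
    return True


# As written in the PR: the exception line looks like ordinary output,
# so doctest treats the raised TypeError as an unexpected error.
BROKEN = """
>>> check_keys(['a string', True])  # raises
TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'
"""

# The form doctest recognizes: a Traceback header, an indented '...'
# standing in for the stack frames, then the exception type and message.
FIXED = """
>>> check_keys(['a string', True])  # raises
Traceback (most recent call last):
    ...
TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'
"""


def count_failures(source):
    """Run the examples in ``source`` and return how many failed."""
    test = doctest.DocTestParser().get_doctest(
        source, {'check_keys': check_keys}, 'replace_example', None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the failure report; only the counts matter here.
    return runner.run(test, out=lambda text: None).failed


print(count_failures(BROKEN))  # 1
print(count_failures(FIXED))   # 0
```

For the docstring, this means adding `Traceback (most recent call last):` and an indented `...` between the `# raises` line and the `TypeError` line.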

WillAyd · 2018-03-11T04:02:36Z

pandas/core/generic.py


-        .. versionchanged:: 0.23.0
-           Added to DataFrame
+            .. versionchanged:: 0.23.0


This isn't what you want to do - make sure you keep the versionchanged directive below the method argument as that's what was added in v0.23

WillAyd · 2018-03-11T04:06:52Z

pandas/core/generic.py

        regex : bool or same types as ``to_replace``, default False
            Whether to interpret ``to_replace`` and/or ``value`` as regular
            expressions. If this is ``True`` then ``to_replace`` *must* be a
            string. Alternatively, this could be a regular expression or a
            list, dict, or array of regular expressions in which case
            ``to_replace`` must be ``None``.
-        method : string, optional, {'pad', 'ffill', 'bfill'}
+        method : string, optional, {'pad', 'ffill', 'bfill'}, default is 'pad'


method : {'pad', 'ffill', 'bfill', `None`}

WillAyd · 2018-03-11T04:08:21Z

pandas/core/generic.py

            The method to use when for replacement, when ``to_replace`` is a
            scalar, list or tuple and ``value`` is None.
+        axis : None
+            Deprecated.


Warning says this will be removed in v0.13? Woof...I guess OK to document for this change but should have a follow up change to actually go ahead and remove - care to take a stab at that?

@WillAyd where is this warning?

Just a couple of lines into the function definition

https://github.com/math-and-data/pandas/blob/44b7de3a5b7541bb2f4fee8a0f4e8f527b4c1b29/pandas/core/generic.py#L5152

I'm happy to take a stab at this - always nice when I can remove code too

@math-and-data awesome thanks! Can you open a separate issue for this?

@WillAyd I was waiting for this PR to be approved, then I would open a new request where I change the relevant code (remove the 'axis' reference) and edit the documentation accordingly. Is there anything else I had missed in this PR (other than the suggestion of breaking out the DataFrame and Series examples)?

WillAyd · 2018-03-11T04:11:02Z

pandas/core/generic.py

@@ -4869,6 +4869,10 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
    _shared_docs['replace'] = ("""
        Replace values given in 'to_replace' with 'value'.

+        Values of the DataFrame or a Series are being replaced with


Not sure this extended description is adding much. Better served to make mention of how this can replace values with a dynamic set of inputs like dicts

done, thank you for the suggestion
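The sketch below gives a rough pure-Python picture of the "dynamic set of inputs" idea. It shows how the scalar, list and flat-dict forms of `to_replace` each reduce to a mapping from old values to new ones. `replace_values` is a hypothetical helper, not pandas code. It deliberately leaves out regexes, per-column dicts, and the fill path that pandas takes when `value` is `None`.

```python
def replace_values(values, to_replace, value=None):
    """Hypothetical sketch of the scalar, list and flat-dict forms of
    ``to_replace``: build an old -> new mapping, then apply it."""
    if isinstance(to_replace, dict):
        if value is not None:
            raise NotImplementedError("per-column dicts are not sketched")
        # Flat dict: keys are the values to find, dict values the replacements.
        mapping = dict(to_replace)
    elif value is None:
        # pandas switches to the fill ``method`` here; not sketched.
        raise NotImplementedError("value=None fill path is not sketched")
    elif isinstance(to_replace, (list, tuple)):
        if isinstance(value, (list, tuple)):
            # Two lists are paired element by element.
            if len(to_replace) != len(value):
                raise ValueError("to_replace and value must be the same length")
            mapping = dict(zip(to_replace, value))
        else:
            # One list, one scalar: every listed value becomes ``value``.
            mapping = {old: value for old in to_replace}
    else:
        # Scalar: one old value, one replacement.
        mapping = {to_replace: value}
    # Elements without an entry in the mapping are kept unchanged.
    return [mapping.get(v, v) for v in values]


col_a = [0, 1, 2, 3, 4]
print(replace_values(col_a, 0, 5))                        # [5, 1, 2, 3, 4]
print(replace_values(col_a, [0, 1, 2, 3], 4))             # [4, 4, 4, 4, 4]
print(replace_values(col_a, [0, 1, 2, 3], [4, 3, 2, 1]))  # [4, 3, 2, 1, 4]
print(replace_values(col_a, {0: 10, 1: 100}))             # [10, 100, 2, 3, 4]
```

The outputs match column ``A`` in the docstring examples. The length check mirrors the ``ValueError`` listed under Raises. The wording of the real pandas error is not reproduced here.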

WillAyd · 2018-03-11T04:12:40Z

pandas/core/generic.py

+        Values of the DataFrame or a Series are being replaced with
+        other values. One or several values can be replaced with one
+        or several values.
+
        Parameters
        ----------
        to_replace : str, regex, list, dict, Series, numeric, or None


Say int, float instead of numeric (if float is even valid?)

WillAyd · 2018-03-11T04:21:06Z

pandas/core/generic.py

+        the pecularities of the ``to_replace`` parameter.
+        ``s.replace('a', None)`` is actually equivalent to
+        ``s.replace(to_replace='a', value=None, method='pad')``,
+        because when ``value=None`` and ``to_replace`` is a scalar, list or


This is interesting as I was not aware of this behavior. Certainly great to have it documented, though I would move the majority of the writing into the Notes section and shorten the blurb introducing the comparison here.

WillAyd · 2018-03-11T04:21:37Z

pandas/core/generic.py

+        ``s.replace(to_replace='a', value=None, method='pad')``,
+        because when ``value=None`` and ``to_replace`` is a scalar, list or
+        tuple, ``replace`` uses the method parameter to do the replacement.
+        So this is why the 'a' values are being replaced by 30 in rows 3 and 4


Maybe just reinforce that it's the fill behavior that is really replacing values here
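Here is a minimal sketch of that fill behavior, assuming the semantics described in this PR. Matches of `to_replace` are treated as holes and filled from the nearest non-matching neighbor: 'pad'/'ffill' looks backward and 'bfill' looks forward. `limit` caps how many consecutive holes get filled. `fill_replace` is a hypothetical helper, not the pandas implementation.

```python
def fill_replace(values, to_replace, method='pad', limit=None):
    """Hypothetical sketch of ``replace(to_replace, value=None, method=...)``.

    Every element equal to ``to_replace`` (or to any element of it, when it
    is a list or tuple) is overwritten with the nearest non-matching value:
    the previous one for 'pad'/'ffill', the next one for 'bfill'.
    """
    if method not in ('pad', 'ffill', 'bfill'):
        raise ValueError(f"unsupported method: {method!r}")
    if isinstance(to_replace, (list, tuple)):
        targets = list(to_replace)
    else:
        targets = [to_replace]

    # 'bfill' is 'pad' run over the reversed sequence.
    seq = list(values)
    if method == 'bfill':
        seq.reverse()

    out = []
    carry = None        # last non-matching value seen so far
    have_carry = False  # leading matches have nothing to copy
    filled = 0          # consecutive matches filled since the last carry
    for v in seq:
        if v in targets:
            if have_carry and (limit is None or filled < limit):
                out.append(carry)
                filled += 1
            else:
                # No earlier value, or the gap is longer than ``limit``:
                # leave the element unchanged.
                out.append(v)
        else:
            out.append(v)
            carry, have_carry, filled = v, True, 0

    if method == 'bfill':
        out.reverse()
    return out


s = [10, 'a', 'a', 'b', 'a']
print(fill_replace(s, 'a'))           # [10, 10, 10, 'b', 'b']
print(fill_replace(s, 'a', limit=1))  # [10, 10, 'a', 'b', 'b']
print(fill_replace([0, 1, 2, 3, 4], [1, 2], method='bfill'))  # [0, 3, 3, 3, 4]
```

The first and last results line up with the `s.replace('a', None)` and `s.replace([1, 2], method='bfill')` outputs in the docstring. A dict-based replacement puts `None` in those positions instead, because no fill method is involved.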

WillAyd · 2018-03-11T04:22:29Z

pandas/core/generic.py

+        like the value(s) in the dict are equal to the value parameter.
+
+        >>> s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])
+        >>> print(s)


This Series is simple enough where you don't need to explicitly print it - the constructor shows you everything of interest

I personally have found the visual of inspecting the changes before/after easier for such replacements (both in vertical positions). You have more experience and I'll rely on your suggestion and make the change.

WillAyd · 2018-03-11T04:23:09Z

pandas/core/generic.py

+        5     b
+        6     b
+        dtype: object
+        >>> print(s.replace({'a': None}))


I would put this example first as it is (from my perspective) the behavior most would expect. Having it first makes it a better segue into the nuance that you want to describe with the other example

WillAyd · 2018-03-11T04:24:07Z

pandas/core/generic.py

+        when you use a dict as the ``to_replace`` value. In this case, it is
+        like the value(s) in the dict are equal to the value parameter.
+
+        >>> s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])


Just to keep things concise why don't you get rid of 10 and 20 in this example? They don't serve any real purpose but make the documentation longer. Can also replace 30 with 1

great suggestion of simplifying.

jreback · 2018-03-11T14:09:23Z

pandas/core/generic.py

            The method to use when for replacement, when ``to_replace`` is a
            scalar, list or tuple and ``value`` is None.
+        axis : None
+            Deprecated.


@WillAyd where is this warning?

jreback · 2018-03-11T14:10:32Z

pandas/core/generic.py

+        5     b
+        6     a
+        dtype: object
+        >>> print(s.replace('a', None))


you don't need the prints, use a blank line between cases. Having an expl for each case is also nice.

math-and-data · 2018-03-14T22:44:16Z

Docstring validation not passing

################################################################################
##################### Docstring (pandas.DataFrame.replace) #####################
################################################################################

Replace values given in 'to_replace' with 'value'.

Values of the DataFrame or a Series are being replaced with
other values in a dynamic way. Instead of replacing values in a
specific cell (row/column combination), this method allows for more
flexibility with replacements. For instance, values can be replaced
by specifying lists of values and replacements separately or
with a dynamic set of inputs like dicts.

Parameters
----------
to_replace : str, regex, list, dict, Series, int, float, or None
    * numeric, str or regex:

        - numeric: numeric values equal to ``to_replace`` will be
          replaced with ``value``
        - str: string exactly matching ``to_replace`` will be replaced
          with ``value``
        - regex: values matching the regex ``to_replace`` will be
          replaced with ``value``

    * list of str, regex, or numeric:

        - First, if ``to_replace`` and ``value`` are both lists, they
          **must** be the same length.
        - Second, if ``regex=True`` then all of the strings in **both**
          lists will be interpreted as regexes; otherwise they will match
          directly. This doesn't matter much for ``value`` since there
          are only a few possible substitution regexes you can use.
        - str, regex and numeric rules apply as above.

    * dict:

        - Dicts can be used to specify different replacement values
          for different existing values. For example,
          {'a': 'b', 'y': 'z'} replaces the value 'a' with 'b' and
          'y' with 'z'. To use a dict in this way the ``value``
          parameter should be ``None``.
        - For a DataFrame a dict can specify that different values
          should be replaced in different columns. For example,
          {'a': 1, 'b': 'z'} looks for the value 1 in column 'a' and
          the value 'z' in column 'b' and replaces these values with
          whatever is specified in ``value``. The ``value`` parameter
          should not be ``None`` in this case. You can treat this as a
          special case of passing two lists except that you are
          specifying the column to search in.
        - For a DataFrame nested dictionaries, e.g.,
          {'a': {'b': np.nan}}, are read as follows: look in column
          'a' for the value 'b' and replace it with NaN. The ``value``
          parameter should be ``None`` to use a nested dict in this
          way. You can nest regular expressions as well. Note that
          column names (the top-level dictionary keys in a nested
          dictionary) **cannot** be regular expressions.

    * None:

        - This means that the ``regex`` argument must be a string,
          compiled regular expression, or list, dict, ndarray or
          Series of such elements. If ``value`` is also ``None`` then
          this **must** be a nested dictionary or ``Series``.

    See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
    Value to replace any values matching ``to_replace`` with.
    For a DataFrame a dict of values can be used to specify which
    value to use for each column (columns not in the dict will not be
    filled). Regular expressions, strings and lists or dicts of such
    objects are also allowed.
inplace : boolean, default False
    If True, perform the replacement in place. Note: this will modify
    any other views on this object (e.g. a column from a DataFrame).
    Returns ``None`` if this is True.
limit : int, default None
    Maximum size gap to forward or backward fill.
regex : bool or same types as ``to_replace``, default False
    Whether to interpret ``to_replace`` and/or ``value`` as regular
    expressions. If this is ``True`` then ``to_replace`` *must* be a
    string. Alternatively, this could be a regular expression or a
    list, dict, or array of regular expressions in which case
    ``to_replace`` must be ``None``.
method : {'pad', 'ffill', 'bfill', `None`}
    The method to use for replacement when ``to_replace`` is a
    scalar, list or tuple and ``value`` is ``None``.

    .. versionchanged:: 0.23.0
        Added to DataFrame.
axis : None
    Deprecated.

See Also
--------
DataFrame.fillna : Fill `NaN` values
DataFrame.where : Replace values based on boolean condition

Returns
-------
DataFrame
    Object after replacement.

Raises
------
AssertionError
    * If ``regex`` is not a ``bool`` and ``to_replace`` is not
      ``None``.
TypeError
    * If ``to_replace`` is a ``dict`` and ``value`` is not a ``list``,
      ``dict``, ``ndarray``, or ``Series``
    * If ``to_replace`` is ``None`` and ``regex`` is not compilable
      into a regular expression or is a list, dict, ndarray, or
      Series.
    * When replacing multiple ``bool`` or ``datetime64`` objects and
      the arguments to ``to_replace`` do not match the type of the
      value being replaced.
ValueError
    * If a ``list`` or an ``ndarray`` is passed to ``to_replace`` and
      ``value`` but they are not the same length.

Notes
-----
* Regex substitution is performed under the hood with ``re.sub``. The
  rules for substitution for ``re.sub`` are the same.
* Regular expressions will only substitute on strings, meaning you
  cannot provide, for example, a regular expression matching floating
  point numbers and expect the columns in your frame that have a
  numeric dtype to be matched. However, if those floating point
  numbers *are* strings, then you can do this.
* This method has *a lot* of options. You are encouraged to experiment
  and play with this method to gain intuition about how it works.
* When a dict is used as the ``to_replace`` value, its keys act as
  ``to_replace`` and its values act as the ``value`` parameter.

Examples
--------

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64
>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e
>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
>>> s.replace([1, 2], method='bfill')
0    0
1    3
2    3
3    3
4    4
dtype: int64

>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz
>>> df.replace(regex={r'^ba.$':'new', 'foo':'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz
>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz

Note that when replacing multiple ``bool`` or ``datetime64`` objects,
the data types in the ``to_replace`` parameter must match the data
type of the value being replaced:

>>> df = pd.DataFrame({'A': [True, False, True],
...                    'B': [False, True, False]})
>>> df.replace({'a string': 'new value', True: False})  # raises
TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

This raises a ``TypeError`` because one of the ``dict`` keys is not of
the correct type for replacement.

Compare the behavior of ``s.replace({'a': None})`` and
``s.replace('a', None)`` to understand the peculiarities
of the ``to_replace`` parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When one uses a dict as the ``to_replace`` value, the value(s) in
the dict act as the ``value`` parameter.
``s.replace({'a': None})`` is equivalent to
``s.replace(to_replace={'a': None}, value=None, method=None)``:

>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When ``value=None`` and ``to_replace`` is a scalar, list or
tuple, ``replace`` uses the method parameter (default 'pad') to do the
replacement. This fill behavior is why the 'a' values are replaced by 10
in rows 1 and 2 and by 'b' in row 4.
The command ``s.replace('a', None)`` is actually equivalent to
``s.replace(to_replace='a', value=None, method='pad')``:

>>> s.replace('a', None)
0    10
1    10
2    10
3     b
4     b
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Errors in parameters section
                Parameter "to_replace" description should start with capital letter
        Examples do not pass tests

################################################################################
################################### Doctests ###################################
################################################################################

**********************************************************************
Line 233, in pandas.DataFrame.replace
Failed example:
    df.replace({'a string': 'new value', True: False})  # raises
Exception raised:
    Traceback (most recent call last):
      File "C:\Users\thisi\AppData\Local\conda\conda\envs\pandas_dev\lib\doctest.py", line 1330, in __run
        compileflags, 1), test.globs)
      File "<doctest pandas.DataFrame.replace[17]>", line 1, in <module>
        df.replace({'a string': 'new value', True: False})  # raises
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\frame.py", line 3136, in replace
        method=method, axis=axis)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\generic.py", line 5205, in replace
        limit=limit, regex=regex)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\frame.py", line 3136, in replace
        method=method, axis=axis)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\generic.py", line 5254, in replace
        regex=regex)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\internals.py", line 3696, in replace_list
        masks = [comp(s) for i, s in enumerate(src_list)]
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\internals.py", line 3696, in <listcomp>
        masks = [comp(s) for i, s in enumerate(src_list)]
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\internals.py", line 3694, in comp
        return _maybe_compare(values, getattr(s, 'asm8', s), operator.eq)
      File "C:\Users\thisi\Documents\GitHub\pandas\pandas\core\internals.py", line 5122, in _maybe_compare
        b=type_names[1]))
    TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

Section headers. Consistent quoting. Formatting. Traceback.

codecov · 2018-03-15T14:54:58Z

Codecov Report

Merging #20271 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20271      +/-   ##
==========================================
+ Coverage   91.82%   91.84%   +0.02%     
==========================================
  Files         152      153       +1     
  Lines       49248    49305      +57     
==========================================
+ Hits        45222    45286      +64     
+ Misses       4026     4019       -7

Flag	Coverage Δ
#multiple	`90.24% <100%> (+0.02%)`	⬆️
#single	`41.89% <53.84%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/generic.py	`95.94% <100%> (+0.08%)`	⬆️
pandas/io/clipboard/clipboards.py	`30.58% <0%> (-1.6%)`	⬇️
pandas/core/config_init.py	`99.24% <0%> (-0.76%)`	⬇️
pandas/core/arrays/categorical.py	`95.78% <0%> (-0.41%)`	⬇️
pandas/core/nanops.py	`96.3% <0%> (-0.4%)`	⬇️
pandas/util/_decorators.py	`82.25% <0%> (-0.15%)`	⬇️
pandas/plotting/_core.py	`82.39% <0%> (-0.12%)`	⬇️
pandas/io/pytables.py	`92.41% <0%> (-0.05%)`	⬇️
pandas/core/frame.py	`97.16% <0%> (-0.02%)`	⬇️
pandas/tseries/offsets.py	`97% <0%> (-0.01%)`	⬇️
... and 27 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update cdfce2b...58f6531.

TomAugspurger · 2018-03-15T14:56:44Z

Updated

################################################################################
##################### Docstring (pandas.DataFrame.replace) #####################
################################################################################

Replace values given in `to_replace` with `value`.

Values of the DataFrame are replaced with other values dynamically.
This differs from updating with ``.loc`` or ``.iloc``, which require
you to specify a location to update with some value.

Parameters
----------
to_replace : str, regex, list, dict, Series, int, float, or None
    How to find the values that will be replaced.

    * numeric, str or regex:

        - numeric: numeric values equal to `to_replace` will be
          replaced with `value`
        - str: string exactly matching `to_replace` will be replaced
          with `value`
        - regex: values matching the regex `to_replace` will be
          replaced with `value`

    * list of str, regex, or numeric:

        - First, if `to_replace` and `value` are both lists, they
          **must** be the same length.
        - Second, if ``regex=True`` then all of the strings in **both**
          lists will be interpreted as regexes; otherwise they will match
          directly. This doesn't matter much for `value` since there
          are only a few possible substitution regexes you can use.
        - str, regex and numeric rules apply as above.

    * dict:

        - Dicts can be used to specify different replacement values
          for different existing values. For example,
          ``{'a': 'b', 'y': 'z'}`` replaces the value 'a' with 'b' and
          'y' with 'z'. To use a dict in this way the `value`
          parameter should be `None`.
        - For a DataFrame a dict can specify that different values
          should be replaced in different columns. For example,
          ``{'a': 1, 'b': 'z'}`` looks for the value 1 in column 'a'
          and the value 'z' in column 'b' and replaces these values
          with whatever is specified in `value`. The `value` parameter
          should not be ``None`` in this case. You can treat this as a
          special case of passing two lists except that you are
          specifying the column to search in.
        - For a DataFrame nested dictionaries, e.g.,
          ``{'a': {'b': np.nan}}``, are read as follows: look in column
          'a' for the value 'b' and replace it with NaN. The `value`
          parameter should be ``None`` to use a nested dict in this
          way. You can nest regular expressions as well. Note that
          column names (the top-level dictionary keys in a nested
          dictionary) **cannot** be regular expressions.

    * None:

        - This means that the `regex` argument must be a string,
          compiled regular expression, or list, dict, ndarray or
          Series of such elements. If `value` is also ``None`` then
          this **must** be a nested dictionary or Series.

    See the examples section for examples of each of these.
value : scalar, dict, list, str, regex, default None
    Value to replace any values matching `to_replace` with.
    For a DataFrame a dict of values can be used to specify which
    value to use for each column (columns not in the dict will not be
    filled). Regular expressions, strings and lists or dicts of such
    objects are also allowed.
inplace : boolean, default False
    If True, perform the operation in place and return None. Note: this
    will modify any other views on this object (e.g. a column from a
    DataFrame).
limit : int, default None
    Maximum size gap to forward or backward fill.
regex : bool or same types as `to_replace`, default False
    Whether to interpret `to_replace` and/or `value` as regular
    expressions. If this is ``True`` then `to_replace` *must* be a
    string. Alternatively, this could be a regular expression or a
    list, dict, or array of regular expressions in which case
    `to_replace` must be ``None``.
method : {'pad', 'ffill', 'bfill', `None`}
    The method to use for replacement when `to_replace` is a
    scalar, list or tuple and `value` is ``None``.

    .. versionchanged:: 0.23.0
        Added to DataFrame.
axis : None
    Deprecated.

See Also
--------
DataFrame.fillna : Fill NA values.
DataFrame.where : Replace values based on boolean condition.
Series.str.replace : Simple string replacement.

Returns
-------
DataFrame
    Object after replacement.

Raises
------
AssertionError
    * If `regex` is not a ``bool`` and `to_replace` is not
      ``None``.
TypeError
    * If `to_replace` is a ``dict`` and `value` is not a ``list``,
      ``dict``, ``ndarray``, or ``Series``.
    * If `to_replace` is ``None`` and `regex` is not compilable
      into a regular expression or is a list, dict, ndarray, or
      Series.
    * When replacing multiple ``bool`` or ``datetime64`` objects and
      the arguments to `to_replace` do not match the type of the
      value being replaced.
ValueError
    * If a ``list`` or an ``ndarray`` is passed to `to_replace` and
      `value` but they are not the same length.

Notes
-----
* Regex substitution is performed under the hood with ``re.sub``. The
  rules for substitution for ``re.sub`` are the same.
* Regular expressions will only substitute on strings, meaning you
  cannot provide, for example, a regular expression matching floating
  point numbers and expect the columns in your frame that have a
  numeric dtype to be matched. However, if those floating point
  numbers *are* strings, then you can do this.
* This method has *a lot* of options. You are encouraged to experiment
  and play with this method to gain intuition about how it works.
* When a dict is used as the `to_replace` value, the key(s) in the
  dict act as the `to_replace` part and the value(s) in the dict act
  as the `value` parameter.

Examples
--------

**Scalar `to_replace` and `value`**

>>> s = pd.Series([0, 1, 2, 3, 4])
>>> s.replace(0, 5)
0    5
1    1
2    2
3    3
4    4
dtype: int64

>>> df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
...                    'B': [5, 6, 7, 8, 9],
...                    'C': ['a', 'b', 'c', 'd', 'e']})
>>> df.replace(0, 5)
   A  B  C
0  5  5  a
1  1  6  b
2  2  7  c
3  3  8  d
4  4  9  e

**List-like `to_replace`**

>>> df.replace([0, 1, 2, 3], 4)
   A  B  C
0  4  5  a
1  4  6  b
2  4  7  c
3  4  8  d
4  4  9  e

>>> df.replace([0, 1, 2, 3], [4, 3, 2, 1])
   A  B  C
0  4  5  a
1  3  6  b
2  2  7  c
3  1  8  d
4  4  9  e
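Both lists in the call above have four items. As the `to_replace` description and the Raises section state, paired lists must be the same length, otherwise a ``ValueError`` is raised. A minimal sketch, assuming pandas is installed; the exact error message may differ between pandas versions, so only the exception type is relied on:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4]})

# Four values to find but only two replacements: the lists are unequal
try:
    df.replace([0, 1, 2, 3], [4, 3])
except ValueError as exc:
    error = exc  # keep a reference; `exc` is unbound after the block
else:
    error = None

print(type(error).__name__)
```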

>>> s.replace([1, 2], method='bfill')
0    0
1    3
2    3
3    3
4    4
dtype: int64

**dict-like `to_replace`**

>>> df.replace({0: 10, 1: 100})
     A  B  C
0   10  5  a
1  100  6  b
2    2  7  c
3    3  8  d
4    4  9  e
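Per the Notes, in the flat dict form the keys play the role of `to_replace` and the dict values play the role of `value`, so the call above should match the equivalent two-list call. A quick check, assuming pandas is installed (the frame is rebuilt so the snippet runs on its own):

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Flat dict: keys are the values to find, dict values are the replacements
via_dict = df.replace({0: 10, 1: 100})
# The same mapping spelled as two equal-length lists
via_lists = df.replace([0, 1], [10, 100])

print(via_dict.equals(via_lists))
```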

>>> df.replace({'A': 0, 'B': 5}, 100)
     A    B  C
0  100  100  a
1    1    6  b
2    2    7  c
3    3    8  d
4    4    9  e
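The column-keyed dict with a scalar `value` above can also be expressed with the nested-dict rule from the Parameters section (``{column: {old: new}}`` with `value` left unset). A sketch, assuming pandas is installed, that checks both spellings agree:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, 6, 7, 8, 9],
                   'C': ['a', 'b', 'c', 'd', 'e']})

# Column-keyed dict plus a scalar value: one search value per column
per_column = df.replace({'A': 0, 'B': 5}, 100)
# Nested dict: {column: {old: new}}, no `value` argument
nested = df.replace({'A': {0: 100}, 'B': {5: 100}})

print(per_column.equals(nested))
```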

>>> df.replace({'A': {0: 100, 4: 400}})
     A  B  C
0  100  5  a
1    1  6  b
2    2  7  c
3    3  8  d
4  400  9  e

**Regular expression `to_replace`**

>>> df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
...                    'B': ['abc', 'bar', 'xyz']})
>>> df.replace(to_replace=r'^ba.$', value='new', regex=True)
      A    B
0   new  abc
1   foo  new
2  bait  xyz

>>> df.replace({'A': r'^ba.$'}, {'A': 'new'}, regex=True)
      A    B
0   new  abc
1   foo  bar
2  bait  xyz
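The Parameters section notes that regular expressions can be nested inside a dict as well. A sketch, assuming pandas is installed, of the same column-restricted replacement written as a nested dict ``{column: {pattern: replacement}}``:

```python
import pandas as pd

df = pd.DataFrame({'A': ['bat', 'foo', 'bait'],
                   'B': ['abc', 'bar', 'xyz']})

# Only column 'A' is searched; 'bar' in column 'B' is left alone
result = df.replace({'A': {r'^ba.$': 'new'}}, regex=True)
print(result)
```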

>>> df.replace(regex=r'^ba.$', value='new')
      A    B
0   new  abc
1   foo  new
2  bait  xyz

>>> df.replace(regex={r'^ba.$': 'new', 'foo': 'xyz'})
      A    B
0   new  abc
1   xyz  new
2  bait  xyz

>>> df.replace(regex=[r'^ba.$', 'foo'], value='new')
      A    B
0   new  abc
1   new  new
2  bait  xyz
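The patterns above are anchored with ``^`` and ``$``. Since the Notes say substitution goes through ``re.sub`` and only applies to strings, an unanchored pattern rewrites just the matching substring, backreferences work in the replacement, and non-string cells are skipped. A sketch, assuming a pandas version whose ``regex=True`` path behaves as the Notes describe:

```python
import pandas as pd

s = pd.Series(['bat', 'bait', 'ab-12', 1.5])

# Unanchored pattern: only the matched substring is rewritten (re.sub rules)
partial = s.replace(r'ba', 'new', regex=True)
# Backreferences in the replacement string also follow re.sub rules
swapped = s.replace(r'(\w+)-(\d+)', r'\2-\1', regex=True)

# The float 1.5 is not a string, so neither pattern touches it
print(partial.tolist())
print(swapped.tolist())
```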

Note that when replacing multiple ``bool`` or ``datetime64`` objects,
the data types in the `to_replace` parameter must match the data
type of the value being replaced:

>>> df = pd.DataFrame({'A': [True, False, True],
...                    'B': [False, True, False]})
>>> df.replace({'a string': 'new value', True: False})  # raises
Traceback (most recent call last):
    ...
TypeError: Cannot compare types 'ndarray(dtype=bool)' and 'str'

This raises a ``TypeError`` because one of the ``dict`` keys is not of
the correct type for replacement.

Compare the behavior of ``s.replace({'a': None})`` and
``s.replace('a', None)`` to understand the peculiarities
of the `to_replace` parameter:

>>> s = pd.Series([10, 'a', 'a', 'b', 'a'])

When a dict is used as the `to_replace` value, the value(s) in the
dict are treated as the `value` parameter.
``s.replace({'a': None})`` is equivalent to
``s.replace(to_replace={'a': None}, value=None, method=None)``:

>>> s.replace({'a': None})
0      10
1    None
2    None
3       b
4    None
dtype: object

When ``value=None`` and `to_replace` is a scalar, list or
tuple, `replace` uses the `method` parameter (default 'pad') to do the
replacement. That is why the 'a' values are replaced by 10 in rows 1
and 2 and by 'b' in row 4 in this case.
The command ``s.replace('a', None)`` is actually equivalent to
``s.replace(to_replace='a', value=None, method='pad')``:

>>> s.replace('a', None)
0    10
1    10
2    10
3     b
4     b
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.replace" correct. :)

jorisvandenbossche · 2018-03-15T14:56:49Z

I would personally split this docstring in separate ones for series and dataframe, it's becoming quite a monster :)

WillAyd

One very minor edit but otherwise lgtm

WillAyd · 2018-04-21T14:24:30Z

pandas/core/generic.py


        See Also
        --------
-        %(klass)s.fillna : Fill NA/NaN values
+        %(klass)s.fillna : Fill `NaN` values


This would be better as Fill NA values since it is talking about the concept of missing data and not necessarily the NaN value itself
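A small illustration of the distinction drawn here, assuming pandas and NumPy are installed: pandas treats both ``None`` and ``np.nan`` as missing ("NA"), and ``fillna`` fills both, not only the float ``NaN``.

```python
import numpy as np
import pandas as pd

# Both None and NaN count as missing values
s = pd.Series(['x', None, np.nan])

mask = s.isna()
filled = s.fillna('missing')

print(mask.tolist())
print(filled.tolist())
```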

minor change as requested

TomAugspurger

Fixed the linting failure. Let's get this merged when that passes.

TomAugspurger · 2018-04-22T14:22:29Z

Thanks @math-and-data!

math-and-data added 2 commits March 11, 2018 03:39

DOC: changed pandas.DataFrame/Series.replace docstring

068127c

DOC: changed pandas.DataFrame/Series.replace docstring

44b7de3

WillAyd requested changes Mar 11, 2018

jorisvandenbossche added Docs and removed Docs labels Mar 11, 2018

jreback requested changes Mar 11, 2018

jreback added Missing-data and Reshaping labels Mar 11, 2018

math-and-data added 3 commits March 14, 2018 23:16

DOC: pandas.DataFrame.replace - implemented feedback

b502d34

DOC: pandas.DataFrame.replace - int/float instead of numeric

80bc7ca

DOC: pandas.DataFrame.replace - formatting

f05d0af

Updates.

0a081fe

Section headers. Consistent quoting. Formatting. Traceback.

WillAyd approved these changes Apr 21, 2018

math-and-data and others added 2 commits April 21, 2018 19:22

DOC: update the pandas.DataFrame.replace docstring

cf6d655

minor change as requested

Fixed linting

58f6531

TomAugspurger approved these changes Apr 21, 2018

TomAugspurger merged commit 4de2e9b into pandas-dev:master Apr 22, 2018

math-and-data mentioned this pull request Apr 22, 2018

DEPR: removed long deprecated input param 'axis' in .replace() #20789

Merged

2 tasks
