Skip to content

DOC: update the pandas.DataFrame.replace docstring #20271

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

58 changes: 53 additions & 5 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -4869,6 +4869,10 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
_shared_docs['replace'] = ("""
Replace values given in 'to_replace' with 'value'.

Values of the DataFrame or a Series are being replaced with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure this extended description is adding much. Better served to make mention of how this can replace values with a dynamic set of inputs like dicts

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, thank you for the suggestion

other values. One or several values can be replaced with one
or several values.

Parameters
----------
to_replace : str, regex, list, dict, Series, numeric, or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Say int, float instead of numeric (if float is even valid?)

Expand Down Expand Up @@ -4934,19 +4938,21 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):
other views on this object (e.g. a column from a DataFrame).
Returns the caller if this is True.
limit : int, default None
Maximum size gap to forward or backward fill
Maximum size gap to forward or backward fill.
regex : bool or same types as ``to_replace``, default False
Whether to interpret ``to_replace`` and/or ``value`` as regular
expressions. If this is ``True`` then ``to_replace`` *must* be a
string. Alternatively, this could be a regular expression or a
list, dict, or array of regular expressions in which case
``to_replace`` must be ``None``.
method : string, optional, {'pad', 'ffill', 'bfill'}
method : string, optional, {'pad', 'ffill', 'bfill'}, default is 'pad'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

method : {'pad', 'ffill', 'bfill', `None`}

The method to use when for replacement, when ``to_replace`` is a
scalar, list or tuple and ``value`` is None.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally can put builtins like None in back ticks i.e. `None`

axis : None
Deprecated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Warning says this will be removed in v0.13? Woof...I guess OK to document for this change but should have a follow up change to actually go ahead and remove - care to take a stab at that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd where is this warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to take a stab at this - always nice when I can remove code too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@math-and-data awesome thanks! Can you open a separate issue for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd I was waiting for this PR to be approved, then I would open a new request where I change the relevant code (remove the 'axis' reference) and edit the documentation accordingly. Is there anything else I had missed in this PR (other than the suggestion of breaking out the DataFrame and Series examples)?


.. versionchanged:: 0.23.0
Added to DataFrame
.. versionchanged:: 0.23.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't what you want to do - make sure you keep the versionchanged directive below the method argument as that's what was added in v0.23

Added to DataFrame

See Also
--------
Expand All @@ -4955,7 +4961,8 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):

Returns
-------
filled : %(klass)s
%(klass)s
Some values have been substituted for new values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Description could be improved here - perhaps "Object after replacement" or something similar


Raises
------
Expand Down Expand Up @@ -5092,6 +5099,47 @@ def bfill(self, axis=None, inplace=False, limit=None, downcast=None):

This raises a ``TypeError`` because one of the ``dict`` keys is not of
the correct type for replacement.

Compare the behavior of
``s.replace('a', None)`` and ``s.replace({'a': None})`` to understand
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if double back ticks provide anything so maybe replace with single? @TomAugspurger

the pecularities of the ``to_replace`` parameter.
``s.replace('a', None)`` is actually equivalent to
``s.replace(to_replace='a', value=None, method='pad')``,
because when ``value=None`` and ``to_replace`` is a scalar, list or
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting as I was not aware of this behavior. Certainly great to have it documented, though I would move the majority of the writing into the Notes section and shorten the blurb introducing the comparison here.

tuple, ``replace`` uses the method parameter to do the replacement.
So this is why the 'a' values are being replaced by 30 in rows 3 and 4
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just reinforce that it's the fill behavior that is really replacing values here

and 'b' in row 6 in this case. However, this behaviour does not occur
when you use a dict as the ``to_replace`` value. In this case, it is
like the value(s) in the dict are equal to the value parameter.

>>> s = pd.Series([10, 20, 30, 'a', 'a', 'b', 'a'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to keep things concise why don't you get rid of 10 and 20 in this example? They don't serve any real purpose but make the documentation longer. Can also replace 30 with 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great suggestion of simplifying.

>>> print(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need the print statements here (and below)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Series is simple enough where you don't need to explicitly print it - the constructor shows you everything of interest

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally have found the visual of inspecting the changes before/after easier for such replacements (both in vertical positions). You have more experience and I'll rely on your suggestion and make the change.

0 10
1 20
2 30
3 a
4 a
5 b
6 a
dtype: object
>>> print(s.replace('a', None))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need the prints, use a blank line between cases. Having an expl for each case is also nice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

0 10
1 20
2 30
3 30
4 30
5 b
6 b
dtype: object
>>> print(s.replace({'a': None}))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put this example first as it is (from my perspective) the behavior most would expect. Having it first makes it a better segue into the nuance that you want to describe with the other example

0 10
1 20
2 30
3 None
4 None
5 b
6 None
dtype: object
""")

@Appender(_shared_docs['replace'] % _shared_doc_kwargs)
Expand Down