DOC: update the pandas.DataFrame.clip docstring #20212

Closed
wants to merge 25 commits into from
Changes from 1 commit
17 changes: 11 additions & 6 deletions pandas/core/generic.py
@@ -5629,6 +5629,12 @@ def clip(self, lower=None, upper=None, axis=None, inplace=False,
Original input with those values above/below the
`upper`/`lower` thresholds set to the threshold values.

+ Notes

Member

I think this belongs in a References section rather than in Notes. The sprint documentation left that section out for simplicity, but the numpydoc format guide describes it: http://numpydoc.readthedocs.io/en/latest/format.html

Author

Yeah, I agree. Done.

+ -----
+
+ .. [1] Tukey, John W. "The future of data analysis." The annals of
+ mathematical statistics 33.1 (1962): 1-67.
+
See Also
--------
DataFrame.clip : Trim values at input threshold(s).
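
The review suggestion above, moving the citation out of Notes and into a References section, can be sketched on a small stand-in function. `clip_sketch` is a hypothetical name used only for illustration; it is not the pandas implementation, and the docstring layout is an assumption based on the numpydoc format guide linked in the comment.

```python
def clip_sketch(values, lower=None, upper=None):
    """
    Trim values at the given thresholds.

    Hypothetical stand-in used only to show the docstring layout.

    Parameters
    ----------
    values : list of float
        Input values.
    lower, upper : float, optional
        Thresholds; ``None`` disables that bound.

    Returns
    -------
    list of float
        Values limited to the range between `lower` and `upper`.

    Notes
    -----
    Winsorizing [1]_ is a related method.

    References
    ----------
    .. [1] Tukey, John W. "The future of data analysis." The annals of
       mathematical statistics 33.1 (1962): 1-67.
    """
    result = []
    for v in values:
        # Raise values below the lower bound, then cap values above the upper.
        if lower is not None and v < lower:
            v = lower
        if upper is not None and v > upper:
            v = upper
        result.append(v)
    return result


print(clip_sketch([-100, 0, 100], lower=-10, upper=10))
```

The citation marker `[1]_` stays in the prose that uses it, while the full entry lives under References.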
@@ -5650,13 +5656,13 @@ def clip(self, lower=None, upper=None, axis=None, inplace=False,
... 'b': [1, 2, 100]},
... index=['foo', 'bar', 'foobar'])
>>> df
- a b
+ a b
foo -1 1

Contributor

alignment doesn’t look right here

bar -2 2
foobar -100 100

>>> df.clip(lower=-10, upper=10)
- a b
+ a b
foo -1 1
bar -2 2
foobar -10 10
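
As a runnable sketch of the scalar-threshold example in this hunk: the frame below reuses the docstring's values, and the `manual` cross-check is a hypothetical helper added here for illustration, not part of the PR.

```python
import pandas as pd

# Same frame as in the docstring example.
df = pd.DataFrame({'a': [-1, -2, -100], 'b': [1, 2, 100]},
                  index=['foo', 'bar', 'foobar'])

# Scalar thresholds apply to every element of the frame.
clipped = df.clip(lower=-10, upper=10)

# Hypothetical cross-check in plain Python: clamp each value into [-10, 10].
manual = df.apply(lambda col: col.map(lambda v: min(max(v, -10), 10)))

print(clipped)
print(clipped.values.tolist() == manual.values.tolist())
```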
@@ -5667,7 +5673,7 @@ def clip(self, lower=None, upper=None, axis=None, inplace=False,

>>> col_thresh = pd.Series({'a': -5, 'b': 5})
>>> df.clip(lower=col_thresh, axis='columns')
- a b
+ a b
foo -1 5
bar -2 5
foobar -5 100
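
The per-column threshold example in this hunk can be checked with a short script. This is a sketch under the assumption that the Series index matches the column labels; the loop that recomputes each column by hand is added here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'a': [-1, -2, -100], 'b': [1, 2, 100]},
                  index=['foo', 'bar', 'foobar'])

# One lower bound per column, keyed by column label.
col_thresh = pd.Series({'a': -5, 'b': 5})

# axis='columns' aligns the Series index with the column labels.
clipped = df.clip(lower=col_thresh, axis='columns')

# Recompute each column by hand to show what the alignment means.
expected = {col: [max(v, col_thresh[col]) for v in df[col]]
            for col in df.columns}

print(clipped)
print(expected)
```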
@@ -5681,15 +5687,14 @@ def clip(self, lower=None, upper=None, axis=None, inplace=False,
bar 1 2
foobar 10 100

- `Winsorizing <https://en.wikipedia.org/wiki/Winsorizing>`__ is a
- related method, whereby the data are clipped at
+ Winsorizing [1]_ is a related method, whereby the data are clipped at
the 5th and 95th percentiles. The ``DataFrame.quantile`` method returns
a ``Series`` with column names as index and the quantiles as values.
Use ``axis='columns'`` to apply clipping to columns.

>>> lower, upper = df.quantile(0.05), df.quantile(0.95)
>>> df.clip(lower=lower, upper=upper, axis='columns')

Member

Not sure if it's just because I'm too tired, but is `axis` useful here? If the threshold is a scalar, it doesn't seem to have any effect, right?

Author

`axis` is required here because `df.quantile` returns a Series holding the quantiles for each column. Running without the `axis` argument results in an error.

Member

I'd find it useful to have this comment in the explanation before the example. :)


- a b
+ a b
foo -1.1 1.1
bar -2.0 2.0
foobar -90.2 90.2
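
The point made in the review thread, that `df.quantile` returns a Series indexed by column name and therefore needs `axis='columns'` when passed to `clip`, can be seen in a minimal sketch. It reuses the docstring's frame and assumes pandas' default linear interpolation for quantiles; rounding to one decimal is added here only to make the printed frame easy to compare with the output above.

```python
import pandas as pd

df = pd.DataFrame({'a': [-1, -2, -100], 'b': [1, 2, 100]},
                  index=['foo', 'bar', 'foobar'])

# Each call returns a Series whose index holds the column labels.
lower, upper = df.quantile(0.05), df.quantile(0.95)
print(lower.index.tolist())

# The thresholds differ per column, so align them along the columns.
winsorized = df.clip(lower=lower, upper=upper, axis='columns')
print(winsorized.round(1))
```

Because the thresholds are Series rather than scalars, the `axis` argument tells `clip` which labels to match them against.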