DOC: update the pandas.DataFrame.clip docstring #20368

Closed

@Dpananos Dpananos commented Mar 15, 2018

Added new examples and a reference to winsorization.

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
###################### Docstring (pandas.DataFrame.clip)  ######################
################################################################################

Trim values at input threshold(s).

Assigns values outside the boundaries to the boundary values.
Thresholds can be scalars or array-like; in the latter case the
clipping is performed element-wise along the specified axis.

Parameters
----------
lower : float or array_like, default None
    Minimum threshold value. All values below this
    threshold will be set to it.
upper : float or array_like, default None
    Maximum threshold value. All values above this
    threshold will be set to it.
axis : int or string axis name, optional
    Align object with lower and upper along the given axis.
inplace : boolean, default False
    Whether to perform the operation in place on the data.

    .. versionadded:: 0.21.0
*args, **kwargs
    Additional keywords have no effect but might be accepted
    for compatibility with numpy.

Returns
-------
Series or DataFrame
    Same type as calling object with the values outside the
    clip boundaries replaced.

See Also
--------
clip_lower : Clip values below specified threshold(s).
clip_upper : Clip values above specified threshold(s).

References
----------
.. [1] Tukey, John W. "The Future of Data Analysis." The Annals of
    Mathematical Statistics 33.1 (1962): 1-67.

Examples
--------
>>> df = pd.DataFrame({'a': [-1, -2, -100],
...                    'b': [1, 2, 100]},
...                   index=['foo', 'bar', 'foobar'])
>>> df
          a    b
foo      -1    1
bar      -2    2
foobar -100  100

>>> df.clip(lower=-10, upper=10)
         a   b
foo     -1   1
bar     -2   2
foobar -10  10

You can clip each column or row with a different threshold by passing
a ``Series`` as ``lower`` or ``upper``. Use ``axis`` to choose whether
the ``Series`` is aligned with the columns or with the rows.

>>> col_thresh = pd.Series({'a': -5, 'b': 5})
>>> df.clip(lower=col_thresh, axis='columns')
        a    b
foo    -1    5
bar    -2    5
foobar -5  100

Clip the foo, bar, and foobar rows with lower thresholds 0, 1, and 10.

>>> row_thresh = pd.Series({'foo': 0, 'bar': 1, 'foobar': 10})
>>> df.clip(lower=row_thresh, axis='index')
         a    b
foo      0    1
bar      1    2
foobar  10  100

Winsorizing [1]_ is a related method, whereby the data are clipped at
chosen percentiles, here the 5th and 95th. ``DataFrame.quantile``
returns a ``Series`` indexed by column name, so pass ``axis='columns'``
to clip each column at its own quantiles.

>>> lower, upper = df.quantile(0.05), df.quantile(0.95)
>>> df.clip(lower=lower, upper=upper, axis='columns')
           a     b
foo     -1.1   1.1
bar     -2.0   2.0
foobar -90.2  90.2

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'args', 'kwargs'} not documented
		Unknown parameters {'*args, **kwargs'}
		Parameter "inplace" description should finish with "."
		Parameter "*args, **kwargs" has no type
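
To check the per-column and per-row examples in the docstring above independently of the doctest run, here is a minimal sketch that reproduces them and compares each result with explicit NumPy broadcasting. The names `by_column`, `by_row`, `expected_by_column` and `expected_by_row` are illustrative only, and the NumPy comparison is an assumption about how to express the same alignment by hand, not a description of how `clip` is implemented.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [-1, -2, -100],
                   'b': [1, 2, 100]},
                  index=['foo', 'bar', 'foobar'])

# One lower threshold per column: the Series index matches the column labels.
col_thresh = pd.Series({'a': -5, 'b': 5})
by_column = df.clip(lower=col_thresh, axis='columns')

# One lower threshold per row: the Series index matches the row labels.
row_thresh = pd.Series({'foo': 0, 'bar': 1, 'foobar': 10})
by_row = df.clip(lower=row_thresh, axis='index')

# The same results written as element-wise maxima with manual broadcasting.
# Thresholds are already in the same label order as df, so positions line up.
expected_by_column = np.maximum(df.to_numpy(), col_thresh.to_numpy())
expected_by_row = np.maximum(df.to_numpy(), row_thresh.to_numpy()[:, None])
```

Passing a `Series` lets pandas align thresholds by label, which the manual NumPy version does not do; the sketch only agrees because the labels are listed in the same order as in `df`.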


If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

This is a new pull request with the same content as #20212. I was having some problems with git, so I started fresh.
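
The winsorizing example in the docstring can also be wrapped in a small helper. The following is only a sketch: the function name `winsorize` and its `lower_q` and `upper_q` parameters are hypothetical and not part of pandas, and the default quantiles simply mirror the 5th and 95th percentiles used in the docstring.

```python
import pandas as pd


def winsorize(frame, lower_q=0.05, upper_q=0.95):
    """Clip each column at its own lower and upper quantiles.

    Hypothetical helper for illustration; not part of pandas.
    """
    # quantile() returns a Series indexed by column name.
    lower = frame.quantile(lower_q)
    upper = frame.quantile(upper_q)
    # axis='columns' aligns the thresholds with the column labels.
    return frame.clip(lower=lower, upper=upper, axis='columns')


df = pd.DataFrame({'a': [-1, -2, -100],
                   'b': [1, 2, 100]},
                  index=['foo', 'bar', 'foobar'])
winsorized = winsorize(df)
print(winsorized)
```

Because the thresholds are floats, the clipped frame is returned with float columns even though the input columns are integers.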
