DOC: Fix quantile docstring #22906

Closed
wants to merge 6 commits into from
Changes from 3 commits
49 changes: 27 additions & 22 deletions pandas/core/frame.py
@@ -7390,20 +7390,23 @@ def f(s):
def quantile(self, q=0.5, axis=0, numeric_only=True,
interpolation='linear'):
"""
Return values at the given quantile over requested axis.
Return value(s) at the given quantile over requested axis.

This function calculates the 'q' quantile values on the dataframe,
Member

Great description. I'm unsure about using the word "calculate", as it gives the impression that the values are not already present in the data. Also, I'd use DataFrame, as in the class name (capitalized).

Contributor Author

Typos fixed, and changed terminology. Wait for next commit!
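
On the wording question above: whether the returned value is already present in the data depends on the `interpolation` argument. A minimal sketch, using hypothetical data that is not taken from the PR, where the 0.4 quantile falls between two data points:

```python
import pandas as pd

# Hypothetical data: four points, so the 0.4 quantile sits at
# position 0.4 * (4 - 1) = 1.2, between the values 2 and 3.
s = pd.Series([1, 2, 3, 4])

methods = ["linear", "lower", "higher", "midpoint", "nearest"]

# Compute the same quantile under every interpolation scheme.
results = {m: float(s.quantile(0.4, interpolation=m)) for m in methods}

for method, value in results.items():
    print(f"{method:>8}: {value}")
```

Here 'lower', 'higher' and 'nearest' return values that exist in the data (2, 3 and 2), while 'linear' and 'midpoint' return interpolated values (about 2.2 and 2.5) that do not.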

dividing data points into groups along `axis` axis.
In case of insufficient number of data points for clean division
into groups, specify `interpolation` scheme to implement.

Parameters
----------
q : float or array-like, default 0.5 (50% quantile)
0 <= q <= 1, the quantile(s) to compute
axis : {0, 1, 'index', 'columns'} (default 0)
0 or 'index' for row-wise, 1 or 'columns' for column-wise
q : float or array-like, default 0.5 (50% quantile) [0 <= q <= 1]
Member

In this line, besides the name, it'd be good to have just the types and the default value. Any additional clarification is better to have it in the description. By keeping only types, we'll eventually be able to catch typos automatically in them. So, q : float or array-like, default 0.5 and the rest as part of the description.

Contributor Author

Sure, @datapythonista ! Looks much better that way!

The quantile(s) to compute.
axis : boolean{0, 1, 'index', 'columns'} (default 0)
For row-wise : 0 or'index', for column-wise : 1 or 'columns'.
Member

There is a typo here, but in any case this seems redundant with the type line. Can you check other docstrings that have the axis parameter and copy their description?

numeric_only : boolean, default True
Member

Can you use bool (the Python type) instead of boolean?

Contributor Author
@tm9k1 Sep 30, 2018

I'll switch to the convention axis : {0 or 'index', 1 or 'columns'}, default 0 so boolean/bool won't be needed.
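
For reference, a small sketch of what the two `axis` settings do, assuming the column names and values from the docstring example in this diff:

```python
import pandas as pd

# Same values as the docstring example in this PR.
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 10, 100, 100]})

# axis=0 (or 'index'): one quantile per column, indexed by column name.
per_column = df.quantile(0.5, axis=0)

# axis=1 (or 'columns'): one quantile per row, indexed by row label.
per_row = df.quantile(0.5, axis="columns")

print(per_column)
print(per_row)
```

The integer and string spellings are interchangeable, which is why the `{0 or 'index', 1 or 'columns'}` convention describes the parameter better than a boolean type would.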

If False, the quantile of datetime and timedelta data will be
computed as well
computed as well.
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'}
.. versionadded:: 0.18.0

This optional parameter specifies the interpolation method to use,
when the desired quantile lies between two data points `i` and `j`:

@@ -7416,23 +7419,22 @@ def quantile(self, q=0.5, axis=0, numeric_only=True,

Returns
-------
quantiles : Series or DataFrame

- If ``q`` is an array, a DataFrame will be returned where the
index is ``q``, the columns are the columns of self, and the
scalar, Series or DataFrame
Member

Is there a case where this function returns a scalar? If so, can you explain in the description when?

Contributor Author

Sure, I'll add a few examples !
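
A quick check of the return types in question, as a sketch using the data from the docstring example. To my knowledge the scalar case arises with `Series.quantile`, not with `DataFrame.quantile`:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [1, 10, 100, 100]})

# Float q on a DataFrame: a Series indexed by column, named after q.
as_series = df.quantile(0.1)

# Array-like q on a DataFrame: a DataFrame indexed by q.
as_frame = df.quantile([0.1, 0.5])

# Float q on a single column (a Series): a scalar.
as_scalar = df["a"].quantile(0.1)

print(type(as_series).__name__, type(as_frame).__name__, type(as_scalar).__name__)
```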

- If `q` is an array, a DataFrame will be returned where the
index is `q`, the columns are the columns of self, and the
values are the quantiles.
- If ``q`` is a float, a Series will be returned where the
- If `q` is a float, a Series will be returned where the
index is the columns of self and the values are the quantiles.

Examples
--------

>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
columns=['a', 'b'])
>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], \
[3, 100], [4, 100]]), \
columns=['a', 'b'])
Member

We usually try to keep the examples small, but I think the results are almost meaningless with just two samples. I'd have maybe 6 or 8 in this case. Also, we try to use data that looks kind of real, and that the user knows beforehand (see this example: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7580).

Also, can you use the validations mentioned in the issue? This example can't be run, the backslashes are redundant, and a style error is probably being shown.

Contributor Author

Alright, @datapythonista . I'll think of something!

Contributor Author

@datapythonista please review my PR. I believe these examples should suffice.
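
On the backslash point: doctest treats lines starting with `...` as continuations of the preceding `>>>` statement, so multi-line examples need no backslashes. A stdlib-only sketch; the helper `build_table` and its data are hypothetical and exist only to host the docstring:

```python
import doctest


def build_table():
    """
    Hypothetical helper, used only to carry the doctest below.

    >>> data = {'a': [1, 2, 3, 4],
    ...         'b': [1, 10, 100, 100]}
    >>> sorted(data)
    ['a', 'b']
    """


# Parse the docstring into a DocTest and run it with an empty namespace.
parser = doctest.DocTestParser()
test = parser.get_doctest(build_table.__doc__, {}, "build_table", None, 0)
runner = doctest.DocTestRunner(verbose=False)
outcome = runner.run(test)

print(outcome.attempted, "examples run,", outcome.failed, "failed")
```

Running the docstring through doctest like this is one way to confirm that an example can actually be executed before pushing it.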

>>> df.quantile(.1)
a 1.3
b 3.7
dtype: float64
Name: 0.1, dtype: float64
>>> df.quantile([.1, .5])
a b
0.1 1.3 3.7
@@ -7441,11 +7443,12 @@ def quantile(self, q=0.5, axis=0, numeric_only=True,
Specifying `numeric_only=False` will also compute the quantile of
datetime and timedelta data.

>>> df = pd.DataFrame({'A': [1, 2],
'B': [pd.Timestamp('2010'),
pd.Timestamp('2011')],
'C': [pd.Timedelta('1 days'),
pd.Timedelta('2 days')]})
>>> df = pd.DataFrame({ 'A': [1, 2], \
'B': [pd.Timestamp('2010'), \
pd.Timestamp('2011')], \
'C': [pd.Timedelta('1 days'), \
pd.Timedelta('2 days')]})
Member

If possible, try to define just an example at the beginning, and use it everywhere. If you need a date column, you can add it there.

Contributor Author

okay, @datapythonista .
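
One way the single shared example could look, sketched with the values from the docstring in this diff. Selecting the datetime column before calling `quantile` is my own choice here, made so the result does not depend on how a given pandas version handles mixed dtypes:

```python
import pandas as pd

# One DataFrame defined once, holding both a numeric and a datetime column.
df = pd.DataFrame({
    "A": [1, 2],
    "B": [pd.Timestamp("2010"), pd.Timestamp("2011")],
})

# numeric_only=True keeps only the numeric column.
numeric = df.quantile(0.5, numeric_only=True)

# numeric_only=False lets the datetime column be included.
dates = df[["B"]].quantile(0.5, numeric_only=False)

print(numeric)
print(dates)
```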


>>> df.quantile(0.5, numeric_only=False)
A 1.5
B 2010-07-02 12:00:00
@@ -7455,7 +7458,9 @@ def quantile(self, q=0.5, axis=0, numeric_only=True,
See Also
--------
pandas.core.window.Rolling.quantile
Returns the rolling quantile for the DataFrame.
Member

I think the See Also section has the format func : desc on the same line (continuing on the next). Can you check the docs and the other docstrings and adapt?

numpy.percentile
Returns 'nth' percentile for the DataFrame.
Member

numpy.percentile is used for numpy arrays, not for DataFrame.

Contributor Author

corrected !
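
To illustrate the distinction raised above, a minimal sketch with hypothetical values: `numpy.percentile` operates on arrays and takes percentages in the range 0 to 100, while the pandas `quantile` methods take fractions in the range 0 to 1.

```python
import numpy as np
import pandas as pd

# Hypothetical values, not taken from the PR.
values = np.array([1, 2, 3, 4])

# NumPy: array input, q given as a percentage.
from_numpy = np.percentile(values, 50)

# pandas: Series input, q given as a fraction.
from_pandas = pd.Series(values).quantile(0.5)

print(from_numpy, from_pandas)
```

With the default linear interpolation, both calls give the same median for this input.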

"""
self._check_percentile(q)
