DOC: Fix quantile docstring #22906
Changes from 3 commits
72d8082
6ad3c1d
dce5a3e
4910bbe
ef09ccc
39081e1
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7390,20 +7390,23 @@ def f(s): | |
def quantile(self, q=0.5, axis=0, numeric_only=True, | ||
interpolation='linear'): | ||
""" | ||
Return values at the given quantile over requested axis. | ||
Return value(s) at the given quantile over requested axis. | ||
|
||
This function calculates the 'q' quantile values on the dataframe, | ||
dividing data points into groups along `axis` axis. | ||
In case of insufficient number of data points for clean division | ||
into groups, specify `interpolation` scheme to implement. | ||
|
||
Parameters | ||
---------- | ||
q : float or array-like, default 0.5 (50% quantile) | ||
0 <= q <= 1, the quantile(s) to compute | ||
axis : {0, 1, 'index', 'columns'} (default 0) | ||
0 or 'index' for row-wise, 1 or 'columns' for column-wise | ||
q : float or array-like, default 0.5 (50% quantile) [0 <= q <= 1] | ||
Review comment: In this line, besides the name, it'd be good to have just the types and the default value. Any additional clarification is better placed in the description. By keeping only types, we'll eventually be able to catch typos in them automatically. So,
Reply: Sure, @datapythonista! Looks much better that way! |
||
The quantile(s) to compute. | ||
axis : boolean{0, 1, 'index', 'columns'} (default 0) | ||
|
||
For row-wise : 0 or'index', for column-wise : 1 or 'columns'. | ||
Review comment: There is a typo here, but anyway this seems redundant with the type line. Can you check other docstrings with the |
||
numeric_only : boolean, default True | ||
Review comment: Can you use
Reply: I'll switch to the convention |
||
If False, the quantile of datetime and timedelta data will be | ||
computed as well | ||
computed as well. | ||
interpolation : {'linear', 'lower', 'higher', 'midpoint', 'nearest'} | ||
.. versionadded:: 0.18.0 | ||
|
||
This optional parameter specifies the interpolation method to use, | ||
when the desired quantile lies between two data points `i` and `j`: | ||
|
||
|
@@ -7416,23 +7419,22 @@ def quantile(self, q=0.5, axis=0, numeric_only=True, | |
|
||
Returns | ||
------- | ||
quantiles : Series or DataFrame | ||
|
||
- If ``q`` is an array, a DataFrame will be returned where the | ||
index is ``q``, the columns are the columns of self, and the | ||
scalar, Series or DataFrame | ||
Review comment: Is there a case where this function returns a scalar? If so, can you explain in the description when
Reply: Sure, I'll add a few examples! |
||
- If `q` is an array, a DataFrame will be returned where the | ||
index is `q`, the columns are the columns of self, and the | ||
values are the quantiles. | ||
- If ``q`` is a float, a Series will be returned where the | ||
- If `q` is a float, a Series will be returned where the | ||
index is the columns of self and the values are the quantiles. | ||
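For reference, a minimal sketch of the return types discussed in this thread, using the four-row frame from the existing docstring example. The `Series.quantile` call is added only for contrast and is not part of this diff; it shows where a scalar result would come from.

```python
import numpy as np
import pandas as pd

# Same data as the existing docstring example.
df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]),
                  columns=['a', 'b'])

# Float q on a DataFrame: one quantile per column, returned as a Series.
single = df.quantile(0.1)

# Array-like q on a DataFrame: one row per quantile, returned as a DataFrame.
multiple = df.quantile([0.1, 0.5])

# Float q on a single column (a Series): a scalar.
scalar = df['a'].quantile(0.1)

print(type(single).__name__, type(multiple).__name__, np.isscalar(scalar))
```

If that holds, the `scalar` entry in the Returns type line would apply to `Series.quantile` rather than to the DataFrame method being documented here.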
|
||
Examples | ||
-------- | ||
|
||
>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], [3, 100], [4, 100]]), | ||
columns=['a', 'b']) | ||
>>> df = pd.DataFrame(np.array([[1, 1], [2, 10], \ | ||
[3, 100], [4, 100]]), \ | ||
columns=['a', 'b']) | ||
Review comment: We usually try to keep the examples small, but I think the results are almost meaningless with just two samples. I'd have maybe 6 or 8 in this case. Also, we try to use data that looks kind of real, and that the user knows beforehand (see this example: https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7580). Also, can you use the validations mentioned in the issue? This example can't be run, the backslashes are redundant, and a style error is probably being shown.
Reply: Alright, @datapythonista. I'll think of something!
Reply: @datapythonista please review my PR. I believe these examples should suffice. |
||
>>> df.quantile(.1) | ||
a 1.3 | ||
b 3.7 | ||
dtype: float64 | ||
Name: 0.1, dtype: float64 | ||
>>> df.quantile([.1, .5]) | ||
a b | ||
0.1 1.3 3.7 | ||
|
@@ -7441,11 +7443,12 @@ def quantile(self, q=0.5, axis=0, numeric_only=True, | |
Specifying `numeric_only=False` will also compute the quantile of | ||
datetime and timedelta data. | ||
|
||
>>> df = pd.DataFrame({'A': [1, 2], | ||
'B': [pd.Timestamp('2010'), | ||
pd.Timestamp('2011')], | ||
'C': [pd.Timedelta('1 days'), | ||
pd.Timedelta('2 days')]}) | ||
>>> df = pd.DataFrame({ 'A': [1, 2], \ | ||
'B': [pd.Timestamp('2010'), \ | ||
pd.Timestamp('2011')], \ | ||
'C': [pd.Timedelta('1 days'), \ | ||
pd.Timedelta('2 days')]}) | ||
Review comment: If possible, try to define just one example at the beginning, and use it everywhere. If you need a date column, you can add it there.
Reply: Okay, @datapythonista. |
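One possible shape for a single shared example, assuming the three columns from the diff are simply kept in one frame that is defined once and reused. Column names and values are taken from the example above; passing `numeric_only` explicitly in both calls is an assumption on my part, so the snippet does not depend on the default.

```python
import pandas as pd

# One frame defined once: a numeric, a datetime and a timedelta column.
df = pd.DataFrame({'A': [1, 2],
                   'B': [pd.Timestamp('2010'), pd.Timestamp('2011')],
                   'C': [pd.Timedelta('1 days'), pd.Timedelta('2 days')]})

# Only the numeric column takes part in the computation.
numeric = df.quantile(0.5, numeric_only=True)

# Datetime and timedelta columns are included as well.
everything = df.quantile(0.5, numeric_only=False)

print(numeric)
print(everything)
```

No backslashes are needed here, since the open brackets already continue the statement across lines.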
||
|
||
>>> df.quantile(0.5, numeric_only=False) | ||
A 1.5 | ||
B 2010-07-02 12:00:00 | ||
|
@@ -7455,7 +7458,9 @@ def quantile(self, q=0.5, axis=0, numeric_only=True, | |
See Also | ||
-------- | ||
pandas.core.window.Rolling.quantile | ||
|
||
Returns the rolling quantile for the DataFrame. | ||
Review comment: I think the |
||
numpy.percentile | ||
Returns 'nth' percentile for the DataFrame. | ||
Reply: Corrected! |
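A small sketch of how the `numpy.percentile` entry relates to this method, assuming the default linear interpolation on both sides: `numpy.percentile` takes `q` on a 0 to 100 scale, while `quantile` takes it on a 0 to 1 scale. The values are column `a` from the docstring example.

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 4]

# pandas: q is a fraction between 0 and 1.
via_pandas = pd.Series(values).quantile(0.1)

# NumPy: q is a percentage between 0 and 100.
via_numpy = np.percentile(values, 10)

print(np.isclose(via_pandas, via_numpy))
```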
||
""" | ||
self._check_percentile(q) | ||
|
||
|
Review comment: Great description. I'm unsure about using the word `calculate`, as it gives the impression that the values are not present in the data already. Also, I'd use `DataFrame` as in the class (capital letters).
Reply: Typos fixed, and changed terminology. Wait for next commit!