DOC: Improve the docstring of DataFrame.describe() #20222
Changes from 5 commits
916624c
d365098
777dadf
aa13b25
8da3c9a
1277860
66dab96
0f4e8ed
c570d24
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -7187,7 +7187,7 @@ def abs(self): | |
|
||
def describe(self, percentiles=None, include=None, exclude=None): | ||
""" | ||
Generates descriptive statistics that summarize the central tendency, | ||
Generate descriptive statistics that summarize the central tendency, | ||
dispersion and shape of a dataset's distribution, excluding | ||
``NaN`` values. | ||
|
||
|
@@ -7231,7 +7231,18 @@ def describe(self, percentiles=None, include=None, exclude=None): | |
|
||
Returns | ||
------- | ||
summary: Series/DataFrame of summary statistics | ||
Series or DataFrame | ||
Summary statistics of the Series or DataFrame provided. | ||
|
||
See Also | ||
-------- | ||
DataFrame.count: Count number of non-NA/null observations. | ||
DataFrame.max: Maximum of the values in the object. | ||
DataFrame.min: Minimum of the values in the object. | ||
DataFrame.mean: Mean of the values. | ||
DataFrame.std: Standard deviation of the observations. | ||
DataFrame.select_dtypes: Subset of a DataFrame including/excluding | ||
columns based on their dtype. | ||
|
||
Notes | ||
----- | ||
|
@@ -7275,6 +7286,7 @@ def describe(self, percentiles=None, include=None, exclude=None): | |
50% 2.0 | ||
75% 2.5 | ||
max 3.0 | ||
dtype: float64 | ||
|
||
Describing a categorical ``Series``. | ||
|
||
|
@@ -7305,9 +7317,9 @@ def describe(self, percentiles=None, include=None, exclude=None): | |
Describing a ``DataFrame``. By default only numeric fields | ||
are returned. | ||
|
||
>>> df = pd.DataFrame({ 'object': ['a', 'b', 'c'], | ||
>>> df = pd.DataFrame({ 'categorical': pd.Categorical(['d','e','f']), | ||
... 'numeric': [1, 2, 3], | ||
... 'categorical': pd.Categorical(['d','e','f']) | ||
... 'object': ['a', 'b', 'c'] | ||
... }) | ||
>>> df.describe() | ||
numeric | ||
|
@@ -7393,7 +7405,7 @@ def describe(self, percentiles=None, include=None, exclude=None): | |
Excluding object columns from a ``DataFrame`` description. | ||
|
||
>>> df.describe(exclude=[np.object]) | ||
categorical numeric | ||
categorical numeric | ||
When running the validation script, I occasionally get a failure
Did you see this at all? This likely is an issue in the method itself, and not the docstring.

Yeah, I do see this error, but it's flaky.

To be clear, it's probably some kind of non-stable sorting inside the describe method, and nothing wrong with the docstring. It may be best to just include the docstring, and open a new issue.

The strange thing is that just doing `pd.DataFrame({"A": pd.Categorical(['d', 'e', 'f']), "B": ['a', 'b', 'c'], 'C': [1, 2, 3]}).describe(exclude=['number'])` seems deterministic.
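One way to check whether the column order is stable is to repeat the call from the comment above and collect the column orders it produces. This is only a rough sketch, not the validation script mentioned in the thread: the helper name `describe_column_orders` and the run count are assumptions made for illustration, and a single stable order within one process does not rule out differences across interpreter sessions or Python versions.

```python
import pandas as pd


def describe_column_orders(n_runs=20):
    """Return the set of column orders seen over repeated describe() calls.

    Hypothetical helper for illustration; not part of pandas or of the PR.
    """
    orders = set()
    for _ in range(n_runs):
        # Same frame as in the review comment: categorical, object, numeric.
        df = pd.DataFrame({
            "A": pd.Categorical(["d", "e", "f"]),
            "B": ["a", "b", "c"],
            "C": [1, 2, 3],
        })
        summary = df.describe(exclude=["number"])
        orders.add(tuple(summary.columns))
    return orders


orders = describe_column_orders()
print(orders)
```

If the set contains more than one tuple, the ordering differs between calls, which would point at the method rather than the docstring.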
||
count 3 3.0 | ||
unique 3 NaN | ||
top f NaN | ||
|
@@ -7405,15 +7417,6 @@ def describe(self, percentiles=None, include=None, exclude=None): | |
50% NaN 2.0 | ||
75% NaN 2.5 | ||
max NaN 3.0 | ||
|
||
See Also | ||
-------- | ||
DataFrame.count | ||
DataFrame.max | ||
DataFrame.min | ||
DataFrame.mean | ||
DataFrame.std | ||
DataFrame.select_dtypes | ||
""" | ||
if self.ndim >= 3: | ||
msg = "describe is not implemented on Panel objects." | ||
|
PEP8 formatting here. No space after the `{`, spaces after the `,` in the Categorical.
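For reference, the docstring example with that formatting applied might look like the sketch below. The exact line breaks are an assumption; the review comment only asks for no space after the opening brace and a space after each comma inside `pd.Categorical`.

```python
import pandas as pd

# No space after "{", and a space after each comma in the Categorical list.
df = pd.DataFrame({'categorical': pd.Categorical(['d', 'e', 'f']),
                   'numeric': [1, 2, 3],
                   'object': ['a', 'b', 'c']
                   })

# By default only the numeric column is summarized.
summary = df.describe()
print(summary)
```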