DOC: Improve the docstring of DataFrame.describe() #20222

Merged
merged 9 commits into from
Jul 8, 2018
73 changes: 37 additions & 36 deletions pandas/core/generic.py
@@ -7179,7 +7179,7 @@ def abs(self):

def describe(self, percentiles=None, include=None, exclude=None):
"""
-Generates descriptive statistics that summarize the central tendency,
+Generate descriptive statistics that summarize the central tendency,
dispersion and shape of a dataset's distribution, excluding
``NaN`` values.

@@ -7267,6 +7267,7 @@ def describe(self, percentiles=None, include=None, exclude=None):
50%      2.0
75%      2.5
max      3.0
+dtype: float64

Describing a categorical ``Series``.

@@ -7315,18 +7316,18 @@ def describe(self, percentiles=None, include=None, exclude=None):
Describing all columns of a ``DataFrame`` regardless of data type.

>>> df.describe(include='all')
-        categorical  numeric  object
-count             3      3.0       3
-unique            3      NaN       3
-top               f      NaN       c
-freq              1      NaN       1
-mean            NaN      2.0     NaN
-std             NaN      1.0     NaN

Contributor

Is there a reason these are being changed? They are already in alphabetical order. I suppose you could supply columns on construction to guarantee the order.

Contributor Author

Thanks for the suggestion. Updated it with your recommendation.

-min             NaN      1.0     NaN
-25%             NaN      1.5     NaN
-50%             NaN      2.0     NaN
-75%             NaN      2.5     NaN
-max             NaN      3.0     NaN
+        object  numeric  categorical
+count        3      3.0            3
+unique       3      NaN            3
+top          c      NaN            f
+freq         1      NaN            1
+mean       NaN      2.0          NaN
+std        NaN      1.0          NaN
+min        NaN      1.0          NaN
+25%        NaN      1.5          NaN
+50%        NaN      2.0          NaN
+75%        NaN      2.5          NaN
+max        NaN      3.0          NaN
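
The review suggestion above, supplying `columns` on construction so the column order in the output is guaranteed, could look roughly like the sketch below. The cell values are assumptions chosen to be consistent with the statistics shown in the docstring (mean 2.0, std 1.0); the exact frame used in the PR is not visible in this diff.

```python
import pandas as pd

# Hypothetical data: the values are assumed, not copied from the PR.
data = {
    "categorical": pd.Categorical(["d", "e", "f"]),
    "numeric": [1, 2, 3],
    "object": ["a", "b", "c"],
}

# Passing `columns` fixes the column order explicitly, independent of
# how the dict keys happen to be ordered.
df = pd.DataFrame(data, columns=["object", "numeric", "categorical"])

# describe() keeps the columns in the frame's own order.
summary = df.describe(include="all")
print(list(summary.columns))
```

Pinning the order this way keeps doctest output stable on Python versions where dict ordering was not guaranteed.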

Describing a column from a ``DataFrame`` by accessing it as
an attribute.
@@ -7376,36 +7377,36 @@ def describe(self, percentiles=None, include=None, exclude=None):
Excluding numeric columns from a ``DataFrame`` description.

>>> df.describe(exclude=[np.number])
-        categorical  object
-count             3       3
-unique            3       3
-top               f       c
-freq              1       1
+        object  categorical
+count        3            3
+unique       3            3
+top          c            f
+freq         1            1

Excluding object columns from a ``DataFrame`` description.

>>> df.describe(exclude=[np.object])
-        categorical  numeric
-count             3      3.0
-unique            3      NaN
-top               f      NaN
-freq              1      NaN
-mean            NaN      2.0
-std             NaN      1.0
-min             NaN      1.0
-25%             NaN      1.5
-50%             NaN      2.0
-75%             NaN      2.5
-max             NaN      3.0
+        numeric  categorical
+count       3.0            3
+unique      NaN            3
+top         NaN            f
+freq        NaN            1
+mean        2.0          NaN
+std         1.0          NaN
+min         1.0          NaN
+25%         1.5          NaN
+50%         2.0          NaN
+75%         2.5          NaN
+max         3.0          NaN
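
For readers trying the two `exclude` examples above on a current install, here is a minimal sketch. It assumes the same hypothetical frame as the docstring (the values are guesses consistent with the output) and uses the builtin `object` in place of `np.object`, since that alias was deprecated in NumPy 1.20 and removed in 1.24.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the object column is given dtype=object explicitly
# so the example does not depend on pandas' default string inference.
df = pd.DataFrame({
    "object": pd.Series(["a", "b", "c"], dtype=object),
    "numeric": [1, 2, 3],
    "categorical": pd.Categorical(["d", "e", "f"]),
})

# Drop numeric columns: only count/unique/top/freq rows remain.
no_numeric = df.describe(exclude=[np.number])

# Drop object columns, using the builtin `object` rather than `np.object`.
no_object = df.describe(exclude=[object])

print(list(no_numeric.columns))
print(list(no_object.columns))
```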

See Also
--------
-DataFrame.count
-DataFrame.max
-DataFrame.min
-DataFrame.mean
-DataFrame.std
-DataFrame.select_dtypes
+DataFrame.count : Count number of non-NA/null observations

Member

See Also goes before Examples

Contributor Author

Fixed, thanks.

+DataFrame.max : Maximum of the values in the object
+DataFrame.min : Minimum of the values in the object
+DataFrame.mean : Mean of the values
+DataFrame.std : Standard deviation of the observations
+DataFrame.select_dtypes : Subset of a DataFrame including/excluding columns based on their dtype
"""
if self.ndim >= 3:
msg = "describe is not implemented on Panel objects."