Commit 4b0c80b

DOC: Clarifying use of categorical data in describe docstring (pandas-dev#16722)
1 parent 50c1dda commit 4b0c80b

1 file changed

+58
-41
lines changed

pandas/core/generic.py

+58-41
@@ -6352,20 +6352,22 @@ def describe(self, percentiles=None, include=None, exclude=None):
             - A list-like of dtypes : Limits the results to the
               provided data types.
               To limit the result to numeric types submit
-              ``numpy.number``. To limit it instead to categorical
-              objects submit the ``numpy.object`` data type. Strings
+              ``numpy.number``. To limit it instead to object columns submit
+              the ``numpy.object`` data type. Strings
               can also be used in the style of
-              ``select_dtypes`` (e.g. ``df.describe(include=['O'])``)
+              ``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
+              select pandas categorical columns, use ``'category'``
             - None (default) : The result will include all numeric columns.
         exclude : list-like of dtypes or None (default), optional,
             A black list of data types to omit from the result. Ignored
             for ``Series``. Here are the options:

             - A list-like of dtypes : Excludes the provided data types
-              from the result. To select numeric types submit
-              ``numpy.number``. To select categorical objects submit the data
+              from the result. To exclude numeric types submit
+              ``numpy.number``. To exclude object columns submit the data
               type ``numpy.object``. Strings can also be used in the style of
-              ``select_dtypes`` (e.g. ``df.describe(include=['O'])``)
+              ``select_dtypes`` (e.g. ``df.describe(include=['O'])``). To
+              exclude pandas categorical columns, use ``'category'``
             - None (default) : The result will exclude nothing.

         Returns
@@ -6390,9 +6392,11 @@ def describe(self, percentiles=None, include=None, exclude=None):
         among those with the highest count.

         For mixed data types provided via a ``DataFrame``, the default is to
-        return only an analysis of numeric columns. If ``include='all'``
-        is provided as an option, the result will include a union of
-        attributes of each type.
+        return only an analysis of numeric columns. If the dataframe consists
+        only of object and categorical data without any numeric columns, the
+        default is to return an analysis of both the object and categorical
+        columns. If ``include='all'`` is provided as an option, the result
+        will include a union of attributes of each type.

         The `include` and `exclude` parameters can be used to limit
         which columns in a ``DataFrame`` are analyzed for the output.
@@ -6442,8 +6446,10 @@ def describe(self, percentiles=None, include=None, exclude=None):
         Describing a ``DataFrame``. By default only numeric fields
         are returned.

-        >>> df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
-        ...                   columns=['numeric', 'object'])
+        >>> df = pd.DataFrame({ 'object': ['a', 'b', 'c'],
+        ...                     'numeric': [1, 2, 3],
+        ...                     'categorical': pd.Categorical(['d','e','f'])
+        ...                   })
         >>> df.describe()
                numeric
         count      3.0
@@ -6457,19 +6463,19 @@ def describe(self, percentiles=None, include=None, exclude=None):

         Describing all columns of a ``DataFrame`` regardless of data type.

-        >>> df.describe(include='all')
-                numeric object
-        count       3.0      3
-        unique      NaN      3
-        top         NaN      b
-        freq        NaN      1
-        mean        2.0    NaN
-        std         1.0    NaN
-        min         1.0    NaN
-        25%         1.5    NaN
-        50%         2.0    NaN
-        75%         2.5    NaN
-        max         3.0    NaN
+        >>> df.describe(include='all')
+                categorical  numeric object
+        count             3      3.0      3
+        unique            3      NaN      3
+        top               f      NaN      c
+        freq              1      NaN      1
+        mean            NaN      2.0    NaN
+        std             NaN      1.0    NaN
+        min             NaN      1.0    NaN
+        25%             NaN      1.5    NaN
+        50%             NaN      2.0    NaN
+        75%             NaN      2.5    NaN
+        max             NaN      3.0    NaN

         Describing a column from a ``DataFrame`` by accessing it as
         an attribute.
@@ -6483,7 +6489,6 @@ def describe(self, percentiles=None, include=None, exclude=None):
         50%      2.0
         75%      2.5
         max      3.0
-        Name: numeric, dtype: float64

         Including only numeric columns in a ``DataFrame`` description.

@@ -6504,31 +6509,43 @@ def describe(self, percentiles=None, include=None, exclude=None):
                object
         count       3
         unique      3
-        top         b
+        top         c
         freq        1

+        Including only categorical columns from a ``DataFrame`` description.
+
+        >>> df.describe(include=['category'])
+               categorical
+        count            3
+        unique           3
+        top              f
+        freq             1
+
         Excluding numeric columns from a ``DataFrame`` description.

         >>> df.describe(exclude=[np.number])
-               object
-        count       3
-        unique      3
-        top         b
-        freq        1
+               categorical object
+        count            3      3
+        unique           3      3
+        top              f      c
+        freq             1      1

         Excluding object columns from a ``DataFrame`` description.

         >>> df.describe(exclude=[np.object])
-               numeric
-        count      3.0
-        mean       2.0
-        std        1.0
-        min        1.0
-        25%        1.5
-        50%        2.0
-        75%        2.5
-        max        3.0
-
+               categorical  numeric
+        count            3      3.0
+        unique           3      NaN
+        top              f      NaN
+        freq             1      NaN
+        mean           NaN      2.0
+        std            NaN      1.0
+        min            NaN      1.0
+        25%            NaN      1.5
+        50%            NaN      2.0
+        75%            NaN      2.5
+        max            NaN      3.0
+
         See Also
         --------
         DataFrame.count
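
The distinction this docstring change draws between object columns and pandas categorical columns can be checked with a short script. The sketch below is not part of the commit; it is a minimal illustration under a few assumptions. It uses the builtin `object` in place of `np.object` (an alias that NumPy has since removed), and it builds the object column with an explicit `dtype=object` so that the column stays object-typed even on pandas versions that infer a dedicated string dtype. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Same three columns as the docstring example. The explicit dtype=object is an
# assumption made here so the column is object-typed on any pandas version.
df = pd.DataFrame({
    "object": pd.Series(["a", "b", "c"], dtype=object),
    "numeric": [1, 2, 3],
    "categorical": pd.Categorical(["d", "e", "f"]),
})

default = df.describe()                             # numeric columns only
only_object = df.describe(include=[object])         # object columns only
only_category = df.describe(include=["category"])   # pandas categoricals only
non_numeric = df.describe(exclude=[np.number])      # object and categorical

# With no numeric columns present, the default falls back to describing
# the object and categorical columns together.
fallback = df.drop(columns="numeric").describe()

for label, result in [
    ("default", default),
    ("include=[object]", only_object),
    ("include=['category']", only_category),
    ("exclude=[np.number]", non_numeric),
    ("no numeric columns", fallback),
]:
    print(label, "->", list(result.columns))
```

The script prints only column names rather than full tables, because the `top` value for columns with tied frequencies is chosen arbitrarily and column ordering has varied between pandas versions.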
