Commit b385799

gfyoung authored and jreback committed
DOC: Clarify Categorical Crosstab Behaviour
Follow-on to #13073 by explaining the `Categorical` behaviour in the documentation.

Author: gfyoung <[email protected]>

Closes #13177 from gfyoung/crosstab-categorical-explain and squashes the following commits:

11ebb94 [gfyoung] DOC: Clarify Categorical Crosstab Behaviour
1 parent feee089 commit b385799

2 files changed: +25 -1 lines changed


doc/source/reshaping.rst

+10
@@ -445,6 +445,16 @@ If ``crosstab`` receives only two Series, it will provide a frequency table.
 
     pd.crosstab(df.A, df.B)
 
+Any input passed containing ``Categorical`` data will have **all** of its
+categories included in the cross-tabulation, even if the actual data does
+not contain any instances of a particular category.
+
+.. ipython:: python
+
+    foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
+    bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
+    pd.crosstab(foo, bar)
+
 Normalization
 ~~~~~~~~~~~~~
 

pandas/tools/pivot.py

+15 -1
@@ -410,7 +410,11 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,
     Notes
     -----
     Any Series passed will have their name attributes used unless row or column
-    names for the cross-tabulation are specified
+    names for the cross-tabulation are specified.
+
+    Any input passed containing Categorical data will have **all** of its
+    categories included in the cross-tabulation, even if the actual data does
+    not contain any instances of a particular category.
 
     In the event that there aren't overlapping indexes an empty DataFrame will
     be returned.
@@ -434,6 +438,16 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,
     bar 1 2 1 0
     foo 2 2 1 2
 
+    >>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
+    >>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
+    >>> crosstab(foo, bar) # 'c' and 'f' are not represented in the data,
+                           # but they still will be counted in the output
+    col_0  d  e  f
+    row_0
+    a      1  0  0
+    b      0  1  0
+    c      0  0  0
+
     Returns
     -------
     crosstab : DataFrame
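
The behaviour this commit documents can be illustrated without pandas. The sketch below is a plain-Python analogue, not the pandas implementation: the helper name `crosstab_all_categories` and its parameters are hypothetical and chosen only for illustration. It counts co-occurrences and then builds the table from the declared categories rather than the observed values, which is why `'c'` and `'f'` appear with zero counts.

```python
from collections import Counter


def crosstab_all_categories(rows, cols, row_categories, col_categories):
    """Count (row, col) pairs, reporting every declared category.

    Hypothetical helper for illustration only; not part of pandas.
    """
    counts = Counter(zip(rows, cols))
    # Iterate over the declared categories, not the observed values, so that
    # categories absent from the data still get a row or column of zeros.
    return {
        r: {c: counts[(r, c)] for c in col_categories}
        for r in row_categories
    }


# Same data as the docstring example: 'c' and 'f' are declared but never occur.
table = crosstab_all_categories(
    rows=['a', 'b'],
    cols=['d', 'e'],
    row_categories=['a', 'b', 'c'],
    col_categories=['d', 'e', 'f'],
)
for label, row in table.items():
    print(label, row)
```

The resulting nested dict has one entry per declared row category and one key per declared column category, mirroring the 3x3 table shown in the docstring example above.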
