Commit 3716ad2

User guide: Describe Categorical.(to|from)_dummies
1 parent edacb70 commit 3716ad2

2 files changed: +47 -0 lines changed

doc/source/user_guide/categorical.rst

+42
@@ -129,6 +129,48 @@ This conversion is likewise done column by column:
     df_cat['A']
     df_cat['B']
 
+Dummy / indicator / one-hot encoded variables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Some operations, like regression and classification,
+encode a single categorical variable as a column for each category,
+with each row having ``True`` in exactly one column and ``False`` in all others.
+These are called dummy variables, or a one-hot encoding.
+A :class:`pandas.Categorical` can easily be converted to and from such an encoding:
+
+.. ipython:: python
+
+    cat = pd.Categorical(["a", "b", "b", "c"])
+    cat
+
+    dummies = cat.to_dummies()
+    dummies
+
+    pd.Categorical.from_dummies(dummies)
+
+The :meth:`pandas.Categorical.from_dummies` class method accepts a dataframe
+whose dtypes are coercible to boolean, and an ``ordered`` argument
+for whether the resulting ``Categorical`` should be considered ordered
+(like the ``Categorical`` constructor).
+A column whose label is NA will be ignored.
+Any row which is entirely falsy, or has a missing value,
+will be uncategorised (``NaN``) in the result.
+
+:meth:`pandas.Categorical.to_dummies` produces a boolean dataframe of dummy variables.
+If the ``na_column`` argument is ``None`` (default),
+missing items will result in a row of ``False``.
+Otherwise, the value of ``na_column`` will be used as the label
+of an extra column representing these items:
+
+.. ipython:: python
+
+    cat = pd.Categorical(["a", "b", np.nan])
+    cat.to_dummies(na_column="other")
+
+For more control over data types and column names,
+see :func:`pandas.get_dummies`.
+
+.. versionadded:: 1.1.0
 
 Controlling behavior
 ~~~~~~~~~~~~~~~~~~~~
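
Note on the hunk above: ``Categorical.to_dummies`` and ``Categorical.from_dummies`` are introduced by this commit and may not exist in the pandas version you have installed. The following is a minimal sketch of the same round trip using only ``pandas.get_dummies`` and ``Categorical.from_codes``. The variable names and the decoding approach are assumptions for illustration, not the implementation in this commit.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "b", "c"])

# Encode: one boolean column per category, exactly one True per row.
dummies = pd.get_dummies(cat, dtype=bool)

# Decode: the position of the True in each row is the category code.
values = dummies.to_numpy()
codes = values.argmax(axis=1)

# A row with no True at all has no category; -1 is the code that
# Categorical uses for a missing value.
codes[~values.any(axis=1)] = -1

restored = pd.Categorical.from_codes(codes, categories=list(dummies.columns))
print(restored)
```

Reading the column position of the single ``True`` as the category code only works because the encoding is one-hot; a row with several ``True`` values would silently decode to the first one in this sketch.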

doc/source/user_guide/reshaping.rst

+5
@@ -665,6 +665,11 @@ To choose another dtype, use the ``dtype`` argument:
 
 .. versionadded:: 0.23.0
 
+For converting :class:`pandas.Categorical` objects directly
+to and from ``DataFrame`` objects of dummy variables, see
+:meth:`pandas.Categorical.to_dummies` and :meth:`pandas.Categorical.from_dummies`.
+
+.. versionadded:: 1.1.0
 
 .. _reshaping.factorize:
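
For comparison with the ``na_column`` behaviour described in the categorical.rst hunk, here is a hedged sketch of how missing values look with ``pandas.get_dummies`` itself, through its ``dummy_na`` argument. The ``"other"`` relabelling at the end is only an illustration that mirrors the example in the diff; it is not a ``get_dummies`` feature.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["a", "b", np.nan])

# Default: the missing value becomes a row with no True in it.
plain = pd.get_dummies(cat, dtype=bool)

# dummy_na=True appends one extra column, labelled NaN, that flags
# the missing values.
with_na = pd.get_dummies(cat, dummy_na=True, dtype=bool)

# Illustrative only: relabel the NaN column so the frame resembles
# the na_column="other" example from the diff.
labelled = with_na.copy()
labelled.columns = ["a", "b", "other"]
print(labelled)
```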
