@@ -129,6 +129,48 @@ This conversion is likewise done column by column:
    df_cat['A']
    df_cat['B']

+ Dummy / indicator / one-hot encoded variables
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ Some operations, like regression and classification,
+ encode a single categorical variable as one column per category,
+ with each row being ``True`` in exactly one column and ``False`` in all others.
+ These columns are called dummy variables, and the scheme is called one-hot encoding.
+ A :class:`pandas.Categorical` can easily be converted to and from such an encoding:
+
+ .. ipython:: python
+
+     cat = pd.Categorical(["a", "b", "b", "c"])
+     cat
+
+     dummies = cat.to_dummies()
+     dummies
+
+     pd.Categorical.from_dummies(dummies)
+
+ The :meth:`pandas.Categorical.from_dummies` class method accepts a dataframe
+ whose dtypes are coercible to boolean, and an ``ordered`` argument
+ that sets whether the resulting ``Categorical`` is treated as ordered
+ (as in the ``Categorical`` constructor).
+ A column whose label is NA is ignored.
+ Any row that is entirely falsy, or that contains a missing value,
+ is left uncategorised (missing) in the result.
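The rules above can be sketched with existing pandas and NumPy calls. This is only an illustration under stated assumptions, not the implementation in this change: ``categorical_from_dummies`` is a hypothetical helper name, and taking the first truthy column when a row has more than one is an assumption the text does not specify.

```python
import numpy as np
import pandas as pd


def categorical_from_dummies(dummies, ordered=False):
    """Hypothetical helper (not a pandas API): rebuild a Categorical
    from a dataframe of dummy columns, following the rules described above."""
    # Ignore any column whose label is NA.
    dummies = dummies.loc[:, dummies.columns.notna()]

    values = dummies.to_numpy(dtype=object)
    missing = pd.isna(values)

    # Coerce the non-missing entries to boolean; missing entries count as False.
    mask = np.where(missing, False, values).astype(bool)

    # A row is categorised only if it has a truthy entry and no missing value.
    # Assumption: if several columns are truthy, the first one wins.
    valid = mask.any(axis=1) & ~missing.any(axis=1)
    codes = np.where(valid, mask.argmax(axis=1), -1)

    # Code -1 marks an uncategorised (missing) element.
    return pd.Categorical.from_codes(
        codes, categories=dummies.columns, ordered=ordered
    )


example = pd.DataFrame(
    {"a": [True, False, False], "b": [False, True, False]}
)
print(categorical_from_dummies(example))
```

The last row of ``example`` is entirely ``False``, so it maps to code ``-1`` and is shown as missing.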
+
+ :meth:`pandas.Categorical.to_dummies` produces a boolean dataframe of dummy variables.
+ If the ``na_column`` argument is ``None`` (the default),
+ each missing item results in a row of all ``False``.
+ Otherwise, the value of ``na_column`` is used as the label
+ of an extra column that marks these items:
+
+ .. ipython:: python
+
+     cat = pd.Categorical(["a", "b", np.nan])
+     cat.to_dummies(na_column="other")
+
+ For more control over data types and column names,
+ see :func:`pandas.get_dummies`.
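As a rough illustration of that extra control, the sketch below uses the existing :func:`pandas.get_dummies` parameters ``prefix``, ``dummy_na`` and ``dtype``. The series contents and the ``cat`` prefix are made up for the example.

```python
import numpy as np
import pandas as pd

# A categorical series with one missing value (illustrative data).
s = pd.Series(["a", "b", np.nan], dtype="category")

# prefix controls the column names, dummy_na adds a column for missing
# values, and dtype sets the data type of the resulting columns.
dummies = pd.get_dummies(s, prefix="cat", dummy_na=True, dtype=bool)
print(dummies)
```

With ``dummy_na=True`` the missing item gets its own indicator column instead of a row of all ``False``.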
+
+ .. versionadded:: 1.1.0

Controlling behavior
~~~~~~~~~~~~~~~~~~~~