@@ -89,12 +89,22 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
     df["B"] = raw_cat
     df

-You can also specify differently ordered categories or make the resulting data ordered, by passing these arguments to ``astype()``:
+Wherever we passed ``dtype='category'`` above, we used the default behavior:
+
+1. categories are inferred from the data.
+2. categories are unordered.
+
+To control those behaviors, instead of passing ``'category'``, use an instance
+of :class:`~pandas.api.types.CategoricalDtype`.

 .. ipython:: python

-    s = pd.Series(["a","b","c","a"])
-    s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
+    from pandas.api.types import CategoricalDtype
+
+    s = pd.Series(["a", "b", "c", "a"])
+    cat_type = CategoricalDtype(categories=["b", "c", "d"],
+                                ordered=True)
+    s_cat = s.astype(cat_type)
     s_cat


 Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
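To make the two behaviors concrete, here is a minimal sketch (assuming pandas 0.21.0 or later, where ``pandas.api.types.CategoricalDtype`` exists) that converts the same Series once with the string ``'category'`` and once with an explicit dtype. With the explicit dtype the categories are fixed in advance, so the value ``"a"``, which is not among them, becomes missing.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "c", "a"])

# Default behavior: categories inferred from the data, unordered
s_default = s.astype("category")

# Explicit dtype: categories and ordering fixed up front.
# "a" is not one of the declared categories, so it becomes missing (NaN).
cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=True)
s_cat = s.astype(cat_type)

print(list(s_default.cat.categories), s_default.cat.ordered)
print(list(s_cat.cat.categories), s_cat.cat.ordered)
# The integer codes use -1 for missing values
print(s_cat.cat.codes.tolist())
```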
@@ -133,6 +143,75 @@ constructor to save the factorize step during normal constructor mode:
     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))

+.. _categorical.categoricaldtype:
+
+CategoricalDtype
+----------------
+
+.. versionchanged:: 0.21.0
+
+A categorical's type is fully described by:
+
+1. ``categories``: a sequence of unique values with no missing values
+2. ``ordered``: a boolean
+
+This information can be stored in a :class:`~pandas.api.types.CategoricalDtype`.
+The ``categories`` argument is optional, which implies that the actual categories
+should be inferred from whatever is present in the data when the
+:class:`pandas.Categorical` is created. The categories are assumed to be unordered
+by default.
+
+.. ipython:: python
+
+    from pandas.api.types import CategoricalDtype
+
+    CategoricalDtype(['a', 'b', 'c'])
+    CategoricalDtype(['a', 'b', 'c'], ordered=True)
+    CategoricalDtype()
+
+A :class:`~pandas.api.types.CategoricalDtype` can be used in any place pandas
+expects a ``dtype``, for example :func:`pandas.read_csv`,
+:func:`pandas.DataFrame.astype`, or in the Series constructor.
+
+.. note::
+
+   As a convenience, you can use the string ``'category'`` in place of a
+   :class:`~pandas.api.types.CategoricalDtype` when you want the default behavior of
+   the categories being unordered, and equal to the set of values present in the
+   array. In other words, ``dtype='category'`` is equivalent to
+   ``dtype=CategoricalDtype()``.
+
+Equality Semantics
+~~~~~~~~~~~~~~~~~~
+
+Two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal
+whenever they have the same categories and orderedness. When comparing two
+unordered categoricals, the order of the ``categories`` is not considered.
+
+.. ipython:: python
+
+    c1 = CategoricalDtype(['a', 'b', 'c'], ordered=False)
+
+    # Equal, since order is not considered when ordered=False
+    c1 == CategoricalDtype(['b', 'c', 'a'], ordered=False)
+
+    # Unequal, since the second CategoricalDtype is ordered
+    c1 == CategoricalDtype(['a', 'b', 'c'], ordered=True)
+
+All instances of ``CategoricalDtype`` compare equal to the string ``'category'``.
+
+.. ipython:: python
+
+    c1 == 'category'
+
+.. warning::
+
+   Since ``dtype='category'`` is essentially ``CategoricalDtype(None, False)``,
+   and since all instances of ``CategoricalDtype`` compare equal to ``'category'``,
+   all instances of ``CategoricalDtype`` compare equal to a
+   ``CategoricalDtype(None, False)``, regardless of ``categories`` or
+   ``ordered``.
+
 Description
 -----------

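The following sketch (assuming pandas 0.21.0 or later; the dtype name ``level_type`` and the tiny inline CSV are illustrative only) reuses one ``CategoricalDtype`` in the three places listed above, then checks the equality rules from this section using dtypes with explicit categories.

```python
import io

import pandas as pd
from pandas.api.types import CategoricalDtype

# One dtype object, reused wherever pandas accepts a dtype
level_type = CategoricalDtype(["low", "mid", "high"], ordered=True)

# 1. pandas.read_csv: per-column dtype mapping
csv_text = "level\nmid\nhigh\nlow\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"level": level_type})

# 2. The Series constructor
from_ctor = pd.Series(["low", "high"], dtype=level_type)

# 3. astype
from_astype = pd.Series(["high", "mid"]).astype(level_type)

# Equality semantics
c1 = CategoricalDtype(["a", "b", "c"], ordered=False)
print(c1 == CategoricalDtype(["b", "c", "a"], ordered=False))  # order ignored
print(c1 == CategoricalDtype(["a", "b", "c"], ordered=True))   # orderedness differs
print(c1 == "category")
# When both are ordered, the order of the categories matters
print(CategoricalDtype(["a", "b", "c"], ordered=True)
      == CategoricalDtype(["c", "b", "a"], ordered=True))
```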
@@ -146,6 +225,8 @@ Using ``.describe()`` on categorical data will produce similar output to a `Seri
     df.describe()
     df["cat"].describe()

+.. _categorical.cat:
+
 Working with categories
 -----------------------

@@ -182,7 +263,7 @@ It's also possible to pass in the categories in a specific order:

 .. ipython:: python

-    s = pd.Series(list('babc')).astype('category', categories=list('abcd'))
+    s = pd.Series(list('babc')).astype(CategoricalDtype(list('abcd')))
     s

     # categories
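Passing the categories explicitly also keeps categories that never occur in the data. A minimal sketch (assuming pandas 0.21.0 or later; ``values``, ``inferred`` and ``explicit`` are illustrative names) comparing inferred and explicit categories for the same values:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

values = pd.Series(list("babc"))

# Inferred: only the values that actually occur become categories
inferred = values.astype("category")

# Explicit: "d" is kept as a category even though it never occurs
explicit = values.astype(CategoricalDtype(list("abcd")))

print(list(inferred.cat.categories))
print(list(explicit.cat.categories))

# value_counts on a categorical reports every category,
# so the unused "d" shows up with a count of 0
counts = explicit.value_counts()
print(counts)
```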
@@ -204,6 +285,10 @@ by using the :func:`Categorical.rename_categories` method:
     s.cat.categories = ["Group %s" % g for g in s.cat.categories]
     s
-    s.cat.rename_categories([1,2,3])
+    s = s.cat.rename_categories([1,2,3])
+    s
+    # You can also pass a dict-like object to map the renaming
+    s = s.cat.rename_categories({1: 'x', 2: 'y', 3: 'z'})
+    s

 .. note::

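By default, ``rename_categories`` returns a new object instead of modifying the Series in place, so a dict-like mapping only has an effect when its keys match the categories of the object it is called on. A minimal standalone sketch (assuming a pandas version that accepts a dict-like argument here, as the hunk above does; ``renamed``, ``mapped`` and ``partial`` are illustrative names):

```python
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# rename_categories returns a new Series; `s` itself is left unchanged
renamed = s.cat.rename_categories([1, 2, 3])

# Dict-like renaming maps old category -> new category,
# so its keys must match the *current* categories of `renamed`
mapped = renamed.cat.rename_categories({1: "x", 2: "y", 3: "z"})

# Categories missing from the mapping are passed through unchanged,
# and keys that are not existing categories are ignored
partial = s.cat.rename_categories({"a": "A", "zzz": "unused"})

print(list(s.cat.categories))
print(mapped.tolist())
print(list(partial.cat.categories))
```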
@@ -295,7 +380,9 @@ meaning and certain operations are possible. If the categorical is unordered, ``

     s = pd.Series(pd.Categorical(["a","b","c","a"], ordered=False))
     s.sort_values(inplace=True)
-    s = pd.Series(["a","b","c","a"]).astype('category', ordered=True)
+    s = pd.Series(["a","b","c","a"]).astype(
+        CategoricalDtype(ordered=True)
+    )
     s.sort_values(inplace=True)
     s
     s.min(), s.max()
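A minimal sketch of the ordered/unordered distinction (assuming pandas 0.21.0 or later; the size labels are made-up data): with an explicit, non-alphabetical category order, sorting and ``min``/``max`` follow that order; ``CategoricalDtype(ordered=True)`` without categories infers sorted categories from the data; and ``min()`` on an unordered categorical raises ``TypeError``.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Explicit, non-alphabetical order: small < medium < large
size_type = CategoricalDtype(["small", "medium", "large"], ordered=True)
sizes = pd.Series(["large", "small", "medium", "small"]).astype(size_type)

# Sorting and min/max follow the category order, not alphabetical order
sorted_sizes = sizes.sort_values()
print(sorted_sizes.tolist())
print(sizes.min(), sizes.max())

# ordered=True without categories: categories are inferred (sorted) from the data
inferred = pd.Series(["b", "a", "c"]).astype(CategoricalDtype(ordered=True))
print(list(inferred.cat.categories), inferred.cat.ordered)

# min()/max() are not defined for an unordered categorical
unordered = pd.Series(["b", "a", "c"]).astype("category")
try:
    unordered.min()
    min_raised = False
except TypeError:
    min_raised = True
print(min_raised)
```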
@@ -395,9 +482,15 @@ categories or a categorical with any list-like object, will raise a TypeError.

 .. ipython:: python

-    cat = pd.Series([1,2,3]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base = pd.Series([2,2,2]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base2 = pd.Series([2,2,2]).astype("category", ordered=True)
+    cat = pd.Series([1,2,3]).astype(
+        CategoricalDtype([3, 2, 1], ordered=True)
+    )
+    cat_base = pd.Series([2,2,2]).astype(
+        CategoricalDtype([3, 2, 1], ordered=True)
+    )
+    cat_base2 = pd.Series([2,2,2]).astype(
+        CategoricalDtype(ordered=True)
+    )

     cat
     cat_base
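The comparison rules can be checked with the same three Series. A minimal sketch (assuming pandas 0.21.0 or later; ``rev_type`` and ``compare_raised`` are illustrative names): comparisons follow each value's position in the ordered categories ``[3, 2, 1]``, so ``1`` is the largest value, and comparing against ``cat_base2``, whose inferred categories are only ``[2]``, raises ``TypeError``.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Reversed, ordered categories: 3 < 2 < 1
rev_type = CategoricalDtype([3, 2, 1], ordered=True)
cat = pd.Series([1, 2, 3]).astype(rev_type)
cat_base = pd.Series([2, 2, 2]).astype(rev_type)
# Categories inferred from the data: just [2]
cat_base2 = pd.Series([2, 2, 2]).astype(CategoricalDtype(ordered=True))

# Same categories and order: element-wise comparison by category position
gt_series = cat > cat_base
# Comparison with a scalar that is one of the categories
gt_scalar = cat > 2
print(gt_series.tolist(), gt_scalar.tolist())

# Different categories: the comparison raises TypeError
try:
    _ = cat > cat_base2
    compare_raised = False
except TypeError:
    compare_raised = True
print(compare_raised)
```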