@@ -96,12 +96,20 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
     df["B"] = raw_cat
     df
 
-You can also specify differently ordered categories or make the resulting data ordered, by passing these arguments to ``astype()``:
+Wherever we passed ``dtype='category'`` above, we used the default behavior:
+
+1. Categories are inferred from the data.
+2. Categories are unordered.
+
+To control those behaviors, instead of passing ``'category'``, use an instance
+of :class:`~pandas.api.types.CategoricalDtype`.
 
 .. ipython:: python
 
-    s = pd.Series(["a","b","c","a"])
-    s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
+    s = pd.Series(["a", "b", "c", "a"])
+    cat_type = pd.api.types.CategoricalDtype(categories=["b", "c", "d"],
+                                             ordered=False)
+    s_cat = s.astype(cat_type)
     s_cat
 
 Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
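The replacement ``astype`` call in this hunk can be checked with a small standalone sketch. It assumes a pandas version that exposes ``pandas.api.types.CategoricalDtype`` (0.21.0 or later), and the variable names simply mirror the diff. Note that values missing from the explicit categories (here ``"a"``) become missing values, while the unused category ``"d"`` is kept.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "c", "a"])

# Explicit categories and orderedness instead of the 'category' string.
cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=False)
s_cat = s.astype(cat_type)

# The categories come from the dtype, not from the data.
print(s_cat.cat.categories.tolist())
# Entries whose value is not among the categories are missing.
print(s_cat.isna().tolist())
print(s_cat.cat.ordered)
```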
@@ -140,6 +148,62 @@ constructor to save the factorize step during normal constructor mode:
     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
 
+CategoricalDtype
+----------------
+
+.. versionchanged:: 0.21.0
+
+A categorical's type is fully described by (1) its categories (an iterable with
+unique values and no missing values), and (2) its orderedness (a boolean).
+This information can be stored in a :class:`~pandas.api.types.CategoricalDtype`.
+The ``categories`` argument is optional, which implies that the actual categories
+should be inferred from whatever is present in the data when the
+:class:`pandas.Categorical` is created.
+
+.. ipython:: python
+
+    pd.api.types.CategoricalDtype(['a', 'b', 'c'])
+    pd.api.types.CategoricalDtype(['a', 'b', 'c'], ordered=True)
+    pd.api.types.CategoricalDtype()
+
+A :class:`~pandas.api.types.CategoricalDtype` can be used in any place pandas
+expects a ``dtype``, for example :func:`pandas.read_csv`,
+:meth:`pandas.DataFrame.astype`, or the ``Series`` constructor.
+
+As a convenience, you can use the string ``'category'`` in place of a
+:class:`~pandas.api.types.CategoricalDtype` when you want the default behavior of
+the categories being unordered, and equal to the set of values present in the
+array. In other words, ``dtype='category'`` is equivalent to
+``dtype=pd.api.types.CategoricalDtype()``.
+
+Equality Semantics
+~~~~~~~~~~~~~~~~~~
+
+Two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal whenever they have
+the same categories and orderedness. When comparing two unordered categoricals, the
+order of the ``categories`` is not considered.
+
+.. ipython:: python
+
+    c1 = pd.api.types.CategoricalDtype(['a', 'b', 'c'], ordered=False)
+    # Equal, since order is not considered when ordered=False
+    c1 == pd.api.types.CategoricalDtype(['b', 'c', 'a'], ordered=False)
+    # Unequal, since the second CategoricalDtype is ordered
+    c1 == pd.api.types.CategoricalDtype(['a', 'b', 'c'], ordered=True)
+
+All instances of ``CategoricalDtype`` compare equal to the string ``'category'``.
+
+.. ipython:: python
+
+    c1 == 'category'
+
+
+.. warning::
+
+   Since ``dtype='category'`` is essentially ``CategoricalDtype(None, False)``,
+   and since all instances of ``CategoricalDtype`` compare equal to ``'category'``,
+   all instances of ``CategoricalDtype`` compare equal to a ``CategoricalDtype(None)``.
+
 Description
 -----------
 
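The equality rules documented in this hunk can be exercised outside the docs build with a short sketch. The variable names are illustrative and not part of the patch. The comparison against ``CategoricalDtype(None)`` described in the warning is deliberately left out, on the assumption that its result may differ across pandas versions.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

c1 = CategoricalDtype(["a", "b", "c"], ordered=False)

# Unordered dtypes: the order of the categories is ignored.
same_unordered = c1 == CategoricalDtype(["b", "c", "a"], ordered=False)

# Same categories, different orderedness: not equal.
ordered_mismatch = c1 == CategoricalDtype(["a", "b", "c"], ordered=True)

# Ordered dtypes: the order of the categories matters.
o1 = CategoricalDtype(["a", "b", "c"], ordered=True)
reordered = o1 == CategoricalDtype(["c", "b", "a"], ordered=True)

# Any CategoricalDtype compares equal to the string 'category'.
matches_string = c1 == "category"

# dtype='category' and a bare CategoricalDtype() give the same result:
# categories inferred from the data, unordered.
from_string = pd.Series(list("abca"), dtype="category")
from_dtype = pd.Series(list("abca"), dtype=CategoricalDtype())

print(same_unordered, ordered_mismatch, reordered, matches_string)
print(from_string.dtype == from_dtype.dtype)
```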
@@ -189,7 +253,9 @@ It's also possible to pass in the categories in a specific order:
 
 .. ipython:: python
 
-    s = pd.Series(list('babc')).astype('category', categories=list('abcd'))
+    s = pd.Series(list('babc')).astype(
+        pd.api.types.CategoricalDtype(list('abcd'))
+    )
     s
 
     # categories
@@ -306,7 +372,9 @@ meaning and certain operations are possible. If the categorical is unordered, ``
 
     s = pd.Series(pd.Categorical(["a","b","c","a"], ordered=False))
     s.sort_values(inplace=True)
-    s = pd.Series(["a","b","c","a"]).astype('category', ordered=True)
+    s = pd.Series(["a","b","c","a"]).astype(
+        pd.api.types.CategoricalDtype(ordered=True)
+    )
     s.sort_values(inplace=True)
     s
     s.min(), s.max()
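To see what ``ordered=True`` changes in the rewritten call, the following sketch compares an inferred ordering, an explicit reversed ordering, and the unordered default. The names ``s_ordered``, ``s_reversed`` and ``s_unordered`` are illustrative and do not appear in the patch.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

values = ["a", "b", "c", "a"]

# Categories inferred from the data (a < b < c) and marked as ordered.
s_ordered = pd.Series(values).astype(CategoricalDtype(ordered=True))
lo, hi = s_ordered.min(), s_ordered.max()

# An explicit category order overrides the lexical one (c < b < a).
s_reversed = pd.Series(values).astype(
    CategoricalDtype(["c", "b", "a"], ordered=True)
)
rev_lo, rev_hi = s_reversed.min(), s_reversed.max()

# Without an order, min() and max() are not defined.
s_unordered = pd.Series(values).astype("category")
try:
    s_unordered.min()
    unordered_min_raises = False
except TypeError:
    unordered_min_raises = True

print(lo, hi, rev_lo, rev_hi, unordered_min_raises)
```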
@@ -406,9 +474,15 @@ categories or a categorical with any list-like object, will raise a TypeError.
 
 .. ipython:: python
 
-    cat = pd.Series([1,2,3]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base = pd.Series([2,2,2]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base2 = pd.Series([2,2,2]).astype("category", ordered=True)
+    cat = pd.Series([1,2,3]).astype(
+        pd.api.types.CategoricalDtype([3, 2, 1], ordered=True)
+    )
+    cat_base = pd.Series([2,2,2]).astype(
+        pd.api.types.CategoricalDtype([3, 2, 1], ordered=True)
+    )
+    cat_base2 = pd.Series([2,2,2]).astype(
+        pd.api.types.CategoricalDtype(ordered=True)
+    )
 
     cat
     cat_base
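The three series built in this hunk set up the comparison rules from the hunk header: categoricals compare element-wise only when their categories and orderedness match. A minimal sketch of that behavior follows, reusing the names from the diff. The ``mismatch_raises`` flag is an illustrative addition.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Ordered dtype where 3 < 2 < 1.
dtype_321 = CategoricalDtype([3, 2, 1], ordered=True)

cat = pd.Series([1, 2, 3]).astype(dtype_321)
cat_base = pd.Series([2, 2, 2]).astype(dtype_321)
# Categories inferred from the data, so only the category 2 exists here.
cat_base2 = pd.Series([2, 2, 2]).astype(CategoricalDtype(ordered=True))

# Same dtype: compared using the category order, so only 1 > 2 holds.
greater = (cat > cat_base).tolist()

# Different categories: the comparison is rejected.
try:
    cat > cat_base2
    mismatch_raises = False
except TypeError:
    mismatch_raises = True

print(greater, mismatch_raises)
```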