@@ -96,12 +96,19 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
   df["B"] = raw_cat
   df

- You can also specify differently ordered categories or make the resulting data ordered, by passing these arguments to ``astype()``:
+ In the examples above where we passed ``dtype='category'``, we used the default behavior:
+
+ 1. Categories are inferred from the data.
+ 2. Categories are unordered.
+
+ To control those behaviors, pass an instance of :class:`CategoricalDtype`
+ instead of the string ``'category'``.

.. ipython:: python

-    s = pd.Series(["a","b","c","a"])
-    s_cat = s.astype("category", categories=["b","c","d"], ordered=False)
+    s = pd.Series(["a", "b", "c", "a"])
+    cat_type = pd.CategoricalDtype(categories=["b", "c", "d"], ordered=False)
+    s_cat = s.astype(cat_type)
   s_cat

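A minimal sketch of the two behaviors described in this hunk, assuming a recent pandas release where ``CategoricalDtype`` is importable from ``pandas.api.types`` (the variable names are illustrative). It contrasts the inferred, unordered default of ``'category'`` with an explicit dtype, and shows that values outside the explicit categories become missing:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "c", "a"])

# String alias: categories are inferred from the data and the result is unordered.
default = s.astype("category")
print(default.cat.categories.tolist())  # ['a', 'b', 'c']
print(default.cat.ordered)              # False

# Explicit dtype: categories are fixed up front, and "a" is not among them.
cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=False)
explicit = s.astype(cat_type)
print(explicit.cat.categories.tolist())  # ['b', 'c', 'd']
# Values not in the categories are stored as missing.
print(explicit.isna().tolist())          # [True, False, False, True]
```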
Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
@@ -140,6 +147,61 @@ constructor to save the factorize step during normal constructor mode:
   splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
   s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))

+ CategoricalDtype
+ ----------------
+
+ .. versionchanged:: 0.21.0
+
+ A categorical's type is fully described by (1) its categories (an iterable of
+ unique values with no missing values) and (2) its orderedness (a boolean).
+ This information can be stored in a :class:`~pandas.CategoricalDtype`.
+ The ``categories`` argument is optional; when it is omitted, the actual
+ categories are inferred from whatever values are present in the data when the
+ :class:`pandas.Categorical` is created.
+
+ .. ipython:: python
+
+    pd.CategoricalDtype(['a', 'b', 'c'])
+    pd.CategoricalDtype(['a', 'b', 'c'], ordered=True)
+    pd.CategoricalDtype()
+
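As a hedged illustration of the optional ``categories`` argument (assuming a recent pandas release; ``unspecified``, ``cat`` and ``cat2`` are illustrative names), the sketch below leaves the categories unset and lets :class:`pandas.Categorical` infer them, then passes explicit categories, in which case an unused category such as ``'c'`` is kept:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Without categories, the dtype only records orderedness.
unspecified = CategoricalDtype()
print(unspecified.categories)  # None

# The actual categories are inferred when the Categorical is created.
cat = pd.Categorical(["b", "a", "b"], dtype=unspecified)
print(cat.categories.tolist())  # ['a', 'b']

# With explicit categories, unused categories ("c" here) are kept.
cat2 = pd.Categorical(["b", "a"], dtype=CategoricalDtype(["a", "b", "c"], ordered=True))
print(cat2.categories.tolist(), cat2.ordered)  # ['a', 'b', 'c'] True
```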
+ A :class:`~pandas.CategoricalDtype` can be used anywhere pandas expects a
+ ``dtype``, for example in :func:`pandas.read_csv`, :meth:`pandas.DataFrame.astype`,
+ or the Series constructor.
+
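A sketch of the three places named above, assuming a recent pandas release; the ``grade`` dtype, the column name ``level`` and the inline CSV text are made up for illustration:

```python
import io

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical ordered dtype reused in every call below.
grade = CategoricalDtype(["low", "medium", "high"], ordered=True)

# Series constructor
s = pd.Series(["low", "high"], dtype=grade)

# DataFrame.astype with a column -> dtype mapping
df = pd.DataFrame({"level": ["medium", "low"]}).astype({"level": grade})

# read_csv with a per-column dtype
csv = io.StringIO("level\nhigh\nlow\n")
parsed = pd.read_csv(csv, dtype={"level": grade})

print(s.dtype == grade, df["level"].dtype == grade, parsed["level"].dtype == grade)
```

Reusing one dtype object keeps the category order identical across all three inputs, which matters later when ordered categoricals are compared.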
+ As a convenience, you can use the string ``'category'`` in place of a
+ :class:`pandas.CategoricalDtype` when you want the default behavior of
+ the categories being unordered and equal to the set of values present in the array.
+ In other words, ``dtype='category'`` is equivalent to ``dtype=pd.CategoricalDtype()``.
+
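A quick check of this equivalence, assuming a recent pandas release (the sample values are arbitrary): both spellings should produce the same inferred, unordered dtype.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

values = ["x", "z", "y", "x"]
a = pd.Series(values, dtype="category")
b = pd.Series(values, dtype=CategoricalDtype())

print(a.dtype == b.dtype)          # True
print(a.cat.categories.tolist())  # ['x', 'y', 'z']
print(a.equals(b))                 # True
```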
+ Equality Semantics
+ ~~~~~~~~~~~~~~~~~~
+
+ Two instances of :class:`pandas.CategoricalDtype` compare equal whenever they have
+ the same categories and orderedness. When comparing two unordered categoricals, the
+ order of the ``categories`` is not considered.
+
+ .. ipython:: python
+
+    c1 = pd.CategoricalDtype(['a', 'b', 'c'], ordered=False)
+    # Equal, since order is not considered when ordered=False
+    c1 == pd.CategoricalDtype(['b', 'c', 'a'], ordered=False)
+    # Unequal, since the second CategoricalDtype is ordered
+    c1 == pd.CategoricalDtype(['a', 'b', 'c'], ordered=True)
+
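The rule above only relaxes category order for unordered dtypes. As a hedged sketch (recent pandas assumed, names illustrative), once ``ordered=True`` the order of the categories becomes part of the dtype's identity:

```python
from pandas.api.types import CategoricalDtype

u1 = CategoricalDtype(["a", "b", "c"], ordered=False)
u2 = CategoricalDtype(["c", "a", "b"], ordered=False)
o1 = CategoricalDtype(["a", "b", "c"], ordered=True)
o2 = CategoricalDtype(["c", "a", "b"], ordered=True)

print(u1 == u2)  # True: category order is ignored when unordered
print(o1 == o2)  # False: category order matters once ordered=True
print(o1 == CategoricalDtype(["a", "b", "c"], ordered=True))  # True
```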
+ All instances of ``CategoricalDtype`` compare equal to the string ``'category'``:
+
+ .. ipython:: python
+
+    c1 == 'category'
+
+ .. warning::
+
+    Since ``dtype='category'`` is essentially ``CategoricalDtype(None, False)``,
+    and since all instances of ``CategoricalDtype`` compare equal to ``'category'``,
+    all instances of ``CategoricalDtype`` compare equal to a ``CategoricalDtype(None)``.
+
Description
-----------

@@ -189,7 +251,7 @@ It's also possible to pass in the categories in a specific order:

.. ipython:: python

-    s = pd.Series(list('babc')).astype('category', categories=list('abcd'))
+    s = pd.Series(list('babc')).astype(pd.CategoricalDtype(list('abcd')))
   s

   # categories
@@ -306,7 +368,7 @@ meaning and certain operations are possible. If the categorical is unordered, ``

   s = pd.Series(pd.Categorical(["a","b","c","a"], ordered=False))
   s.sort_values(inplace=True)
-    s = pd.Series(["a","b","c","a"]).astype('category', ordered=True)
+    s = pd.Series(["a","b","c","a"]).astype(pd.CategoricalDtype(ordered=True))
   s.sort_values(inplace=True)
   s
   s.min(), s.max()
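A minimal sketch of how the category order drives ``min``/``max`` and sorting for ordered data, assuming a recent pandas release; the reversed order ``['c', 'b', 'a']`` is an illustrative choice:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Categories inferred (lexically sorted) and marked as ordered.
s = pd.Series(["a", "b", "c", "a"]).astype(CategoricalDtype(ordered=True))
print(s.min(), s.max())  # a c

# An explicit category order changes what "smallest" and "largest" mean.
rev = pd.Series(["a", "b", "c", "a"]).astype(CategoricalDtype(["c", "b", "a"], ordered=True))
print(rev.min(), rev.max())        # c a
print(rev.sort_values().tolist())  # ['c', 'b', 'a', 'a']
```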
@@ -406,9 +468,9 @@ categories or a categorical with any list-like object, will raise a TypeError.

.. ipython:: python

-    cat = pd.Series([1,2,3]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base = pd.Series([2,2,2]).astype("category", categories=[3,2,1], ordered=True)
-    cat_base2 = pd.Series([2,2,2]).astype("category", ordered=True)
+    cat = pd.Series([1,2,3]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))
+    cat_base = pd.Series([2,2,2]).astype(pd.CategoricalDtype([3, 2, 1], ordered=True))
+    cat_base2 = pd.Series([2,2,2]).astype(pd.CategoricalDtype(ordered=True))

   cat
   cat_base
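A hedged sketch of what these three Series allow, assuming a recent pandas release: comparisons between ordered categoricals follow the category order when the dtypes match, while ``cat_base2`` (whose categories are inferred as ``[2]`` only) cannot be compared with ``cat``. The exact error message may differ between versions, so only the exception type is checked:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

order = CategoricalDtype([3, 2, 1], ordered=True)
cat = pd.Series([1, 2, 3]).astype(order)
cat_base = pd.Series([2, 2, 2]).astype(order)
# Categories inferred as [2] only, so this dtype differs from ``order``.
cat_base2 = pd.Series([2, 2, 2]).astype(CategoricalDtype(ordered=True))

# Same dtype: the comparison follows the category order 3 < 2 < 1.
print((cat > cat_base).tolist())  # [True, False, False]

# Different categories: the comparison raises TypeError.
try:
    cat > cat_base2
except TypeError:
    raised = True
else:
    raised = False
print(raised)  # True
```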