@@ -44,11 +44,26 @@ The categorical data type is useful in the following cases:
* As a signal to other Python libraries that this column should be treated as a categorical
  variable (e.g. to use suitable statistical methods or plot types).

+ .. note::
+
+     In contrast to R's `factor` function, categorical data is not converting input values to
+     strings and categories will end up the same data type as the original values.
+
+ .. note::
+
+     In contrast to R's `factor` function, there is currently no way to assign/change labels at
+     creation time. Use `categories` to change the categories after creation time.
+
See also the :ref:`API docs on categoricals<api.categorical>`.

+ .. _categorical.objectcreation:
+
Object Creation
---------------

+ Series Creation
+ ~~~~~~~~~~~~~~~
+
Categorical ``Series`` or columns in a ``DataFrame`` can be created in several ways:

By specifying ``dtype="category"`` when constructing a ``Series``:
@@ -77,7 +92,7 @@ discrete bins. See the :ref:`example on tiling <reshaping.tile.cut>` in the docs
    df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)
    df.head(10)

- By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to a `DataFrame`.
+ By passing a :class:`pandas.Categorical` object to a ``Series`` or assigning it to a ``DataFrame``.

.. ipython:: python

@@ -89,6 +104,56 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
    df["B"] = raw_cat
    df

+ Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
+
+ .. ipython:: python
+
+     df.dtypes
+
+ DataFrame Creation
+ ~~~~~~~~~~~~~~~~~~
+
+ Columns in a ``DataFrame`` can be batch converted to categorical, either at the time of construction
+ or after construction. The conversion to categorical is done on a column by column basis; labels present
+ in one column will not be carried over and used as categories in another column.
+
+ Columns can be batch converted by specifying ``dtype="category"`` when constructing a ``DataFrame``:
+
+ .. ipython:: python
+
+     df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')}, dtype="category")
+     df.dtypes
+
+ Note that the categories present in each column differ; since the conversion is done on a column by column
+ basis, only labels present in a given column are categories:
+
+ .. ipython:: python
+
+     df['A']
+     df['B']
+
+
+ .. versionadded:: 0.23.0
+
+ Similarly, columns in an existing ``DataFrame`` can be batch converted using :meth:`DataFrame.astype`:
+
+ .. ipython:: python
+
+     df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
+     df_cat = df.astype('category')
+     df_cat.dtypes
+
+ This conversion is likewise done on a column by column basis:
+
+ .. ipython:: python
+
+     df_cat['A']
+     df_cat['B']
+
+
+ Controlling Behavior
+ ~~~~~~~~~~~~~~~~~~~~
+
In the examples above where we passed ``dtype='category'``, we used the default
behavior:

@@ -108,21 +173,30 @@ of :class:`~pandas.api.types.CategoricalDtype`.
    s_cat = s.astype(cat_type)
    s_cat

- Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
+ Similarly, a ``CategoricalDtype`` can be used with a ``DataFrame`` to ensure that categories
+ are consistent among all columns.

.. ipython:: python

-     df.dtypes
+     df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
+     cat_type = CategoricalDtype(categories=list('abcd'),
+                                 ordered=True)
+     df_cat = df.astype(cat_type)
+     df_cat['A']
+     df_cat['B']

- .. note::
+ If you already have `codes` and `categories`, you can use the
+ :func:`~pandas.Categorical.from_codes` constructor to save the factorize step
+ during normal constructor mode:

-     In contrast to R's `factor` function, categorical data is not converting input values to
-     strings and categories will end up the same data type as the original values.
+ .. ipython:: python

- .. note::
+     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
+     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))

-     In contrast to R's `factor` function, there is currently no way to assign/change labels at
-     creation time. Use `categories` to change the categories after creation time.
+
+ Regaining Original Data
+ ~~~~~~~~~~~~~~~~~~~~~~~

To get back to the original ``Series`` or NumPy array, use
``Series.astype(original_dtype)`` or ``np.asarray(categorical)``:
@@ -136,15 +210,6 @@ To get back to the original ``Series`` or NumPy array, use
    s2.astype(str)
    np.asarray(s2)

- If you already have `codes` and `categories`, you can use the
- :func:`~pandas.Categorical.from_codes` constructor to save the factorize step
- during normal constructor mode:
-
- .. ipython:: python
-
-     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
-     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
-
.. _categorical.categoricaldtype:

CategoricalDtype
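
As a rough check of the behavior this change documents, the following sketch (not part of the commit) compares plain ``'category'`` conversion of a ``DataFrame`` with conversion through a shared ``CategoricalDtype``, then shows ``Categorical.from_codes`` and the round trip back to a NumPy array. The variable names are illustrative only, the codes are fixed rather than random so the result is reproducible, and the sketch assumes pandas 0.23.0 or later, where ``DataFrame.astype('category')`` is available:

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

# Plain 'category' conversion: each column infers its own categories,
# so labels seen only in B do not become categories of A (and vice versa).
per_column = df.astype("category")
cats_a = list(per_column["A"].cat.categories)
cats_b = list(per_column["B"].cat.categories)

# A shared CategoricalDtype gives every column the same ordered categories.
shared_type = CategoricalDtype(categories=list("abcd"), ordered=True)
shared = df.astype(shared_type)
same_dtype = shared["A"].dtype == shared["B"].dtype

# from_codes: each integer code is a position in the categories list.
codes = np.array([0, 1, 1, 0, 0])
split = pd.Series(pd.Categorical.from_codes(codes, categories=["train", "test"]))

# Regain the original values as a plain NumPy array.
as_array = np.asarray(shared["A"])

print(cats_a, cats_b, same_dtype)
print(list(split))
print(as_array)
```

With the per-column conversion, column ``A`` ends up with categories ``a, b, c`` and column ``B`` with ``b, c, d``; with the shared dtype both columns report the same dtype, which is what makes them directly comparable.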