@@ -46,9 +46,14 @@ The categorical data type is useful in the following cases:

See also the :ref:`API docs on categoricals <api.categorical>`.

+ .. _categorical.objectcreation:
+
Object Creation
---------------

+ Series Creation
+ ~~~~~~~~~~~~~~~
+
Categorical ``Series`` or columns in a ``DataFrame`` can be created in several ways:

By specifying ``dtype="category"`` when constructing a ``Series``:
@@ -77,7 +82,7 @@ discrete bins. See the :ref:`example on tiling <reshaping.tile.cut>` in the docs
    df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)
    df.head(10)

- By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to a `DataFrame`.
+ By passing a :class:`pandas.Categorical` object to a ``Series`` or assigning it to a ``DataFrame``.

.. ipython:: python

@@ -89,6 +94,55 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
    df["B"] = raw_cat
    df

+ Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
+
+ .. ipython:: python
+
+     df.dtypes
+
+ DataFrame Creation
+ ~~~~~~~~~~~~~~~~~~
+
+ Similar to the previous section, where a single column was converted to categorical, all columns in a
+ ``DataFrame`` can be batch converted to categorical either during or after construction.
+
+ This can be done during construction by specifying ``dtype="category"`` in the ``DataFrame`` constructor:
+
+ .. ipython:: python
+
+     df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')}, dtype="category")
+     df.dtypes
+
+ Note that the categories present in each column differ; the conversion is done column by column, so
+ only the labels present in a given column become its categories:
+
+ .. ipython:: python
+
+     df['A']
+     df['B']
+
+
+ .. versionadded:: 0.23.0
+
+ Analogously, all columns in an existing ``DataFrame`` can be batch converted using :meth:`DataFrame.astype`:
+
+ .. ipython:: python
+
+     df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
+     df_cat = df.astype('category')
+     df_cat.dtypes
+
+ This conversion is likewise done column by column:
+
+ .. ipython:: python
+
+     df_cat['A']
+     df_cat['B']
+
+
+ Controlling Behavior
+ ~~~~~~~~~~~~~~~~~~~~
+
In the examples above where we passed ``dtype='category'``, we used the default
behavior:

@@ -108,21 +162,36 @@ of :class:`~pandas.api.types.CategoricalDtype`.
    s_cat = s.astype(cat_type)
    s_cat

- Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
+ Similarly, a ``CategoricalDtype`` can be used with a ``DataFrame`` to ensure that categories
+ are consistent among all columns.

.. ipython:: python

-     df.dtypes
+     df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
+     cat_type = CategoricalDtype(categories=list('abcd'),
+                                 ordered=True)
+     df_cat = df.astype(cat_type)
+     df_cat['A']
+     df_cat['B']

.. note::

-     In contrast to R's `factor` function, categorical data is not converting input values to
-     strings and categories will end up the same data type as the original values.
+     To perform table-wise conversion, where all labels in the entire ``DataFrame`` are used as
+     categories for each column, the ``categories`` parameter can be determined programmatically with
+     ``categories = pd.unique(df.values.ravel())``.
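
The table-wise conversion described in the note can be sketched as follows. This is a minimal illustration, assuming the same two-column frame used in the examples above; the names ``all_labels`` and ``shared_type`` are hypothetical, and the dtype is left unordered for simplicity.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

# Collect every label that occurs anywhere in the frame
# (hypothetical helper name; not part of the original docs).
all_labels = pd.unique(df.values.ravel())

# One dtype shared by all columns, so each column gets the full label set.
shared_type = CategoricalDtype(categories=all_labels)
df_cat = df.astype(shared_type)

# Both columns now carry the same categories, even though
# 'd' never occurs in column A and 'a' never occurs in column B.
print(df_cat["A"].cat.categories)
print(df_cat["B"].cat.categories)
```

Compared with ``df.astype('category')``, which infers categories per column, both columns here end up with an identical dtype.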

- .. note::
+ If you already have ``codes`` and ``categories``, you can use the
+ :func:`~pandas.Categorical.from_codes` constructor to skip the factorize step
+ that the normal constructor performs:

-     In contrast to R's `factor` function, there is currently no way to assign/change labels at
-     creation time. Use `categories` to change the categories after creation time.
+ .. ipython:: python
+
+     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
+     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
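
Because the example above draws random codes, a deterministic sketch may make the mapping easier to follow. It assumes a hand-written list of integer codes (the name ``fixed_codes`` is hypothetical), where each code is a position in ``categories``.

```python
import pandas as pd

# Hypothetical, hand-picked codes: 0 -> "train", 1 -> "test".
fixed_codes = [0, 1, 1, 0, 1]

cat = pd.Categorical.from_codes(fixed_codes, categories=["train", "test"])
s_fixed = pd.Series(cat)

# The stored codes are exactly the integers passed in;
# the visible values are the corresponding category labels.
print(s_fixed)
print(cat.codes)
```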
+
+
+ Regaining Original Data
+ ~~~~~~~~~~~~~~~~~~~~~~~

To get back to the original ``Series`` or NumPy array, use
``Series.astype(original_dtype)`` or ``np.asarray(categorical)``:
@@ -136,14 +205,15 @@ To get back to the original ``Series`` or NumPy array, use
    s2.astype(str)
    np.asarray(s2)
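
As a self-contained sketch of the round trip (the variable names ``s_orig``, ``restored`` and ``as_array`` are hypothetical and chosen only for this illustration):

```python
import numpy as np
import pandas as pd

s_orig = pd.Series(["a", "b", "c", "a"])
s_as_cat = s_orig.astype("category")

# Back to a Series of strings; the categorical dtype is dropped.
restored = s_as_cat.astype(str)

# Plain NumPy array holding the values, not the integer codes.
as_array = np.asarray(s_as_cat)

print(restored.dtype)
print(as_array)
```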

- If you already have `codes` and `categories`, you can use the
- :func:`~pandas.Categorical.from_codes` constructor to save the factorize step
- during normal constructor mode:
+ .. note::

- .. ipython:: python
+     In contrast to R's `factor` function, categorical data does not convert input values to
+     strings; categories will end up the same data type as the original values.

-     splitter = np.random.choice([0,1], 5, p=[0.5,0.5])
-     s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
+ .. note::
+
+     In contrast to R's `factor` function, there is currently no way to assign/change labels at
+     creation time. Use `categories` to change the categories after creation time.

.. _categorical.categoricaldtype:
