Create a second column in the first DataFrame from the values of the first column, using a Categorical data type. The unique set of category values for the new column is derived from the data used to create the column. Extract the unique set of categories for this column as an index object.

.. ipython:: python

   df1['B'] = df1.A.astype('category')
   i = df1.B.cat.categories
   i

Extract the unique set of categories as a second index from the column in the second DataFrame.

.. ipython:: python

   i2 = df2.A.astype('category').cat.categories
   i2

Use the symmetric difference operator on the two indexes to get the category values that appear in only one of them, then add those values to the categories of df1.B.

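The step above can be sketched as follows. This is a minimal illustration and not necessarily the code used in this document: the small ``df1`` and ``df2`` frames are hypothetical stand-ins, and it calls the ``Index.symmetric_difference`` and ``Index.union`` methods rather than an operator, since the method calls read the same way regardless of pandas version.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2; the real frames are defined elsewhere.
df1 = pd.DataFrame({"A": list("abca")})
df2 = pd.DataFrame({"A": list("bcdb")})

# Categories derived from the data in each column
i = df1["A"].astype("category").cat.categories
i2 = df2["A"].astype("category").cat.categories

# Values present in exactly one of the two indexes
only_in_one = i.symmetric_difference(i2)

# Add those values to the categories derived from df1
cats = i.union(only_in_one)
print(list(cats))
```

Because the symmetric difference holds only values missing from one side, the union with ``i`` yields every category seen in either column.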
We have now recovered the original list of unique ASCII letters.

.. ipython:: python

   (np.array(sorted(cats)) == sorted(uniques)).all()

This unique list of categorical values can be used to create a CategoricalDtype. Columns cast to this type are Categorical and use the categories specified by the CategoricalDtype, rather than a set of categories built from the data in the column.

.. ipython:: python

   cat_type = CategoricalDtype(categories=cats)
   df2['B'] = df2['A'].astype(cat_type)

Comparing the codes used for the categorical columns of the two DataFrames shows that each category value maps to the same code in both.

Note: Column B in df2 has one more distinct value than column B in df1, since it contains "Z" whereas column B in df1 does not.
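
One way to check that both columns share the same codes is sketched below. It is a self-contained illustration under stated assumptions: the category list and the two small frames are hypothetical, chosen so that only ``df2`` contains "Z", and both columns are cast with the same ``CategoricalDtype``.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical category list and data; only df2 contains "Z".
cats = ["a", "b", "c", "Z"]
cat_type = CategoricalDtype(categories=cats)

df1 = pd.DataFrame({"A": ["a", "b", "c", "a"]})
df2 = pd.DataFrame({"A": ["a", "Z", "c", "b"]})

# Cast both columns with the same dtype so they share one set of categories
df1["B"] = df1["A"].astype(cat_type)
df2["B"] = df2["A"].astype(cat_type)

# Map each label to its integer code in each frame
codes1 = dict(zip(df1["B"], df1["B"].cat.codes))
codes2 = dict(zip(df2["B"], df2["B"].cat.codes))

# Every label present in both frames should carry the same code
shared = set(codes1) & set(codes2)
same = all(codes1[k] == codes2[k] for k in shared)
print(same)
```

A code is the position of a label in the dtype's category list, so two columns agree on codes whenever they are cast with the same ``CategoricalDtype``, even if one column never uses some of the categories.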