@@ -797,37 +797,52 @@ Assigning a ``Categorical`` to parts of a column of other types will use the val
     df.dtypes
 
 .. _categorical.merge:
+.. _categorical.concat:
 
-Merging
-~~~~~~~
+Merging / Concatenation
+~~~~~~~~~~~~~~~~~~~~~~~
 
-You can concat two ``DataFrames`` containing categorical data together,
-but the categories of these categoricals need to be the same:
+By default, combining ``Series`` or ``DataFrames`` which contain the same
+categories results in ``category`` dtype, otherwise results will depend on the
+dtype of the underlying categories. Merges that result in non-categorical
+dtypes will likely have higher memory usage. Use ``.astype`` or
+``union_categoricals`` to ensure ``category`` results.
 
 .. ipython:: python
 
-    cat = pd.Series(["a", "b"], dtype="category")
-    vals = [1, 2]
-    df = pd.DataFrame({"cats": cat, "vals": vals})
-    res = pd.concat([df, df])
-    res
-    res.dtypes
+    from pandas.api.types import union_categoricals
 
-In this case the categories are not the same, and therefore an error is raised:
+    # same categories
+    s1 = pd.Series(['a', 'b'], dtype='category')
+    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
+    pd.concat([s1, s2])
 
-.. ipython:: python
+    # different categories
+    s3 = pd.Series(['b', 'c'], dtype='category')
+    pd.concat([s1, s3])
 
-    df_different = df.copy()
-    df_different["cats"].cat.categories = ["c", "d"]
-    try:
-        pd.concat([df, df_different])
-    except ValueError as e:
-        print("ValueError:", str(e))
+    # Output dtype is inferred based on categories values
+    int_cats = pd.Series([1, 2], dtype="category")
+    float_cats = pd.Series([3.0, 4.0], dtype="category")
+    pd.concat([int_cats, float_cats])
+
+    pd.concat([s1, s3]).astype('category')
+    union_categoricals([s1.array, s3.array])
 
-The same applies to ``df.append(df_different)``.
+The following table summarizes the results of merging ``Categoricals``:
 
-See also the section on :ref:`merge dtypes<merging.dtypes>` for notes about preserving merge dtypes and performance.
++-------------------+------------------------+----------------------+-----------------------------+
+| arg1              | arg2                   | identical            | result                      |
++===================+========================+======================+=============================+
+| category          | category               | True                 | category                    |
++-------------------+------------------------+----------------------+-----------------------------+
+| category (object) | category (object)      | False                | object (dtype is inferred)  |
++-------------------+------------------------+----------------------+-----------------------------+
+| category (int)    | category (float)       | False                | float (dtype is inferred)   |
++-------------------+------------------------+----------------------+-----------------------------+
 
+See also the section on :ref:`merge dtypes<merging.dtypes>` for notes about
+preserving merge dtypes and performance.
 
 .. _categorical.union:
 
@@ -918,46 +933,6 @@ the resulting array will always be a plain ``Categorical``:
     # "b" is coded to 0 throughout, same as c1, different from c2
     c.codes
 
-.. _categorical.concat:
-
-Concatenation
-~~~~~~~~~~~~~
-
-This section describes concatenations specific to ``category`` dtype. See :ref:`Concatenating objects<merging.concat>` for general description.
-
-By default, ``Series`` or ``DataFrame`` concatenation which contains the same categories
-results in ``category`` dtype, otherwise results in ``object`` dtype.
-Use ``.astype`` or ``union_categoricals`` to get ``category`` result.
-
-.. ipython:: python
-
-    # same categories
-    s1 = pd.Series(['a', 'b'], dtype='category')
-    s2 = pd.Series(['a', 'b', 'a'], dtype='category')
-    pd.concat([s1, s2])
-
-    # different categories
-    s3 = pd.Series(['b', 'c'], dtype='category')
-    pd.concat([s1, s3])
-
-    pd.concat([s1, s3]).astype('category')
-    union_categoricals([s1.array, s3.array])
-
-
-Following table summarizes the results of ``Categoricals`` related concatenations.
-
-+----------+--------------------------------------------------------+----------------------------+
-| arg1     | arg2                                                   | result                     |
-+==========+========================================================+============================+
-| category | category (identical categories)                        | category                   |
-+----------+--------------------------------------------------------+----------------------------+
-| category | category (different categories, both not ordered)      | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-| category | category (different categories, either one is ordered) | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-| category | not category                                           | object (dtype is inferred) |
-+----------+--------------------------------------------------------+----------------------------+
-
 
 Getting data in/out
 -------------------
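
The behavior described by the rewritten section can be checked outside the docs build with a short standalone script. This is a minimal sketch, assuming a pandas version that provides `Series.array` and `pandas.api.types.union_categoricals`; the result names (`same`, `different`, `numeric`, `recast`, `unioned`) and the use of `ignore_index=True` are illustrative choices that do not appear in the diff.

```python
import pandas as pd
from pandas.api.types import union_categoricals

# Same inputs as the documentation example in the diff.
s1 = pd.Series(["a", "b"], dtype="category")
s2 = pd.Series(["a", "b", "a"], dtype="category")
s3 = pd.Series(["b", "c"], dtype="category")

# Identical categories: the result is expected to stay categorical.
same = pd.concat([s1, s2], ignore_index=True)

# Different categories: the result dtype is inferred from the category values,
# so it is expected to fall back to a non-categorical dtype.
different = pd.concat([s1, s3], ignore_index=True)

# Integer and float categories: the inferred dtype is expected to be float.
int_cats = pd.Series([1, 2], dtype="category")
float_cats = pd.Series([3.0, 4.0], dtype="category")
numeric = pd.concat([int_cats, float_cats], ignore_index=True)

# Two ways to end up with a categorical result anyway.
recast = different.astype("category")
unioned = union_categoricals([s1.array, s3.array])

for name, obj in [("same", same), ("different", different),
                  ("numeric", numeric), ("recast", recast),
                  ("unioned", unioned)]:
    print(name, obj.dtype)
```

The exact non-categorical dtype printed for `different` may vary with the pandas version and its string-handling defaults, which is why the sketch prints the dtypes rather than hard-coding them.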