   pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')


.. _merging.dtypes:

Merge Dtypes
~~~~~~~~~~~~

.. versionadded:: 0.19.0

Merging will preserve the dtype of the join keys.

.. ipython:: python

   left = pd.DataFrame({'key': [1], 'v1': [10]})
   left
   right = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]})
   right

No missing values are introduced here, so the join keys keep their ``int64`` dtype:

.. ipython:: python

   pd.merge(left, right, how='outer')
   pd.merge(left, right, how='outer').dtypes

If the merge introduces missing values, the affected columns are upcast. Here
``key`` 2 has no match in ``left``, so ``v1_x`` must hold ``NaN`` and becomes
``float64``, while the join key itself stays ``int64``:

.. ipython:: python

   pd.merge(left, right, how='outer', on='key')
   pd.merge(left, right, how='outer', on='key').dtypes

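A self-contained sketch of the same upcast, assuming pandas 1.0 or later. For contrast it converts ``v1`` to the nullable ``Int64`` extension dtype, which represents missing values as ``pd.NA`` and therefore does not need to fall back to ``float64`` after the outer merge:

```python
import pandas as pd

# Same frames as above, with plain NumPy int64 columns.
left = pd.DataFrame({'key': [1], 'v1': [10]})
right = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]})

upcast = pd.merge(left, right, how='outer', on='key')
# key 2 has no match on the left, so v1_x gains a NaN and becomes float64.
print(upcast.dtypes)

# The nullable 'Int64' extension dtype stores missing values as pd.NA,
# so the column can stay integer-typed.
left_nullable = left.astype({'v1': 'Int64'})
right_nullable = right.astype({'v1': 'Int64'})
preserved = pd.merge(left_nullable, right_nullable, how='outer', on='key')
print(preserved.dtypes)
```

This only illustrates that the upcast is caused by the column dtype being unable to hold missing values; whether to adopt nullable dtypes is a separate design decision.
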

.. versionadded:: 0.20.0

Merging will preserve ``category`` dtypes of the merge inputs.

The left frame:

.. ipython:: python

   X = pd.Series(np.random.choice(['foo', 'bar'], size=(10,)))
   X = X.astype(pd.CategoricalDtype(categories=['foo', 'bar']))

   left = pd.DataFrame({'X': X,
                        'Y': np.random.choice(['one', 'two', 'three'], size=(10,))})
   left
   left.dtypes

The right frame:

.. ipython:: python

   right = pd.DataFrame({'X': pd.Series(['foo', 'bar'],
                                        dtype=pd.CategoricalDtype(categories=['foo', 'bar'])),
                         'Z': [1, 2]})
   right
   right.dtypes

The merged result:

.. ipython:: python

   result = pd.merge(left, right, how='outer')
   result
   result.dtypes

.. note::

   The category dtypes must be *exactly* the same, meaning the same categories
   and the same ``ordered`` attribute. Otherwise the result will coerce to
   ``object`` dtype.

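One way to guarantee that the dtypes match exactly is to build a single ``CategoricalDtype`` and reuse it on both sides. The sketch below (the frame contents are illustrative only) also shows that dtypes differing only in their categories, or only in ``ordered``, compare unequal:

```python
import pandas as pd

# Build the dtype once and reuse it, so both sides get identical categories
# and an identical ``ordered`` flag.
shared = pd.CategoricalDtype(categories=['foo', 'bar'], ordered=False)

# Dtypes that differ only in their categories, or only in ``ordered``,
# do not compare equal.
extra_category = pd.CategoricalDtype(categories=['foo', 'bar', 'baz'])
ordered_variant = pd.CategoricalDtype(categories=['foo', 'bar'], ordered=True)
print(shared == extra_category)   # False
print(shared == ordered_variant)  # False

left = pd.DataFrame({'X': pd.Series(['foo', 'bar', 'foo'], dtype=shared),
                     'Y': ['one', 'two', 'three']})
right = pd.DataFrame({'X': pd.Series(['foo', 'bar'], dtype=shared),
                      'Z': [1, 2]})

result = pd.merge(left, right, how='outer', on='X')
print(result['X'].dtype == shared)  # True: the category dtype survives the merge
```
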

.. note::

   Merging on ``category`` dtypes that are the same can be considerably faster
   than merging on ``object`` dtypes.

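A rough way to check this on your own data, assuming a hypothetical helper ``with_shared_categories`` (not part of pandas) that converts the key column in both frames to one ``CategoricalDtype`` built from the values seen on either side. The printed timings depend on data size, key cardinality and pandas version, so treat them as indicative only:

```python
import timeit

import numpy as np
import pandas as pd


def with_shared_categories(left, right, col):
    """Hypothetical helper (not a pandas API): give ``col`` one shared
    CategoricalDtype in both frames so the merge sees identical dtypes."""
    # Unique values seen on either side, so that no key becomes NaN.
    categories = pd.concat([left[col], right[col]]).unique()
    dtype = pd.CategoricalDtype(categories=categories)
    return left.astype({col: dtype}), right.astype({col: dtype})


rng = np.random.default_rng(0)
n = 50_000
left = pd.DataFrame({'X': rng.choice(['foo', 'bar', 'baz'], size=n),
                     'Y': rng.integers(0, 10, size=n)})
right = pd.DataFrame({'X': ['foo', 'bar', 'baz'], 'Z': [1, 2, 3]})

left_cat, right_cat = with_shared_categories(left, right, 'X')

merged_object = pd.merge(left, right, on='X')
merged_category = pd.merge(left_cat, right_cat, on='X')

# Timings vary with data size, key cardinality and pandas version.
t_object = timeit.timeit(lambda: pd.merge(left, right, on='X'), number=5)
t_category = timeit.timeit(lambda: pd.merge(left_cat, right_cat, on='X'), number=5)
print(f'object keys: {t_object:.4f}s, category keys: {t_category:.4f}s')
```

Building the categories from both key columns keeps every key representable; building them from one side only would turn the other side's unseen keys into ``NaN`` during ``astype``.
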

.. _merging.join.index:

Joining on index