Commit 16e2fbe

DOC: merge docs
1 parent a4b2ee6 commit 16e2fbe

2 files changed: +76 -0 lines changed

doc/source/categorical.rst

+3
@@ -646,6 +646,9 @@ In this case the categories are not the same and so an error is raised:

The same applies to ``df.append(df_different)``.

See also the section on :ref:`merge dtypes<merging.dtypes>` for notes on preserving dtypes when merging and on merge performance.


.. _categorical.union:

Unioning

doc/source/merging.rst

+73
@@ -746,6 +746,79 @@ The ``indicator`` argument will also accept string arguments, in which case the
   pd.merge(df1, df2, on='col1', how='outer', indicator='indicator_column')


.. _merging.dtypes:

Merge Dtypes
~~~~~~~~~~~~

.. versionadded:: 0.19.0

Merging will preserve the dtype of the join keys.

.. ipython:: python

   df1 = pd.DataFrame({'key': [1], 'v1': [10]})
   df1
   df2 = pd.DataFrame({'key': [1, 2], 'v1': [20, 30]})
   df2

The dtypes of the join keys are preserved:

.. ipython:: python

   pd.merge(df1, df2, how='outer')
   pd.merge(df1, df2, how='outer').dtypes

If the merge introduces missing values, the dtype of the affected
columns is upcast:

.. ipython:: python

   pd.merge(df1, df2, how='outer', on='key')
   pd.merge(df1, df2, how='outer', on='key').dtypes

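The two merges above can also be checked programmatically. The following is a minimal sketch, assuming a pandas release that provides ``pandas.api.types``; the names ``full`` and ``by_key`` are illustrative only.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype

df1 = pd.DataFrame({"key": [1], "v1": [10]})
df2 = pd.DataFrame({"key": [1, 2], "v1": [20, 30]})

# Joining on every shared column ("key" and "v1") introduces no missing
# values, so both columns keep an integer dtype.
full = pd.merge(df1, df2, how="outer")

# Joining on "key" alone leaves "v1_x" without a value for key 2; the
# resulting NaN forces that column to a float dtype.
by_key = pd.merge(df1, df2, how="outer", on="key")

print(full.dtypes)
print(by_key.dtypes)
```

Only the column that actually receives a missing value is upcast; the join key and the fully populated ``v1_y`` column keep their integer dtype.
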
.. versionadded:: 0.20.0

Merging will preserve the ``category`` dtype of the frames being merged.

The left frame.

.. ipython:: python

   X = pd.Series(np.random.choice(['foo', 'bar'], size=(10,)))
   X = X.astype('category', categories=['foo', 'bar'])

   left = pd.DataFrame({'X': X,
                        'Y': np.random.choice(['one', 'two', 'three'], size=(10,))})
   left
   left.dtypes

The right frame.

.. ipython:: python

   right = pd.DataFrame({'X': pd.Series(['foo', 'bar']).astype('category', categories=['foo', 'bar']),
                         'Z': [1, 2]})
   right
   right.dtypes

The merged result:

.. ipython:: python

   result = pd.merge(left, right, how='outer')
   result
   result.dtypes

.. note::

   The category dtypes must be *exactly* the same, meaning the same categories and the same ``ordered`` attribute.
   Otherwise the result will coerce to ``object`` dtype.

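The requirement in the note can be illustrated with a small, deterministic sketch. It assumes a newer pandas release in which ``pd.CategoricalDtype`` is available (this section itself uses the older ``astype('category', categories=...)`` spelling); the names ``same``, ``wider``, ``kept`` and ``lost`` are hypothetical. The exact fallback dtype may vary between releases, so the sketch only checks whether the result is still categorical.

```python
import pandas as pd

# Assumption: pd.CategoricalDtype exists in the installed pandas.
same = pd.CategoricalDtype(categories=["foo", "bar"])
wider = pd.CategoricalDtype(categories=["foo", "bar", "baz"])

left = pd.DataFrame({"X": pd.Series(["foo", "bar", "foo"], dtype=same),
                     "Y": [1, 2, 3]})
right_same = pd.DataFrame({"X": pd.Series(["foo", "bar"], dtype=same),
                           "Z": [10, 20]})
right_wider = pd.DataFrame({"X": pd.Series(["foo", "bar"], dtype=wider),
                            "Z": [10, 20]})

# Identical category dtypes on both sides: the key stays categorical.
kept = pd.merge(left, right_same, how="outer", on="X")

# Different category sets: the key falls back to a non-categorical dtype.
lost = pd.merge(left, right_wider, how="outer", on="X")

print(kept["X"].dtype)
print(lost["X"].dtype)
```

If the categorical dtype matters downstream, one option is to cast both key columns to a shared dtype before merging.
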
.. note::

   Merging on identical ``category`` dtypes can be considerably faster than merging on ``object`` dtype.

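One way to gauge the difference on a given machine is a rough timing comparison. This is a sketch under stated assumptions, not a benchmark: the frame sizes, the ``key_N`` labels and all variable names are invented for illustration, and the measured ratio depends on the pandas version, the data and the hardware.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
labels = [f"key_{i}" for i in range(1000)]

# Frames whose join key holds plain strings.
left_obj = pd.DataFrame({"key": rng.choice(labels, size=100_000),
                         "a": np.arange(100_000)})
right_obj = pd.DataFrame({"key": labels, "b": np.arange(1000)})

# The same frames with the key cast to one shared categorical dtype.
cat_dtype = pd.CategoricalDtype(categories=labels)
left_cat = left_obj.astype({"key": cat_dtype})
right_cat = right_obj.astype({"key": cat_dtype})

merged_obj = pd.merge(left_obj, right_obj, on="key")
merged_cat = pd.merge(left_cat, right_cat, on="key")

# Time a few repetitions of each merge.
t_obj = timeit.timeit(lambda: pd.merge(left_obj, right_obj, on="key"), number=3)
t_cat = timeit.timeit(lambda: pd.merge(left_cat, right_cat, on="key"), number=3)

print(f"string keys:      {t_obj:.3f} s")
print(f"categorical keys: {t_cat:.3f} s")
```

Both merges return the same number of rows, so any difference in timing comes from how the keys are represented rather than from the size of the result.
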
.. _merging.join.index:

Joining on index
