Commit 602cc46

xgdgsc authored and jorisvandenbossche committed
DOC: warning section on memory overflow when joining/merging dataframes on index with duplicate keys (#14788)
closes #14736
1 parent 428c106 commit 602cc46

1 file changed: +23 -0 lines changed

doc/source/merging.rst

@@ -692,6 +692,29 @@ either the left or right tables, the values in the joined table will be
    p.plot([left, right], result,
           labels=['left', 'right'], vertical=False);
    plt.close('all');
+
+Here is another example with duplicate join keys in DataFrames:
+
+.. ipython:: python
+
+   left = pd.DataFrame({'A': [1, 2], 'B': [2, 2]})
+
+   right = pd.DataFrame({'A': [4, 5, 6], 'B': [2, 2, 2]})
+
+   result = pd.merge(left, right, on='B', how='outer')
+
+.. ipython:: python
+   :suppress:
+
+   @savefig merging_merge_on_key_dup.png
+   p.plot([left, right], result,
+          labels=['left', 'right'], vertical=False);
+   plt.close('all');
+
+.. warning::
+
+   Joining / merging on duplicate keys can return a frame whose number of rows is the product of the row counts for each duplicated key,
+   which may result in memory overflow. It is the user's responsibility to manage duplicate values in keys before joining large DataFrames.

 .. _merging.indicator:
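
The row multiplication described in the warning can be checked directly. The following is a minimal sketch that reuses the `left` and `right` frames from the diff. The names `n_rows`, `expected_rows` and `duplicates_rejected` are illustrative and not part of the commit, and the `validate` argument is an assumption about the installed pandas: it requires a release newer than this commit that supports it in `pd.merge`.

```python
import pandas as pd

# Same frames as in the example added by this commit.
left = pd.DataFrame({'A': [1, 2], 'B': [2, 2]})
right = pd.DataFrame({'A': [4, 5, 6], 'B': [2, 2, 2]})

# Every left row with B == 2 pairs with every right row with B == 2,
# so the merged frame has 2 * 3 = 6 rows.
result = pd.merge(left, right, on='B', how='outer')
n_rows = len(result)

# Estimate the number of rows produced by matching keys before merging:
# multiply the per-key counts from each side and sum over keys.
# Keys present on only one side contribute 0 to this estimate.
left_counts = left['B'].value_counts()
right_counts = right['B'].value_counts()
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

# Detect duplicate keys explicitly.
left_has_duplicates = bool(left['B'].duplicated().any())

# Assumption: the installed pandas supports `validate` in pd.merge.
# 'one_to_one' raises MergeError when keys are not unique on both sides.
try:
    pd.merge(left, right, on='B', how='outer', validate='one_to_one')
    duplicates_rejected = False
except pd.errors.MergeError:
    duplicates_rejected = True
```

With larger data, the same per-key product can be computed before the merge to decide whether the keys should be deduplicated or aggregated first.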
