From 93a63a5148f5602d6a431ff4aeaee8cef5e07ed8 Mon Sep 17 00:00:00 2001
From: xgdgsc
Date: Fri, 2 Dec 2016 17:37:16 +0800
Subject: [PATCH 1/8] add doc regarding #14736

---
 doc/source/merging.rst | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index c6541a26c72b4..8e19f073ad8b3 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -548,6 +548,7 @@ standard database join operations between DataFrame objects:
   only appears in ``'left'`` DataFrame, ``right_only`` for observations whose
   merge key only appears in ``'right'`` DataFrame, and ``both`` if the
   observation's merge key is found in both.
+
 
   .. versionadded:: 0.17.0
 
@@ -817,6 +818,10 @@ equivalent:
     left.join(right, on=key_or_keys)
     pd.merge(left, right, left_on=key_or_keys, right_index=True,
           how='left', sort=False)
+
+.. warning::
+
+   * Joining on index with duplicate keys when joining large dataframes would cause severe memory overflow, sometimes freezes the computer and user has to hard reboot, which can be dangerous for unsaved work. Please make sure no duplicate keys in index before joining.
 
 Obviously you can choose whichever form you find more convenient. For
 many-to-one joins (where one of the DataFrame's is already indexed by the join

From 795a369ccf479ac8d31a8ee9b47f30d74a461af2 Mon Sep 17 00:00:00 2001
From: xgdgsc
Date: Fri, 2 Dec 2016 17:38:55 +0800
Subject: [PATCH 2/8] remove blank line

---
 doc/source/merging.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index 8e19f073ad8b3..87c9af648c67b 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -548,7 +548,6 @@ standard database join operations between DataFrame objects:
   only appears in ``'left'`` DataFrame, ``right_only`` for observations whose
   merge key only appears in ``'right'`` DataFrame, and ``both`` if the
   observation's merge key is found in both.
-
 
   .. versionadded:: 0.17.0
 

From 06d4430951f341c52716487b7a9b5f3213361519 Mon Sep 17 00:00:00 2001
From: xgdgsc
Date: Sat, 3 Dec 2016 09:29:18 +0800
Subject: [PATCH 3/8] change index to keys , fix spaces and format

---
 doc/source/merging.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index 87c9af648c67b..7ab07d54ace3a 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -820,7 +820,7 @@ equivalent:
 
 .. warning::
 
-   * Joining on index with duplicate keys when joining large dataframes would cause severe memory overflow, sometimes freezes the computer and user has to hard reboot, which can be dangerous for unsaved work. Please make sure no duplicate keys in index before joining.
+   Joining on keys with duplicate values when joining large dataframes would cause severe memory overflow, sometimes freezes the computer and user has to hard reboot, which can be dangerous for unsaved work. Please make sure no duplicate values in keys before joining.
 
 Obviously you can choose whichever form you find more convenient. For
 many-to-one joins (where one of the DataFrame's is already indexed by the join

From fb021fb02cd68d2d585cc5c3f41e98c89e20f619 Mon Sep 17 00:00:00 2001
From: xgdgsc
Date: Sat, 3 Dec 2016 09:34:10 +0800
Subject: [PATCH 4/8] fix format

---
 doc/source/merging.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index 7ab07d54ace3a..81bf7a47f535d 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -820,7 +820,9 @@ equivalent:
 
 .. warning::
 
-   Joining on keys with duplicate values when joining large dataframes would cause severe memory overflow, sometimes freezes the computer and user has to hard reboot, which can be dangerous for unsaved work. Please make sure no duplicate values in keys before joining.
+   Joining on keys with duplicate values when joining large dataframes would cause severe memory overflow, sometimes freezes the
+   computer and user has to hard reboot, which can be dangerous for unsaved work. Please make sure no duplicate values in keys before
+   joining.
 
 Obviously you can choose whichever form you find more convenient. For
 many-to-one joins (where one of the DataFrame's is already indexed by the join

From 403b099b79022359bdfa3bfdabf7ee1c8dd868fd Mon Sep 17 00:00:00 2001
From: xgdgsc
Date: Sun, 11 Dec 2016 22:46:39 +0800
Subject: [PATCH 5/8] add example , change warning position

---
 doc/source/merging.rst | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index 81bf7a47f535d..9c3b535871942 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -692,6 +692,28 @@ either the left or right tables, the values in the joined table will be
    p.plot([left, right], result,
           labels=['left', 'right'], vertical=False);
    plt.close('all');
+
+Here is another example with duplicate join keys in DataFrames:
+
+.. ipython:: python
+
+   left = pd.DataFrame({'A' : [1,2], 'B' : [2, 2]})
+
+   right = pd.DataFrame({'A' : [4,5,6], 'B': [2,2,2]})
+
+   result = pd.merge(left, right, on='B', how='outer')
+
+.. ipython:: python
+   :suppress:
+
+   @savefig merging_merge_on_key_multiple.png
+   p.plot([left, right], result,
+          labels=['left', 'right'], vertical=False);
+   plt.close('all');
+
+.. warning::
+
+   Joining / merging on duplicate keys can cause a returned frame that is the multiplication of the row dimensions, may result in memory overflow, which can be dangerous for unsaved work. Please make sure no duplicate values in keys before joining large DataFrames.
 
 .. _merging.indicator:
 
@@ -818,11 +840,6 @@ equivalent:
     pd.merge(left, right, left_on=key_or_keys, right_index=True,
           how='left', sort=False)
 
-.. warning::
-
-   Joining on keys with duplicate values when joining large dataframes would cause severe memory overflow, sometimes freezes the
-   computer and user has to hard reboot, which can be dangerous for unsaved work. Please make sure no duplicate values in keys before
-   joining.
 
 Obviously you can choose whichever form you find more convenient. For
 many-to-one joins (where one of the DataFrame's is already indexed by the join

From ee9fc6090c3aed401ade2c09c65ff83c98fead47 Mon Sep 17 00:00:00 2001
From: xgdgsc
Date: Sun, 11 Dec 2016 22:48:14 +0800
Subject: [PATCH 6/8] fix format

---
 doc/source/merging.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index 9c3b535871942..5bda368809427 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -713,7 +713,9 @@ Here is another example with duplicate join keys in DataFrames:
 
 .. warning::
 
-   Joining / merging on duplicate keys can cause a returned frame that is the multiplication of the row dimensions, may result in memory overflow, which can be dangerous for unsaved work. Please make sure no duplicate values in keys before joining large DataFrames.
+   Joining / merging on duplicate keys can cause a returned frame that is the multiplication of the row dimensions,
+   may result in memory overflow, which can be dangerous for unsaved work.
+   Please make sure no duplicate values in keys before joining large DataFrames.
 
 .. _merging.indicator:
 

From 7581ce0153e3fce9f2857e0b30ae3f2c5a512a3a Mon Sep 17 00:00:00 2001
From: xgdgsc
Date: Sun, 11 Dec 2016 23:52:47 +0800
Subject: [PATCH 7/8] change fig name, change sentence

---
 doc/source/merging.rst | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index 5bda368809427..e9944d96c07aa 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -706,7 +706,7 @@ Here is another example with duplicate join keys in DataFrames:
 .. ipython:: python
    :suppress:
 
-   @savefig merging_merge_on_key_multiple.png
+   @savefig merging_merge_on_key_dup.png
    p.plot([left, right], result,
           labels=['left', 'right'], vertical=False);
    plt.close('all');
@@ -714,8 +714,7 @@ Here is another example with duplicate join keys in DataFrames:
 .. warning::
 
    Joining / merging on duplicate keys can cause a returned frame that is the multiplication of the row dimensions,
-   may result in memory overflow, which can be dangerous for unsaved work.
-   Please make sure no duplicate values in keys before joining large DataFrames.
+   which may result in memory overflow. It is the user's responsibility to manage duplicate values in keys before joining large DataFrames.
 
 .. _merging.indicator:
 

From 48e5322e3e715bea5d4a3485ac6c04437c62c418 Mon Sep 17 00:00:00 2001
From: xgdgsc
Date: Sun, 11 Dec 2016 23:59:52 +0800
Subject: [PATCH 8/8] remove blank line

---
 doc/source/merging.rst | 1 -
 1 file changed, 1 deletion(-)

diff --git a/doc/source/merging.rst b/doc/source/merging.rst
index e9944d96c07aa..f95987afd4c77 100644
--- a/doc/source/merging.rst
+++ b/doc/source/merging.rst
@@ -840,7 +840,6 @@ equivalent:
     left.join(right, on=key_or_keys)
     pd.merge(left, right, left_on=key_or_keys, right_index=True,
           how='left', sort=False)
-
 
 Obviously you can choose whichever form you find more convenient. For
 many-to-one joins (where one of the DataFrame's is already indexed by the join
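
Note on the warning introduced by this series: the text leaves it to the user to manage duplicate key values before merging large DataFrames. A minimal sketch of what that check might look like follows. It reuses the `left` / `right` frames from the example in PATCH 5/8; the helper names `has_duplicate_keys` and `estimate_inner_rows` are hypothetical and are not part of pandas or of this patch.

```python
import pandas as pd

# Same frames as the duplicate-key example added in PATCH 5/8.
left = pd.DataFrame({'A': [1, 2], 'B': [2, 2]})
right = pd.DataFrame({'A': [4, 5, 6], 'B': [2, 2, 2]})


def has_duplicate_keys(frame, keys):
    """Hypothetical helper: True if any key combination repeats in `frame`."""
    return bool(frame.duplicated(subset=keys).any())


def estimate_inner_rows(left, right, key):
    """Hypothetical helper: row count an inner merge on `key` would produce."""
    left_counts = left.groupby(key).size()
    right_counts = right.groupby(key).size()
    # Keys missing from either side become NaN after index alignment; an
    # inner join drops those keys, so they contribute no rows.
    return int((left_counts * right_counts).dropna().sum())


print(has_duplicate_keys(left, 'B'))          # True
print(estimate_inner_rows(left, right, 'B'))  # 6

# Each duplicated key yields (matching left rows) * (matching right rows).
result = pd.merge(left, right, on='B', how='outer')
print(len(result))                            # 6
```

Running the estimate first is cheap compared with the merge itself, so it can serve as a guard before joining frames whose keys are not known to be unique.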