ENH: accept None behaviour for suffixes in DataFrame.merge #25243

charlesdong1991 · 2019-02-09T15:56:56Z

closes ENH: accept None behaviour for suffixes in DataFrame.merge #25242
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

codecov · 2019-02-09T16:27:44Z

Codecov Report

Merging #25243 into master will decrease coverage by 49.77%.
The diff coverage is 0%.

@@             Coverage Diff             @@
##           master   #25243       +/-   ##
===========================================
- Coverage   91.89%   42.12%   -49.78%     
===========================================
  Files         166      166               
  Lines       52226    52226               
===========================================
- Hits        47993    22000    -25993     
- Misses       4233    30226    +25993

Flag	Coverage Δ
#multiple	`?`
#single	`42.12% <0%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/internals/managers.py	`66.32% <ø> (-28.51%)`	⬇️
pandas/core/reshape/merge.py	`9.49% <0%> (-85%)`	⬇️
pandas/io/formats/latex.py	`0% <0%> (-100%)`	⬇️
pandas/core/categorical.py	`0% <0%> (-100%)`	⬇️
pandas/io/sas/sas_constants.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/plotting.py	`0% <0%> (-100%)`	⬇️
pandas/tseries/converter.py	`0% <0%> (-100%)`	⬇️
pandas/io/formats/html.py	`0% <0%> (-99.35%)`	⬇️
pandas/core/groupby/categorical.py	`0% <0%> (-95.46%)`	⬇️
pandas/io/sas/sas7bdat.py	`0% <0%> (-91.17%)`	⬇️
... and 125 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04df22f...923eeea. Read the comment docs.

codecov · 2019-02-09T16:27:44Z

Codecov Report

Merging #25243 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #25243   +/-   ##
=======================================
  Coverage   91.75%   91.75%           
=======================================
  Files         173      173           
  Lines       52960    52960           
=======================================
  Hits        48595    48595           
  Misses       4365     4365

Flag	Coverage Δ
#multiple	`90.33% <100%> (ø)`	⬆️
#single	`41.71% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/internals/managers.py	`94.82% <ø> (-0.02%)`	⬇️
pandas/core/frame.py	`96.85% <ø> (ø)`	⬆️
pandas/core/reshape/merge.py	`94.49% <100%> (+0.01%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42b4c97...80693b7. Read the comment docs.

jreback · 2019-02-09T17:38:40Z

pandas/core/reshape/merge.py

@@ -159,13 +159,13 @@ def merge_ordered(left, right, on=None,
        left DataFrame
    fill_method : {'ffill', None}, default None
        Interpolation method for data
-    suffixes : Sequence, default is ("_x", "_y")
+    suffixes : Sequence or False/None, default is ("_x", "_y")


we don't want to accept False, rather this is an optional argument

okay... will remove False option... thanks @jreback

jreback · 2019-02-09T17:38:58Z

pandas/core/reshape/merge.py

@@ -495,6 +495,8 @@ def __init__(self, left, right, how='inner', on=None,

        self.copy = copy
        self.suffixes = suffixes
+        if not self.suffixes:


if self.suffixes is None

jreback

hmm have to think about the implications of this. It now gives you back duplicate column names instead of erroring.

jreback · 2019-02-11T13:15:06Z

pandas/core/frame.py

-Merge DataFrames df1 and df2, but raise an exception if the DataFrames have
-any overlapping columns.
-
->>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))


did we previously accept False here? or was this added in the previous PR?

this was not added in previous PR.
IIUC previously the code was something like
if not lsuffix and not rsuffix, so False and None are both not allowed, and will raise the ValueError,
but since in this PR, we don't raise error or warnings when there are duplicate column names, so i removed this part of docstring. @jreback @jorisvandenbossche

jreback · 2019-02-11T13:15:51Z

pandas/core/reshape/merge.py

@@ -495,6 +495,8 @@ def __init__(self, left, right, how='inner', on=None,

        self.copy = copy
        self.suffixes = suffixes
+        if self.suffixes is None:


you can change the signature I think to suffixes=None

jreback · 2019-02-11T13:16:54Z

pandas/tests/frame/test_join.py

@@ -97,12 +97,6 @@ def test_join_index(frame):
    with pytest.raises(ValueError, match='join method'):
        f.join(f2, how='foo')

-    # corner case - overlapping columns


hmm, this used to error. I am not sure I like this change. cc @jorisvandenbossche

yes, this removal is made because in last PR we found out pandas don't explicitly refuse overlapping columns, e.g.

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "b_x": [5, 6]}) df2 = pd.DataFrame({"a": [1, 2], "b": [4, 5]}) df1.merge(df2, on="a")

the output will still contain two "b_x" columns. @jorisvandenbossche @simonjayhawkins

jreback · 2019-03-03T03:28:33Z

can you merge master

WillAyd · 2019-03-19T04:09:28Z

FWIW I'm -1 on this PR as I've never found duplicate column names to be very useful (maybe personal bias) and they certainly have some quirks to them, so not something I think we should propagate

jreback · 2019-03-19T23:32:09Z

yeah after a relook, I agree. closing this.

jreback requested changes Feb 9, 2019

View reviewed changes

jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design labels Feb 9, 2019

charlesdong1991 changed the title ~~ENH: accept None/False behaviour for suffixes in DataFrame.merge~~ ENH: accept None behaviour for suffixes in DataFrame.merge Feb 9, 2019

jreback requested changes Feb 11, 2019

View reviewed changes

merge master

80693b7

charlesdong1991 force-pushed the xref_issue_24819 branch from 120e769 to 80693b7 Compare March 3, 2019 08:35

jreback closed this Mar 19, 2019

jreback added this to the No action milestone Mar 19, 2019

simonjayhawkins mentioned this pull request May 16, 2022

Merge nonstring columns #46879

Closed

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: accept None behaviour for suffixes in DataFrame.merge #25243

ENH: accept None behaviour for suffixes in DataFrame.merge #25243

charlesdong1991 commented Feb 9, 2019

codecov bot commented Feb 9, 2019

codecov bot commented Feb 9, 2019 •

edited

Loading

jreback Feb 9, 2019

charlesdong1991 Feb 9, 2019

jreback Feb 9, 2019

jreback left a comment

jreback Feb 11, 2019

charlesdong1991 Feb 11, 2019

jreback Feb 11, 2019

jreback Feb 11, 2019

charlesdong1991 Feb 15, 2019

jreback commented Mar 3, 2019

WillAyd commented Mar 19, 2019

jreback commented Mar 19, 2019

ENH: accept None behaviour for suffixes in DataFrame.merge #25243

ENH: accept None behaviour for suffixes in DataFrame.merge #25243

Conversation

charlesdong1991 commented Feb 9, 2019

codecov bot commented Feb 9, 2019

Codecov Report

codecov bot commented Feb 9, 2019 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Mar 3, 2019

WillAyd commented Mar 19, 2019

jreback commented Mar 19, 2019

codecov bot commented Feb 9, 2019 •

edited

Loading