-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: accept None behaviour for suffixes in DataFrame.merge #25243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: accept None behaviour for suffixes in DataFrame.merge #25243
Conversation
Codecov Report
@@ Coverage Diff @@
## master #25243 +/- ##
===========================================
- Coverage 91.89% 42.12% -49.78%
===========================================
Files 166 166
Lines 52226 52226
===========================================
- Hits 47993 22000 -25993
- Misses 4233 30226 +25993
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #25243 +/- ##
=======================================
Coverage 91.75% 91.75%
=======================================
Files 173 173
Lines 52960 52960
=======================================
Hits 48595 48595
Misses 4365 4365
Continue to review full report at Codecov.
|
pandas/core/reshape/merge.py
Outdated
@@ -159,13 +159,13 @@ def merge_ordered(left, right, on=None, | |||
left DataFrame | |||
fill_method : {'ffill', None}, default None | |||
Interpolation method for data | |||
suffixes : Sequence, default is ("_x", "_y") | |||
suffixes : Sequence or False/None, default is ("_x", "_y") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't want to accept False, rather this is an optional argument
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay... will remove False option... thanks @jreback
pandas/core/reshape/merge.py
Outdated
@@ -495,6 +495,8 @@ def __init__(self, left, right, how='inner', on=None, | |||
|
|||
self.copy = copy | |||
self.suffixes = suffixes | |||
if not self.suffixes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if self.suffixes is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm have to think about the implications of this. It now gives you back duplicate column names instead of erroring.
Merge DataFrames df1 and df2, but raise an exception if the DataFrames have | ||
any overlapping columns. | ||
|
||
>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we previously accept False here? or was this added in the previous PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was not added in previous PR.
IIUC previously the code was something like
if not lsuffix and not rsuffix
, so False
and None
are both not allowed, and will raise the ValueError
,
but since in this PR, we don't raise error or warnings when there are duplicate column names, so i removed this part of docstring. @jreback @jorisvandenbossche
@@ -495,6 +495,8 @@ def __init__(self, left, right, how='inner', on=None, | |||
|
|||
self.copy = copy | |||
self.suffixes = suffixes | |||
if self.suffixes is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can change the signature I think to suffixes=None
@@ -97,12 +97,6 @@ def test_join_index(frame): | |||
with pytest.raises(ValueError, match='join method'): | |||
f.join(f2, how='foo') | |||
|
|||
# corner case - overlapping columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this used to error. I am not sure I like this change. cc @jorisvandenbossche
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, this removal is made because in last PR we found out pandas don't explicitly refuse overlapping columns, e.g.
df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "b_x": [5, 6]})
df2 = pd.DataFrame({"a": [1, 2], "b": [4, 5]})
df1.merge(df2, on="a")
the output will still contain two "b_x" columns. @jorisvandenbossche @simonjayhawkins
can you merge master |
120e769
to
80693b7
Compare
FWIW I'm -1 on this PR as I've never found duplicate column names to be very useful (maybe personal bias) and they certainly have some quirks to them, so not something I think we should propagate |
yeah after a relook, I agree. closing this. |
git diff upstream/master -u -- "*.py" | flake8 --diff