-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: accept None behaviour for suffixes in DataFrame.merge #25243
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -159,13 +159,13 @@ def merge_ordered(left, right, on=None, | |
left DataFrame | ||
fill_method : {'ffill', None}, default None | ||
Interpolation method for data | ||
suffixes : Sequence, default is ("_x", "_y") | ||
suffixes : Sequence or None, default is ("_x", "_y") | ||
A length-2 sequence where each element is optionally a string | ||
indicating the suffix to add to overlapping column names in | ||
`left` and `right` respectively. Pass a value of `None` instead | ||
of a string to indicate that the column name from `left` or | ||
`right` should be left as-is, with no suffix. At least one of the | ||
values must not be None. | ||
`right` should be left as-is, with no suffix. `None` means | ||
keep both name of overlapping columns as-is. | ||
.. versionchanged:: 0.25.0 | ||
how : {'left', 'right', 'outer', 'inner'}, default 'outer' | ||
|
@@ -495,6 +495,8 @@ def __init__(self, left, right, how='inner', on=None, | |
|
||
self.copy = copy | ||
self.suffixes = suffixes | ||
if self.suffixes is None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can change the signature I think to |
||
self.suffixes = (None, None) | ||
self.sort = sort | ||
|
||
self.left_index = left_index | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -97,12 +97,6 @@ def test_join_index(frame): | |
with pytest.raises(ValueError, match='join method'): | ||
f.join(f2, how='foo') | ||
|
||
# corner case - overlapping columns | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm, this used to error. I am not sure I like this change. cc @jorisvandenbossche There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, this removal is made because in last PR we found out pandas don't explicitly refuse overlapping columns, e.g. df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "b_x": [5, 6]})
df2 = pd.DataFrame({"a": [1, 2], "b": [4, 5]})
df1.merge(df2, on="a") the output will still contain two "b_x" columns. @jorisvandenbossche @simonjayhawkins |
||
msg = 'columns overlap but no suffix' | ||
for how in ('outer', 'left', 'inner'): | ||
with pytest.raises(ValueError, match=msg): | ||
frame.join(frame, how=how) | ||
|
||
|
||
def test_join_index_more(frame): | ||
af = frame.loc[:, ['A', 'B']] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did we previously accept False here? or was this added in the previous PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was not added in previous PR.
IIUC previously the code was something like
if not lsuffix and not rsuffix
, soFalse
andNone
are both not allowed, and will raise theValueError
,but since in this PR, we don't raise error or warnings when there are duplicate column names, so i removed this part of docstring. @jreback @jorisvandenbossche