Skip to content

ENH: accept None behaviour for suffixes in DataFrame.merge #25243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

charlesdong1991
Copy link
Member

@codecov
Copy link

codecov bot commented Feb 9, 2019

Codecov Report

Merging #25243 into master will decrease coverage by 49.77%.
The diff coverage is 0%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master   #25243       +/-   ##
===========================================
- Coverage   91.89%   42.12%   -49.78%     
===========================================
  Files         166      166               
  Lines       52226    52226               
===========================================
- Hits        47993    22000    -25993     
- Misses       4233    30226    +25993
Flag Coverage Δ
#multiple ?
#single 42.12% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/managers.py 66.32% <ø> (-28.51%) ⬇️
pandas/core/reshape/merge.py 9.49% <0%> (-85%) ⬇️
pandas/io/formats/latex.py 0% <0%> (-100%) ⬇️
pandas/core/categorical.py 0% <0%> (-100%) ⬇️
pandas/io/sas/sas_constants.py 0% <0%> (-100%) ⬇️
pandas/tseries/plotting.py 0% <0%> (-100%) ⬇️
pandas/tseries/converter.py 0% <0%> (-100%) ⬇️
pandas/io/formats/html.py 0% <0%> (-99.35%) ⬇️
pandas/core/groupby/categorical.py 0% <0%> (-95.46%) ⬇️
pandas/io/sas/sas7bdat.py 0% <0%> (-91.17%) ⬇️
... and 125 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04df22f...923eeea. Read the comment docs.

@codecov
Copy link

codecov bot commented Feb 9, 2019

Codecov Report

Merging #25243 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #25243   +/-   ##
=======================================
  Coverage   91.75%   91.75%           
=======================================
  Files         173      173           
  Lines       52960    52960           
=======================================
  Hits        48595    48595           
  Misses       4365     4365
Flag Coverage Δ
#multiple 90.33% <100%> (ø) ⬆️
#single 41.71% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/internals/managers.py 94.82% <ø> (-0.02%) ⬇️
pandas/core/frame.py 96.85% <ø> (ø) ⬆️
pandas/core/reshape/merge.py 94.49% <100%> (+0.01%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 42b4c97...80693b7. Read the comment docs.

@@ -159,13 +159,13 @@ def merge_ordered(left, right, on=None,
left DataFrame
fill_method : {'ffill', None}, default None
Interpolation method for data
suffixes : Sequence, default is ("_x", "_y")
suffixes : Sequence or False/None, default is ("_x", "_y")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't want to accept False, rather this is an optional argument

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay... will remove False option... thanks @jreback

@@ -495,6 +495,8 @@ def __init__(self, left, right, how='inner', on=None,

self.copy = copy
self.suffixes = suffixes
if not self.suffixes:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if self.suffixes is None

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design labels Feb 9, 2019
@charlesdong1991 charlesdong1991 changed the title ENH: accept None/False behaviour for suffixes in DataFrame.merge ENH: accept None behaviour for suffixes in DataFrame.merge Feb 9, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm have to think about the implications of this. It now gives you back duplicate column names instead of erroring.

Merge DataFrames df1 and df2, but raise an exception if the DataFrames have
any overlapping columns.

>>> df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did we previously accept False here? or was this added in the previous PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was not added in previous PR.
IIUC previously the code was something like
if not lsuffix and not rsuffix, so False and None are both not allowed, and will raise the ValueError,
but since in this PR, we don't raise error or warnings when there are duplicate column names, so i removed this part of docstring. @jreback @jorisvandenbossche

@@ -495,6 +495,8 @@ def __init__(self, left, right, how='inner', on=None,

self.copy = copy
self.suffixes = suffixes
if self.suffixes is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can change the signature I think to suffixes=None

@@ -97,12 +97,6 @@ def test_join_index(frame):
with pytest.raises(ValueError, match='join method'):
f.join(f2, how='foo')

# corner case - overlapping columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this used to error. I am not sure I like this change. cc @jorisvandenbossche

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this removal is made because in last PR we found out pandas don't explicitly refuse overlapping columns, e.g.

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "b_x": [5, 6]})
df2 = pd.DataFrame({"a": [1, 2], "b": [4, 5]})
df1.merge(df2, on="a")

the output will still contain two "b_x" columns. @jorisvandenbossche @simonjayhawkins

@jreback
Copy link
Contributor

jreback commented Mar 3, 2019

can you merge master

@WillAyd
Copy link
Member

WillAyd commented Mar 19, 2019

FWIW I'm -1 on this PR as I've never found duplicate column names to be very useful (maybe personal bias) and they certainly have some quirks to them, so not something I think we should propagate

@jreback
Copy link
Contributor

jreback commented Mar 19, 2019

yeah after a relook, I agree. closing this.

@jreback jreback closed this Mar 19, 2019
@jreback jreback added this to the No action milestone Mar 19, 2019
@simonjayhawkins simonjayhawkins mentioned this pull request May 16, 2022
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: accept None behaviour for suffixes in DataFrame.merge
3 participants