DOC: DataFrame.merge behaviour for suffix=(False, False) #22141

elmq0022 · 2018-07-31T03:23:52Z

closes [documentation] merge(suffixes=(False, False)) should cause an error if a suffix would be required to complete the merge but it is undocumented #22045
N/A tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
N/A whatsnew entry

pep8speaks · 2018-07-31T03:23:55Z

Hello @elmq0022! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on August 24, 2018 at 23:57 Hours UTC

codecov · 2018-07-31T04:45:08Z

Codecov Report

Merging #22141 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #22141   +/-   ##
=======================================
  Coverage   92.04%   92.04%           
=======================================
  Files         169      169           
  Lines       50776    50776           
=======================================
  Hits        46737    46737           
  Misses       4039     4039

Flag	Coverage Δ
#multiple	`90.45% <ø> (ø)`	⬆️
#single	`42.23% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.2% <ø> (ø)`	⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 55d176d...304af4f.

jreback · 2018-08-02T10:31:13Z

pandas/core/frame.py

@@ -183,7 +183,8 @@
    the order of the join keys depends on the join type (how keyword).
 suffixes : 2-length sequence (tuple, list, ...)
    Suffix to apply to overlapping column names in the left and right
-    side, respectively.
+    side, respectively. If (False, False), overlapping
+    column names raise an error.


can you add an example in the doc-string to make this even more clear? (we don't even have a suffix example at all......)

@jreback, the doc string example does illustrate the behavior for the default suffix arguments.

Also, there is a more detailed suffix example here: https://pandas.pydata.org/pandas-docs/stable/merging.html#overlapping-value-columns

I suggest adding a specific note or example to the link above for suffix=(False, False) instead of the doc string.

I feel a more experienced user would likely know what to do just from the parameter argument description while a more novice user will probably need to look at the longer documentation with copy/paste examples that work with ipython or jupyter.

I also suggest that the merge, join, and concat methods all reference the more complete documentation in their See Also sections. I've found this documentation very helpful in the past and still refer to it from time to time.

Thoughts?

@jreback Also, I'm happy to just update the doc string. Just wanted to offer a different perspective. Let me know what you prefer. Thanks!

@elmq0022 yeah an updated doc-string would be great

For the type, I'd use suffixes : tuple of (str, str), default ('_x', '_y'). Small detail, but I'd rephrase your addition to something like: To raise an exception on overlapping columns, use (False, False). Being able to use (False, False) to raise looks like a feature to me, and I think it's better to present it that way. Maybe we could also mention it in the Notes section.

But it's just an idea, I'm happy with the PR the way it is. It's a good addition.
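
For readers following the thread, here is a minimal sketch of the behaviour under discussion. It is not the docstring text from the PR; the frame names and the 'lkey' values are assumptions chosen to mirror the docstring data quoted in the diff, and the only claim is that pandas raises a ValueError when column names overlap and both suffixes are False.

```python
import pandas as pd

# Illustrative frames; both have a 'value' column, so the names overlap.
df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

# With the default suffixes, the overlapping column is renamed on each side.
merged = df1.merge(df2, left_on='lkey', right_on='rkey')
default_columns = list(merged.columns)

# With (False, False), no renaming is possible, so the merge raises.
raised = False
try:
    df1.merge(df2, left_on='lkey', right_on='rkey', suffixes=(False, False))
except ValueError as err:
    raised = True
    print('merge refused:', err)

print(default_columns)
```

If neither frame shared a non-key column name, the same call with (False, False) would complete normally, since no suffix would be needed.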

@elmq0022 can you make the changes suggested in the previous comment?

elmq0022 · 2018-08-14T11:51:18Z

@jreback, I updated the docstring with a couple suffix examples.

elmq0022 · 2018-08-20T20:53:20Z

@jreback any additional feedback? Thanks!

jreback · 2018-08-20T22:37:55Z

pandas/core/frame.py

@@ -254,6 +255,23 @@
 3  foo        5  foo        8
 4  bar        2  bar        6
 5  baz        3  baz        7
+
+>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer',


can you add a comment for both of these examples explaining what they show?

elmq0022 · 2018-08-20T22:55:14Z

Sure.


elmq0022 · 2018-08-21T11:49:03Z

@jreback and @datapythonista, I added comments to the examples.

elmq0022 · 2018-08-21T17:37:56Z

@datapythonista, I made the updates. This should be good to go now. Thanks!

datapythonista

Happy to merge after fixing the typo and the variable names.

But if you want to work on making the docstring follow best practices and our standards, you can also:

In parameters replace string by str, boolean by bool and integer by int
In the returns add a description (indented on the line after the type).
Make the short summary (first line) shorter to fit in a single line
run scripts/validate_docstrings pandas.DataFrame.merge to see if there is any other issue

Thanks!
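
As a rough illustration of the conventions listed above (str/bool/int type names, a one-line summary, and a described return value), here is a sketch of a numpydoc-style docstring. The function `resolve_overlap` is hypothetical and is not part of pandas; it only mimics the suffix rule discussed in this PR so that the docstring has something concrete to describe.

```python
def resolve_overlap(left, right, suffixes=('_x', '_y')):
    """
    Rename column names that appear in both inputs.

    Parameters
    ----------
    left : list of str
        Column names of the left table.
    right : list of str
        Column names of the right table.
    suffixes : tuple of (str, str), default ('_x', '_y')
        Suffix to apply to overlapping names on the left and right side,
        respectively. To raise an exception on overlapping names, use
        (False, False).

    Returns
    -------
    tuple of (list of str, list of str)
        The left and right names, with suffixes applied to overlaps.

    Raises
    ------
    ValueError
        If names overlap and both suffixes are falsy.
    """
    overlap = set(left) & set(right)
    lsuffix, rsuffix = suffixes
    if overlap and not lsuffix and not rsuffix:
        raise ValueError('columns overlap but no suffix specified: %s'
                         % sorted(overlap))
    # A falsy suffix on one side leaves that side's names unchanged.
    new_left = [n + (lsuffix or '') if n in overlap else n for n in left]
    new_right = [n + (rsuffix or '') if n in overlap else n for n in right]
    return new_left, new_right


renamed = resolve_overlap(['lkey', 'value'], ['rkey', 'value'])
print(renamed)
```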

datapythonista · 2018-08-22T08:38:16Z

pandas/core/frame.py

    Suffix to apply to overlapping column names in the left and right
-    side, respectively.
+    side, respectively. To raise an exption on overlapping columns use


typo: s/exption/exception

datapythonista · 2018-08-22T08:40:03Z

pandas/core/frame.py

+Merge DataFrames A and B. Specify the left and right suffix
+to append to the name of any overlapping columns.
+
+>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer',


sorry, I didn't notice before that the DataFrame names in the original docstring were A and B. Do you mind changing them to df1 and df2? That's the standard we use, and in Python names like A and B should be used for classes, not instances.

Also, we can take care of that in a different PR if you prefer, but it'd be great to have a more real world example (it makes it easier to understand what's going on), and in this case I find the how='outer' misleading, as it doesn't add value.
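
A small sketch of the point about how='outer', using the df1/df2 naming requested above. The 'lkey' values and the suffix strings are assumptions for illustration. Because every key in one frame has a match in the other, the outer merge returns the same number of rows as the default inner merge, so the argument adds nothing to a suffix example.

```python
import pandas as pd

df1 = pd.DataFrame({'lkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [1, 2, 3, 5]})
df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
                    'value': [5, 6, 7, 8]})

# Custom suffixes are appended to the overlapping 'value' column.
inner = df1.merge(df2, left_on='lkey', right_on='rkey',
                  suffixes=('_left', '_right'))

# Same merge with how='outer'; all keys match, so no extra rows appear.
outer = df1.merge(df2, left_on='lkey', right_on='rkey', how='outer',
                  suffixes=('_left', '_right'))

print(list(inner.columns))
print(len(inner), len(outer))
```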

Yes, I can make those updates.

Also thanks for the location of the validation scripts and such. This is very helpful for my first documentation pull request.

jreback · 2018-08-22T10:19:45Z

pandas/core/frame.py

+3  foo           5  foo            8
+4  bar           2  bar            6
+5  baz           3  baz            7
+


is this part of the doc-tests?

Yes, I was able to verify this using pytest --doctest-modules.
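
For context, a docstring example is checked by executing the lines after `>>>` and comparing the printed result with the text that follows. The sketch below does this with the standard library `doctest` module on a hypothetical helper, `add_suffix`; it is not how the pandas CI is wired, only a small model of what pytest --doctest-modules checks.

```python
import doctest


def add_suffix(name, suffix):
    """
    Append ``suffix`` to ``name``.

    >>> add_suffix('value', '_x')
    'value_x'
    """
    return name + suffix


# Collect the examples from the docstring and run them.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(add_suffix, name='add_suffix',
                        globs={'add_suffix': add_suffix}):
    runner.run(test)

# summarize() returns the counts of failed and attempted examples.
results = runner.summarize(verbose=False)
print(results.failed, results.attempted)
```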

elmq0022 · 2018-08-23T14:44:50Z

@datapythonista I made the updates.

datapythonista

Just some comments on minor things. For the rest looks good to me.

datapythonista · 2018-08-23T15:19:48Z

pandas/core/frame.py

@@ -212,7 +212,8 @@

 Returns
 -------
-DataFrame
+df: DataFrame


Can you just leave DataFrame here?

datapythonista · 2018-08-23T15:20:17Z

pandas/core/frame.py

@@ -197,7 +197,7 @@
    "right_only" for observations whose merge key only appears in 'right'
    DataFrame, and "both" if the observation's merge key is found in both.

-validate : string, default None
+validate : str, default None


can you replace the default None by optional

datapythonista · 2018-08-23T15:20:59Z

pandas/core/frame.py

 ...                   'value': [1, 2, 3, 5]})
->>> B = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
+>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'],
 ...                   'value': [5, 6, 7, 8]})


The indentation of the continuation lines seems wrong.

elmq0022 · 2018-08-24T05:08:27Z

@jreback, @datapythonista two items to address:

First, the following Travis job is failing.

43166.3
Python: 3.5
JOB="2.7, locale, slow, old NumPy" ENV_FILE="ci/travis-27-locale.yaml" LOCALE_OVERRIDE="zh_CN.UTF-8" SLOW=true
3 min 51 sec

The error message is:
ImportError: libgfortran.so.1: cannot open shared object file: No such file or directory.

I don't think this error is on my end, but let me know if I have to do something differently. I did just rebase and rebuild all the C extensions.

Second, I think I understand what @jreback was referring to earlier. I am not sure the pandas.DataFrame.merge doctests are actually run during CI. In doctests.sh, line 23 does not mention merge.

pytest --doctest-modules -v pandas/core/frame.py \
        -k"-assign -axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata -transform"

Should we add it here between -join and -nlargest? Not sure where else this would go or what else would kick off the tests.

datapythonista · 2018-08-24T08:43:11Z

@elmq0022 the CI error does indeed seem unrelated to your changes. About the doctests: the - in -join -nlargest ... means that those tests are skipped, so that list is the list of tests that are currently failing. If -merge is not there, it means the examples are already correct, so as long as they stay correct, nothing needs to be done (if an example breaks or its output differs from what is presented, the CI should report it).
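
The exclusion list can be pictured with a simplified model. This is not pytest's implementation, and treating each keyword as a plain substring match is an assumption; the test names are illustrative. The point is only that a name absent from the list, such as merge, is still selected.

```python
# The -k expression quoted from doctests.sh earlier in the thread.
skip_expr = ("-assign -axes -combine -isin -itertuples -join -nlargest "
             "-nsmallest -nunique -pivot_table -quantile -query -reindex "
             "-reindex_axis -replace -round -set_index -stack -to_dict "
             "-to_stata -transform")

# Each token with a leading '-' names a keyword to deselect.
skipped = [token[1:] for token in skip_expr.split() if token.startswith('-')]


def is_selected(test_name):
    """Return True if no skipped keyword occurs in the test name."""
    return not any(keyword in test_name for keyword in skipped)


merge_runs = is_selected('pandas.core.frame.DataFrame.merge')
join_runs = is_selected('pandas.core.frame.DataFrame.join')
print(merge_runs, join_runs)
```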

datapythonista

lgtm. Later on, we can think about providing more examples of the behavior described in the extended summary, but I'd address that in a separate PR. Thanks @elmq0022

elmq0022 · 2018-08-24T11:59:51Z

@datapythonista @jreback, thanks for all your help. Will I need to rebase and resubmit to pass the CI?

datapythonista · 2018-08-24T12:03:15Z

Yes, if you can merge master into your branch and push, that should trigger the CI again.

elmq0022 · 2018-08-25T12:40:10Z

@jreback and @datapythonista all green.

elmq0022 · 2018-08-29T01:31:42Z

@jreback friendly reminder. Thanks!

datapythonista · 2018-09-04T15:05:09Z

Thanks for the work on this @elmq0022

elmq0022 · 2018-09-04T16:50:21Z

@datapythonista no problem. Happy to help. Thanks for the feedback and for merging the request.

elmq0022 force-pushed the GH22045 branch from 4c2b044 to d4564c6 Compare July 31, 2018 03:26

elmq0022 changed the title ~~documented DataFrame.merge behaviour for suffix=(False, False)~~ DOC: DataFrame.merge behaviour for suffix=(False, False) Jul 31, 2018

elmq0022 mentioned this pull request Aug 2, 2018

[documentation] merge(suffixes=(False, False)) should cause an error if a suffix would be required to complete the merge but it is undocumented #22045

Closed

gfyoung added the Docs label Aug 2, 2018

gfyoung requested a review from jreback August 2, 2018 04:53

gfyoung added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Aug 2, 2018

jreback requested changes Aug 2, 2018


elmq0022 force-pushed the GH22045 branch from d4564c6 to 7c28463 Compare August 14, 2018 03:40

jreback requested changes Aug 20, 2018


jreback requested a review from datapythonista August 20, 2018 22:39

elmq0022 force-pushed the GH22045 branch 2 times, most recently from 4252f79 to 57fdba5 Compare August 21, 2018 04:02

elmq0022 force-pushed the GH22045 branch from 57fdba5 to d8c1672 Compare August 21, 2018 13:50

datapythonista requested changes Aug 22, 2018


jreback reviewed Aug 22, 2018


elmq0022 force-pushed the GH22045 branch from d8c1672 to 6e271af Compare August 23, 2018 03:37

datapythonista requested changes Aug 23, 2018


elmq0022 force-pushed the GH22045 branch from ddccba0 to 6130e64 Compare August 24, 2018 04:39

datapythonista approved these changes Aug 24, 2018


elmq0022 added 3 commits August 24, 2018 18:55

documented DataFrame.merge behaviour for suffix=(False, False)

a1618db

Added a description to the examples.

d3c17bf

final clean up

304af4f

elmq0022 force-pushed the GH22045 branch from 6130e64 to 304af4f Compare August 24, 2018 23:57

datapythonista merged commit 3285bdc into pandas-dev:master Sep 4, 2018

aeltanawy pushed a commit to aeltanawy/pandas that referenced this pull request Sep 20, 2018

DOC: Updating DataFrame.merge docstring (pandas-dev#22141)

607d646

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: Updating DataFrame.merge docstring (pandas-dev#22141)

5edab7e
