-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: DataFrame.merge behaviour for suffix=(False, False) #22141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @elmq0022! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on August 24, 2018 at 23:57 Hours UTC |
Codecov Report
@@ Coverage Diff @@
## master #22141 +/- ##
=======================================
Coverage 92.04% 92.04%
=======================================
Files 169 169
Lines 50776 50776
=======================================
Hits 46737 46737
Misses 4039 4039
Continue to review full report at Codecov.
|
pandas/core/frame.py
Outdated
@@ -183,7 +183,8 @@ | |||
the order of the join keys depends on the join type (how keyword). | |||
suffixes : 2-length sequence (tuple, list, ...) | |||
Suffix to apply to overlapping column names in the left and right | |||
side, respectively. | |||
side, respectively. If (False, False), overlapping | |||
column names raise an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an example in the doc-string to make this even more clear? (we don't even have a suffix example at all......)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback, the doc string example does illustrate the behavior for the default suffix arguments.
Also, there is a more detailed suffix example here: https://pandas.pydata.org/pandas-docs/stable/merging.html#overlapping-value-columns
I suggest adding a specific note or example to the link above for suffix=(False, False)
instead of the doc string.
I feel a more experienced user would likely know what to do just from the parameter argument description while a more novice user will probably need to look at the longer documentation with copy/paste examples that work with ipython or jupyter.
I also suggest the merge, join, and concat methods all have a reference to the more complete documentation in the see also section. I've found this documentation very helpful in the past and still refer to if from time to time.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Also, I'm happy to just update the doc string. Just wanted to offer a different perspective. Let me know what you prefer. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elmq0022 yeah an updated doc-string would be great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the type, I'd have suffixes : tuple of (str, str), default ('_x', '_y')
. Small detail, but I'd rephrase you addition to sound like To raise an exception on overlapping columns, use (False, False)
. Being able to use (False, False) to raise looks like a feature to me, and I think it's better to present it this way. May be we could also have it in the Notes
section.
But it's just an idea, I'm happy with the PR the way it is. It's a good addition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elmq0022 can you make the changes suggested in the previous comment?
@jreback, I updated the docstring with a couple suffix examples. |
@jreback any additional feedback? Thanks! |
pandas/core/frame.py
Outdated
@@ -254,6 +255,23 @@ | |||
3 foo 5 foo 8 | |||
4 bar 2 bar 6 | |||
5 baz 3 baz 7 | |||
|
|||
>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment for both of these example of what they are showing
Sure.
…On Mon, Aug 20, 2018, 5:39 PM Jeff Reback ***@***.***> wrote:
***@***.**** requested changes on this pull request.
------------------------------
In pandas/core/frame.py
<#22141 (comment)>:
> @@ -254,6 +255,23 @@
3 foo 5 foo 8
4 bar 2 bar 6
5 baz 3 baz 7
+
+>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer',
can you add a comment for both of these example of what they are showing
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22141 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ADSn4MXWQzuu2KT8NMh_2yG1MprevKA0ks5uSzqtgaJpZM4VniSl>
.
|
4252f79
to
57fdba5
Compare
@jreback and @datapythonista, I added comments to the examples. |
@datapythonista, I made the updates. This should be good to go now. Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to merge after fixing the typo and the variable names.
But if you want to work on making the docstring follow best practices and our standards, you can also:
- In parameters replace string by str, boolean by bool and integer by int
- In the returns add a description (indented in the next line after the type.
- Make the short summary (first line) shorter to fit in a single line
- run
scripts/validate_docstrings pandas.DataFrame.merge
to see if there is any other issue
Thanks!
pandas/core/frame.py
Outdated
Suffix to apply to overlapping column names in the left and right | ||
side, respectively. | ||
side, respectively. To raise an exption on overlapping columns use |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: s/exption/exception
pandas/core/frame.py
Outdated
Merge DataFrames A and B. Specify the left and right suffix | ||
to append to the name of any overlapping columns. | ||
|
||
>>> A.merge(B, left_on='lkey', right_on='rkey', how='outer', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I didn't see before that in the original docstring the DataFrame names were A
and B
. Do you mind changing them to df1
and df2
. That's the standard we use, and A
and B
in Python should be used for classes, not instances.
Also, we can take care of that in a different PR if you prefer, but it'd be great to have a more real world example (it makes it easier to understand what's going on), and in this case I find the how='outer'
misleading, as it doesn't add value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I can make those updates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also thanks for the location of the validation scripts and such. This is very helpful for my first documentation pull request.
3 foo 5 foo 8 | ||
4 bar 2 bar 6 | ||
5 baz 3 baz 7 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this part of the doc-tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was able to verify this using pytest --doctest-modules.
@datapythonista I made the updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments on minor things. For the rest looks good to me.
pandas/core/frame.py
Outdated
@@ -212,7 +212,8 @@ | |||
|
|||
Returns | |||
------- | |||
DataFrame | |||
df: DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just leave DataFrame
here?
pandas/core/frame.py
Outdated
@@ -197,7 +197,7 @@ | |||
"right_only" for observations whose merge key only appears in 'right' | |||
DataFrame, and "both" if the observation's merge key is found in both. | |||
|
|||
validate : string, default None | |||
validate : str, default None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you replace the default None
by optional
pandas/core/frame.py
Outdated
... 'value': [1, 2, 3, 5]}) | ||
>>> B = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'], | ||
>>> df2 = pd.DataFrame({'rkey': ['foo', 'bar', 'baz', 'foo'], | ||
... 'value': [5, 6, 7, 8]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The indentation of the continuation lines seems wrong.
@jreback, @datapythonista two items to address: First, the following Travis job is failing. 43166.3 The error message is: I don't think this error is on my end, but let me know if I have to do something differently. I did just rebase and rebuild all the C extensions. Second, I think I get what @jreback was referring to prior. I am not sure if the pandas.DataFram.merge doctests are necessarily run during CI. In doctests.sh on line 23 there is no merge argument mentioned. pytest --doctest-modules -v pandas/core/frame.py \
-k"-assign -axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata -transform" Should we add it here between -join and -nlargest? Not sure where else this would go or what else would kick off the tests. |
@elmq0022 the CI error seems indeed unrelated to your changes. About the doctests, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. Later on, we can think on providing more examples about the behavior described in the extended summary, but I'd address that in a separated PR. Thanks @elmq0022
@datapythonista @jreback, thanks for all your help. Will need to rebase and resubmit to pass the CI? |
Yes, if you can merge master into your branch and push, that should trigger the CI again. |
@jreback and @datapythonista all green. |
@jreback friendly reminder. Thanks! |
Thanks for the work on this @elmq0022 |
@datapythonista no problem. Happy to help. Thanks for feedback and merging the request. |
git diff upstream/master -u -- "*.py" | flake8 --diff