-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Updated aggregate docstring #35042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 13 commits
Commits
Show all changes
14 commits
Select commit
Hold shift + click to select a range
b7ee7a6
DOC: Updated aggregate docstring
gurukiran07 beee950
Doc: updated aggregate docstring
gurukiran07 fff4b7e
Update pandas/core/generic.py
gurukiran07 26add4a
Update generic.py
gurukiran07 d50bbc9
Merge remote-tracking branch 'upstream/master'
gurukiran07 15ecaf7
Update generic.py
gurukiran07 cc231c8
Revert "Update generic.py"
gurukiran07 3a52d99
Revert "Revert "Update generic.py""
gurukiran07 f769179
Updated docstring of agg
gurukiran07 8813780
Trailing whitespace removed
gurukiran07 1fc0949
DOC: Updated docstring of agg
gurukiran07 f4dc10d
Merge remote-tracking branch 'upstream/master'
gurukiran07 0927b61
Update generic.py
gurukiran07 fcb5496
Updated Docstring
gurukiran07 File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gurukiran07, but I don't find this comment very clear. I don't think we want to talk about what pandas operations generally do in this docstring. I guess you mean that
aggregate
ignoresNaN
values, even if the function argument does not. Just say that, with a code example if you want to be clearer. But I don't expect readers of this docstring to know aboutnumpy.nanmedian
andnumpy.median
to use them as an example, after a vague sentence about pandas operations in general.Can you rephrase please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista what they want to say, I think, is that
np.nanmedian
andnp.median
will map to the same internal pandas operation, and, as it says in 10 minutes to pandas:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the comment in the PR is very clear and useful to users. We can surely provide information on what is being used in the notes section, but after reading the proposed comments I feel more confused than before reading it. So, it would be great to rephrase it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was in a previous commit
do you think it's clearer / more useful?
@gurukiran07 is this still active? Do you want to try rephrasing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MarcoGorelli Yes, I'm active. Sorry for my inactiveness. I can try rephrasing it but if you have something in mind, please free to take over.
Can we put it like this @datapythonista @MarcoGorelli
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the average pandas user knowns anything about Python, so I would exclude that part.
Also, as I said earlier, in the context of this docstring I don't think it's relevant what pandas operations generally do.
I think the point here is to say that THIS pandas operation (
agg
) will excludeNaN
before computing the reduction. So, all the examples listed are equivalent, since the reduction won't be applied to the missing values, and the user doesn't need to bother about them. I think both paragraphs are phrased in a way that it becomes confusing to understand the point in the context of this docstrings.pandas operations generally exclude NaNs
is not even saying that this operation is excludingNaNs
. And I wouldn't expect users to know the difference betweennumpy.nanmedian
andnumpy.median
, as I said before, so for the example to be useful that should be explained. Likein numpy, there are two operations to control the impact of NaNs in the result, nanmedia meaning that [...]. In pandas, agg, as most operations just ignore the missing values, and return the operation only considering the values that are present
.IMHO, something like this will let a user really understand what's the point being made here. While to current comment is even difficult to understand for an experienced pandas user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista
IMO, this explains the point very well to both new and experienced users while conveying that
agg
excludes missing data. I feel mentioning something about NumPy operations might confuse new users.If we agree, I can commit this line:
In pandas,
agg
, as most operations just ignore the missing values, and return the operation only considering the values that are present.Since I did not come up with this originally. Please feel free to take over and commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah, that's clearer, thanks for your input
@gurukiran07 go ahead, I'd just make sure plurality matches, so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to drag this out but re-reading this the structure of sentence seems strange.
@datapythonista does this reflect what you wanted to say?