-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Updated aggregate docstring #35042
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @gurukiran07! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-08-24 13:26:17 UTC |
Co-authored-by: Marco Gorelli <[email protected]>
@gurukiran07 why did you close this, and why do you say "I guess this is fixed now"? |
This was mentioned in the docs. I thought this would cover |
@datapythonista @MarcoGorelli It's green now, all tests passed. If there are any changes to be made let me know. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment, other than that looks good to me
pandas/core/generic.py
Outdated
@@ -5076,6 +5076,9 @@ def pipe(self, func, *args, **kwargs): | |||
----- | |||
`agg` is an alias for `aggregate`. Use the alias. | |||
|
|||
Pandas operations generally exclude NaNs. For example, ``agg(np.nanmedian)``, | |||
``agg(np.median)``, and ``agg('median')`` will return the same result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gurukiran07, but I don't find this comment very clear. I don't think we want to talk about what pandas operations generally do in this docstring. I guess you mean that aggregate
ignores NaN
values, even if the function argument does not. Just say that, with a code example if you want to be clearer. But I don't expect readers of this docstring to know about numpy.nanmedian
and numpy.median
to use them as an example, after a vague sentence about pandas operations in general.
Can you rephrase please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@datapythonista what they want to say, I think, is that np.nanmedian
and np.median
will map to the same internal pandas operation, and, as it says in 10 minutes to pandas:
Operations in general exclude missing data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the comment in the PR is very clear and useful to users. We can surely provide information on what is being used in the notes section, but after reading the proposed comments I feel more confused than before reading it. So, it would be great to rephrase it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was in a previous commit
- Some NumPy functions resolve to their corresponding internal Cython function.
As pandas operations generally exclude NaNs, for example `.agg(np.nanmedian)`,
`.agg(np.median)`, and `.agg('median') will return the same result.
do you think it's clearer / more useful?
@gurukiran07 is this still active? Do you want to try rephrasing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gurukiran07 is this still active? Do you want to try rephrasing?
@MarcoGorelli Yes, I'm active. Sorry for my inactiveness. I can try rephrasing it but if you have something in mind, please free to take over.
Can we put it like this @datapythonista @MarcoGorelli
Pandas operations, in general, exclude missing NaNs.
For example, mean of Series [1, 2, np.nan] would be 1.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the average pandas user knowns anything about Python, so I would exclude that part.
Also, as I said earlier, in the context of this docstring I don't think it's relevant what pandas operations generally do.
I think the point here is to say that THIS pandas operation (agg
) will exclude NaN
before computing the reduction. So, all the examples listed are equivalent, since the reduction won't be applied to the missing values, and the user doesn't need to bother about them. I think both paragraphs are phrased in a way that it becomes confusing to understand the point in the context of this docstrings. pandas operations generally exclude NaNs
is not even saying that this operation is excluding NaNs
. And I wouldn't expect users to know the difference between numpy.nanmedian
and numpy.median
, as I said before, so for the example to be useful that should be explained. Like in numpy, there are two operations to control the impact of NaNs in the result, nanmedia meaning that [...]. In pandas, agg, as most operations just ignore the missing values, and return the operation only considering the values that are present
.
IMHO, something like this will let a user really understand what's the point being made here. While to current comment is even difficult to understand for an experienced pandas user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In pandas,
agg
, as most operations just ignore the missing values, and return the operation only considering the values that are present.
IMO, this explains the point very well to both new and experienced users while conveying that agg
excludes missing data. I feel mentioning something about NumPy operations might confuse new users.
If we agree, I can commit this line:
In pandas, agg
, as most operations just ignore the missing values, and return the operation only considering the values that are present.
Since I did not come up with this originally. Please feel free to take over and commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK yeah, that's clearer, thanks for your input
@gurukiran07 go ahead, I'd just make sure plurality matches, so
In pandas, agg, as most operations just ignores the missing values, and returns the operation only considering the values that are present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to drag this out but re-reading this the structure of sentence seems strange.
In pandas, agg, as most operations just ignore the missing values, returns the operation only considering the values that are present
@datapythonista does this reflect what you wanted to say?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @gurukiran07, looks good to me.
@datapythonista Guess, I need to retrigger the checks. Will ping once green. |
@datapythonista All checks passed, it's green now. |
Cool, merging then following Marc's approval, thanks @gurukiran07 |
* DOC: Updated aggregate docstring * Doc: updated aggregate docstring * Update pandas/core/generic.py Co-authored-by: Marco Gorelli <[email protected]> * Update generic.py * Update generic.py * Revert "Update generic.py" This reverts commit 15ecaf7. * Revert "Revert "Update generic.py"" This reverts commit cc231c8. * Updated docstring of agg * Trailing whitespace removed * DOC: Updated docstring of agg * Update generic.py * Updated Docstring Co-authored-by: Marco Gorelli <[email protected]>
Link to current
Series.agg
: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.agg.htmlOriginal question asked in gitter:
Does
pd.Series.agg
withfunc
parameter set to'median'
usesnp.nanmedian
(Not onlymedian
includingmean
,mode
)?Whenever
NaN
is present does it fallback to usingnp.nanmedian
?Reply from @MarcoGorelli
if you look at
pandas/core/base.py
you'll seein
_cython_table
. So, both resolve to the same internal cython function.IMO it's better to mention this in the docs under
Note:
section.Under note section saying:
Some NumPy functions such as
np.mean
,np.nanmean
,np.median
etc. resolve to their corresponding internal cython function.Output of
python validate_docstrings.py pandas.Series.agg