-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: RT03 fix for min,max,mean,meadian,kurt,skew #57682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
/preview |
Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57682/ |
Thanks @YashpalAhlawat, great job. There are couple of details that would be good to address. This docstring is also being reused by https://pandas.pydata.org/preview/pandas-dev/pandas/57682/docs/reference/api/pandas.Series.kurt.html Few things:
Thanks for the help with this! |
@datapythonista , All points has been addressed. If you have any other approach in mind to address third point. I will be happy to make changes in my implementation for that. Thanks |
Thanks for the updates @YashpalAhlawat. It would be good to change the parameters of |
@datapythonista , I have implemented a solution in a similar manner. It is functioning as expected. I would appreciate your feedback. If you believe there is a better approach, I am open to making the necessary changes. |
/preview |
Thanks a lot for working on this @YashpalAhlawat. I've been checking, and it's quite complex how we are generating these docstrings. I think I'm happy to merge your changes as they are if you want, but while it fixes the problems with the docstrings you are fixing here, it adds even a bit more complexity to the function. Also, there will still be docstrings with related problems, for example https://pandas.pydata.org/docs/reference/api/pandas.Series.sum.html will still have the What do you think about replacing Finally, I think it'd be good to move the code of condition So, you'll have something like: if ndim == 1:
return_type = "Series or scalar"
axis_descr = "{index (0)}"
else:
if base_doc in (_num_doc, _sum_prod_doc):
return_type = "scalar"
else:
return_type = "Series or scalar"
axis_descr = "{index (0), columns (1)}" Not sure if what we should do is to remove this function and simplify how docstrings are being reused, but I think this way at least things don't get too much more complicated than now. What do you think? |
I would love to change the code for all. Shouldn't I remove all logic and put doc strings directly to methods. |
Yes, if you want to do that, I think for this case everybody will surely agree, the complexity here is quite high as you already experienced. If you do it, it's better to do it step by step, we can probably have a PR for every base docstring. Or as you think it makes sense, but not replacing the whole Thanks a lot for the work on this @YashpalAhlawat |
I have updated the code as per suggestion. For removing all the code and putting plain docstrings in methods. That can be picked separately once this PR is merged. |
/preview |
Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57682/ |
/preview |
Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57682/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work on this @YashpalAhlawat. And sorry for the delay. Looks quite good, but the logic is not always correct (see comment).
If you can fix it, I think we can get this merged.
pandas/core/generic.py
Outdated
if ndim == 1: | ||
return_type = "Series or scalar" | ||
axis_descr = "{index (0)}" | ||
name2 = "Series" | ||
if base_doc in (_num_doc, _sum_prod_doc): | ||
return_type = "scalar" | ||
else: | ||
return_type = "Series or scalar" | ||
axis_descr = "{index (0), columns (1)}" | ||
name2 = "DataFrame" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this logic is totally correct.
If I'm not missing anything, there are 3 cases:
Series
: return type is alwaysscalar
, the axis parameter only accepts 0 which is ignored.DataFrame
group 1 (e.g.skew
): axis accepts 0 and 1, return is alwaysSeries
DataFrame
group 2 (e.g.any
): axis accepts 0, 1 and None, return isSeries or scalar
sinceaxis=0
means both axis together and makes the return a scalar.
You'll have to check which base_doc belong to each group. Also, can you please rename name2
to obj_type
or something more descriptive?
Also, if you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the series case for:
https://pandas.pydata.org/docs/reference/api/pandas.Series.sem.html
https://pandas.pydata.org/docs/reference/api/pandas.Series.std.html
https://pandas.pydata.org/docs/reference/api/pandas.Series.var.html
In all of the above cases return type is Series or scalar as per documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have made the suggested variable name, etc and modified logic for all cases, verified locally all the cases
For Series
Function | Official Documentation return type | PR code Return type |
---|---|---|
Any | Scalar or Series | Scalar or Series |
All | Scalar or Series | Scalar or Series |
Min | Scalar or scalar | Scalar |
Max | Scalar or scalar | Scalar |
Sum | Scalar or scalar | Scalar |
Prod | Scalar or scalar | Scalar |
Median | Scalar or scalar | Scalar |
Mean | Scalar or scalar | Scalar |
Var | Scalar or Series (if level specified) | Scalar or Series (if level specified) |
Std | Scalar or Series (if level specified) | Scalar or Series (if level specified) |
Sem | Scalar or Series (if level specified) | Scalar or Series (if level specified) |
Skew | Scalar or scalar | Scalar |
Kurt | Scalar or scalar | Scalar |
Cumsum | Scalar or Series | Scalar or Series |
Cumprod | Scalar or Series | Scalar or Series |
Cummin | Scalar or Series | Scalar or Series |
Cummax | Scalar or Series | Scalar or Series |
For DataFrame
Function | Official Documentation return type | PR code Return type |
---|---|---|
Any | Series or DataFrame | Series or DataFrame |
All | Series or DataFrame | Series or DataFrame |
Min | Series or scalar | Series or scalar |
Max | Series or scalar | Series or scalar |
Sum | Series or scalar | Series or scalar |
Prod | Series or scalar | Series or scalar |
Median | Series or scalar | Series or scalar |
Mean | Series or scalar | Series or scalar |
Var | Series or DataFrame (if level specified) | Series or DataFrame (if level specified) |
Std | Series or DataFrame (if level specified) | Series or DataFrame (if level specified) |
Sem | Series or DataFrame (if level specified) | Series or DataFrame (if level specified) |
Skew | Series or scalar | Series or scalar |
Kurt | Series or scalar | Series or scalar |
Cumsum | Series or DataFrame | Series or DataFrame |
Cumprod | Series or DataFrame | Series or DataFrame |
Cummin | Series or DataFrame | Series or DataFrame |
Cummax | Series or DataFrame | Series or DataFrame |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the detailed information and for the work on this PR @YashpalAhlawat.
In the series case for:
https://pandas.pydata.org/docs/reference/api/pandas.Series.sem.html https://pandas.pydata.org/docs/reference/api/pandas.Series.std.html https://pandas.pydata.org/docs/reference/api/pandas.Series.var.html
In all of the above cases return type is Series or scalar as per documentation.
Looking at for example Series.std
, I fail to see a way for the function to return anything that it's not a scalar, the level
argument was deprecated and doesn't exist anymore. So the documentation is outdated. I assume it's also the case for the other two functions.
For Series.any
and Series.all
I think a scalar is always returned as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, for DataFrame.any
and DataFrame.all
the return time should be Series or scalar
, not Series or DataFrame
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For series Std, any , all methods needs to updated. Is level is deprecated from all std, var, sem ?
For dataframe any,all methods need to be updated. Is level is deprecated from dataframe methods also?
Kindly, clear the expectations. I will fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can check the docstring of the objects. For var
for example, there is no parameter levels
: https://pandas.pydata.org/docs/reference/api/pandas.Series.var.html
I think it's the case for all them, I checked several and I didn't see any with the level
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For series, all cases will return Scalar, except std, var, sem as all belongs to same group should have Series and Scalar _num_ddof_doc
df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
type(df.sem())
<class 'pandas.core.series.Series'>
For DataFrame, all cases should return Series and Scalar, since **skew, max, min, ** belongs to same group _num_doc.
If this looks good to you i will proceed and implement.
@datapythonista please review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Series, cumsum and similar returns Series I think. And for Dataframe I think it can return DataFrame or Series as you had. For the rest what you say sounds correct. You can check the type annotations and examples, and check the parameter axis which is the one that controls what object is being returned.
@YashpalAhlawat you'll have to resolve the conflicts, we changed how the ignored errors are specified in |
@datapythonista , Could you please review this. So that I can close this and move on to some other task. Thanks!! |
Thanks for the PR but it appears this PR has gone stale and needs a rebase. Closing but feel free to ping once you have addressed the conflicts and want to keep working on this |
This PR will fix RT03 error for min,max,mean,meadian,kurt,skew methods
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.