DOC: RT03 fix for min,max,mean,meadian,kurt,skew #57682

YashpalAhlawat · 2024-02-29T18:05:43Z

This PR fixes the RT03 error for the min, max, mean, median, kurt, and skew methods.

xref DOC: fix RT03 errors in docstrings #57416
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
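
For context, RT03 is the numpydoc validation code reported when a return value is listed without a description. The sketch below is a simplified illustration of that check, not the validator pandas actually runs: `has_rt03` is a hypothetical helper, and it assumes a single-entry Returns section that ends at the first blank line.

```python
import inspect


def has_rt03(docstring: str) -> bool:
    """Return True if the Returns section lists a type but no description.

    Simplified illustration only; the real numpydoc validator is more thorough.
    """
    lines = inspect.cleandoc(docstring).splitlines()
    for i, line in enumerate(lines):
        # A numpydoc section is a title followed by a dashed underline.
        underlined = i + 1 < len(lines) and set(lines[i + 1].strip()) == {"-"}
        if line.strip() == "Returns" and underlined:
            body = []
            for entry in lines[i + 2:]:
                if not entry.strip():
                    break  # a blank line ends this simplified section
                body.append(entry)
            # Description lines are indented under the type line.
            described = any(entry.startswith(" ") for entry in body[1:])
            return bool(body) and not described
    return False


missing = """
Return the mean of the values.

Returns
-------
scalar
"""

fixed = """
Return the mean of the values.

Returns
-------
scalar
    Mean of the values over the requested axis.
"""

print(has_rt03(missing))
print(has_rt03(fixed))
```

The first docstring is flagged because `scalar` has nothing indented beneath it; the second passes once a description line is added.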

pandas/core/generic.py

datapythonista · 2024-03-01T20:07:01Z

/preview

github-actions · 2024-03-01T20:07:58Z

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57682/

datapythonista · 2024-03-01T20:20:41Z

Thanks @YashpalAhlawat, great job.

There are a couple of details that would be good to address. This docstring is also reused by the Series methods; you can see how it looks here:

https://pandas.pydata.org/preview/pandas-dev/pandas/57682/docs/reference/api/pandas.Series.kurt.html

A few things:

Can we also remove the Series.kurt... methods from the validation, as we did with the DataFrame methods?
The description you added says "applying the kurt function to the DataFrame elements" even when we are in the Series docstring. Can you use a parameter so it says "... Series elements", please?
Finally, this is not from your PR, but the docstring in the link gives "scalar or scalar" as the return type, which doesn't make sense. I think for DataFrame it should say "Series or scalar" and for Series it should simply say "scalar". Can you check whether this is really the case, and make the docstring show the correct types, please?

Thanks for the help with this!
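
As a rough illustration of the request above, a shared template can take the object name and the return type as placeholders, so that each class renders its own wording. This is only a sketch: `KURT_TEMPLATE` and the placeholder names `obj_type` and `return_type` are assumptions for the example, not the names used in pandas/core/generic.py.

```python
# Hypothetical shared template; the placeholder names are illustrative only.
KURT_TEMPLATE = """
Return unbiased kurtosis over requested axis.

Returns
-------
{return_type}
    Kurtosis obtained by applying the kurt function to the {obj_type} elements.
"""

# Each class fills the same template with its own static values.
series_doc = KURT_TEMPLATE.format(return_type="scalar", obj_type="Series")
frame_doc = KURT_TEMPLATE.format(
    return_type="Series or scalar", obj_type="DataFrame"
)

print(series_doc)
print(frame_doc)
```

With this shape, the Series docstring no longer mentions DataFrame elements and the duplicated "scalar or scalar" cannot occur, because the return type is a single value.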

YashpalAhlawat · 2024-03-02T04:51:54Z

@datapythonista, all points have been addressed. If you have another approach in mind for the third point, I will be happy to change my implementation accordingly.

Thanks

datapythonista · 2024-03-02T11:50:01Z

Thanks for the updates @YashpalAhlawat. It would be good to set the name1 and name2 parameters statically, as needed. The idea is that we have a template we use in different methods, and when reusing the template we specify the values for each function. So you can pass the value "Series or scalar" or "scalar" as appropriate, and the type renders correctly without requiring extra complexity. Does this make sense?
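
One way to picture "specifying the values for each function" is a small decorator that fills a shared template with static keyword values at definition time. The sketch below is hypothetical: `fill_doc`, `_REDUCTION_TEMPLATE`, and the toy classes are made up for illustration and are not the helpers pandas uses.

```python
from typing import Callable

# Hypothetical template shared by several reductions.
_REDUCTION_TEMPLATE = """
{desc}

Returns
-------
{return_type}
    Result of the reduction.
"""


def fill_doc(**values: str) -> Callable:
    """Attach the shared template to a function, filled with static values."""

    def decorator(func: Callable) -> Callable:
        func.__doc__ = _REDUCTION_TEMPLATE.format(**values)
        return func

    return decorator


class ToySeries:
    @fill_doc(desc="Return the sum of the values.", return_type="scalar")
    def sum(self):
        ...


class ToyFrame:
    @fill_doc(desc="Return the sum of the values.", return_type="Series or scalar")
    def sum(self):
        ...


print(ToySeries.sum.__doc__)
print(ToyFrame.sum.__doc__)
```

Because the values are written next to each method, there is no branching logic to maintain; the cost is that every method has to state its own return type.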

YashpalAhlawat · 2024-03-02T12:11:38Z

@datapythonista , I have implemented a solution in a similar manner. It is functioning as expected. I would appreciate your feedback. If you believe there is a better approach, I am open to making the necessary changes.

YashpalAhlawat · 2024-03-02T12:12:14Z

/preview

pandas/core/generic.py

datapythonista · 2024-03-05T18:11:05Z

Thanks a lot for working on this @YashpalAhlawat.

I've been checking, and it's quite complex how we are generating these docstrings. I think I'm happy to merge your changes as they are if you want, but while it fixes the problems with the docstrings you are fixing here, it adds even a bit more complexity to the function. Also, there will still be docstrings with related problems, for example https://pandas.pydata.org/docs/reference/api/pandas.Series.sum.html will still have the scalar or scalar return.

What do you think about replacing name1 and name2 with a single return_type that contains both values together for all cases (e.g. DataFrame or Series, Series or scalar, scalar)? In some cases, that would require changing the description of the return so it doesn't mention the type (as you did in one of the iterations of this PR, focusing on what is returned conceptually rather than on the types).

Finally, I think it'd be good to move the if ndim == 1: condition to the end of the make_doc function, so the if you added sits together with the code that sets the other return values (name1 and name2 now).

So, you'll have something like:

      if ndim == 1:
          return_type = "Series or scalar"
          axis_descr = "{index (0)}"
      else:
          if base_doc in (_num_doc, _sum_prod_doc):
              return_type = "scalar"
          else:
              return_type = "Series or scalar"
          axis_descr = "{index (0), columns (1)}"

I'm not sure whether we should instead remove this function and simplify how docstrings are reused, but this way at least things don't get much more complicated than they are now. What do you think?

YashpalAhlawat · 2024-03-06T04:11:23Z

@datapythonista ,

I would love to change the code for all of them.
But #57683 focuses on removing such code and adding the docstrings directly to the methods.

Shouldn't I remove all the logic and put the docstrings directly in the methods?

datapythonista · 2024-03-06T14:34:52Z

Yes, if you want to do that, I think everybody will agree in this case; the complexity here is quite high, as you have already experienced. If you do it, it's better to go step by step: we can probably have one PR per base docstring, or split it however you think makes sense. But please don't replace the whole make_doc function in a single PR, since that would be very hard to review.

Thanks a lot for the work on this @YashpalAhlawat

YashpalAhlawat · 2024-03-10T06:59:03Z

@datapythonista ,

I have updated the code as per the suggestion.

Removing all the code and putting plain docstrings in the methods can be picked up separately once this PR is merged.

datapythonista · 2024-03-12T13:16:13Z

/preview

github-actions · 2024-03-12T13:17:50Z

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57682/

datapythonista · 2024-03-14T16:10:58Z

/preview

github-actions · 2024-03-14T16:12:38Z

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/57682/

datapythonista

Thanks for the work on this @YashpalAhlawat. And sorry for the delay. Looks quite good, but the logic is not always correct (see comment).

If you can fix it, I think we can get this merged.

datapythonista · 2024-03-14T16:30:48Z

pandas/core/generic.py

+    if ndim == 1:
+        return_type = "Series or scalar"
+        axis_descr = "{index (0)}"
+        name2 = "Series"
+        if base_doc in (_num_doc, _sum_prod_doc):
+            return_type = "scalar"
+    else:
+        return_type = "Series or scalar"
+        axis_descr = "{index (0), columns (1)}"
+        name2 = "DataFrame"
+


I don't think this logic is totally correct.

If I'm not missing anything, there are 3 cases:

Series: return type is always scalar, the axis parameter only accepts 0 which is ignored.

DataFrame group 1 (e.g. skew): axis accepts 0 and 1, return is always Series

DataFrame group 2 (e.g. any): axis accepts 0, 1 and None; the return is Series or scalar, since axis=None reduces over both axes together and makes the return a scalar.

You'll have to check which base_doc belongs to each group. Also, can you please rename name2 to obj_type or something more descriptive?
Also, if you
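
The three cases listed in this comment can be sketched as a single selection function. This is a hypothetical helper, not the code in the PR: `pick_return_type` and its `accepts_axis_none` flag are assumptions that stand in for the check on which base_doc a method uses, which the comment leaves open.

```python
def pick_return_type(ndim: int, accepts_axis_none: bool = False) -> tuple[str, str, str]:
    """Return (obj_type, return_type, axis_descr) for a reduction docstring.

    Hypothetical helper: `accepts_axis_none` stands in for the check on
    which base_doc a method uses, which is not worked out here.
    """
    if ndim == 1:
        # Series: always a scalar; axis only accepts 0 and is ignored.
        return "Series", "scalar", "{index (0)}"
    if accepts_axis_none:
        # DataFrame group 2 (e.g. any): axis=None reduces over both axes.
        return "DataFrame", "Series or scalar", "{index (0), columns (1)}"
    # DataFrame group 1 (e.g. skew): one axis is reduced, leaving a Series.
    return "DataFrame", "Series", "{index (0), columns (1)}"


print(pick_return_type(1))
print(pick_return_type(2))
print(pick_return_type(2, accepts_axis_none=True))
```

Keeping the three outcomes in one place makes it easier to see that the Series branch never yields "Series or scalar".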

@datapythonista

In the series case for:

https://pandas.pydata.org/docs/reference/api/pandas.Series.sem.html
https://pandas.pydata.org/docs/reference/api/pandas.Series.std.html
https://pandas.pydata.org/docs/reference/api/pandas.Series.var.html

In all of the above cases, the return type is Series or scalar as per the documentation.

I have applied the suggested variable rename, modified the logic for all cases, and verified all of them locally.

For Series

| Function | Official documentation return type | PR code return type |
| --- | --- | --- |
| Any | Scalar or Series | Scalar or Series |
| All | Scalar or Series | Scalar or Series |
| Min | Scalar or scalar | Scalar |
| Max | Scalar or scalar | Scalar |
| Sum | Scalar or scalar | Scalar |
| Prod | Scalar or scalar | Scalar |
| Median | Scalar or scalar | Scalar |
| Mean | Scalar or scalar | Scalar |
| Var | Scalar or Series (if level specified) | Scalar or Series (if level specified) |
| Std | Scalar or Series (if level specified) | Scalar or Series (if level specified) |
| Sem | Scalar or Series (if level specified) | Scalar or Series (if level specified) |
| Skew | Scalar or scalar | Scalar |
| Kurt | Scalar or scalar | Scalar |
| Cumsum | Scalar or Series | Scalar or Series |
| Cumprod | Scalar or Series | Scalar or Series |
| Cummin | Scalar or Series | Scalar or Series |
| Cummax | Scalar or Series | Scalar or Series |

For DataFrame

| Function | Official documentation return type | PR code return type |
| --- | --- | --- |
| Any | Series or DataFrame | Series or DataFrame |
| All | Series or DataFrame | Series or DataFrame |
| Min | Series or scalar | Series or scalar |
| Max | Series or scalar | Series or scalar |
| Sum | Series or scalar | Series or scalar |
| Prod | Series or scalar | Series or scalar |
| Median | Series or scalar | Series or scalar |
| Mean | Series or scalar | Series or scalar |
| Var | Series or DataFrame (if level specified) | Series or DataFrame (if level specified) |
| Std | Series or DataFrame (if level specified) | Series or DataFrame (if level specified) |
| Sem | Series or DataFrame (if level specified) | Series or DataFrame (if level specified) |
| Skew | Series or scalar | Series or scalar |
| Kurt | Series or scalar | Series or scalar |
| Cumsum | Series or DataFrame | Series or DataFrame |
| Cumprod | Series or DataFrame | Series or DataFrame |
| Cummin | Series or DataFrame | Series or DataFrame |
| Cummax | Series or DataFrame | Series or DataFrame |

Thanks for all the detailed information and for the work on this PR @YashpalAhlawat.

Looking at Series.std, for example, I fail to see a way for the function to return anything other than a scalar: the level argument was deprecated and doesn't exist anymore, so the documentation is outdated. I assume that's also the case for the other two functions.

For Series.any and Series.all, I think a scalar is always returned as well.

Also, for DataFrame.any and DataFrame.all the return type should be Series or scalar, not Series or DataFrame.

@datapythonista ,

For Series, the std, any and all methods need to be updated. Is level deprecated for all of std, var and sem?

For DataFrame, the any and all methods need to be updated. Is level deprecated for the DataFrame methods as well?

Kindly clarify the expectations and I will fix it.

You can check the docstrings of the objects. For var, for example, there is no level parameter: https://pandas.pydata.org/docs/reference/api/pandas.Series.var.html

I think that's the case for all of them; I checked several and didn't see any with the level parameter.

For Series, all cases will return scalar, except std, var and sem: since they all belong to the same group (_num_ddof_doc), they should have Series or scalar.

    >>> import pandas as pd
    >>> df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
    >>> type(df.sem())
    <class 'pandas.core.series.Series'>

For DataFrame, all cases should return Series or scalar, since skew, max and min belong to the same group (_num_doc).

If this looks good to you, I will proceed and implement it.

@datapythonista please review

For Series, cumsum and similar methods return Series, I think. And for DataFrame I think they can return DataFrame or Series, as you had. For the rest, what you say sounds correct. You can check the type annotations and examples, and check the axis parameter, which is the one that controls which object is returned.
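
The return types discussed in this thread can be checked directly rather than read off the docstrings. The snippet below is a small sanity check, assuming a reasonably recent pandas is installed; the sample data is arbitrary and the list of methods is not exhaustive.

```python
import pandas as pd
from pandas.api.types import is_scalar

s = pd.Series([1.0, 2.0, 4.0])
df = pd.DataFrame({"a": [1, 2], "b": [2, 3]}, index=["tiger", "zebra"])

# Series reductions collapse the only axis, so the result is a scalar.
print(is_scalar(s.std()), is_scalar(s.any()))

# DataFrame reductions along one axis leave a Series.
print(type(df.sem()).__name__, type(df.skew()).__name__)

# axis=None reduces over both axes, so DataFrame.any returns a scalar.
print(is_scalar(df.any(axis=None)))

# Cumulative methods keep the shape of the caller.
print(type(s.cumsum()).__name__, type(df.cumsum()).__name__)
```

Checks like these only cover the calls shown; whether a given method accepts axis=None still has to be confirmed per method from its signature.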

datapythonista · 2024-03-19T17:18:09Z

@YashpalAhlawat you'll have to resolve the conflicts, we changed how the ignored errors are specified in code_checks.sh.

YashpalAhlawat · 2024-03-25T15:05:33Z

@datapythonista, could you please review this so that I can close it and move on to another task?

Thanks!!

mroeschke · 2024-04-23T18:53:42Z

Thanks for the PR, but it appears it has gone stale and needs a rebase. Closing, but feel free to ping once you have addressed the conflicts and want to keep working on this.

DOC: RT03 fix for min,max,mean,meadian,kurt,skew

88bd06a

YashpalAhlawat requested a review from mroeschke as a code owner February 29, 2024 18:05

retrigger checks

74875ab

datapythonista reviewed Feb 29, 2024

datapythonista added the Docs label Feb 29, 2024

YashpalAhlawat added 2 commits March 1, 2024 14:49

Review Suggestions

964e8d5

Review Suggestions Implemented

2e87ece

YashpalAhlawat added 2 commits March 2, 2024 10:15

updated return type in doc

4483540

removed series functions from partially ignoring

00295f5

YashpalAhlawat requested a review from datapythonista March 2, 2024 12:11

bergnerjonas reviewed Mar 3, 2024

Review suggestions

ce00d72

datapythonista mentioned this pull request Mar 5, 2024

wip Remove docstring substitutions #57683

Closed

YashpalAhlawat added 3 commits March 9, 2024 19:21

resolve merge conflicts

9411d49

refactored logic

495fb1b

reformatting

097eb4a

updated if logic

523fc01

Resolve merge conflicts

1126083

datapythonista reviewed Mar 14, 2024

datapythonista mentioned this pull request Mar 14, 2024

DOC: Updated the returns for DataFrame.any/all to return either a Series or scalar #57817

Closed

YashpalAhlawat added 2 commits March 17, 2024 21:17

Merge branch 'main' of https://github.com/pandas-dev/pandas into RT03_fix_num_doc

90555ba

review suggestions

eb33a4b

YashpalAhlawat requested a review from datapythonista March 17, 2024 17:01

YashpalAhlawat added 3 commits March 21, 2024 08:40

resolve merge conflicts

3d4eeeb

logic update

b8e10e0

Removed additional RT03

8b0feb1

mroeschke closed this Apr 23, 2024
