Skip to content

DOC: RT03 fix for min,max,mean,meadian,kurt,skew #57682

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from
Closed
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 0 additions & 18 deletions ci/code_checks.sh
Original file line number Diff line number Diff line change
Expand Up @@ -603,23 +603,14 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
$BASE_DIR/scripts/validate_docstrings.py --format=actions --errors=RT03 --ignore_functions \
pandas.DataFrame.hist\
pandas.DataFrame.infer_objects\
pandas.DataFrame.kurt\
pandas.DataFrame.kurtosis\
pandas.DataFrame.mask\
pandas.DataFrame.max\
pandas.DataFrame.mean\
pandas.DataFrame.median\
pandas.DataFrame.min\
pandas.DataFrame.prod\
pandas.DataFrame.product\
pandas.DataFrame.sem\
pandas.DataFrame.skew\
pandas.DataFrame.std\
pandas.DataFrame.sum\
pandas.DataFrame.to_parquet\
pandas.DataFrame.unstack\
pandas.DataFrame.value_counts\
pandas.DataFrame.var\
pandas.DataFrame.where\
pandas.DatetimeIndex.indexer_at_time\
pandas.DatetimeIndex.indexer_between_time\
Expand Down Expand Up @@ -676,14 +667,9 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
pandas.Series.first_valid_index\
pandas.Series.get\
pandas.Series.infer_objects\
pandas.Series.kurt\
pandas.Series.kurtosis\
pandas.Series.last_valid_index\
pandas.Series.mask\
pandas.Series.max\
pandas.Series.mean\
pandas.Series.median\
pandas.Series.min\
pandas.Series.nunique\
pandas.Series.pipe\
pandas.Series.plot.box\
Expand All @@ -694,10 +680,7 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
pandas.Series.product\
pandas.Series.reindex\
pandas.Series.reorder_levels\
pandas.Series.sem\
pandas.Series.skew\
pandas.Series.sparse.to_coo\
pandas.Series.std\
pandas.Series.str.capitalize\
pandas.Series.str.casefold\
pandas.Series.str.center\
Expand Down Expand Up @@ -730,7 +713,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
pandas.Series.to_numpy\
pandas.Series.to_timestamp\
pandas.Series.value_counts\
pandas.Series.var\
pandas.Series.where\
pandas.TimedeltaIndex.as_unit\
pandas.TimedeltaIndex.to_pytimedelta\
Expand Down
44 changes: 27 additions & 17 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -11861,7 +11861,9 @@ def last_valid_index(self) -> Hashable:

Returns
-------
{name1} or scalar\
{return_type}
The result of applying the {name} function to the {name2} elements,
possibly just for rows or columns based on the ``axis`` parameter.\
{see_also}\
{examples}
"""
Expand Down Expand Up @@ -11894,7 +11896,9 @@ def last_valid_index(self) -> Hashable:

Returns
-------
{name1} or scalar\
{return_type}
The result of applying the {name} function to the {name2} elements,
possibly just for rows or columns based on the ``axis`` parameter.\
{see_also}\
{examples}
"""
Expand Down Expand Up @@ -11924,7 +11928,9 @@ def last_valid_index(self) -> Hashable:

Returns
-------
{name1} or {name2} (if level specified) \
{return_type} (if level specified)
The result of applying the {name} function to the {name2} elements,
possibly just for rows or columns based on the ``axis`` parameter.\
{notes}\
{examples}
"""
Expand Down Expand Up @@ -12022,9 +12028,9 @@ def last_valid_index(self) -> Hashable:

Returns
-------
{name1} or {name2}
If level is specified, then, {name2} is returned; otherwise, {name1}
is returned.
{return_type}
The result of applying the {name} function to the {name2} elements,
possibly just for rows or columns based on the ``axis`` parameter.

{see_also}
{examples}"""
Expand Down Expand Up @@ -12109,8 +12115,9 @@ def last_valid_index(self) -> Hashable:

Returns
-------
{name1} or {name2}
Return cumulative {desc} of {name1} or {name2}.
{return_type}
The result of applying the cumulative {desc} function to the {name2} elements,
possibly just for rows or columns based on the ``axis`` parameter.

See Also
--------
Expand Down Expand Up @@ -12575,14 +12582,6 @@ def make_doc(name: str, ndim: int) -> str:
"""
Generate the docstring for a Series/DataFrame reduction.
"""
if ndim == 1:
name1 = "scalar"
name2 = "Series"
axis_descr = "{index (0)}"
else:
name1 = "Series"
name2 = "DataFrame"
axis_descr = "{index (0), columns (1)}"

if name == "any":
base_doc = _bool_doc
Expand Down Expand Up @@ -12910,10 +12909,21 @@ def make_doc(name: str, ndim: int) -> str:
else:
raise NotImplementedError

if ndim == 1:
return_type = "Series or scalar"
axis_descr = "{index (0)}"
name2 = "Series"
if base_doc in (_num_doc, _sum_prod_doc):
return_type = "scalar"
else:
return_type = "Series or scalar"
axis_descr = "{index (0), columns (1)}"
name2 = "DataFrame"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this logic is totally correct.

If I'm not missing anything, there are 3 cases:

  • Series: return type is always scalar, the axis parameter only accepts 0 which is ignored.
  • DataFrame group 1 (e.g. skew): axis accepts 0 and 1, return is always Series
  • DataFrame group 2 (e.g. any): axis accepts 0, 1 and None, return is Series or scalar since axis=0 means both axis together and makes the return a scalar.

You'll have to check which base_doc belong to each group. Also, can you please rename name2 to obj_type or something more descriptive?
Also, if you

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made the suggested variable name, etc and modified logic for all cases, verified locally all the cases

For Series

Function Official Documentation return type PR code Return type
Any Scalar or Series Scalar or Series
All Scalar or Series Scalar or Series
Min Scalar or scalar Scalar
Max Scalar or scalar Scalar
Sum Scalar or scalar Scalar
Prod Scalar or scalar Scalar
Median Scalar or scalar Scalar
Mean Scalar or scalar Scalar
Var Scalar or Series (if level specified) Scalar or Series (if level specified)
Std Scalar or Series (if level specified) Scalar or Series (if level specified)
Sem Scalar or Series (if level specified) Scalar or Series (if level specified)
Skew Scalar or scalar Scalar
Kurt Scalar or scalar Scalar
Cumsum Scalar or Series Scalar or Series
Cumprod Scalar or Series Scalar or Series
Cummin Scalar or Series Scalar or Series
Cummax Scalar or Series Scalar or Series

For DataFrame

Function Official Documentation return type PR code Return type
Any Series or DataFrame Series or DataFrame
All Series or DataFrame Series or DataFrame
Min Series or scalar Series or scalar
Max Series or scalar Series or scalar
Sum Series or scalar Series or scalar
Prod Series or scalar Series or scalar
Median Series or scalar Series or scalar
Mean Series or scalar Series or scalar
Var Series or DataFrame (if level specified) Series or DataFrame (if level specified)
Std Series or DataFrame (if level specified) Series or DataFrame (if level specified)
Sem Series or DataFrame (if level specified) Series or DataFrame (if level specified)
Skew Series or scalar Series or scalar
Kurt Series or scalar Series or scalar
Cumsum Series or DataFrame Series or DataFrame
Cumprod Series or DataFrame Series or DataFrame
Cummin Series or DataFrame Series or DataFrame
Cummax Series or DataFrame Series or DataFrame

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the detailed information and for the work on this PR @YashpalAhlawat.

In the series case for:

https://pandas.pydata.org/docs/reference/api/pandas.Series.sem.html https://pandas.pydata.org/docs/reference/api/pandas.Series.std.html https://pandas.pydata.org/docs/reference/api/pandas.Series.var.html

In all of the above cases return type is Series or scalar as per documentation.

Looking at for example Series.std, I fail to see a way for the function to return anything that it's not a scalar, the level argument was deprecated and doesn't exist anymore. So the documentation is outdated. I assume it's also the case for the other two functions.

For Series.any and Series.all I think a scalar is always returned as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, for DataFrame.any and DataFrame.all the return time should be Series or scalar, not Series or DataFrame.

Copy link
Contributor Author

@YashpalAhlawat YashpalAhlawat Mar 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista ,

For series Std, any , all methods needs to updated. Is level is deprecated from all std, var, sem ?

For dataframe any,all methods need to be updated. Is level is deprecated from dataframe methods also?

Kindly, clear the expectations. I will fix it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can check the docstring of the objects. For var for example, there is no parameter levels: https://pandas.pydata.org/docs/reference/api/pandas.Series.var.html

I think it's the case for all them, I checked several and I didn't see any with the level parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For series, all cases will return Scalar, except std, var, sem as all belongs to same group should have Series and Scalar _num_ddof_doc

df = pd.DataFrame({'a': [1, 2], 'b': [2, 3]}, index=['tiger', 'zebra'])
type(df.sem())
<class 'pandas.core.series.Series'>

For DataFrame, all cases should return Series and Scalar, since **skew, max, min, ** belongs to same group _num_doc.

If this looks good to you i will proceed and implement.

@datapythonista please review

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Series, cumsum and similar returns Series I think. And for Dataframe I think it can return DataFrame or Series as you had. For the rest what you say sounds correct. You can check the type annotations and examples, and check the parameter axis which is the one that controls what object is being returned.

docstr = base_doc.format(
desc=desc,
name=name,
name1=name1,
return_type=return_type,
name2=name2,
axis_descr=axis_descr,
see_also=see_also,
Expand Down
Loading