
Add a correction keyword to the std methods #183

Merged
merged 4 commits into from
Jul 5, 2023
20 changes: 18 additions & 2 deletions spec/API_specification/dataframe_api/column_object.py
@@ -410,20 +410,36 @@ def mean(self, *, skip_nulls: bool = True) -> DType:
dtypes.
"""

def std(self, *, skip_nulls: bool = True) -> DType:
def std(self, *, correction: int | float = 1, skip_nulls: bool = True) -> DType:
"""
Reduction returns a scalar. Must be supported for numerical and
datetime data types. Returns a float for numerical data types, and
datetime (with the appropriate timedelta format string) for datetime
dtypes.

Parameters
----------
correction
Correction to apply to the result. 0 for population standard deviation
and 1 for sample standard deviation.
Collaborator:

If we're only supporting 0 or 1, can this just be an int for now? We could change it to allow floats later if desired.

Contributor:

I think the description is misleading here. The 0 and 1 references are, to my knowledge, there to help users identify common correction values. See the Array API specification for a more extensive description.

Collaborator:

Should we "borrow" the docstring from the Array API specification then?

Contributor:

Fine with me, but up to Marco. :)

Contributor Author:

I've reworded it to make clear that 0 and 1 are just examples.

Member:

That rewording only covered 1 of the 4 docstrings. I've now edited all of them, and for one I adopted the more extensive description from the array API standard.

Member:

@MarcoGorelli if you're happy with that change, I think this PR is good to go.

skip_nulls
Whether to skip null values.
"""

def var(self, *, skip_nulls: bool = True) -> DType:
def var(self, *, correction: int | float = 1, skip_nulls: bool = True) -> DType:
"""
Reduction returns a scalar. Must be supported for numerical and
datetime data types. Returns a float for numerical data types, and
datetime (with the appropriate timedelta format string) for datetime
dtypes.

Parameters
----------
correction
Correction to apply to the result. 0 for population variance
and 1 for sample variance.
skip_nulls
Whether to skip null values.
"""

def is_null(self) -> Column:
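To make the intended semantics concrete, here is a minimal pure-Python sketch, assuming the array API convention that the review thread points to: the sum of squared deviations from the mean is divided by `N - correction`, where `N` is the number of values used. `correction=0` then gives the population variance and `correction=1` the sample variance with Bessel's correction; the formula is also well defined for a non-integer value such as `0.5`, which bears on the int-versus-float question raised in review. The functions `column_var` and `column_std`, the use of `None` as the null marker, and the NaN/None edge-case behaviour are illustrative assumptions, not part of the spec.

```python
from __future__ import annotations

import math
from typing import Optional, Sequence


def column_var(
    values: Sequence[Optional[float]],
    *,
    correction: int | float = 1,
    skip_nulls: bool = True,
) -> Optional[float]:
    """Hypothetical reference semantics for Column.var (None marks a null)."""
    if skip_nulls:
        data = [v for v in values if v is not None]
    elif any(v is None for v in values):
        # Assumption: a null propagates to the result when nulls are not skipped.
        return None
    else:
        data = list(values)
    n = len(data)
    # Degrees-of-freedom adjustment: divide by N - correction.
    divisor = n - correction
    if n == 0 or divisor <= 0:
        # Assumption: too few values for the requested correction yields NaN.
        return math.nan
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / divisor


def column_std(
    values: Sequence[Optional[float]],
    *,
    correction: int | float = 1,
    skip_nulls: bool = True,
) -> Optional[float]:
    """Hypothetical reference semantics for Column.std: square root of column_var."""
    var = column_var(values, correction=correction, skip_nulls=skip_nulls)
    return None if var is None else math.sqrt(var)


data = [2.0, 4.0, None, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(column_std(data, correction=0))  # population standard deviation: 2.0
print(column_var(data, correction=1))  # sample variance: 32 / 7
```

Here the eight non-null values have mean 5 and squared deviations summing to 32, so the population variance is 32 / 8 = 4 and the sample variance is 32 / 7.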
20 changes: 18 additions & 2 deletions spec/API_specification/dataframe_api/dataframe_object.py
@@ -684,15 +684,31 @@ def mean(self, *, skip_nulls: bool = True) -> DataFrame:
"""
...

def std(self, *, skip_nulls: bool = True) -> DataFrame:
def std(self, *, correction: int | float = 1, skip_nulls: bool = True) -> DataFrame:
"""
Reduction returns a 1-row DataFrame.

Parameters
----------
correction
Correction to apply to the result. 0 for population standard deviation
and 1 for sample standard deviation.
skip_nulls
Whether to skip null values.
"""
...

def var(self, *, skip_nulls: bool = True) -> DataFrame:
def var(self, *, correction: int | float = 1, skip_nulls: bool = True) -> DataFrame:
"""
Reduction returns a 1-row DataFrame.

Parameters
----------
correction
Correction to apply to the result. 0 for population variance
and 1 for sample variance.
skip_nulls
Whether to skip null values.
"""
...

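A sketch of how the DataFrame-level `std`/`var` reductions could compose with column-level ones: each column is reduced independently and wrapped so that the result has exactly one row. A plain `dict` mapping column names to lists stands in for a DataFrame here, and `frame_var`, `frame_std` and `_var` are hypothetical names; the `N - correction` divisor follows the array API convention, while returning NaN when `N - correction <= 0` and propagating nulls when `skip_nulls=False` are assumptions.

```python
from __future__ import annotations

import math
from typing import Dict, List, Optional, Sequence

# Hypothetical stand-in for a DataFrame: column name -> values, None marks a null.
Frame = Dict[str, List[Optional[float]]]


def _var(
    values: Sequence[Optional[float]], correction: int | float, skip_nulls: bool
) -> Optional[float]:
    # Assumed semantics: sum of squared deviations divided by N - correction.
    if not skip_nulls and any(v is None for v in values):
        return None
    data = [v for v in values if v is not None]
    divisor = len(data) - correction
    if not data or divisor <= 0:
        return math.nan
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / divisor


def frame_var(
    df: Frame, *, correction: int | float = 1, skip_nulls: bool = True
) -> Frame:
    """Reduce every column to a single value, giving a 1-row 'DataFrame'."""
    return {name: [_var(col, correction, skip_nulls)] for name, col in df.items()}


def frame_std(
    df: Frame, *, correction: int | float = 1, skip_nulls: bool = True
) -> Frame:
    """Square root of frame_var, column by column; nulls stay null."""
    out = frame_var(df, correction=correction, skip_nulls=skip_nulls)
    return {
        name: [None if v is None else math.sqrt(v) for v in col]
        for name, col in out.items()
    }


df = {"a": [1.0, 2.0, 3.0, 4.0], "b": [2.0, None, 4.0, 6.0]}
print(frame_std(df)["b"])                # [2.0]
print(frame_var(df, correction=0)["a"])  # [1.25]
```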
4 changes: 2 additions & 2 deletions spec/API_specification/dataframe_api/groupby_object.py
@@ -41,10 +41,10 @@ def median(self, *, skip_nulls: bool = True) -> "DataFrame":
def mean(self, *, skip_nulls: bool = True) -> "DataFrame":
...

def std(self, *, skip_nulls: bool = True) -> "DataFrame":
def std(self, *, correction: int | float = 1, skip_nulls: bool = True) -> "DataFrame":
...

def var(self, *, skip_nulls: bool = True) -> "DataFrame":
def var(self, *, correction: int | float = 1, skip_nulls: bool = True) -> "DataFrame":
...

def size(self) -> "DataFrame":
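For the GroupBy `std` and `var` methods, the same `correction` is presumably applied within each group, with `N` counting only that group's values. The sketch below assumes that reading; `groupby_std` and `_std` are hypothetical, plain lists stand in for the key and value columns, and groups are emitted in first-seen key order, which the spec does not prescribe.

```python
from __future__ import annotations

import math
from typing import Dict, Hashable, List, Optional, Sequence


def _std(
    values: Sequence[Optional[float]], correction: int | float, skip_nulls: bool
) -> Optional[float]:
    # Assumed semantics: sqrt(sum of squared deviations / (N - correction)).
    if not skip_nulls and any(v is None for v in values):
        return None
    data = [v for v in values if v is not None]
    divisor = len(data) - correction
    if not data or divisor <= 0:
        return math.nan
    mean = sum(data) / len(data)
    return math.sqrt(sum((x - mean) ** 2 for x in data) / divisor)


def groupby_std(
    keys: Sequence[Hashable],
    values: Sequence[Optional[float]],
    *,
    correction: int | float = 1,
    skip_nulls: bool = True,
) -> Dict[str, list]:
    """Hypothetical sketch: one output row per group, in first-seen key order."""
    groups: Dict[Hashable, List[Optional[float]]] = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    return {
        "key": list(groups),
        "std": [_std(g, correction, skip_nulls) for g in groups.values()],
    }


keys = ["x", "x", "x", "y", "y"]
values = [1.0, 2.0, 3.0, 10.0, 14.0]
print(groupby_std(keys, values)["std"][0])                # 1.0
print(groupby_std(keys, values, correction=0)["std"][1])  # 2.0
```

Note that with a single-element group, `correction=1` leaves no degrees of freedom, so this sketch returns NaN for that group.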