DOC: update the DataFrame.count docstring #20221

Merged
merged 5 commits into from
Mar 12, 2018
Changes from 2 commits
62 changes: 54 additions & 8 deletions pandas/core/frame.py
@@ -5592,22 +5592,68 @@ def corrwith(self, other, axis=0, drop=False):

def count(self, axis=0, level=None, numeric_only=False):
"""
Return Series with number of non-NA/null observations over requested
axis. Works with non-floating point data as well (detects NaN and None)
Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
Contributor

One last change, maybe remove the first sentence since this can return a DataFrame with level.

I think just use the extended summary to say what counts as non-null data.

The values None, NaN, NaT, and optionally np.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.

Contributor Author

Do you mean the first sentence in the extended summary, i.e.:
"Return Series with number of non-NA observations over requested axis."

If I understand you right I would change the entire summary (i.e. short and extended summary) to look like the following:

        Count non-NA cells for each column or row.

        The values None, NaN, NaT, and optionally np.inf (depending on
        pandas.options.mode.use_inf_as_na) are considered NA.

Contributor

Yeah. np.inf to `numpy.inf` and single backticks around pandas.options.mode.use_inf_as_na.
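
As a minimal sketch of the NA rule discussed in this thread: the frame and its column names below are made up for illustration, and the example assumes the default setting in which infinities are not treated as NA.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one NA marker of each kind, plus an infinity.
df = pd.DataFrame({
    "obj": ["a", None, "c", "d"],        # None is NA
    "num": [1.0, np.nan, np.inf, 4.0],   # NaN is NA; inf is not NA by default
    "when": pd.to_datetime(["2018-03-11", None, "2018-03-12", None]),  # None becomes NaT
})

# Count the non-NA cells in each column.
counts = df.count()
print(counts)
```

Under those assumptions, `obj` and `num` each have three non-NA cells (the infinity is counted) and `when` has two.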

axis. Works with non-floating point data as well (detects `None`,
`NaN` and `NaT`)
Contributor

End with a .


Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
0 or 'index' for row-wise, 1 or 'columns' for column-wise
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a DataFrame
If 0 or 'index' counts are generated for each column.
If 1 or 'columns' counts are generated for each row.
level : int or str, optional
If the axis is a `MultiIndex` (hierarchical), count along a
particular level, collapsing into a `DataFrame`.
Contributor

backticks around the `level` parameter.

A `str` specifies the level name.
numeric_only : boolean, default False
Include only float, int, boolean data
Include only `float`, `int` or `boolean` data.

Returns
-------
count : Series (or DataFrame if level specified)
Series or DataFrame
For each column/row the number of non-NA/null entries.
If level is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
elements)
DataFrame.isnull: boolean same-sized DataFrame showing places of NA
Contributor

refer to isna instead
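
A short sketch of why `isna` is a reasonable cross-reference here: `isnull` and `isna` produce the same mask, and `count` agrees with summing the complementary `notna` mask. The sample frame is a cut-down version of the one in the docstring example; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Person": ["John", "Myla", None, "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26]})

# isna and isnull should give identical boolean masks.
na_mask = df.isna()
same_mask = na_mask.equals(df.isnull())

# Counting non-NA cells is the same as summing the notna mask per column.
counts_match = (df.count() == df.notna().sum()).all()

print(same_mask, counts_match)
```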

elements

Examples
--------
>>> df = pd.DataFrame({"Person":
... ["John", "Myla", None, "John", "Myla"],
... "Age": [24., np.nan, 21., 33, 26],
Contributor

PEP8: indent one more space. Same with the line below.

Contributor Author

@joders joders Mar 11, 2018

For me, flake8 complains if I change that. On my system flake8 doesn't check the examples, so I copied it into the code:

        df = pd.DataFrame({"Person":
                           ["John", "Myla", None, "John", "Myla"],
                           "Age": [24., np.nan, 21., 33, 26],
                           "Single": [False, True, True, True, False]})
        df

If I have it like this, flake8 only complains about pd not being defined:
pandas/core/frame.py:5672:14: F821 undefined name 'pd'

Contributor

Sorry I misread.

... "Single": [False, True, True, True, False]})
>>> df
Person Age Single
0 John 24.0 False
1 Myla NaN True
2 None 21.0 True
3 John 33.0 True
4 Myla 26.0 False
>>> df.count()
Contributor

blank line between cases

Person 4
Age 4
Single 5
dtype: int64
>>> df.count(axis=1)
0 3
1 2
2 2
3 3
4 3
dtype: int64
>>> df.set_index(["Person", "Single"]).count(level="Person")
Age
Person
John 2
Myla 1
"""
axis = self._get_axis_number(axis)
if level is not None:
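
The per-level count in the last docstring example can also be written with `groupby(level=...)`, which may be useful on pandas versions where `count` does not accept `level` (my understanding is that the argument was later deprecated and removed, so treat that as an assumption to check against your installed version). This is a sketch of an equivalent, not the implementation in this PR.

```python
import numpy as np
import pandas as pd

# Same data as the docstring example.
df = pd.DataFrame({"Person": ["John", "Myla", None, "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26],
                   "Single": [False, True, True, True, False]})

# Group on the "Person" index level and count non-NA cells per group.
# Rows whose "Person" key is missing are dropped by groupby's default behaviour.
by_person = df.set_index(["Person", "Single"]).groupby(level="Person").count()
print(by_person)
```

This should give the same `Age` counts per person as the docstring's `count(level="Person")` output.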