DOC: update the DataFrame.count docstring #20221

Merged
merged 5 commits into from
Mar 12, 2018
52 changes: 44 additions & 8 deletions pandas/core/frame.py
@@ -5592,22 +5592,58 @@ def corrwith(self, other, axis=0, drop=False):

def count(self, axis=0, level=None, numeric_only=False):
"""
Return Series with number of non-NA/null observations over requested
axis. Works with non-floating point data as well (detects NaN and None)
Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
Contributor

One last change: maybe remove the first sentence, since this can return a DataFrame when `level` is specified.

I think just use the extended summary to say what counts as non-null data.

The values None, NaN, NaT, and optionally np.inf (depending on pandas.options.mode.use_inf_as_na) are considered NA.
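
A minimal sketch of how these values are counted, assuming a recent pandas release. The frame and its column names are made up for illustration. The option `use_inf_as_na` is left at its default, so `numpy.inf` counts as a regular value; as far as I know that option is deprecated in newer pandas versions, so the sketch does not toggle it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df = pd.DataFrame({
    "obj": ["a", None, "c"],            # None is NA in an object column
    "num": [1.0, np.nan, np.inf],       # NaN is NA; inf is not NA by default
    "when": pd.to_datetime(["2018-03-12", None, "2018-03-13"]),  # None becomes NaT
})

counts = df.count()
print(counts)
# obj     2
# num     2
# when    2
```

Each column holds three cells, one of which is NA, so every count is 2.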

Contributor Author

Do you mean the first sentence in the extended summary, i.e.:
"Return Series with number of non-NA observations over requested axis."

If I understand you correctly, I would change the entire summary (both the short and the extended summary) to the following:

        Count non-NA cells for each column or row.

        The values None, NaN, NaT, and optionally np.inf (depending on
        pandas.options.mode.use_inf_as_na) are considered NA.

Contributor

Yeah. Change np.inf to `numpy.inf` and put single backticks around pandas.options.mode.use_inf_as_na.

axis. Works with non-floating point data as well (detects `NaN` and
`None`)
Contributor

Could you also add NaT here? (That's our missing value for datetime data.)


Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
0 or 'index' for row-wise, 1 or 'columns' for column-wise
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a DataFrame
If equal 0 or 'index' counts are generated for each column.
Contributor

I think you can remove "equal" from this line and the next.

If equal 1 or 'columns' counts are generated for each row.
level : int or str, optional
If the axis is a `MultiIndex` (hierarchical), count along a
particular level, collapsing into a `DataFrame`.
Contributor

Put backticks around the `level` parameter.

A `str` specifies the level name.
numeric_only : boolean, default False
Include only float, int, boolean data
Include only `float`, `int` or `boolean` data.

Returns
-------
count : Series (or DataFrame if level specified)
Series or DataFrame
For each column/row the number of non-NA/null entries.
If level is specified returns a `DataFrame`.

See Also
--------
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
elements)
DataFrame.isnull: boolean same-sized DataFrame showing places of NA
Contributor

Refer to `isna` instead of `isnull`.
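
For context on why `isna` is a natural cross-reference, here is a rough sketch of how `count` relates to `isna` and `notna`. It reuses the frame from the docstring example and assumes a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

# Same data as the docstring example in this diff.
df = pd.DataFrame({
    "Person": ["John", "Myla", None],
    "Age": [24.0, np.nan, 21.0],
    "Single": [False, True, True],
})

non_na = df.count()                   # non-NA cells per column
via_notna = df.notna().sum()          # sum of the boolean "not NA" mask
via_isna = len(df) - df.isna().sum()  # total rows minus NA cells

print(non_na)
# Person    2
# Age       2
# Single    3
```

All three expressions should give the same per-column counts.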

elements

Examples
--------
>>> df=pd.DataFrame({ "Person":["John","Myla",None],
Contributor

PEP 8 on this example: spaces around `=`, no space after `{`, a space after `:`, and a space after `,`.

Contributor

Could you also add an example with level=? I think you could:

  1. Make the dataframe 2 items longer and repeat John and Myla.
  2. Update the df output and df.count examples.
  3. Show df.set_index(['Person', 'Single']).count(level='Person').
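
A sketch of what the suggested per-level count could look like. The two extra rows (and their ages) are invented for illustration. One caveat: `count(level=...)` is the API under review here, but I believe newer pandas releases have removed the `level` argument from `DataFrame.count`, so this sketch uses `groupby(level=...).count()`, which should give the same per-person counts.

```python
import numpy as np
import pandas as pd

# Docstring frame extended by two rows that repeat John and Myla
# (the added ages and Single flags are assumptions).
df = pd.DataFrame({
    "Person": ["John", "Myla", None, "John", "Myla"],
    "Age": [24.0, np.nan, 21.0, 33.0, 26.0],
    "Single": [False, True, True, True, False],
})

# Count non-NA "Age" cells per person via the "Person" index level.
by_person = (
    df.set_index(["Person", "Single"])
      .groupby(level="Person")
      .count()
)
print(by_person)
```

With this data, John has two non-NA ages and Myla has one, since one of her ages is NaN.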

... "Age":[24.,np.nan,21.],
... "Single":[False,True,True] })
>>> df
Person Age Single
0 John 24.0 False
1 Myla NaN True
2 None 21.0 True
>>> df.count()
Contributor

blank line between cases

Person 2
Age 2
Single 3
dtype: int64
>>> df.count(axis=1)
0 3
1 2
2 2
dtype: int64
"""
axis = self._get_axis_number(axis)
if level is not None: