DOC: update the DataFrame.count docstring #20221
pandas/core/frame.py
Outdated
Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `NaN` and
`None`)
Could you also add `NaT` here (that's our missing value for datetime data)
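As an illustration of the point above, here is a minimal sketch (not taken from the PR) showing that `count` skips `None`, `NaN`, and `NaT` alike. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value of each kind discussed above.
df = pd.DataFrame({
    "name": ["a", None, "c"],            # None in an object/string column
    "score": [1.0, np.nan, 3.0],         # NaN in a float column
    "when": pd.to_datetime(["2018-03-10", None, "2018-03-12"]),  # NaT
})

# Each column has three rows but only two non-NA cells.
counts = df.count()
print(counts)
```

The datetime column reports two non-NA cells, which is the behavior the docstring would be documenting by mentioning `NaT`.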
pandas/core/frame.py
Outdated
level : int or level name, default None
If the axis is a MultiIndex (hierarchical), count along a
particular level, collapsing into a DataFrame
If equal 0 or 'index' counts are generated for each column.
I think you can remove "equal" from this line and the next.
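For readers unsure which direction each `axis` value counts in, a small sketch under assumed data (the frame below is hypothetical, not from the PR) may help:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame with a few missing cells.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": [np.nan, np.nan, 6.0]})

# axis=0 / 'index': one count per column.
per_column = df.count(axis="index")

# axis=1 / 'columns': one count per row.
per_row = df.count(axis="columns")

print(per_column)
print(per_row)
```

Here `per_column` is indexed by the column labels and `per_row` by the row labels, which is what the "for each column" and "for each row" wording in the docstring describes.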
pandas/core/frame.py
Outdated
Examples
--------
>>> df=pd.DataFrame({ "Person":["John","Myla",None],
Pep8 on this example. Space around `=`, no space after `{`, space after `:`, space after `,`.
Could you also add an example with `level=`? I think you could
- Make the dataframe 2 items longer and repeat John and Myla.
- Update the `df` output and `df.count` examples
- show `df.set_index(['Person', 'Single']).count(level='Person')`
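A sketch of what the suggested per-level count computes, using the five-row frame that appears later in this review. One assumption to flag: as far as I know, later pandas releases deprecated and then removed the `level` argument of `DataFrame.count`, so this version uses the `groupby(level=...)` equivalent to stay runnable on newer versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Person": ["John", "Myla", None, "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26],
                   "Single": [False, True, True, True, False]})

# Move 'Person' and 'Single' into a MultiIndex, then count the non-NA
# 'Age' cells per 'Person' label. Rows whose 'Person' is missing are
# dropped by groupby's default behaviour.
by_person = df.set_index(["Person", "Single"]).groupby(level="Person").count()
print(by_person)
```

John has two non-NA ages and Myla has one, because one of Myla's ages is `NaN`.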
pandas/core/frame.py
Outdated
Series.count: number of non-NA elements in a Series
DataFrame.shape: number of DataFrame rows and columns (including NA
elements)
DataFrame.isnull: boolean same-sized DataFrame showing places of NA
refer to isna instead
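To show how the "See Also" entries relate to `count`, here is a short sketch on a hypothetical frame. It relies only on `DataFrame.shape`, `DataFrame.isna`, and `DataFrame.isnull`, the last two being aliases of each other.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 3 rows, 2 columns, three missing cells in total.
df = pd.DataFrame({"x": [1.0, np.nan, 3.0], "y": ["a", None, None]})

shape = df.shape        # rows and columns, NA cells included
non_na = df.count()     # non-NA cells per column
na_mask = df.isna()     # boolean frame, True where a cell is NA

# Counting the non-NA positions in the mask reproduces count().
from_mask = (~na_mask).sum()
print(shape, non_na.to_dict(), from_mask.to_dict())
```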
pandas/core/frame.py
Outdated
2 None 21.0 True
3 John 33.0 True
4 Myla 26.0 False
>>> df.count()
blank line between cases
Can you paste the output of the doc validation script again?
pandas/core/frame.py
Outdated
Return Series with number of non-NA observations over requested
axis. Works with non-floating point data as well (detects `None`,
`NaN` and `NaT`)
End with a .
pandas/core/frame.py
Outdated
If 1 or 'columns' counts are generated for each **row**.
level : int or str, optional
If the axis is a `MultiIndex` (hierarchical), count along a
particular level, collapsing into a `DataFrame`.
backticks around the `level` parameter.
pandas/core/frame.py
Outdated
axis. Works with non-floating point data as well (detects NaN and None)
Count non-NA cells for each column or row.

Return Series with number of non-NA observations over requested
One last change, maybe remove the first sentence since this can return a DataFrame with `level`.
I think just use the extended summary to say what counts as non-null data.
The values `None`, `NaN`, `NaT`, and optionally `np.inf` (depending on `pandas.options.mode.use_inf_as_na`) are considered NA.
Do you mean the first sentence in the extended summary, i.e. :
"Return Series with number of non-NA observations over requested axis."
If I understand you right I would change the entire summary (i.e. short and extended summary) to look like the following:
Count non-NA cells for each column or row.
The values None, NaN, NaT, and optionally np.inf (depending on
pandas.options.mode.use_inf_as_na) are considered NA.
Yeah. np.inf to `numpy.inf` and single backticks around pandas.options.mode.use_inf_as_na.
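A sketch of the infinity case mentioned in the proposed summary. By default `count` treats `numpy.inf` as an ordinary value. Since the `use_inf_as_na` option was, to my knowledge, deprecated in later pandas releases, this example avoids toggling it and instead converts infinities to `NaN` explicitly, which is an assumption about how one might get the same effect rather than what the PR does.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing finite, infinite and missing values.
df = pd.DataFrame({"v": [1.0, np.inf, np.nan, -np.inf]})

# Default behaviour: only NaN is NA, so both infinities are counted.
default_count = df.count()

# Treat infinities as missing by replacing them before counting.
inf_as_na_count = df.replace([np.inf, -np.inf], np.nan).count()

print(default_count["v"], inf_as_na_count["v"])
```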
>>> df = pd.DataFrame({"Person":
...                    ["John", "Myla", None, "John", "Myla"],
...                    "Age": [24., np.nan, 21., 33, 26],
PEP8: indent one more space. Same with line below.
For me, flake complains if I change that. On my system flake doesn't check the examples, so I copied it into the code:
df = pd.DataFrame({"Person":
                   ["John", "Myla", None, "John", "Myla"],
                   "Age": [24., np.nan, 21., 33, 26],
                   "Single": [False, True, True, True, False]})
df
If I have it like this, flake only complains about `pd` not being defined:
pandas/core/frame.py:5672:14: F821 undefined name 'pd'
Sorry I misread.
Thanks @joders!
thanks for providing pandas
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.