DOC: Improve the docstring of DataFrame.describe() #20222
Hello @nehiljain! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 08, 2018 at 04:52 Hours UTC
pandas/core/generic.py
Outdated
top f NaN c
freq 1 NaN 1
mean NaN 2.0 NaN
std NaN 1.0 NaN
is there a reason these are being changed? they are already in alphabetical order. I suppose you could supply columns on construction to guarantee the order.
thanks for the suggestion. Updated it with your recommendation.
Looks good, couple of comments.
The constructor of the DataFrame with 3 columns has some non PEP-8 spaces. Besides that, it'd be good to create it in a way that guarantees the order of columns, as Jeff says.
The Returns section should be type+description, not name+type.
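As a rough sketch of what both reviewers are asking for, the example frame could be built with an explicit `columns` argument and PEP8 spacing. The column names and values below are taken from the diff fragments in this thread; the exact layout that ended up in the docstring is an assumption.

```python
import pandas as pd

# Pass `columns` explicitly so the column order does not depend on
# dict ordering or on the pandas/Python version in use.
df = pd.DataFrame(
    {
        "categorical": pd.Categorical(["d", "e", "f"]),
        "numeric": [1, 2, 3],
        "object": ["a", "b", "c"],
    },
    columns=["categorical", "numeric", "object"],
)

print(list(df.columns))
```

Note there is no space after the `{` and one space after each `,`, which is the formatting the review asks for.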
pandas/core/generic.py
Outdated
DataFrame.mean
DataFrame.std
DataFrame.select_dtypes
DataFrame.count : Count number of non-NA/null observations
See Also goes before Examples
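For reference, a minimal numpydoc-style skeleton showing the two conventions raised in this review: a Returns section written as type plus description, and See Also placed before Examples. The function `summarize` is hypothetical and only serves as a carrier for the docstring; it is not part of pandas.

```python
def summarize(values):
    """
    Return the mean of a sequence of numbers.

    Parameters
    ----------
    values : list of float
        Numbers to average.

    Returns
    -------
    float
        Arithmetic mean of ``values``.

    See Also
    --------
    statistics.mean : Standard-library equivalent.

    Examples
    --------
    >>> summarize([1, 2, 3])
    2.0
    """
    # Hypothetical helper: plain arithmetic mean.
    return sum(values) / len(values)
```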
fixed thanks
…ame_describe

* upstream/master: (25 commits)
  DOC: Improved pandas.plotting.bootstrap_plot docstring (pandas-dev#20166)
  DOC: update the Index.get_values docstring (pandas-dev#20231)
  DOC: update the pandas.DataFrame.all docstring (pandas-dev#20216)
  DOC: update the Series.view docstring (pandas-dev#20220)
  DOC: update the docstring of pandas.DataFrame.from_dict (pandas-dev#20259)
  DOC: add docstring for Index.get_duplicates (pandas-dev#20223)
  Docstring pandas.series.diff (pandas-dev#20238)
  DOC: update `pandas/core/ops.py` docstring template to accept examples (pandas-dev#20246)
  DOC: update the DataFrame.iat[] docstring (pandas-dev#20219)
  DOC: update the pandas.DataFrame.diff docstring (pandas-dev#20227)
  DOC: pd.core.window.Expanding.kurt docstring (split from pd.core.Rolling.kurt) (pandas-dev#20064)
  DOC: update the pandas.date_range() docstring (pandas-dev#20143)
  DOC: update DataFrame.to_records (pandas-dev#20191)
  DOC: Improved the docstring of pandas.plotting.radviz (pandas-dev#20169)
  DOC: Update pandas.DataFrame.tail docstring (pandas-dev#20225)
  DOC: update the DataFrame.cov docstring (pandas-dev#20245)
  DOC: update pandas.DataFrame.head docstring (pandas-dev#20262)
  DOC: Improve pandas.Series.plot.kde docstring and kwargs rewording for whole file (pandas-dev#20041)
  DOC: update the DataFrame.head() docstring (pandas-dev#20206)
  DOC: update the Index.shift docstring (pandas-dev#20192)
  ...
Codecov Report
@@ Coverage Diff @@
## master #20222 +/- ##
=======================================
Coverage 91.95% 91.95%
=======================================
Files 160 160
Lines 49858 49858
=======================================
Hits 45845 45845
Misses 4013 4013
It'd be great if you could also change the I think for the type we're using Besides that, lgtm.
…ocumenation conventions
lgtm
@jreback please take another look
@@ -7393,7 +7405,7 @@ def describe(self, percentiles=None, include=None, exclude=None):
Excluding object columns from a ``DataFrame`` description.

>>> df.describe(exclude=[np.object])
categorical numeric
categorical numeric
When running the validation script, I occasionally get a failure
Line 210, in pandas.DataFrame.describe
Failed example:
df.describe(exclude=[np.number])
Expected:
categorical object
count 3 3
unique 3 3
top f c
freq 1 1
Got:
categorical object
count 3 3
unique 3 3
top f a
freq 1 1
Did you see this at all? This likely is an issue in the method itself, and not the docstring.
Yeah, I do see this error, but it's flaky.
To be clear, it's probably some kind of non-stable sorting inside the describe method, and nothing wrong with the docstring. It may be best to just include the docstring, and open a new issue.
The strange thing is that just doing
pd.DataFrame({"A": pd.Categorical(['d', 'e', 'f']), "B": ['a', 'b', 'c'], 'C': [1, 2, 3]}).describe(exclude=['number'])
seems deterministic.
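One way to see why `top` could differ between runs, as a sketch only: every value in the object column occurs exactly once, so the "most frequent" value is a tie and any of the three is a valid answer. Whether unstable sorting inside `describe` is really what broke the tie differently is not confirmed in this thread; the check below just shows that the tie exists.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.Categorical(["d", "e", "f"]),
        "B": ["a", "b", "c"],
        "C": [1, 2, 3],
    },
    columns=["A", "B", "C"],
)

# Every value in "B" occurs exactly once, so the most frequent value is a tie.
counts = df["B"].value_counts()
print(counts.to_dict())

desc = df.describe(exclude=["number"])
# `freq` is well defined; `top` is only known to be one of the tied values.
print(desc.loc["freq", "B"], desc.loc["top", "B"])
```

A doctest that hard-codes `top` for such data depends on how the tie is broken, which would explain an intermittent validation failure even when the docstring itself is fine.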
pandas/core/generic.py
Outdated
@@ -7305,9 +7317,9 @@ def describe(self, percentiles=None, include=None, exclude=None):
Describing a ``DataFrame``. By default only numeric fields
are returned.

>>> df = pd.DataFrame({ 'object': ['a', 'b', 'c'],
>>> df = pd.DataFrame({ 'categorical': pd.Categorical(['d','e','f']),
PEP8 formatting here. No space after the `{`, spaces after the `,` in the Categorical.
Only concern is #20222 (comment), which isn't an issue with the docstring. LGTM otherwise.
…ame_describe

* upstream/master: (158 commits)
  Add link to "Craft Minimal Bug Report" blogpost (pandas-dev#20431)
  BUG: fixed json_normalize for subrecords with NoneTypes (pandas-dev#20030) (pandas-dev#20399)
  BUG: ExtensionArray.fillna for scalar values (pandas-dev#20412)
  DOC" update the Pandas core window rolling count docstring" (pandas-dev#20264)
  DOC: update the pandas.DataFrame.plot.hist docstring (pandas-dev#20155)
  DOC: Only use ~ in class links to hide prefixes. (pandas-dev#20402)
  Bug: Allow np.timedelta64 objects to index TimedeltaIndex (pandas-dev#20408)
  DOC: add disallowing of Series construction of len-1 list with index to whatsnew (pandas-dev#20392)
  MAINT: Remove weird pd file
  DOC: update the Index.isin docstring (pandas-dev#20249)
  BUG: Handle all-NA blocks in concat (pandas-dev#20382)
  DOC: update the pandas.core.resample.Resampler.fillna docstring (pandas-dev#20379)
  BUG: Don't raise exceptions splitting a blank string (pandas-dev#20067)
  DOC: update the pandas.DataFrame.cummax docstring (pandas-dev#20336)
  DOC: update the pandas.core.window.x.mean docstring (pandas-dev#20265)
  DOC: update the api.types.is_number docstring (pandas-dev#20196)
  Fix linter (pandas-dev#20389)
  DOC: Improved the docstring of pandas.Series.dt.to_pytimedelta (pandas-dev#20142)
  DOC: update the pandas.Series.dt.is_month_end docstring (pandas-dev#20181)
  DOC: update the window.Rolling.min docstring (pandas-dev#20263)
  ...
thanks @nehiljain and @mroeschke for the fixup!
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.