DOC: update the docstring of pandas.DataFrame.info #20197
Hello @dcanones! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on March 16, 2018 at 20:12 UTC
Neat!
Small improvements
There are some errors in the docstring format and in the examples.
pandas/core/frame.py
Outdated
dtypes: float64(1), int64(1)
memory usage: 160.0 bytes

See Also
See Also section should go before the Examples section.
Done!
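For reference, a minimal sketch of the section order being asked for here, with See Also placed before Examples. The function `show_info_sketch` and its docstring are hypothetical and only illustrate the layout; they are not the docstring from this PR.

```python
import inspect


def show_info_sketch(frame):
    """
    Print a concise summary of a frame.

    Parameters
    ----------
    frame : object
        Hypothetical input, used only to illustrate the layout.

    Returns
    -------
    None
        This function prints its summary and returns None.

    See Also
    --------
    DataFrame.describe : Generate descriptive statistics.

    Examples
    --------
    >>> show_info_sketch([1, 2, 3])
    3 items
    """
    print(f"{len(frame)} items")


# Collect the section headers in the order they appear in the docstring.
doc_lines = inspect.getdoc(show_info_sketch).splitlines()
section_order = [
    line for line in doc_lines
    if line in ("Parameters", "Returns", "See Also", "Examples")
]
print(section_order)
```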
pandas/core/frame.py
Outdated
Returns
-------
None: NoneType
This method outputs a summary of a DataFrame and returns None.
It would be clearer to me if it said "This method prints" instead of "This method outputs".
Done!
pandas/core/frame.py
Outdated
@@ -1815,13 +1815,17 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
"""
Concise summary of a DataFrame.
Since info() is a method, it should start with an infinitive verb.
Could you please elaborate on this comment or suggest what you think fits best? The original docstring remains unchanged here.
It is just the recommendation from the pandas docstring guide for short summaries:
"For functions and methods, the short summary must start with an infinitive verb."
https://python-sprints.github.io/pandas/guide/pandas_docstring.html
Maybe it would read better with "Display a summary", "Present", "Print" or something like that.
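To make the suggestion concrete, here is a small sketch that contrasts the current summary line with one of the verb-first wordings proposed above. Both functions are hypothetical stand-ins for `info()`, and the exact wording is only one possible choice.

```python
import inspect


def info_before():
    """Concise summary of a DataFrame."""


def info_after():
    """Print a concise summary of a DataFrame."""


def first_word(func):
    # Return the first word of the docstring's short summary line.
    return inspect.getdoc(func).split()[0]


print(first_word(info_before))  # a noun phrase opener
print(first_word(info_after))   # an infinitive verb opener
```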
pandas/core/frame.py
Outdated
@@ -1815,13 +1815,17 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
"""
Concise summary of a DataFrame.

This method shows information about DataFrame type of index, columns
dtypes, non-null values and memory usage.
I'd like to suggest a few small changes to the text: "This method shows information about a DataFrame: index dtype, ..."
Done!
pandas/core/frame.py
Outdated
memory usage: 200.0+ bytes

>>> file = open("df_info.txt", "w", encoding="utf-8")
>>> df.info(buf=file)
We tried to run this example but it didn't work for us. To make it work, we had to pass an io.StringIO() object to the 'buf' parameter and then write the StringIO value to a file:
import io
buf = io.StringIO()
df2.info(buf=buf)
s = buf.getvalue()
with open("df_info.txt", "w", encoding="utf-8") as file:
    file.write(s)
Done! (Which Python version did you use? It worked in Python 3.6.)
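For anyone comparing the two approaches discussed in this thread, the sketch below runs both under Python 3: capturing the summary in an io.StringIO and writing it out, and passing an open text-mode file directly as `buf`. The sample frame, column names, and temporary file paths are assumptions made for illustration; the behaviour on Python 2 is not covered here.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical sample frame; the column names are illustrative only.
df = pd.DataFrame({"int_col": [1, 2, 3], "text_col": ["a", "b", "c"]})

# Option 1: capture the summary in memory, then write it to a file.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

path = os.path.join(tempfile.mkdtemp(), "df_info.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(summary)

# Option 2: pass an open text-mode file directly as buf.
direct_path = os.path.join(tempfile.mkdtemp(), "df_info_direct.txt")
with open(direct_path, "w", encoding="utf-8") as f:
    df.info(buf=f)
```

Using `with` blocks closes the files even if `info()` raises, which the bare `open()` / `close()` pair in the docstring example does not guarantee.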
pandas/core/frame.py
Outdated
>>> file.close()

>>> df.drop('text_col', axis=1, inplace=True)
>>> df.info(memory_usage='Deep')
The correct value for memory_usage is 'deep', not 'Deep'. I'd also like to suggest a more detailed explanation of what it does.
Without memory_usage='deep':
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 2 columns):
a    300000 non-null object
b    300000 non-null object
dtypes: object(2)
memory usage: 4.6+ MB <<<<<<<<
With memory_usage='deep':
df2.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 2 columns):
a    300000 non-null object
b    300000 non-null object
dtypes: object(2)
memory usage: 36.7 MB <<<<<<<<<
Done!
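The difference shown above can also be checked numerically. The sketch below is a smaller, hypothetical version of that comparison: it builds an object-dtype frame (forced with `dtype=object` so the result does not depend on the default string dtype) and compares `DataFrame.memory_usage` with and without `deep=True`. The frame contents are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical frame of Python strings stored as object dtype.
df = pd.DataFrame(
    {"a": ["x" * 50] * 1000, "b": ["y" * 50] * 1000},
    dtype=object,
)

# Shallow estimate: counts only the array of object references.
shallow_bytes = df.memory_usage(deep=False).sum()

# Deep estimate: also inspects the referenced Python objects.
deep_bytes = df.memory_usage(deep=True).sum()

print(shallow_bytes, deep_bytes)

# The same deep introspection, reported through info().
df.info(memory_usage="deep")
```

The trailing "+" in the shallow report ("4.6+ MB" above) marks the figure as a lower bound, which is why it disappears once 'deep' is requested.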
pandas/core/frame.py
Outdated
>>> df.info(buf=file)
>>> file.close()

>>> df.drop('text_col', axis=1, inplace=True)
There is no reason to use df.drop in this example.
Done!
Thanks for your feedback. We will improve this PR asap.
In reply to the review by Hissashi Rocha on Sat, Mar 10, 2018, 19:48.
Reviewers feedback
Improvements and extended examples introductions
Added reviewer feedback.
Added latest changes. I think it is ready to merge.
Pushed a few changes (lines too long, some rewording of parameters).

Returns
-------
None
Typically, Returns is omitted for methods with no return value. In this case I think it's good to explicitly say that there's no return value.
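A quick way to see why the explicit note helps: `info()` writes its summary to a buffer (or stdout) and the call itself evaluates to None, which can surprise readers who expect a string. This is a minimal sketch with a hypothetical one-column frame.

```python
import io

import pandas as pd

# Hypothetical frame used only to demonstrate the return value.
df = pd.DataFrame({"int_col": [1, 2, 3]})

sink = io.StringIO()
result = df.info(buf=sink)

# The summary text lands in the buffer, not in the return value.
print(result is None)
print(len(sink.getvalue()) > 0)
```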
Thanks, nice docstring update!
Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):
scripts/validate_docstrings.py <your-function-or-method>
git diff upstream/master -u -- "*.py" | flake8 --diff
python doc/make.py --single <your-function-or-method>
Please include the output of the validation script below between the "```" ticks:
If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.
Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):
git diff upstream/master -u -- "*.py" | flake8 --diff