DOC: update the docstring of pandas.DataFrame.info #20197

Merged
merged 9 commits into from
Mar 17, 2018
58 changes: 58 additions & 0 deletions pandas/core/frame.py
@@ -1815,13 +1815,17 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
"""
Concise summary of a DataFrame.
Contributor

Since info() is a method, it should start with an infinitive verb.

Contributor Author

Could you please elaborate on this comment or suggest what you think fits best? The original docstring is unchanged here.

Contributor

It is just the recommendation from the pandas docstring guide for short summaries:
"For functions and methods, the short summary must start with an infinitive verb."
https://python-sprints.github.io/pandas/guide/pandas_docstring.html

Maybe it would read better with "Display a summary", "Present", "Print" or something like that.
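
As a rough illustration of the guideline quoted above, a summary line that opens with an infinitive verb might look like the sketch below. The function `show_summary` and its body are hypothetical and are not part of pandas.

```python
def show_summary(rows):
    """
    Print a short summary of a list of rows.

    The first line above opens with the infinitive verb "Print",
    as the docstring guide asks for functions and methods.
    """
    # Hypothetical helper, used only to illustrate the summary line.
    print(f"{len(rows)} rows")
```

A noun phrase such as "Concise summary of a DataFrame." describes the result, while a verb-first summary describes what calling the method does.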


This method shows information about DataFrame type of index, columns
dtypes, non-null values and memory usage.
Contributor

I'd like to suggest a small change to the text: "This method shows information about a DataFrame: index dtype, ..."

Contributor Author

Done!


Parameters
----------
verbose : {None, True, False}, optional
Whether to print the full summary.
None follows the `display.max_info_columns` setting.
True or False overrides the `display.max_info_columns` setting.
buf : writable buffer, defaults to sys.stdout
Whether to pipe the output.
max_cols : int, default None
Determines whether full summary or short summary is printed.
None follows the `display.max_info_columns` setting.
@@ -1840,6 +1844,60 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
- If True, always show counts.
- If False, never show counts.

Returns
-------
None: NoneType
This method outputs a summary of a DataFrame and returns None.
Contributor

It would be clearer to me if it said "This method prints" instead of "This method outputs".

Contributor Author

Done!
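
To make the print-versus-return distinction in this thread concrete, here is a minimal sketch. The sample frame is invented for the example; the summary is redirected to an in-memory buffer so it can be inspected.

```python
import io

import pandas as pd

# Small sample frame, invented for this sketch.
df = pd.DataFrame({"int_col": [1, 2, 3], "float_col": [0.0, 0.5, 1.0]})

# Redirect the summary into an in-memory buffer instead of sys.stdout.
buf = io.StringIO()
result = df.info(buf=buf)

# info() writes the summary as a side effect; the call itself returns None.
summary = buf.getvalue()
print(result is None)
print(summary)
```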


Examples
--------
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values, "float_col": float_values})
>>> df
int_col text_col float_col
0 1 alpha 0.00
1 2 beta 0.25
2 3 gamma 0.50
3 4 delta 0.75
4 5 epsilon 1.00

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
int_col 5 non-null int64
text_col 5 non-null object
float_col 5 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

>>> df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

>>> file = open("df_info.txt", "w", encoding="utf-8")
>>> df.info(buf=file)
Contributor

We tried to run this example, but it didn't work. To make it work, pass an io.StringIO() object to the 'buf' parameter and then write the StringIO value to a file.

import io
buf = io.StringIO()
df2.info(buf=buf)
s = buf.getvalue()
file = open("df_info.txt", "w", encoding="utf-8")
file.write(s)
file.close()

Contributor Author

Done! (Which Python version did you use? It worked in Python 3.6.)
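
The StringIO approach suggested in this thread can be sketched as follows, using a `with` block so the file is closed automatically. The sample frame and the temporary directory are placeholders chosen for this example, not part of the PR.

```python
import io
import os
import tempfile

import pandas as pd

# Small sample frame, invented for this sketch.
df = pd.DataFrame({"int_col": [1, 2, 3], "text_col": ["a", "b", "c"]})

# Capture the summary in memory first.
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()

# Write it out; the `with` block closes the file even if writing fails.
# The temporary directory is a placeholder location for this sketch.
path = os.path.join(tempfile.mkdtemp(), "df_info.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(summary)
```

Going through a string first also leaves the summary available for other uses, such as logging, before it is written to disk.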

>>> file.close()

>>> df.drop('text_col', axis=1, inplace=True)
Contributor

There is no reason to use df.drop in this example.

Contributor Author

Done!

>>> df.info(memory_usage='Deep')
Contributor

The correct parameter is 'deep', not 'Deep'. I'd also like to suggest a more detailed explanation of what it does.

Contributor

Without 'deep'

df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 2 columns):
a 300000 non-null object
b 300000 non-null object
dtypes: object(2)
memory usage: 4.6+ MB <<<<<<<<

With memory_usage='deep'

df2.info(memory_usage = 'deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 2 columns):
a 300000 non-null object
b 300000 non-null object
dtypes: object(2)
memory usage: 36.7 MB <<<<<<<<<

Contributor Author

Done!
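
The difference between the two outputs in this thread can be reproduced on a small scale. The sketch below assumes an object-dtype column (forced explicitly, since default string dtypes vary across pandas versions) and compares shallow and deep byte counts with `DataFrame.memory_usage`; the frame itself is invented for the example.

```python
import io

import pandas as pd

# Force object dtype so each cell holds a Python string (an assumption
# made for this sketch; default string dtypes vary across pandas versions).
df = pd.DataFrame({"a": pd.Series(["some text"] * 1000, dtype=object)})

# Shallow: counts only the pointer-sized references stored in the column.
shallow_bytes = df.memory_usage(deep=False).sum()
# Deep: also inspects each Python object and adds the size of the strings.
deep_bytes = df.memory_usage(deep=True).sum()

# The same deep accounting drives the "memory usage" line printed by info().
buf = io.StringIO()
df.info(memory_usage="deep", buf=buf)
deep_summary = buf.getvalue()

print(shallow_bytes, deep_bytes)
print(deep_summary)
```

Deep introspection visits every object in the column, so it gives a more accurate figure at the cost of extra computation on large frames.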

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
int_col 5 non-null int64
float_col 5 non-null float64
dtypes: float64(1), int64(1)
memory usage: 160.0 bytes

See Also
Contributor

See Also section should go before the Examples section.

Contributor Author

Done!
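
For reference, the section order requested here follows the numpydoc layout, which could be sketched as below. The function `summarize` is hypothetical; only the ordering of the docstring sections matters.

```python
def summarize(values):
    """
    Print the number of values.

    Parameters
    ----------
    values : list
        Values to count.

    Returns
    -------
    None

    See Also
    --------
    len : Return the number of items in a container.

    Examples
    --------
    >>> summarize([1, 2, 3])
    3 values
    """
    # Hypothetical function; only the docstring layout matters here.
    print(f"{len(values)} values")
```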

--------

describe: Generate descriptive statistics of DataFrame columns.
"""
from pandas.io.formats.format import _put_lines
