DOC: update the docstring of pandas.DataFrame.info #20197

dcanones · 2018-03-10T16:38:08Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

(pandas_dev) david@david-TM1604:~/repos/pandas/scripts$ python validate_docstrings.py pandas.DataFrame.info

################################################################################
###################### Docstring (pandas.DataFrame.info)  ######################
################################################################################

Concise summary of a DataFrame.

This method shows information about DataFrame type of index, columns
dtypes and non-null values and memory usage.

Parameters
----------
verbose : {None, True, False}, optional
    Whether to print the full summary.
    None follows the `display.max_info_columns` setting.
    True or False overrides the `display.max_info_columns` setting.
buf : writable buffer, defaults to sys.stdout
    Whether to pipe the output.
max_cols : int, default None
    Determines whether full summary or short summary is printed.
    None follows the `display.max_info_columns` setting.
memory_usage : boolean/string, default None
    Specifies whether total memory usage of the DataFrame
    elements (including index) should be displayed. None follows
    the `display.memory_usage` setting. True or False overrides
    the `display.memory_usage` setting. A value of 'deep' is equivalent
    of True, with deep introspection. Memory usage is shown in
    human-readable units (base-2 representation).
null_counts : boolean, default None
    Whether to show the non-null counts

    - If None, then only show if the frame is smaller than
      max_info_rows and max_info_columns.
    - If True, always show counts.
    - If False, never show counts.

Returns
-------
None: NoneType
    This method outputs a summary of a DataFrame and returns None.

Examples
--------
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values, "float_col": float_values})
>>> df
   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
int_col      5 non-null int64
text_col     5 non-null object
float_col    5 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

>>> df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

>>> file = open("df_info.txt", "w", encoding="utf-8")
>>> df.info(buf=file)

>>> df.drop('text_col', axis=1, inplace=True)
>>> df.info(memory_usage='Deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
int_col      5 non-null int64
float_col    5 non-null float64
dtypes: float64(1), int64(1)
memory usage: 160.0 bytes

See Also
--------

describe: Generate descriptive statistics of DataFrame columns.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.info" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

closes #xxxx
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-03-10T16:38:11Z

Hello @dcanones! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 16, 2018 at 20:12 Hours UTC

PEP-8 fixing

ReubenMarkham

Neat!

Small improvements

hissashirocha

There are some errors in the docstring format and in the examples.

hissashirocha · 2018-03-10T16:58:31Z

pandas/core/frame.py

+        dtypes: float64(1), int64(1)
+        memory usage: 160.0 bytes
+
+        See Also


See Also section should go before the Examples section.
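The requested ordering can be illustrated on a small function. This is a minimal sketch, assuming the numpydoc-style section order (Parameters, Returns, See Also, Examples); `summarize` is a hypothetical helper invented for illustration and is not part of pandas.

```python
def summarize(values):
    """
    Return the number of items in a sequence.

    Parameters
    ----------
    values : sequence
        Items to count.

    Returns
    -------
    int
        Number of items in ``values``.

    See Also
    --------
    len : Built-in function that this hypothetical helper wraps.

    Examples
    --------
    >>> summarize([1, 2, 3])
    3
    """
    return len(values)


doc = summarize.__doc__
# Sort the section headers by where they first appear in the docstring.
section_order = sorted(
    ["Parameters", "Returns", "See Also", "Examples"], key=doc.index
)
print(section_order)
```

The printed list gives the sections in the order they occur, so See Also should be listed before Examples.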

hissashirocha · 2018-03-10T17:10:18Z

pandas/core/frame.py

+        Returns
+        -------
+        None: NoneType
+            This method outputs a summary of a DataFrame and returns None.


It would be clearer to me if it said "This method prints" instead of "This method outputs".

hissashirocha · 2018-03-10T17:59:57Z

pandas/core/frame.py

@@ -1815,13 +1815,17 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
        """
        Concise summary of a DataFrame.


Since info() is a method, it should start with an infinitive verb.

Can you please elaborate on this comment or suggest what you think fits best? The original docstring remains unchanged here.

It is just the recommendation from the pandas docstring guide for short summaries:
"For functions and methods, the short summary must start with an infinitive verb."
https://python-sprints.github.io/pandas/guide/pandas_docstring.html

Maybe it would go better with "Display a summary", "Present", "Print" or something like that.

hissashirocha · 2018-03-10T18:12:29Z

pandas/core/frame.py

@@ -1815,13 +1815,17 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
        """
        Concise summary of a DataFrame.

+        This method shows information about DataFrame type of index, columns
+        dtypes, non-null values and memory usage.


I'd like to recommend some small changes in the text: "This method shows information about a DataFrame: index dtype, ..."

hissashirocha · 2018-03-10T18:30:53Z

pandas/core/frame.py

+        memory usage: 200.0+ bytes
+
+        >>> file = open("df_info.txt", "w", encoding="utf-8")
+        >>> df.info(buf=file)


We tried to run this example but it didn't work. To make it work, it is necessary to pass an io.StringIO() object to the 'buf' parameter and then write the StringIO value to a file.

import io

# Capture the info() output in an in-memory text buffer.
buf = io.StringIO()
df2.info(buf=buf)
s = buf.getvalue()

# Write the captured text to a file; the context manager closes it.
with open("df_info.txt", "w", encoding="utf-8") as file:
    file.write(s)

Done! (Which Python version did you use? It worked in Python 3.6.)
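The two routes discussed in this exchange can be compared side by side. The following is a minimal sketch, assuming Python 3 and a pandas version in which `info` writes text to `buf`; the frame contents and the temporary file location are illustrative and not taken from the PR.

```python
import io
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"int_col": [1, 2, 3], "float_col": [0.0, 0.5, 1.0]})

# Route 1: capture the summary in an in-memory text buffer.
buffer = io.StringIO()
df.info(buf=buffer)
captured = buffer.getvalue()

# Route 2: pass a file opened in text mode directly as ``buf``.
path = os.path.join(tempfile.mkdtemp(), "df_info.txt")
with open(path, "w", encoding="utf-8") as handle:
    df.info(buf=handle)

# Read the file back to compare it with the buffered text.
with open(path, encoding="utf-8") as handle:
    written = handle.read()

print(captured == written)
```

If the direct file route fails in a given environment, the StringIO route followed by a plain `write` remains available.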

hissashirocha · 2018-03-10T18:41:54Z

pandas/core/frame.py

+        >>> file.close()
+
+        >>> df.drop('text_col', axis=1, inplace=True)
+        >>> df.info(memory_usage='Deep')


The correct parameter is 'deep', not 'Deep'. I'd also like to suggest a more detailed explanation of what it does.

Without 'deep'

df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 2 columns):
a 300000 non-null object
b 300000 non-null object
dtypes: object(2)
memory usage: 4.6+ MB <<<<<<<<

With memory_usage='deep'

df2.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 2 columns):
a 300000 non-null object
b 300000 non-null object
dtypes: object(2)
memory usage: 36.7 MB <<<<<<<<<
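The difference between the two figures can be reproduced on a smaller frame. This is a minimal sketch under stated assumptions: the data is hypothetical, the columns are forced to object dtype so that deep introspection has Python objects to inspect, and `DataFrame.memory_usage` is used to obtain the byte counts that `info` reports in human-readable form.

```python
import io

import pandas as pd

# Hypothetical data: object-dtype columns holding Python strings.
words = ["alpha", "beta", "gamma", "delta", "epsilon"] * 1000
df2 = pd.DataFrame({
    "a": pd.Series(words, dtype=object),
    "b": pd.Series(words, dtype=object),
})

# Shallow: counts the array buffers only (references for object columns).
shallow_bytes = int(df2.memory_usage(deep=False).sum())
# Deep: also inspects the Python objects that the references point to.
deep_bytes = int(df2.memory_usage(deep=True).sum())
print(shallow_bytes, deep_bytes)

# The same switch is exposed by info() through memory_usage='deep'.
buffer = io.StringIO()
df2.info(memory_usage="deep", buf=buffer)
print(buffer.getvalue().splitlines()[-1])
```

The "+" suffix in the shallow output above marks the reported figure as a lower bound, which is why the deep figure is larger for object columns.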

hissashirocha · 2018-03-10T18:44:59Z

pandas/core/frame.py

+        >>> df.info(buf=file)
+        >>> file.close()
+
+        >>> df.drop('text_col', axis=1, inplace=True)


There is no reason to use df.drop in this example.

dcanones · 2018-03-10T19:02:15Z

Thanks for your feedback. We will improve this PR asap.


Reviewers feedback

Improvements and extended examples introductions

Added reviewer feedback.

dcanones · 2018-03-16T19:36:49Z

Added latest changes. I think it is ready to merge.

TomAugspurger · 2018-03-16T20:09:38Z

Pushed a few changes (lines too long, some rewording of parameters).

TomAugspurger · 2018-03-16T20:13:56Z

pandas/core/frame.py

+
+        Returns
+        -------
+        None


Typically, Returns is omitted for methods with no return value. In this case I think it's good to explicitly say that there's no return value.
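The behavior that this Returns section documents can be checked directly. A minimal sketch with an illustrative frame (the column name and values are assumptions, not taken from the PR):

```python
import io

import pandas as pd

df = pd.DataFrame({"int_col": [1, 2, 3]})

# Redirect the summary into a buffer so that nothing is printed to stdout.
buffer = io.StringIO()
result = df.info(buf=buffer)

# info() writes its summary to the buffer and returns nothing.
print(result is None)
print(len(buffer.getvalue()) > 0)
```

Because the return value is None, the summary text has to be captured through `buf` whenever it is needed as a string.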

jorisvandenbossche

Thanks, nice docstring update!

DOC: Improved the docstring of pandas.DataFrame.info

cb0cd65

DOC: update the docstring of pandas.DataFrame.info

06fff94

PEP-8 fixing

ReubenMarkham approved these changes Mar 10, 2018

DOC: Improved the docstring of pandas.DataFrame.info

c666c7a

Small improvements

hissashirocha reviewed Mar 10, 2018

David Adrián Cañones Castellano added 2 commits March 10, 2018 21:07

DOC: update the docstring of pandas.DataFrame.info

39bf30a

Reviewers feedback

DOC: update the docstring of pandas.DataFrame.info

3da09a9

Improvements and extended examples introductions

jorisvandenbossche added the Docs label Mar 12, 2018

DOC: update the docstring of pandas.DataFrame.info

7bf0f11

Added reviewer feedback.

Updates

0c2919d

TomAugspurger added 2 commits March 16, 2018 15:10

Wording

65439a1

More wording

3e785f8

TomAugspurger reviewed Mar 16, 2018

TomAugspurger approved these changes Mar 16, 2018

jorisvandenbossche added this to the 0.23.0 milestone Mar 17, 2018

jorisvandenbossche approved these changes Mar 17, 2018

jorisvandenbossche merged commit 49eb22d into pandas-dev:master Mar 17, 2018
