DOC: update the docstring of pandas.DataFrame.info #20197

Merged 9 commits on Mar 17, 2018

Conversation

Contributor

@dcanones dcanones commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

(pandas_dev) david@david-TM1604:~/repos/pandas/scripts$ python validate_docstrings.py pandas.DataFrame.info

################################################################################
###################### Docstring (pandas.DataFrame.info)  ######################
################################################################################

Concise summary of a DataFrame.

This method shows information about DataFrame type of index, columns
dtypes and non-null values and memory usage.

Parameters
----------
verbose : {None, True, False}, optional
    Whether to print the full summary.
    None follows the `display.max_info_columns` setting.
    True or False overrides the `display.max_info_columns` setting.
buf : writable buffer, defaults to sys.stdout
    Whether to pipe the output.
max_cols : int, default None
    Determines whether full summary or short summary is printed.
    None follows the `display.max_info_columns` setting.
memory_usage : boolean/string, default None
    Specifies whether total memory usage of the DataFrame
    elements (including index) should be displayed. None follows
    the `display.memory_usage` setting. True or False overrides
    the `display.memory_usage` setting. A value of 'deep' is equivalent
    of True, with deep introspection. Memory usage is shown in
    human-readable units (base-2 representation).
null_counts : boolean, default None
    Whether to show the non-null counts

    - If None, then only show if the frame is smaller than
      max_info_rows and max_info_columns.
    - If True, always show counts.
    - If False, never show counts.

Returns
-------
None: NoneType
    This method outputs a summary of a DataFrame and returns None.

Examples
--------
>>> int_values = [1, 2, 3, 4, 5]
>>> text_values = ['alpha', 'beta', 'gamma', 'delta', 'epsilon']
>>> float_values = [0.0, 0.25, 0.5, 0.75, 1.0]
>>> df = pd.DataFrame({"int_col": int_values, "text_col": text_values, "float_col": float_values})
>>> df
   int_col text_col  float_col
0        1    alpha       0.00
1        2     beta       0.25
2        3    gamma       0.50
3        4    delta       0.75
4        5  epsilon       1.00

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
int_col      5 non-null int64
text_col     5 non-null object
float_col    5 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

>>> df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Columns: 3 entries, int_col to float_col
dtypes: float64(1), int64(1), object(1)
memory usage: 200.0+ bytes

>>> file = open("df_info.txt", "w", encoding="utf-8")
>>> df.info(buf=file)

>>> df.drop('text_col', axis=1, inplace=True)
>>> df.info(memory_usage='Deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 2 columns):
int_col      5 non-null int64
float_col    5 non-null float64
dtypes: float64(1), int64(1)
memory usage: 160.0 bytes

See Also
--------

describe: Generate descriptive statistics of DataFrame columns.

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.info" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry


pep8speaks commented Mar 10, 2018

Hello @dcanones! Thanks for updating the PR.

Cheers! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 16, 2018 at 20:12 Hours UTC


@ReubenMarkham ReubenMarkham left a comment

Neat!

Contributor

@hissashirocha hissashirocha left a comment

There are some errors in the docstring format and in the examples.

dtypes: float64(1), int64(1)
memory usage: 160.0 bytes

See Also
Contributor

See Also section should go before the Examples section.
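
For illustration, here is a rough sketch of the section order being asked for, with See Also placed before Examples. The `summarize` function and its docstring are hypothetical and are not taken from pandas; they only show the layout.

```python
def summarize(values):
    """
    Print a short summary of a list of numbers.

    Parameters
    ----------
    values : list of float
        Numbers to summarize.

    Returns
    -------
    None
        The summary is printed; nothing is returned.

    See Also
    --------
    statistics.mean : Arithmetic mean of data.

    Examples
    --------
    >>> summarize([1.0, 2.0, 3.0])
    count=3 min=1.0 max=3.0
    """
    print(f"count={len(values)} min={min(values)} max={max(values)}")


# Position of each section heading inside the docstring (hypothetical check).
doc = summarize.__doc__
section_order = [
    doc.index(name)
    for name in ("Parameters", "Returns", "See Also", "Examples")
]
```

If the positions in `section_order` are increasing, the sections appear in the expected order.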

Contributor Author

Done!

Returns
-------
None: NoneType
This method outputs a summary of a DataFrame and returns None.
Contributor

It would be clearer to me if it said "This method prints" instead of "This method outputs".

Contributor Author

Done!

@@ -1815,13 +1815,17 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
"""
Concise summary of a DataFrame.
Contributor

Since info() is a method, its short summary should start with an infinitive verb.

Contributor Author

Could you please elaborate on this comment, or suggest what you think fits best? The original docstring is unchanged here.

Contributor

It is just the recommendation from the pandas docstring guide for short summaries:
"For functions and methods, the short summary must start with an infinitive verb."
https://python-sprints.github.io/pandas/guide/pandas_docstring.html

Maybe it would go better with "Display a summary", "Present", "Print" or something like that.

@@ -1815,13 +1815,17 @@ def info(self, verbose=None, buf=None, max_cols=None, memory_usage=None,
"""
Concise summary of a DataFrame.

This method shows information about DataFrame type of index, columns
dtypes, non-null values and memory usage.
Contributor

I'd suggest a small change to the text: "This method shows information about a DataFrame: index dtype, ..."

Contributor Author

Done!

memory usage: 200.0+ bytes

>>> file = open("df_info.txt", "w", encoding="utf-8")
>>> df.info(buf=file)
Contributor

We tried to run this example, but it didn't work for us. To make it work, we had to pass an io.StringIO() object to the 'buf' parameter and then write the StringIO value to a file:

import io
buf = io.StringIO()
df.info(buf=buf)
s = buf.getvalue()
# The with block closes the file once the summary is written.
with open("df_info.txt", "w", encoding="utf-8") as file:
    file.write(s)
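
A self-contained sketch of the same idea, assuming pandas is installed. The sample frame and the temporary directory are illustrative choices and are not part of the PR.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical sample data, used only for this illustration.
df = pd.DataFrame({"int_col": [1, 2, 3], "text_col": ["a", "b", "c"]})

# Capture the summary in memory instead of printing it to sys.stdout.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()

# Write the captured text to a file; the with block closes it afterwards.
path = os.path.join(tempfile.mkdtemp(), "df_info.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(summary)
```

Capturing the text first also makes it easy to inspect or post-process the summary before saving it.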

Contributor Author

Done! (Which Python version did you use? It worked in Python 3.6.)

>>> file.close()

>>> df.drop('text_col', axis=1, inplace=True)
>>> df.info(memory_usage='Deep')
Contributor

The correct parameter is 'deep', not 'Deep'. I'd also like to suggest a more detailed explanation of what it does.

Contributor

Without 'deep'

df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 2 columns):
a 300000 non-null object
b 300000 non-null object
dtypes: object(2)
memory usage: 4.6+ MB <<<<<<<<

With memory_usage='deep'

df2.info(memory_usage = 'deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300000 entries, 0 to 299999
Data columns (total 2 columns):
a 300000 non-null object
b 300000 non-null object
dtypes: object(2)
memory usage: 36.7 MB <<<<<<<<<
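
The gap between the two figures can be reproduced on a small scale with `DataFrame.memory_usage`. This is only a sketch under assumptions: the frame below is hypothetical, and the column is forced to `object` dtype so that the comparison does not depend on how a given pandas version infers string columns.

```python
import pandas as pd

# Hypothetical data: an object column holding 1000 distinct Python strings.
values = pd.Series([f"value_{i}" for i in range(1000)], dtype=object)
df = pd.DataFrame({"a": values})

# Shallow accounting counts only the array of object pointers.
shallow = df.memory_usage(deep=False).sum()

# Deep accounting also inspects the string objects the pointers refer to.
deep = df.memory_usage(deep=True).sum()
```

For object columns the deep figure should be the larger one. This is consistent with the "+" suffix appearing on the estimate without 'deep' in the output above and not on the one with it.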

Contributor Author

Done!

>>> df.info(buf=file)
>>> file.close()

>>> df.drop('text_col', axis=1, inplace=True)
Contributor

There is no reason to use df.drop in this example.

Contributor Author

Done!

@dcanones
Contributor Author

dcanones commented Mar 10, 2018 via email

David Adrián Cañones Castellano added 2 commits March 10, 2018 21:07
Improvements and extended examples introductions
@dcanones
Contributor Author

Added latest changes. I think it is ready to merge.

@TomAugspurger
Contributor

Pushed a few changes (lines too long, some rewording of parameters).


Returns
-------
None
Contributor

Typically, Returns is omitted for methods with no return value. In this case I think it's good to explicitly say that there's no return value.
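
A quick way to see the behaviour that the Returns section documents, sketched with a hypothetical frame and assuming pandas is installed: the summary goes to the buffer, and the call itself evaluates to None.

```python
import io

import pandas as pd

# Hypothetical frame, used only to exercise the method.
df = pd.DataFrame({"a": [1, 2, 3]})

# Send the summary to an in-memory buffer so nothing is printed here.
buffer = io.StringIO()
result = df.info(buf=buffer)

# The text lands in the buffer; the method itself returns None.
printed_text = buffer.getvalue()
```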

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Mar 17, 2018
Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks, nice docstring update!

@jorisvandenbossche jorisvandenbossche merged commit 49eb22d into pandas-dev:master Mar 17, 2018
6 participants