DOC: update the DataFrame.mode method docstring #20241

Closed
wants to merge 5 commits into from

Conversation

@mkoo21 mkoo21 commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
###################### Docstring (pandas.DataFrame.mode)  ######################
################################################################################

Get the mode(s) of each element along the axis selected.

Adds a row for each mode per label, filling gaps with NaN.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The axis to iterate over while searching for the mode.
    To find the mode for each column, iterate over rows (``axis=0``
    default behaviour).
    To find the mode for each row, iterate over columns (``axis=1``).
numeric_only : boolean, default False
    If True, only apply to numeric dimensions.

Returns
-------
modes : DataFrame (sorted)
    A DataFrame containing the modes.
    If ``axis=0``, there will be one column per column in the original
    DataFrame, with as many rows as there are modes.
    If ``axis=1``, there will be one row per row in the original
    DataFrame, with as many columns as there are modes.

Notes
-----
There may be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``.

See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Returns a Series with all occurring values as
    indices and the number of occurrences as values.

Examples
--------

``mode`` returns a DataFrame with multiple rows if there is more than
one mode. Missing entries are imputed with NaN.

>>> grades = pd.DataFrame({
...     'Science': [80, 70, 80, 75, 80, 75, 85, 90, 80, 70],
...     'Math': [70, 70, 75, 75, 80, 80, 85, 85, 90, 90]
... })
>>> grades.apply(lambda x: x.value_counts())
    Science  Math
70        2     2
75        2     2
80        4     2
85        1     2
90        1     2
>>> grades.mode()
   Science  Math
0     80.0    70
1      NaN    75
2      NaN    80
3      NaN    85
4      NaN    90

Use ``axis=1`` to apply mode over columns (get the mode of each row).

>>> student_grades = pd.DataFrame.from_dict({
...     'Alice': [80, 85, 90, 85, 95],
...     'Bob': [70, 80, 80, 75, 90]
... }, 'index')
>>> student_grades
        0   1   2   3   4
Alice  80  85  90  85  95
Bob    70  80  80  75  90
>>> student_grades.mode(axis=1)
        0
Alice  85
Bob    80

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.mode" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.
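
The imputation idiom mentioned in the Notes section, `df.fillna(df.mode().iloc[0])`, can be illustrated with a small sketch. The data below is hypothetical (it is not taken from the PR), and the sketch assumes a recent pandas where `mode` ignores NaN by default and returns the modes in sorted order, so that row 0 holds the smallest mode of each column.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not from the PR: each column has one missing value.
df = pd.DataFrame({
    "Science": [80, 70, 80, np.nan],
    "Math": [70.0, np.nan, 75.0, 75.0],
})

# Row 0 of the mode frame holds the first mode of every column.
first_modes = df.mode().iloc[0]

# fillna with a Series fills each column using the value at its label.
imputed = df.fillna(first_modes)

print(imputed["Science"].tolist())  # [80.0, 70.0, 80.0, 80.0]
print(imputed["Math"].tolist())     # [70.0, 75.0, 75.0, 75.0]
```

When a column has several modes, `iloc[0]` silently picks the smallest one, which may or may not be the desired fill value.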

@pep8speaks

pep8speaks commented Mar 10, 2018

Hello @mkoo21! Thanks for updating the PR.

Line 5950:1: W293 blank line contains whitespace

Comment last updated on March 11, 2018 at 21:14 Hours UTC
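
A W293 warning means a line looks blank but still contains spaces or tabs. As a rough sketch only (the helper name `whitespace_only_lines` is hypothetical, and the real check lives in the flake8/pycodestyle tooling), such lines could be located like this:

```python
def whitespace_only_lines(text):
    """Return 1-based numbers of lines that contain only spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        # A truly empty line is fine; a non-empty line that strips to
        # nothing is the case that W293 complains about.
        if line and not line.strip()
    ]


sample = "x = 1\n    \ny = 2\n\nz = 3\n"
print(whitespace_only_lines(sample))  # [2]
```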

@jorisvandenbossche
Member

@mkoo21 Can you try to follow the guidelines please? Check for PEP8 and paste the output of the validation script.

@mkoo21
Author

mkoo21 commented Mar 10, 2018

Sorry, I thought I ran the PEP8 check but it wasn't outputting anything when I ran it earlier.
Output of validation script:

################################################################################
###################### Docstring (pandas.DataFrame.mode)  ######################
################################################################################

Get the mode(s) of each element along the axis selected.

Adds a row for each mode per label, filling gaps with NaN.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The axis to iterate over while searching for the mode.
    To find the mode for each column, iterate over rows (``axis=0``
    default behaviour).
    To find the mode for each row, iterate over columns (``axis=1``).
numeric_only : boolean, default False
    If True, only apply to numeric dimensions.

Returns
-------
modes : DataFrame (sorted)
    A DataFrame containing the modes.
    If ``axis=0``, there will be one column per column in the original
    DataFrame, with as many rows as there are modes.
    If ``axis=1``, there will be one row per row in the original
    DataFrame, with as many columns as there are modes.

Notes
-----
There may be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``.

See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Returns a Series with all occurring values as
    indices and the number of occurrences as values.

Examples
--------

``mode`` returns a DataFrame with multiple rows if there is more than
one mode. Missing entries are imputed with NaN.

>>> grades = pd.DataFrame({
...     'Science': [80, 70, 80, 75, 80, 75, 85, 90, 80, 70],
...     'Math': [70, 70, 75, 75, 80, 80, 85, 85, 90, 90]
... })
>>> grades.apply(lambda x: x.value_counts())
    Science  Math
70        2     2
75        2     2
80        4     2
85        1     2
90        1     2
>>> grades.mode()
   Science  Math
0     80.0    70
1      NaN    75
2      NaN    80
3      NaN    85
4      NaN    90

Use ``axis=1`` to apply mode over columns (get the mode of each row).

>>> student_grades = pd.DataFrame.from_dict({
...     'Alice': [80, 85, 90, 85, 95],
...     'Bob': [70, 80, 80, 75, 90]
... }, 'index')
>>> student_grades
        0   1   2   3   4
Alice  80  85  90  85  95
Bob    70  80  80  75  90
>>> student_grades.mode(axis=1)
        0
Alice  85
Bob    80

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.mode" correct. :)

@@ -111,9 +111,9 @@
by : str or list of str
Name or list of names to sort by.

-    - if `axis` is 0 or `'index'` then `by` may contain index
+    - if ``axis`` is 0 or ``'index'`` then `by` may contain index
levels and/or column labels
Contributor

try not to change unrelated things

``df.fillna(df.mode().iloc[0])``
Get the mode(s) of each element along the axis selected.

Adds a row for each mode per label, filling gaps with NaN.
Contributor

can add a line explaining what mode is
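
For reference, the mode of a set of values is the value that occurs most often, and several values can tie for that position. A minimal plain-Python sketch of the definition, using the grades from the docstring example (the `modes` helper is hypothetical and is not how pandas implements `mode`):

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest count, sorted ascending."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


science = [80, 70, 80, 75, 80, 75, 85, 90, 80, 70]
math_grades = [70, 70, 75, 75, 80, 80, 85, 85, 90, 90]

print(modes(science))      # [80]
print(modes(math_grades))  # [70, 75, 80, 85, 90]
```

The two columns yield a different number of modes, which is why the docstring example pads the shorter column with NaN.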

* 1 or 'columns' : get mode of each row
The axis to iterate over while searching for the mode.
To find the mode for each column, iterate over rows (``axis=0``,
default behaviour).
Contributor

you don't need the parens, it doesn't read well

Notes
-----
There may be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
Contributor

no parens, just use a comma. capitalize DataFrame.

you can leave off the impute sentence

See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Returns a Series with all occurring values as
Contributor

counts of values

1 2
"""

``mode`` returns a DataFrame with multiple rows if there is more than
Contributor

you don't need this sentence as it's above

>>> grades.apply(lambda x: x.value_counts())
Science Math
70 2 2
75 2 2
Contributor

put the mode example first

@jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Mar 11, 2018
@codecov

codecov bot commented Mar 11, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@c3d491a).
The diff coverage is 92.1%.

@@            Coverage Diff            @@
##             master   #20241   +/-   ##
=========================================
  Coverage          ?   91.72%           
=========================================
  Files             ?      150           
  Lines             ?    49162           
  Branches          ?        0           
=========================================
  Hits              ?    45096           
  Misses            ?     4066           
  Partials          ?        0
Flag        Coverage Δ
#multiple   90.11% <92.1%> (?)
#single     41.86% <23.68%> (?)

Impacted Files         Coverage Δ
pandas/core/frame.py   97.18% <92.1%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c3d491a...e476c55.

@mkoo21
Author

mkoo21 commented Mar 22, 2018

I've made the requested changes to the text. The validation and PEP8 scripts passed. I'm not sure how to address the CI failures on my end. Is there anything else I need to do?

@datapythonista
Member

Superseded by #22404 (this PR was discontinued, and its source branch no longer exists).

@jorisvandenbossche added this to the No action milestone Aug 18, 2018
Labels: Docs; Numeric Operations (Arithmetic, Comparison, and Logical operations); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
5 participants