DOC: update the DataFrame.mode method docstring #20241

mkoo21 · 2018-03-10T21:15:48Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

PR title is "DOC: update the <your-function-or-method> docstring"
The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
The html version looks good: python doc/make.py --single <your-function-or-method>
It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
###################### Docstring (pandas.DataFrame.mode)  ######################
################################################################################

Get the mode(s) of each element along the axis selected.

Adds a row for each mode per label, filling gaps with NaN.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The axis to iterate over while searching for the mode.
    To find the mode for each column, iterate over rows (``axis=0``
    default behaviour).
    To find the mode for each row, iterate over columns (``axis=1``).
numeric_only : boolean, default False
    If True, only apply to numeric dimensions.

Returns
-------
modes : DataFrame (sorted)
    A DataFrame containing the modes
    If ``axis=0``, there will be one column per column in the original
    DataFrame, with as many rows as there are modes.
    If ``axis=1``, there will be one row per row in the original
    DataFrame, with as many columns as there are modes.

Notes
-----
There may be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``.

See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Returns a Series with all occuring values as
    indices and the number of occurences as values.

Examples
--------

``mode`` returns a DataFrame with multiple rows if there is more than
one mode. Missing entries are imputed with NaN.

>>> grades = pd.DataFrame({
...     'Science': [80, 70, 80, 75, 80, 75, 85, 90, 80, 70],
...     'Math': [70, 70, 75, 75, 80, 80, 85, 85, 90, 90]
... })
>>> grades.apply(lambda x: x.value_counts())
    Science  Math
70        2     2
75        2     2
80        4     2
85        1     2
90        1     2
>>> grades.mode()
   Science  Math
0     80.0    70
1      NaN    75
2      NaN    80
3      NaN    85
4      NaN    90

Use ``axis=1`` to apply mode over columns (get the mode of each row).

>>> student_grades = pd.DataFrame.from_dict({
...     'Alice': [80, 85, 90, 85, 95],
...     'Bob': [70, 80, 80, 75, 90]
... }, 'index')
>>> student_grades
        0   1   2   3   4
Alice  80  85  90  85  95
Bob    70  80  80  75  90
>>> student_grades.mode(axis=1)
        0
Alice  85
Bob    80

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.mode" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

pep8speaks · 2018-03-10T21:15:51Z

Hello @mkoo21! Thanks for updating the PR.

In the file pandas/core/frame.py, following are the PEP8 issues :

Line 5950:1: W293 blank line contains whitespace

Comment last updated on March 11, 2018 at 21:14 Hours UTC
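
For reference, W293 flags a line that looks blank but still contains spaces or tabs. A minimal sketch for locating such lines in a piece of source text is shown below; the helper name `whitespace_only_lines` is hypothetical and is not part of flake8 or pycodestyle.

```python
def whitespace_only_lines(text):
    """Return 1-based numbers of lines that hold only spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        # A truly empty line is fine; a line that is non-empty but
        # strips down to nothing is what W293 complains about.
        if line and not line.strip()
    ]


sample = "def f():\n    x = 1\n    \n    return x\n"
print(whitespace_only_lines(sample))  # [3]
```

Running the same check over the edited region of a file before pushing may catch the issue even when the `git diff ... | flake8 --diff` pipeline prints nothing.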

jorisvandenbossche · 2018-03-10T21:22:19Z

@mkoo21 Can you try to follow the guidelines please? Check for PEP8, paste the output of the validation script

mkoo21 · 2018-03-10T21:25:44Z

Sorry, I thought I ran the PEP8 check but it wasn't outputting anything when I ran it earlier.
Output of validation script:

################################################################################
###################### Docstring (pandas.DataFrame.mode)  ######################
################################################################################

Get the mode(s) of each element along the axis selected.

Adds a row for each mode per label, filling gaps with NaN.

Parameters
----------
axis : {0 or 'index', 1 or 'columns'}, default 0
    The axis to iterate over while searching for the mode.
    To find the mode for each column, iterate over rows (``axis=0``
    default behaviour).
    To find the mode for each row, iterate over columns (``axis=1``).
numeric_only : boolean, default False
    If True, only apply to numeric dimensions.

Returns
-------
modes : DataFrame (sorted)
    A DataFrame containing the modes
    If ``axis=0``, there will be one column per column in the original
    DataFrame, with as many rows as there are modes.
    If ``axis=1``, there will be one row per row in the original
    DataFrame, with as many columns as there are modes.

Notes
-----
There may be multiple values returned for the selected
axis (when more than one item share the maximum frequency), which is
the reason why a dataframe is returned. If you want to impute missing
values with the mode in a dataframe ``df``, you can just do this:
``df.fillna(df.mode().iloc[0])``.

See Also
--------
Series.mode : Return the highest frequency value in a Series.
Series.value_counts : Returns a Series with all occuring values as
    indices and the number of occurences as values.

Examples
--------

``mode`` returns a DataFrame with multiple rows if there is more than
one mode. Missing entries are imputed with NaN.

>>> grades = pd.DataFrame({
...     'Science': [80, 70, 80, 75, 80, 75, 85, 90, 80, 70],
...     'Math': [70, 70, 75, 75, 80, 80, 85, 85, 90, 90]
... })
>>> grades.apply(lambda x: x.value_counts())
    Science  Math
70        2     2
75        2     2
80        4     2
85        1     2
90        1     2
>>> grades.mode()
   Science  Math
0     80.0    70
1      NaN    75
2      NaN    80
3      NaN    85
4      NaN    90

Use ``axis=1`` to apply mode over columns (get the mode of each row).

>>> student_grades = pd.DataFrame.from_dict({
...     'Alice': [80, 85, 90, 85, 95],
...     'Bob': [70, 80, 80, 75, 90]
... }, 'index')
>>> student_grades
        0   1   2   3   4
Alice  80  85  90  85  95
Bob    70  80  80  75  90
>>> student_grades.mode(axis=1)
        0
Alice  85
Bob    80

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.mode" correct. :)

jreback · 2018-03-11T13:56:17Z

pandas/core/frame.py

@@ -111,9 +111,9 @@
        by : str or list of str
            Name or list of names to sort by.

-            - if `axis` is 0 or `'index'` then `by` may contain index
+            - if ``axis`` is 0 or ``'index'`` then `by` may contain index
              levels and/or column labels


try not to change unrelated things

jreback · 2018-03-11T13:56:35Z

pandas/core/frame.py

-        ``df.fillna(df.mode().iloc[0])``
+        Get the mode(s) of each element along the axis selected. 
+
+        Adds a row for each mode per label, filling gaps with NaN.


can add a line explaining what mode is
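
As an illustration of what such a line would need to convey: the mode of a set of values is the value that occurs most often, and several values can tie for that position. The sketch below uses plain Python rather than pandas, and the helper `modes` is a hypothetical name, not the pandas implementation. It reuses the grade lists from the docstring example.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest count, sorted ascending."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


science = [80, 70, 80, 75, 80, 75, 85, 90, 80, 70]
math_scores = [70, 70, 75, 75, 80, 80, 85, 85, 90, 90]

print(modes(science))      # [80]
print(modes(math_scores))  # [70, 75, 80, 85, 90]
```

The differing lengths of the two results are why the DataFrame output in the docstring pads the shorter column with NaN.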

jreback · 2018-03-11T13:57:06Z

pandas/core/frame.py

-            * 1 or 'columns' : get mode of each row
+            The axis to iterate over while searching for the mode.
+            To find the mode for each column, iterate over rows (``axis=0``, 
+            default behaviour).


you don't need the parens, it doesn't read well

jreback · 2018-03-11T13:58:07Z

pandas/core/frame.py

+        Notes
+        -----
+        There may be multiple values returned for the selected
+        axis (when more than one item share the maximum frequency), which is


no parens, just use a comma. capitalize DataFrame.

you can leave off the impute sentence
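
For context, the impute sentence refers to the idiom `df.fillna(df.mode().iloc[0])` from the proposed Notes section. A small sketch of how that idiom behaves is given below; the frame `df` and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical data with one missing value per column.
df = pd.DataFrame({
    'a': [1, 1, None, 2],
    'b': ['x', None, 'y', 'y'],
})

# mode() returns one row per mode; iloc[0] takes the first row,
# giving a Series that maps each column label to one of its modes.
first_modes = df.mode().iloc[0]

# fillna with a Series fills each column using the matching label.
filled = df.fillna(first_modes)
print(filled)
```

When a column has several modes, `iloc[0]` picks the first row of the sorted result, so the fill value in that case is just one of the tied candidates.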

jreback · 2018-03-11T13:58:22Z

pandas/core/frame.py

+        See Also
+        --------
+        Series.mode : Return the highest frequency value in a Series. 
+        Series.value_counts : Returns a Series with all occuring values as 


counts of values

jreback · 2018-03-11T13:58:37Z

pandas/core/frame.py

-        1  2
-        """
+
+        ``mode`` returns a DataFrame with multiple rows if there is more than 


you don't need this sentence, as it's covered above

jreback · 2018-03-11T13:59:06Z

pandas/core/frame.py

+        >>> grades.apply(lambda x: x.value_counts())
+            Science  Math
+        70        2     2
+        75        2     2


put the mode example first

codecov · 2018-03-11T21:14:42Z

Codecov Report

❗ No coverage uploaded for pull request base (master@c3d491a).
The diff coverage is 92.1%.

@@            Coverage Diff            @@
##             master   #20241   +/-   ##
=========================================
  Coverage          ?   91.72%           
=========================================
  Files             ?      150           
  Lines             ?    49162           
  Branches          ?        0           
=========================================
  Hits              ?    45096           
  Misses            ?     4066           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.11% <92.1%> (?)`
#single	`41.86% <23.68%> (?)`

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.18% <92.1%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c3d491a...e476c55.

mkoo21 · 2018-03-22T18:31:13Z

I've made the requested changes to the text. The validation and PEP8 scripts passed. I'm not sure how to address the CI failures on my end. Is there anything else I need to do?

datapythonista · 2018-08-17T20:32:34Z

Superseded by #22404 (discontinued PR, and the source branch does not exist anymore).

mkoo21 added 4 commits March 10, 2018 15:31

update doc page for mode

7b8f476

upgdate for pep :p

081e311

Notes section

27fe244

wording

7458582

jorisvandenbossche added the Docs label Mar 10, 2018

jreback requested changes Mar 11, 2018

jreback added the Reshaping and Numeric Operations labels Mar 11, 2018

polishing up docstring

e476c55

datapythonista mentioned this pull request Aug 17, 2018

Updating DataFrame.mode docstring. #22404

Merged

datapythonista closed this Aug 17, 2018

jorisvandenbossche added this to the No action milestone Aug 18, 2018
