DOC: update the pandas.DataFrame.plot.box docstring #20373

Merged
merged 4 commits into from
Mar 23, 2018

@4kxz 4kxz commented Mar 15, 2018

A bit late to the party, but I did it.

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant
################################################################################
#################### Docstring (pandas.DataFrame.plot.box)  ####################
################################################################################

Make a box plot of the DataFrame columns.

A box plot is a method for graphically depicting
groups of numerical data through their quartiles.
The box extends from the Q1 to Q3 quartile values of the data,
with a line at the median (Q2). The whiskers extend from the edges
of the box to show the range of the data. The position of the whiskers
is set by default to 1.5*IQR (IQR = Q3 - Q1) from the edges of the
box. Outlier points are those past the end of the whiskers.

For further details see Wikipedia's
entry for `boxplot <https://en.wikipedia.org/wiki/Box_plot>`_.

A consideration when using this chart is that the box and the whiskers
can overlap, which is very common when plotting small sets of data.

Parameters
----------
by : str or sequence
    Column in the DataFrame to group by.
**kwds : optional
    Additional keywords are documented in
    :meth:`pandas.DataFrame.plot`.

Returns
-------
axes : :class:`matplotlib.axes.Axes` or numpy.ndarray of them

See Also
--------
pandas.DataFrame.boxplot: Another method to draw a box plot.
pandas.Series.plot.box: Draw a box plot from a Series object.
matplotlib.pyplot.boxplot: Draw a box plot in matplotlib.

Examples
--------
Draw a box plot from a DataFrame with four columns of randomly
generated data.

.. plot::
    :context: close-figs

    >>> data = np.random.randn(25, 4)
    >>> df = pd.DataFrame(data, columns=list('ABCD'))
    >>> plot = df.plot.box()

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
	Errors in parameters section
		Parameters {'kwds'} not documented
		Unknown parameters {'**kwds'}

Due to issue #15079 I wasn't sure how to document the by argument. I will follow up to add it later.

Changed kwds description according to #20348.
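
For reference, the whisker rule described in the docstring above can be sketched with NumPy alone. This is only an illustration, not the code pandas or matplotlib actually run: the helper name `box_stats` and the sample values are hypothetical, and it assumes quartiles come from linear interpolation (the `np.percentile` default) and that each whisker stops at the furthest data point still inside the 1.5*IQR fence.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Return the quantities a box plot draws for one column (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    # Box edges (Q1, Q3) and the median line (Q2).
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences sit whis * IQR beyond the box edges.
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    fliers = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers end at the most extreme points that are still inside the fences.
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "fliers": np.sort(fliers),
    }


stats = box_stats([1, 2, 3, 4, 5, 100])
print(stats)
```

With this small sample the single value 100 falls outside the upper fence, so it would be drawn as a flier rather than stretching the whisker.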

@dukebody dukebody left a comment

Apart from the typo in the extended description, LGTM!


This method draws a box and whisker for each series in the DataFrame
in a single chart. The boxes share the vertical axis, which
makes makes it easier to compare if they are similar in scale.

"makes makes" → "makes"

codecov bot commented Mar 16, 2018

Codecov Report

Merging #20373 into master will decrease coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20373      +/-   ##
==========================================
- Coverage    91.8%   91.78%   -0.03%     
==========================================
  Files         152      152              
  Lines       49223    49223              
==========================================
- Hits        45191    45179      -12     
- Misses       4032     4044      +12

Flag       Coverage Δ
#multiple  90.17% <100%> (-0.03%) ⬇️
#single    41.84% <50%> (ø) ⬆️

Impacted Files                 Coverage Δ
pandas/plotting/_core.py       82.5% <100%> (ø) ⬆️
pandas/plotting/_converter.py  65.07% <0%> (-1.74%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jorisvandenbossche jorisvandenbossche left a comment

Thanks for the PR! Added some comments

.. plot::
    :context: close-figs

    >>> data = np.random.randint(0, 10, size=(25, 4))

I think it is better to use a normal distribution here (np.random.randn(25, 4)); it gives more natural box plots (e.g. with outliers) than a uniform distribution.
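
A rough way to check this without drawing anything is to count, per column, the points that land beyond the 1.5*IQR fences for each distribution. This is a sketch under assumptions: `count_fliers` is a hypothetical helper, the seed is arbitrary, and the fence rule mirrors the default whisker setting described in the docstring rather than calling the plotting code itself.

```python
import numpy as np


def count_fliers(column, whis=1.5):
    """Count points outside the whis * IQR fences of one column (illustrative helper)."""
    column = np.asarray(column, dtype=float)
    q1, q3 = np.percentile(column, [25, 75])
    iqr = q3 - q1
    outside = (column < q1 - whis * iqr) | (column > q3 + whis * iqr)
    return int(outside.sum())


rng = np.random.RandomState(0)  # arbitrary seed, only for repeatability
normal = rng.randn(25, 4)                    # the suggested normal data
uniform = rng.randint(0, 10, size=(25, 4))   # the uniform integers under review

normal_fliers = [count_fliers(normal[:, i]) for i in range(normal.shape[1])]
uniform_fliers = [count_fliers(uniform[:, i]) for i in range(uniform.shape[1])]
print("fliers per column, normal: ", normal_fliers)
print("fliers per column, uniform:", uniform_fliers)
```

Counts vary with the seed, so a single run is only suggestive; repeating it over several seeds gives a fairer picture of how often each distribution produces flier points.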

--------
:meth:`pandas.DataFrame.boxplot` : Another method to draw a box plot.
:meth:`pandas.Series.plot.box` : Draw a box plot from a Series object.
:func:`matplotlib.pyplot.boxplot`: Draw a box plot in matplotlib.

The :meth:`.. part is not needed here, that is done automatically by sphinx.

in a single chart. The boxes share the vertical axis, which
makes it easier to compare if they are similar in scale.
Box plots are useful because they clearly show the median,
quartiles and flier points in the data.

There is also a PR on the boxplot method: #20152. It has extended the summary as well, so you can check whether it contains additions you could use here (e.g. I think the Wikipedia link is interesting).

(ideally we would re-use this extended summary for both, but that is something we can fix later)

Contributor Author

I've tried to address the issues you commented on.

The summary from #20152 is more explanatory, so I've copied most of it here. I removed the mention of the by parameter, which, unless I'm missing something, is not yet implemented for plot.box. Is someone working on that?


@dukebody dukebody left a comment

IMO this is good enough. We can consider creating some shared doc parts for both DataFrame.plot.box and DataFrame.boxplot later.

@jorisvandenbossche jorisvandenbossche merged commit 9093eaf into pandas-dev:master Mar 23, 2018
@jorisvandenbossche

@4kxz Thanks for the updates, and for the PR!

@jorisvandenbossche jorisvandenbossche added this to the 0.23.0 milestone Mar 23, 2018
javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018