Commit 66630f4

author
Tom Augspurger
committed
Merge pull request #7998 from sinhrks/boxplot
ENH/CLN: add BoxPlot class inheriting MPLPlot
2 parents cbc5ddc + 656e140 commit 66630f4

4 files changed

+392
-31
lines changed

doc/source/v0.15.0.txt

+1
@@ -216,6 +216,7 @@ API changes
216216
as the ``left`` argument. (:issue:`7737`)
217217

218218
- Histogram from ``DataFrame.plot`` with ``kind='hist'`` (:issue:`7809`), See :ref:`the docs<visualization.hist>`.
219+
- Boxplot from ``DataFrame.plot`` with ``kind='box'`` (:issue:`7998`), See :ref:`the docs<visualization.box>`.
219220
- Consistency when indexing with ``.loc`` and a list-like indexer when no values are found.
220221

221222
.. ipython:: python

doc/source/visualization.rst

+68-9
@@ -124,6 +124,7 @@ These include:
124124

125125
* :ref:`'bar' <visualization.barplot>` or :ref:`'barh' <visualization.barplot>` for bar plots
126126
* :ref:`'hist' <visualization.hist>` for histogram
127+
* :ref:`'box' <visualization.box>` for boxplot
127128
* :ref:`'kde' <visualization.kde>` or ``'density'`` for density plots
128129
* :ref:`'area' <visualization.area_plot>` for area plots
129130
* :ref:`'scatter' <visualization.scatter>` for scatter plots
@@ -244,7 +245,7 @@ See the :meth:`hist <matplotlib.axes.Axes.hist>` method and the
244245
`matplotlib hist documentation <http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist>`__ for more.
245246

246247

247-
The previous interface ``DataFrame.hist`` to plot histogram still can be used.
248+
The existing interface ``DataFrame.hist`` to plot histogram still can be used.
248249

249250
.. ipython:: python
250251
@@ -288,12 +289,65 @@ The ``by`` keyword can be specified to plot grouped histograms:
288289
Box Plots
289290
~~~~~~~~~
290291

291-
DataFrame has a :meth:`~DataFrame.boxplot` method that allows you to visualize the
292-
distribution of values within each column.
292+
Boxplots can be drawn by calling ``Series.plot`` and ``DataFrame.plot`` with ``kind='box'``,
293+
or ``DataFrame.boxplot`` to visualize the distribution of values within each column.
294+
295+
.. versionadded:: 0.15.0
296+
297+
``plot`` method now supports ``kind='box'`` to draw boxplot.
293298

294299
For instance, here is a boxplot representing five trials of 10 observations of
295300
a uniform random variable on [0,1).
296301

302+
.. ipython:: python
303+
:suppress:
304+
305+
np.random.seed(123456)
306+
307+
.. ipython:: python
308+
309+
df = DataFrame(rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
310+
311+
@savefig box_plot_new.png
312+
df.plot(kind='box')
313+
314+
A boxplot can be colorized by passing the ``color`` keyword. You can pass a ``dict``
315+
whose keys are ``boxes``, ``whiskers``, ``medians`` and ``caps``.
316+
If some keys are missing in the ``dict``, default colors are used
317+
for the corresponding artists. Also, boxplot has a ``sym`` keyword to specify the flier style.
318+
319+
When you pass any other type of argument via the ``color`` keyword, it is directly
320+
passed to matplotlib for all the ``boxes``, ``whiskers``, ``medians`` and ``caps``
321+
colorization.
322+
323+
The colors are applied to every box to be drawn. If you want
324+
more complicated colorization, you can get each drawn artist by passing
325+
:ref:`return_type <visualization.box.return>`.
326+
327+
.. ipython:: python
328+
329+
color = dict(boxes='DarkGreen', whiskers='DarkOrange',
330+
medians='DarkBlue', caps='Gray')
331+
332+
@savefig box_new_colorize.png
333+
df.plot(kind='box', color=color, sym='r+')
334+
335+
Also, you can pass other keywords supported by matplotlib ``boxplot``.
336+
For example, horizontal and custom-positioned boxplots can be drawn by
337+
``vert=False`` and ``positions`` keywords.
338+
339+
.. ipython:: python
340+
341+
@savefig box_new_kwargs.png
342+
df.plot(kind='box', vert=False, positions=[1, 4, 5, 6, 8])
343+
344+
345+
See the :meth:`boxplot <matplotlib.axes.Axes.boxplot>` method and the
346+
`matplotlib boxplot documentation <http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot>`__ for more.
347+
348+
349+
The existing ``DataFrame.boxplot`` interface can still be used to draw boxplots.
350+
297351
.. ipython:: python
298352
:suppress:
299353
@@ -354,18 +408,23 @@ columns:
354408
355409
.. _visualization.box.return:
356410

357-
The return type of ``boxplot`` depends on two keyword arguments: ``by`` and ``return_type``.
358-
When ``by`` is ``None``:
411+
By default, plot functions return a :class:`matplotlib Axes <matplotlib.axes.Axes>`.
412+
In ``boxplot``, the return type can be changed by the ``return_type`` argument and by whether subplots are enabled (``subplots=True`` in ``plot``, or ``by`` specified in ``boxplot``).
413+
414+
When ``subplots=False`` / ``by`` is ``None``:
359415

360416
* if ``return_type`` is ``'dict'``, a dictionary containing the :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned. The keys are "boxes", "caps", "fliers", "medians", and "whiskers".
361-
This is the default.
417+
This is the default of ``boxplot`` for historical reasons.
418+
Note that ``plot(kind='box')`` returns ``Axes`` by default, the same as other plots.
362419
* if ``return_type`` is ``'axes'``, a :class:`matplotlib Axes <matplotlib.axes.Axes>` containing the boxplot is returned.
363420
* if ``return_type`` is ``'both'`` a namedtuple containing the :class:`matplotlib Axes <matplotlib.axes.Axes>`
364421
and :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned
365422

366-
When ``by`` is some column of the DataFrame, a dict of ``return_type`` is returned, where
367-
the keys are the columns of the DataFrame. The plot has a facet for each column of
368-
the DataFrame, with a separate box for each value of ``by``.
423+
When ``subplots=True`` / ``by`` is some column of the DataFrame:
424+
425+
* A dict of ``return_type`` is returned, where the keys are the columns
426+
of the DataFrame. The plot has a facet for each column of
427+
the DataFrame, with a separate box for each value of ``by``.
369428

370429
Finally, when calling boxplot on a :class:`Groupby` object, a dict of ``return_type``
371430
is returned, where the keys are the same as the Groupby object. The plot has a

pandas/tests/test_graphics.py

+171-6
@@ -365,7 +365,8 @@ def _check_has_errorbars(self, axes, xerr=0, yerr=0):
365365
self.assertEqual(xerr, xerr_count)
366366
self.assertEqual(yerr, yerr_count)
367367

368-
def _check_box_return_type(self, returned, return_type, expected_keys=None):
368+
def _check_box_return_type(self, returned, return_type, expected_keys=None,
369+
check_ax_title=True):
369370
"""
370371
Check box returned type is correct
371372
@@ -377,6 +378,10 @@ def _check_box_return_type(self, returned, return_type, expected_keys=None):
377378
expected_keys : list-like, optional
378379
group labels in subplot case. If not passed,
379380
the function checks assuming boxplot uses single ax
381+
check_ax_title : bool
382+
Whether to check the ax.title is the same as expected_key
383+
Intended to be checked by calling from ``boxplot``.
384+
Normal ``plot`` doesn't attach ``ax.title``, it must be disabled.
380385
"""
381386
from matplotlib.axes import Axes
382387
types = {'dict': dict, 'axes': Axes, 'both': tuple}
@@ -402,14 +407,17 @@ def _check_box_return_type(self, returned, return_type, expected_keys=None):
402407
self.assertTrue(isinstance(value, types[return_type]))
403408
# check returned dict has correct mapping
404409
if return_type == 'axes':
405-
self.assertEqual(value.get_title(), key)
410+
if check_ax_title:
411+
self.assertEqual(value.get_title(), key)
406412
elif return_type == 'both':
407-
self.assertEqual(value.ax.get_title(), key)
413+
if check_ax_title:
414+
self.assertEqual(value.ax.get_title(), key)
408415
self.assertIsInstance(value.ax, Axes)
409416
self.assertIsInstance(value.lines, dict)
410417
elif return_type == 'dict':
411418
line = value['medians'][0]
412-
self.assertEqual(line.get_axes().get_title(), key)
419+
if check_ax_title:
420+
self.assertEqual(line.get_axes().get_title(), key)
413421
else:
414422
raise AssertionError
415423

@@ -452,7 +460,7 @@ def test_plot(self):
452460
_check_plot_works(self.ts.plot, kind='area', stacked=False)
453461
_check_plot_works(self.iseries.plot)
454462

455-
for kind in ['line', 'bar', 'barh', 'kde', 'hist']:
463+
for kind in ['line', 'bar', 'barh', 'kde', 'hist', 'box']:
456464
if not _ok_for_gaussian_kde(kind):
457465
continue
458466
_check_plot_works(self.series[:5].plot, kind=kind)
@@ -767,6 +775,15 @@ def test_hist_kde_color(self):
767775
self.assertEqual(len(lines), 1)
768776
self._check_colors(lines, ['r'])
769777

778+
@slow
779+
def test_boxplot_series(self):
780+
ax = self.ts.plot(kind='box', logy=True)
781+
self._check_ax_scales(ax, yaxis='log')
782+
xlabels = ax.get_xticklabels()
783+
self._check_text_labels(xlabels, [self.ts.name])
784+
ylabels = ax.get_yticklabels()
785+
self._check_text_labels(ylabels, [''] * len(ylabels))
786+
770787
@slow
771788
def test_autocorrelation_plot(self):
772789
from pandas.tools.plotting import autocorrelation_plot
@@ -1650,6 +1667,99 @@ def test_bar_log_subplots(self):
16501667

16511668
@slow
16521669
def test_boxplot(self):
1670+
df = self.hist_df
1671+
series = df['height']
1672+
numeric_cols = df._get_numeric_data().columns
1673+
labels = [com.pprint_thing(c) for c in numeric_cols]
1674+
1675+
ax = _check_plot_works(df.plot, kind='box')
1676+
self._check_text_labels(ax.get_xticklabels(), labels)
1677+
assert_array_equal(ax.xaxis.get_ticklocs(), np.arange(1, len(numeric_cols) + 1))
1678+
self.assertEqual(len(ax.lines), 8 * len(numeric_cols))
1679+
1680+
axes = _check_plot_works(df.plot, kind='box', subplots=True, logy=True)
1681+
self._check_axes_shape(axes, axes_num=3, layout=(1, 3))
1682+
self._check_ax_scales(axes, yaxis='log')
1683+
for ax, label in zip(axes, labels):
1684+
self._check_text_labels(ax.get_xticklabels(), [label])
1685+
self.assertEqual(len(ax.lines), 8)
1686+
1687+
axes = series.plot(kind='box', rot=40)
1688+
self._check_ticks_props(axes, xrot=40, yrot=0)
1689+
tm.close()
1690+
1691+
ax = _check_plot_works(series.plot, kind='box')
1692+
1693+
positions = np.array([1, 6, 7])
1694+
ax = df.plot(kind='box', positions=positions)
1695+
numeric_cols = df._get_numeric_data().columns
1696+
labels = [com.pprint_thing(c) for c in numeric_cols]
1697+
self._check_text_labels(ax.get_xticklabels(), labels)
1698+
assert_array_equal(ax.xaxis.get_ticklocs(), positions)
1699+
self.assertEqual(len(ax.lines), 8 * len(numeric_cols))
1700+
1701+
@slow
1702+
def test_boxplot_vertical(self):
1703+
df = self.hist_df
1704+
series = df['height']
1705+
numeric_cols = df._get_numeric_data().columns
1706+
labels = [com.pprint_thing(c) for c in numeric_cols]
1707+
1708+
# if horizontal, yticklabels are rotated
1709+
ax = df.plot(kind='box', rot=50, fontsize=8, vert=False)
1710+
self._check_ticks_props(ax, xrot=0, yrot=50, ylabelsize=8)
1711+
self._check_text_labels(ax.get_yticklabels(), labels)
1712+
self.assertEqual(len(ax.lines), 8 * len(numeric_cols))
1713+
1714+
axes = _check_plot_works(df.plot, kind='box', subplots=True,
1715+
vert=False, logx=True)
1716+
self._check_axes_shape(axes, axes_num=3, layout=(1, 3))
1717+
self._check_ax_scales(axes, xaxis='log')
1718+
for ax, label in zip(axes, labels):
1719+
self._check_text_labels(ax.get_yticklabels(), [label])
1720+
self.assertEqual(len(ax.lines), 8)
1721+
1722+
positions = np.array([3, 2, 8])
1723+
ax = df.plot(kind='box', positions=positions, vert=False)
1724+
self._check_text_labels(ax.get_yticklabels(), labels)
1725+
assert_array_equal(ax.yaxis.get_ticklocs(), positions)
1726+
self.assertEqual(len(ax.lines), 8 * len(numeric_cols))
1727+
1728+
@slow
1729+
def test_boxplot_return_type(self):
1730+
df = DataFrame(randn(6, 4),
1731+
index=list(string.ascii_letters[:6]),
1732+
columns=['one', 'two', 'three', 'four'])
1733+
with tm.assertRaises(ValueError):
1734+
df.plot(kind='box', return_type='NOTATYPE')
1735+
1736+
result = df.plot(kind='box', return_type='dict')
1737+
self._check_box_return_type(result, 'dict')
1738+
1739+
result = df.plot(kind='box', return_type='axes')
1740+
self._check_box_return_type(result, 'axes')
1741+
1742+
result = df.plot(kind='box', return_type='both')
1743+
self._check_box_return_type(result, 'both')
1744+
1745+
@slow
1746+
def test_boxplot_subplots_return_type(self):
1747+
df = self.hist_df
1748+
1749+
# normal style: return_type=None
1750+
result = df.plot(kind='box', subplots=True)
1751+
self.assertIsInstance(result, np.ndarray)
1752+
self._check_box_return_type(result, None,
1753+
expected_keys=['height', 'weight', 'category'])
1754+
1755+
for t in ['dict', 'axes', 'both']:
1756+
returned = df.plot(kind='box', return_type=t, subplots=True)
1757+
self._check_box_return_type(returned, t,
1758+
expected_keys=['height', 'weight', 'category'],
1759+
check_ax_title=False)
1760+
1761+
@slow
1762+
def test_boxplot_legacy(self):
16531763
df = DataFrame(randn(6, 4),
16541764
index=list(string.ascii_letters[:6]),
16551765
columns=['one', 'two', 'three', 'four'])
@@ -1693,7 +1803,7 @@ def test_boxplot(self):
16931803
self.assertEqual(len(ax.get_lines()), len(lines))
16941804

16951805
@slow
1696-
def test_boxplot_return_type(self):
1806+
def test_boxplot_return_type_legacy(self):
16971807
# API change in https://github.com/pydata/pandas/pull/7096
16981808
import matplotlib as mpl
16991809

@@ -2315,6 +2425,61 @@ def test_kde_colors(self):
23152425
rgba_colors = lmap(cm.jet, np.linspace(0, 1, len(df)))
23162426
self._check_colors(ax.get_lines(), linecolors=rgba_colors)
23172427

2428+
@slow
2429+
def test_boxplot_colors(self):
2430+
2431+
def _check_colors(bp, box_c, whiskers_c, medians_c, caps_c='k', fliers_c='b'):
2432+
self._check_colors(bp['boxes'], linecolors=[box_c] * len(bp['boxes']))
2433+
self._check_colors(bp['whiskers'], linecolors=[whiskers_c] * len(bp['whiskers']))
2434+
self._check_colors(bp['medians'], linecolors=[medians_c] * len(bp['medians']))
2435+
self._check_colors(bp['fliers'], linecolors=[fliers_c] * len(bp['fliers']))
2436+
self._check_colors(bp['caps'], linecolors=[caps_c] * len(bp['caps']))
2437+
2438+
default_colors = self.plt.rcParams.get('axes.color_cycle')
2439+
2440+
df = DataFrame(randn(5, 5))
2441+
bp = df.plot(kind='box', return_type='dict')
2442+
_check_colors(bp, default_colors[0], default_colors[0], default_colors[2])
2443+
tm.close()
2444+
2445+
dict_colors = dict(boxes='#572923', whiskers='#982042',
2446+
medians='#804823', caps='#123456')
2447+
bp = df.plot(kind='box', color=dict_colors, sym='r+', return_type='dict')
2448+
_check_colors(bp, dict_colors['boxes'], dict_colors['whiskers'],
2449+
dict_colors['medians'], dict_colors['caps'], 'r')
2450+
tm.close()
2451+
2452+
# partial colors
2453+
dict_colors = dict(whiskers='c', medians='m')
2454+
bp = df.plot(kind='box', color=dict_colors, return_type='dict')
2455+
_check_colors(bp, default_colors[0], 'c', 'm')
2456+
tm.close()
2457+
2458+
from matplotlib import cm
2459+
# Test str -> colormap functionality
2460+
bp = df.plot(kind='box', colormap='jet', return_type='dict')
2461+
jet_colors = lmap(cm.jet, np.linspace(0, 1, 3))
2462+
_check_colors(bp, jet_colors[0], jet_colors[0], jet_colors[2])
2463+
tm.close()
2464+
2465+
# Test colormap functionality
2466+
bp = df.plot(kind='box', colormap=cm.jet, return_type='dict')
2467+
_check_colors(bp, jet_colors[0], jet_colors[0], jet_colors[2])
2468+
tm.close()
2469+
2470+
# string color is applied to all artists except fliers
2471+
bp = df.plot(kind='box', color='DodgerBlue', return_type='dict')
2472+
_check_colors(bp, 'DodgerBlue', 'DodgerBlue', 'DodgerBlue',
2473+
'DodgerBlue')
2474+
2475+
# tuple is also applied to all artists except fliers
2476+
bp = df.plot(kind='box', color=(0, 1, 0), sym='#123456', return_type='dict')
2477+
_check_colors(bp, (0, 1, 0), (0, 1, 0), (0, 1, 0), (0, 1, 0), '#123456')
2478+
2479+
with tm.assertRaises(ValueError):
2480+
# Color contains invalid key results in ValueError
2481+
df.plot(kind='box', color=dict(boxes='red', xxxx='blue'))
2482+
23182483
def test_default_color_cycle(self):
23192484
import matplotlib.pyplot as plt
23202485
plt.rcParams['axes.color_cycle'] = list('rgbk')
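
The ``color`` handling that the documentation and ``test_boxplot_colors`` describe can be sketched in plain Python. This is only an illustration of the documented rules, not the code added in this commit: ``resolve_box_colors`` is a hypothetical helper, and the default colors in it are placeholders rather than the values pandas takes from the matplotlib color cycle.

```python
# Hypothetical helper illustrating the documented rules for the ``color``
# keyword of ``plot(kind='box')``. It is NOT the pandas implementation.

VALID_KEYS = ("boxes", "whiskers", "medians", "caps")

# Placeholder defaults (assumption); pandas derives its own defaults
# from the active matplotlib color cycle.
PLACEHOLDER_DEFAULTS = {
    "boxes": "b",
    "whiskers": "b",
    "medians": "r",
    "caps": "k",
}


def resolve_box_colors(color=None, defaults=None):
    """Return a dict mapping each artist group to the color it would use."""
    resolved = dict(PLACEHOLDER_DEFAULTS if defaults is None else defaults)
    if color is None:
        # Nothing passed: every artist keeps its default color.
        return resolved
    if isinstance(color, dict):
        # A dict may only use the four documented keys.
        unknown = [key for key in color if key not in VALID_KEYS]
        if unknown:
            raise ValueError("invalid color keys: %r" % unknown)
        # Keys missing from the dict keep their default colors.
        resolved.update(color)
        return resolved
    # Any other value (a color string, an RGB tuple, ...) is applied to
    # boxes, whiskers, medians and caps alike; fliers are styled by ``sym``.
    return {key: color for key in VALID_KEYS}


partial = resolve_box_colors({"whiskers": "c", "medians": "m"})
uniform = resolve_box_colors("DodgerBlue")
```

Here ``partial`` keeps the placeholder defaults for ``boxes`` and ``caps``, while ``uniform`` uses the same color for all four groups. Passing a key such as ``xxxx`` raises ``ValueError``, mirroring the last case in the test.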
