-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Allow column grouping in DataFrame.plot #29944
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 27 commits
469eff8
b96a659
c5b187b
99cc049
06da910
49c728a
b370ba5
d1a2ae0
10ac363
9774d42
572de08
4519f22
c6edb9f
73dd7eb
b804e57
d7ee47c
8fb298d
d84f356
ba0f982
fe98b86
f81d295
8255405
f54459e
f962425
dda80b3
a457b38
8d85ba9
bbaeffa
81f2a7f
fdd7610
fe965b0
4478597
ac51202
722c83b
c601c2d
c99abba
ea6c73d
9184e64
34515a8
ae90509
827c5b5
5aace24
b28d3b3
587cbc1
1517ae3
8c61b83
b75d86c
55928ff
b5f47f2
43070d3
e3c0146
a4c2676
71b7b39
4157618
9b44546
2d89f73
8cd6342
ffd4b18
66342bc
48c9f32
aa128dc
69efd18
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -636,8 +636,15 @@ class PlotAccessor(PandasObject): | |
- 'hexbin' : hexbin plot. | ||
ax : matplotlib axes object, default None | ||
An axes of the current figure. | ||
subplots : bool, default False | ||
Make separate subplots for each column. | ||
subplots : bool or list of iterables, default False | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Just to clarify, are the docs rendered correctly with this empty line? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need an empty line because we're starting a list. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, CI runs to build the docs (but I am not sure that it checks for these features). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I couldn't build the docs
fails with stuff like (and with some theme issue) So instead I just added a line so that the docstring for subplots is consistent with the rest. If the rest is fine, then this one should be fine too. |
||
- ``False`` - no subplots will be used | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- ``True`` - Make separate subplots for each column. | ||
- ``sequence of sequences of str`` - create a subplot for each group of columns. | ||
For example `[('a', 'c'), ('b', 'd')]` will create 2 subplots: one | ||
with columns 'a' and 'c', and one with columns 'b' and 'd'. | ||
Remaining columns that aren't specified will be plotted in | ||
additional subplots (one per column). | ||
sharex : bool, default True if ax is None else False | ||
In case ``subplots=True``, share x axis and set some x axis labels | ||
to invisible; defaults to True if ax is None otherwise False if | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
from collections import Counter, Iterable | ||
import re | ||
from typing import List, Optional | ||
from typing import List, Optional, Sequence, Union | ||
import warnings | ||
|
||
from matplotlib.artist import Artist | ||
|
@@ -84,7 +85,7 @@ def __init__( | |
data, | ||
kind=None, | ||
by=None, | ||
subplots=False, | ||
subplots: Union[bool, Sequence[Sequence[str]]] = False, | ||
sharex=None, | ||
sharey=False, | ||
use_index=True, | ||
|
@@ -119,8 +120,7 @@ def __init__( | |
self.kind = kind | ||
|
||
self.sort_columns = sort_columns | ||
|
||
self.subplots = subplots | ||
self.subplots = self._validate_subplots_kwarg(subplots) | ||
|
||
if sharex is None: | ||
if ax is None: | ||
|
@@ -200,6 +200,79 @@ def __init__( | |
|
||
self._validate_color_args() | ||
|
||
def _validate_subplots_kwarg( | ||
self, subplots: Union[bool, Sequence[Sequence[str]]] | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
) -> Union[bool, Sequence[Sequence[int]]]: | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
if isinstance(subplots, bool): | ||
return subplots | ||
elif not isinstance(subplots, Iterable): | ||
raise ValueError("subplots should be a bool or an iterable") | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
supported_kinds = ( | ||
"line", | ||
"bar", | ||
"barh", | ||
"hist", | ||
"kde", | ||
"density", | ||
"area", | ||
"pie", | ||
) | ||
if self._kind not in supported_kinds: | ||
raise ValueError( | ||
"When subplots is an iterable, kind must be " | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if By the way, this is quite confusing for me (probably regardless of the given PR): inside There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
When subplots is boolean we return immediately as we fall back to the current behaviour, as in master. I haven't double checked but I would hope that The validation happening in |
||
f"one of {', '.join(supported_kinds)}. Got {self._kind}." | ||
) | ||
|
||
if any( | ||
not isinstance(group, Iterable) or isinstance(group, str) | ||
for group in subplots | ||
): | ||
raise ValueError( | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"When subplots is an iterable, each entry " | ||
"should be a list/tuple of column names." | ||
) | ||
|
||
cols_in_groups = [col for group in subplots for col in group] | ||
duplicates = {col for (col, cnt) in Counter(cols_in_groups).items() if cnt > 1} | ||
if duplicates: | ||
raise ValueError( | ||
f"Each column should be in only one subplot. Columns {duplicates} " | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it really a desirable behavior? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently (in master), only one column per plot is possible. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah, agree, could be a follow-up enhancement PR! |
||
"were found in multiple sublots." | ||
) | ||
|
||
cols_in_groups_set = set(cols_in_groups) | ||
cols_remaining = set(self.data.columns) - cols_in_groups_set | ||
bad_columns = cols_in_groups_set - set(self.data.columns) | ||
|
||
if bad_columns: | ||
raise ValueError( | ||
"Subplots contains the following column(s) " | ||
f"which are invalid names: {bad_columns}" | ||
) | ||
|
||
# subplots is a list of tuples where each tuple is a group of | ||
# columns to be grouped together (one ax per group). | ||
# we consolidate the subplots list such that: | ||
# - the tuples contain indices instead of column names | ||
# - the columns that aren't yet in the list are added in a group | ||
# of their own. | ||
# For example with columns from a to g, and | ||
# subplots = [(a, c), (b, f, e)], | ||
# we end up with [(ai, ci), (bi, fi, ei), (di,), (gi,)] | ||
# This way, we can handle self.subplots in a homogeneous manner | ||
# later. | ||
# TODO: also accept indices instead of just names? | ||
|
||
out = [] | ||
index = list(self.data.columns).index | ||
for group in subplots: | ||
out.append(tuple(index(col) for col in group)) | ||
for col in cols_remaining: | ||
out.append((index(col),)) | ||
return out | ||
|
||
def _validate_color_args(self): | ||
import matplotlib.colors | ||
|
||
|
@@ -314,8 +387,11 @@ def _maybe_right_yaxis(self, ax, axes_num): | |
|
||
def _setup_subplots(self): | ||
if self.subplots: | ||
naxes = ( | ||
self.nseries if isinstance(self.subplots, bool) else len(self.subplots) | ||
) | ||
fig, axes = _subplots( | ||
naxes=self.nseries, | ||
naxes=naxes, | ||
sharex=self.sharex, | ||
sharey=self.sharey, | ||
figsize=self.figsize, | ||
|
@@ -693,9 +769,21 @@ def _get_ax_layer(cls, ax, primary=True): | |
else: | ||
return getattr(ax, "right_ax", ax) | ||
|
||
def _col_idx_to_axis_idx(self, col_idx: int): | ||
NicolasHug marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"""Return the index of the axis where the column at col_idx should be plotted""" | ||
if isinstance(self.subplots, list): | ||
# Subplots is a list: some columns will be grouped together in the same ax | ||
for group_idx, group in enumerate(self.subplots): | ||
if col_idx in group: | ||
return group_idx | ||
else: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it's a matter of style. I personally think that this way is less ambiguous when the first return statement is deeply nested with 4 indentation levels: it's easy to miss and to think that the function only returns There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I suppose, that once you added return type,
Or something like this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think mypy is wrong here. Because of the validation done earlier, there must be a valid column. Raising an exception might make mypy happy but it would never be hit (because of the validation), and we wouldn't be able to test against it so this would reduce test coverage. I changed the for loop into a |
||
# subplots is True: one ax per column | ||
return col_idx | ||
|
||
def _get_ax(self, i): | ||
# get the twinx ax if appropriate | ||
if self.subplots: | ||
i = self._col_idx_to_axis_idx(i) | ||
ax = self.axes[i] | ||
ax = self._maybe_right_yaxis(ax, i) | ||
self.axes[i] = ax | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to 1.2