ENH: Allow column grouping in DataFrame.plot #29944

Merged: 62 commits, Jun 9, 2022
Changes from 13 commits
Commits
469eff8
WIP
NicolasHug Dec 1, 2019
b96a659
minimal test
NicolasHug Dec 1, 2019
c5b187b
Merge branch 'master' of https://github.com/pandas-dev/pandas into su…
NicolasHug Jan 17, 2020
99cc049
formatted files
NicolasHug Jan 17, 2020
06da910
linting again
NicolasHug Jan 17, 2020
49c728a
Merge branch 'master' of https://github.com/pandas-dev/pandas into su…
NicolasHug Mar 30, 2020
b370ba5
Check number of lines and labels
NicolasHug Mar 30, 2020
d1a2ae0
Added input checks
NicolasHug Mar 30, 2020
10ac363
some comments
NicolasHug Mar 30, 2020
9774d42
Whatsnew entry
NicolasHug Mar 30, 2020
572de08
More tests
NicolasHug Mar 30, 2020
4519f22
Skip test if no scipy for kde
NicolasHug Mar 30, 2020
c6edb9f
Merge branch 'master' into subplot_groups
NicolasHug Apr 5, 2020
73dd7eb
Merge branch 'master' of https://github.com/pandas-dev/pandas into su…
NicolasHug Apr 14, 2020
b804e57
Address comments
NicolasHug Apr 14, 2020
d7ee47c
fix docstring
NicolasHug Apr 14, 2020
8fb298d
Added error when columns are fouund in multiple subplots
NicolasHug Apr 14, 2020
d84f356
Merge branch 'subplot_groups' of github.com:NicolasHug/pandas into su…
NicolasHug Apr 14, 2020
ba0f982
sorted imports
NicolasHug Apr 14, 2020
fe98b86
Merge branch 'master' of https://github.com/pandas-dev/pandas into su…
NicolasHug Jul 21, 2020
f81d295
Added type annotations
NicolasHug Jul 21, 2020
8255405
some more
NicolasHug Jul 21, 2020
f54459e
typos
NicolasHug Jul 21, 2020
f962425
Addressed comments
NicolasHug Jul 21, 2020
dda80b3
Specify stricter output type
NicolasHug Jul 21, 2020
a457b38
Again
NicolasHug Jul 21, 2020
8d85ba9
hopefully fix mypy?
NicolasHug Jul 21, 2020
bbaeffa
Merge branch 'master' of https://github.com/pandas-dev/pandas into su…
NicolasHug Jul 29, 2020
81f2a7f
Merge branch 'master' of https://github.com/pandas-dev/pandas into su…
NicolasHug Sep 1, 2020
fdd7610
added test for coverage
NicolasHug Sep 1, 2020
fe965b0
Merge branch 'master' of https://github.com/pandas-dev/pandas into su…
NicolasHug Sep 4, 2020
4478597
changed to whatsnew 112
NicolasHug Sep 5, 2020
ac51202
used proper whatsnew file
NicolasHug Sep 5, 2020
722c83b
Merge branch 'master' of https://github.com/pandas-dev/pandas into su…
NicolasHug Sep 5, 2020
c601c2d
Merge branch 'master' into subplot_groups
NicolasHug Sep 6, 2020
c99abba
Merge branch 'master' into subplot_groups
NicolasHug Nov 6, 2020
ea6c73d
unwgnjgrwnjwgnjrkknrjg
NicolasHug Nov 6, 2020
9184e64
Merge branch 'master' of github.com:pandas-dev/pandas into subplot_gr…
NicolasHug Jan 17, 2021
34515a8
Addressed comments
NicolasHug Jan 17, 2021
ae90509
remove code due to merge mess up
NicolasHug Jan 17, 2021
827c5b5
fix mypy
NicolasHug Jan 17, 2021
5aace24
Merge branch 'master' of github.com:pandas-dev/pandas into subplot_gr…
NicolasHug Jan 17, 2021
b28d3b3
Merge branch 'master' of github.com:pandas-dev/pandas into subplot_gr…
NicolasHug Jan 18, 2021
587cbc1
Merge branch 'master' of github.com:pandas-dev/pandas into subplot_gr…
NicolasHug Apr 3, 2021
1517ae3
Merge branch 'master' into subplot_groups
NicolasHug May 10, 2021
8c61b83
Merge branch 'master' of github.com:pandas-dev/pandas into subplot_gr…
NicolasHug May 10, 2021
b75d86c
merge again
NicolasHug May 10, 2021
55928ff
mypy for the billionth time
NicolasHug May 10, 2021
b5f47f2
Merge branch 'master' of github.com:pandas-dev/pandas into subplot_gr…
NicolasHug Jun 8, 2021
43070d3
add return type
NicolasHug Jun 8, 2021
e3c0146
add line
NicolasHug Jun 8, 2021
a4c2676
mypy for the billionth time + 1
NicolasHug Jun 8, 2021
71b7b39
Merge branch 'master' of github.com:pandas-dev/pandas into subplot_gr…
NicolasHug Oct 4, 2021
4157618
Merge branch 'main' of github.com:pandas-dev/pandas into subplot_groups
NicolasHug Jan 17, 2022
9b44546
update whatnew
NicolasHug Jan 17, 2022
2d89f73
Merge branch 'main' into subplot_groups
NicolasHug Mar 7, 2022
8cd6342
import Iterable from typing.
NicolasHug Mar 29, 2022
ffd4b18
Merge branch 'main' of github.com:pandas-dev/pandas into subplot_groups
NicolasHug Mar 29, 2022
66342bc
Merge remote-tracking branch 'upstream/main' into subplot_groups
mroeschke Jun 2, 2022
48c9f32
Process groups to indices in one pass
mroeschke Jun 3, 2022
aa128dc
Merge remote-tracking branch 'upstream/main' into subplot_groups
mroeschke Jun 3, 2022
69efd18
Fix typing check
mroeschke Jun 3, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.0.rst
@@ -87,6 +87,7 @@ Other enhancements
- Positional slicing on a :class:`IntervalIndex` now supports slices with ``step > 1`` (:issue:`31658`)
- :class:`Series.str` now has a `fullmatch` method that matches a regular expression against the entire string in each row of the series, similar to `re.fullmatch` (:issue:`32806`).
- :meth:`DataFrame.sample` will now also allow array-like and BitGenerator objects to be passed to ``random_state`` as seeds (:issue:`32503`)
- :meth:`DataFrame.plot` will now allow for the ``subplots`` parameter to be a list of tuples so that columns may be grouped together in the same plot. (:issue:`29944`)
- :meth:`MultiIndex.union` will now raise `RuntimeWarning` if the object inside are unsortable, pass `sort=False` to suppress this warning (:issue:`33015`)
-
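
As a rough illustration of the behaviour this whatsnew entry describes, the sketch below groups columns onto shared axes. It assumes a pandas release that already includes this change, plus matplotlib; the frame `df` and its column names are made up for the example, and the `Agg` backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; an assumption for scripted runs

import numpy as np
import pandas as pd

# Hypothetical data: five numeric columns named a..e.
df = pd.DataFrame({name: np.arange(10) + i for i, name in enumerate("abcde")})

# Each tuple is one group of columns drawn on a shared axis.
# Columns not named in any group ("a" here) get an axis of their own.
axes = df.plot(subplots=[("b", "e"), ("c", "d")], kind="line")
```

With two groups and one leftover column, this should produce three axes, which matches what `test_group_subplot` in this PR asserts.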

79 changes: 78 additions & 1 deletion pandas/plotting/_matplotlib/core.py
@@ -1,3 +1,4 @@
from collections.abc import Iterable
import re
from typing import Optional
import warnings
@@ -115,6 +116,67 @@ def __init__(
self.sort_columns = sort_columns

self.subplots = subplots
if not isinstance(self.subplots, (bool, Iterable)):
raise ValueError("subplots should be a bool or an iterable")
if isinstance(self.subplots, Iterable):

supported_kinds = (
"line",
"bar",
"barh",
"hist",
"kde",
"density",
"area",
"pie",
Member

Looking at your impressive examples, maybe we should exclude the pie chart here? It doesn't seem to fit as well as the other chart kinds. Any thoughts on it?

Contributor Author

I agree it doesn't look good, but it seems to be consistent with (i.e. equally bad as) the result when subplots=True. I updated the notebook at the end, where you can see how it looks. So maybe it can be kept? No strong opinion!

Member

I have no strong opinion either. My only concern is for pie charts: if they share the same category, they will have the same color in the pie, but different sizes depending on the corresponding values.

But no strong opinion.

)
if self._kind not in supported_kinds:
raise ValueError(
"When subplots is an iterable, kind must be "
f"one of {', '.join(supported_kinds)}. Got {self._kind}"
)

if any(
not isinstance(group, Iterable) or isinstance(group, str)
Member

Would not isinstance(group, Iterable) alone be enough here? isinstance(group, str) also falls under this, right?

Contributor Author

The thing is, we want group to be an iterable, but we don't want it to be a str, which is also iterable.

Using only any(not isinstance(group, Iterable) ...) would allow strings, and we don't want that.

for group in subplots
):
raise ValueError(
"When subplots is an iterable, each entry "
"should be a list/tuple of column names "
"or column indices."
)
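
The point made in the thread above can be checked in isolation. This is a minimal sketch, not the PR's code: `is_valid_group` is a hypothetical helper name introduced only for illustration, and it uses `collections.abc.Iterable` from the standard library.

```python
from collections.abc import Iterable


def is_valid_group(group):
    """Return True if `group` is an iterable that is not a plain string."""
    return isinstance(group, Iterable) and not isinstance(group, str)


# A string passes the Iterable check on its own, which is why the extra
# isinstance(group, str) test is needed.
string_is_iterable = isinstance("ab", Iterable)

checks = {
    "tuple": is_valid_group(("a", "b")),
    "list": is_valid_group(["a", "b"]),
    "str": is_valid_group("ab"),
    "int": is_valid_group(1),
}
```

Only the tuple and the list are accepted; the string is rejected by the second condition and the integer by the first.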

cols_in_groups = {col for group in self.subplots for col in group}
cols_remaining = set(data.columns) - cols_in_groups
bad_columns = cols_in_groups - set(data.columns)

if bad_columns:
raise ValueError(
"Subplots contains the following column(s) "
f"which are invalid names: {bad_columns}"
)
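
The column-name validation above is a plain set difference. A standalone sketch, assuming hashable column names; `find_bad_columns` is a hypothetical name that does not appear in the PR:

```python
def find_bad_columns(columns, groups):
    """Return the names used in `groups` that are not in `columns`."""
    cols_in_groups = {col for group in groups for col in group}
    return cols_in_groups - set(columns)


# Mirrors the inputs used in test_group_subplot_invalid_column_name.
bad = find_bad_columns(["a", "b"], [("a", "bad_name")])
ok = find_bad_columns(["a", "b"], [("a", "b")])
```

A non-empty result corresponds to the case where the diff raises ValueError.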

# subplots is a list of tuples where each tuple is a group of
# columns to be grouped together (one ax per group).
# we consolidate the subplots list such that:
# - the tuples contain indexes instead of column names
# - the columns that aren't yet in the list are added in a group
# of their own.
# For example with columns from a to g, and
# subplots = [(a, c), (b, f, e)],
# we end up with [(ai, ci), (bi, fi, ei), (di,), (gi,)]
# This way, we can handle self.subplots in a homogeneous manner
# later.
# TODO: also accept indexes instead of just names?

subplots = []
index = list(data.columns).index
for group in self.subplots:
subplots.append(tuple(index(col) for col in group))
for col in cols_remaining:
subplots.append((index(col),))

self.subplots = subplots
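
The consolidation step described in the comment above can be sketched outside of pandas as follows. `consolidate_groups` is a hypothetical name, and the sketch assumes unique column names. Unlike the diff, which iterates over the `cols_remaining` set, it walks the original column list so that leftover columns keep a deterministic order.

```python
def consolidate_groups(columns, groups):
    """Convert groups of column names to groups of positions and append
    every ungrouped column as a singleton group."""
    position = {name: i for i, name in enumerate(columns)}
    grouped = {col for group in groups for col in group}

    result = [tuple(position[col] for col in group) for group in groups]
    # Iterate over `columns` (not a set) so leftovers keep their original order.
    result.extend((position[col],) for col in columns if col not in grouped)
    return result


# The example from the comment: columns a to g, two explicit groups.
columns = list("abcdefg")
layout = consolidate_groups(columns, [("a", "c"), ("b", "f", "e")])
# layout == [(0, 2), (1, 5, 4), (3,), (6,)]
```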

if sharex is None:
if ax is None:
@@ -306,8 +368,11 @@ def _maybe_right_yaxis(self, ax, axes_num):

def _setup_subplots(self):
if self.subplots:
+ naxes = (
+ self.nseries if isinstance(self.subplots, bool) else len(self.subplots)
+ )
fig, axes = _subplots(
- naxes=self.nseries,
+ naxes=naxes,
sharex=self.sharex,
sharey=self.sharey,
figsize=self.figsize,
@@ -676,9 +741,21 @@ def _get_ax_layer(cls, ax, primary=True):
else:
return getattr(ax, "right_ax", ax)

def _col_idx_to_axis_idx(self, col_idx):
Member

A type annotation is also needed for the new method, thanks.

"""Return the index of the axis where the column at col_idx should be plotted"""
if isinstance(self.subplots, list):
# Subplots is a list: some columns are grouped together in the same ax
for group_idx, group in enumerate(self.subplots):
if col_idx in group:
return group_idx
else:
Member

Is this else clause required? I think it could be dropped.

Contributor Author

@NicolasHug NicolasHug Jun 8, 2021

It's a matter of style. I personally think this way is less ambiguous when the first return statement is deeply nested, four indentation levels in: it's easy to miss it and to think that the function only returns col_idx. The else clause avoids that potential confusion. This choice was deliberate on my part, but I don't mind changing it.

Member

@ivanovmg ivanovmg Jun 8, 2021

I suppose that mypy started complaining once you added the return type.
Inside the loop you return only if col_idx is in group, which is the reason for the mypy error.
I suggest that you modify it:

for group_idx, group in enumerate(self.subplots):
    if col_idx in group:
        return group_idx
raise KeyError(f"Failed to find {col_idx} in {self.subplots}")

Or something like this.

Contributor Author

@NicolasHug NicolasHug Jun 8, 2021

I think mypy is wrong here. Because of the validation done earlier, there must be a valid column. Raising an exception might make mypy happy, but it would never be hit (because of the validation), and we couldn't test it, so it would reduce test coverage. I changed the for loop into a next(), which seems to trick mypy into being reasonable again.

# subplots is True: one ax per column
return col_idx
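
For reference, the next()-based lookup mentioned in the thread above might look roughly like the sketch below. It is written as a free function with an explicit `subplots` argument rather than as the method from the diff, so the signature is an assumption made for illustration. If no group contains the column, `next()` raises StopIteration; the sketch relies on earlier validation to rule that out, as the author argues.

```python
def col_idx_to_axis_idx(subplots, col_idx):
    """Return the index of the axis on which column `col_idx` is drawn."""
    if isinstance(subplots, list):
        # Grouped layout: the axis index is the position of the group
        # that contains the column.
        return next(
            group_idx
            for group_idx, group in enumerate(subplots)
            if col_idx in group
        )
    # subplots is True: one axis per column.
    return col_idx


# Consolidated layout for columns a to g with groups (a, c) and (b, f, e).
layout = [(0, 2), (1, 5, 4), (3,), (6,)]
axis_for_f = col_idx_to_axis_idx(layout, 5)
axis_for_d = col_idx_to_axis_idx(layout, 3)
axis_ungrouped = col_idx_to_axis_idx(True, 4)
```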

def _get_ax(self, i):
# get the twinx ax if appropriate
if self.subplots:
i = self._col_idx_to_axis_idx(i)
ax = self.axes[i]
ax = self._maybe_right_yaxis(ax, i)
self.axes[i] = ax
61 changes: 61 additions & 0 deletions pandas/tests/plotting/test_frame.py
@@ -408,6 +408,9 @@ def test_subplots(self):
for ax in axes:
assert ax.get_legend() is None

with pytest.raises(ValueError, match="should be a bool or an iterable"):
axes = df.plot(subplots=123)

def test_groupby_boxplot_sharey(self):
# https://github.com/pandas-dev/pandas/issues/20968
# sharey can now be switched check whether the right
@@ -3279,6 +3282,64 @@ def test_plot_no_numeric_data(self):
with pytest.raises(TypeError):
df.plot()

@td.skip_if_no_scipy
@pytest.mark.parametrize(
"kind", ("line", "bar", "barh", "hist", "kde", "density", "area", "pie")
)
def test_group_subplot(self, kind):
d = {
"a": np.arange(10),
"b": np.arange(10) + 1,
"c": np.arange(10) + 1,
"d": np.arange(10),
"e": np.arange(10),
}
df = pd.DataFrame(d)

axes = df.plot(subplots=[("b", "e"), ("c", "d")], kind=kind)
assert len(axes) == 3 # 2 groups + single column a

expected_labels = (["b", "e"], ["c", "d"], ["a"])
for ax, labels in zip(axes, expected_labels):
if kind != "pie":
self._check_legend_labels(ax, labels=labels)
if kind == "line":
assert len(ax.lines) == len(labels)

@pytest.mark.parametrize(
"subplots",
[
"a", # iterable of non-iterable
(1,), # iterable of non-iterable
("a",), # iterable of strings
],
)
def test_group_subplot_bad_input(self, subplots):
# Make sure error is raised when subplots is not a properly
# formatted iterable. Only iterables of iterables are permitted, and
# entries should not be strings.
d = {"a": np.arange(10), "b": np.arange(10)}
df = pd.DataFrame(d)

with pytest.raises(ValueError, match="each entry should be a list/tuple"):
df.plot(subplots=subplots)

def test_group_subplot_invalid_column_name(self):
d = {"a": np.arange(10), "b": np.arange(10)}
df = pd.DataFrame(d)

with pytest.raises(ValueError, match="invalid names: {'bad_name'}"):
df.plot(subplots=[("a", "bad_name")])

@pytest.mark.parametrize("kind", ("box", "scatter", "hexbin"))
def test_group_subplot_invalid_kind(self, kind):
d = {"a": np.arange(10), "b": np.arange(10)}
df = pd.DataFrame(d)
with pytest.raises(
ValueError, match="When subplots is an iterable, kind must be one of"
):
df.plot(subplots=[("a", "b")], kind=kind)

def test_missing_markers_legend(self):
# 14958
df = pd.DataFrame(np.random.randn(8, 3), columns=["A", "B", "C"])