-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API & BUG: allow list-like y argument to df.plot & fix integer arg to x,y #20000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
19d64fe
8361338
20a2dc0
9535b4c
7d7d74e
d6d824f
37a6eca
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -965,6 +965,8 @@ Plotting | |
^^^^^^^^ | ||
|
||
- :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`) | ||
- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`) | ||
- Bug in :func:`DataFrame.plot` with ``x`` or ``y`` arguments as positions (:issue:`20056`) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you describe the issue a bit more? |
||
- Bug in formatting tick labels with ``datetime.time()`` and fractional seconds (:issue:`18478`). | ||
- :meth:`Series.plot.kde` has exposed the args ``ind`` and ``bw_method`` in the docstring (:issue:`18461`). The argument ``ind`` may now also be an integer (number of sample points). | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1706,21 +1706,37 @@ def _plot(data, x=None, y=None, subplots=False, | |
plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds) | ||
else: | ||
if isinstance(data, ABCDataFrame): | ||
new_data = data.copy() # don't modify until necessary | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is not necessary, just use 'data' don't add something else here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added the copy to fix the bug from #20056 so we get correct integer indexing - what did you have in mind? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it's best to convert from integer to labels as soon as possible. Can you check if you can do it at the very start of |
||
if x is not None: | ||
if is_integer(x) and not data.columns.holds_integer(): | ||
x = data.columns[x] | ||
elif not isinstance(data[x], ABCSeries): | ||
raise ValueError("x must be a label or position") | ||
data = data.set_index(x) | ||
new_data = data.set_index(x) | ||
|
||
if y is not None: | ||
if is_integer(y) and not data.columns.holds_integer(): | ||
int_cols = is_integer(y) or all(is_integer(col) for col in y) | ||
if int_cols and not data.columns.holds_integer(): | ||
y = data.columns[y] | ||
elif not isinstance(data[y], ABCSeries): | ||
raise ValueError("y must be a label or position") | ||
label = kwds['label'] if 'label' in kwds else y | ||
series = data[y].copy() # Don't modify | ||
series.name = label | ||
elif not isinstance(data[y], (ABCSeries, ABCDataFrame)): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. where is the test for this? (as you expanded to both Series & DataFrame)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. good point, removing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are you actually selecting There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, this can probably be removed / replaced? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. agreed, will remove |
||
raise ValueError( | ||
"y must be a label or position or list of them" | ||
) | ||
|
||
label_kw = kwds['label'] if 'label' in kwds else False | ||
new_data = new_data[y].copy() | ||
|
||
if isinstance(data[y], ABCSeries): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can just test if y is_scalar There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Duplicate column names may mess that up, not sure if we allow that here though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
right, that's why I have the check for |
||
label_name = label_kw or y | ||
new_data.name = label_name | ||
else: | ||
match = is_list_like(label_kw) and len(label_kw) == len(y) | ||
if label_kw and not match: | ||
raise ValueError( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add a test that raises this assertion? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. where is this test? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
"label should be list-like and same length as y" | ||
) | ||
label_name = label_kw or data[y].columns | ||
new_data.columns = label_name | ||
|
||
for kw in ['xerr', 'yerr']: | ||
if (kw in kwds) and \ | ||
|
@@ -1730,7 +1746,7 @@ def _plot(data, x=None, y=None, subplots=False, | |
kwds[kw] = data[kwds[kw]] | ||
except (IndexError, KeyError, TypeError): | ||
pass | ||
data = series | ||
data = new_data | ||
plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds) | ||
|
||
plot_obj.generate() | ||
|
@@ -1743,7 +1759,7 @@ def _plot(data, x=None, y=None, subplots=False, | |
series_kind = "" | ||
|
||
df_coord = """x : label or position, default None | ||
y : label or position, default None | ||
y : label, position or list of label, positions, default None | ||
Allows plotting of one column versus another""" | ||
series_coord = "" | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2170,26 +2170,51 @@ def test_invalid_kind(self): | |
with pytest.raises(ValueError): | ||
df.plot(kind='aasdf') | ||
|
||
@pytest.mark.parametrize("x,y", [ | ||
(['B', 'C'], 'A'), | ||
('A', ['B', 'C']) | ||
@pytest.mark.parametrize("x,y,lbl", [ | ||
(['B', 'C'], 'A', 'a'), | ||
(['A'], ['B', 'C'], ['b', 'c']), | ||
('A', ['B', 'C'], 'badlabel') | ||
]) | ||
def test_invalid_xy_args(self, x, y): | ||
# GH 18671 | ||
def test_invalid_xy_args(self, x, y, lbl): | ||
# GH 18671, 19699 allows y to be list-like but not x | ||
df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]}) | ||
with pytest.raises(ValueError): | ||
df.plot(x=x, y=y) | ||
df.plot(x=x, y=y, label=lbl) | ||
|
||
@pytest.mark.parametrize("x,y", [ | ||
('A', 'B'), | ||
('B', 'A') | ||
(['A'], 'B') | ||
]) | ||
def test_invalid_xy_args_dup_cols(self, x, y): | ||
# GH 18671 | ||
# GH 18671, 19699 allows y to be list-like but not x | ||
df = DataFrame([[1, 3, 5], [2, 4, 6]], columns=list('AAB')) | ||
with pytest.raises(ValueError): | ||
df.plot(x=x, y=y) | ||
|
||
@pytest.mark.parametrize("x,y,lbl,colors", [ | ||
('A', ['B'], ['b'], ['red']), | ||
('A', ['B', 'C'], ['b', 'c'], ['red', 'blue']), | ||
(0, [1, 2], ['bokeh', 'cython'], ['green', 'yellow']) | ||
]) | ||
def test_y_listlike(self, x, y, lbl, colors): | ||
# GH 19699 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can you give a 1-liner expln |
||
df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]}) | ||
_check_plot_works(df.plot, x='A', y=y, label=lbl) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you get the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should also check the color. What happens? Ideally it's the same as |
||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you add tests for
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Good point. Shouldn't this raise tho? If you try this on a |
||
ax = df.plot(x=x, y=y, label=lbl, color=colors) | ||
assert len(ax.lines) == len(y) | ||
self._check_colors(ax.get_lines(), linecolors=colors) | ||
|
||
@pytest.mark.parametrize("x,y,colnames", [ | ||
(0, 1, ['A', 'B']), | ||
(1, 0, [0, 1]) | ||
]) | ||
def test_xy_args_integer(self, x, y, colnames): | ||
# GH 20056 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same |
||
df = DataFrame({"A": [1, 2], 'B': [3, 4]}) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we do you think this should be allowed? having both labels and positions is so very confusing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that it's confusing and definitely don't think it should be allowed. Let's remove support for positions? |
||
df.columns = colnames | ||
_check_plot_works(df.plot, x=x, y=y) | ||
|
||
@pytest.mark.slow | ||
def test_hexbin_basic(self): | ||
df = self.hexbin_df | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this first should be on Other Enhancements section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this to other enhancements?