-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
API & BUG: allow list-like y argument to df.plot & fix integer arg to x,y #20000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good thanks. Just a couple edge cases that need to be tested I think (your code should handle them already. Just for future regressions).
pandas/tests/plotting/test_frame.py
Outdated
def test_y_listlike(self, y, lbl): | ||
# GH 19699 | ||
df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]}) | ||
_check_plot_works(df.plot, x='A', y=y, label=lbl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you get the ax
from df.plot
and assert that it has two lines? I think len(ax.lines)
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also check the color. What happens? Ideally it's the same as df.set_index('x').plot()
, so two different colors.
pandas/tests/plotting/test_frame.py
Outdated
# GH 19699 | ||
df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]}) | ||
_check_plot_works(df.plot, x='A', y=y, label=lbl) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add tests for
- all integer columns:
x=0, y=[1, 2]
- Mix of int and named columns.
x=0, y=['A', 2]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mix of int and named columns. x=0, y=['A', 2]
Good point. Shouldn't this raise tho? If you try this on a DataFrame
you'll get a KeyError
. IMO we shouldn't allow users to specify a mix of int & named cols since it's unclear what you actually want.
pandas/plotting/_core.py
Outdated
else: | ||
match = is_list_like(label_kw) and len(label_kw) == len(y) | ||
if label_kw and not match: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test that raises this assertion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov Report
@@ Coverage Diff @@
## master #20000 +/- ##
==========================================
+ Coverage 91.77% 91.8% +0.03%
==========================================
Files 152 152
Lines 49205 49222 +17
==========================================
+ Hits 45159 45190 +31
+ Misses 4046 4032 -14
Continue to review full report at Codecov.
|
Yeah, raising is probably best.
…On Tue, Mar 6, 2018 at 7:34 AM, Mason Gallo ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pandas/tests/plotting/test_frame.py
<#20000 (comment)>:
> df = DataFrame([[1, 3, 5], [2, 4, 6]], columns=list('AAB'))
with pytest.raises(ValueError):
df.plot(x=x, y=y)
+ @pytest.mark.parametrize("y,lbl", [
+ (['B'], ['b']),
+ (['B', 'C'], ['b', 'c'])
+ ])
+ def test_y_listlike(self, y, lbl):
+ # GH 19699
+ df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]})
+ _check_plot_works(df.plot, x='A', y=y, label=lbl)
+
Mix of int and named columns. x=0, y=['A', 2]
Good point. Shouldn't this raise tho? If you try this on a DataFrame
you'll get a KeyError. IMO we shouldn't allow users to specify a mix of
int & named cols since it's unclear what you actually want.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#20000 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIi3rkFHROQ43I43ijONt8_Oy7Xgkks5tbqyNgaJpZM4SdaP5>
.
|
I addressed #20056 and added test cases to cover it. Also added a bunch more test cases to increase coverage as requested. |
Any ideas on the failure on CircleCI? I'm on a mac so it looks like the test gets skipped locally? |
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -965,6 +965,8 @@ Plotting | |||
^^^^^^^^ | |||
|
|||
- :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`) | |||
- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this first should be on Other Enhancements section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this to other enhancements?
pandas/plotting/_core.py
Outdated
@@ -1706,21 +1706,37 @@ def _plot(data, x=None, y=None, subplots=False, | |||
plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds) | |||
else: | |||
if isinstance(data, ABCDataFrame): | |||
new_data = data.copy() # don't modify until necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not necessary, just use 'data' don't add something else here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the copy to fix the bug from #20056 so we get correct integer indexing - what did you have in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to convert from integer to labels as soon as possible. Can you check if you can do it at the very start of _plot
? Then you don't have to worry about this.
pandas/tests/plotting/test_frame.py
Outdated
]) | ||
def test_xy_args_integer(self, x, y, colnames): | ||
# GH 20056 | ||
df = DataFrame({"A": [1, 2], 'B': [3, 4]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we do you think this should be allowed? having both labels and positions is so very confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's confusing and definitely don't think it should be allowed. Let's remove support for positions?
Merge master into your branch and repush. The'll be fixed. |
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -965,6 +965,8 @@ Plotting | |||
^^^^^^^^ | |||
|
|||
- :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`) | |||
- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`) | |||
- Bug in :func:`DataFrame.plot` with ``x`` or ``y`` arguments as positions (:issue:`20056`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you describe the issue a bit more?
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -965,6 +965,8 @@ Plotting | |||
^^^^^^^^ | |||
|
|||
- :func:`DataFrame.plot` now raises a ``ValueError`` when the ``x`` or ``y`` argument is improperly formed (:issue:`18671`) | |||
- :func:`DataFrame.plot` now supports multiple columns to the ``y`` argument (:issue:`19699`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you move this to other enhancements?
pandas/plotting/_core.py
Outdated
@@ -1706,21 +1706,37 @@ def _plot(data, x=None, y=None, subplots=False, | |||
plot_obj = klass(data, subplots=subplots, ax=ax, kind=kind, **kwds) | |||
else: | |||
if isinstance(data, ABCDataFrame): | |||
new_data = data.copy() # don't modify until necessary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best to convert from integer to labels as soon as possible. Can you check if you can do it at the very start of _plot
? Then you don't have to worry about this.
@TomAugspurger good suggestion! I think it's doable depending on where we land with the API discussion - |
pandas/plotting/_core.py
Outdated
"y must be a label or position or list of them" | ||
) | ||
label_kw = kwds['label'] if 'label' in kwds else False | ||
new_data = data[y].copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just call this data
pandas/plotting/_core.py
Outdated
label_kw = kwds['label'] if 'label' in kwds else False | ||
new_data = data[y].copy() | ||
|
||
if isinstance(data[y], ABCSeries): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just test if y is_scalar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate column names may mess that up, not sure if we allow that here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate column names may mess that up
right, that's why I have the check for ABCSeries
pandas/tests/plotting/test_frame.py
Outdated
(0, [1, 2], ['bokeh', 'cython'], ['green', 'yellow']) | ||
]) | ||
def test_y_listlike(self, x, y, lbl, colors): | ||
# GH 19699 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you give a 1-liner expln
pandas/tests/plotting/test_frame.py
Outdated
(1, 0, [0, 1]) | ||
]) | ||
def test_xy_args_integer(self, x, y, colnames): | ||
# GH 20056 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pandas/plotting/_core.py
Outdated
else: | ||
match = is_list_like(label_kw) and len(label_kw) == len(y) | ||
if label_kw and not match: | ||
raise ValueError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this test?
pandas/plotting/_core.py
Outdated
int_y = is_integer(y) or all(is_integer(c) for c in y) | ||
if int_y and not data.columns.holds_integer(): | ||
y = data_cols[y] | ||
elif not isinstance(data[y], (ABCSeries, ABCDataFrame)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is the test for this? (as you expanded to both Series & DataFrame)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, removing
pandas/plotting/_core.py
Outdated
int_y = is_integer(y) or all(is_integer(c) for c in y) | ||
if int_y and not data.columns.holds_integer(): | ||
y = data_cols[y] | ||
elif not isinstance(data[y], (ABCSeries, ABCDataFrame)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you actually selecting data[y]
here what else could data[y]
be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this can probably be removed / replaced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, will remove
pandas/plotting/_core.py
Outdated
label = kwds['label'] if 'label' in kwds else y | ||
series = data[y].copy() # Don't modify | ||
series.name = label | ||
int_y = is_integer(y) or all(is_integer(c) for c in y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fragile. For better or worse (mostly worse) you can pretty much any object in pandas columns, e.g. df = pd.DataFrame({pd.Timestamp('2017'): [1, 2]})
.
I think your code will fail for y=pd.Timestamp('2017')
, since it's not an integer, but it also isn't iterable.
I'd recommend adding is_list_like(y)
before the all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can pretty much any object in pandas columns, e.g. df = pd.DataFrame({pd.Timestamp('2017'): [1, 2]})
good call
pandas/plotting/_core.py
Outdated
int_y = is_integer(y) or all(is_integer(c) for c in y) | ||
if int_y and not data.columns.holds_integer(): | ||
y = data_cols[y] | ||
elif not isinstance(data[y], (ABCSeries, ABCDataFrame)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this can probably be removed / replaced?
pandas/plotting/_core.py
Outdated
label_kw = kwds['label'] if 'label' in kwds else False | ||
new_data = data[y].copy() | ||
|
||
if isinstance(data[y], ABCSeries): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate column names may mess that up, not sure if we allow that here though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. May have a closer look tomorrow, but otherwise +1
ping, tests green |
Thanks @masongallo! |
… x,y (pandas-dev#20000) * Add support for list-like y argument * update whatsnew * add doc change for y * Add test cases and fix position args * don't copy save cols ahead of time and update whatsnew * address fdbck
… x,y (pandas-dev#20000) * Add support for list-like y argument * update whatsnew * add doc change for y * Add test cases and fix position args * don't copy save cols ahead of time and update whatsnew * address fdbck
… x,y (pandas-dev#20000) * Add support for list-like y argument * update whatsnew * add doc change for y * Add test cases and fix position args * don't copy save cols ahead of time and update whatsnew * address fdbck
y
inDataFrame.plot
. #19699git diff upstream/master -u -- "*.py" | flake8 --diff
@TomAugspurger
I added support for the user to pass a list-like to y, as discussed in #19699. The API to
df.plot
is relatively complex with lots of args, so lmk with questions / fdbck. Not sure if I covered everything for docs.