API: Remove integer position args from xy for plotting #20371

masongallo · 2018-03-15T21:44:49Z

closes BUG: df.plot fails when given x,y args as positions #20056
tests added / passed
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
[?] whatsnew entry

This is based on discussion from #20000 where we decided it was confusing to allow df.plot to support both labels AND positions. I wasn't sure where to put this in whatsnew, so lmk and I'll update it.

jreback · 2018-03-15T23:06:49Z

pandas/tests/plotting/test_frame.py

-                # n.b. there appears to be no public method to get the colorbar
-                # label
-                assert ax.collections[0].colorbar._label == 'z'
+        if self.mpl_ge_1_3_1:


@TomAugspurger we should be able to blow this code away since we are on matplotlib > 1.4.3, yes? (separate PR)

jreback

Can you add a sub-section to the whatsnew about this under API-breaking changes? Note we may want to consider deprecating this and removing it in a future version @jorisvandenbossche @TomAugspurger, though I am fine with just changing it.

jreback · 2018-03-15T23:07:15Z

pandas/tests/plotting/test_frame.py

        df = DataFrame({"A": [1, 2], 'B': [3, 4], 'C': [5, 6]})
        with pytest.raises(ValueError):
            df.plot(x=x, y=y)

+    @pytest.mark.parametrize("x,y,colnames", [


can you add a test that checks the raise when passing ints and a non-int index

jorisvandenbossche · 2018-03-16T08:53:27Z

Note we may want to consider deprecating this and removing in a future version @jorisvandenbossche @TomAugspurger . though I am fine with just changing it.

If it worked, I think we should deprecate it. But did it actually work? Because #20056 seems to indicate not? (if it didn't work, of course fine to just remove)

jorisvandenbossche · 2018-03-16T08:58:04Z

So it seems that for line, bar, area it doesn't work, but for scatter, pie and hexbin it does?

TomAugspurger · 2018-03-16T11:06:58Z

I don't think we should remove this. I know positional plotting gets some use, especially for exploratory analysis when you have long column names.

I'll look more at why the original issue was failing.

masongallo · 2018-03-16T14:26:16Z

So it seems that for line, bar, area it doesn't work, but for scatter, pie and hexbin it does?

@jorisvandenbossche yes, I believe this is due to the structure of the code: the index setting I mentioned in #20056 only happens for kinds that are not in that set (the kinds listed in `_dataframe_kinds` and `_series_kinds`).

I don't think we should remove this. I know positional plotting gets some use, especially for exploratory analysis when you have long column names.

@TomAugspurger why not ask the user to supply data.columns[ind] instead of us making assumptions and calculating it for them? IMO that would make the API less complex.
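
A minimal sketch of the label-only alternative suggested above, using a plain list as a stand-in for `data.columns`. The `require_label` helper is hypothetical and not part of pandas; it only illustrates the caller turning a position into a label explicitly.

```python
def require_label(columns, key):
    """Hypothetical strict lookup: accept `key` only if it is a column label."""
    if key not in columns:
        raise ValueError(f"{key!r} is not a column label")
    return key


columns = ["date", "percent_return_for_foo_over_bar"]  # stand-in for data.columns

# The caller converts a position to a label explicitly...
y_label = require_label(columns, columns[1])

# ...while a bare position is rejected instead of being interpreted.
try:
    require_label(columns, 1)
    rejected = False
except ValueError:
    rejected = True
```

With this shape the plotting code never has to decide whether an integer is a position or a label; the cost is the extra typing discussed in the replies.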

TomAugspurger · 2018-03-16T14:33:28Z

IME, DataFrame.plot is most often used in exploratory analysis. For this, speed typing becomes an API consideration. If you have a long column name, but want to quickly plot it, `df.plot(y=0)` is nicer than `df.plot(y='percent_return_for_foo_over_bar')`.

instead of us making assumptions

I don't think we ever make assumptions? If there's ever an integer in the column we don't fall back to positional indexing. IIUC there was just a bug in how we computed the positions when both `x` and `y` are specified.
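
One way a bug like this can arise, sketched with plain lists: if `x` is resolved and moved out of the columns (as setting it as the index would do) before `y` is resolved, then `y` is looked up against a shorter list. This is an assumed failure mode for illustration only; the thread does not spell out the actual cause. `resolve_late` and `resolve_early` are hypothetical names.

```python
columns = ["a", "b", "c"]


def resolve_late(columns, x, y):
    """Resolve x, drop it from the columns, then resolve y (position shifts)."""
    x_label = columns[x]
    remaining = [c for c in columns if c != x_label]  # x moved out of the columns
    return x_label, remaining[y]


def resolve_early(columns, x, y):
    """Resolve both positions against the original columns first."""
    return columns[x], columns[y]


late = resolve_late(columns, 0, 1)    # y=1 is looked up in ["b", "c"]
early = resolve_early(columns, 0, 1)  # y=1 is looked up in ["a", "b", "c"]
```

Resolving both positions up front keeps `y` pointing at the column the caller counted to.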


masongallo · 2018-03-16T15:20:07Z

IME, DataFrame.plot is most often used in exploratory analysis. For this, speed typing becomes an API consideration.

That's fair - you have much more context than I do on usage.

I don't think we ever make assumptions?

We make an assumption by allowing mixed types in column names - i.e. we check with holds_integer and then assume a given int is meant as a column name.

Seems like we don't have consensus on whether this should be removed from the API. How should we proceed? I can close this and fix the integer indexing bug in #20000?

TomAugspurger · 2018-03-16T15:29:34Z

We make an assumption by allowing mixed types in column names - i.e. we check with holds_integer and then assume a given int is meant as a column name.

Perhaps it's just semantics then, but I wouldn't call that assuming :) To me the behavior is clear (but could be better documented). Specifying x and y as positions is allowed if and only if the columns hold no integers.
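
The rule as stated can be sketched in a few lines over a plain list of column labels. `resolve_axis` and `_is_int` are hypothetical helpers, not the pandas implementation; the sketch assumes an integer that is not usable as a position must be an existing label.

```python
def _is_int(value):
    # bool is a subclass of int in Python; exclude it on purpose.
    return isinstance(value, int) and not isinstance(value, bool)


def resolve_axis(columns, key):
    """Hypothetical helper: map `key` to a column label."""
    holds_integer = any(_is_int(c) for c in columns)
    if _is_int(key) and not holds_integer:
        return columns[key]  # no integer labels, so treat key as a position
    if key not in columns:
        raise KeyError(key)
    return key  # otherwise key must be a label


by_position = resolve_axis(["A", "B", "C"], 1)  # looked up as position 1
by_label = resolve_axis([2, "B", "C"], 2)       # looked up as label 2, not position 2
```

The second call is the mixed-type case raised earlier in the thread: as soon as any label is an integer, positions are no longer considered.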

I can close this and fix the integer indexing bug in #20000 ?

If you're comfortable with maybe wasting some effort, I would work around the bug in #20000. We can wait to hear what the other maintainers say before closing or working further on this issue, but I would be sad to see this go.

jorisvandenbossche · 2018-03-16T15:58:35Z

Perhaps it's just semantics then, but I wouldn't call that assuming :) To me the behavior is clear (but could be better documented). Specifying x and y as positions is allowed if and only if the columns hold no integers.

Although there might be some "guessing" cases (mixed integer columns), this is also behaviour that is used a lot throughout the code base (e.g. level can often be specified either as an integer or as a name).

--

I don't have a strong feeling about this. If it is not too difficult to fix this to actually work, I am certainly fine with keeping it.

masongallo · 2018-03-19T16:16:56Z

FYI: will be closing this in favor of #20000

masongallo added 2 commits March 15, 2018 17:37

adjust tests to raise when positions given

5b71113

raise when positions given unless data contains ints as col names

47916fb

jreback reviewed Mar 15, 2018

jreback requested changes Mar 15, 2018

jreback added the API Design label Mar 15, 2018

jreback added the Visualization and Deprecate labels Mar 16, 2018

masongallo closed this Mar 22, 2018

masongallo deleted the remove-positions-plotting branch March 22, 2018 14:41
