Skip to content

BUG: legend behaves inconsistently when plotting to the same axes #6678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 22, 2014

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Mar 21, 2014

There seems to be some inconsistencies related to DataFrame.plot and Series.plot legend behaviors.

Problems:

  • When DataFrame.plot or Series.plot plots data on the same axes repeatedly:
    • If the target axes already has a legend, line plot always appends its legend to existing one ignoring legend kw and existing legend will be drawn as line artist regardless of actual artist type. Also, legend cannot be reverseed if the axes already has a legend.
    • Bar/BarH plot deletes the existing legend and overwrites with latest one.
    • KDE plot appends its legend to the existing one, and will apply reverse to all artists including the existing one.
  • When subplots is enabled, line plot draws legends on each axes but bar plot doesn't.
  • Scatter plot does not draw legend even if label keyword is passed.

Fix:
I've prepared a fix based on following concept, except hexbin which doesn't use legend.

  • Legend should be drawn by the order of the artists drawn.
  • Each legend should be drawn according to the passed legend value.
    • If df1 plots with legend=True and df2 with legend=False, only df1's legend should appear.
    • If df2 plots with legend='reverse'd, only df2's legend should be reversed.
  • When subplots=True and legend=True, each subplot axes should have its own legend (standardize current line plot behavior).

Example Code

df1 = DataFrame(randn(6, 3), index=range(6), columns=['a', 'b', 'c'])
df2 = DataFrame(randn(6, 3), index=range(6), columns=['d', 'e', 'f'])
df3 = DataFrame(randn(6, 3), index=range(6), columns=['x', 'y', 'z'])

fig, axes = plt.subplots(1, 5, figsize=(14, 3))
for i, (legend, df) in enumerate(zip([True, False, 'reverse'], [df1, df2, df3])):
    df.plot(ax=axes[0],  legend=legend, title='line')
    df.plot(kind='bar', ax=axes[1], legend=legend, title='bar')
    df.plot(kind='barh', ax=axes[2],  legend=legend, title='barh')
    df.plot(kind='kde', ax=axes[3],  legend=legend, title='kde')
    df.plot(kind='scatter', ax=axes[4], x=df.columns[0], y=df.columns[1], label=i, 
        legend=legend, title='scatter')
plt.show()

Output using current repository

  • For line, bar, kde plot, expected legend is a, b, c, z, y x. Because df2 was plot by legend=False, and df3 was plot by legend='reverse'.
  • For scatter plot, 0, 2 is expected because df2 was plot by legend=False.

figure_legend_current

Output after fix
figure_fixed

If there is anything should be considered, please let me know. Thanks.

@jreback
Copy link
Contributor

jreback commented Apr 5, 2014

@TomAugspurger can you take a look at this?

@jtratner
Copy link
Contributor

jtratner commented Apr 6, 2014

@sinhrks - you're going to need to rebase this before we can merge it.

@sinhrks
Copy link
Member Author

sinhrks commented Apr 6, 2014

ok, rebased.

@@ -1345,7 +1379,16 @@ def __init__(self, data, x, y, **kwargs):
def _make_plot(self):
x, y, data = self.x, self.y, self.data
ax = self.axes[0]
ax.scatter(data[x].values, data[y].values, **self.kwds)

if self.legend and hasattr(self, 'label') and not self.label is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the and not self.label is None is redundant. If self.label is set to None then label becomes None, which is the same as your else. So this could be

if self.legend and hasattr(self, 'label'):
    label = self.label
else:
    label = None

unless I'm missing something.

@TomAugspurger
Copy link
Contributor

This will technically cause the test suite fail if someone runs the slow tests on py2.6 since the assertIsNone and assertIsNotNone methods were added to unittest in python 2.7. Our build matrix doesn't catch that since 2.6 is run with not slow. Is this a problem @jreback? The tests could be changed to use self.assertTrue(x is not None) or self.assertTrue(x is None)

@TomAugspurger
Copy link
Contributor

Other than that, this should be fine.

@sinhrks I'll get to your area plot PR soon.

@jreback
Copy link
Contributor

jreback commented Apr 7, 2014

hmmm....could enable slow ops on 2.6 (as it skips other things)....let me see

@TomAugspurger
Copy link
Contributor

OK. @sinhrks do you mind changing the asserts in _check_legend_labels to use assertIsTrue or assertIsFalse?

@sinhrks I didn't realize it was already broken before your PR . Could you change the one at the bottom of test_subplots in test_graphics.py also? It would be nice to make sure that the full test suite passes on all the supported versions of python if possible.

@jreback
Copy link
Contributor

jreback commented Apr 7, 2014

hmm..current plotting DOES seem ok on 2.6

see here: https://travis-ci.org/jreback/pandas/jobs/22479235 (this with matplotlib 1.3.1).

am thinking about putting this in master - ownly drawback as this now takes a fair amount of time more than the other runs (master is about 50% faster than these anyhow because of caching).

worth it?

@TomAugspurger
Copy link
Contributor

I'm not sure if it's worth it for this specific issue. The actual code is valid python 2.6+, it's just the tests that rely on 2.7+ methods. But if you have other reasons for running the slow ones on 2.6 then go for it.

@jreback
Copy link
Contributor

jreback commented Apr 8, 2014

@TomAugspurger no...going to pass on the 2.6 slow for now...doesn't seem worth it....

go ahead on this PR when you are ready

@sinhrks
Copy link
Member Author

sinhrks commented Apr 9, 2014

Sure, I've modified condition formula and remove assertIsNone.

Also, followings are changed:

  • test_subplots to cover bar plots tests
  • adding legend handle and label operation is refactored to _add_legend_handle

@@ -370,10 +370,12 @@ Bug Fixes
- Bug in ``DataFrame.replace()`` where changing a dtype through replacement
would only replace the first occurrence of a value (:issue:`6689`)
- Better error message when passing a frequency of 'MS' in ``Period`` construction (GH5332)
<<<<<<< HEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed when you rebase.

@TomAugspurger
Copy link
Contributor

@sinhrks reviewing this today. Sorry it took so long. Would you mind rebasing on top of master? (last time, I promise) I'll get this one merged, then you can rebase your Area one on top of this and we can merge that too.

Could you also clean up the docs now that the legend API is so much better?
In doc/source/visualization.rst:

plt.figure(); ts.plot(style='k--', label='Series'); plt.legend()

You can pass legend=True in the plot call and remove the plt.legend()

I think you can remove the plt.legend() in the next one too:
plt.figure(); df.plot(); plt.legend(loc='best')

since loc=best is the default right?

The rest should be good. You could add a reversed example too if you want,
the docstring is pretty clear about that though, so no need.

@sinhrks
Copy link
Member Author

sinhrks commented Apr 22, 2014

@TomAugspurger Sure. Modified docs and rebased. I agree that reversed example in the doc is not necessary.

TomAugspurger pushed a commit that referenced this pull request Apr 22, 2014
BUG: legend behaves inconsistently when plotting to the same axes
@TomAugspurger TomAugspurger merged commit 583b4d4 into pandas-dev:master Apr 22, 2014
@TomAugspurger
Copy link
Contributor

Thanks! Ahh I may have been a bit quick on merging. I'll keep my eye on what Travis says https://travis-ci.org/pydata/pandas/builds/23507773

@TomAugspurger
Copy link
Contributor

Travis is all green: https://travis-ci.org/pydata/pandas/builds/23507773

@sinhrks sinhrks deleted the legend_pr branch April 25, 2014 22:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants