Default behavior of plot.bar changed to plot a different color for each bar #20585

Closed
Deborah-Digges opened this issue Apr 2, 2018 · 9 comments · Fixed by #24504

@Deborah-Digges

Deborah-Digges commented Apr 2, 2018

Code Sample, a copy-pastable example if possible

BEFORE

df_india.groupby('sector').id.count().sort_values().plot.bar();

would result in a bar plot with all bars having the same color

AFTER version 0.20

df_india.groupby('sector').id.count().sort_values().plot.bar(color='cornflowerblue');

I need to explicitly provide a color, else each bar is colored differently

Minimal reproducible code:

import pandas as pd
df = pd.DataFrame({'account-start': ['2017-02-03', '2017-03-03', '2017-01-01'],
                   'client': ['Alice Anders', 'Bob Baker', 'Charlie Chaplin'],
                   'balance': [-1432.32, 10.43, 30000.00],
                   'db-id': [1234, 2424, 251],
                   'proxy-id': [525, 1525, 2542],
                   'rank': [52, 525, 32],
                   })
df.client.value_counts().plot.bar()

Output:

[Image: bar plot in which each bar has a different color]

Problem description

Why was the behavior of Series.plot.bar changed to plot each bar in a different color? Visually, these colors add nothing to the plot: different colors should be used only when they correspond to differences of meaning in the data.

Why is the default behavior to provide an unnecessarily visually overwhelming graph? It took me some time to realize why my bars suddenly started acting strangely. Now, I pass color='cornflowerblue' to get all my bars the same pleasant hue.

Reference for visual appeal and the use of colors: http://www.perceptualedge.com/articles/visual_business_intelligence/rules_for_using_color.pdf

Would it be possible to revert to the pre-0.20 behavior?

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-5-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 38.5.2
Cython: 0.26.1
numpy: 1.14.2
scipy: 1.0.0
pyarrow: 0.8.0
xarray: 0.10.1
IPython: 6.1.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2018.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: 0.4.0
matplotlib: 2.1.0
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.1.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.6.0

@TomAugspurger
Contributor

Do you have a reproducible example? This sounds like something that's been fixed, but it's hard to say without an example http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports

@TomAugspurger
Contributor

Looks like a duplicate of #18394

@TomAugspurger added the Duplicate Report label on Apr 2, 2018
@TomAugspurger added this to the No action milestone on Apr 2, 2018
@Deborah-Digges
Author

Hi @TomAugspurger! Thanks for replying. It's reproducible on pandas 0.22.0. Is it fixed in a later version?

Minimal reproducible code:

import pandas as pd
df = pd.DataFrame({'account-start': ['2017-02-03', '2017-03-03', '2017-01-01'],
                   'client': ['Alice Anders', 'Bob Baker', 'Charlie Chaplin'],
                   'balance': [-1432.32, 10.43, 30000.00],
                   'db-id': [1234, 2424, 251],
                   'proxy-id': [525, 1525, 2542],
                   'rank': [52, 525, 32],
                   })
df.client.value_counts().plot.bar()

Output:

[Image: bar plot in which each bar has a different color]

@Deborah-Digges
Author

Deborah-Digges commented Apr 2, 2018

@TomAugspurger This isn't really a duplicate of the mentioned issue. That one was caused by the same change, but it doesn't raise the aesthetic problem I describe above; it reports broken functionality (logy=True not working) resulting from the change. Hence, I believe this is a different issue.

@Deborah-Digges
Author

I think they are related, but not the same issue. The issue I raise here is one of visual design rather than broken functionality.

@TomAugspurger
Contributor

Sorry about that, wasn't reading closely enough.

Thanks for the example. Could you edit your original post to include the example?

Are you interested in submitting a pull request to fix this?

@TomAugspurger reopened this on Apr 2, 2018
@TomAugspurger added the Visualization label and removed the Duplicate Report label on Apr 2, 2018
@TomAugspurger modified the milestones: No action, 0.23.0 on Apr 2, 2018
@Deborah-Digges
Author

no problem @TomAugspurger! I've updated the original post to include the example. I'd love to submit a PR for this as it's something that's been personally bothering me a lot :D

However, I'm a bit new to the codebase so it may take a little while before I get the hang of stuff. I'll start digging around to see which portions of the code I need to touch for this. Is there a particular portion of the codebase you'd suggest I look at?

@TomAugspurger
Contributor

Thanks! The plotting code is a bit tricky, but hopefully not too bad.

I think the issue lies around either

I notice that _get_standard_colors(num_colors=1) doesn't return a single color, which I think is surprising

(Pdb) _get_standard_colors(num_colors=1)
['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd', '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf', '#1f77b4']

So I'd say look through that and see if there's something fishy. Let me know if you get stuck, I haven't looked closely at it.

If _get_standard_colors is behaving correctly, then we may need to override _get_colors in BarPlot. But let's hold off on that until we know whether there's a bug in _get_standard_colors.

@jreback modified the milestones: 0.23.0, Next Major Release on Apr 14, 2018
@atulagrwl

This definitely looks like a bug in _get_standard_colors, specifically in:

    if len(colors) != num_colors:
        try:
            multiple = num_colors // len(colors) - 1
        except ZeroDivisionError:
            raise ValueError("Invalid color argument: ''")
        mod = num_colors % len(colors)

        colors += multiple * colors
        colors += colors[:mod]

It tries to grow the colors list when it has fewer entries than requested, but it does not handle the case where the list already has more entries than requested. The side effect can be seen by calling it with any number less than 10: the result has 10 + number entries.

_get_standard_colors(num_colors=9)
Out[7]: 
['#1f77b4',
 '#ff7f0e',
 '#2ca02c',
 '#d62728',
 '#9467bd',
 '#8c564b',
 '#e377c2',
 '#7f7f7f',
 '#bcbd22',
 '#17becf',
 '#1f77b4',
 '#ff7f0e',
 '#2ca02c',
 '#d62728',
 '#9467bd',
 '#8c564b',
 '#e377c2',
 '#7f7f7f',
 '#bcbd22']
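
To make the arithmetic concrete, here is a minimal standalone sketch of the padding logic quoted above. It is not the pandas function itself: pad_colors_buggy, fit_colors, and DEFAULT_CYCLE are hypothetical names used only for illustration, and the palette is the ten colors visible in the outputs in this thread. fit_colors shows one possible correction (cycle through the palette and stop at the requested length); it is an assumption, not necessarily what the eventual fix does.

```python
from itertools import cycle, islice

# The ten colors that appear in the outputs quoted in this thread.
DEFAULT_CYCLE = ['#1f77b4', '#ff7f0e', '#2ca02c', '#d62728', '#9467bd',
                 '#8c564b', '#e377c2', '#7f7f7f', '#bcbd22', '#17becf']


def pad_colors_buggy(colors, num_colors):
    """Hypothetical stand-in that mirrors the quoted padding logic."""
    colors = list(colors)  # copy so the caller's list is not mutated
    if len(colors) != num_colors:
        try:
            multiple = num_colors // len(colors) - 1
        except ZeroDivisionError:
            raise ValueError("Invalid color argument: ''")
        mod = num_colors % len(colors)
        # When num_colors < len(colors), multiple is -1 and this adds nothing...
        colors += multiple * colors
        # ...but this still appends `mod` extra colors to the full palette.
        colors += colors[:mod]
    return colors


def fit_colors(colors, num_colors):
    """One possible correction (an assumption): cycle and stop at num_colors."""
    if num_colors and not colors:
        raise ValueError("Invalid color argument: ''")
    return list(islice(cycle(colors), num_colors))


print(len(pad_colors_buggy(DEFAULT_CYCLE, 1)))  # 11, as in the Pdb output above
print(len(pad_colors_buggy(DEFAULT_CYCLE, 9)))  # 19, as in Out[7] above
print(len(fit_colors(DEFAULT_CYCLE, 1)))        # 1
```

With ten colors and num_colors=9, multiple is 9 // 10 - 1 = -1 and mod is 9, so the list grows from 10 to 19 entries instead of shrinking to 9. Requests larger than the palette (for example 25) still come out at the right length, which is probably why the problem only surfaces for small requests.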
