
Bug: different colors from plot and legend using groupby and unstack. #19544


Open
pesap opened this issue Feb 6, 2018 · 3 comments

@pesap

pesap commented Feb 6, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt


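# Constant timestamp used as one level of the index.
time = pd.Timestamp('2017-01-01')

x = np.arange(0, 1000)
y = x**2
d = {'A': x, 'B': y}
df = pd.DataFrame(d)

# Random category labels; 'step' is shared across categories.
df['cat_a'] = np.random.choice([0, 1, 2], 1000)
df['cat_b'] = np.random.choice(['A', 'B', 'C'], 1000)
df['cat_c'] = np.random.choice([100, 200, 300], 1000)
df['step'] = np.arange(0, 1000)
df['time'] = time
df = df.set_index(['time', 'cat_a', 'cat_b', 'cat_c', 'step'])

fig, ax = plt.subplots()
# Average per (cat_c, step), move cat_c into the columns, then plot.
# After unstack(0) the columns are a MultiIndex, so x='A' and y='B'
# each refer to one column per cat_c value.
(df.query('cat_a == 2 & cat_b == "C" & cat_c == [200,300]')
    .groupby(['cat_c', 'step']).mean()
    .unstack(0).plot(x='A', y='B', ax=ax))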

Problem description

I have a DataFrame like the one above. I use groupby to take the mean over one specific category and a column called 'step', which is shared between the categories. To plot the query result, I use unstack to move that index level into the columns and then plot each column. This works, but the line colors in the figure do not match the colors shown in the legend.
A quick fix is to set each color manually.
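For reference, the manual-color workaround could look roughly like the sketch below. It is not taken from the report: the data is a simplified stand-in (only `cat_c` and `step`, with a fixed seed), the `colors` mapping and the label format are hypothetical, and each group is drawn with matplotlib's `ax.plot` directly instead of `DataFrame.plot`, so the legend entry and the line share the same artist.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Simplified stand-in for the DataFrame in the report (assumption).
rng = np.random.RandomState(0)
n = 1000
df = pd.DataFrame({
    "A": np.arange(n),
    "B": np.arange(n) ** 2,
    "cat_c": rng.choice([100, 200, 300], n),
    "step": np.arange(n),
})

# Same aggregation as in the report, without the unstack step.
means = df[df["cat_c"].isin([200, 300])].groupby(["cat_c", "step"]).mean()

# Hypothetical mapping from category value to line color.
colors = {200: "tab:blue", 300: "tab:orange"}

fig, ax = plt.subplots()
for cat, color in colors.items():
    group = means.xs(cat, level="cat_c")  # rows for one cat_c value, indexed by step
    ax.plot(group["A"], group["B"], color=color, label=f"cat_c={cat}")
ax.legend()
```

Because every line is created with an explicit `color` and `label`, the legend is built from the same line objects that are drawn on the axes.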

[Screenshot: bug]

Expected Output

The line colors in the plot match the colors in the legend.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Darwin
OS-release: 17.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 38.5.0
Cython: None
numpy: 1.14.0
scipy: None
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@TomAugspurger
Contributor

Hmm, your example raises on master, due to #18695, since you're passing multiple columns to x and y, when they're supposed to be single columns.
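To see why `x='A'` refers to more than one column here, it may help to inspect the frame produced by `unstack(0)`. The sketch below uses simplified stand-in data (an assumption, not the exact frame from the report) and only shows the column structure; it does not reproduce the behavior change from #18695.

```python
import numpy as np
import pandas as pd

# Simplified stand-in for the DataFrame in the report (assumption).
rng = np.random.RandomState(0)
n = 1000
df = pd.DataFrame({
    "A": np.arange(n),
    "B": np.arange(n) ** 2,
    "cat_c": rng.choice([200, 300], n),
    "step": np.arange(n),
})

# Same groupby/unstack chain as in the report.
wide = df.groupby(["cat_c", "step"]).mean().unstack(0)

# The columns are now a two-level MultiIndex: (original column, cat_c value).
print(wide.columns.nlevels)

# Selecting 'A' returns a DataFrame with one column per cat_c value,
# not a single Series.
selected = wide["A"]
print(type(selected).__name__, list(selected.columns))

# A full tuple key identifies exactly one column.
single = wide[("A", 200)]
print(type(single).__name__)
```

So after the unstack, `'A'` and `'B'` each name a group of columns, which is the "multiple columns" case mentioned above.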

@pesap
Author

pesap commented Feb 6, 2018

Is there a better way to do it, or a way to avoid this?

@TomAugspurger
Contributor

Not sure yet, haven't had a chance to look closer. Let me know if you find a workaround before I do.

@jbrockmendel jbrockmendel added the Visualization plotting label Jul 30, 2018
@mroeschke mroeschke added the Bug label Jun 18, 2021