
BUG: Pandas plot does not accept Columns for color keyword #44670


Closed
1 of 3 tasks
mosc9575 opened this issue Nov 29, 2021 · 6 comments · Fixed by #44856
Labels
API - Consistency, Visualization

Comments

@mosc9575
Contributor

mosc9575 commented Nov 29, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd

df = pd.DataFrame({
 'A': {1: 6, 2: 2, 3: 1, 4: 8, 5: 6},
 'B': {1: 2, 2: 10, 3: 3, 4: 1, 5: 1},
 'pos': {1: 1, 2: 2, 3: 4, 4: 1, 5: 1},
 'color': {1: 'red', 2: 'blue', 3: 'green', 4: 'red', 5: 'red'}
})

# df >>>
#    A   B  pos  color
# 1  6   2    1    red
# 2  2  10    2   blue
# 3  1   3    4  green
# 4  8   1    1    red
# 5  6   1    1    red


df.plot(x='A', y='B', kind='scatter', color='color')
# df.plot(x='A', y='B', kind='scatter', color=df['color'].values)

Issue Description

I tried to draw a scatter plot with several groups, one color per group. I thought it was enough to pass the name of the column to the color keyword, but this throws an error:

ValueError: 'c' argument must be a color, a sequence of colors, or a sequence of numbers, not ['color']

Expected Behavior

I expect this figure, which was generated using the following line:

df.plot(x='A', y='B', kind='scatter', color=df['color'].values)

[Figure: expected scatter plot]

Installed Versions

INSTALLED VERSIONS
------------------
commit : 945c9ed
python : 3.9.7.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-8-amd64
Version : #1 SMP Debian 5.10.46-4 (2021-08-03)
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.3.4
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.28.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : 2021.09.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.23
tables : 3.6.1
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : 0.54.0

mosc9575 added the Bug and Needs Triage labels on Nov 29, 2021
@mosc9575
Contributor Author

This is mentioned as a workaround in #40605.

@MarcoGorelli
Member

MarcoGorelli commented Nov 29, 2021

I thought it was enough to pass the name of the column to the color keyword

Is that documented anywhere? If not, I'd consider this a feature request rather than a bug.


Actually, the docs say

c str, int or array-like, optional
The color of each point. Possible values are:

A single color string referred to by name, RGB or RGBA code, for instance ‘red’ or ‘#a98d19’.

A sequence of color strings referred to by name, RGB or RGBA code, which will be used for each point’s color recursively. For instance ['green', 'yellow'] all points will be filled in green or yellow, alternatively.

A column name or position whose values will be used to color the marker points according to a colormap.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.scatter.html

Does it work with c= rather than color=?

@mosc9575
Contributor Author

mosc9575 commented Nov 29, 2021

Yes, the line df.plot(x='A', y='B', kind='scatter', c='color') creates the expected result. Thank you for the link to the documentation. Why is it named c instead of color? This is odd.
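To make the documented forms of c concrete, here is a minimal sketch, assuming a recent pandas with matplotlib installed; the Agg backend is selected only so it runs without a display, and the colors in the second panel are arbitrary. It covers a single color string, a per-point list, and a column name (the c='color' call confirmed above). The docs' ['green', 'yellow'] case, where a shorter list is cycled, is left out because newer matplotlib versions may reject a color list whose length differs from the number of points.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; no window is opened
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    {
        "A": [6, 2, 1, 8, 6],
        "B": [2, 10, 3, 1, 1],
        "color": ["red", "blue", "green", "red", "red"],
    },
    index=[1, 2, 3, 4, 5],
)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# 1. A single color string: every point gets that color.
df.plot.scatter(x="A", y="B", c="red", ax=axes[0])

# 2. A sequence of color strings, here exactly one entry per point.
df.plot.scatter(
    x="A", y="B", c=["green", "yellow", "green", "yellow", "green"], ax=axes[1]
)

# 3. A column name: that column's values are used as the point colors.
df.plot.scatter(x="A", y="B", c="color", ax=axes[2])
```

Only the third call depends on the DataFrame's columns; the first two pass their colors straight through to matplotlib.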

@mosc9575
Contributor Author

mosc9575 commented Nov 29, 2021

This is at least an inconsistency, because in barh and bar the keyword seems to be color, not c.
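A small sketch of the bar side of the inconsistency, assuming a recent pandas and matplotlib; the tab:blue and tab:orange colors are arbitrary choices. With several numeric columns, bar and barh take color= and apply one color per column, whereas the scatter call in the original report needed c= to pick up a column name.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; no window is opened
import pandas as pd

df = pd.DataFrame(
    {"A": [6, 2, 1, 8, 6], "B": [2, 10, 3, 1, 1]},
    index=[1, 2, 3, 4, 5],
)

# bar and barh accept color=; with two columns, one color per column.
ax_bar = df.plot.bar(color=["tab:blue", "tab:orange"])
ax_barh = df.plot.barh(color=["tab:blue", "tab:orange"])
```

Each call draws the five bars of column A first and then the five bars of column B, so the first five rectangles get the first color.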

MarcoGorelli added the Visualization label and removed the Bug and Needs Triage labels on Nov 29, 2021
@MarcoGorelli
Member

Yes, it is inconsistent; I can't think of a reason why color= should be disallowed.

I'd suggest allowing users to pass color=, while letting c= continue to work (without documenting it, just so we don't break people's code). Do you want to submit a pull request?
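The proposal can be sketched as a small argument-resolving function. resolve_scatter_colors is a hypothetical helper written only for illustration; it is not pandas API and not necessarily what the fix in #44856 does. It accepts either keyword, rejects both at once, and looks a string up among the frame's columns before treating it as a color.

```python
import pandas as pd


def resolve_scatter_colors(data, c=None, color=None):
    """Hypothetical helper: accept color= as well as c=, resolving column names."""
    if c is not None and color is not None:
        # Two sources of colors would be ambiguous, so refuse both at once.
        raise TypeError("pass only one of 'c' and 'color'")
    value = color if color is not None else c
    # A string that names a column is replaced by that column's values.
    if isinstance(value, str) and value in data.columns:
        return data[value].to_numpy()
    # Anything else (a single color, a list of colors, None) passes through.
    return value


df = pd.DataFrame({
    "A": [6, 2, 1, 8, 6],
    "B": [2, 10, 3, 1, 1],
    "color": ["red", "blue", "green", "red", "red"],
})

colors = resolve_scatter_colors(df, color="color")
print(list(colors))  # ['red', 'blue', 'green', 'red', 'red']
```

The resolved values can then be handed to the keyword that already works, e.g. df.plot.scatter(x='A', y='B', c=resolve_scatter_colors(df, color='color')).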

@mosc9575
Contributor Author

Yes, I'll come back and give a PR a try.

rhshadrach added the API - Consistency label on Nov 29, 2021
rhshadrach added this to the Contributions Welcome milestone on Nov 29, 2021
github-actions bot added the Stale label on Jan 11, 2022
mroeschke removed the Stale label on Jan 17, 2022
jreback changed the milestone from Contributions Welcome to 1.5 on Apr 7, 2022

5 participants