BUG: Fix scatter plot colors in groupby context to match line plot behavior (#59846) #61233
6a50022
to
183a686
Compare
pandas/core/groupby/groupby.py
Outdated
def f(self):
    return self.plot(*args, **kwargs)
    # Special case for scatter plots to enable automatic colors in groupby context
    if kwargs.get("kind") == "scatter":
From the issue report, this plot works with kind="line". Why do we need all this logic for scatter when it just works for line?
You are right. I was treating this as an edge case, layered on top of the code that actually causes the coloring inconsistency between ScatterPlot and LinePlot.
The root cause is that matplotlib does the plotting for both line and scatter plots, and it applies its own color cycling when no color is specified. Here is an example:
import matplotlib.pyplot as plt
plt.plot([1, 2, 3], [1, 2, 3])
plt.plot([4, 5, 6], [4, 5, 6])
plt.plot([7, 8, 9], [7, 8, 9])
# This results in having 3 different colors, one for each of the plots above
Similarly:
plt.scatter(0, 0)
plt.scatter(1, 1)
plt.scatter(2, 2)
# This results in having 3 different colors, one for each of the points above
However, ScatterPlot and LinePlot behave differently because LinePlot doesn't pass any color in kwds to ax.plot when calling
df.groupby("layer").plot(x='x', y='y', ax= plt.gca(), kind='line')
ScatterPlot, by contrast, explicitly sets the c= argument in ax.scatter when calling
df.groupby("layer").plot(x='x', y='y', ax= plt.gca(), kind='scatter')
The fix at the source is to set c_values = None when no color is passed to ScatterPlot, i.e. when self.c is None and self.color is None.
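To make the difference concrete, here is a minimal sketch that uses matplotlib directly rather than the pandas plotting internals. It assumes the default property cycle and a headless Agg backend; the variable names are only for illustration. It compares repeated ax.scatter calls with no color against calls that pass an explicit c=, which is roughly what ScatterPlot was doing.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# No color passed: each call takes the next color from the property cycle.
cycled = [tuple(ax.scatter(i, i).get_facecolor()[0]) for i in range(3)]

# Explicit c= passed: the cycle is bypassed and every call gets the same color.
fixed = [tuple(ax.scatter(i, i, c="b").get_facecolor()[0]) for i in range(3)]

distinct_cycled = len(set(cycled))
distinct_fixed = len(set(fixed))
print(distinct_cycled, distinct_fixed)

plt.close(fig)
```

If this holds, leaving c unset when the user gives no color should let each group in the groupby pick up its own color, the same way the line plots do.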
a8bb0f3
to
ce37450
Compare
…59846 # Conflicts: # doc/source/whatsnew/v3.0.0.rst
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.