dataframe.plot accepts undocumented 'by=' keyword but ignores it. #9274

cswarth · 2015-01-16T20:13:41Z

Maybe this is a request for enhancement, or documentation of a bug - I don't know.

A tutorial showed that DataFrame.boxplot() can take a by= keyword to produce a plot stratified by the values in one column. I naively assumed I could apply that to other plots as well. DataFrame.plot() accepts by= but seems to silently ignore it. This seems to be an undocumented keyword for DataFrame.plot().

Is this intentional? Why not include the groupby functionality from boxplot into plot so it can do something reasonable with all the other plot types as well? If not, then why does plot accept the by= keyword?

Example code below,

import matplotlib.pyplot as plt
import pandas as pd
from numpy.random import rand

pd.show_versions()

plt.rcParams.update(pd.tools.plotting.mpl_stylesheet)

# taken from http://pandas.pydata.org/pandas-docs/stable/visualization.html#box-plots
df = pd.DataFrame(rand(10, 2), columns=['Col1', 'Col2'])
df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
plt.figure()
print(df)

# boxplot produces two subplots segregated by the values in X
df.boxplot(by='X')

# kind='box' seems to silently ignore 'by='
df.plot(kind='box', by='X')

# kind='scatter' seems to silently ignore 'by='
df.plot(kind='scatter', x=0, y=1, by='X')

# kind='line' seems to silently ignore 'by='
df.plot(kind='line', by='X')

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-32-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.15.2
nose: 1.3.4
Cython: 0.20.1post0
numpy: 1.9.1
scipy: 0.13.3
statsmodels: 0.6.0
IPython: 2.3.1
sphinx: 1.2.3
patsy: 0.3.0
dateutil: 2.4.0
pytz: 2014.10
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.4.2
openpyxl: 1.8.6
xlrd: 0.9.3
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.3.3
bs4: 4.3.2
html5lib: 0.999
httplib2: 0.9
apiclient: None
rpy2: 2.5.2
sqlalchemy: 0.8.4
pymysql: None
psycopg2: None
       Col1      Col2  X
0  0.561433  0.329668  A
1  0.502967  0.111894  A
2  0.607194  0.565945  A
3  0.006764  0.617442  A
4  0.912123  0.790524  A
5  0.992081  0.958802  B
6  0.791964  0.285251  B
7  0.624917  0.478094  B
8  0.195675  0.382317  B
9  0.053874  0.451648  B


TomAugspurger · 2015-01-16T20:42:14Z

DataFrame.boxplot was unified under DataFrame.plot() with the kind='box' keyword argument a while back. boxplot was the only plot type to ever support faceting (with the by keyword).

There aren't any plans to add faceting for other plot types right now. Other libraries like seaborn or ggplot do it and handle DataFrames nicely.
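Until then, a rough workaround is to facet by hand with groupby and matplotlib subplots. The sketch below is only an illustration, not anything built into pandas: it assumes the same column names as the report (Col1, Col2, X), uses a seeded random frame, and selects the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same shape of data as in the report: two numeric columns plus a grouping column.
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(10, 2), columns=["Col1", "Col2"])
df["X"] = ["A"] * 5 + ["B"] * 5

groups = df.groupby("X")
fig, axes = plt.subplots(1, groups.ngroups, sharey=True, squeeze=False)

# One subplot per distinct value of X, each drawn with the ordinary line plot.
for ax, (name, group) in zip(axes.ravel(), groups):
    group[["Col1", "Col2"]].plot(kind="line", ax=ax, title=str(name))
```

Passing ax= explicitly is what does the faceting here; the kind can be swapped for any other plot type that accepts an ax argument.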

A pull request to document the by keyword, noting that it only has an effect with kind='box', would be welcome! There's an issue here about validating keyword arguments to df.plot and warning / raising when things are unused.
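To make the validation idea concrete, here is a minimal pure-Python sketch of warning on unused keywords. Everything in it is hypothetical: the USED_KEYWORDS table and the check_plot_kwargs helper are invented for illustration and do not reflect how pandas actually tracks its plotting arguments.

```python
import warnings

# Hypothetical table: which optional keywords each plot kind actually uses.
USED_KEYWORDS = {
    "box": {"by"},
    "line": set(),
    "scatter": {"x", "y"},
}


def check_plot_kwargs(kind, **kwargs):
    """Warn about keywords the given plot kind would silently ignore.

    Returns the sorted list of ignored keyword names.
    """
    unused = sorted(set(kwargs) - USED_KEYWORDS.get(kind, set()))
    if unused:
        warnings.warn(
            "kind=%r ignores keyword(s): %s" % (kind, ", ".join(unused)),
            UserWarning,
            stacklevel=2,
        )
    return unused
```

With a check like this, a call such as check_plot_kwargs("line", by="X") would emit a UserWarning instead of passing silently; raising an exception instead of warning would be a one-line change.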

shoyer · 2015-01-16T20:43:50Z

This is a hazard of our current API design, which uses a single plot method to dispatch to many different underlying plot types. The ultimate fix is probably to encourage more specific methods like df.plot.scatter() rather than df.plot(). See #9124 for more discussion of that design.
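A toy sketch of why per-kind methods help, assuming nothing about the eventual pandas implementation: the class below (PlotAccessorSketch, a made-up name) only records calls rather than drawing, but it shows that explicit signatures reject stray keywords while a catch-all entry point swallows them.

```python
class PlotAccessorSketch(object):
    """Toy stand-in for a plotting accessor; it records calls instead of drawing."""

    def __init__(self, data):
        self.data = data

    def scatter(self, x, y):
        # Only x and y are accepted, so scatter(..., by='X') raises TypeError.
        return ("scatter", x, y)

    def box(self, by=None):
        # 'by' is part of the signature here, so it is visible and documented.
        return ("box", by)

    def __call__(self, kind="line", **kwargs):
        # Catch-all entry point: any keyword is accepted and may go unused.
        return (kind, kwargs)
```

Calling an instance as plot(kind='scatter', x=0, y=1, by='X') succeeds and carries by along unnoticed, whereas plot.scatter(x=0, y=1, by='X') fails immediately with a TypeError from Python itself.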

cswarth · 2015-01-16T20:45:15Z

Thanks Tom, I'll dig into the code to see how it's used and document the 'by=' keyword.

TomAugspurger added Visualization plotting Docs labels Jan 16, 2015

TomAugspurger added this to the 0.16.0 milestone Jan 16, 2015

jreback modified the milestones: 0.16.0, Next Major Release Mar 5, 2015

ischurov mentioned this issue Jan 7, 2017

DataFrame.plot.box ignores by argument #15079

Closed

datapythonista mentioned this issue Sep 10, 2019

Fix by parameter in pandas plotting python-sprints/pandas-mentoring#165

Open

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
