Styling groupby boxplots #5263

Closed
waltonjones opened this issue Oct 19, 2013 · 4 comments

Comments

@waltonjones

related #4264

The normal matplotlib boxplot command in Python returns a dictionary with keys for the boxes, medians, whiskers, fliers, and caps, which makes styling really easy. pandas groupby boxplots, however, return an AxesSubplot object (or an array of them when several columns are plotted), which makes styling the plots more difficult.
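For comparison, here is a minimal sketch of the plain matplotlib behaviour described above. The sample data and the `Agg` backend choice are assumptions made only so the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = [rng.random(10), rng.random(10)]  # made-up sample data, two boxes

fig, ax = plt.subplots()
# boxplot returns a dict mapping each artist kind to a list of Line2D objects
bp = ax.boxplot(data)

# Styling is then one setp call per artist kind
plt.setp(bp['boxes'], color='blue')
plt.setp(bp['medians'], color='red')
plt.setp(bp['whiskers'], color='blue')
plt.setp(bp['caps'], color='blue')
```

Each list holds the artists for all boxes in plotting order, so `bp['boxes'][0]` is the first box and `bp['whiskers'][0:2]` are its two whiskers.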

I posted this question recently on Stack Overflow and eventually came to this solution. I hope it will be useful to others.

from numpy.random import rand
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.axes import Axes
from matplotlib.lines import Line2D

# Two value columns: boxplot returns an array of two Axes objects
df2 = pd.DataFrame(rand(10, 2), columns=['Col1', 'Col2'])
df2['X'] = pd.Series(['A', 'B'] * 5)

# One value column: boxplot returns a single Axes object
df1 = pd.DataFrame(rand(10), columns=['Col1'])
df1['X'] = pd.Series(['A', 'B'] * 5)

def _collect_artists(ax, groups):
    '''
    Map each group name to a matplotlib-style dict of its Line2D artists.
    Assumes each box contributes 8 Line2D children in the order
    whisker, whisker, cap, cap, box, median, flier, flier. This layout
    depends on the matplotlib version, so check it before relying on it.
    '''
    cl = [item for item in ax.get_children() if isinstance(item, Line2D)]
    artists = {}
    for i, group in enumerate(groups):
        artists[group] = {'boxes': [cl[4+8*i]],
                          'caps': [cl[2+8*i], cl[3+8*i]],
                          'fliers': [cl[6+8*i], cl[7+8*i]],
                          'medians': [cl[5+8*i]],
                          'whiskers': [cl[0+8*i], cl[1+8*i]]}
    return artists

def stylable_groupby_boxplot(df, by):
    '''
    If you plot only one column, boxplot returns a single Axes object.
    If there are several columns, boxplot returns an array of Axes objects.
    '''
    bp = df.boxplot(by=by, grid=False)
    # dict keys cannot be indexed in Python 3, so build a sorted list;
    # groupby sorts its keys by default, matching the left-to-right box order
    groups = sorted(df.groupby(by).groups.keys())
    if isinstance(bp, Axes):
        return _collect_artists(bp, groups)
    # Several columns: one entry per subplot, keyed by subplot position
    return {i: _collect_artists(ax, groups) for i, ax in enumerate(bp)}

bp2 = stylable_groupby_boxplot(df2, by="X")
bp1 = stylable_groupby_boxplot(df1, by="X")


#2 column styling
plt.suptitle("")
plt.setp(bp2[0]['A']['boxes'], color='blue')
plt.setp(bp2[0]['A']['medians'], color='red')
plt.setp(bp2[0]['A']['whiskers'], color='blue')
plt.setp(bp2[0]['A']['fliers'], color='blue')
plt.setp(bp2[0]['A']['caps'], color='blue')
plt.setp(bp2[0]['B']['boxes'], color='red')
plt.setp(bp2[0]['B']['medians'], color='blue')
plt.setp(bp2[0]['B']['whiskers'], color='red')
plt.setp(bp2[0]['B']['fliers'], color='red')
plt.setp(bp2[0]['B']['caps'], color='red')
plt.setp(bp2[1]['A']['boxes'], color='green')
plt.setp(bp2[1]['A']['medians'], color='purple')
plt.setp(bp2[1]['A']['whiskers'], color='green')
plt.setp(bp2[1]['A']['fliers'], color='green')
plt.setp(bp2[1]['A']['caps'], color='green')
plt.setp(bp2[1]['B']['boxes'], color='purple')
plt.setp(bp2[1]['B']['medians'], color='green')
plt.setp(bp2[1]['B']['whiskers'], color='purple')
plt.setp(bp2[1]['B']['fliers'], color='purple')
plt.setp(bp2[1]['B']['caps'], color='purple')

#1 column styling
plt.suptitle("")
plt.setp(bp1['A']['boxes'], color='blue')
plt.setp(bp1['A']['medians'], color='red')
plt.setp(bp1['A']['whiskers'], color='blue')
plt.setp(bp1['A']['fliers'], color='blue')
plt.setp(bp1['A']['caps'], color='blue')
plt.setp(bp1['B']['boxes'], color='red')
plt.setp(bp1['B']['medians'], color='blue')
plt.setp(bp1['B']['whiskers'], color='red')
plt.setp(bp1['B']['fliers'], color='red')
plt.setp(bp1['B']['caps'], color='red')
@jreback
Contributor

jreback commented Feb 18, 2014

do you want to make a pull-request of this?

@waltonjones
Author

I don't have time at the moment to figure out how to fit this addition into the original pandas boxplot code for a proper pull request. That is why I just wrapped the original function. Any faster/easier way to proceed?

@jreback
Contributor

jreback commented Feb 19, 2014

ok.....when you do have a chance this issue will be waiting

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Feb 19, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015
@TomAugspurger
Contributor

I think we have apply and return_type='dict' to handle this
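A rough sketch of what the `return_type='dict'` route might look like for a grouped boxplot. The sample frame mirrors the one in the original post, and the colour choices are arbitrary. With `by=` set, pandas returns a Series indexed by column name, and each value is the usual matplotlib artist dictionary.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((10, 2)), columns=['Col1', 'Col2'])
df['X'] = ['A', 'B'] * 5

# One matplotlib-style dict per plotted column, collected in a Series
result = df.boxplot(by='X', grid=False, return_type='dict')
plt.suptitle("")

# Style every artist of a column's subplot with a single colour
for col, color in [('Col1', 'blue'), ('Col2', 'green')]:
    artists = result[col]
    for kind in ('boxes', 'whiskers', 'caps', 'medians'):
        plt.setp(artists[kind], color=color)

# Lists are ordered by group, so index 0 is the first group's box
plt.setp(result['Col1']['boxes'][0], color='red')
```

This avoids indexing into `get_children()`, so it should not depend on how many Line2D objects a given matplotlib version creates per box.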
