Violin plots #2116

Merged 26 commits on Nov 1, 2017
c578cde
replace gd.numboxes by fullLayout._numBoxes
etpinard Oct 24, 2017
8f34227
replace 'emptybox' with 'empty' in box calc item
etpinard Oct 24, 2017
c8f38ff
factor out box defaults and boxpoint plot methods
etpinard Oct 24, 2017
3438eae
first cut violin
etpinard Oct 24, 2017
d779509
first cut violin mocks
etpinard Oct 24, 2017
bb252a5
Merge branch 'master' into violins-dev
etpinard Oct 30, 2017
ea43b25
rename 'box' category 'box-violin'
etpinard Oct 31, 2017
a706f2a
factor out box/whiskers and mean/sd plotting routine
etpinard Oct 31, 2017
1eb453b
split box hover into onBoxes and onPoints routines
etpinard Oct 31, 2017
7769f20
implement violinmode, violingroup and violingroupgap
etpinard Oct 31, 2017
ad51966
2nd cut violin calc/plot attributes + improve violin curve paths
etpinard Oct 31, 2017
4a40fc7
add findPointOnPath geometry2d util function
etpinard Oct 31, 2017
6ffc379
implement violin 'inner' style options
etpinard Oct 31, 2017
dfa918f
pass hoverlayer to trace module hoverPoints
etpinard Oct 31, 2017
bc6bc02
implement violin hover
etpinard Oct 31, 2017
e737664
2nd cut violin mocks
etpinard Oct 31, 2017
e625c44
1st cut violin jasmine tests
etpinard Oct 31, 2017
14bded3
fill violin attribute descriptions + fix typo in comment
etpinard Oct 31, 2017
48758d8
fixup scalemode 'count' calculation
etpinard Oct 31, 2017
17f65d0
add inner box and mean line to side-by-side violin mock
etpinard Oct 31, 2017
71700de
2nd cut violin jasmine tests
etpinard Oct 31, 2017
677aacc
remove 'kernel' from violin attributes
etpinard Nov 1, 2017
0a32b98
update box and meanline attribute syntax
etpinard Nov 1, 2017
3c9e0a0
add violin style mock
etpinard Nov 1, 2017
ea66dea
add editType to new violin attr containers
etpinard Nov 1, 2017
789121c
update violin layout attr descriptions
etpinard Nov 1, 2017
3 changes: 2 additions & 1 deletion lib/index-cartesian.js
@@ -19,7 +19,8 @@ Plotly.register([
require('./histogram2dcontour'),
require('./pie'),
require('./contour'),
require('./scatterternary')
require('./scatterternary'),
require('./violin')
]);

module.exports = Plotly;
7 changes: 5 additions & 2 deletions lib/index.js
@@ -21,7 +21,7 @@ Plotly.register([
require('./pie'),
require('./contour'),
require('./scatterternary'),
require('./sankey'),
require('./violin'),

require('./scatter3d'),
require('./surface'),
@@ -34,10 +34,13 @@ Plotly.register([
require('./pointcloud'),
require('./heatmapgl'),
require('./parcoords'),
require('./table'),

require('./scattermapbox'),

require('./sankey'),

require('./table'),

require('./carpet'),
require('./scattercarpet'),
require('./contourcarpet'),
11 changes: 11 additions & 0 deletions lib/violin.js
@@ -0,0 +1,11 @@
/**
* Copyright 2012-2017, Plotly, Inc.
* All rights reserved.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/

'use strict';

module.exports = require('../src/traces/violin');
9 changes: 2 additions & 7 deletions src/plots/cartesian/constants.js
@@ -57,18 +57,13 @@ module.exports = {
DFLTRANGEX: [-1, 6],
DFLTRANGEY: [-1, 4],

// Layers to keep trace types in the right order.
// from back to front:
// 1. heatmaps, 2D histos and contour maps
// 2. bars / 1D histos
// 3. errorbars for bars and scatter
// 4. scatter
// 5. box plots
// Layers to keep trace types in the right order
traceLayerClasses: [
'imagelayer',
'maplayer',
'barlayer',
'carpetlayer',
'violinlayer',
Contributor Author

Making sure violins are under boxes always.

'boxlayer',
'scatterlayer'
],
1 change: 1 addition & 0 deletions src/plots/plots.js
@@ -2160,6 +2160,7 @@ plots.doCalcdata = function(gd, traces) {

// how many box/violins plots do we have (in case they're grouped)
fullLayout._numBoxes = 0;
fullLayout._numViolins = 0;

// for calculating avg luminosity of heatmaps
gd._hmpixcount = 0;
3 changes: 2 additions & 1 deletion src/traces/box/plot.js
@@ -161,7 +161,8 @@ function plotPoints(sel, plotinfo, trace, t) {
var bdPos = t.bdPos;
var bPos = t.bPos;

var mode = trace.boxpoints;
// TODO ... unfortunately
var mode = trace.boxpoints || trace.points;
Contributor Author

We might want to rename boxpoints -> points as well as boxmean (maybe to showmean + showsd ?)

Contributor Author

I thought about combining display attributes for box, mean line and standard dev line inside the violins in one flaglist attribute e.g. stats: 'box+mean+sd' or innermode: /* */, but when considering #1774 it might be best to have separate booleans e.g. showbox, showmean, showsd to keep things consistent with features like axis lines and grids.

So, unless someone opposes:

showmeanline: true,
meanlinecolor: 'blue',
meanlinewidth: 1,
meanlinedash: 'dot'

showbox: true,
boxlinecolor: 'black',
boxlinewidth: 2,
boxfillcolor: 'red'

Contributor Author (@etpinard, Oct 31, 2017)

Done in 6ffc379 with the help of Lib.findPointOnPath (for meanline) added in 4a40fc7
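
For readers unfamiliar with this kind of lookup, here is one way a point at a given coordinate can be located on a curve by bisection, which is what a mean line needs in order to stop at the violin outline. This is a DOM-free sketch and does not reproduce Lib.findPointOnPath: `findPointOnCurve` and `pointAtLength` are hypothetical names (the latter stands in for something like `SVGPathElement.getPointAtLength`), and the sketch assumes the searched coordinate varies monotonically along the curve.

```javascript
'use strict';

// Hypothetical helper (not plotly.js code): bisect on the curve parameter
// until the requested coordinate is within `tolerance` of `val`.
function findPointOnCurve(pointAtLength, totalLength, val, coord, tolerance) {
    var lo = 0;
    var hi = totalLength;
    // Does the coordinate grow or shrink as we move along the curve?
    var increasing = pointAtLength(hi)[coord] >= pointAtLength(lo)[coord];
    var pt = pointAtLength((lo + hi) / 2);

    for(var i = 0; i < 100; i++) {
        var mid = (lo + hi) / 2;
        pt = pointAtLength(mid);
        var diff = pt[coord] - val;
        if(Math.abs(diff) < tolerance) break;
        // Move the bracket toward the side that contains `val`.
        if((diff < 0) === increasing) lo = mid;
        else hi = mid;
    }
    return pt;
}

// Made-up outline of one violin half: y runs along the distribution axis,
// x is the half-width at that y.
function outline(len) {
    return {x: 10 * Math.exp(-0.5 * (len - 5) * (len - 5)), y: len};
}

// Where should a mean line drawn at y = 3.7 end?
var meanLineEnd = findPointOnCurve(outline, 10, 3.7, 'y', 1e-6);
console.log(meanLineEnd.x.toFixed(3), meanLineEnd.y.toFixed(3));
```

The same bisection applies to the hover label discussed further down, where the cursor position along the distribution axis picks the point on the density curve.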


// repeatable pseudorandom number generator
seed();
9 changes: 6 additions & 3 deletions src/traces/box/set_positions.js
@@ -12,13 +12,14 @@ var Registry = require('../../registry');
var Axes = require('../../plots/cartesian/axes');
var Lib = require('../../lib');


module.exports = function setPositions(gd, plotinfo) {
var fullLayout = gd._fullLayout;
var xa = plotinfo.xaxis;
var ya = plotinfo.yaxis;
var orientations = ['v', 'h'];

// TODO figure this out
// should violins and boxes share 'num' fields?
var numKey = '_numBoxes';

var posAxis, i, j, k;
@@ -84,8 +85,10 @@ module.exports = function setPositions(gd, plotinfo) {
gd.calcdata[boxListIndex][0].t.dPos = dPos;
}

var gap = fullLayout.boxgap;
var groupgap = fullLayout.boxgroupgap;
// TODO this won't work when both boxes and violins are present
// on same graph
var gap = fullLayout.boxgap || fullLayout.violingap;
var groupgap = fullLayout.boxgroupgap || fullLayout.violingroupgap;

// autoscale the x axis - including space for points if they're off the side
// TODO: this will overdo it if the outermost boxes don't have
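
The TODO in set_positions.js notes that the `||` chain breaks when boxes and violins share a graph. A second pitfall, assuming a gap of 0 is a valid layout value, is that `||` treats 0 as missing. The sketch below shows both the pitfall and one possible alternative, a lookup keyed by trace type. `getGaps` is a hypothetical helper, not something this PR adds.

```javascript
'use strict';

// Hypothetical helper: read the gap settings that belong to one trace type
// ('box' or 'violin') instead of chaining layout attributes with ||.
function getGaps(fullLayout, traceType) {
    return {
        gap: fullLayout[traceType + 'gap'],
        groupgap: fullLayout[traceType + 'groupgap']
    };
}

var fullLayout = {
    boxgap: 0,
    boxgroupgap: 0.3,
    violingap: 0.4,
    violingroupgap: 0.1
};

// The chain from the diff: a box gap of 0 is falsy, so the violin value wins.
var viaOr = fullLayout.boxgap || fullLayout.violingap;

// Keyed lookup keeps the two trace types apart.
var boxGaps = getGaps(fullLayout, 'box');
var violinGaps = getGaps(fullLayout, 'violin');

console.log(viaOr, boxGaps.gap, violinGaps.gap);
```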
113 changes: 113 additions & 0 deletions src/traces/violin/attributes.js
@@ -0,0 +1,113 @@
/**
* Copyright 2012-2017, Plotly, Inc.
* All rights reserved.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/

'use strict';

var boxAttrs = require('../box/attributes');
var scatterAttrs = require('../scatter/attributes');
var extendFlat = require('../../lib/extend').extendFlat;

module.exports = {
y: boxAttrs.y,
x: boxAttrs.x,
x0: boxAttrs.x0,
y0: boxAttrs.y0,
name: boxAttrs.name,
orientation: extendFlat({}, boxAttrs.orientation, {
description: [
'Sets the orientation of the violin(s).',
'If *v* (*h*), the distribution is visualized along',
'the vertical (horizontal).'
].join(' ')
}),

bandwidth: {
valType: 'number',
min: 0,
role: 'info',
editType: 'plot',
description: [
'Sets the bandwidth used to compute the kernel density estimate.',
'By default, the bandwidth is determined by Silverman\'s rule of thumb.'
].join(' ')
Contributor Author (@etpinard, Oct 24, 2017)

No hard default for bandwidth. This rule of thumb, which depends on the sample length and standard deviation, seems pretty popular in other libraries.

},
scaleby: {
Contributor Author

inspired by seaborn's scale attribute.

Contributor Author

Replaced by scalegroup (taken from pie traces) and scalemode in ad51966

valType: 'enumerated',
values: ['width', 'area', 'count'],
dflt: 'width',
role: 'info',
editType: 'calc',
description: [
'Sets the method by which the width of each violin is determined.',
'*width* means each violin has the same (max) width,',
'*area* means each violin has the same area, and',
'*count* means the violins are scaled by the number of sample points making',
'up each violin.'
].join(' ')
},
span: {
valType: 'info_array',
items: [
{valType: 'any', editType: 'plot'},
{valType: 'any', editType: 'plot'}
],
role: 'info',
editType: 'plot',
description: [
'Sets the span in data space for which the density function will be computed.',
'By default, the span goes from the minimum value to maximum value in the sample.'
].join(' ')
Contributor Author

To address @alexcjohnson's concern (from a private convo):

mmm, looking through example images I don’t actually see the behavior I was imagining… might be something to play with at some point but not for now, unless some standard existing package does it.

One thing I see a lot in examples that seems to me a bad idea is truncating the violin at a value where it has a finite width - particularly if that value is actually a data value. Unfortunately one of the examples I see that does exactly that is on our own site… https://plot.ly/python/basic-statistics/ out[5]

unless there’s some physical limit to the variable… that seems like it would justify doing this, but aren’t you still throwing away area (and therefore visual weight) from the points at the end?

at any rate, seems like if one is going to truncate like that it should be done explicitly, and should not be the default.

Ideally I’d like you to only be able to truncate by value (per the argument about physical limits) rather than “truncate at the ends of the data range” but I imagine people will complain if they can’t do both…

Contributor Author (@etpinard, Oct 24, 2017)

Note that span was inspired by seaborn's cut argument and ggplot's trim.

Collaborator

I love that this is two numbers (or date strings, I guess), one for each end - that's way more flexible than seaborn's one number or ggplot2's boolean. I also love that it's data values, rather than a delta from the ends of the distribution. Between the two of those you could do what seems to be really the right thing with seaborn's cut example and lop off unphysical values < 0 but not restrict the upper end.

Ideally I would prefer if the default were no clipping at all (which, as discussed, probably means data bounds extended by 2 or 3 times the bandwidth), rather than clipped at the data bounds. The challenge then is to make it easy for people who do want to clip tightly to the data bounds - particularly if we think about the distribution changing over time, having to manually update the bounds for either of these cases seems awkward.

I suppose we could define two special values of span[i] - one for "clip tightly to this end of the data" and another for "do not clip this end"? 'tight' and 'loose' perhaps? Alternatively there could be another attribute for this, which would be nice as we wouldn't need to mix numbers and strings (or whatever the special value is) but it seems tricky to cover all cases this way, like if you want the low end tight and the upper end loose.

Contributor Author (@etpinard, Oct 26, 2017)

I suppose we could define two special values of span[i]

That would be dangerous, as we want the span items to be set in d coordinates. So a violin trace with 'tight' and 'loose' categories would totally break this.

@alexcjohnson's 'tight' and 'loose' suggestions made me think of #1876 where the specs axis.bounds and axis.boundsmode are written down.

For consistency, I'm thinking about adding a spanmode attribute alongside span with possible values 'hard', 'soft', 'manual'

{
   spanmode: 'soft',
   span: [null, 10]
   // where the null means pick the 'soft' default value
   // i.e. data min minus 2 bandwidths
}

{
   spanmode: 'hard',
   span: [0, null]
   // where the null means pick the 'hard' default value
   // i.e. the data max.
}

We could also add 'tight-soft' and 'soft-tight' spanmode values to allow users to pick hard and soft defaults for each end without having to set span at all.

Contributor Author

Done in ad51966
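
To make the proposal in this thread concrete, here is a minimal sketch of how `null` entries in `span` could be filled in per `spanmode`. It follows only what the comments above describe ('soft' means the data bound extended by 2 bandwidths, 'hard' means the data bound itself) and is not the code that landed in ad51966. `resolveSpan` is a hypothetical name, and treating 'manual' with a missing entry like 'hard' is an assumption.

```javascript
'use strict';

// Hypothetical helper illustrating the spanmode proposal.
// span entries that are null or undefined fall back to the mode's default.
function resolveSpan(spanmode, span, dataMin, dataMax, bandwidth) {
    // 'soft' pads the data range by two bandwidths on each side;
    // any other mode falls back to the raw data bounds (assumption).
    var pad = spanmode === 'soft' ? 2 * bandwidth : 0;
    var dflt = [dataMin - pad, dataMax + pad];
    var out = new Array(2);

    for(var i = 0; i < 2; i++) {
        var v = span ? span[i] : null;
        out[i] = (v === null || v === undefined) ? dflt[i] : v;
    }
    return out;
}

// Data from 1 to 8 with a bandwidth of 0.5:
var softLow = resolveSpan('soft', [null, 10], 1, 8, 0.5); // [0, 10]
var hardHigh = resolveSpan('hard', [0, null], 1, 8, 0.5); // [0, 8]
console.log(softLow, hardHigh);
```

Because the explicit entries stay in data values, lopping off unphysical values below 0 while leaving the upper end loose is a single setting: `spanmode: 'soft'` with `span: [0, null]`.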

},
side: {
Contributor Author

to remake, seaborn's

image

Collaborator

nice! also 'top' and 'bottom' for horizontal violins.

Contributor Author

As of ad51966, side is now an enumerated attribute with values 'both', 'negative' and 'positive', so that the same set of values works with horizontal and vertical violins.

valType: 'enumerated',
values: ['both', 'left', 'right'],
dflt: 'both',
role: 'info',
editType: 'plot',
description: [
'Determines which side of the position line the density function making up',
'one half of a is plotting.',
'Useful when comparing two violin traces under *overlay* mode, where one trace.'
Collaborator

where one trace what?

Contributor Author

Improved description in ad51966

].join(' ')
},

// TODO update description
points: boxAttrs.boxpoints,
jitter: boxAttrs.jitter,
pointpos: boxAttrs.pointpos,
marker: boxAttrs.marker,
text: boxAttrs.text,

// TODO need attribute(s) similar to 'boxmean' to toggle lines for:
// - mean
// - median
// - std
// - quartiles

line: {
color: {
valType: 'color',
role: 'style',
editType: 'style',
description: 'Sets the color of line bounding the violin(s).'
},
width: {
valType: 'number',
role: 'style',
min: 0,
dflt: 2,
editType: 'style',
description: 'Sets the width (in px) of line bounding the violin(s).'
},
smoothing: scatterAttrs.line.smoothing,
Collaborator

Is smoothing needed? Seems weird, especially since we're just kind of making up points for the path.

Contributor Author

line.smoothing was 🔪 in ad51966

editType: 'plot'
},

fillcolor: boxAttrs.fillcolor,
hoveron: boxAttrs.hoveron
};
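
One way to read the three `scaleby` values described in this first-cut attribute is as a choice of multiplier applied to each violin's density. The sketch below is an interpretation of the attribute description only, not the calc/plot code from this PR (the attribute was later replaced by scalegroup and scalemode). `widthScales` is a hypothetical helper, and the 'area' branch assumes every density integrates to 1, i.e. no truncation by `span`.

```javascript
'use strict';

// Hypothetical helper (not plotly.js code). Each violin is summarized by its
// peak density and sample count; the returned multipliers map a density value
// to a fraction of the available half-width.
function widthScales(violins, scaleby) {
    var peak = Math.max.apply(null, violins.map(function(v) {
        return v.maxDensity;
    }));
    var weightedPeak = Math.max.apply(null, violins.map(function(v) {
        return v.maxDensity * v.count;
    }));

    return violins.map(function(v) {
        // 'area': one shared multiplier, so equal-area densities stay equal-area
        if(scaleby === 'area') return 1 / peak;
        // 'count': area grows with the number of sample points
        if(scaleby === 'count') return v.count / weightedPeak;
        // 'width' (default): every violin reaches the full width at its peak
        return 1 / v.maxDensity;
    });
}

var violins = [
    {maxDensity: 0.5, count: 10},
    {maxDensity: 0.25, count: 40}
];

var byWidth = widthScales(violins, 'width');
var byArea = widthScales(violins, 'area');
var byCount = widthScales(violins, 'count');
console.log(byWidth, byArea, byCount);
```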
75 changes: 75 additions & 0 deletions src/traces/violin/calc.js
@@ -0,0 +1,75 @@
/**
* Copyright 2012-2017, Plotly, Inc.
* All rights reserved.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/

'use strict';

var Lib = require('../../lib');
var boxCalc = require('../box/calc');

var kernels = {
gaussian: function(v) {
return (1 / Math.sqrt(2 * Math.PI)) * Math.exp(-0.5 * v * v);
Collaborator

Good thinking to make this extensible down the line. There would be something nice about using one of the polynomial kernels that goes smoothly to zero at a finite position but to start just gaussian seems fine. That's the ggplot2 default anyway, and I can't find anything to say which one seaborn uses.

Contributor Author

I added the epanechnikov kernel in ad51966

Contributor Author

... and of course, I could add a few more if desired. Note that other kernels make the violins look a little less smooth than the gaussian. Perhaps this is why other libraries (e.g. seaborn and ggplot) only use the gaussian kernel for violin plots 🤔

}
};

module.exports = function calc(gd, trace) {
var cd = boxCalc(gd, trace);

if(cd[0].t.empty) return cd;

for(var i = 0; i < cd.length; i++) {
var cdi = cd[i];
var vals = cdi.pts.map(extractVal);
var len = vals.length;
var span = trace.span || [cdi.min, cdi.max];
var dist = span[1] - span[0];
// sample standard deviation
var ssd = Lib.stdev(vals, len - 1, cdi.mean);
var bandwidthDflt = ruleOfThumbBandwidth(vals, ssd, cdi.q3 - cdi.q1);
var bandwidth = trace.bandwidth || bandwidthDflt;
var kde = makeKDE(vals, kernels.gaussian, bandwidth);
// step that well covers the bandwidth and is multiple of span distance
var n = Math.ceil(dist / (Math.min(bandwidthDflt, bandwidth) / 3));
var step = dist / n;

cdi.density = new Array(n);
cdi.violinMaxWidth = 0;

for(var k = 0, t = span[0]; t < (span[1] + step / 2); k++, t += step) {
var v = kde(t);
cdi.violinMaxWidth = Math.max(cdi.violinMaxWidth, v);
cdi.density[k] = {v: v, t: t};
}
}

return cd;
};

// Default to Silverman's rule of thumb:
// - https://en.wikipedia.org/wiki/Kernel_density_estimation#A_rule-of-thumb_bandwidth_estimator
// - https://github.com/statsmodels/statsmodels/blob/master/statsmodels/nonparametric/bandwidths.py
function ruleOfThumbBandwidth(vals, ssd, iqr) {
var a = Math.min(ssd, iqr / 1.349);
return 1.059 * a * Math.pow(vals.length, -0.2);
}

function makeKDE(vals, kernel, bandwidth) {
var len = vals.length;
var factor = 1 / (len * bandwidth);

// don't use Lib.aggNums to skip isNumeric checks
return function(x) {
var sum = 0;
for(var i = 0; i < len; i++) {
sum += kernel((x - vals[i]) / bandwidth);
}
return factor * sum;
};
}

function extractVal(o) { return o.v; }
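
The density computation above can be exercised outside plotly.js. The following standalone sketch reuses the kernel, bandwidth rule and KDE closure from the diff, and then checks numerically the point raised in the span thread: clipping at the data bounds throws away area. `sampleStdev`, `quantile` and `integrate` are hypothetical stand-ins (for `Lib.stdev` and the quartiles from box calc), and the linear-interpolation quantile is an assumption that may differ from how box calc computes q1 and q3.

```javascript
'use strict';

// Standard normal kernel, same form as kernels.gaussian in the diff.
function gaussian(v) {
    return Math.exp(-0.5 * v * v) / Math.sqrt(2 * Math.PI);
}

// Sample standard deviation (n - 1 denominator); stand-in for Lib.stdev.
function sampleStdev(vals) {
    var n = vals.length;
    var mean = vals.reduce(function(a, b) { return a + b; }, 0) / n;
    var ss = vals.reduce(function(a, b) { return a + (b - mean) * (b - mean); }, 0);
    return Math.sqrt(ss / (n - 1));
}

// Linear-interpolation quantile on a sorted copy (assumed method).
function quantile(vals, p) {
    var s = vals.slice().sort(function(a, b) { return a - b; });
    var pos = (s.length - 1) * p;
    var lo = Math.floor(pos);
    var hi = Math.ceil(pos);
    return s[lo] + (s[hi] - s[lo]) * (pos - lo);
}

// Silverman's rule of thumb with the constants used in the diff.
function ruleOfThumbBandwidth(vals) {
    var iqr = quantile(vals, 0.75) - quantile(vals, 0.25);
    var a = Math.min(sampleStdev(vals), iqr / 1.349);
    return 1.059 * a * Math.pow(vals.length, -0.2);
}

function makeKDE(vals, kernel, bandwidth) {
    var factor = 1 / (vals.length * bandwidth);
    return function(x) {
        var sum = 0;
        for(var i = 0; i < vals.length; i++) {
            sum += kernel((x - vals[i]) / bandwidth);
        }
        return factor * sum;
    };
}

// Midpoint-rule integral of f over [a, b].
function integrate(f, a, b, steps) {
    var dx = (b - a) / steps;
    var total = 0;
    for(var k = 0; k < steps; k++) total += f(a + (k + 0.5) * dx) * dx;
    return total;
}

var sample = [1.2, 1.9, 2.1, 2.4, 2.8, 3.0, 3.3, 3.9, 4.4, 5.1];
var bandwidth = ruleOfThumbBandwidth(sample);
var kde = makeKDE(sample, gaussian, bandwidth);
var dataMin = Math.min.apply(null, sample);
var dataMax = Math.max.apply(null, sample);

// Padded by 5 bandwidths, the density integrates to (almost exactly) 1 ...
var fullArea = integrate(kde, dataMin - 5 * bandwidth, dataMax + 5 * bandwidth, 2000);
// ... while the default span [min, max] keeps only part of it.
var clippedArea = integrate(kde, dataMin, dataMax, 2000);

console.log(bandwidth.toFixed(3), fullArea.toFixed(3), clippedArea.toFixed(3));
```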
36 changes: 36 additions & 0 deletions src/traces/violin/defaults.js
@@ -0,0 +1,36 @@
/**
* Copyright 2012-2017, Plotly, Inc.
* All rights reserved.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/

'use strict';

var Lib = require('../../lib');
var Color = require('../../components/color');

var boxDefaults = require('../box/defaults');
var attributes = require('./attributes');

module.exports = function supplyDefaults(traceIn, traceOut, defaultColor, layout) {
function coerce(attr, dflt) {
return Lib.coerce(traceIn, traceOut, attributes, attr, dflt);
}

boxDefaults.handleSampleDefaults(traceIn, traceOut, coerce, layout);
if(traceOut.visible === false) return;

coerce('bandwidth');
coerce('scaleby');
coerce('span');
coerce('side');

coerce('line.color', (traceIn.marker || {}).color || defaultColor);
coerce('line.width');
coerce('line.smoothing');
coerce('fillcolor', Color.addOpacity(traceOut.line.color, 0.5));

boxDefaults.handlePointsDefaults(traceIn, traceOut, coerce, {prefix: ''});
};
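
As background for the `coerce` calls above, here is a heavily simplified sketch of what coercing an attribute amounts to: take the user value if it validates against the attribute spec, otherwise fall back to a default. This `coerce` is a hypothetical stand-in, not `Lib.coerce`: it handles flat attribute names only (no `line.color` paths) and only the 'number' and 'enumerated' value types.

```javascript
'use strict';

// Simplified, hypothetical stand-in for Lib.coerce.
function coerce(traceIn, traceOut, attributes, attr, dflt) {
    var spec = attributes[attr];
    var v = traceIn[attr];
    var valid = false;

    if(spec.valType === 'number') {
        valid = typeof v === 'number' && isFinite(v) &&
            (spec.min === undefined || v >= spec.min);
    } else if(spec.valType === 'enumerated') {
        valid = spec.values.indexOf(v) !== -1;
    }

    if(valid) traceOut[attr] = v;
    // An explicit default passed by the caller wins over the spec default.
    else if(dflt !== undefined) traceOut[attr] = dflt;
    else if(spec.dflt !== undefined) traceOut[attr] = spec.dflt;

    return traceOut[attr];
}

// Two attributes copied from the first-cut attributes.js in this PR.
var attributes = {
    bandwidth: {valType: 'number', min: 0},
    side: {valType: 'enumerated', values: ['both', 'left', 'right'], dflt: 'both'}
};

var traceIn = {bandwidth: -1, side: 'top'};
var traceOut = {};
coerce(traceIn, traceOut, attributes, 'bandwidth');
coerce(traceIn, traceOut, attributes, 'side');

console.log(traceOut);
```

With no hard default, an invalid `bandwidth` simply stays unset here, which is what lets `trace.bandwidth || bandwidthDflt` in calc.js fall through to the rule-of-thumb value.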
40 changes: 40 additions & 0 deletions src/traces/violin/index.js
@@ -0,0 +1,40 @@
/**
* Copyright 2012-2017, Plotly, Inc.
* All rights reserved.
*
* This source code is licensed under the MIT license found in the
* LICENSE file in the root directory of this source tree.
*/

'use strict';

module.exports = {
attributes: require('./attributes'),
layoutAttributes: require('./layout_attributes'),
supplyDefaults: require('./defaults'),
supplyLayoutDefaults: require('./layout_defaults'),
calc: require('./calc'),
setPositions: require('../box/set_positions'),
plot: require('./plot'),
style: require('./style'),
hoverPoints: require('../box/hover'),
Contributor Author

Should violin show hover labels about the kernel density curve?

Collaborator

It's tricky to figure out what someone would want to see labeled on the curve. The only thing I can think of that might be cool is a label that moves continuously (along the distribution axis) with the mouse, so you can look at a peak or a valley or something and see exactly what data value it's at... would help you read quantitative differences off several violins. Would a label like that get any value reported for the density? It wouldn't mean much on its own so could be omitted, though it would have meaning relative to other such values on the same or different violins.

Contributor Author

Here's what I came up with:

peek 2017-10-27 14-36

Collaborator

In my excitement at how beautiful this effect is, I missed an important piece: you should see both the kde and the y value in that label, so you can use it to read out the exact peak/valley locations for example. Should show just enough digits that each pixel is a different y value.

Contributor Author (@etpinard, Oct 31, 2017)

Implemented in bc6bc02 with the help of Lib.findPointOnPath added in 4a40fc7

Contributor Author

... now looks like:

peek 2017-10-30 23-38
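
The "just enough digits that each pixel is a different y value" rule from this thread can be sketched as follows. `digitsPerPixel` is a hypothetical helper and says nothing about how plotly.js formats hover labels; it only shows the arithmetic: pick the smallest number of decimals whose resolution is no coarser than one pixel in data units.

```javascript
'use strict';

// Hypothetical helper: number of decimals needed so that moving the cursor
// by one pixel changes the displayed value.
// dataRange: visible axis range in data units; pixelLength: axis length in px.
function digitsPerPixel(dataRange, pixelLength) {
    var unitsPerPixel = dataRange / pixelLength;
    // 10^-digits must not exceed the data units covered by one pixel.
    return Math.max(0, Math.ceil(-Math.log10(unitsPerPixel)));
}

// 10 data units over 400 px is 0.025 units per pixel, so 2 decimals suffice.
var fine = digitsPerPixel(10, 400);
// 1000 data units over 500 px is 2 units per pixel, so integers suffice.
var coarse = digitsPerPixel(1000, 500);

console.log((3.14159).toFixed(fine), (314.159).toFixed(coarse));
```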

selectPoints: require('../box/select'),

moduleType: 'trace',
name: 'violin',
basePlotModule: require('../../plots/cartesian'),
// TODO
// - should maybe rename 'box' category to something more general
categories: ['cartesian', 'symbols', 'oriented', 'box', 'showLegend'],
Contributor Author

this box category is currently used:

image

Contributor Author

... which for the most part are common places for violin traces too.

Contributor Author

Replaced by box-violin in ea43b25

meta: {
description: [
'In vertical (horizontal) violin plots,',
'statistics are computed using `y` (`x`) values.',
'By supplying an `x` (`y`) array, one violin per distinct x (y) value',
'is drawn.',
'If no `x` (`y`) {array} is provided, a single violin is drawn.',
'That violin is then positioned',
'with `name` or with `x0` (`y0`) if provided.'
].join(' ')
}
};