Violin plots #2116
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
/** | ||
* Copyright 2012-2017, Plotly, Inc. | ||
* All rights reserved. | ||
* | ||
* This source code is licensed under the MIT license found in the | ||
* LICENSE file in the root directory of this source tree. | ||
*/ | ||
|
||
'use strict'; | ||
|
||
module.exports = require('../src/traces/violin'); |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -161,7 +161,8 @@ function plotPoints(sel, plotinfo, trace, t) { | |
var bdPos = t.bdPos; | ||
var bPos = t.bPos; | ||
|
||
var mode = trace.boxpoints; | ||
// TODO ... unfortunately | ||
var mode = trace.boxpoints || trace.points; | ||
Review comment: We might want to rename
Review comment: I thought about combining display attributes for box, mean line and standard dev line inside the violins in one flaglist attribute, e.g. So, unless someone opposes:
showmeanline: true,
meanlinecolor: 'blue',
meanlinewidth: 1,
meanlinedash: 'dot',
showbox: true,
boxlinecolor: 'black',
boxlinewidth: 2,
boxfillcolor: 'red' |
||
|
||
// repeatable pseudorandom number generator | ||
seed(); | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,113 @@ | ||
/** | ||
* Copyright 2012-2017, Plotly, Inc. | ||
* All rights reserved. | ||
* | ||
* This source code is licensed under the MIT license found in the | ||
* LICENSE file in the root directory of this source tree. | ||
*/ | ||
|
||
'use strict'; | ||
|
||
var boxAttrs = require('../box/attributes'); | ||
var scatterAttrs = require('../scatter/attributes'); | ||
var extendFlat = require('../../lib/extend').extendFlat; | ||
|
||
module.exports = { | ||
y: boxAttrs.y, | ||
x: boxAttrs.x, | ||
x0: boxAttrs.x0, | ||
y0: boxAttrs.y0, | ||
name: boxAttrs.name, | ||
orientation: extendFlat({}, boxAttrs.orientation, { | ||
description: [ | ||
'Sets the orientation of the violin(s).', | ||
'If *v* (*h*), the distribution is visualized along', | ||
'the vertical (horizontal).' | ||
].join(' ') | ||
}), | ||
|
||
bandwidth: { | ||
valType: 'number', | ||
min: 0, | ||
role: 'info', | ||
editType: 'plot', | ||
description: [ | ||
'Sets the bandwidth used to compute the kernel density estimate.', | ||
'By default, the bandwidth is determined by Silverman\'s rule of thumb.' | ||
].join(' ') | ||
Review comment: No hard default for |
||
}, | ||
scaleby: { | ||
Review comment: inspired by seaborn's
Review comment: Replaced by scalegroup (taken from pie traces) and scalemode in ad51966 |
||
valType: 'enumerated', | ||
values: ['width', 'area', 'count'], | ||
dflt: 'width', | ||
role: 'info', | ||
editType: 'calc', | ||
description: [ | ||
'Sets the method by which the width of each violin is determined.', | ||
'*width* means each violin has the same (max) width.', | ||
'*area* means each violin has the same area.', | ||
'*count* means the violins are scaled by the number of sample points making', | ||
'up each violin.' | ||
].join(' ') | ||
}, | ||
span: { | ||
valType: 'info_array', | ||
items: [ | ||
{valType: 'any', editType: 'plot'}, | ||
{valType: 'any', editType: 'plot'} | ||
], | ||
role: 'info', | ||
editType: 'plot', | ||
description: [ | ||
'Sets the span in data space for which the density function will be computed.', | ||
'By default, the span goes from the minimum value to maximum value in the sample.' | ||
].join(' ') | ||
Review comment: To address @alexcjohnson's concern (from a private convo):
Review comment: I love that this is two numbers (or date strings, I guess), one for each end; that's way more flexible than seaborn's one number or ggplot2's boolean. I also love that it's data values, rather than a delta from the ends of the distribution. Between the two of those you could do what seems to be really the right thing with seaborn's cut example and lop off unphysical values < 0 but not restrict the upper end. Ideally I would prefer if the default were no clipping at all (which, as discussed, probably means data bounds extended by 2 or 3 times the bandwidth), rather than clipped at the data bounds. The challenge then is to make it easy for people who do want to clip tightly to the data bounds. Particularly if we think about the distribution changing over time, having to manually update the bounds for either of these cases seems awkward. I suppose we could define two special values of
Review comment: That would be dangerous, as we want the @alexcjohnson 's For consistency, I'm thinking about adding a
{
spanmode: 'soft',
span: [null, 10]
// where the null means pick the 'soft' default value
// i.e. data min minus 2 bandwidths
}
{
spanmode: 'hard',
span: [0, null]
// where the null means pick the 'hard' default value
// i.e. the data max.
}
We could also add
Review comment: Done in ad51966 |
||
}, | ||
side: { | ||
Review comment: nice! also
Review comment: As of ad51966, |
||
valType: 'enumerated', | ||
values: ['both', 'left', 'right'], | ||
dflt: 'both', | ||
role: 'info', | ||
editType: 'plot', | ||
description: [ | ||
'Determines which side of the position line the density function making up', | ||
'one half of a is plotting.', | ||
'Useful when comparing two violin traces under *overlay* mode, where one trace.' | ||
Review comment: where one trace what?
Review comment: Improved description in ad51966 |
||
].join(' ') | ||
}, | ||
|
||
// TODO update description | ||
points: boxAttrs.boxpoints, | ||
jitter: boxAttrs.jitter, | ||
pointpos: boxAttrs.pointpos, | ||
marker: boxAttrs.marker, | ||
text: boxAttrs.text, | ||
|
||
// TODO need attribute(s) similar to 'boxmean' to toggle lines for: | ||
// - mean | ||
// - median | ||
// - std | ||
// - quartiles | ||
|
||
line: { | ||
color: { | ||
valType: 'color', | ||
role: 'style', | ||
editType: 'style', | ||
description: 'Sets the color of line bounding the violin(s).' | ||
}, | ||
width: { | ||
valType: 'number', | ||
role: 'style', | ||
min: 0, | ||
dflt: 2, | ||
editType: 'style', | ||
description: 'Sets the width (in px) of line bounding the violin(s).' | ||
}, | ||
smoothing: scatterAttrs.line.smoothing, | ||
Review comment: Is
|
||
editType: 'plot' | ||
}, | ||
|
||
fillcolor: boxAttrs.fillcolor, | ||
hoveron: boxAttrs.hoveron | ||
}; |
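The `spanmode` / `span` proposal in the review thread above can be sketched as follows. This is only an illustration of the described behavior, not code from the PR: `resolveSpan` and its parameters are hypothetical names, and the padding of two bandwidths for the 'soft' mode is taken from the comment's example.

```javascript
// Hypothetical helper (not part of the PR): resolve the final [min, max]
// span from a spanmode, a partially specified span, and the sample stats.
function resolveSpan(spanmode, span, dataMin, dataMax, bandwidth) {
    // 'soft' extends the data bounds by 2 bandwidths (assumption taken from
    // the review comment); 'hard' clips at the data bounds.
    var pad = spanmode === 'soft' ? 2 * bandwidth : 0;
    var out = [dataMin - pad, dataMax + pad];

    if(Array.isArray(span)) {
        for(var i = 0; i < 2; i++) {
            // null/undefined means "use the default for this end"
            if(span[i] !== null && span[i] !== undefined) out[i] = span[i];
        }
    }
    return out;
}

var softSpan = resolveSpan('soft', [null, 10], 1, 8, 0.5);
var hardSpan = resolveSpan('hard', [0, null], 1, 8, 0.5);
```

With this shape, a user who only wants to lop off unphysical values below zero sets one end and leaves the other `null`, so the remaining bound keeps tracking the data as the distribution changes.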
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
/** | ||
* Copyright 2012-2017, Plotly, Inc. | ||
* All rights reserved. | ||
* | ||
* This source code is licensed under the MIT license found in the | ||
* LICENSE file in the root directory of this source tree. | ||
*/ | ||
|
||
'use strict'; | ||
|
||
var Lib = require('../../lib'); | ||
var boxCalc = require('../box/calc'); | ||
|
||
var kernels = { | ||
gaussian: function(v) { | ||
return (1 / Math.sqrt(2 * Math.PI)) * Math.exp(-0.5 * v * v); | ||
Review comment: We could support more: https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use
Review comment: Good thinking to make this extensible down the line. There would be something nice about using one of the polynomial kernels that goes smoothly to zero at a finite position, but to start, just gaussian seems fine. That's the ggplot2 default anyway, and I can't find anything to say which one seaborn uses.
Review comment: I added the epanechnikov kernel in ad51966
Review comment: ... and of course, I could add a few more if desired. Note that other kernels make the violins look a little less smooth than the gaussian. Perhaps this is why other libraries (e.g. seaborn and ggplot) only use the gaussian kernel for violin plots 🤔 |
||
} | ||
}; | ||
|
||
module.exports = function calc(gd, trace) { | ||
var cd = boxCalc(gd, trace); | ||
|
||
if(cd[0].t.empty) return cd; | ||
|
||
for(var i = 0; i < cd.length; i++) { | ||
var cdi = cd[i]; | ||
var vals = cdi.pts.map(extractVal); | ||
var len = vals.length; | ||
var span = trace.span || [cdi.min, cdi.max]; | ||
var dist = span[1] - span[0]; | ||
// sample standard deviation | ||
var ssd = Lib.stdev(vals, len - 1, cdi.mean); | ||
var bandwidthDflt = ruleOfThumbBandwidth(vals, ssd, cdi.q3 - cdi.q1); | ||
var bandwidth = trace.bandwidth || bandwidthDflt; | ||
var kde = makeKDE(vals, kernels.gaussian, bandwidth); | ||
// step that well covers the bandwidth and is multiple of span distance | ||
var n = Math.ceil(dist / (Math.min(bandwidthDflt, bandwidth) / 3)); | ||
var step = dist / n; | ||
|
||
cdi.density = new Array(n); | ||
cdi.violinMaxWidth = 0; | ||
|
||
for(var k = 0, t = span[0]; t < (span[1] + step / 2); k++, t += step) { | ||
var v = kde(t); | ||
cdi.violinMaxWidth = Math.max(cdi.violinMaxWidth, v); | ||
cdi.density[k] = {v: v, t: t}; | ||
} | ||
} | ||
|
||
return cd; | ||
}; | ||
|
||
// Default to Silverman's rule of thumb: | ||
// - https://en.wikipedia.org/wiki/Kernel_density_estimation#A_rule-of-thumb_bandwidth_estimator | ||
// - https://github.com/statsmodels/statsmodels/blob/master/statsmodels/nonparametric/bandwidths.py | ||
function ruleOfThumbBandwidth(vals, ssd, iqr) { | ||
var a = Math.min(ssd, iqr / 1.349); | ||
return 1.059 * a * Math.pow(vals.length, -0.2); | ||
} | ||
|
||
function makeKDE(vals, kernel, bandwidth) { | ||
var len = vals.length; | ||
var factor = 1 / (len * bandwidth); | ||
|
||
// don't use Lib.aggNums to skip isNumeric checks | ||
return function(x) { | ||
var sum = 0; | ||
for(var i = 0; i < len; i++) { | ||
sum += kernel((x - vals[i]) / bandwidth); | ||
} | ||
return factor * sum; | ||
}; | ||
} | ||
|
||
function extractVal(o) { return o.v; } |
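As a standalone sanity check of the approach in this file, the sketch below rebuilds a kernel density estimate outside of plotly.js and verifies numerically that it integrates to roughly 1 for both a gaussian and an Epanechnikov kernel (the latter is mentioned in the review thread; its exact form in ad51966 is not shown here, so the textbook definition is assumed). The `integrate` helper and the sample values are illustrative only.

```javascript
var kernels = {
    gaussian: function(v) {
        return Math.exp(-0.5 * v * v) / Math.sqrt(2 * Math.PI);
    },
    // Textbook Epanechnikov kernel (assumed form): zero outside |v| <= 1
    epanechnikov: function(v) {
        return Math.abs(v) <= 1 ? 0.75 * (1 - v * v) : 0;
    }
};

// Same structure as makeKDE in the diff: average of scaled kernels
function makeKDE(vals, kernel, bandwidth) {
    var len = vals.length;
    var factor = 1 / (len * bandwidth);
    return function(x) {
        var sum = 0;
        for(var i = 0; i < len; i++) {
            sum += kernel((x - vals[i]) / bandwidth);
        }
        return factor * sum;
    };
}

// Illustrative helper: trapezoid-rule integral of f over [a, b] in n steps
function integrate(f, a, b, n) {
    var h = (b - a) / n;
    var sum = 0.5 * (f(a) + f(b));
    for(var i = 1; i < n; i++) sum += f(a + i * h);
    return sum * h;
}

var vals = [1, 2, 2, 3, 5, 8];
var gaussArea = integrate(makeKDE(vals, kernels.gaussian, 1), -10, 20, 3000);
var epanArea = integrate(makeKDE(vals, kernels.epanechnikov, 1), -10, 20, 3000);
console.log(gaussArea.toFixed(3), epanArea.toFixed(3));
```

Because each kernel integrates to 1 and the estimate divides by `len * bandwidth`, both areas come out close to 1 whenever the integration range covers the sample plus a few bandwidths. Clipping the span to the data bounds drops the tails, which is why a clipped violin covers slightly less than unit area.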
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
/** | ||
* Copyright 2012-2017, Plotly, Inc. | ||
* All rights reserved. | ||
* | ||
* This source code is licensed under the MIT license found in the | ||
* LICENSE file in the root directory of this source tree. | ||
*/ | ||
|
||
'use strict'; | ||
|
||
var Lib = require('../../lib'); | ||
var Color = require('../../components/color'); | ||
|
||
var boxDefaults = require('../box/defaults'); | ||
var attributes = require('./attributes'); | ||
|
||
module.exports = function supplyDefaults(traceIn, traceOut, defaultColor, layout) { | ||
function coerce(attr, dflt) { | ||
return Lib.coerce(traceIn, traceOut, attributes, attr, dflt); | ||
} | ||
|
||
boxDefaults.handleSampleDefaults(traceIn, traceOut, coerce, layout); | ||
if(traceOut.visible === false) return; | ||
|
||
coerce('bandwidth'); | ||
coerce('scaleby'); | ||
coerce('span'); | ||
coerce('side'); | ||
|
||
coerce('line.color', (traceIn.marker || {}).color || defaultColor); | ||
coerce('line.width'); | ||
coerce('line.smoothing'); | ||
coerce('fillcolor', Color.addOpacity(traceOut.line.color, 0.5)); | ||
|
||
boxDefaults.handlePointsDefaults(traceIn, traceOut, coerce, {prefix: ''}); | ||
}; |
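The `scaleby` attribute coerced above takes the values *width*, *area* and *count*. One possible reading of those three modes is sketched below as per-violin multipliers applied to the density values. This is an assumption for illustration, not the PR's implementation: `violinScales` is a hypothetical name, the *area* mode relies on every density integrating to the same total, and the *count* mode follows the convention of normalizing each peak and then shrinking by relative sample size.

```javascript
// Hypothetical sketch: compute one density multiplier per violin.
// Each violin is described by its peak density and its sample count.
function violinScales(violins, scaleby) {
    var globalMax = 0;
    var maxCount = 0;
    violins.forEach(function(v) {
        globalMax = Math.max(globalMax, v.maxDensity);
        maxCount = Math.max(maxCount, v.count);
    });

    return violins.map(function(v) {
        switch(scaleby) {
            case 'area':
                // one shared factor keeps the (equal) areas equal
                return 1 / globalMax;
            case 'count':
                // peak-normalized, then scaled by relative sample size
                return (v.count / maxCount) / v.maxDensity;
            default:
                // 'width': every violin reaches the same maximum width
                return 1 / v.maxDensity;
        }
    });
}

var violins = [
    {maxDensity: 0.5, count: 10},
    {maxDensity: 0.25, count: 40}
];
var widthScales = violinScales(violins, 'width');
var areaScales = violinScales(violins, 'area');
var countScales = violinScales(violins, 'count');
```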
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
/** | ||
* Copyright 2012-2017, Plotly, Inc. | ||
* All rights reserved. | ||
* | ||
* This source code is licensed under the MIT license found in the | ||
* LICENSE file in the root directory of this source tree. | ||
*/ | ||
|
||
'use strict'; | ||
|
||
module.exports = { | ||
attributes: require('./attributes'), | ||
layoutAttributes: require('./layout_attributes'), | ||
supplyDefaults: require('./defaults'), | ||
supplyLayoutDefaults: require('./layout_defaults'), | ||
calc: require('./calc'), | ||
setPositions: require('../box/set_positions'), | ||
plot: require('./plot'), | ||
style: require('./style'), | ||
hoverPoints: require('../box/hover'), | ||
Review comment: Should violin show hover labels about the kernel density curve?
Review comment: It's tricky to figure out what someone would want to see labeled on the curve. The only thing I can think of that might be cool is a label that moves continuously (along the distribution axis) with the mouse, so you can look at a peak or a valley or something and see exactly what data value it's at... would help you read quantitative differences off several violins. Would a label like that get any value reported for the density? It wouldn't mean much on its own so could be omitted, though it would have meaning relative to other such values on the same or different violins.
Review comment: In my excitement at how beautiful this effect is, I missed an important piece: you should see both the kde and the y value in that label, so you can use it to read out the exact peak/valley locations for example. Should show just enough digits that each pixel is a different y value. |
||
selectPoints: require('../box/select'), | ||
|
||
moduleType: 'trace', | ||
name: 'violin', | ||
basePlotModule: require('../../plots/cartesian'), | ||
// TODO | ||
// - should maybe rename 'box' category to something more general | ||
categories: ['cartesian', 'symbols', 'oriented', 'box', 'showLegend'], | ||
Review comment: ... which for the most part are common places for
Review comment: Replaced by |
||
meta: { | ||
description: [ | ||
'In vertical (horizontal) violin plots,', | ||
'statistics are computed using `y` (`x`) values.', | ||
'By supplying an `x` (`y`) array, one violin per distinct x (y) value', | ||
'is drawn.', | ||
'If no `x` (`y`) {array} is provided, a single violin is drawn.', | ||
'That violin is then positioned with', | ||
'`name` or with `x0` (`y0`) if provided.' | ||
].join(' ') | ||
} | ||
}; |
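The hover-label suggestion in the thread above ("just enough digits that each pixel is a different y value") comes down to a small calculation. The sketch below is one way to do it, assuming a linear axis; `hoverDigits` and its arguments are hypothetical and not part of the PR.

```javascript
// Hypothetical helper: number of decimal places needed so that moving the
// cursor by one pixel along a linear axis always changes the displayed value.
// range: [min, max] in data units; pixelLength: axis length in pixels.
function hoverDigits(range, pixelLength) {
    var perPixel = Math.abs(range[1] - range[0]) / pixelLength;
    // need 10^(-digits) <= perPixel; never go below zero decimals
    return Math.max(0, Math.ceil(-Math.log10(perPixel)));
}

var fineDigits = hoverDigits([0, 10], 400);     // 0.025 data units per pixel
var coarseDigits = hoverDigits([0, 1000], 100); // 10 data units per pixel
var label = (3.14159).toFixed(fineDigits);
```

Rounding to a step no larger than the per-pixel increment guarantees that adjacent pixels map to different labels, without printing digits that carry no positional information.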
Review comment: Making sure violins are under boxes always.