Violin plots #2116
... in preparation for violin
... in preparation for violin
src/traces/violin/attributes.js
Outdated
description: [
    'Sets the bandwidth used to compute the kernel density estimate.',
    'By default, the bandwidth is determined by Silverman\'s rule of thumb.'
].join(' ')
No hard default for bandwidth. This rule of thumb, which depends on the sample length and standard deviation, seems pretty popular in other libraries.
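For reference, here is a minimal sketch of one common simplified form of Silverman's rule, h = 1.06 * sd * n^(-1/5). The helper names `sampleStdDev` and `silvermanBandwidth` are hypothetical, and the constant and the exact form used in this PR may differ (some variants also take the interquartile range into account).

```javascript
// Sample standard deviation (n - 1 in the denominator).
function sampleStdDev(values) {
    var n = values.length;
    var mean = values.reduce(function(a, b) { return a + b; }, 0) / n;
    var ss = values.reduce(function(a, b) {
        return a + (b - mean) * (b - mean);
    }, 0);
    return Math.sqrt(ss / (n - 1));
}

// Simplified Silverman's rule of thumb (an assumption, not necessarily
// the form implemented in this PR): h = 1.06 * sd * n^(-1/5).
function silvermanBandwidth(values) {
    var n = values.length;
    // A bandwidth is undefined for fewer than two points; return 0 here.
    if (n < 2) return 0;
    return 1.06 * sampleStdDev(values) * Math.pow(n, -0.2);
}
```

The bandwidth shrinks slowly as the sample grows and scales linearly with the spread of the data, which is why no hard default value makes sense.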
src/traces/violin/attributes.js
Outdated
description: [
    'Sets the span in data space for which the density function will be computed.',
    'By default, the span goes from the minimum value to maximum value in the sample.'
].join(' ')
To address @alexcjohnson's concern (from a private convo):
mmm, looking through example images I don’t actually see the behavior I was imagining… might be something to play with at some point but not for now, unless some standard existing package does it.
One thing I see a lot in examples that seems to me a bad idea is truncating the violin at a value where it has a finite width - particularly if that value is actually a data value. Unfortunately one of the examples I see that does exactly that is on our own site… https://plot.ly/python/basic-statistics/ out[5]
unless there’s some physical limit to the variable… that seems like it would justify doing this, but aren’t you still throwing away area (and therefore visual weight) from the points at the end?
at any rate, seems like if one is going to truncate like that it should be done explicitly, and should not be the default.
Ideally I’d like you to only be able to truncate by value (per the argument about physical limits) rather than “truncate at the ends of the data range” but I imagine people will complain if they can’t do both…
I love that this is two numbers (or date strings, I guess), one for each end - that's way more flexible than seaborn's one number or ggplot2's boolean. I also love that it's data values, rather than a delta from the ends of the distribution. Between the two of those you could do what seems to be really the right thing with seaborn's cut example and lop off unphysical values < 0 but not restrict the upper end.
Ideally I would prefer if the default were no clipping at all (which, as discussed, probably means data bounds extended by 2 or 3 times the bandwidth), rather than clipped at the data bounds. The challenge then is to make it easy for people who do want to clip tightly to the data bounds - particularly if we think about the distribution changing over time, having to manually update the bounds for either of these cases seems awkward.
I suppose we could define two special values of span[i] - one for "clip tightly to this end of the data" and another for "do not clip this end"? 'tight' and 'loose' perhaps? Alternatively there could be another attribute for this, which would be nice as we wouldn't need to mix numbers and strings (or whatever the special value is) but it seems tricky to cover all cases this way, like if you want the low end tight and the upper end loose.
I suppose we could define two special values of span[i]
That would be dangerous, as we want the span items to be set in d coordinates. So a violin trace with 'tight' and 'loose' categories would totally break this.
@alexcjohnson's 'tight' and 'loose' suggestions made me think of #1876, where the specs for axis.bounds and axis.boundsmode are written down.
For consistency, I'm thinking about adding a spanmode attribute alongside span with possible values 'hard', 'soft' and 'manual':
{
spanmode: 'soft',
span: [null, 10]
// where the null means pick the 'soft' default value
// i.e. data min minus 2 bandwidths
}
{
spanmode: 'hard',
span: [0, null]
// where the null means pick the 'hard' default value
// i.e. the data max.
}
We could also add 'tight-soft' and 'soft-tight' spanmode values to allow users to pick hard and soft defaults for each end without having to set span at all.
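To make the proposal concrete, here is a rough sketch of how null entries in span could be filled in from spanmode. The helper `resolveSpan` and its signature are hypothetical, it only covers the 'hard' and 'soft' modes described above, and the factor of 2 bandwidths is taken from the example comments rather than from any final implementation.

```javascript
// Hypothetical helper: returns the [lower, upper] bounds over which the
// density would be computed. A null (or missing) span entry falls back to
// the default implied by spanmode.
function resolveSpan(spanmode, span, dataMin, dataMax, bandwidth) {
    var defaults;
    if (spanmode === 'hard') {
        // clip tightly at the data bounds
        defaults = [dataMin, dataMax];
    } else if (spanmode === 'soft') {
        // extend two bandwidths past the data bounds
        defaults = [dataMin - 2 * bandwidth, dataMax + 2 * bandwidth];
    } else {
        throw new Error('spanmode not covered by this sketch: ' + spanmode);
    }

    span = span || [null, null];

    return defaults.map(function(d, i) {
        var v = span[i];
        return (v === null || v === undefined) ? d : v;
    });
}
```

With data in [1, 8] and a bandwidth of 0.5, `resolveSpan('soft', [null, 10], 1, 8, 0.5)` gives `[0, 10]` and `resolveSpan('hard', [0, null], 1, 8, 0.5)` gives `[0, 8]`, matching the two examples above.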
Done in ad51966
src/traces/violin/attributes.js
Outdated
    'By default, the bandwidth is determined by Silverman\'s rule of thumb.'
].join(' ')
},
scaleby: {
Inspired by seaborn's scale attribute.
Replaced by scalegroup (taken from pie traces) and scalemode in ad51966
src/traces/violin/attributes.js
Outdated
    'By default, the span goes from the minimum value to maximum value in the sample.'
].join(' ')
},
side: {
Nice! Also 'top' and 'bottom' for horizontal violins.
As of ad51966, side is now an enumerated attribute with values 'both', 'negative' and 'positive', so that the same set of values works with horizontal and vertical violins.
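As an illustration only, the three side values could map to asymmetric half-widths around the position line along these lines. The helper name `sideExtents` and the returned shape are hypothetical and are not taken from the commit.

```javascript
// Hypothetical helper: how far the violin extends on the negative and
// positive side of its position line, whatever the orientation.
function sideExtents(side, halfWidth) {
    switch (side) {
        case 'negative':
            return {neg: halfWidth, pos: 0};
        case 'positive':
            return {neg: 0, pos: halfWidth};
        default:
            // 'both'
            return {neg: halfWidth, pos: halfWidth};
    }
}
```

Because the values refer to the sign along the position axis rather than to left/right or top/bottom, the same mapping serves vertical and horizontal violins.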
src/traces/box/plot.js
Outdated
@@ -161,7 +161,8 @@ function plotPoints(sel, plotinfo, trace, t) {
     var bdPos = t.bdPos;
     var bPos = t.bPos;

-    var mode = trace.boxpoints;
+    // TODO ... unfortunately
+    var mode = trace.boxpoints || trace.points;
We might want to rename boxpoints -> points, as well as boxmean (maybe to showmean + showsd?)
I thought about combining the display attributes for the box, mean line and standard deviation line inside the violins in one flaglist attribute, e.g. stats: 'box+mean+sd' or innermode: /* */, but when considering #1774 it might be best to have separate booleans, e.g. showbox, showmean, showsd, to keep things consistent with features like axis lines and grids.
So, unless someone opposes:
showmeanline: true,
meanlinecolor: 'blue',
meanlinewidth: 1,
meanlinedash: 'dot',
showbox: true,
boxlinecolor: 'black',
boxlinewidth: 2,
boxfillcolor: 'red'
src/traces/violin/index.js
Outdated
basePlotModule: require('../../plots/cartesian'),
// TODO
// - should maybe rename 'box' category to something more general
categories: ['cartesian', 'symbols', 'oriented', 'box', 'showLegend'],
... which for the most part are common places for violin traces too.
Replaced by box-violin in ea43b25
module.exports = {
    violinmode: boxLayoutAttrs.boxmode,
    violingap: boxLayoutAttrs.boxgap,
    violingroupgap: boxLayoutAttrs.boxgroupgap
Important: Should box and violin share the same gap, groupgap and even mode attributes?
In other words, should violin be thought of as a different mode for box?
I could see users wanting to use the gap attributes on violins the same way they would with boxes. imo it seems pretty intuitive for these charts to behave the same way and have the same grouping attributes available.
Violin-specific violinmode, violingap and violingroupgap are implemented in 7769f20
src/traces/violin/calc.js
Outdated
var kernels = {
    gaussian: function(v) {
        return (1 / Math.sqrt(2 * Math.PI)) * Math.exp(-0.5 * v * v);
We could support more:
https://en.wikipedia.org/wiki/Kernel_(statistics)#Kernel_functions_in_common_use
Good thinking to make this extensible down the line. There would be something nice about using one of the polynomial kernels that goes smoothly to zero at a finite position but to start just gaussian seems fine. That's the ggplot2 default anyway, and I can't find anything to say which one seaborn uses.
I added the epanechnikov kernel in ad51966
... and of course, I could add a few more if desired. Note that other kernels make the violins look a little less smooth than the gaussian. Perhaps this is why other libraries (e.g. seaborn and ggplot) only use the gaussian kernel for violin plots 🤔
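For readers following along, here is a small self-contained sketch of the two kernels discussed in this thread plugged into a plain kernel density estimate. The gaussian kernel matches the snippet under review; the epanechnikov formula is the textbook one, and the `makeKDE` helper is hypothetical rather than the code in calc.js.

```javascript
var kernels = {
    // standard normal density
    gaussian: function(v) {
        return (1 / Math.sqrt(2 * Math.PI)) * Math.exp(-0.5 * v * v);
    },
    // polynomial kernel that reaches exactly zero at |v| = 1
    epanechnikov: function(v) {
        return Math.abs(v) <= 1 ? 0.75 * (1 - v * v) : 0;
    }
};

// Hypothetical helper: returns f(x) = 1 / (n * h) * sum_i K((x - x_i) / h)
function makeKDE(sample, kernel, bandwidth) {
    var n = sample.length;
    return function(x) {
        var sum = 0;
        for (var i = 0; i < n; i++) {
            sum += kernel((x - sample[i]) / bandwidth);
        }
        return sum / (n * bandwidth);
    };
}
```

The epanechnikov estimate is exactly zero more than one bandwidth away from every sample point, whereas the gaussian estimate only decays towards zero, which is part of why the two produce visibly different violin outlines.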
traceLayerClasses: [
    'imagelayer',
    'maplayer',
    'barlayer',
    'carpetlayer',
    'violinlayer',
Making sure violins are under boxes always.
src/traces/violin/index.js
Outdated
setPositions: require('../box/set_positions'),
plot: require('./plot'),
style: require('./style'),
hoverPoints: require('../box/hover'),
Should violin show hover labels about the kernel density curve?
It's tricky to figure out what someone would want to see labeled on the curve. The only thing I can think of that might be cool is a label that moves continuously (along the distribution axis) with the mouse, so you can look at a peak or a valley or something and see exactly what data value it's at... would help you read quantitative differences off several violins. Would a label like that get any value reported for the density? It wouldn't mean much on its own so could be omitted, though it would have meaning relative to other such values on the same or different violins.
In my excitement at how beautiful this effect is, I missed an important piece: you should see both the kde and the y value in that label, so you can use it to read out the exact peak/valley locations for example. Should show just enough digits that each pixel is a different y value.
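One way to read "just enough digits that each pixel is a different y value" is sketched below: take the data range covered by the axis, divide by its length in pixels, and keep enough decimals to resolve that step. The helper name `decimalsPerPixel` and its arguments are hypothetical, and it assumes a linear axis.

```javascript
// Hypothetical helper: number of decimals needed so that values one pixel
// apart on a linear axis are formatted differently.
function decimalsPerPixel(rangeMin, rangeMax, lengthPx) {
    var perPixel = Math.abs(rangeMax - rangeMin) / lengthPx;
    // degenerate range or axis length: no decimals
    if (!(perPixel > 0) || !isFinite(perPixel)) return 0;
    return Math.max(0, Math.ceil(-Math.log10(perPixel)));
}

// Example: an axis spanning 0 to 10 drawn over 400 px moves 0.025 per pixel,
// so two decimals are enough to tell neighbouring pixels apart.
var decimals = decimalsPerPixel(0, 10, 400);
var label = (1.23456).toFixed(decimals);
```

Zooming in shrinks the per-pixel step, so the label would gain digits automatically; zooming out drops them again.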
module.exports = {
    supplyDefaults: supplyDefaults,
    handleSampleDefaults: handleSampleDefaults,
    handlePointsDefaults: handlePointsDefaults
I made a separate reorganisation commit, to show a new way to factor out common trace module blocks. I think this method is a little more consistent with ES6 modules. For example here, supplyDefaults would be the default export.
src/traces/violin/attributes.js
Outdated
description: [
    'Determines which side of the position line the density function making up',
    'one half of a is plotting.',
    'Useful when comparing two violin traces under *overlay* mode, where one trace.'
where one trace what?
Improved description in ad51966
src/traces/violin/attributes.js
Outdated
    editType: 'style',
    description: 'Sets the width (in px) of line bounding the violin(s).'
},
smoothing: scatterAttrs.line.smoothing,
Is smoothing needed? Seems weird, especially since we're just kind of making up points for the path.
line.smoothing was 🔪 in ad51966
src/traces/violin/plot.js
Outdated
    x: d.pos + bPos - (density[i].v / scale),
    y: density[i].t
});
}
If the ends are cut off (and even if they aren't, if using the gaussian kernel that doesn't really go to zero) we should make two separate smoothed lines and connect them with straight segments, rather than one long curved line. Should be able to do this just with Drawing.smoothopen, tweaking the first and/or last characters, and concatenating them.
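A rough sketch of the concatenation idea follows. Since Drawing.smoothopen is internal to plotly.js, the `openPath` function here is only a stand-in that emits a plain polyline starting with 'M'; the point is the string surgery that joins the two sides, and the helper name `violinPath` is hypothetical.

```javascript
// Stand-in for a smoothed open-path generator (NOT Drawing.smoothopen):
// returns an SVG path string starting with 'M'.
function openPath(pts) {
    return 'M' + pts.map(function(p) { return p[0] + ',' + p[1]; }).join('L');
}

// Builds a closed outline from the two sides of a violin, each given as an
// array of [x, y] points ordered from the low end to the high end.
function violinPath(negPts, posPts) {
    var neg = openPath(negPts);
    // walk the positive side back down so the outline is continuous
    var pos = openPath(posPts.slice().reverse());
    // swapping the second path's leading 'M' for 'L' joins the two curves
    // with a straight segment; 'Z' closes the far end with another one
    return neg + 'L' + pos.slice(1) + 'Z';
}
```

For a violin truncated at a finite width, `violinPath([[-0.5, 0], [-1, 1], [-0.5, 2]], [[0.5, 0], [1, 1], [0.5, 2]])` gives `'M-0.5,0L-1,1L-0.5,2L0.5,2L1,1L0.5,0Z'`, with flat caps at both ends instead of a curve wrapping around them.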
Better drawing algo in ad51966
- to make them reusable in violin/plot.js - add support for asymmetric bdPos to support one-sided violins
- to reuse in violin hover
- totally independent from their box* counterpart
- add 'kernel' enumerated - implement 'scalegroup' & 'scalemode' (instead of 'scaleby') - implement 'spanmode' & 'span' - make 'side' an enumerated w/ vals 'both', 'negative' and 'positive' to be general enough for horizontal and vertical violin - 🔪 'line.smoothing'
- to be used to find pt on violin bezier curves
- i.e. showinnerbox & showmeanline and friends.
- in preparation for violin 'kde' hover handling
- with three flags: 'violins', 'points' (similar to box traces) and 'kde', which shows the point on the kde line along with the line that crosses the hovered-on violin
@alexcjohnson tagging this thing as I still need to fill in a few descriptions ✏️ , add a few tests (especially for one-sided violins) 🔒 and I think something is off in the way I'm computing |
- as non-gaussian kernels may require us to make soft spanmode bounds kernel-dependent, which will require some trial-and-error - as most other libraries don't support non-gaussian kernels, let's defer this.
- use nested attribute style for box and meanline settings - update test and mocks
- that 🔒 custom bandwidth and some box style settings.
Beautiful! 🎵 Now we really have some music for our 💃 !
Violin plots are coming to plotly.js 🎉 🎻 🎉
Python users can already create violins using @cldougl's
create_violin
figure factory (example); this PR will bring violin plots to all plotly.js consumers with an API very similar to the box
trace type.
IMPORTANT: After the first push of 2017/10/24, this PR is very much a WIP. Several API decisions remain to be made. See the first few comments on commit 3438eae for more info.