Add 'cumulative' histogram 'mode' for CDF #1189

Merged
merged 8 commits into from
Jan 18, 2017
Changes from 7 commits
3 changes: 2 additions & 1 deletion src/plot_api/plot_api.js
@@ -1296,7 +1296,8 @@ function _restyle(gd, aobj, _traces) {
'tilt', 'tiltaxis', 'depth', 'direction', 'rotation', 'pull',
'line.showscale', 'line.cauto', 'line.autocolorscale', 'line.reversescale',
'marker.line.showscale', 'marker.line.cauto', 'marker.line.autocolorscale', 'marker.line.reversescale',
-'xcalendar', 'ycalendar'
+'xcalendar', 'ycalendar',
+'cumulative', 'cumulative.enabled', 'cumulative.direction', 'cumulative.currentbin'
];

for(i = 0; i < traces.length; i++) {
5 changes: 4 additions & 1 deletion src/plots/cartesian/axes.js
@@ -609,7 +609,10 @@ function autoShiftNumericBins(binStart, data, ax, dataMin, dataMax) {
// otherwise start half an integer down regardless of
// the bin size, just enough to clear up endpoint
// ambiguity about which integers are in which bins.
-else binStart -= 0.5;
+else {
+binStart -= 0.5;
+if(binStart + ax.dtick < dataMin) binStart += ax.dtick;
+}
}
else if(midcount < dataCount * 0.1) {
if(edgecount > dataCount * 0.3 ||
60 changes: 54 additions & 6 deletions src/traces/histogram/attributes.js
@@ -56,21 +56,69 @@ module.exports = {
'If **, the span of each bar corresponds to the number of',
'occurrences (i.e. the number of data points lying inside the bins).',

-'If *percent*, the span of each bar corresponds to the percentage',
-'of occurrences with respect to the total number of sample points',
-'(here, the sum of all bin area equals 100%).',
+'If *percent* / *probability*, the span of each bar corresponds to',
+'the percentage / fraction of occurrences with respect to the total',
+'number of sample points',
+'(here, the sum of all bin HEIGHTS equals 100% / 1).',

'If *density*, the span of each bar corresponds to the number of',
'occurrences in a bin divided by the size of the bin interval',
-'(here, the sum of all bin area equals the',
+'(here, the sum of all bin AREAS equals the',
'total number of sample points).',

-'If *probability density*, the span of each bar corresponds to the',
+'If *probability density*, the area of each bar corresponds to the',
'probability that an event will fall into the corresponding bin',
-'(here, the sum of all bin area equals 1).'
+'(here, the sum of all bin AREAS equals 1).'
].join(' ')
},

cumulative: {
Contributor Author

It would be nice to add one image mock. Maybe one that combines a currentbin: 'include' and currentbin: 'exclude' traces like in:

[image]

Collaborator

test image in 53b61aa

One thing this showed is that we need a way to harmonize autobins across traces, and that it needs to know about cumulative. To make this example work I needed to manually extend the bin range for the smaller trace, otherwise its CDF ended too soon. Actually, CDFs never end, really... so perhaps the even better thing to do would be to look at the axis range and draw bins out to the edge. Anyway, fixing this will be a bigger project so I'll make an issue for it rather than try to address it here.

Contributor Author

> Anyway, fixing this will be a bigger project so I'll make an issue for it rather than try to address it here.

That's fine. Thanks for the info!
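
For readers following the thread, here is a minimal sketch of the kind of mock described above. It assumes hypothetical sample data and gives both traces the same manually specified `xbins` range, which is one way to keep the smaller trace's CDF from ending too soon. The `cumulative` attributes are the ones added in this PR. The data values, the `cdfTrace` helper, and the `'graph'` div id are illustrative only and are not taken from the actual test image.

```javascript
// Hypothetical sample data (not the data used in the real mock).
var big = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6];
var small = [2, 3, 3, 4];

// Shared explicit bins so both CDFs span the same range.
var sharedBins = {start: 0.5, end: 6.5, size: 1};

// Hypothetical helper that builds one cumulative histogram trace.
function cdfTrace(x, currentbin) {
    return {
        type: 'histogram',
        x: x,
        autobinx: false,
        xbins: {
            start: sharedBins.start,
            end: sharedBins.end,
            size: sharedBins.size
        },
        cumulative: {enabled: true, currentbin: currentbin}
    };
}

var data = [cdfTrace(big, 'include'), cdfTrace(small, 'exclude')];

// In a page with plotly.js loaded and a div with id 'graph' (assumed):
// Plotly.newPlot('graph', data);
```

Turning off `autobinx` sidesteps the autobin harmonization problem rather than solving it, so this is only a workaround in the spirit of the comment above.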

enabled: {
valType: 'boolean',
dflt: false,
role: 'info',
description: [
'If true, display the cumulative distribution by summing the',
'binned values. Use the `direction` and `currentbin` attributes',
'to tune the accumulation method.',
'Note: in this mode, the *density* `histnorm` settings behave',
'the same as their equivalents without *density*:',
'** and *density* both rise to the number of data points, and',
'*probability* and *probability density* both rise to 1.'
].join(' ')
},

direction: {
valType: 'enumerated',
values: ['increasing', 'decreasing'],
dflt: 'increasing',
role: 'info',
description: [
'Only applies if cumulative is enabled.',
'If *increasing* (default) we sum all prior bins, so the result',
'increases from left to right. If *decreasing* we sum later bins',
'so the result decreases from left to right.'
].join(' ')
},

currentbin: {
valType: 'enumerated',
values: ['include', 'exclude', 'half'],
dflt: 'include',
role: 'info',
description: [
'Only applies if cumulative is enabled.',
'Sets whether the current bin is included, excluded, or has half',
'of its value included in the current cumulative value.',
'*include* is the default for compatibility with various other',
'tools, however it introduces a half-bin bias to the results.',
'*exclude* makes the opposite half-bin bias, and *half* removes',
'it.'
].join(' ')
}
},

autobinx: {
valType: 'boolean',
dflt: null,
6 changes: 4 additions & 2 deletions src/traces/histogram/bin_functions.js
@@ -47,8 +47,9 @@ module.exports = {
return v;
}
else if(size[n] > v) {
+var delta = v - size[n];
size[n] = v;
-return v - size[n];
+return delta;
}
}
return 0;
@@ -63,8 +64,9 @@ module.exports = {
return v;
}
else if(size[n] < v) {
+var delta = v - size[n];
size[n] = v;
-return v - size[n];
+return delta;
}
}
return 0;
100 changes: 91 additions & 9 deletions src/traces/histogram/calc.js
@@ -33,22 +33,38 @@ module.exports = function calc(gd, trace) {
trace.orientation === 'h' ? (trace.yaxis || 'y') : (trace.xaxis || 'x')),
maindata = trace.orientation === 'h' ? 'y' : 'x',
counterdata = {x: 'y', y: 'x'}[maindata],
-calendar = trace[maindata + 'calendar'];
+calendar = trace[maindata + 'calendar'],
+cumulativeSpec = trace.cumulative;

cleanBins(trace, pa, maindata);

// prepare the raw data
var pos0 = pa.makeCalcdata(trace, maindata);

// calculate the bins
-if((trace['autobin' + maindata] !== false) || !(maindata + 'bins' in trace)) {
-trace[maindata + 'bins'] = Axes.autoBin(pos0, pa, trace['nbins' + maindata], false, calendar);
+var binAttr = maindata + 'bins',
+binspec;
+if((trace['autobin' + maindata] !== false) || !(binAttr in trace)) {
+binspec = Axes.autoBin(pos0, pa, trace['nbins' + maindata], false, calendar);
+
+// adjust for CDF edge cases
+if(cumulativeSpec.enabled && (cumulativeSpec.currentbin !== 'include')) {
+if(cumulativeSpec.direction === 'decreasing') {
+binspec.start = pa.c2r(pa.r2c(binspec.start) - binspec.size);
+}
+else {
+binspec.end = pa.c2r(pa.r2c(binspec.end) + binspec.size);
+}
+}
+
-// copy bin info back to the source data.
-trace._input[maindata + 'bins'] = trace[maindata + 'bins'];
+// copy bin info back to the source and full data.
+trace._input[binAttr] = trace[binAttr] = binspec;
}
+else {
+binspec = trace[binAttr];
+}

-var binspec = trace[maindata + 'bins'],
-nonuniformBins = typeof binspec.size === 'string',
+var nonuniformBins = typeof binspec.size === 'string',
bins = nonuniformBins ? [] : binspec,
// make the empty bin array
i2,
@@ -59,8 +75,16 @@ module.exports = function calc(gd, trace) {
total = 0,
norm = trace.histnorm,
func = trace.histfunc,
-densitynorm = norm.indexOf('density') !== -1,
-extremefunc = func === 'max' || func === 'min',
+densitynorm = norm.indexOf('density') !== -1;
+
+if(cumulativeSpec.enabled && densitynorm) {
+// we treat "cumulative" like it means "integral" if you use a density norm,
+// which in the end means it's the same as without "density"
+norm = norm.replace(/ ?density$/, '');
+densitynorm = false;
+}
+
+var extremefunc = func === 'max' || func === 'min',
sizeinit = extremefunc ? null : 0,
binfunc = binFunctions.count,
normfunc = normFunctions[norm],
@@ -115,6 +139,10 @@ module.exports = function calc(gd, trace) {
if(doavg) total = doAvg(size, counts);
if(normfunc) normfunc(size, total, inc);

// after all normalization etc, now we can accumulate if desired
if(cumulativeSpec.enabled) cdf(size, cumulativeSpec.direction, cumulativeSpec.currentbin);


var serieslen = Math.min(pos.length, size.length),
cd = [],
firstNonzero = 0,
@@ -142,3 +170,57 @@ module.exports = function calc(gd, trace) {

return cd;
};

function cdf(size, direction, currentbin) {
var i,
vi,
prevSum;

function firstHalfPoint(i) {
prevSum = size[i];
size[i] /= 2;
}

function nextHalfPoint(i) {
vi = size[i];
size[i] = prevSum + vi / 2;
prevSum += vi;
}

if(currentbin === 'half') {

if(direction === 'increasing') {
firstHalfPoint(0);
for(i = 1; i < size.length; i++) {
nextHalfPoint(i);
}
}
else {
firstHalfPoint(size.length - 1);
for(i = size.length - 2; i >= 0; i--) {
nextHalfPoint(i);
}
}
}
else if(direction === 'increasing') {
for(i = 1; i < size.length; i++) {
size[i] += size[i - 1];
}

// 'exclude' is identical to 'include' just shifted one bin over
if(currentbin === 'exclude') {
size.unshift(0);
size.pop();
}
}
else {
for(i = size.length - 2; i >= 0; i--) {
size[i] += size[i + 1];
}

if(currentbin === 'exclude') {
size.push(0);
size.shift();
}
}
}
6 changes: 6 additions & 0 deletions src/traces/histogram/defaults.js
@@ -27,6 +27,12 @@ module.exports = function supplyDefaults(traceIn, traceOut, defaultColor, layout
var x = coerce('x'),
y = coerce('y');

var cumulative = coerce('cumulative.enabled');
if(cumulative) {
coerce('cumulative.direction');
coerce('cumulative.currentbin');
}

coerce('text');

var orientation = coerce('orientation', (y && !x) ? 'h' : 'v'),
5 changes: 4 additions & 1 deletion test/jasmine/tests/axes_test.js
@@ -1809,7 +1809,7 @@ describe('Test axes', function() {
);

expect(out).toEqual({
-start: -0.5,
+start: 0.5,
end: 4.5,
size: 1
});
@@ -1822,6 +1822,9 @@ describe('Test axes', function() {
2
);

// when size > 1 with all integers, we want the starting point to be
// a half integer below the round number a tick would be at (in this case 0)
// to approximate the half-open interval [) that's commonly used.
expect(out).toEqual({
start: -0.5,
end: 5.5,
105 changes: 105 additions & 0 deletions test/jasmine/tests/histogram_test.js
@@ -246,5 +246,110 @@ describe('Test histogram', function() {
]);
});

describe('cumulative distribution functions', function() {
var base = {
x: [0, 5, 10, 15, 5, 10, 15, 10, 15, 15],
y: [2, 2, 2, 14, 6, 6, 6, 10, 10, 2]
};

it('makes the right base histogram', function() {
var baseOut = _calc(base);
expect(baseOut).toEqual([
{b: 0, p: 2, s: 1},
{b: 0, p: 7, s: 2},
{b: 0, p: 12, s: 3},
{b: 0, p: 17, s: 4},
]);
});

var CDFs = [
{p: [2, 7, 12, 17], s: [1, 3, 6, 10]},
{
direction: 'decreasing',
p: [2, 7, 12, 17], s: [10, 9, 7, 4]
},
{
currentbin: 'exclude',
p: [7, 12, 17, 22], s: [1, 3, 6, 10]
},
{
direction: 'decreasing', currentbin: 'exclude',
p: [-3, 2, 7, 12], s: [10, 9, 7, 4]
},
{
currentbin: 'half',
p: [2, 7, 12, 17, 22], s: [0.5, 2, 4.5, 8, 10]
},
{
direction: 'decreasing', currentbin: 'half',
p: [-3, 2, 7, 12, 17], s: [10, 9.5, 8, 5.5, 2]
},
{
direction: 'decreasing', currentbin: 'half', histnorm: 'percent',
p: [-3, 2, 7, 12, 17], s: [100, 95, 80, 55, 20]
},
{
currentbin: 'exclude', histnorm: 'probability',
p: [7, 12, 17, 22], s: [0.1, 0.3, 0.6, 1]
},
{
// behaves the same as without *density*
direction: 'decreasing', currentbin: 'half', histnorm: 'density',
Contributor Author

Looking good!

p: [-3, 2, 7, 12, 17], s: [10, 9.5, 8, 5.5, 2]
},
{
// behaves the same as without *density*, only *probability*
direction: 'decreasing', currentbin: 'half', histnorm: 'probability density',
p: [-3, 2, 7, 12, 17], s: [1, 0.95, 0.8, 0.55, 0.2]
},
{
currentbin: 'half', histfunc: 'sum',
p: [2, 7, 12, 17, 22], s: [1, 6, 19, 44, 60]
},
{
currentbin: 'half', histfunc: 'sum', histnorm: 'probability',
p: [2, 7, 12, 17, 22], s: [0.5 / 30, 0.1, 9.5 / 30, 22 / 30, 1]
},
{
direction: 'decreasing', currentbin: 'half', histfunc: 'max', histnorm: 'percent',
p: [-3, 2, 7, 12, 17], s: [100, 3100 / 32, 2700 / 32, 1900 / 32, 700 / 32]
},
{
direction: 'decreasing', currentbin: 'half', histfunc: 'min', histnorm: 'density',
p: [-3, 2, 7, 12, 17], s: [8, 7, 5, 3, 1]
},
{
currentbin: 'exclude', histfunc: 'avg', histnorm: 'probability density',
p: [7, 12, 17, 22], s: [0.1, 0.3, 0.6, 1]
}
Contributor Author

It might be nice to test cumulative: true with other histfunc and histnorm settings.

Collaborator

Another good call by @etpinard 🌮 (and see also #1189 (comment))

What should we do with cumulative enabled and histnorm='density' or 'probability density'? As the code stands, CDFs using 'density' would rise to N/binSize (# samples / width of each bin) and 'probability density' would rise to 1/binSize. That seems useless and confusing, so I'd propose to interpret "cumulative" to mean an integral in these cases, ie 'density' would rise to N and 'probability density' would rise to 1, which then means in CDF mode these are equivalent to histnorm='' and 'probability' respectively.

I don't think there's anything special to do based on histfunc - some of these would also give strange results, but then the user is clearly asking for something strange.

Thoughts on any of this?

Member

> I don't think there's anything special to do based on histfunc - some of these would also give strange results, but then the user is clearly asking for something strange.

I can see this being used in time series CDFs. Think payments over time: bin by date and then cumulatively sum by payment amount

Contributor Author

> I'd propose to interpret "cumulative" to mean an integral in these cases, ie 'density' would rise to N and 'probability density' would rise to 1

I agree 100% here.

Collaborator

@chriddyp

> I can see this being used in time series CDFs. Think payments over time: bin by date and then cumulatively sum by payment amount

Absolutely - and that will work just fine without modification (tests to come). I was just saying I don't think there's anything that needs altering based on histfunc, like what I'm planning to do for histnorm.

Collaborator

ha! turns out we didn't have any tests with histfunc and histnorm together, and max/min were broken. Fixed in c12b7cf and tests for all of this (cumulative + histfunc + histnorm all in one go!) in 4d02af7
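
The `histnorm` handling agreed on in this thread can be sketched as a small standalone function. This is only an illustration of the rule, assuming a hypothetical helper name `effectiveHistnorm`; the regular expression is the one used in the calc.js change in this PR, but the real code rewrites `norm` inline rather than calling a helper.

```javascript
// Hypothetical helper: returns the normalization actually applied
// when cumulative mode is on. With a density norm, "cumulative" is
// treated as an integral, so the " density" suffix is dropped.
function effectiveHistnorm(histnorm, cumulativeEnabled) {
    if (cumulativeEnabled && histnorm.indexOf('density') !== -1) {
        return histnorm.replace(/ ?density$/, '');
    }
    return histnorm;
}

console.log(JSON.stringify(effectiveHistnorm('density', true)));
// prints ""
console.log(JSON.stringify(effectiveHistnorm('probability density', true)));
// prints "probability"
console.log(JSON.stringify(effectiveHistnorm('probability density', false)));
// prints "probability density"
```

So in CDF mode `'density'` behaves like `''` and `'probability density'` behaves like `'probability'`, which is what the density test cases in this file check.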

];

CDFs.forEach(function(CDF) {
var p = CDF.p,
s = CDF.s;

it('handles direction=' + CDF.direction + ', currentbin=' + CDF.currentbin +
', histnorm=' + CDF.histnorm + ', histfunc=' + CDF.histfunc, function() {
var traceIn = Lib.extendFlat({}, base, {
cumulative: {
enabled: true,
direction: CDF.direction,
currentbin: CDF.currentbin
},
histnorm: CDF.histnorm,
histfunc: CDF.histfunc
});
var out = _calc(traceIn);

expect(out.length).toBe(p.length);
out.forEach(function(outi, i) {
expect(outi.p).toBe(p[i]);
expect(outi.s).toBeCloseTo(s[i], 6);
expect(outi.b).toBe(0);
});
});
});
});

});
});
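
As a reading aid for the test expectations above, the accumulation performed by the `cdf` helper in calc.js can be sketched as a pure function. This is a simplified restatement under stated assumptions, not the plotly.js implementation: the name `cumulate` is hypothetical, it returns a new array instead of mutating `size` in place, and it expects the caller to have already padded the counts with the extra empty bin that calc.js adds for `exclude` and `half`.

```javascript
// Hypothetical pure-function version of the accumulation step.
// counts: per-bin values after normalization
// direction: 'increasing' (default) or 'decreasing'
// currentbin: 'include' (default), 'exclude' or 'half'
function cumulate(counts, direction, currentbin) {
    var n = counts.length;
    var out = new Array(n);
    var running = 0; // sum of all bins already visited
    var increasing = direction !== 'decreasing';

    for (var k = 0; k < n; k++) {
        // walk left-to-right or right-to-left
        var i = increasing ? k : n - 1 - k;
        var v = counts[i];

        if (currentbin === 'exclude') out[i] = running;
        else if (currentbin === 'half') out[i] = running + v / 2;
        else out[i] = running + v;

        running += v;
    }
    return out;
}

// Base histogram from the tests: counts 1, 2, 3, 4.
console.log(cumulate([1, 2, 3, 4], 'increasing', 'include'));
// [1, 3, 6, 10]
console.log(cumulate([1, 2, 3, 4], 'decreasing', 'include'));
// [10, 9, 7, 4]
// 'half' with one padded empty bin at the end, as calc.js adds:
console.log(cumulate([1, 2, 3, 4, 0], 'increasing', 'half'));
// [0.5, 2, 4.5, 8, 10]
// decreasing 'half' with the padded bin at the start:
console.log(cumulate([0, 1, 2, 3, 4], 'decreasing', 'half'));
// [10, 9.5, 8, 5.5, 2]
```

With `exclude` and the padded input `[1, 2, 3, 4, 0]` this sketch yields `[0, 1, 3, 6, 10]`. The test for that case lists only four points starting at `p: 7`, which suggests the leading zero bin is dropped later in `calc`.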