Allow plotly objects to use aggregated / precomputed statistics. #242

nehalecky · 2016-02-08T20:16:45Z

Hi Plotly team, thanks for an amazing library.

I work with datasets much too large to be processed by the plotly client, but I would still love to use the beautiful visualizations that plotly produces in my analysis. As such, I am looking for a way to pass aggregate or precomputed statistics to Plotly graph objects, bypassing plotly's own computation. A good example of this need is this PR for aggregate values for a box plot.

From the discussion there ...

"One alternative would be to wait for the completion of modular milestone where requiring trace module individually will make it possible to override the box calc step."

...it seems that this prospect is closer to being possible, as the modularization milestone is complete. Congrats on this!

There has been a lot of nice discussion on the modularization topic, including:

a well-written blog post on modularization architecture and design decisions,
the README section on Modules, with info on packaging plotly components to your own needs.

However, I am looking for any guidance on where I can begin to understand, build and eventually access an aggregate stats API on plotly graph objects, and in particular, expose it to the plotly.py Python client. I'm not really familiar with the plotly architecture and have minimal JS chops, but I can hold my own in Python (I think), and would love to help out. :)

Any references that would be helpful and/or pointers in this endeavor? Much appreciated, thanks again.

etpinard · 2016-02-09T14:29:53Z

@nehalecky thanks for the feedback.

Let me sketch out the solution for you.

The idea consists of making a custom trace module, with an index file similar to the one in src/traces/box/index.js but with a custom calc step (as first pointed out in #138):

// in custom-box.js

var Box = {};

Box.attributes = require('plotly.js/src/traces/box/attributes');
Box.layoutAttributes = require('plotly.js/src/traces/box/layout_attributes');
Box.supplyDefaults = require('plotly.js/src/traces/box/defaults');
Box.supplyLayoutDefaults = require('plotly.js/src/traces/box/layout_defaults');

Box.calc = function(gd, trace) {
  // Do something in here that generates the calculated data
  //
  // See https://github.com/plotly/plotly.js/blob/master/src/traces/box/calc.js
  // for an example
  //
  // Note also that all keys sent to `Plotly.plot()` in the data or layout argument 
  // make it through to the calc step in 
  // `gd.data[i]['name of the attribute']` or 
  // `gd.layout['name of the attributes']`, 
  // no need to hijack the defaults step as in #138 
};

Box.setPositions = require('plotly.js/src/traces/box/set_positions');
Box.plot = require('plotly.js/src/traces/box/plot');
Box.style = require('plotly.js/src/traces/box/style');
Box.hoverPoints = require('plotly.js/src/traces/box/hover');

Box.moduleType = 'trace';
Box.name = 'customBox';
Box.basePlotModule = require('plotly.js/src/plots/cartesian');
Box.categories = ['cartesian', 'symbols', 'oriented', 'box', 'showLegend'];
Box.meta = {
    description: 'my custom box trace'
};

module.exports = Box;

Then, import this custom module along with the plotly core module and register it:

// in custom-plotly.js

var customBox = require('./custom-box');
var Plotly = require('plotly.js/lib/core');

Plotly.register(customBox);

module.exports = Plotly;

and then either bundle up custom-plotly.js directly or import it elsewhere in your app.

etpinard · 2016-02-09T14:31:21Z

and then use your new trace module as:

Plotly.plot('graph', [{type: 'customBox', x: [1,1,1,2,2,2,3,3,3,3]}]);

nehalecky · 2016-02-10T01:36:35Z

Thanks for the detailed feedback, @etpinard. :)

This seems like a clear way to go about creating a custom trace. I like it.

While creating custom traces is powerful, I'm proposing something a bit more general: allowing computations (those captured in a trace's .calc attribute) to take place outside of the plotly core, across all trace modules in plotly. In general, this conforms with the workflows and interfaces for basic chart types (e.g., the keys and values expected for pie charts); it just makes it part of the standard plotly interface. I'll try to explain more... :)

In the sequence of object definitions that take place in a plotly trace module index (such as what you've proposed in custom-box.js), I believe that there should be a separation in functionality that computes aggregate or summary statistics and those that set these data on any particular trace. This would require an additional attribute in the index, such as .setData, which might reference a function responsible for mapping such values to the data structure expected (e.g., the cdi var in the case of the boxplot object) for the trace.

This could be exposed in the primary interface by passing through key-value pairs that conform to those required by the plot, or it could all be wrapped up in a higher-level object. For instance, in the case of Python client, this might be allowing the go.Box() object to accept precomputed stats as arguments (e.g. min, max, mean, sd, q1, med, ...) upon instantiation. Passing a nested dictionary (e.g., a plot_data argument) could be nice and efficient as well.

Ultimately, this would allow any process to compute the plot data needed for a particular trace and then hand it off to plotly to do its awesomeness. I hope that was clear. Please let me know if you have any additional thoughts / critique, and thanks again for the thoughtful discussion. :)
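
To make the "compute elsewhere, hand off to plotly" idea concrete, here is a minimal sketch of an upstream process that folds a large dataset, chunk by chunk, into a small summary. It uses Welford's online update for the mean and variance, so no chunk has to stay in memory. The helper names (createRunningStats, updateRunningStats, finalizeRunningStats) are hypothetical, and the output keys simply reuse the names floated above (min, max, mean, sd); they are not a confirmed plotly attribute set. Quartiles are left out because they cannot be computed exactly in a single streaming pass.

```javascript
// Hypothetical helpers: fold chunks of a large sample into running statistics.
function createRunningStats() {
  return { count: 0, mean: 0, m2: 0, min: Infinity, max: -Infinity };
}

// Welford's online update: numerically stable mean/variance, one value at a time.
function updateRunningStats(state, chunk) {
  for (var i = 0; i < chunk.length; i++) {
    var v = chunk[i];
    state.count += 1;
    var delta = v - state.mean;
    state.mean += delta / state.count;
    state.m2 += delta * (v - state.mean);
    if (v < state.min) state.min = v;
    if (v > state.max) state.max = v;
  }
  return state;
}

// Produce a plain, JSON-serializable summary (population standard deviation).
function finalizeRunningStats(state) {
  if (state.count === 0) return null;
  return {
    min: state.min,
    max: state.max,
    mean: state.mean,
    sd: Math.sqrt(state.m2 / state.count)
  };
}

// Two chunks stand in for a dataset that never fits in memory at once.
var runningState = createRunningStats();
updateRunningStats(runningState, [2, 4, 4, 4]);
updateRunningStats(runningState, [5, 5, 7, 9]);
var runningSummary = finalizeRunningStats(runningState);
// runningSummary is approximately { min: 2, max: 9, mean: 5, sd: 2 }
```

Only the small summary object would then need to cross into plotly, regardless of how many rows were aggregated.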

etpinard · 2016-02-10T14:31:57Z

"I believe that there should be a separation in functionality that computes aggregate or summary statistics and those that set these data on any particular trace [...] For instance, in the case of Python client, this might be allowing the go.Box() object to accept precomputed stats as arguments (e.g. min, max, mean, sd, q1, med, ...)"

I like the idea of adding support for passing pre-computed values inside box traces (and perhaps also histogram traces). Pre-computed attributes would take precedence over the raw sample in the defaults step and bypass part of the calc step. I believe this use case is not uncommon, even for users whose data is not big.

In this case the q1, q3, min, max and mean attributes would all have to be present for the pre-computed values to be taken into consideration.
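
A minimal sketch of that all-or-nothing rule, assuming the check is simply "every required key is a finite number". REQUIRED_STATS, hasPrecomputedStats and pickBoxInput are hypothetical names used for illustration only; they are not part of plotly.js, and a real implementation would live in the defaults step.

```javascript
// Hypothetical: the keys that must all be present for the pre-computed path.
var REQUIRED_STATS = ['q1', 'q3', 'min', 'max', 'mean'];

// Returns true only if every required statistic is a finite number.
function hasPrecomputedStats(trace) {
  return REQUIRED_STATS.every(function(key) {
    return typeof trace[key] === 'number' && isFinite(trace[key]);
  });
}

// Decide which input the calc step should use for this trace.
function pickBoxInput(trace) {
  return hasPrecomputedStats(trace) ? 'precomputed' : 'sample';
}

var completeTrace = { type: 'box', min: 1, q1: 2, q3: 4, max: 5, mean: 3 };
var partialTrace = { type: 'box', min: 1, max: 5, y: [1, 2, 3, 4, 5] };

console.log(pickBoxInput(completeTrace)); // precomputed
console.log(pickBoxInput(partialTrace));  // sample
```

Falling back to the raw sample when any key is missing keeps a partially specified trace from rendering a box built from a mix of supplied and computed values.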

As an aside,

"which might reference a function responsible for mapping such values to the data structure expected"

This would be considered an anti-pattern for us: plotly.js tries to keep all of its user interface JSON-serializable so that our API clients can be on par with Plotly.plot.
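
To illustrate the constraint: a function-valued attribute does not survive a JSON round trip, so it could never reach plotly.js from a client such as plotly.py. The setData key below is the hypothetical attribute from the proposal above, not an existing plotly.js option.

```javascript
var traceWithFunction = {
  type: 'box',
  q1: 2,
  q3: 4,
  // Hypothetical function-valued attribute from the proposal
  setData: function(stats) { return stats; }
};

// Roughly what an API client would send over the wire and what would arrive.
var roundTripped = JSON.parse(JSON.stringify(traceWithFunction));

console.log('setData' in traceWithFunction); // true
console.log('setData' in roundTripped);      // false: JSON.stringify drops functions
console.log(roundTripped.q1);                // 2
```

Plain numeric attributes, by contrast, pass through unchanged, which is why pre-computed values fit the existing interface while a mapping function does not.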

pnorth423 · 2016-05-23T20:05:57Z

You could pass your aggregates through as the 'raw data'? The five-number summary of a five-number summary is the same :)
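
Whether this holds exactly depends on how quartiles are computed. A small check, assuming linear interpolation at position p * (n - 1) in the sorted sample (one common convention, and not necessarily the one plotly.js uses): under that rule the five-number summary of a five-number summary is unchanged. The quantile and fiveNumberSummary helpers are illustrative only. Note that the mean and standard deviation are not preserved by this trick.

```javascript
// Quantile by linear interpolation at position p * (n - 1) of the sorted sample.
function quantile(sorted, p) {
  var pos = p * (sorted.length - 1);
  var lo = Math.floor(pos);
  var hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Returns [min, q1, median, q3, max] under the convention above.
function fiveNumberSummary(sample) {
  var sorted = sample.slice().sort(function(a, b) { return a - b; });
  return [0, 0.25, 0.5, 0.75, 1].map(function(p) {
    return quantile(sorted, p);
  });
}

var rawSample = [7, 1, 3, 9, 4, 6, 2, 8, 5, 10, 12];
var fiveNum = fiveNumberSummary(rawSample);
var fiveNumOfFiveNum = fiveNumberSummary(fiveNum);

console.log(fiveNum);          // [ 1, 3.5, 6, 8.5, 12 ]
console.log(fiveNumOfFiveNum); // [ 1, 3.5, 6, 8.5, 12 ]

// Pass the summary itself as the "raw data" of an ordinary box trace.
var summaryTrace = { type: 'box', y: fiveNum };
```

With a different quantile convention the inner quartiles can shift, so it is worth comparing the rendered box against the intended values before relying on this.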

etpinard · 2016-11-28T20:01:32Z

Closed in favor of #1059

nehalecky · 2016-12-01T14:57:27Z

@etpinard, #1059 is a start on this, no doubt, but it is specific to box plots only. In this ticket, I was proposing a more general approach, but I understand that this might be out of scope for the project. I hope that I can contribute more when I have more time. Thanks.

cpsievert mentioned this issue Apr 28, 2016: Add support for geom_boxplot(stat="identity") in ggplotly or similar for plot_ly boxplot (plotly/plotly.R#565, open)

etpinard closed this as completed Nov 28, 2016

etpinard mentioned this issue Nov 20, 2017: Needs discussion: Registering bespoke trace types (#2174, closed)
