Allow plotly objects to use aggregated / precomputed statistics. #242

Closed
nehalecky opened this issue Feb 8, 2016 · 7 comments

@nehalecky

Hi Plotly team, thanks for an amazing library.

I work with datasets much too large to be computed on by the plotly client; however, I would still love to be able to use the beautiful visualizations that plotly produces in my analysis. As such, I am looking for the ability to pass aggregate or precomputed statistics to Plotly graph objects, bypassing plotly computation. A nice example of a particular need for this is demonstrated in this PR for aggregate values for a box plot.

From the discussion there ...

One alternative would be to wait for the completion of modular milestone where requiring trace module individually will make it possible to override the box calc step.

...it seems that this prospect is closer to being possible, as the modularization milestone is complete—congrats on this!

There has been a lot of nice discussion on the modularization topic, including:

However, I am looking for any guidance on where I can begin to understand, build and eventually access an aggregate stats API on plotly graph objects, and in particular, expose it to the plotly.py python client. I'm not really familiar with plotly architecture, have minimal JS chops, but can hold my own in Python (I think), and would love to help out. :)

Any references that would be helpful and/or pointers in this endeavor? Much appreciated, thanks again.

@etpinard
Contributor

etpinard commented Feb 9, 2016

@nehalecky thanks for the feedback.

Let me sketch out the solution for you.

The idea consists of making a custom trace module, with an index file similar to the one in src/traces/box/index.js but with a custom calc step (as first pointed out in #138):

// in custom-box.js

var Box = {};

Box.attributes = require('plotly.js/src/traces/box/attributes');
Box.layoutAttributes = require('plotly.js/src/traces/box/layout_attributes');
Box.supplyDefaults = require('plotly.js/src/traces/box/defaults');
Box.supplyLayoutDefaults = require('plotly.js/src/traces/box/layout_defaults');

Box.calc = function(gd, trace) {
  // Do something in here that generates the calculated data
  //
  // See https://github.com/plotly/plotly.js/blob/master/src/traces/box/calc.js
  // for an example
  //
  // Note also that all keys sent to `Plotly.plot()` in the data or layout argument 
  // make it through to the calc step in 
  // `gd.data[i]['name of the attribute']` or 
  // `gd.layout['name of the attributes']`, 
  // no need to hijack the defaults step as in #138 
};

Box.setPositions = require('plotly.js/src/traces/box/set_positions');
Box.plot = require('plotly.js/src/traces/box/plot');
Box.style = require('plotly.js/src/traces/box/style');
Box.hoverPoints = require('plotly.js/src/traces/box/hover');

Box.moduleType = 'trace';
Box.name = 'customBox';
Box.basePlotModule = require('plotly.js/src/plots/cartesian');
Box.categories = ['cartesian', 'symbols', 'oriented', 'box', 'showLegend'];
Box.meta = {
    description: 'my custom box trace'
};

module.exports = Box;

Then, import this custom module along with the plotly core module and register it:

// in custom-plotly.js

var customBox = require('./custom-box');
var Plotly = require('plotly.js/lib/core');

Plotly.register(customBox);

module.exports = Plotly;

and then either bundle up custom-plotly.js directly or import it elsewhere in your app.

@etpinard
Contributor

etpinard commented Feb 9, 2016

and then use your new trace module as:

Plotly.plot('graph', [{type: 'customBox', x: [1,1,1,2,2,2,3,3,3,3]}]);

@nehalecky
Author

Thanks for the detailed feedback, @etpinard. :)

For creating a custom trace, this seems like a clear way to go about doing so—I like it.

While creating custom traces is powerful, I'm proposing something a bit more general: allowing the computations (those captured in a trace's .calc attribute) to take place outside of the plotly core, across all trace modules in plotly. In general, this conforms with the workflows and interfaces for basic chart types (e.g., the keys and values expected for pie charts); it just makes it part of the standard plotly interface. I'll try to explain more... :)

In the sequence of object definitions that take place in a plotly trace module index (such as what you've proposed in custom-box.js), I believe that there should be a separation in functionality that computes aggregate or summary statistics and those that set these data on any particular trace. This would require an additional attribute in the index, such as .setData, which might reference a function responsible for mapping such values to the data structure expected (e.g., the cdi var in the case of the boxplot object) for the trace.

This could be exposed to the primary interface by passing through key-value pairs that conform to those required by the plot, or could be all wrapped up in a higher-level object. For instance, in the case of Python client, this might be allowing the go.Box() object to accept precomputed stats as arguments (e.g. min, max, mean, sd, q1, med, ...) upon instantiation. Passing a nested dictionary (e.g., a plot_data argument) could be nice and efficient as well.

Ultimately, this would allow any process to compute the plot data needed for a particular trace and then hand it off to plotly to do its awesomeness. Hope that was clear; please let me know if you have any additional thoughts / critique, and thanks again for the thoughtful discussion. :)

@etpinard
Contributor

I believe that there should be a separation in functionality that computes aggregate or summary statistics and those that set these data on any particular trace [...] For instance, in the case of Python client, this might be allowing the go.Box() object to accept precomputed stats as arguments (e.g. min, max, mean, sd, q1, med, ...)

I like the idea of adding support for passing pre-computed values inside box traces (and perhaps also histogram traces). Pre-computed attributes would take precedence over the raw sample in the defaults step and bypass part of the calc step. I believe this use case is not uncommon, even for users whose data is not big.

In this case the q1, q3, min, max and mean attributes would all have to be present for the pre-computed values to be taken into consideration.
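The all-or-nothing rule described above can be sketched as a small check. This is only an illustration: `hasPrecomputedStats` and `REQUIRED_STATS` are hypothetical names, not part of plotly.js, and the sketch assumes one scalar value per statistic (a single box) rather than whatever shape an eventual implementation might accept.

```javascript
// Hypothetical helper, NOT a plotly.js API: decides whether a trace object
// carries the full set of pre-computed statistics named in the comment above.
var REQUIRED_STATS = ['q1', 'q3', 'min', 'max', 'mean'];

function hasPrecomputedStats(trace) {
  // Every required key must be present and hold a finite number.
  return REQUIRED_STATS.every(function (key) {
    return typeof trace[key] === 'number' && isFinite(trace[key]);
  });
}

// Assumed trace shapes, for illustration only.
var complete = {type: 'box', q1: 2, q3: 8, min: 0, max: 10, mean: 5.2};
var partial = {type: 'box', q1: 2, q3: 8, x: [1, 2, 3]};

console.log(hasPrecomputedStats(complete)); // true
console.log(hasPrecomputedStats(partial));  // false: min, max and mean are missing
```

With a check like this, a defaults step could fall back to the raw sample whenever the set is incomplete, instead of mixing supplied and computed statistics.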

As an aside,

which might reference a function responsible for mapping such values to the data structure expected

This would have been considered an anti-pattern for us; plotly.js tries to keep all of its user interface JSON serializable so that our API clients can be on par with Plotly.plot.
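A short sketch of why a function-valued attribute conflicts with JSON serializability. The `setData` key here is the hypothetical attribute from the proposal above, not something plotly.js defines; the behaviour shown is that of standard `JSON.stringify`.

```javascript
// A trace-like object with one plain attribute and one function-valued
// attribute (`setData` is hypothetical, taken from the proposal above).
var trace = {
  type: 'box',
  q1: 2,
  setData: function (stats) { return stats; }
};

// Round-trip through JSON, as an API client sending the figure would.
var roundTripped = JSON.parse(JSON.stringify(trace));

console.log('setData' in roundTripped); // false: the function is silently dropped
console.log(roundTripped.q1);           // 2: plain values survive
```

Plain numeric attributes such as q1 or mean survive the round trip, which is why passing the statistics themselves fits the interface better than passing a mapping function.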

@pnorth423

You could pass through your aggregates as the 'raw data'? The five-number summary of a five-number summary is the same :)
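A minimal sketch of this idea, assuming quartiles are computed by linear interpolation between order statistics (position `(n - 1) * p`). Under that definition the summary of a five-value summary reproduces itself. Other quantile definitions can shift q1 and q3, and whether plotly.js's own box calc behaves this way is not checked here; the mean and standard deviation are not preserved by this trick either. `quantile` and `fiveNumberSummary` are illustrative helpers, not plotly.js functions.

```javascript
// Quantile by linear interpolation between order statistics.
// This is an assumed definition, not necessarily the one plotly.js uses.
function quantile(sorted, p) {
  var pos = (sorted.length - 1) * p;
  var lower = Math.floor(pos);
  var upper = Math.ceil(pos);
  return sorted[lower] + (sorted[upper] - sorted[lower]) * (pos - lower);
}

// Returns [min, q1, median, q3, max] for an array of numbers.
function fiveNumberSummary(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b; });
  return [0, 0.25, 0.5, 0.75, 1].map(function (p) {
    return quantile(sorted, p);
  });
}

// Pretend these five numbers were computed elsewhere from a large dataset.
var summary = [1, 3, 4.5, 7, 12];
var again = fiveNumberSummary(summary);

console.log(again); // [ 1, 3, 4.5, 7, 12 ]
```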

@etpinard
Contributor

Closed in favor of #1059

@nehalecky
Author

nehalecky commented Dec 1, 2016

@etpinard, #1059 is a start on this, no doubt, but it is specific to box plots only. In this ticket, I was proposing a more general approach, but I understand that this might be out of scope for the project. Hope that I can contribute more when I have more time. Thanks.
