Passing additional metadata to points #1031

rreusser · 2016-10-12T22:14:37Z

@chriddyp has requested, for example, a slider that selects which subset of data to show. The easiest way to accomplish this is if points are able to carry some additional metadata on which a transform can filter. My simplistic approach was to add a single attribute (see: #1028), but after talking with @etpinard, here is a proposal for a slight generalization:

I will call it METADATA because I haven't found a name I like yet. Please substitute your preferred name for now.

METADATA is supplied to points as follows:

[{
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  METADATA: [{
    name: 'country',
    values: ['USA', 'Canada', 'Canada', 'Mexico']
  }]
}]

In this example, there is just one field. A transform could then filter on this attribute as follows:

[{
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  METADATA: [{
    name: 'country',
    values: ['USA', 'Canada', 'Canada', 'Mexico']
  }],
  transforms: [{
    type: 'filter',
    operation: '{}',
    filtersrc: 'METADATA.country',
    value: ['USA', 'Canada']
  }]
}]

This would filter out the fourth point since it corresponds to Mexico.

(One small note is that filtersrc would need a special case for METADATA.country since it's not strictly a nested property lookup. This does not seem problematic though.)
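To make the special case concrete, here is a minimal sketch of how a `METADATA.country` lookup and the `{}` (set membership) filter might behave. The helpers `resolveFilterSrc` and `applySetFilter` are hypothetical names made up for illustration, not existing plotly.js functions, and METADATA is still the placeholder name from above.

```javascript
// Hypothetical helper: resolve a filtersrc string against a trace.
// 'METADATA.<name>' is special-cased because METADATA is an array of
// {name, values} entries rather than a plain nested object.
function resolveFilterSrc(trace, filtersrc) {
  var parts = filtersrc.split('.');
  if (parts[0] === 'METADATA') {
    var entry = (trace.METADATA || []).filter(function (m) {
      return m.name === parts[1];
    })[0];
    return entry ? entry.values : undefined;
  }
  // Otherwise fall back to an ordinary nested lookup, e.g. 'marker.color'.
  return parts.reduce(function (obj, key) {
    return obj == null ? undefined : obj[key];
  }, trace);
}

// Hypothetical '{}' filter: keep the points whose source value is in `value`.
function applySetFilter(trace, transform) {
  var src = resolveFilterSrc(trace, transform.filtersrc);
  var keep = src.map(function (v) {
    return transform.value.indexOf(v) !== -1;
  });
  function pick(arr) {
    return arr.filter(function (_, i) { return keep[i]; });
  }
  return { x: pick(trace.x), y: pick(trace.y) };
}

var trace = {
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  METADATA: [{
    name: 'country',
    values: ['USA', 'Canada', 'Canada', 'Mexico']
  }],
  transforms: [{
    type: 'filter',
    operation: '{}',
    filtersrc: 'METADATA.country',
    value: ['USA', 'Canada']
  }]
};

var filtered = applySetFilter(trace, trace.transforms[0]);
console.log(JSON.stringify(filtered)); // {"x":[1,2,3],"y":[5,6,7]}
```

The fourth point (Mexico) is dropped, and everything other than the one lookup branch is a plain nested property lookup.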

The upshot is that we can add metadata to points that adds capability and otherwise interacts with nothing. So it shouldn't be difficult to maintain.

The downside is that this is a new direction for plotly, so it should be considered carefully, with an eye to whether it is consistent with how the front-end can reasonably be made to work.

cc: @etpinard @chriddyp @alexcjohnson
(and please cc others who may have a particular opinion about this)

chriddyp · 2016-10-18T16:59:25Z

Instead, what if filtersrc just handled arrays?

[{
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  transforms: [{
    type: 'filter',
    operation: '{}',
    filtersrc: ['USA', 'Canada', 'Canada', 'Mexico'],
    value: ['USA', 'Canada']
  }]
}]

bpostlethwaite · 2016-10-18T17:07:12Z

A fair number of operations will just want to filter on trace data. Wouldn't forcing an array lead to additional copying in that case? A pointer to an array like trace.marker.color or trace.x makes sense for a lot of cases.

The METADATA thing will work, but won't we have to include it in all the supply machinery?

Could filtersrc be a string that points to an array or an array of actual data?
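A rough sketch of what accepting both forms could look like, assuming a hypothetical helper named `getFilterValues` (not an existing plotly.js function). A string is treated as a dotted path into the trace and an array is used as-is, so neither case copies data.

```javascript
// Hypothetical helper: filtersrc may be an inline array or a string
// that points at an array already present on the trace.
function getFilterValues(trace, filtersrc) {
  if (Array.isArray(filtersrc)) {
    return filtersrc; // inline data, returned by reference (no copy)
  }
  if (typeof filtersrc === 'string') {
    // Dotted path such as 'x' or 'marker.color'.
    return filtersrc.split('.').reduce(function (obj, key) {
      return obj == null ? undefined : obj[key];
    }, trace);
  }
  return undefined;
}

var exampleTrace = {
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  marker: { color: [10, 20, 30, 40] }
};

var byReference = getFilterValues(exampleTrace, 'marker.color');
var inlineValues = getFilterValues(exampleTrace, ['USA', 'Canada', 'Canada', 'Mexico']);

console.log(byReference === exampleTrace.marker.color); // true
console.log(inlineValues.length); // 4
```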

rreusser · 2016-10-18T17:13:14Z

> The METADATA thing will work but won't we have to include it in all the supply machinery?

@bpostlethwaite Can you clarify what you mean by that?

rreusser · 2016-10-18T17:16:53Z

@chriddyp Thinking strictly about the json, I'm not sure a transform is the most desirable place for metadata about the points. Thinking about the UI though, I can see why it might be desirable to instantiate a transform that encapsulates the data it needs rather than injecting additional data into the trace itself. If there were multiple traces, you'd need to determine which metadata needs to be included and then insert and update that as the transforms change. I wouldn't think it would be that difficult, but I can see why it might be somewhat undesirable.

rreusser · 2016-10-18T17:23:31Z

To summarize:

It's pretty easy to tack on some extra data in the json and reference that where needed. What I need is wisdom and perspective from someone with more experience than I have:

What does this mean for front-end? Is it straightforward to pull this extra data from another column?
Is this philosophically consistent with what plotly does? What it should do? What it needs to do? This is a nice easy feature to add, but is it a good long-term direction that may be built upon?

chriddyp · 2016-10-18T18:28:38Z

> Thinking strictly about the json, I'm not sure a transform is the most desirable place for metadata about the points.

Besides the fact that some data could be duplicated if the transform happened to use the same data as some other attribute, why wouldn't this be desirable?

> A fair amount of operations will just want to filter on trace data. Forcing an array will lead to additional copying in that case?

That's definitely true. But couldn't you make the same argument about multiple traces that share the same x data?

It may mean extra copying on the JS side of things, but this won't introduce any extra network lag since these attributes will have the same column source ID.

I'm not familiar with the internals of plotly.js's filtering, but if these arrays had the same reference, would there necessarily be (or need to be) extra copying?

> What does this mean for front-end? Is it straightforward to pull this extra data from another column?

Nothing entirely blocking, but it'll require some extra things for us:

Instead of setting 1 attribute (trace[0].transforms[0].filtersrc = [1, 2, 3]) we'll have to set 2 attributes (trace[0].metadata.values = [1, 2, 3] and trace[0].transforms[0].filtersrc = 'metadata.values'). Not a big deal.
If data changes in metadata.values, will the filter and plot update? It will need to.
In plot.ly, we separate datasets and plots, and plots refer to datasets using keys like xsrc: myColumnId. We find these keys by crawling the plot-schema (see e.g. scatter-xsrc). A free-form metadata-like object would diverge from this pattern: our front-end and back-end would need to look for *src keys and additionally check for keys inside metadata.
Since it's not in plot-schema, it's a little harder to reference in our documentation.

> Is this philosophically consistent with what plotly does? What it should do? What it needs to do? This is a nice easy feature to add, but is it a good long-term direction that may be built upon?

If the motive for introducing this type of structure is to reduce duplicate copies of data in the JSON, it seems like the ultimate solution would be for plotly to encapsulate the data in the json, like:

{
    dataset: [
        {id: 'id1', values: [1, 2, 3, 4]},
        {id: 'id2', values: [4, 3, 1, 5]},
        {id: 'id3', values: [1, 3, 4, 5]},
        {id: 'id4', values: ['a', 'b', 'c', 'd']}
    ],
    data: [{
        x: 'id1',
        y: 'id2',
        transforms: [{type: 'filter', filtersrc: 'id4', values: ['c']}]
    }, {
        x: 'id1',
        y: 'id3',
        transforms: [{type: 'filter', filtersrc: 'id4', values: ['d']}]
    }]
}

We actually already serialize our data like this on plot.ly so that we can keep data and plots separate and so that we're not sending any more columns over the wire than we need to. I'm not sure of what other benefits we might get by doing something like this in plotly.js.
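For illustration only, here is one way id references like those in the structure above might be expanded into arrays on the client. `resolveDataset` is a hypothetical function written for this sketch; it is not how plot.ly or plotly.js actually does this. Traces that name the same column end up sharing one array instance, so nothing is duplicated in memory.

```javascript
// Hypothetical resolver: swap column ids in each trace for the arrays they name.
function resolveDataset(figure) {
  var columns = {};
  figure.dataset.forEach(function (col) {
    columns[col.id] = col.values;
  });

  // Strings that match a known column id are replaced; anything else passes through.
  function lookup(ref) {
    return (typeof ref === 'string' && Object.prototype.hasOwnProperty.call(columns, ref))
      ? columns[ref]
      : ref;
  }

  return figure.data.map(function (trace) {
    return {
      x: lookup(trace.x),
      y: lookup(trace.y),
      transforms: (trace.transforms || []).map(function (t) {
        var out = {};
        Object.keys(t).forEach(function (k) { out[k] = t[k]; });
        out.filtersrc = lookup(t.filtersrc);
        return out;
      })
    };
  });
}

var figure = {
  dataset: [
    {id: 'id1', values: [1, 2, 3, 4]},
    {id: 'id2', values: [4, 3, 1, 5]},
    {id: 'id3', values: [1, 3, 4, 5]},
    {id: 'id4', values: ['a', 'b', 'c', 'd']}
  ],
  data: [{
    x: 'id1',
    y: 'id2',
    transforms: [{type: 'filter', filtersrc: 'id4', values: ['c']}]
  }, {
    x: 'id1',
    y: 'id3',
    transforms: [{type: 'filter', filtersrc: 'id4', values: ['d']}]
  }]
};

var resolved = resolveDataset(figure);
console.log(resolved[0].x === resolved[1].x); // true: both traces share the 'id1' array
```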

rreusser · 2016-10-18T18:40:33Z

Interesting! I didn't realize plot.ly was assembling the data client-side. Good to know. I mostly type examples by hand, so that duplication is particularly irksome (not so much the typing as the fact that I have to sit there contemplating normalization vs. denormalization as I copy/paste/synchronize).

The only reason for my aversion to transforms owning the data is that the metadata sits logically in parallel with the trace data, so a transform 'owning' it seems like an unnecessary denormalization: two transforms talking to the same metadata would each need their own copy. Also, since the decision to add metadata is a departure from what plotly currently does and since it seems reasonably possible that it could have other uses (similar to the way components and transforms have grown in scope and capability as they interact in new ways), nesting it inside transforms seems to limit those possibilities, short of starting over from scratch the next time it's needed. That said, I'm not currently aware of what those other uses might be. Realistically, even novel, unexpected uses may well just be transforms anyway.

rreusser · 2016-10-18T18:41:51Z

> Since it's not in plot-schema, it's a little harder to reference in our documentation.

If it were contained in, e.g., metadata[0].values, that would be in the plot schema, right? (That was the reason for the container object with name and values attributes vs. custom keys coming from the field names)

(Meaning metadata: [{name: 'country', values: [...]}, ...] vs. metadata: {country: [...]})

chriddyp · 2016-10-18T20:00:39Z

> metadata[0].values, that would be in the plot schema, right?

Ah yup, if it has a standard name like values then we're good!

etpinard · 2016-10-24T16:05:57Z

Resolved (for now).

Future development will be discussed in https://github.com/plotly/streambed/issues/8112

rreusser added feature something new status: discussion needed labels Oct 12, 2016

etpinard added this to the v1.19.0 milestone Oct 13, 2016

etpinard mentioned this issue Oct 17, 2016 in "Improved animation merging for layout and traces" #1041 (merged)

etpinard closed this as completed Oct 24, 2016
