Passing additional metadata to points #1031

Closed
rreusser opened this issue Oct 12, 2016 · 10 comments
Labels
feature something new

@rreusser
Contributor

@chriddyp has requested, for example, a slider that selects which subset of data to show. The easiest way to accomplish this is if points are able to carry some additional metadata on which a transform can filter. My simplistic approach was to add a single attribute (see: #1028), but after talking with @etpinard, here is a proposal for a slight generalization:

I will call it METADATA because I haven't found a name I like yet. Please substitute your preferred name for now.

METADATA is supplied to points as follows:

[{
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  METADATA: [{
    name: 'country',
    values: ['USA', 'Canada', 'Canada', 'Mexico']
  }]
}]

In this example, there is just one field. Adding a transform could filter on this attribute as follows:

[{
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  METADATA: [{
    name: 'country',
    values: ['USA', 'Canada', 'Canada', 'Mexico']
  }],
  transforms: [{
    type: 'filter',
    operation: '{}',
    filtersrc: 'METADATA.country',
    value: ['USA', 'Canada']
  }]
}]

This would filter out the fourth point since it corresponds to Mexico.

(One small note is that filtersrc would need a special case for METADATA.country since it's not strictly a nested property lookup. This does not seem problematic though.)
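The proposal above can be sketched in plain JavaScript. This is only an illustration, not plotly.js code: `lookupMetadata` and `filterByMetadata` are hypothetical helpers, and the sketch assumes that `'{}'` means set membership, as the example's stated outcome implies.

```javascript
// Hypothetical helper: look up a metadata field by its `name`.
// 'METADATA.country' is not a plain nested-property path, because METADATA
// is an array of {name, values} containers, so it needs this special case.
function lookupMetadata(trace, filtersrc) {
  const prefix = 'METADATA.';
  if (!filtersrc.startsWith(prefix)) return undefined;
  const fieldName = filtersrc.slice(prefix.length);
  const field = (trace.METADATA || []).find((f) => f.name === fieldName);
  return field ? field.values : undefined;
}

// Sketch of the '{}' operation (assumed here to be set membership):
// keep point i only if the metadata value at index i is in `value`.
function filterByMetadata(trace, transform) {
  const src = lookupMetadata(trace, transform.filtersrc);
  if (!src) return trace; // nothing to filter on
  const allowed = new Set(transform.value);
  const keep = src.map((v) => allowed.has(v));
  const pick = (arr) => arr.filter((_, i) => keep[i]);
  return Object.assign({}, trace, {
    x: pick(trace.x),
    y: pick(trace.y),
    METADATA: trace.METADATA.map((f) => ({ name: f.name, values: pick(f.values) }))
  });
}

const trace = {
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  METADATA: [{ name: 'country', values: ['USA', 'Canada', 'Canada', 'Mexico'] }]
};
const filtered = filterByMetadata(trace, {
  type: 'filter',
  operation: '{}',
  filtersrc: 'METADATA.country',
  value: ['USA', 'Canada']
});
console.log(filtered.x, filtered.y); // [ 1, 2, 3 ] [ 5, 6, 7 ]
```

The metadata arrays are filtered along with `x` and `y` so that every per-point array keeps the same length after the transform.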

The upshot is that we can add metadata to points that adds capability and otherwise interacts with nothing. So it shouldn't be difficult to maintain.

The downside is that this is a new direction for plotly so should be considered carefully and with knowledge of whether this is consistent with how the front-end can be reasonably made to work.

cc: @etpinard @chriddyp @alexcjohnson
(and please cc others who may have a particular opinion about this)

@chriddyp
Member

chriddyp commented Oct 18, 2016

Instead, what if filtersrc just handled arrays?

[{
  x: [1, 2, 3, 4],
  y: [5, 6, 7, 8],
  transforms: [{
    type: 'filter',
    operation: '{}',
    filtersrc: ['USA', 'Canada', 'Canada', 'Mexico'],
    value: ['USA', 'Canada']
  }]
}]

@bpostlethwaite
Member

bpostlethwaite commented Oct 18, 2016

A fair amount of operations will just want to filter on trace data. Forcing an array will lead to additional copying in that case? A pointer to an array like trace.marker.color or trace.x makes sense for a lot of cases.

The METADATA thing will work, but won't we have to include it in all the supply machinery?

Could filtersrc be a string that points to an array or an array of actual data?
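One way to read that last question is as a small resolver that accepts both forms. The sketch below is hypothetical (the name `resolveFilterSrc` and the trace-relative dotted path are assumptions, not existing plotly.js API); it shows that the string form returns the trace's own array without copying it.

```javascript
// Hypothetical resolver: accept either inline data or a path into the trace.
function resolveFilterSrc(trace, filtersrc) {
  if (Array.isArray(filtersrc)) return filtersrc; // inline data, used as-is
  if (typeof filtersrc !== 'string') return undefined;
  // Walk a dotted path such as 'marker.color'; no copy is made.
  let node = trace;
  for (const key of filtersrc.split('.')) {
    if (node === null || typeof node !== 'object') return undefined;
    node = node[key];
  }
  return Array.isArray(node) ? node : undefined;
}

const exampleTrace = { x: [1, 2, 3], marker: { color: [10, 20, 30] } };

// String form: resolves to the very same array object held by the trace.
console.log(resolveFilterSrc(exampleTrace, 'marker.color') === exampleTrace.marker.color); // true

// Array form: the data travels with the transform itself.
console.log(resolveFilterSrc(exampleTrace, ['a', 'b', 'c'])); // [ 'a', 'b', 'c' ]
```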

@rreusser
Contributor Author

rreusser commented Oct 18, 2016

The METADATA thing will work, but won't we have to include it in all the supply machinery?

@bpostlethwaite Can you clarify what you mean by that?

@rreusser
Contributor Author

rreusser commented Oct 18, 2016

@chriddyp Thinking strictly about the json, I'm not sure a transform is the most desirable place for metadata about the points. Thinking about the UI though, I can see why it might be desirable to instantiate a transform that encapsulates the data it needs rather than injecting additional data into the trace itself. If there were multiple traces, you'd need to determine which metadata needs to be included and then insert and update that as the transforms change. I wouldn't think it would be that difficult, but I can see why it might be somewhat undesirable.

@rreusser
Contributor Author

To summarize:

It's pretty easy to tack on some extra data in the json and reference that where needed. Wisdom and perspective I may need from someone with more experience than I have:

  1. What does this mean for front-end? Is it straightforward to pull this extra data from another column?
  2. Is this philosophically consistent with what plotly does? What it should do? What it needs to do? This is a nice easy feature to add, but is it a good long-term direction that may be built upon?

@chriddyp
Member

Thinking strictly about the json, I'm not sure a transform is the most desirable place for metadata about the points.

Besides the fact that some data could be duplicated if the transform happened to use the same data as some other attribute, why wouldn't this be desirable?

A fair amount of operations will just want to filter on trace data. Forcing an array will lead to additional copying in that case?

That's definitely true. But couldn't you make the same argument about multiple traces that share the same x data?

It may be extra copying in the JS side of things but this won't introduce any extra network lag since these attributes will have the same column source ID.

I'm not familiar with the internals of plotly.js's filtering, but if these arrays had the same reference, would there necessarily / does there need to be extra copying ?

What does this mean for front-end? Is it straightforward to pull this extra data from another column?

Nothing entirely blocking, but it'll require some extra things for us:

  • Instead of setting 1 attribute (trace[0].transforms[0].filtersrc = [1, 2, 3]) we'll have to set 2 attributes (trace[0].metadata.values = [1, 2, 3] and trace[0].transforms[0].filtersrc = 'metadata.values'). Not a big deal.
  • If data changes in metadata.values, will the filter and plot update? It will need to.
  • In plot.ly, we separate datasets and plots and plots refer to datasets using keys like xsrc: myColumnId. We find these keys by crawling the plot-schema. See e.g. scatter-xsrc. A free-form metadata-like object would diverge from this pattern - our front-end and back-end would need to look for *src keys and additionally check for keys inside metadata.
  • Since it's not in plot-schema, it's a little harder to reference in our documentation.

Is this philosophically consistent with what plotly does? What it should do? What it needs to do? This is a nice easy feature to add, but is it a good long-term direction that may be built upon?

If the motive for introducing this type of structure is to reduce duplicate copies of data in the JSON, it seems like the ultimate solution would be for plotly to encapsulate the data in the json, like:

{
    dataset: [
        {id: 'id1', values: [1, 2, 3, 4]},
        {id: 'id2', values: [4, 3, 1, 5]},
        {id: 'id3', values: [1, 3, 4, 5]},
        {id: 'id4', values: ['a', 'b', 'c', 'd']}
    ],
    data: [{
        x: 'id1',
        y: 'id2',
        transforms: [{type: 'filter', filtersrc: 'id4', values: ['c']}]
    }, {
        x: 'id1',
        y: 'id3',
        transforms: [{type: 'filter', filtersrc: 'id4', values: ['d']}]
    }]
}

We actually already serialize our data like this on plot.ly so that we can keep data and plots separate and so that we're not sending any more columns over the wire than we need to. I'm not sure of what other benefits we might get by doing something like this in plotly.js.
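As a rough illustration of that layout, the sketch below resolves column ids back into arrays before plotting. `resolveDataset` is a hypothetical helper, not how plot.ly or plotly.js actually does it; the point is only that every trace or transform that names the same id receives the same array reference, so nothing is duplicated in memory.

```javascript
// Hypothetical resolver for the {dataset, data} layout sketched in this comment.
function resolveDataset(fig) {
  const columns = new Map(fig.dataset.map((col) => [col.id, col.values]));
  const lookup = (id) => {
    if (!columns.has(id)) throw new Error(`Unknown column id: ${id}`);
    return columns.get(id); // same array reference every time, no copy
  };
  return fig.data.map((trace) => Object.assign({}, trace, {
    x: lookup(trace.x),
    y: lookup(trace.y),
    transforms: (trace.transforms || []).map((t) =>
      Object.assign({}, t, { filtersrc: lookup(t.filtersrc) }))
  }));
}

const fig = {
  dataset: [
    { id: 'id1', values: [1, 2, 3, 4] },
    { id: 'id2', values: [4, 3, 1, 5] },
    { id: 'id3', values: [1, 3, 4, 5] },
    { id: 'id4', values: ['a', 'b', 'c', 'd'] }
  ],
  data: [
    { x: 'id1', y: 'id2', transforms: [{ type: 'filter', filtersrc: 'id4', values: ['c'] }] },
    { x: 'id1', y: 'id3', transforms: [{ type: 'filter', filtersrc: 'id4', values: ['d'] }] }
  ]
};

const resolved = resolveDataset(fig);
// Both traces share one x array and one filter source array.
console.log(resolved[0].x === resolved[1].x); // true
console.log(resolved[0].transforms[0].filtersrc === resolved[1].transforms[0].filtersrc); // true
```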

@rreusser
Contributor Author

Interesting! I didn't realize plot.ly was assembling the data client-side. Good to know. I mostly type examples by hand, so that duplication is particularly irksome (not so much the typing but the fact that I have to sit there contemplating normalization vs. denormalization as I copy/paste/synchronize).

The only reason for my aversion to transforms owning the data is that the metadata sits logically in parallel with the data, so a transform 'owning' it seems like an unnecessary denormalization: two transforms talking to the same metadata would each need their own copy. Also, the decision to add metadata is a departure from what plotly currently does, and it seems reasonably possible that it could have other uses (similar to the way components and transforms have grown in scope and capability as they interact in new ways). Nesting it inside transforms seems to limit those possibilities, short of starting over from scratch the next time it's needed. That said, I'm not currently aware of what those other uses might be. Realistically, even novel, unexpected uses may well just be transforms anyway.

@rreusser
Contributor Author

rreusser commented Oct 18, 2016

Since it's not in plot-schema, it's a little harder to reference in our documentation.

If it were contained in, e.g., metadata[0].values, that would be in the plot schema, right? (That was the reason for the container object with name and values attributes vs. custom keys coming from the field names)

(Meaning metadata: [{name: 'country', values: [...]}, ...] vs. metadata: {country: [...]})
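To make the difference between the two shapes concrete, here is a minimal sketch that converts the keyed form into the container form. `toMetadataContainers` is a hypothetical helper used only for illustration; the sample values are made up.

```javascript
// Hypothetical converter from the keyed form {country: [...]} to the
// container form [{name: 'country', values: [...]}]. In the container form
// the data always lives under the fixed key `values`, whatever the field
// is called.
function toMetadataContainers(keyed) {
  return Object.entries(keyed).map(([name, values]) => ({ name, values }));
}

const containers = toMetadataContainers({
  country: ['USA', 'Canada'],
  year: [2015, 2016]
});
console.log(containers.map((c) => c.name)); // [ 'country', 'year' ]
console.log(containers[0].values); // [ 'USA', 'Canada' ]
```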

@chriddyp
Member

metadata[0].values, that would be in the plot schema, right?

Ah yup, if it has a standard name like values then we're good!

@etpinard
Contributor

etpinard commented Oct 24, 2016

Resolved (for now).

Future development will be discussed in https://github.com/plotly/streambed/issues/8112
