Parallel coordinates chart - design #1071

Closed
monfera opened this issue Oct 24, 2016 · 26 comments
Labels
feature something new

@monfera
Contributor

monfera commented Oct 24, 2016

[Image: parallel coordinates prototype, line bundle shown as 16k blue lines]

Approach

This issue is dedicated to parallel coordinates. As this is the first new chart where we're moving away from past gl-vis/***2d charts, some of the discussion is broader than that.

As we discussed at the requirements meeting, it's not easy to have both full, continuous interactivity and a lot of lines: a large number of lines (1k and up) either overloads the shaders (no matter if WebGL or Canvas2d) or calls for time-consuming preprocessing, e.g. splatting (we might be able to do on-GPU splatting, but it's probably best to start with something simple, reasonably fast and direct).

So our requirements analysis said that - esp. with more than 1k lines - the axis sliders would be debounced, and the chart would only rerender when the user stops for a bit or releases the slider. Even with this, direct rendering is only good up to about 10k..20k lines (around 100ms to generate), and beyond that the direct approach would need incremental rendering. So here's an approach under consideration:

  1. Unlike with many current charts, we should avoid the rerender on every frame (useful irrespective of chart type, because it minimizes processing load) - and similarly, we should avoid "dummy renders", when a chart is getting initialized by rendering it with empty data, to be followed up by subsequent updates with more and more of the necessary input
  2. As a consequence, the line bundle (here shown as 16k blue lines) and the interactive stuff (tooltip, axis slider etc.) should reside on separate layers (can be WebGL + SVG, WebGL + Canvas2d or WebGL + WebGL; the combination is irrelevant for this point)
  3. We may need more layers, e.g. for some kind of a backdrop
  4. This means that the blue line bundle can be rendered with whatever is fastest (Canvas2D or WebGL)
  5. Here's the weird part 🔥 : on low-end hardware (13" Retina Macbook Pro) perf tests show Canvas2d to be faster for interesting cases (with this example, 100ms vs 500ms render time) - and I tried a couple ways of rendering lines in WebGL (though no splatting yet). Canvas2d uses hardware acceleration i.e. the GPU, and probably a lot of work went into optimizing it. On powerful GPUs the bottlenecks are probably elsewhere though.
  6. The 16k-line image above can now be generated with either WebGL (regl) or Canvas2d; the two results look almost identical (minuscule blending hue difference)
  7. Canvas2d is lightweight in that there's no additional code dependency, i.e. there's no significant bundle size increase if we use it; also, it's much quicker to prototype and do exploratory coding with Canvas2d than with WebGL
  8. As a consequence, it looks simplest to do layers in Canvas2d and/or SVG, and at some more mature stage (still within this PR) plug in WebGL (via regl) for the heavy-lifting layer to see if it does better or worse - this way we don't end up with the rather large code differences between non-WebGL and WebGL versions
  9. Therefore the calculations should be substrate independent; loose coupling between the viewModel and the renderer
  10. Since WebGL and the eventual need for Web Workers both suggest typed arrays, the parcoords proto-prototype uses typed arrays internally; Canvas2d/SVG can use them too; we can preface it with an untyped -> typed conversion for users
  11. Since typed arrays are 1-dimensional arrays but in reality encode multiple dimensions, the parcoords prototype is using ndarray - it's already part of our dependency stack so no code increase, and perf profiling shows essentially no speed cost relative to indexing manually. It makes the code a lot more readable.
  12. In conclusion, the experiences point to a code structure with a linear flow of data (also in line with current plotly.js practices):
    • user input
    • arrive at defaults
    • calculate a common viewModel
    • populate and render multiple layers, which may be of heterogeneous types, some of them alternatives to one another, and some of them getting new versions as part of shorter iterations (e.g. an SVG or Canvas2d layer gets a WebGL / regl alternative)
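
The linear flow in point 12 (user input, defaults, common viewModel, layers) can be sketched roughly as below. This is only an illustration under assumptions: `supplyDefaults`, `calcViewModel`, `renderLayers`, the `lineColor` option and the two string-producing "layers" are hypothetical names, not plotly.js functions, and manual row-major indexing stands in for ndarray.

```javascript
// Hypothetical sketch of the linear data flow; not the plotly.js implementation.

// Steps 1-2: user input -> defaults
function supplyDefaults(input) {
  return {
    dimensions: input.dimensions || [],
    lineColor: input.lineColor || 'blue' // hypothetical option
  };
}

// Min/max in one pass (avoids spreading large arrays into Math.min/Math.max)
function extent(values) {
  let lo = Infinity;
  let hi = -Infinity;
  for (const v of values) {
    if (v < lo) lo = v;
    if (v > hi) hi = v;
  }
  return [lo, hi];
}

// Step 3: common view model. All values are normalized to [0, 1] and packed
// into one typed array, row-major: [line0dim0, line0dim1, ..., line1dim0, ...]
function calcViewModel(full) {
  const dimCount = full.dimensions.length;
  const lineCount = dimCount ? full.dimensions[0].values.length : 0;
  const data = new Float32Array(lineCount * dimCount);
  full.dimensions.forEach((dim, d) => {
    const [lo, hi] = extent(dim.values);
    const span = hi - lo || 1; // guard against a zero-width domain
    dim.values.forEach((v, i) => {
      data[i * dimCount + d] = (v - lo) / span;
    });
  });
  return { data, lineCount, dimCount, lineColor: full.lineColor };
}

// Step 4: each layer is a function of the same view model, so a Canvas2d,
// SVG or WebGL renderer could be swapped in without touching steps 1-3.
function renderLayers(viewModel, layers) {
  return layers.map(layer => layer(viewModel));
}

// Stand-in "renderers" that only describe what they would draw
const describeLines = vm => `lines: ${vm.lineCount} x ${vm.dimCount}`;
const describeAxes = vm => `axes: ${vm.dimCount}`;

const viewModel = calcViewModel(supplyDefaults({
  dimensions: [{ values: [0, 5, 10] }, { values: [2, 2, 2] }]
}));
const layerOutput = renderLayers(viewModel, [describeLines, describeAxes]);
```

Because every layer only reads the view model, the renderer choice stays a late, replaceable decision, which is the loose coupling point 9 asks for.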
@monfera
Contributor Author

monfera commented Nov 23, 2016

Checklists:

Functionality:

  • supporting integer variables (ordinal; snapping)
  • double-layer rendering for grey parcoords in the background, showing unselected lines
  • column reordering
  • axis reordering (drag/drop)
  • variable names on top
  • axis ticks and labels
  • top and bottom domain extremum display per axis
  • color specification variable
  • min/max extent for color mapping
  • depth order specification variable
  • relayout/restyle - plotly compatible GUP (biggie)
  • color legend with axis ticks and labels (biggie)
  • reduce dimension name occlusion (slanted headings, or multiline headings)
  • add arrow mouse pointer for anything that can be grabbed for axis drag/drop

Known issues:

  • to stop drag flashing, redraw the 2 panels only that are involved in reordering
  • (minor) clear leftmost panel specially, like the rightmost panel
  • fix incomplete or overzealous incremental rendering
  • limit number of ticks for integer dimensions too
  • horizontal line sections at the extrema are half width
  • dragging axes past the gl window clips the rendered lines
  • check linecount problem visible with low sample count, and degenerate cases
    • 1,
    • 0 and
    • zero-domain
    • some lines still don't show up (cause was: FP precision issue)

Plotly coding conventions:

  • test cases
  • more OO properties rather than plain variables?
  • glslify as a plotly.js devDependency
  • rely on as many preexisting functions as possible (e.g. utils; legends, perhaps axes etc.)

Performance (lower priority; just as an inventory of possibilities):

  • reorder WebGL buffer contents upon nearing axis with the mouse
    -> unrelated test showed that randomized line drawing looks better than ordered
    -> i.e. let's not do that unless even more throughput is requested
  • ... and then call drawArrays only on the currently selected extent
    -> same as above
  • when increasing extent, render only the newly included lines
    -> same as above
  • option for redrawing upon brush release only
    -> with recent render flash removal (per panel render etc.) it isn't needed
  • when redrawing, mitigate flash with motion blur?
    -> with recent render flash removal (per panel render etc.) it isn't needed

@monfera
Contributor Author

monfera commented Nov 29, 2016

Datavis design notes

  • The interactive controls (filter bars) are very thin, to minimize occlusion of data ink - yet, to make them easily controllable (Fitts's law) there's a wide (configurable) capture zone around them; as d3.svg.brush can't do this by itself, an SVG pattern (in defs) is used to yield a thin filter bar
  • A future version might fade out the filter bars completely a few seconds after the user hovered out of the parcoords area
  • The three main overlapping visuals (context lines; focus lines; filter bar) must be easily distinguishable, therefore color choices are constraint driven rather than a matter of individual taste - currently, the context uses grayscale and alpha blending, against which the opaque, colorful lines (Viridis, Jet etc.) stand out well; the filter bar is magenta, which is rarely used in color runs, and according to some research e.g. by Stephen Few, it stands out best against most other colors, including the grayscale context here, and the white background (see the screenshot below)
  • As no assumptions can be made about the numeric values and sensible units, the tick values use SI prefixes; e.g. 40m means 40 milli i.e. 0.04 - it can therefore work on diverse domains and use the narrow axis tick width effectively
  • A variable may be marked as an integer variable - in this case, the brushing will snap to integral values, and there's no display of decimal places; it's one step toward nominal values at some point
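
As a rough illustration of the last two notes, here is a minimal sketch of SI-prefixed tick labels and of snapping a brush extent to integers. `formatSI` and `snapExtent` are hypothetical helpers written for this example (the prototype may rely on d3's own formatting instead), and rounding to the nearest integer is an assumed snapping rule.

```javascript
// Hypothetical helpers; not taken from the parcoords code.
const SI_PREFIXES = ['y', 'z', 'a', 'f', 'p', 'n', 'µ', 'm', '',
                     'k', 'M', 'G', 'T', 'P', 'E', 'Z', 'Y'];

// Format a number with an SI prefix, e.g. 0.04 -> "40m" (40 milli)
function formatSI(value, digits = 3) {
  if (value === 0 || !Number.isFinite(value)) return String(value);
  // Exponent in steps of 1000, clamped to the available prefixes
  const exp3 = Math.max(-8, Math.min(8,
    Math.floor(Math.log10(Math.abs(value)) / 3)));
  const scaled = value / Math.pow(10, exp3 * 3);
  // toPrecision trims floating-point noise; Number() drops trailing zeros
  return Number(scaled.toPrecision(digits)) + SI_PREFIXES[exp3 + 8];
}

// Snap a brush extent to integral values for an integer variable
// (assumption: nearest-integer rounding on both ends)
function snapExtent(extent) {
  return [Math.round(extent[0]), Math.round(extent[1])];
}

const milliLabel = formatSI(0.04);     // "40m"
const kiloLabel = formatSI(1500);      // "1.5k"
const snapped = snapExtent([2.3, 5.7]); // [2, 6]
```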

Technical notes

  • Unlike with other gl plots
    • first render is performed with the actual data - there's no concept of an initial render with empty data
    • there is no rAF loop for the idle state or other changes that don't need a gl redraw (other gl plots either run in an infinite rAF loop or redraw on simple interactions e.g. mouse hover tooltip) - it's achieved by having separate layers
    • the gl plot is in the main repo with all its pros and cons
  • The crossfiltering is done in the GPU vertex shader, using linear operations to make filtered lines show or disappear
  • Upon brushing, all lines need to be redrawn - a future optimization might avoid full redraw if a filter domain is extended (i.e. there is nothing to delete)
  • Upon axis reordering (drag&drop) only the currently impacted two adjacent panels are redrawn
  • For brushing, incremental redraw is performed, to remain responsive (it's time consuming as all panels are redrawn)
  • Incremental drawing is performed by rendering a configurable number of lines in each rAF loop (scheduled as long as there are still lines to draw); these rAF items are canceled if e.g. filters change, to avoid drawing obsolete data into panels asynchronously
  • The WebGL layers use preserveDrawingBuffer: true for a few reasons:
    • a negative performance impact didn't show up
    • useful for extracting a printed view (readPixels works with preserveDrawingBuffer: true only)
    • full rerendering is avoided if possible; for example, axis dragging only redraws the neighboring two panels; had we not used preserveDrawingBuffer: true, everything would need rerendering on the smallest change
    • in some circumstances (asynchronous drawing and rectangle clearing) the OpenGL operations need to be synchronized, otherwise some necessary drawings end up being erased
    • at some point, rendering time in a rAF call was measured, so as to maximize the number of rendered lines in each frame, and time can only be measured if we force the GPU to do the actual work (gl.flush and gl.finish were ineffective; gl.readPixels does the trick)
  • For axis reordering, there's no incremental redraw; it's a redraw of all lines because it's only two panels that need to be redrawn, i.e. there is lower load on the shaders
  • The context layer (greyscale, alpha blended) and the focus layer (colorful) are two separate canvas elements and two separate regl instances, but the drawing code is fully shared
  • The interactive controls are yet another layer (SVG); the SVG and WebGL layers are bonded together simply by using the same projections (scales)
  • The layers are separate in that the WebGL layer code can be replaced by a (future) Canvas2d renderer or SVG renderer without touching anything else
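
The incremental drawing described above (a configurable number of lines per rAF tick, canceled when the filters change) could look roughly like the sketch below. `createIncrementalRenderer` and its parameters are hypothetical names; the scheduler is injected so the logic can run outside a browser, and the fake queue stands in for requestAnimationFrame.

```javascript
// Hypothetical sketch of batch-per-frame drawing with cancellation.
function createIncrementalRenderer(drawBatch, batchSize, schedule, cancel) {
  let handle = null; // handle of the pending frame, if any

  function stop() {
    if (handle !== null) {
      cancel(handle);
      handle = null;
    }
  }

  function start(lineCount) {
    stop(); // drop obsolete pending work so stale data is never drawn
    let next = 0;
    function frame() {
      const end = Math.min(next + batchSize, lineCount);
      drawBatch(next, end); // draw lines [next, end)
      next = end;
      // keep scheduling only while there are still lines to draw
      handle = next < lineCount ? schedule(frame) : null;
    }
    handle = lineCount > 0 ? schedule(frame) : null;
  }

  return { start, stop };
}

// Fake scheduler for demonstration: a queue drained by hand
const frameQueue = [];
const fakeSchedule = fn => { frameQueue.push(fn); return fn; };
const fakeCancel = fn => {
  const i = frameQueue.indexOf(fn);
  if (i >= 0) frameQueue.splice(i, 1);
};

const drawnBatches = [];
const incremental = createIncrementalRenderer(
  (from, to) => drawnBatches.push([from, to]), 4, fakeSchedule, fakeCancel);

incremental.start(10);
while (frameQueue.length) frameQueue.shift()();
// drawnBatches is now [[0, 4], [4, 8], [8, 10]]
```

In a browser, the last two arguments would be `cb => requestAnimationFrame(cb)` and `id => cancelAnimationFrame(id)`; calling `start` again when a filter changes discards the frame that is still pending.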

@monfera
Contributor Author

monfera commented Dec 5, 2016

Questions (maybe rhetorical, as pieces fall into place during the API work) as they come up during relayout/restyle work.

  1. Some plots admit (expect) an array as their data, for example Cartesian plots. With scatter plots, multiple elements in the data array will put multiple data layers on the same (common) Cartesian projection with shared x and y axes. Even pie charts expect an array for the data, although a quick test didn't show multiple pie charts or multiple pie chart layers when adding more than one element. With parcoords, we naturally need an array - an array of variables. It currently feels right to pass 12 elements in the array if the parcoords has 12 dimensions. So the specific utility of the array would be different from what it is for Cartesian, but still quite useful. It's not expected that multiple parcoords would be superimposed on top of each other - I can go into details of why it's much less useful than for Cartesians if needed. The obvious alternatives are:

    • as described above: each element in the array represents a parcoords dimension / axis (how it's wired up now)
    • as with Cartesians: each element in the array is superimposed on the previous elements (and the set of axes is the union or intersection of axes)
    • as with the pie chart - apparently ignore all but one array member; and that single array member contains all the dimensions (again, by default, in an array)
  2. Appropriate DOM root(s) for the WebGL canvases and the SVG overlay. Plotly has a svg.main-svg and a div.gl-container, both under div.svg-container in the DOM tree. If we follow the most common D3 data binding pattern, then parcoords has one single root to which the data is bound.

    • Currently, the common root, main-svg is simply bound to 0 which is an adequate key but it doesn't represent the binding of actual data from which rendering is done. There's probably reliance on it to remain in the DOM, or bound to 0 even if the user switches from one plot to the other (glad to be corrected on these). If so:
    • A next best possibility is to create a parcoords-specific div node under div.svg-container at runtime (which must be disposed of on destroy). There's precedent for using plot type specific layers e.g. svg.main-svg>g.pielayer - though I'd prefer the parcoords DOM element to come and go as needed rather than hanging there at all times.
    • A third option is to loosen the D3 data binding pattern and not have a common, parcoords-specific root to the parcoords DOM elements, although I think there's something nice about a single-point DOM attachment for a cohesive unit even ignoring the benefits of data binding.
  3. Axes. parcoords axes are their own thing now, and have their d3.svg.brush components, as well as d3.behavior.drag for rearranging the order of axes, and there are other differences from preexisting plotly axes. Once parcoords is properly integrated otherwise, it'd be interesting to unify the Cartesian and the parcoords axis generation, which is a delicate task given that any regression there may have a wide impact. So my plan for now is to not expose many axis styling options in the API and just work with sensible details, and if any config is needed, it would need to be compatible with the preexisting Plotly axis attributes and conventions, or be put inside some parcoords-specific part of the attributes, lest we introduce a conflict with the future unification of axes, breaking the initial API.

  4. glslify has been added - hope it looks alright but as browserify transforms are found at several places, I might have missed some.

@etpinard
Contributor

etpinard commented Dec 5, 2016

@monfera

Some plots admit (expect) an array as their data

I'm having trouble understanding your point 1. Can you give examples in terms of "data" / "layout" attributes?

Appropriate DOM root(s) for the WebGL canvases and the SVG overlay.

What kind of SVG overlays are we talking about here? If the only SVG overlay is the hover layer (this is what I suspect, but I might be mistaken), then there's no need to worry about SVG overlay creation; Fx.hover will take care of this. See how mapbox subplots implement it here for inspiration.

Axes. parcoords axes are their own thing now, and have their

I thought we agreed on parcoords being a pie-like trace type, meaning that each trace would generate the appropriate axes. That is, a subplot with multiple parcoords traces wouldn't be possible. Am I missing something? Again, an example in terms of "data" / "layout" attributes would help.

@monfera
Contributor Author

monfera commented Dec 5, 2016

@etpinard thanks for your reply!

1.

Can you give examples in terms of "data" / "layout" attributes?

I probably phrased it poorly; probably best to see these examples and reread Q.1. only then: for example, with pies the data property is an array; or with scatterplots, the same thing.

So the question is if I should strive to do data: [parcoordsDimension1, parcoordsDimension2, ... ] or data: [parcoordsData] where parcoordsData encompasses all the dimensions. Initially I did the former, on the belief that it's easier to add/remove dimensions subsequently but I'm open to whichever way you feel is best.

parcoordsData may feel opaque; here's the core part of what it is now:

{
        variableName: "Fuselage diameter",
        integer: false, // meaning, the numeric value is on a continuous scale
        values: [1233.35, 4384.24, 8903.234, ...],
        filterDomain: [fromValue, toValue]
}

i.e. it represents one dimension (most importantly, the values on the axis of Fuselage diameter).
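
To make the role of `values` and `filterDomain` concrete, here is a small CPU-side sketch that finds the lines passing every dimension's filter. It is only an illustration: the thread says the actual crossfiltering runs in the GPU vertex shader, and `selectedLineIndices` as well as the "Engine count" dimension are made up for this example.

```javascript
// Dimension objects shaped like the one above; the second one is invented.
const exampleDimensions = [
  {
    variableName: 'Fuselage diameter',
    integer: false,
    values: [1233.35, 4384.24, 8903.234],
    filterDomain: [1000, 5000]
  },
  {
    variableName: 'Engine count', // hypothetical integer variable
    integer: true,
    values: [2, 4, 2],
    filterDomain: [2, 2]
  }
];

// A line (one index across all dimensions) is selected only if its value
// lies inside the filterDomain of every dimension.
function selectedLineIndices(dims) {
  const lineCount = dims.length ? dims[0].values.length : 0;
  const selected = [];
  for (let i = 0; i < lineCount; i++) {
    const passes = dims.every(d =>
      d.values[i] >= d.filterDomain[0] && d.values[i] <= d.filterDomain[1]);
    if (passes) selected.push(i);
  }
  return selected;
}

const focusLines = selectedLineIndices(exampleDimensions); // [0]
```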

2.

What kind of SVG overlays are we talking about here?

To clarify, the current parcoords layer doesn't do hovering - will be a useful feature, but the initial requirements didn't cover it and with tens of thousands of superimposed lines it's not as useful. Instead, the current parcoords SVG layer is responsible for the brush selections and the axis drag&drop. In any event I'll check out the mapbox hover handling, thanks for the link!

3.

we agreed on parcoords being a pie-like trace type.

Indeed I'm not currently planning with multiple datasets superimposed over one another, and there's no code support for it now (it wouldn't be particularly useful for parcoords, see my note in Q.1.).
I simply referred to the possibility that perhaps we'd eventually unify the Cartesian axis component with what's in parcoords right now - at their core, Cartesian and parcoords axes have overlaps in what's rendered and what they do.

4.

Just a note, I added this point above (glslify)

@etpinard
Contributor

etpinard commented Dec 6, 2016

I probably should have done this a while ago. But here they are, some very preliminary attribute sets for parcoord:

Option 1 (one-trace 2D values)

data = [{
  type: 'parcoord',
  values: [
    [ /* data for labels[0] */ ],
    [ /* data for labels[1] */ ],
    // ...
  ],
  labels: [ 'A', 'B', ... ],  // labels to appear on each axis
  ranges: [
    [ /* axis range for labels[0] / values[0] */ ],
    [ /* axis range for labels[1] / values[1] */ ]
  ], 
  domain: {  // similar to pie
    x: [0, 1],
    y: [0, 1]
   }
}]

// and no additional layout container

Option 2 (one trace with constraint array-container)

data = [{
  type: 'parcoord',
  constraints: [{
    values: [ /*  */ ],
    label: 'A',
    axis: { /* axis settings */ }
  }, {
    values: [ /*  */ ],
    label: 'B',
    axis: { 
     range: [0, 1]
     // other axis settings
    }
  }, {
    // more constraints 
  }],
  domain: {  // similar to pie
    x: [0, 1],
    y: [0, 1]
   }
}]

// and no additional layout container

Option 3 (one trace per constraint, axis settings in layout)

data = [{
  type: 'parcoord',
    values: [ /*  */ ],
    label: 'A',
    xaxis: 'x',
    yaxis: 'y'
}, {
    type: 'parcoord',
    values: [ /*  */ ],
    label: 'B',
    xaxis: 'x',
    yaxis: 'y2'
}, {
   // more constraints as traces
}]

layout = {
  xaxis: {
    domain: [0,1]  // the only meaningful setting here (I think)
  },
  yaxis: {
    domain: [0,1],
    range: [],
    // regular axis style settings
  },
  yaxis2: {
    // similarly ...
  },
  yaxis3: { /**/ }
}

At this stage, I believe option 2 is the most flexible, although it does add an array-container to a trace object, which isn't the most fun to play with.
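
To compare the shapes of option 1 and option 2, the following sketch converts a trace in the option 1 layout into the option 2 layout. `toConstraintTrace` is a hypothetical helper for illustration only, and the attribute names are the preliminary ones from this comment, not a final plotly.js API.

```javascript
// Hypothetical converter: option 1 (parallel arrays) -> option 2 (constraints)
function toConstraintTrace(trace) {
  return {
    type: trace.type,
    // one constraint object per axis, gathering what option 1 spreads
    // over the parallel `values`, `labels` and `ranges` arrays
    constraints: trace.values.map((values, i) => ({
      values: values,
      label: trace.labels[i],
      axis: { range: trace.ranges[i] }
    })),
    domain: trace.domain
  };
}

const optionOneTrace = {
  type: 'parcoord',
  values: [[1, 2, 3], [10, 20, 30]],
  labels: ['A', 'B'],
  ranges: [[0, 5], [0, 50]],
  domain: { x: [0, 1], y: [0, 1] }
};

const optionTwoTrace = toConstraintTrace(optionOneTrace);
```

The conversion shows why option 2 is easier to extend: per-axis settings live next to the values they describe, so adding an axis attribute does not require another parallel array.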

@etpinard
Contributor

etpinard commented Dec 6, 2016

@monfera
Contributor Author

monfera commented Dec 6, 2016

Thanks @etpinard, great, we're closest to option 2 already. I might rename the constraints to variables or dimensions. I'm a bit puzzled by the x/y domains in the layout though as parcoords have arbitrary dimensions. I'll make some updates and ask you to take a look at the code in the afternoon or tomorrow morning.

@etpinard
Contributor

etpinard commented Dec 6, 2016

I'm a bit puzzled by the x/y domains

That refers to the position / size of a subplot on the plot paper coordinates. Just like for pie charts or layout.scene.domain

@monfera
Contributor Author

monfera commented Dec 7, 2016

@etpinard as we discussed, here's a current version, it's closest to version 2 and also close to what it's been (I renamed a few things to match your version). I'm totally planning to aggregate them in logical units, e.g. for line styling; filter bar geometry etc., and to convert them to lowercase.

Also, a bunch of attributes will change as I'm going to use the plotly.js standard palette etc.
A bunch of them will end up being unexposed as an attribute (i.e. just constants).

I can make it so all of these are in the "data" part, too - for example, introducing a new property at the same level as "type": "parcoords".

[old JSON deleted]

@etpinard
Contributor

etpinard commented Dec 7, 2016

@monfera apart from layout.width and layout.height, all the attributes you listed in layout (assuming we go with option 2), should be trace attributes.

Again assuming that we go with option 2, as an exercise consider a multiple-parcoord graph (spanning multiple rows): all attributes that could differ from one parcoord trace (i.e. row) to another should be trace attributes, as there's only one "layout" per graph.

By the way, plotly.js forbids camel case attributes, we prefer grouping attributes in nested objects. For example, the set of filter attributes above should be:

{
  filter: {
     fillcolor: '',
     opacity: '',
     linewidth: '',
     linecolor: ''
  }
}

@monfera
Contributor Author

monfera commented Dec 7, 2016

Multiple rows: some of the geometry and styling options, I believe, should ideally be shared among rows. For example, the brush color, or brush width better be consistent across rows.

Nesting / lowercase: indeed, this is what I meant by "I'm totally planning to aggregate them in logical units, e.g. for line styling; filter bar geometry etc., and to convert them to lowercase."

@etpinard
Contributor

etpinard commented Dec 7, 2016

I believe, should ideally be shared among rows. For example, the brush color, or brush width better be consistent across rows.

Let's not do that (at least in v1).

It will be up to the user to define their cross-trace (i.e. cross-row) interactions by adding handlers to plotly_ events. Cross-trace brushing is very hard to describe in a declarative way.

@monfera
Contributor Author

monfera commented Dec 7, 2016

@etpinard This is where the piecemeal evolution of the attribute structure is found: https://github.com/monfera/plotly.js/blob/parcoords-historical/src/traces/parcoords/mocks/k2.json

At the time of this note, the "settings" property is a catch-all for attributes that I haven't yet created a dedicated group for. For example, the "filterbar" attributes have already been extracted and lowercased.

Btw. it's not the final place for the mock, it just helps with a way of local testing (will move it later). Also, the attributes etc. files are a shell for now as attributes shift around.

I'll also need to switch to plotly-standard styling attributes for things like color/opacity and whatever helps DRY and consistency.

@monfera
Contributor Author

monfera commented Dec 9, 2016

Thanks Étienne, here's the new, squashed branch.

@monfera monfera mentioned this issue Dec 16, 2016
37 tasks
@etpinard etpinard added this to the 1.24.0 milestone Feb 21, 2017
@monfera
Contributor Author

monfera commented Feb 23, 2017

Implementing PR merged.

@monfera monfera closed this as completed Feb 23, 2017
@tantrev

tantrev commented Dec 16, 2017

Just thought I'd throw in a request: it'd be really nice to be able to mouse over a part of a line and have that whole line highlighted.

@ibayer

ibayer commented Jan 25, 2018

Thanks for the nice work.

Is it possible to have a log10-based axis for some of the coordinates?
Link to related Stack Overflow question.

@monfera
Contributor Author

monfera commented Jan 25, 2018

@tantrev yes, that'd be a good addition. It's not trivial due to how lines are currently rendered with WebGL, and we didn't have it in the original scope, but an additional SVG overlay for a single polyline looks feasible. I'm not sure how this item would be scheduled; maybe the next time we make a round of improvements to parcoords.

@monfera
Contributor Author

monfera commented Jan 25, 2018

@ibayer thank you, I responded there. In short, it's not yet available, but we'd like to add it eventually. In the meantime, only pre-logging the data would work, but that will show log values in the axis ticks.
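
The pre-logging workaround mentioned here could be sketched as follows. `preLogDimension` is a hypothetical helper, and the `label` / `values` fields are an assumption about how one parcoords dimension is described; as noted, the axis ticks will then show the log values rather than the original ones.

```javascript
// Hypothetical helper: replace a dimension's values with their log10
// before handing the dimension to the chart.
function preLogDimension(dim) {
  return Object.assign({}, dim, {
    // make it visible in the heading that the ticks are log values
    label: 'log10(' + dim.label + ')',
    // note: values <= 0 have no real logarithm and would yield NaN or -Infinity
    values: dim.values.map(v => Math.log10(v))
  });
}

const rawDimension = { label: 'Learning rate', values: [0.001, 0.01, 0.1, 1] };
const loggedDimension = preLogDimension(rawDimension);
// loggedDimension.values is approximately [-3, -2, -1, 0]
```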

@ibayer

ibayer commented Jan 25, 2018

@monfera Thanks for the feedback, do you have this already on some feature list or should I open a feature request issue?

@monfera
Contributor Author

monfera commented Jan 25, 2018

@ibayer I don't see an existing issue for it right now, it's a good idea to add one.

@ibayer

ibayer commented Jan 25, 2018

@monfera Thanks, I'll open an issue. (done)

@monfera @tantrev
I also have a question regarding line selection. Is it possible to link selected lines to entries in a data table as shown here:
https://syntagmatic.github.io/parallel-coordinates/examples/slickgrid.html

Looks like something similar can be done with plotly in general.
https://plot.ly/r/datatable/

However, I'm not experienced enough with plotly to judge if this is already possible with parallel coordinates as well.

@monfera
Contributor Author

monfera commented Jan 25, 2018

@ibayer the parcoords in plotly.js does emit events on line hover and unhover. For an example navigate to this codepen, open the Dev Console and see events making console.log calls:

[Image: Dev Console showing the logged events]

Other events on interactions also show up.
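
A minimal sketch of listening to these events is shown below. It assumes a graph div `gd` that already holds a plotly.js plot and therefore exposes `gd.on`; the shape of the event payload is not assumed here, it is only passed through. `logParcoordsEvents` is a hypothetical helper, and the stub object exists only so the example runs without a browser.

```javascript
// Hypothetical helper: forward hover/unhover events to a logging function.
function logParcoordsEvents(gd, log) {
  ['plotly_hover', 'plotly_unhover'].forEach(name => {
    gd.on(name, eventData => log(name, eventData));
  });
}

// Stand-in for a plotly.js graph div, for demonstration only
const stubHandlers = {};
const stubGraphDiv = {
  on: (name, fn) => { stubHandlers[name] = fn; }
};

const loggedEvents = [];
logParcoordsEvents(stubGraphDiv, (name, data) => loggedEvents.push([name, data]));

// Simulate what the library would do when a line is hovered and unhovered
stubHandlers.plotly_hover({ example: true });
stubHandlers.plotly_unhover({ example: true });
```

With a real plot, the second argument could be `console.log`, or a callback that highlights the matching row in a data table.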

@tantrev

tantrev commented Mar 16, 2018

I'm not sure if this is the right place to put it (since it may be considered an entirely new feature), but "Parallel Sets" plots like those from this Mac application would be really great to have on Plotly. Just thought it might go here, since a Parallel Sets plot seems to be the sibling of a Parallel Coordinates plot. 😄

EDIT: ggparallel also produces similar plots.

@tantrev

tantrev commented Mar 26, 2018

On a more direct note, is there any way to specify the precision of the max value in the parallel coordinates plot? I've tried doing it with truncated "range" values, but I am still seeing many decimals in the final parallel coordinates plot. An example may be viewed here.

4 participants