On new or unified plot/trace types relating to parcoords and sankey
#2229
@alexcjohnson re your question on categorical variables (all lines focusing in one point), in theory the Y screenspace value could be enhanced with random jitter on |
Responding @alexcjohnson #2221 (comment) in new thread.
The idea of mixing categorical support into
For improving the representation of categorical variables in the For purely categorical data, I would like something cleaner that doesn't rely on jitter. Regarding Parallel Sets. This diagram is very close to what I'm after as it is a representation of multi-dimensional categorical data with path continuity between categories. The only issue is that the coloring of the paths between categories is linked to the first dimension (the top in the example linked above). This restriction means that you don't need to draw all of the multi-paths separately for the early dimensions, one wide path can be used instead. But it also means that you can't color paths based on external criteria as is needed to support brushing/cross-filtering. Another way to think about what I'm proposing/developing as Any thoughts on this @alexcjohnson @monfera? |
Thanks for the detailed writeup @monfera You're right that there are a lot of possible extensions and convergence opportunities, but before we resort to those heavier (in terms of development) solutions, let's see what we can get out of extending the existing plot types. To me, @jmmease's problem is fairly simple, and extremely common: What I had in mind for the Then as a second extension, we could implement fat-line rendering, possibly also with per-line thicknesses, which would end up looking exactly like parallel categories/sets. @monfera this would only be a relevant option when the total row count is low, so performance should not be an issue. It would also reduce precision when you're trying to interpret a continuous dimension, so I'd keep it optional anyway. Or perhaps fat lines for categorical dimensions, dropping to 1px for continuous? That might actually be a pleasing effect... @jmmease perhaps what I've drawn is similar to what you had in mind with your two sticking points? (1) I think is a matter of taste, that we can deal with in various ways as I already mentioned. (2) is a good point, though with sufficient visual cues (ie boxes instead of an axis line) it feels to me like it's easy to intuitively distinguish, and the flip side is that when you're exploring via selection the category dimensions can help bring out density information pertaining to the continuous variables. My worry with using random jitter is that it still wouldn't necessarily indicate density, particularly as the sample size gets larger.
Fwiw I really like the aesthetic of the fixed cadence offsetting, and once we have this kind of deliberate use of vertical space, it opens up both options and maybe more. With too many lines our current WebGL might give Moiré-like patterns but we can do a couple of things to reduce the effect, or switching to a solid shape would be useful. |
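For readers without the mockup at hand, here is a minimal sketch of what fixed cadence offsetting could look like, as opposed to random jitter. It is an assumption about the idea, not the mockup's actual layout: the helper name `cadence_offsets` and the `step` parameter are hypothetical. Rows sharing a category are stacked at a constant spacing around the category's centre, so the height of each stack grows with the number of rows in it.

```python
from collections import defaultdict


def cadence_offsets(categories, step=1.0):
    """Return one vertical offset per row for a categorical axis.

    Hypothetical helper, not part of plotly.js: rows that share a category
    are spaced `step` apart and centred on the category position, so a
    category with more rows occupies a taller stack.
    """
    rows_by_category = defaultdict(list)
    for row_index, category in enumerate(categories):
        rows_by_category[category].append(row_index)

    offsets = [0.0] * len(categories)
    for row_indices in rows_by_category.values():
        n = len(row_indices)
        for k, row_index in enumerate(row_indices):
            # Symmetric around 0: a single row gets offset 0.
            offsets[row_index] = (k - (n - 1) / 2) * step
    return offsets


if __name__ == "__main__":
    print(cadence_offsets(["a", "b", "a", "a"], step=2.0))
```

Unlike jitter, the result is deterministic, and the extent of each stack is proportional to its row count, which is the density cue discussed above.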
Thanks for posting the mockup @alexcjohnson , that's really helpful and I think it alleviates my reservations.
Here's a list of some of the other features that an enhanced
And a few stylistic nice-to-haves that might not be feasible with WebGL in the mix.
I do think this combination would make for a really powerful and flexible multi-dimensional visualization/analysis tool. Is this something the core team would be interested in working on soon? @alexcjohnson @monfera @etpinard From my side, I plan to finish up my |
Oh, and one other consideration comes to mind. With the parallel categories approach, it's possible to specify a count for each point in the dataset. This is just like the This allows the grouping logic that identifies the counts for the unique paths to happen outside of the browser. One of our use cases is to generate these diagrams in Python from larger-than-memory datasets. To do this we can first perform a groupby-count using dask or spark and then feed the unique paths and their counts into the parallel categories diagram. Do you have any thoughts on how a sample count could be supported in an enhanced
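The groupby-count step described above can be sketched in plain Python. This is a stand-in for the dask or spark aggregation, using only the standard library. The function name `count_unique_paths` is hypothetical, and how the counts would be passed to a trace is not specified in this thread.

```python
from collections import Counter


def count_unique_paths(rows):
    """Collapse rows of category values into (path, count) pairs.

    Each row is a sequence with one category value per dimension. The
    result has one entry per distinct path, which is all the browser
    would need to receive.
    """
    counts = Counter(tuple(row) for row in rows)
    # Sort so the output order is deterministic.
    return sorted(counts.items())


if __name__ == "__main__":
    rows = [
        ("First", "Female", "Survived"),
        ("First", "Female", "Survived"),
        ("Third", "Male", "Perished"),
    ]
    for path, count in count_unique_paths(rows):
        print(path, count)
```

The payload then scales with the number of distinct paths rather than the number of rows, which is what makes the larger-than-memory case workable.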
Yes, that's a great use case, and we have done similar things in for example histogram traces ( |
It will be great to mix categorical and numeric channels.
Hi - this issue has been sitting for a while, so as part of our effort to tidy up our public repositories I'm going to close it. If it's still a concern, we'd be grateful if you could open a new issue (with a short reproducible example if appropriate) so that we can add it to our stack. Cheers - @gvwilson |
Technical notes re #2221 (comment)
On multidimensional explorers: made a separate issue as this topic is deep and the topic of "New charts" will likely be quite broad.
We've been planning multidimensional extensions eg. SPLOM. The above directions also make a lot of sense, and as usual there's always the tradeoff among implementation time (ultimately $), payload size and functionality. A nice option would be a kind of unification of plots, eg. parcoords and sankey, both of which are relatively compact as they do one thing without much configuration. Another option is a new plot type, or a new trace type eg. on the substrate of parcoords.
Here's a quick note on the expected challenges of integrating the new thing with either:
parcoords had been written for the express purpose of performance, so it uses WebGL for rendering. It uses GL.LINES as that's the most compact geometry ie. fastest to do the interactions with (there were response time criteria). To make lines thicker, we'd need to convert to GL.TRIANGLES or similar, ie. 2-3x work in the vertex shader (which also does GPU crossfiltering). WebGL does have a lineWidth method but the standard permits that implementations cap line width to 1px, which browsers increasingly opted for over time... If Sankey-like splines are also needed, that's another layer of performance hit and implementational complexity. Yet it'd be possible to add an SVG or polygonal WebGL trace type for those cases where data points are not in the multiple 10k range and more diverse geometry eg. thicker lines are needed. parcoords is already multilayer, the axes are in SVG and there could be more SVG layers.
sankey is an alternative target. After all, it internally works as a layered graph, ie. the internal representation is already close to the axis cadence of parallel coordinates and their ilk. But our implementation uses the heuristic as it is in d3-sankey (we only added support for multiedges via a PR), and it's free to arrange the edges, resulting in the observed line discontinuity that's definitely not in the style of parallel coordinates. This freedom yields a more optimal arrangement in the case of general Sankey work, ie. minimizing the line crossings, not a concern for parcoords-like work. We'd need to somehow add configuration for bypassing the heuristic or the entire d3-sankey in favor of a parcoords-like continuous line layout.
There's the option that these two, and the new functionality, be unified, which is doable but sounds like even more work. Also, the chart design space is enormously large and the way plotly.js is set up, for better or worse, it's geared more for specific, configurable chart types rather than a Grammar of Graphics like, fluid or low level way of building up toward a desired chart type. The reason it comes up is that there are a lot of possibly useful additions and improvements that can be made to the charts we speak of. The implementationally easiest thing on the other hand is a completely new chart type but that probably adds the largest JS payload (not sure if it's a concern, @alexcjohnson or @etpinard could tell). Btw. there's a parallel sets implementation that interacts not unlike our parcoords and sankey with drag&drop: https://www.jasondavies.com/parallel-sets/ which shows line continuity but not sure if it supports multiedges.
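To illustrate the GL.LINES to GL.TRIANGLES conversion mentioned above, here is a rough CPU-side sketch of expanding one line segment into two triangles of a given width. It is a generic geometry illustration under stated assumptions, not how parcoords is implemented (the real work would sit in the vertex shader), and `segment_to_triangles` is a hypothetical name.

```python
import math


def segment_to_triangles(p0, p1, width):
    """Expand the segment p0 -> p1 into two triangles forming a quad.

    Returns six (x, y) vertices in triangle-list order. Without an index
    buffer this is six vertices per segment instead of the two a line
    primitive needs.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("cannot expand a zero-length segment")

    # Unit normal to the segment, scaled to half the requested width.
    hx = -dy / length * width / 2
    hy = dx / length * width / 2

    a = (x0 + hx, y0 + hy)
    b = (x0 - hx, y0 - hy)
    c = (x1 + hx, y1 + hy)
    d = (x1 - hx, y1 - hy)
    # Two triangles sharing the diagonal b-c.
    return [a, b, c, c, b, d]


if __name__ == "__main__":
    for vertex in segment_to_triangles((0, 0), (10, 0), 2):
        print(vertex)
```

Each segment now needs a per-vertex normal computation as well, which gives a sense of where the extra vertex shader work comes from. Joins between consecutive segments of a polyline are not handled in this sketch.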