RFC: fix traces subdivision by multiple properties #471

alyst · 2016-02-23T01:17:57Z

This is my version of the #418 fix:

add indices property to the trace to reference the indices of
the data elements associated with the trace
replace tracify() by subdivide_traces() that correctly subdivides
the current set of traces based on the given data property

Note that while the traces are assigned the correct visual properties, the PR does not fix #381 -- individual group traces are still being created and displayed in the legend.

Comments, especially regarding the ways to fix #381 are most welcome.

cpsievert · 2016-02-23T02:42:01Z

Wow! At first glance this looks great and attacks a big problem 🍻

I'll need to a few days to mull this over. In the meantime, could you add some tests to ensure this new behavior works as intended?

Also, don't worry about the current test failures... I need to find a way to skip those on external pull requests.

cpsievert · 2016-03-26T06:09:36Z

R/plotly.R

+          }
+          sub_trace
+        })
+      }


@alyst I'm not quite sure what this if statement is doing. Should this say legendgroup instead of legenditem?

@cpsievert It tries to solve the problem of having an item in the legend for each trace (which is not what you want if e.g. you are doing something similar to geom_segment()). The new property legenditem specifies the name of the single element in the legend that all the traces in the same legenditem-group would be referring to (unless they are further subdivided by something else).
To implement this, only the first trace of the group is shown in the legend, but all the other traces still share the same legendgroup.
Although it utilizes legendgroup mechanism, I thought it's better to call the property legenditem, because from the user POV it doesn't do any grouping of legend elements, it just creates one item for the group of traces.

I should have mentioned this a long time ago in response to your comment:

Instead of creating a trace for each group and managing legend entries via legendgroup, I'd prefer to take the same approach as ggplotly() where group doesn't create multiple traces, but groups are separated by NAs. For a simple example, plot_ly(x = c(1:10), y = c(1:10), group = rep(1:2, each = 5)) should translate to plot_ly(x = c(1:5, NA, 6:10), y = c(1:5, NA, 6:10)).

We use plotly:::group2NA() to handle this ggplotly(). Note that this line checks to see if we are drawing a polygon (in plotlyjs, a polygon is a scatter trace with mode of lines). In that case, we need to "retrace" the first point in each group to effectively close each polygon. There isn't any nice, high-level way to say "draw a polygon" using plot_ly(), so for now, I think we could just slip in NAs without worrying about retracing points. Does that make sense @alyst?

alyst · 2016-04-07T08:00:42Z

@cpsievert Oh, I wasn't aware about the NA trick. Definitely it makes things a lot easier! I'll try to update the PR to make use of it (actually, it would also be easier to implement (sub)trace grouping using dplyr::group_by() & co, because current loop-based implementation is quite slow for medium to large-sized datasets).
However, in your example group serves for defining the subtraces and disappears during the postprocessing. What if somebody wants to use group to define the trace name? Maybe we can leave the group behaviour as is and introduce the subgroup(subtrace?) property that would be processed exactly as you described?

cpsievert · 2016-04-07T09:14:13Z

in your example group serves for defining the subtraces and disappears during the postprocessing.

What do you mean by subtraces? My example is really meant to show how the group argument should work (i.e. it should create just one trace with a gap).

What if somebody wants to use group to define the trace name?

I'm not sure I see the use case. Can you provide an example? In my example, since it should create just one trace, you could name it like this: plot_ly(x = c(1:10), y = c(1:10), group = rep(1:2, each = 5), name = "myTrace")

Maybe we can leave the group behaviour as is and introduce the subgroup(subtrace?) property that would be processed exactly as you described?

I don't think so. group should not create new traces. It should only control NA placement after we've "traceified" for color/shape/etc.

alyst · 2016-04-07T09:53:14Z

Ok, I see: the color and shape can tracify (i.e. introduce new traces), but the group can not. I had an idea that maybe group could also tracify, then it would have been possible to control the trace names via the group values. The use case is when both colors and shapes are mapped essentially to the same data column, then the autogenerated trace names in the legend would look strange, and it would be nice to use group to override it.
Of course, ATM one can do add_trace(name=...) for each trace, but that's rather tedious. Maybe we can allow binding name to a column and support tracifying by name?

alyst · 2016-05-19T17:17:24Z

@cpsievert Based on the comments, I've updated (redesigned) the PR. It passes all the tests except those where interaction with the plotly server is required.

Tracification is now using dplyr::group_by() (hence plyr dependency (I have not detected where it is used currently) was replaced by dplyr). That is faster than looping through the unique levels of the data vector and it's done only once for all data vectors that affect tracification (e.g. symbol, color etc).
Tracification is done in 2 phases: first, the distinct traces (i.e. the indices of dat data vectors elements) are identified, then dat is split into traces by generate_traces() function accordingly.
The adjustment of trace visuals that was previously done by colorize()/symbolize() functions has been generalized into the "brushes" concept: colorize() is replaced by color_brush() function that returns a "brush" -- the function that has to be applied to a trace to adjust its color. generate_traces() gets a list of the brushes to be applied to each trace. It should allow easier extension of the tracification logic.
As suggested, group property no longer generates the new traces (so 2 tests are disabled). Basically, it does nothing unless we are in "scatter" "lines" mode, in which case it inserts NA into the data at each group boundary to break the polylines. Since it utilizes dplyr::group_by(), it should be noticeably faster than the trace-based approach. It should be also faster on the client side.
I've kept the "legenditem" property that I proposed in the previous version of the patch. It would allow to group the traces with different visual properties under a single legend item. Should be useful when there are too many different traces to show in the legend. Another alternative would be to enable tracification for the name/legendgroup/showlegend properties.

- separate the grouping of data into traces (this now done with the help of dplyr::group_by()) from traces creating and applying styles(brushes) to each resulting trace - replace `tracify()` by `generate_traces()`, replace `colorize/symbolize()` by `color/symbol_brush()` - import dplyr - update "group" tests

"group" property doesn't create any new trace, but if the trace mode is "lines", NA points are inserted at the "group" boundaries within the same trace. That is interpreted as the polyline break by plotlyjs.

legenditem allows to group multiply traces into one legend item

cpsievert · 2016-07-26T15:10:18Z

This problem was fixed by #628. Thanks for the fruitful discussion @alyst

cpsievert reviewed Mar 26, 2016
View reviewed changes

alyst force-pushed the fix_traces branch 5 times, most recently from 7c49ca8 to 14c99fd Compare May 19, 2016 16:33

alyst changed the title ~~WIP: fix traces subdivision by multiple properties~~ RFC: fix traces subdivision by multiple properties May 23, 2016

alyst force-pushed the fix_traces branch 3 times, most recently from edc77b0 to e49b2b0 Compare June 2, 2016 16:37

alyst added 3 commits June 3, 2016 00:19

support "group" property to break lines

188ee19

"group" property doesn't create any new trace, but if the trace mode is "lines", NA points are inserted at the "group" boundaries within the same trace. That is interpreted as the polyline break by plotlyjs.

add "legenditem" property

ed2a0df

legenditem allows to group multiply traces into one legend item

alyst force-pushed the fix_traces branch from 91dc0e7 to ed2a0df Compare June 2, 2016 22:21

alyst mentioned this pull request Jun 2, 2016

ggplotly: separate facet columns by <br> #593

Merged

alyst closed this Jun 2, 2016

alyst reopened this Jun 2, 2016

cpsievert closed this Jul 26, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

RFC: fix traces subdivision by multiple properties #471

RFC: fix traces subdivision by multiple properties #471

Uh oh!

alyst commented Feb 23, 2016

Uh oh!

cpsievert commented Feb 23, 2016

Uh oh!

cpsievert Mar 26, 2016

Uh oh!

alyst Mar 26, 2016

Uh oh!

cpsievert Apr 6, 2016

Uh oh!

alyst commented Apr 7, 2016

Uh oh!

cpsievert commented Apr 7, 2016

Uh oh!

alyst commented Apr 7, 2016

Uh oh!

alyst commented May 19, 2016

Uh oh!

cpsievert commented Jul 26, 2016

Uh oh!

Uh oh!

RFC: fix traces subdivision by multiple properties #471

RFC: fix traces subdivision by multiple properties #471

Uh oh!

Conversation

alyst commented Feb 23, 2016

Uh oh!

cpsievert commented Feb 23, 2016

Uh oh!

cpsievert Mar 26, 2016

Choose a reason for hiding this comment

Uh oh!

alyst Mar 26, 2016

Choose a reason for hiding this comment

Uh oh!

cpsievert Apr 6, 2016

Choose a reason for hiding this comment

Uh oh!

alyst commented Apr 7, 2016

Uh oh!

cpsievert commented Apr 7, 2016

Uh oh!

alyst commented Apr 7, 2016

Uh oh!

alyst commented May 19, 2016

Uh oh!

cpsievert commented Jul 26, 2016

Uh oh!

Uh oh!