Skip to content

RFC: fix traces subdivision by multiple properties #471

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

alyst
Copy link
Contributor

@alyst alyst commented Feb 23, 2016

This is my version of the #418 fix:

  • add indices property to the trace to reference the indices of
    the data elements associated with the trace
  • replace tracify() by subdivide_traces() that correctly subdivides
    the current set of traces based on the given data property

Note that while the traces are assigned the correct visual properties, the PR does not fix #381 -- individual group traces are still being created and displayed in the legend.

Comments, especially regarding the ways to fix #381 are most welcome.

@cpsievert
Copy link
Collaborator

Wow! At first glance this looks great and attacks a big problem 🍻

I'll need to a few days to mull this over. In the meantime, could you add some tests to ensure this new behavior works as intended?

Also, don't worry about the current test failures... I need to find a way to skip those on external pull requests.

}
sub_trace
})
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alyst I'm not quite sure what this if statement is doing. Should this say legendgroup instead of legenditem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpsievert It tries to solve the problem of having an item in the legend for each trace (which is not what you want if e.g. you are doing something similar to geom_segment()). The new property legenditem specifies the name of the single element in the legend that all the traces in the same legenditem-group would be referring to (unless they are further subdivided by something else).
To implement this, only the first trace of the group is shown in the legend, but all the other traces still share the same legendgroup.
Although it utilizes legendgroup mechanism, I thought it's better to call the property legenditem, because from the user POV it doesn't do any grouping of legend elements, it just creates one item for the group of traces.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have mentioned this a long time ago in response to your comment:

Instead of creating a trace for each group and managing legend entries via legendgroup, I'd prefer to take the same approach as ggplotly() where group doesn't create multiple traces, but groups are separated by NAs. For a simple example, plot_ly(x = c(1:10), y = c(1:10), group = rep(1:2, each = 5)) should translate to plot_ly(x = c(1:5, NA, 6:10), y = c(1:5, NA, 6:10)).

We use plotly:::group2NA() to handle this ggplotly(). Note that this line checks to see if we are drawing a polygon (in plotlyjs, a polygon is a scatter trace with mode of lines). In that case, we need to "retrace" the first point in each group to effectively close each polygon. There isn't any nice, high-level way to say "draw a polygon" using plot_ly(), so for now, I think we could just slip in NAs without worrying about retracing points. Does that make sense @alyst?

@alyst
Copy link
Contributor Author

alyst commented Apr 7, 2016

@cpsievert Oh, I wasn't aware about the NA trick. Definitely it makes things a lot easier! I'll try to update the PR to make use of it (actually, it would also be easier to implement (sub)trace grouping using dplyr::group_by() & co, because current loop-based implementation is quite slow for medium to large-sized datasets).
However, in your example group serves for defining the subtraces and disappears during the postprocessing. What if somebody wants to use group to define the trace name? Maybe we can leave the group behaviour as is and introduce the subgroup(subtrace?) property that would be processed exactly as you described?

@cpsievert
Copy link
Collaborator

in your example group serves for defining the subtraces and disappears during the postprocessing.

What do you mean by subtraces? My example is really meant to show how the group argument should work (i.e. it should create just one trace with a gap).

What if somebody wants to use group to define the trace name?

I'm not sure I see the use case. Can you provide an example? In my example, since it should create just one trace, you could name it like this: plot_ly(x = c(1:10), y = c(1:10), group = rep(1:2, each = 5), name = "myTrace")

Maybe we can leave the group behaviour as is and introduce the subgroup(subtrace?) property that would be processed exactly as you described?

I don't think so. group should not create new traces. It should only control NA placement after we've "traceified" for color/shape/etc.

@alyst
Copy link
Contributor Author

alyst commented Apr 7, 2016

Ok, I see: the color and shape can tracify (i.e. introduce new traces), but the group can not. I had an idea that maybe group could also tracify, then it would have been possible to control the trace names via the group values. The use case is when both colors and shapes are mapped essentially to the same data column, then the autogenerated trace names in the legend would look strange, and it would be nice to use group to override it.
Of course, ATM one can do add_trace(name=...) for each trace, but that's rather tedious. Maybe we can allow binding name to a column and support tracifying by name?

@alyst alyst force-pushed the fix_traces branch 5 times, most recently from 7c49ca8 to 14c99fd Compare May 19, 2016 16:33
@alyst
Copy link
Contributor Author

alyst commented May 19, 2016

@cpsievert Based on the comments, I've updated (redesigned) the PR. It passes all the tests except those where interaction with the plotly server is required.

  1. Tracification is now using dplyr::group_by() (hence plyr dependency (I have not detected where it is used currently) was replaced by dplyr). That is faster than looping through the unique levels of the data vector and it's done only once for all data vectors that affect tracification (e.g. symbol, color etc).
  2. Tracification is done in 2 phases: first, the distinct traces (i.e. the indices of dat data vectors elements) are identified, then dat is split into traces by generate_traces() function accordingly.
  3. The adjustment of trace visuals that was previously done by colorize()/symbolize() functions has been generalized into the "brushes" concept: colorize() is replaced by color_brush() function that returns a "brush" -- the function that has to be applied to a trace to adjust its color. generate_traces() gets a list of the brushes to be applied to each trace. It should allow easier extension of the tracification logic.
  4. As suggested, group property no longer generates the new traces (so 2 tests are disabled). Basically, it does nothing unless we are in "scatter" "lines" mode, in which case it inserts NA into the data at each group boundary to break the polylines. Since it utilizes dplyr::group_by(), it should be noticeably faster than the trace-based approach. It should be also faster on the client side.
  5. I've kept the "legenditem" property that I proposed in the previous version of the patch. It would allow to group the traces with different visual properties under a single legend item. Should be useful when there are too many different traces to show in the legend. Another alternative would be to enable tracification for the name/legendgroup/showlegend properties.

@alyst alyst changed the title WIP: fix traces subdivision by multiple properties RFC: fix traces subdivision by multiple properties May 23, 2016
@alyst alyst force-pushed the fix_traces branch 3 times, most recently from edc77b0 to e49b2b0 Compare June 2, 2016 16:37
alyst added 3 commits June 3, 2016 00:19
- separate the grouping of data into traces (this now done
  with the help of dplyr::group_by()) from traces creating and
  applying styles(brushes) to each resulting trace
- replace `tracify()` by `generate_traces()`,
  replace `colorize/symbolize()` by `color/symbol_brush()`
- import dplyr
- update "group" tests
"group" property doesn't create any new trace, but if the trace
mode is "lines", NA points are inserted at the "group" boundaries
within the same trace. That is interpreted as the polyline break by
plotlyjs.
legenditem allows to group multiply traces into one legend item
@cpsievert
Copy link
Collaborator

This problem was fixed by #628. Thanks for the fruitful discussion @alyst

@cpsievert cpsievert closed this Jul 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

group should not create multiple traces
2 participants