Filter and groupby transforms in main bundle #978

etpinard · 2016-09-26T18:47:55Z

This PR continues on the work done #936 and attempts to address

- instead of calling it for every transform trace - call findArrayAttributes in transform function - make findArrayAttributes 'aware' of array attribute in trace so that it lists only the array attributes present in given trace

- to allow calcTransform methods where fullTrace / or calcdata items can be modified during the calc step. - this allows writing transforms that are applied after much of the 'computed' fields are filled e.g. axis (auto) 'type' - calcTransforms can only manipulate trace object 1-to-1 e.g. grouping operation should be done at the defaults step as previously.

- rewrite transform as a calcTransform - add 'active' (like in groupby) - add 'strictinterval' (boolean) to determine e.g. `>` vs `>=` - makes filter works with numerical, categorical and dates!!!

- extend original traces with extendDeepNoArrays (no need to copy arrays - they are filled during the grouping operation) - swap .forEach for good old for loop

- move some more general cases our of filter_test -> multi_test - add some filter default cases (around 'active' and 'strictinterval') - add a bunch of calcTransform test (numeric, categorical and dates are now covered!)

etpinard · 2016-09-29T18:19:38Z

@bpostlethwaite @alexcjohnson

This PR adds another layer of functionality to transform plugins ➡️ transforms can now be applied during the calc step. In details, the plotting code will apply transforms:

at the defaults step if the registered transform module has transform method, and
at the calc step if the registered transform module has a calcTransform method.

Applying transforms at the calc has a few important advantages:

transforms have access to several computed fields such as axis the type (that the calc step comes after the auto-type routine) and the very useful data-to-coordinate methods (d2c)
the calc step is not invoked as often as the defaults step on redraws. Loosely speaking, the calc is (or should be) invoked only when the data arrays change. For example, zooming on a cartesian plot triggers a supply-defaults call but no calc step.

Important calcTransform are expected to mutate a lone trace object. That is, they are only meant for 1-trace-to-1-trace transforms. For example, groupby transforms should be applied at the defaults as is the case currently.

For implementation details, see commit 1238236

In turn the filter transform was mostly rewritten to handle numerical, categorical and dates. See 0e1e9a7

etpinard · 2016-09-29T18:24:19Z

src/lib/coerce.js

+ * @return {array} arrayAttributes
+ *  list of array attributes for the given trace
+ */
+exports.findArrayAttributes = function(trace) {


@monfera FYI I decided to make findArrayAttributes a lib function instead meant to be called on-demand by the transform modules.

That way, this same function can now find the array attributes per trace module and filter them to match the array attribute in the user trace.

etpinard · 2016-09-29T18:27:45Z

src/transforms/filter.js

 exports.attributes = {
+    active: {


I added an active attribute similar to the active in the groupby transform.

BUT, if no one objects, I'll change active -> visible for consistency with our other components. Moreover, active is already used in layout.updatemenus where it an 'integer' attribute, having both a boolean and an integer active attribute in our schema would be pretty confusing.

Sounds fine to me!

I am a bit bothered by visible=true exactly causing things to become _in_visible.
How about enabled?

active -> enabled in f4b9dcc

etpinard · 2016-09-29T18:28:33Z

src/transforms/filter.js

+            'Has only an effect for `operation` *>*, *<*, *within* and *notwithin*',
+            'When `operation` is set to  *within* and *notwithin*,',
+            '`strictinterval` is expected to be a 2-item array where the first (second)',
+            'item determines strictness for the lower (second) bound.'


I'm not super happy about this attribute name.

Any suggestion would be appreciated.

I looked to wikipedia for guidance, but it suggested 'strict inequality', so that that's perhaps where you already looked. Otherwise I usually mentally call it an 'open' or 'closed' interval. Wouldn't terminology from mathematics be the most universally understood and transparent nomenclature? 😬

Srsly though, what about grouping this into the operator? Would <= be more straightforward than setting < and applying strictinterval: false?

but it suggested 'strict inequality', so that that's perhaps where you already looked.

Nop, I just remembered that term from Math classes. 😄

Would <= be more straightforward than setting < and applying strictinterval: false

I thought about that. But > + >= doesn't extend to within and notwithin.

To make it really mathy, maybe we could change operation: 'within' for operation: '[]' - where [] is a closed interval () would an open interval and similarly for [) and (] ?

@alexcjohnson @cldougl want to chip in on this?

and maybe operation: 'in' should be operation: '{}' as in a set.

I generally like fewer attributes (see orientation + openreverse vs. just direction discussion), but I have no objection to the status quo, even if it's not the most beautiful, satisfying, elegant API ever. As long as it's straightforward and usable and correct, that's probably sufficient.

I'm just muddying it further but for lots of charting, e.g. binning, time series etc. a frequent one is left closed, right open...

👍 for collapsing these two into one attribute. Having > sometimes mean >= would be strange, just define them both to mean what they always mean.

I actually kinda like [], (), [), and (]! Would it be going too far to turn notwithin into ][, ](, )[, and )(?

But if people think that would be too confusing, I'd also be happy to simplify it and say within is always inclusive, and notwithin is always exclusive (so it's the precise complement of within) and if a user wants something else done with the boundaries they can manually tweak them, similar to how in auto mode we try to adjust histogram bins so there actually aren't any samples on the boundaries and this distinction is moot.

That said, I think in and notin should stay as they are.

Yeah my main concern was if we use [], (), [), and (] instead of within what would happen to the notwithin option?
I was thinking a similar thing to what @alexcjohnson suggested - if we go with [], (), [), and (] for within I think there should be a parallel syntax for notwithin. I don't think ][, ](, )[, and )( is too confusing (as long as we write a thorough description).

done in 104123e

would love to get a review.

etpinard · 2016-09-29T18:30:11Z

src/transforms/groupby.js

        // as another round of supplyDefaults is done on the transformed traces
-        newTrace = Lib.extendDeep(newTrace, style[groupName] || {});
+        newTrace = Lib.extendDeepNoArrays(newTrace, style[groupName] || {});


@monfera no need to extend arrays here, right?

This could be a solid performance boost.

Yes, I think it should be fine, haven't tested it myself.

etpinard · 2016-09-29T18:33:53Z

test/jasmine/tests/transform_filter_test.js

+                xaxis: { type: 'category' }
+            });
+
+            _assert(out, [-2, -3], [3, 1], [0.3, 0.4]);


@bpostlethwaite this is a very important test case.

Here, a user inputs numerical data on a category axis - meaning that the data items are cast to strings and drawn in the ordered that they are input. That is,

// x [1, 2, 3, 0, -1, -2, -3] // is plotted with x coordinates [0, 1, 2, 3, 4, 5, 6]

The calcTransform filter using fullLayout._xaxis.d2c performs admirably. ⛵

monfera · 2016-09-30T20:53:28Z

src/transforms/filter.js

+    },
+    filtersrc: {
+        valType: 'enumerated',
+        values: ['x', 'y', 'z', 'ids'],


are we keeping this explicit list of attributes, or is it obsoleted by crawling the attribute tree for filterable arrays?

are we keeping this explicit list of attributes, or is it obsoleted by crawling the attribute tree for filterable arrays?

👍 would be awesome to filter on marker.size, or text, anything that has an array...

Based on request and code sample from @etpinard, the groupby transform already uses the attribute crawler which had been extracted out of plot_schema.js and now lives in Lib/coerce.js. I was asked during the groupby PR review to elevate the crawler call to transform so I think there's plan to eventually use it in filter, maybe as a subsequent PR.

done in ad77818

monfera · 2016-09-30T20:58:43Z

src/transforms/filter.js

+                var cv = d2c(v);
+                return cv <= coercedValue[0] || cv > coercedValue[1];
+            };
+


nice run of all combinations, and the notation looks good and math

monfera · 2016-09-30T21:02:15Z

test/jasmine/tests/gl2d_pointcloud_test.js

-Plotly.register([
-    require('@lib/pointcloud')
-]);
-
 // Test utilities


nice catch!

alexcjohnson · 2016-09-30T21:13:34Z

src/transforms/filter.js

+        case '][':
+            return function(v) {
+                var cv = d2c(v);
+                return cv < coercedValue[0] || cv > coercedValue[1];


this isn't what I thought ][ would mean... shouldn't it be cv <= coercedValue[0] || cv >= coercedValue[1]? ie ] always means <= whether it's on the left side or the right side.

yeah, I guess that's easier to remember than ][ meaning not in the closed interval.

out of context, the ][ notation would be ambiguous but one of its meanings is the complement interval, and current code is consistent with that meaning (since there's also the () symbols, it's actually disambiguated)

fixed in 7cfa4ff

btw ][ means 'not in the open interval' and code is correctly implemented that way

"Some authors use ]a, b[ to denote the complement of the interval (a, b); namely, the set of all real numbers that are either less than or equal to a, or greater than or equal to b." - wiki

I think it makes intuitive sense also, in that the retained part, i.e. the part outside the interval, has closed bounds

@etpinard has already changed it per my suggestion (thanks!) so not to beat a dead horse but I find it much more persuasive to think of these as a combination of two independent operators (] always means <=) than to try to relate them to the corresponding "within" forms. Likewise, think about histograms: if you have a bin with an interval like [), then to meet it exactly the next bin must start with [, hence [ is the complement of )

@alexcjohnson I'm not requesting anything one way or another, it just looked like @etpinard's original code was in line with the standard math notation which is why I mentioned this on my initial review. I may be mistaken or we may not want to follow it, I have less experience than you figuring out what's expected by users.

@monfera no problem - also somehow I didn't see your previous comment including "]a, b[" before writing mine - which I think ends up being exactly what we ended up with ⭐

@alexcjohnson just to clarify myself, when I wrote 'complement' it's meant in the set operations sense, for example, the complement of [8, 12] is the union of (-∞, 8) and (12, ∞) which, at least according to my comment you quote, from the wiki, can be denoted as )8, 12( though again, whichever notation you guys settle will work out just fine.

alexcjohnson · 2016-09-30T21:21:11Z

src/transforms/filter.js

+            '*)[* filters items outside `value[0]` to value[1]` and not equal to `value[1]`',
+
+            '*{}* filters items present in a set of values',
+            '*}{* filters items not present in a set of values'


the verb "filters" is a bit ambiguous - could be taken to mean "filters out" which is the opposite of what you mean. How about "keeps" or "shows"?

good point - done in 7cfa4ff

yes I also like "keeps" or "retains"

- add a missing test cases for semi-open intervals - replace verb 'filter' by 'keep' in operation description

alexcjohnson · 2016-09-30T23:13:20Z

From an off-line chat with @etpinard - calcTransform transforms always happen after transform (supplyDefaults) transforms, even if you put the calc ones first in the transform list. Within our current structure there isn't really any way around this. The situation will get even more complicated if a transform has both methods. This doesn't mean we should change anything about what we're building here, but we need to make it crystal clear to transform users what this means. Also note that it's irrelevant for transforms that commute (which groupBy and filter do)

To be concrete: lets imagine a transform truncate(N) that just passes on the first N points. This could be implemented either way (and both would have limitations) but let's imagine we make it with calcTransform. Conceptually, [truncate(100), groupBy] should result in grouping just the first 100 points of the trace into separate traces (each with <100 points) but it will actually result in the first 100 points of each group, which is what we expect from [groupBy, truncate(100)], and there would be no way to get the former behavior without making a new implementation of truncate using transform only. However, if you combine truncate and filter, the result will respect the order you give them.

Make this limitation crystal clear to users (through appropriate docs / comments)

alexcjohnson · 2016-09-30T23:36:37Z

test/jasmine/tests/transform_filter_test.js

-            assertStyle([1], ['rgb(255, 0, 0)'], [1]);
-
-            done();
+        it('+ *within*', function() {


names could be updated

done in 0aa4958

monfera · 2016-09-30T23:52:22Z

Similar to @alexcjohnson's above comment (excerpt from discussion on truncate vs filter with @etpinard) I had concerns about the meaning and interaction between groupby and filter, e.g. would a [groupby, filter] sequence be analogous to the SQL having, and does it make sense to have e.g. [groupby, filter, groupby, filter, groupby, filter, groupby, filter] or just thinking of it, just consecutive groupby steps.

A possibility for now is to limit what the API caller can do to well-understood, test covered sequences e.g. [filter], [groupby] or [filter, groupby].

- transform module are now required to have a transfrom OR calcTransform method

- so that filters can be applied to ALL possible data array e.g `filtersrc: 'marker.color'` - add tests for geo 'lon' / 'lat' and 'marker.color'

- using axisIds.getFromTrace !

etpinard · 2016-10-03T15:56:21Z

test/jasmine/tests/transform_filter_test.js

@@ -120,6 +120,23 @@ describe('filter transforms calc:', function() {
        expect(out[0].y).toEqual(base.y);
    });

+    it('filters should handle 3D *z* data', function() {


fyi @bpostlethwaite this works 🎉 !!!

etpinard · 2016-10-03T16:00:09Z

@alexcjohnson

Make this limitation crystal clear to users (through appropriate docs / comments)

With commit 2a7738e registering a transform module with both a transform and calcTransform method will log info about execution order. Were you looking for something even clearer?

Maybe we could add a src/transforms/_example.js file or a src/transforms/README.md with more info about transforms?

alexcjohnson · 2016-10-03T16:43:16Z

With commit 2a7738e registering a transform module with both a transform and calcTransform method will log info about execution order. Were you looking for something even clearer?

Maybe we could add a src/transforms/_example.js file or a src/transforms/README.md with more info about transforms?

If someone goes to the trouble of making a single transform that has both components, they probably already know what this will do. The issue I wanted to highlight comes when people (who probably weren't the transform author) use multiple transforms on the same plot that are split between the two modalities, so that, it seems to me, should be mentioned in two places: 1) wherever users will first learn about transforms, and 2) a log message would be great, but the best place for it would be when transforms are found that may not do what you expect. Specifically, if the first transform that has a calcTransform is earlier in the list than the last one that has a transform.

rreusser · 2016-10-03T16:57:40Z

Make this limitation crystal clear to users (through appropriate docs / comments)

Regarding making things clear through documentation, just a side note on docs philosophy. It was expressed to me that the goal of the plotly docs is to communicate through examples rather than through explanation. I think this can make it difficult to just explain some things in a straightforward manner though, so I added a markdown_content field that allows you to preface examples with a brief markdown-formatted explanation. I'm absolutely open to different approaches and thoughts here; I just found it difficult to communicate subtle/important details through code comments alone.

See the animation docs for an example.

etpinard · 2016-10-05T14:13:17Z

@alexcjohnson

wherever users will first learn about transforms

We don't have a place for that yet unfortunately. I don't think we advertise about writing custom transforms to users just yet - until transform stabilized a little more.

a log message would be great, but the best place for it would be when transforms are found that may not do what you expect. Specifically, if the first transform that has a calcTransform is earlier in the list than the last one that has a transform

That sounds like a lot of logic just to log a message. Ok to differ until transforms stabilize?

Ok to merge?

alexcjohnson · 2016-10-05T21:14:07Z

@etpinard I see. Right, people don't have a lot to go on yet... but that may not stop people from trying! How about just a warning here https://github.com/plotly/plotly.js/pull/978/files#diff-6d186b954a58d5bb740f73d84fe39073R34 - where anyone who writes a transform will have to list it - that this feature is still alpha.

alexcjohnson · 2016-10-05T23:32:59Z

Thx 💃

etpinard added 4 commits September 26, 2016 14:38

fix linting in lib/filter.js

6309837

lib: add groupby module + register it in main index

96bcbfc

lint: rm useless comments in transform modules

06c805d

lint: rm uselss Plotly.register calls in test suites

f6eb2f4

etpinard added feature something new status: in progress labels Sep 26, 2016

etpinard added this to the v1.18.0 milestone Sep 26, 2016

etpinard mentioned this pull request Sep 26, 2016

Transform input data: groupby, filter #917

Closed

2 tasks

test: move user-defined transform test to transform_multi

0158f45

etpinard mentioned this pull request Sep 26, 2016

Fix linting in lib/filter.js #981

Merged

etpinard added 10 commits September 28, 2016 16:59

transforms: make findArrayAttributes a lib function

64dc72a

- instead of calling it for every transform trace - call findArrayAttributes in transform function - make findArrayAttributes 'aware' of array attribute in trace so that it lists only the array attributes present in given trace

transforms: filter take II

0e1e9a7

- rewrite transform as a calcTransform - add 'active' (like in groupby) - add 'strictinterval' (boolean) to determine e.g. `>` vs `>=` - makes filter works with numerical, categorical and dates!!!

transforms: fill in groupby descriptions

f722c5d

transforms: perf in groupby

037fa5c

- extend original traces with extendDeepNoArrays (no need to copy arrays - they are filled during the grouping operation) - swap .forEach for good old for loop

lint: some line spacing in plots.js

01f8595

test: one big nasty commit untangling / adding transforms tests

8a29a89

- move some more general cases our of filter_test -> multi_test - add some filter default cases (around 'active' and 'strictinterval') - add a bunch of calcTransform test (numeric, categorical and dates are now covered!)

test: make sure all '#graph' nodes are purged after toimage tests

ad5089e

test: fixup plotschema case for new filter attributes

7a98f15

test: add case where axis type is set by user

c8a13b8

etpinard added status: reviewable and removed status: in progress labels Sep 29, 2016

etpinard commented Sep 29, 2016

View reviewed changes

etpinard added 2 commits September 30, 2016 15:05

transforms: replace 'active' attribute by 'enabled'

f4b9dcc

transforms: replace 'strictinterval' with MORE operation values

104123e

etpinard added 2 commits September 30, 2016 16:45

transforms: improve attribute descriptions

ba9b933

fix a few typos

7fc7dfa

monfera approved these changes Sep 30, 2016

View reviewed changes

alexcjohnson reviewed Sep 30, 2016

View reviewed changes

transforms: fix logic for outside intervals

7cfa4ff

- add a missing test cases for semi-open intervals - replace verb 'filter' by 'keep' in operation description

alexcjohnson reviewed Sep 30, 2016

View reviewed changes

etpinard added 5 commits October 3, 2016 11:07

api: relax transform module requirements

3e74c80

- transform module are now required to have a transfrom OR calcTransform method

api: improve register transform logging and tests

2a7738e

test: update transform case descriptions

0aa4958

transforms: allow 'filtersrc' to be an arbitrary attr string

ad77818

- so that filters can be applied to ALL possible data array e.g `filtersrc: 'marker.color'` - add tests for geo 'lon' / 'lat' and 'marker.color'

transforms: add support for date 3D z-axes

4cd9edf

- using axisIds.getFromTrace !

etpinard commented Oct 3, 2016

View reviewed changes

etpinard mentioned this pull request Oct 5, 2016

Custom scales or more advanced hoverinfo support #1008

Closed

doc: add comment about transform in main index file

f5c64d2

etpinard merged commit 987f1aa into master Oct 6, 2016

etpinard deleted the transforms-primetime branch October 6, 2016 01:06

Filter and groupby transforms in main bundle #978

Filter and groupby transforms in main bundle #978

Conversation

etpinard commented Sep 26, 2016

etpinard commented Sep 29, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rreusser Sep 29, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rreusser Sep 29, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

etpinard Sep 29, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

etpinard Sep 29, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

monfera Sep 30, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

monfera Sep 30, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

monfera Sep 30, 2016 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alexcjohnson commented Sep 30, 2016 • edited by etpinard Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

monfera commented Sep 30, 2016 • edited Loading

Choose a reason for hiding this comment

etpinard commented Oct 3, 2016

alexcjohnson commented Oct 3, 2016

rreusser commented Oct 3, 2016 • edited Loading

etpinard commented Oct 5, 2016

alexcjohnson commented Oct 5, 2016

alexcjohnson commented Oct 5, 2016

rreusser Sep 29, 2016 •

edited

Loading

rreusser Sep 29, 2016 •

edited

Loading

etpinard Sep 29, 2016 •

edited

Loading

etpinard Sep 29, 2016 •

edited

Loading

monfera Sep 30, 2016 •

edited

Loading

monfera Sep 30, 2016 •

edited

Loading

monfera Sep 30, 2016 •

edited

Loading

alexcjohnson commented Sep 30, 2016 •

edited by etpinard

Loading

monfera commented Sep 30, 2016 •

edited

Loading

rreusser commented Oct 3, 2016 •

edited

Loading