-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Filter and groupby transforms in main bundle #978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- instead of calling it for every transform trace - call findArrayAttributes in transform function - make findArrayAttributes 'aware' of array attribute in trace so that it lists only the array attributes present in given trace
- to allow calcTransform methods where fullTrace / or calcdata items can be modified during the calc step. - this allows writing transforms that are applied after much of the 'computed' fields are filled e.g. axis (auto) 'type' - calcTransforms can only manipulate trace object 1-to-1 e.g. grouping operation should be done at the defaults step as previously.
- rewrite transform as a calcTransform - add 'active' (like in groupby) - add 'strictinterval' (boolean) to determine e.g. `>` vs `>=` - makes filter works with numerical, categorical and dates!!!
- extend original traces with extendDeepNoArrays (no need to copy arrays - they are filled during the grouping operation) - swap .forEach for good old for loop
- move some more general cases our of filter_test -> multi_test - add some filter default cases (around 'active' and 'strictinterval') - add a bunch of calcTransform test (numeric, categorical and dates are now covered!)
This PR adds another layer of functionality to transform plugins ➡️ transforms can now be applied during the
Applying transforms at the calc has a few important advantages:
Important For implementation details, see commit 1238236 In turn the filter transform was mostly rewritten to handle numerical, categorical and dates. See 0e1e9a7 |
* @return {array} arrayAttributes | ||
* list of array attributes for the given trace | ||
*/ | ||
exports.findArrayAttributes = function(trace) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monfera FYI I decided to make findArrayAttributes
a lib function instead meant to be called on-demand by the transform modules.
That way, this same function can now find the array attributes per trace module and filter them to match the array attribute in the user trace.
exports.attributes = { | ||
active: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added an active
attribute similar to the active
in the groupby transform.
BUT, if no one objects, I'll change active
-> visible
for consistency with our other components. Moreover, active
is already used in layout.updatemenus
where it an 'integer'
attribute, having both a boolean and an integer active
attribute in our schema would be pretty confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds fine to me!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit bothered by visible=true
exactly causing things to become _in_visible.
How about enabled
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
active
-> enabled
in f4b9dcc
'Has only an effect for `operation` *>*, *<*, *within* and *notwithin*', | ||
'When `operation` is set to *within* and *notwithin*,', | ||
'`strictinterval` is expected to be a 2-item array where the first (second)', | ||
'item determines strictness for the lower (second) bound.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not super happy about this attribute name.
Any suggestion would be appreciated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked to wikipedia for guidance, but it suggested 'strict inequality', so that that's perhaps where you already looked. Otherwise I usually mentally call it an 'open' or 'closed' interval. Wouldn't terminology from mathematics be the most universally understood and transparent nomenclature? 😬
Srsly though, what about grouping this into the operator? Would <=
be more straightforward than setting <
and applying strictinterval: false
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but it suggested 'strict inequality', so that that's perhaps where you already looked.
Nop, I just remembered that term from Math classes. 😄
Would <= be more straightforward than setting < and applying strictinterval: false
I thought about that. But >
+ >=
doesn't extend to within
and notwithin
.
To make it really mathy, maybe we could change operation: 'within'
for operation: '[]'
- where []
is a closed interval ()
would an open interval and similarly for [)
and (]
?
@alexcjohnson @cldougl want to chip in on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and maybe operation: 'in'
should be operation: '{}'
as in a set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I generally like fewer attributes (see orientation
+ openreverse
vs. just direction
discussion), but I have no objection to the status quo, even if it's not the most beautiful, satisfying, elegant API ever. As long as it's straightforward and usable and correct, that's probably sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm just muddying it further but for lots of charting, e.g. binning, time series etc. a frequent one is left closed, right open...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for collapsing these two into one attribute. Having >
sometimes mean >=
would be strange, just define them both to mean what they always mean.
I actually kinda like []
, ()
, [)
, and (]
! Would it be going too far to turn notwithin
into ][
, ](
, )[
, and )(
?
But if people think that would be too confusing, I'd also be happy to simplify it and say within
is always inclusive, and notwithin
is always exclusive (so it's the precise complement of within
) and if a user wants something else done with the boundaries they can manually tweak them, similar to how in auto mode we try to adjust histogram bins so there actually aren't any samples on the boundaries and this distinction is moot.
That said, I think in
and notin
should stay as they are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah my main concern was if we use []
, ()
, [)
, and (]
instead of within
what would happen to the notwithin
option?
I was thinking a similar thing to what @alexcjohnson suggested - if we go with []
, ()
, [)
, and (]
for within
I think there should be a parallel syntax for notwithin
. I don't think ][
, ](
, )[
, and )(
is too confusing (as long as we write a thorough description).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 104123e
would love to get a review.
// as another round of supplyDefaults is done on the transformed traces | ||
newTrace = Lib.extendDeep(newTrace, style[groupName] || {}); | ||
newTrace = Lib.extendDeepNoArrays(newTrace, style[groupName] || {}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monfera no need to extend arrays here, right?
This could be a solid performance boost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think it should be fine, haven't tested it myself.
xaxis: { type: 'category' } | ||
}); | ||
|
||
_assert(out, [-2, -3], [3, 1], [0.3, 0.4]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bpostlethwaite this is a very important test case.
Here, a user inputs numerical data on a category axis - meaning that the data items are cast to strings and drawn in the ordered that they are input. That is,
// x
[1, 2, 3, 0, -1, -2, -3]
// is plotted with x coordinates
[0, 1, 2, 3, 4, 5, 6]
The calcTransform
filter using fullLayout._xaxis.d2c
performs admirably. ⛵
}, | ||
filtersrc: { | ||
valType: 'enumerated', | ||
values: ['x', 'y', 'z', 'ids'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we keeping this explicit list of attributes, or is it obsoleted by crawling the attribute tree for filterable arrays?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we keeping this explicit list of attributes, or is it obsoleted by crawling the attribute tree for filterable arrays?
👍 would be awesome to filter on marker.size, or text, anything that has an array...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on request and code sample from @etpinard, the groupby
transform already uses the attribute crawler which had been extracted out of plot_schema.js
and now lives in Lib/coerce.js
. I was asked during the groupby
PR review to elevate the crawler call to transform
so I think there's plan to eventually use it in filter
, maybe as a subsequent PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in ad77818
var cv = d2c(v); | ||
return cv <= coercedValue[0] || cv > coercedValue[1]; | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice run of all combinations, and the notation looks good and math
Plotly.register([ | ||
require('@lib/pointcloud') | ||
]); | ||
|
||
// Test utilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch!
case '][': | ||
return function(v) { | ||
var cv = d2c(v); | ||
return cv < coercedValue[0] || cv > coercedValue[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this isn't what I thought ][
would mean... shouldn't it be cv <= coercedValue[0] || cv >= coercedValue[1]
? ie ]
always means <=
whether it's on the left side or the right side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I guess that's easier to remember than ][
meaning not in the closed interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of context, the ][
notation would be ambiguous but one of its meanings is the complement interval, and current code is consistent with that meaning (since there's also the ()
symbols, it's actually disambiguated)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed in 7cfa4ff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw ][
means 'not in the open interval' and code is correctly implemented that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Some authors use ]a, b[ to denote the complement of the interval (a, b); namely, the set of all real numbers that are either less than or equal to a, or greater than or equal to b." - wiki
I think it makes intuitive sense also, in that the retained part, i.e. the part outside the interval, has closed bounds
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@etpinard has already changed it per my suggestion (thanks!) so not to beat a dead horse but I find it much more persuasive to think of these as a combination of two independent operators (]
always means <=
) than to try to relate them to the corresponding "within" forms. Likewise, think about histograms: if you have a bin with an interval like [)
, then to meet it exactly the next bin must start with [
, hence [
is the complement of )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcjohnson I'm not requesting anything one way or another, it just looked like @etpinard's original code was in line with the standard math notation which is why I mentioned this on my initial review. I may be mistaken or we may not want to follow it, I have less experience than you figuring out what's expected by users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monfera no problem - also somehow I didn't see your previous comment including "]a, b[" before writing mine - which I think ends up being exactly what we ended up with ⭐
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alexcjohnson just to clarify myself, when I wrote 'complement' it's meant in the set operations sense, for example, the complement of [8, 12]
is the union of (-∞, 8)
and (12, ∞)
which, at least according to my comment you quote, from the wiki, can be denoted as )8, 12(
though again, whichever notation you guys settle will work out just fine.
'*)[* filters items outside `value[0]` to value[1]` and not equal to `value[1]`', | ||
|
||
'*{}* filters items present in a set of values', | ||
'*}{* filters items not present in a set of values' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the verb "filters" is a bit ambiguous - could be taken to mean "filters out" which is the opposite of what you mean. How about "keeps" or "shows"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point - done in 7cfa4ff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes I also like "keeps" or "retains"
- add a missing test cases for semi-open intervals - replace verb 'filter' by 'keep' in operation description
From an off-line chat with @etpinard - To be concrete: lets imagine a transform
|
assertStyle([1], ['rgb(255, 0, 0)'], [1]); | ||
|
||
done(); | ||
it('+ *within*', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
names could be updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in 0aa4958
Similar to @alexcjohnson's above comment (excerpt from discussion on A possibility for now is to limit what the API caller can do to well-understood, test covered sequences e.g. |
- transform module are now required to have a transfrom OR calcTransform method
- so that filters can be applied to ALL possible data array e.g `filtersrc: 'marker.color'` - add tests for geo 'lon' / 'lat' and 'marker.color'
- using axisIds.getFromTrace !
@@ -120,6 +120,23 @@ describe('filter transforms calc:', function() { | |||
expect(out[0].y).toEqual(base.y); | |||
}); | |||
|
|||
it('filters should handle 3D *z* data', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fyi @bpostlethwaite this works 🎉 !!!
With commit 2a7738e registering a transform module with both a Maybe we could add a |
If someone goes to the trouble of making a single transform that has both components, they probably already know what this will do. The issue I wanted to highlight comes when people (who probably weren't the transform author) use multiple transforms on the same plot that are split between the two modalities, so that, it seems to me, should be mentioned in two places: 1) wherever users will first learn about transforms, and 2) a log message would be great, but the best place for it would be when transforms are found that may not do what you expect. Specifically, if the first transform that has a |
Regarding making things clear through documentation, just a side note on docs philosophy. It was expressed to me that the goal of the plotly docs is to communicate through examples rather than through explanation. I think this can make it difficult to just explain some things in a straightforward manner though, so I added a See the animation docs for an example. |
We don't have a place for that yet unfortunately. I don't think we advertise about writing custom transforms to users just yet - until transform stabilized a little more.
That sounds like a lot of logic just to log a message. Ok to differ until transforms stabilize? Ok to merge? |
@etpinard I see. Right, people don't have a lot to go on yet... but that may not stop people from trying! How about just a warning here https://github.com/plotly/plotly.js/pull/978/files#diff-6d186b954a58d5bb740f73d84fe39073R34 - where anyone who writes a transform will have to list it - that this feature is still alpha. |
Thx 💃 |
This PR continues on the work done #936 and attempts to address