[concept] Queue length limit #726
(cherry picked from commit e07f91d)
@monfera you bring up a very interesting point here. Having some sort of queue length limit makes sense to me. But perhaps we could go even further by not storing

@chriddyp @bpostlethwaite does the new plotly workspace use

@rreusser can you think of any side-effects for animations?
We don't know what will be happening with undo/redo yet.
That said, I think data arrays are OK so long as there is a limit. Most folks have a couple of gigabytes of memory at their disposal, which means hundreds of millions to billions of array values can be stored. We just want the heap to level out at some point. A big heap is not necessarily bad when the user has a lot of data.
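To make the "level out at some point" idea concrete, here is a minimal sketch of a bounded undo/redo queue. This is an illustration, not plotly.js's actual implementation: once `maxLength` is reached, the oldest command is evicted so retained memory stops growing.

```javascript
// Hypothetical bounded undo/redo queue (names are illustrative).
// When the cap is exceeded, the oldest entries are dropped so the
// garbage collector can reclaim their snapshots.
function createQueue(maxLength) {
  const commands = [];
  return {
    push(cmd) {
      commands.push(cmd);
      if (maxLength >= 0 && commands.length > maxLength) {
        // Removing the reference is what lets the GC reclaim old data.
        commands.splice(0, commands.length - maxLength);
      }
    },
    get length() { return commands.length; }
  };
}

const q = createQueue(2);
q.push({undo: 'a'});
q.push({undo: 'b'});
q.push({undo: 'c'}); // oldest entry 'a' is evicted
console.log(q.length); // 2
```

With a cap of 2, the heap retains at most two snapshots no matter how many edits the user makes.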
Re-reading that more carefully, regarding side effects on animations: if you alter a frame, it's up to the user/implementer to pass a data_array with the same reference. If that's done correctly, then the queue shouldn't accumulate giant amounts of memory unless you're actually modifying the data_arrays.
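The same-reference point can be illustrated in a few lines (frame shapes here are hypothetical, not the actual plotly.js frame format): frames that share one array reference add no retained memory, while a fresh copy per update is what makes the footprint grow.

```javascript
// Illustrative only: frames reusing one array reference vs. a fresh copy.
const data = [1, 2, 3];

const frameA = {y: data};          // same reference
const frameB = {y: data};          // still the same reference: no extra storage
const frameC = {y: data.slice()};  // a fresh copy: extra retained memory

console.log(frameA.y === frameB.y); // true
console.log(frameA.y === frameC.y); // false
```

Bindings that serialize and resend arrays (as R or Python might) would behave like `frameC` on every update.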
In agreement with @rreusser: I believe a key benefit of WebGL (i.e. one main reason to use it) is its ability to handle large amounts of data, so it's especially important there. Luckily, WebGL requires a rather different implementation of animation. It's a big topic, but in any case it's usually not done by re-exposing a gl plot to a new set of data, except as a last resort (there are about 10 different ways and lots of constraints, e.g. CPU->GPU bandwidth, unknowable GPU memory size, multiple geometries on the GPU, and the possibility of doing some types of animations in the shaders, just changing
@rreusser re using the same reference: maybe my comment is irrelevant or shows my ignorance of the bridge implementation, but unless there's something expressly covering this case, R and probably Python and other bindings would send fresh arrays.

@bpostlethwaite looking at the implementation, a data set is often pasted into multiple places (sometimes under the same object ID, though I haven't checked all cases), and sometimes there's derived data, hover/pick related data etc., which in effect acts as a small integer multiplier on the data array size. This doesn't alter your point, but the memory footprint can be larger than a user's estimate. Also, except for the WebGL data transfers, plotly uses untyped arrays, where, especially if there are non-IEEE 754 values like
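The typed-vs-untyped footprint difference can be made tangible. A sketch (sizes are per the TypedArray spec; the untyped overhead is engine-dependent): a `Float64Array` stores exactly 8 bytes per element, while a plain array of the same numbers carries per-element overhead, and mixing in non-numeric values forces boxed storage.

```javascript
// Footprint comparison sketch: typed storage is exactly 8 bytes/element.
const n = 1e6;
const typed = new Float64Array(n);     // 8,000,000 bytes, guaranteed by spec
const untyped = new Array(n).fill(0);  // more, and engine-dependent

console.log(typed.byteLength);         // 8000000
console.log(untyped.length);           // 1000000 (no byteLength to report)
```

So an undo queue holding several untyped copies of a large dataset can easily exceed a back-of-the-envelope "elements × 8 bytes" estimate.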
Thoughts on assessing memory usage:

@rreusser it wasn't exactly plain sailing (due to multiple entangled issues with similar effects), but the Chrome dev tools led me to the places where memory was retained, e.g. the excellent heap snapshot comparison (and features like sorting by real memory footprint). It can show the objects and who holds references to them, so as long as we do the proper dereferencing on disuse, other reasonable browsers should mark the unreferenced objects for GC too.
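The "proper dereferencing on disuse" point boils down to this (a minimal sketch with hypothetical names): a large buffer stays retained as long as any live object points at it, which is exactly what a heap snapshot's retainer view surfaces.

```javascript
// Clearing ONE reference is not enough; every retainer must let go
// before the engine can reclaim the buffer.
let big = new Float64Array(1e6);   // ~8 MB
const holder = {data: big};        // a second retainer

big = null;          // not enough: holder.data still retains the 8 MB
holder.data = null;  // now nothing references the buffer; GC may reclaim it

console.log(holder.data); // null
```

In a heap snapshot comparison, the buffer would show up as retained until that second assignment runs.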
Is someone interested in taking over this, given the possible effects on the plot.ly web builder and the R, Python etc. interfaces? A quick solution would help, because if I can't configure zero-length in

@bpostlethwaite @etpinard I guess the solution depends on the customer use case, i.e. do they need an undo/redo queue with both large datasets and (unlimited, or long) undo archival of changes to those large datasets? If there's a need for both, then it naturally leads to a growing memory footprint, even if there's some room to store the data more compactly.
@monfera does the Plotlyjs repo record events and data in the queue currently by default? I didn't realize it was "on" by default.

@etpinard do you know if the queue is used in Python and R? If it is just the old Plotly Workspace then it seems like setting it to 0 length by default and altering Workspace code makes a lot of sense.

I'll check to see if we have any other projects using the queue.
@bpostlethwaite yes, it's "on" even in the plain vanilla plotly.js bundle, which is what I was using. But I'm glad to double-check that I get the same behavior from a CDN-uploaded bundle (on the off-chance there's some bundle control option that's set differently for CDN vs. the development bundle).
@bpostlethwaite the undo/redo queue is on for https://cdn.plot.ly/plotly-latest.js
I'm pretty sure that no, they don't use it. But @yankev @cpsievert ⏫ can you confirm?
👍
Yep. I'll wrap this up before
@etpinard thank you, I'm continuing with the streaming load.
@etpinard I don't believe so either. The Python API never really comes into contact with plotly.js code aside from offline mode, which as far as I know only calls
R never calls the
@etpinard some good news: with large datasets, there's much avoidable CPU use and allocation by undo/redo outside

Within this line, the actual cost comes from

As the default queue length is slated to be 0, overridable via some mechanism, a part of this line - and perhaps the above code that generates

I came across this when profiling the time consumption of the repeated addition of 1000 points (test bench), which showed a slower and slower execution of

Numbers: given 100k points, adding 1k new points takes:
All this doesn't change the fact that adding points takes longer as the data size grows. Adding 1k points to 1M preexisting points takes 820ms; adding 1k points to 1k preexisting points takes around 15ms. But if this line were left unchanged, it would significantly impact the planned GL speedup.
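The growth pattern described above follows directly from snapshotting: a sketch (assuming, hypothetically, that each "add points" call deep-copies the full trace for the undo queue) shows why per-call cost is O(total points) rather than O(points added).

```javascript
// Sketch: deep-copying the whole trace per operation makes each call's
// cost proportional to ALL preexisting data, not just the new chunk.
function addPointsWithDeepCopy(trace, newPoints, queue) {
  queue.push(JSON.parse(JSON.stringify(trace))); // O(trace.y.length) per call
  trace.y.push(...newPoints);
}

const trace = {y: []};
const queue = [];
const chunk = new Array(1000).fill(0);
for (let i = 0; i < 5; i++) addPointsWithDeepCopy(trace, chunk, queue);

console.log(trace.y.length); // 5000
console.log(queue.length);   // 5 snapshots, each larger than the last
```

Summing the snapshot sizes gives quadratic total work, which matches the "slower and slower" profile and the 15ms-to-820ms numbers above.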
... I'm also wondering if the fact that the data is deep copied even matters for the Workspace - maybe the event emission doesn't lead down to a code path where the
It turns out that slightly modifying
a few more things to consider from #732
closed in favour of #737
A client bug report on seemingly unjustified memory growth led us to, among other things, the retention of data in the undo/redo queue. As I'm not too familiar with this feature, I can only suggest that there be a default (overridable) limit on the queue, or at least an optional configuration for the maximum queue length. I'd be glad to pass the baton on this one, due to the numerous WebGL improvements I still need to work on.