[concept] Queue length limit #726
(cherry picked from commit e07f91d)
@monfera you bring up a very interesting point here. Having some sort of queue length limit makes sense to me. But perhaps we could go even further by not storing

@chriddyp @bpostlethwaite does the new plotly workspace use

@rreusser can you think of any side-effects for animations?
We don't know what will be happening with undo/redo yet.
That said, I think data arrays are OK so long as there is a limit. Most folks have a couple of gigabytes of memory at their disposal, which means hundreds of millions to billions of array values can be stored. We just want the heap to level out at some point. A big heap is not necessarily bad when the user has a lot of data.
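To make the "level out at some point" idea concrete, here is a minimal sketch of a bounded undo/redo queue. This is an illustration, not plotly.js's actual implementation: once `maxLength` is reached, the oldest command is evicted so retained memory stops growing.

```javascript
// Hypothetical bounded undo/redo queue (names are illustrative).
// When the cap is exceeded, the oldest entries are dropped so the
// garbage collector can reclaim their snapshots.
function createQueue(maxLength) {
  const commands = [];
  return {
    push(cmd) {
      commands.push(cmd);
      if (maxLength >= 0 && commands.length > maxLength) {
        // Removing the reference is what lets the GC reclaim old data.
        commands.splice(0, commands.length - maxLength);
      }
    },
    get length() { return commands.length; }
  };
}

const q = createQueue(2);
q.push({undo: 'a'});
q.push({undo: 'b'});
q.push({undo: 'c'}); // oldest entry 'a' is evicted
console.log(q.length); // 2
```

With a cap of 2, the heap retains at most two snapshots no matter how many edits the user makes.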
Re-reading that more carefully, regarding side effects on animations: if you alter a frame, it's up to the user/implementer to pass a data_array with the same reference. If that's done correctly, then the queue shouldn't accumulate giant amounts of memory unless you're actually modifying the data_arrays.
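The same-reference point can be illustrated in a few lines (frame shapes here are hypothetical, not the actual plotly.js frame format): frames that share one array reference add no retained memory, while a fresh copy per update is what makes the footprint grow.

```javascript
// Illustrative only: frames reusing one array reference vs. a fresh copy.
const data = [1, 2, 3];

const frameA = {y: data};          // same reference
const frameB = {y: data};          // still the same reference: no extra storage
const frameC = {y: data.slice()};  // a fresh copy: extra retained memory

console.log(frameA.y === frameB.y); // true
console.log(frameA.y === frameC.y); // false
```

Bindings that serialize and resend arrays (as R or Python might) would behave like `frameC` on every update.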
In agreement with @rreusser: I believe a key benefit of WebGL (i.e. one main reason to use it) is its ability to handle large amounts of data, so it's especially important there. Luckily, WebGL requires a rather different implementation of animation. It's a big topic, but in any case it's usually not done by re-exposing a gl plot to a new set of data, except as a last resort (there are about 10 different ways and lots of constraints, e.g. CPU->GPU bandwidth, unknowable GPU memory size, multiple geometries on the GPU, and the possibility of doing some types of animations in the shaders, just changing
@rreusser re using the same reference: maybe my comment is irrelevant or shows my ignorance of the bridge implementation, but unless there's something expressly covering this case, R and probably Python and other bindings would send fresh arrays.

@bpostlethwaite looking at the implementation, a data set is often pasted into multiple places (sometimes under the same object ID, though I haven't checked all cases), and sometimes there's derived data, hover/pick related data etc., which in effect acts as a small integer multiplier on the data array size. This doesn't alter your point, but the memory footprint can be larger than a user's estimate. Also, except for the WebGL data transfers, plotly uses untyped arrays, where, especially if there are non-IEEE 754 values like
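The typed-vs-untyped footprint difference can be made tangible. A sketch (sizes are per the TypedArray spec; the untyped overhead is engine-dependent): a `Float64Array` stores exactly 8 bytes per element, while a plain array of the same numbers carries per-element overhead, and mixing in non-numeric values forces boxed storage.

```javascript
// Footprint comparison sketch: typed storage is exactly 8 bytes/element.
const n = 1e6;
const typed = new Float64Array(n);     // 8,000,000 bytes, guaranteed by spec
const untyped = new Array(n).fill(0);  // more, and engine-dependent

console.log(typed.byteLength);         // 8000000
console.log(untyped.length);           // 1000000 (no byteLength to report)
```

So an undo queue holding several untyped copies of a large dataset can easily exceed a back-of-the-envelope "elements × 8 bytes" estimate.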
Thoughts on assessing memory usage:

@rreusser it wasn't exactly plain sailing (due to multiple entangled issues with similar effects), but the Chrome dev tools led me to the places where memory was retained, e.g. the excellent heap snapshot comparison (and features like sorting by real memory footprint). It can show the objects and who holds references to them, so as long as we do the proper dereferencing on disuse, other reasonable browsers should mark the unreferenced objects for GC too.
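The "proper dereferencing on disuse" point boils down to this (a minimal sketch with hypothetical names): a large buffer stays retained as long as any live object points at it, which is exactly what a heap snapshot's retainer view surfaces.

```javascript
// Clearing ONE reference is not enough; every retainer must let go
// before the engine can reclaim the buffer.
let big = new Float64Array(1e6);   // ~8 MB
const holder = {data: big};        // a second retainer

big = null;          // not enough: holder.data still retains the 8 MB
holder.data = null;  // now nothing references the buffer; GC may reclaim it

console.log(holder.data); // null
```

In a heap snapshot comparison, the buffer would show up as retained until that second assignment runs.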
Is someone interested in taking over this, given the possible effects on the plot.ly web builder and the R, Python etc. interfaces? A quick solution would help, because if I can't configure zero-length in

@bpostlethwaite @etpinard I guess the solution depends on the customer use case, i.e. do they need an undo/redo queue with both large datasets and (unlimited, or long) undo archival of changes to those large datasets? If there's a need for both, then it naturally leads to a growing memory footprint, even if there's some room to store the data more compactly.
@monfera does the Plotlyjs repo record events and data in the queue currently by default? I didn't realize it was "on" by default.

@etpinard do you know if the queue is used in Python and R? If it is just the old Plotly Workspace then it seems like setting it to 0 length by default and altering Workspace code makes a lot of sense.

I'll check to see if we have any other projects using the queue.
@bpostlethwaite yes, it's "on" even in the plain vanilla plotly.js bundle, which is what I was using. But I'm glad to double-check that I get the same behavior from a CDN-uploaded bundle (on the off-chance there's some bundle control option that's set differently for CDN vs. the development bundle).
@bpostlethwaite the undo/redo queue is on for https://cdn.plot.ly/plotly-latest.js
I'm pretty sure that no, they don't use it. But @yankev @cpsievert ⏫ can you confirm?
👍
Yep. I'll wrap this up before
@etpinard thank you, I'm continuing with the streaming load.
@etpinard I don't believe so either. The Python API never really comes into contact with plotly.js code aside from offline mode, which as far as I know only calls
R never calls the
@etpinard some good news: with large datasets, there's much avoidable CPU use and allocation by undo/redo outside

Within this line, the actual cost comes from

As the default queue length is slated to be 0, overridable via some mechanism, a part of this line - and perhaps the above code that generates

I came across this when profiling the time consumption of the repeated addition of 1000 points (test bench), which showed a slower and slower execution of

Numbers: given 100k points, adding 1k new points takes:
All this doesn't change the fact that adding points takes longer as the data size grows. Adding 1k points to 1M preexisting points takes 820ms; adding 1k points to 1k preexisting points takes around 15ms. But if this line were left unchanged, it would significantly impact the planned GL speedup.
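The growth pattern described above follows directly from snapshotting: a sketch (assuming, hypothetically, that each "add points" call deep-copies the full trace for the undo queue) shows why per-call cost is O(total points) rather than O(points added).

```javascript
// Sketch: deep-copying the whole trace per operation makes each call's
// cost proportional to ALL preexisting data, not just the new chunk.
function addPointsWithDeepCopy(trace, newPoints, queue) {
  queue.push(JSON.parse(JSON.stringify(trace))); // O(trace.y.length) per call
  trace.y.push(...newPoints);
}

const trace = {y: []};
const queue = [];
const chunk = new Array(1000).fill(0);
for (let i = 0; i < 5; i++) addPointsWithDeepCopy(trace, chunk, queue);

console.log(trace.y.length); // 5000
console.log(queue.length);   // 5 snapshots, each larger than the last
```

Summing the snapshot sizes gives quadratic total work, which matches the "slower and slower" profile and the 15ms-to-820ms numbers above.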
... I'm also wondering if the fact that the data is deep copied even matters for the Workspace - maybe the event emission doesn't lead down to a code path where the
It turns out that slightly modifying
a few more things to consider from #732
closed in favour of #737
A client bug report on seemingly unjustified memory growth led us to, among other things, the retention of data in the undo/redo queue. As I'm not too familiar with this feature, I can only suggest that there be a default (overridable) limit on the queue, or at least an optional configuration for the maximum queue length. I'd be glad to pass the baton on this one, due to the numerous WebGL improvements I still need to work on.