Skip to content

Speed up Lib.extendDeep (_extend) for arrays that have no object, array elements #732

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jul 11, 2016

Conversation

monfera
Copy link
Contributor

@monfera monfera commented Jul 11, 2016

This PR has the same effect on speed as the suggestion in #726 (comment)

The referred suggestion isn't obsoleted by this PR, because deepExtend, by its nature, is still expensive: with large datasets, Plotly.restyle will still allocate and populate very large arrays (though much faster if this PR is merged).

Similarly, this PR can be useful even outside the original speed problem, as it generally speeds up structure extension when large, simple arrays are present.

@monfera monfera changed the title Speed up Lib.extendDeep (_extend) for primitive arrays Speed up Lib.extendDeep (_extend) for arrays that have no object, array elements Jul 11, 2016
@rreusser
Copy link
Contributor

On a related note: based on animation needs, I made a function called deepExtendNoArrays that deep merges objects but simply transfers array references without copying at all.

@monfera
Copy link
Contributor Author

monfera commented Jul 11, 2016

@rreusser thanks for mentioning, it would lead to higher efficiency and DRYness (my change keeps the existing behavior). What I don't know is if it's OK to use it in the place where it causes the slowdown - emitting an event here: https://github.com/plotly/plotly.js/blob/master/src/plot_api/plot_api.js#L2036.

I suppose the extension is deep so that if a user modifies callback value contents, it has no effect. If it's not a concern, then it would be worth doing a PR with your 'deepExtendNoArrays' ahead of the other parts.

... on the other hand queue will be 0 length for most things anyway.

@rreusser
Copy link
Contributor

rreusser commented Jul 11, 2016

Hmm… yeah, wasn't sure exactly what your need is, but we were talking and came to the conclusion (with maybe only an exception or two) that effectively any arrays in plotly json (i.e. traces—except for the array of traces themselves, of course) are data that can be transferred without copying.

Perhaps a common function that copies traces/layouts without arrays and handles any edge cases would make sense at some point.

@rreusser
Copy link
Contributor

@monfera
Copy link
Contributor Author

monfera commented Jul 11, 2016

@rreusser awesome, good to know! In the current case, an array of arrays is passed for extension, so probably the
gd.emit('plotly_restyle', Lib.extendDeep([], [redoit, traces])) call would need to switch to {}s:
gd.emit('plotly_restyle', Lib.extendDeepNoArrays({}, {redo: redoit, traces: traces})).

... or actually, one level deeper as traces is an array, or just doing it manually here.

So it looks possible to directly use extendDeepNoArrays for this case. I don't know how structures are handled generally, since as you say, sometimes an encountered array is not a plain array but just an (ordered) container of other things (e.g. traces) - maybe the way you call it also has assurances that an array is always flat.

What's in this PR speeds up those cases where you can't make assumptions about which arrays need to be deep copied. This commit is compatible with extendDeep rather than extendDeepNoArrays. I'll leave it to consensus here if this has any utility. There are a few dozen calls to extendDeep, some are for top-level objects i.e. in cloneplot.js and plot_api.js. If of no use, I'll just close the PR.

@monfera
Copy link
Contributor Author

monfera commented Jul 11, 2016

Forgot to add one more thing that might help: it would be good in general to store large arrays as typed JS arrays. It's easy to test e.g. a instanceof Float32Array ro ArrayBuffer.isView(arg) and has other benefits. The cons side is you can't s(p)lice them or push etc. Also, they only admit IEEE numbers - which include NaN and infinities, but doesn't include null so it would need that the meaning of missing data points is switched from null to NaN so all in all it would be a big change. But would be easy to tell apart kinds of arrays.

@rreusser
Copy link
Contributor

@monfera Agreed on locking down any corner cases where this might not be true.

And hmm… agreed that typed arrays have benefits, but I might need extra convincing that the difference is worthwhile and actually the bottleneck before worrying that it might be premature optimization. 😄

speed

@rreusser
Copy link
Contributor

rreusser commented Jul 11, 2016

Until then at least, I'll write a wrapper function to extend layouts and traces while simply moving array references with the goal of explicitly handling any corner cases where this transfer is not valid. (which, unfortunately, I think will be confined to my PR at the moment)

@etpinard
Copy link
Contributor

@monfera Thanks for bringing this up!

Have you done any benchmarking comparing extendDeep before and after your patch? From what I understand your patch here is compatible with @rreusser extendDeepNoArrays (or any other future plotly-JSON-specific extend method).

More generally, I'm thinking that we may not need to extend the redoit objects and traces arrays before emitting plotly_restyle and plotly_relayout. Instead we should extend them inside Queue.add which, together with #726, will speed the (most common) case where the Queue is not activated.

@monfera
Copy link
Contributor Author

monfera commented Jul 11, 2016

@etpinard yes, in the context of loading additional 1k points when we already have a lot of points. It's quite significant:

Numbers: given 100k points, adding 1k new points takes:

  • around 340ms with the old extendDeep
  • around 90ms with the extendDeep speedup

The actual speedup is even more, because, clearly, this is not the only contribution to the overall runtime, so I think it's more like e.g. 40ms vs 300ms so around 10x or more speedup.

@etpinard
Copy link
Contributor

Numbers: given 100k points, adding 1k new points takes:

around 340ms with the old extendDeep
around 90ms with the extendDeep speedup

Amazing.

@monfera
Copy link
Contributor Author

monfera commented Jul 11, 2016

@etpinard if you want to deeply extend [redoit, traces] - whether here or in queue - I'd recommend either of:

  • using @rreusser 's extendDeepNoArrays if you're okay with not deep-copying the arrays - in which case you'd need to actually make the call to extendDeepNoArrays in some loop, because traces itself is an array, and extendDeepNoArrays just takes all arrays as pointers if I understand it
  • or using this PR, if you want to stick to the current extendDeep semantics (which gives fresh arrays), or if you don't want to manually pry apart and loop over things a layer or two below [redoit, traces]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants