Missing data in scattergl plot when I go just above 10000 points (regression) #2334

Closed
mlaily opened this issue Feb 5, 2018 · 24 comments · Fixed by gl-vis/regl-scatter2d#19, #2593 or #4323
Labels: bug (something broken), regression (this used to work)

Comments

@mlaily

mlaily commented Feb 5, 2018

Hi,

I just updated plotly.js from 1.30 to the latest 1.33, and my plot is broken.

I’m rendering about 75000 points with a scattergl plot, and after the update, it looks like most of the points are not rendered at all. What is odd though, is that when I hover over the plot, I can still see popups showing when my mouse is over places where points are supposed to be.

After trying different things, I noticed everything is rendering fine up to 10000 data points. If I go just above, say 10001, my plot is broken and most of the points are not rendered!

Everything worked fine with plotly.js 1.30.

I made two versions of my code, one with 10000 points, and the other with 10001. See for yourself: https://0.x2a.yt/other/private/plotly-test/10000.html
https://0.x2a.yt/other/private/plotly-test/10001.html

(This bug report is a copy/paste from the following forum topic: https://community.plot.ly/t/regression-missing-data-in-scattergl-plot-when-i-go-juste-above-10000-points/8152)
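
A minimal sketch of how the two test pages could be reproduced, assuming a page where plotly.js is loaded as the global `Plotly` and contains an element with the hypothetical id `plot`. `makeTimeSeries` is an illustrative helper, not part of plotly.js, and its evenly spaced timestamps are only a stand-in for the real listening-history data; the thread does not confirm that this synthetic data shows the same gaps.

```javascript
// Hypothetical helper: builds a scattergl figure with n markers on a date axis.
// x values are epoch milliseconds, which a date axis interprets as timestamps.
function makeTimeSeries(n, startMs = Date.UTC(2018, 0, 1), stepMs = 60 * 1000) {
  const x = new Array(n);
  const y = new Array(n);
  for (let i = 0; i < n; i++) {
    x[i] = startMs + i * stepMs;
    y[i] = i % 24; // arbitrary repeating pattern
  }
  return {
    data: [{ type: 'scattergl', mode: 'markers', x: x, y: y }],
    layout: { xaxis: { type: 'date' } }
  };
}

// Only runs in a browser where plotly.js is loaded ('plot' is an assumed element id).
if (typeof Plotly !== 'undefined') {
  const fig = makeTimeSeries(10001); // compare against makeTimeSeries(10000)
  Plotly.newPlot('plot', fig.data, fig.layout);
}
```

Switching the argument between 10000 and 10001 mirrors the two linked pages.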

@etpinard
Contributor

etpinard commented Feb 5, 2018

cc @dfcreative

@etpinard added the bug (something broken) and regression (this used to work) labels on Feb 5, 2018
@dy
Contributor

dy commented Feb 5, 2018

That is a very good edge case for snap-points-2d, thank you @mlaily. In 1.30.0 we used gl-vis fancy-scattergl, which has no point snapping enabled, so points are rendered to the plot directly, although interactions are slow.

Now at 1e4 points the quadtree algorithm for point clustering gets triggered, which works well for roughly random distributions but poorly for linearly aligned points.

  • We can consider increasing the TOO_MANY_POINTS constant to defer the optimized rendering mode, but essentially that won't fix the problem.

  • Alternatively, that can be addressed in the upcoming point-cluster component, although we would have to think about how to detect the proper clustering mode.

  • Alternatively, we can provide a snap flag or number for scattergl traces, disabling or thresholding point snapping.

  • Looking at the script, we can disable snapping for datetime data by default, since it tends to be regular.
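
To make the last option more concrete, here is a rough sketch of one way "regular" data could be detected before deciding whether to snap. `isRegularlySpaced` and its tolerance are hypothetical and are not part of plotly.js or snap-points-2d; this is only an illustration of the idea, not a proposed implementation.

```javascript
// Hypothetical heuristic: returns true when consecutive x values are
// (almost) evenly spaced, e.g. samples taken at a fixed time interval.
function isRegularlySpaced(xs, relTol = 1e-6) {
  if (xs.length < 3) return true; // too few points to tell otherwise
  const step = xs[1] - xs[0];
  const tol = Math.abs(step) * relTol;
  for (let i = 2; i < xs.length; i++) {
    // Compare every gap against the first one.
    if (Math.abs((xs[i] - xs[i - 1]) - step) > tol) return false;
  }
  return true;
}

// Example: a caller could skip snapping when the data looks regular.
const useSnapping = !isRegularlySpaced([0, 10, 20, 30, 40]);
```

A check like this costs one pass over the data, so it would be cheap compared with building the quadtree.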

@mlaily
Author

mlaily commented Feb 5, 2018

Haha, you are welcome! :D

If I understand correctly, fixing the general case seems hard, but I think I would be happy with an option to disable clustering altogether.

That said, maybe I'm not using the optimal representation for my data. Would you have any advice by any chance?
The data set is my last.fm listening history, and I'm trying to reveal interesting trends over time.
I first thought using a heat map would be the best representation, but I found it not detailed enough, or lacking the webgl performance if I use too many points...
Using a scatter plot also allows me to use colors to differentiate between different artists or albums.

I think some kind of timeline plot with a sliding window would be better, but if I recall correctly, I could not find how to do that with plotly with acceptable performance.

Do you think there might be a more appropriate plot type?

Would you have any advice to improve performance on the current scattergl plot? The full data set (75000 points) is getting a bit slow. The text property (which I truncated in my examples) contains a lot of repetitive data, but I don't know how to avoid it...

Sorry if this is out of place for this issue. Feel free to say so and I will try to find a better place.

@dy
Contributor

dy commented Feb 5, 2018

@mlaily can you please show a codepen with the example where it gets slow?

Unfortunately I cannot give qualified advice on picking the right plot type for your data; I'd recommend reading Edward Tufte's books for that. Or just play around with different plot types, which is a win for both us and you :)

@mlaily
Author

mlaily commented Feb 5, 2018

Here is a version with all the data, using v1.30.1 of plotly: https://0.x2a.yt/other/private/plotly-test/all.html
(I can't use the up to date version since it "cheats" with clustering)

The initial delay is quite long, but I guess that's to be expected. After that, performance is good with the latest Firefox, but very bad with the latest Chrome.

(my dataset is too large for a codepen)

EDIT: you know what? forget about it, I'm a dumbass. I disabled hardware acceleration a while ago in chrome and forgot to put it back on -_-.

@etpinard
Contributor

etpinard commented Feb 6, 2018

Quoting @dy: "Alternately, that can be addressed in upcoming point-cluster component, although we would have to think how to detect proper clustering mode."

This would be ideal.

@mlaily
Author

mlaily commented Feb 24, 2018

I don't know when you will be able to work on this, but in the meantime, a workaround is to look for snap: 1e4 in plotly-gl2d.js or directly in plotly.js, and increase the value to something more appropriate.
(I'm not entirely sure what you meant with the TOO_MANY_POINTS constant. It does not seem related)
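
The manual edit described above could be scripted roughly as follows. This is only a sketch: `raiseSnapThreshold` is a hypothetical helper, and it assumes the bundle contains the literal text `snap: 1e4`, which may not hold for a minified or differently formatted build.

```javascript
// Hypothetical helper: replaces every `snap: 1e4` in the bundle source
// with a larger threshold and reports how many occurrences were changed.
function raiseSnapThreshold(bundleSource, newThreshold) {
  const pattern = /snap:\s*1e4/g;
  const matches = bundleSource.match(pattern) || [];
  return {
    patched: bundleSource.replace(pattern, 'snap: ' + newThreshold),
    count: matches.length
  };
}

// Example on a made-up fragment (not actual plotly.js source):
const result = raiseSnapThreshold('var opts = { snap: 1e4, size: 12 };', '1e6');
console.log(result.count, result.patched);
```

If `count` comes back as 0, the literal was not found and the bundle has to be inspected by hand.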

@etpinard
Contributor

@dy does the new point-cluster version (merged in #2499) help with this issue in any way?

@dy
Contributor

dy commented Apr 30, 2018

That seems to be fixed with point-cluster.
snapping enabled:
image

snapping disabled:
image

@slishak

slishak commented Oct 30, 2018

This issue seems to still be occurring:

https://codepen.io/anon/pen/zmQzOO

This should plot a point every millisecond. If you zoom in there are a few gaps (although you can hover over the ghost points):

image

@alexcjohnson
Collaborator

The new cutoff (for data of this shape anyway, in Chrome on my Mac) seems to be 75564: with >= 75564 points we get some gaps, and with fewer there are none. That's a number I haven't seen before 🤔 The cutoff holds for date or numeric data, and for any plot size, but interestingly if I change y to bilevel the cutoff drops to 68379: https://codepen.io/alexcjohnson/pen/MPdLjV

So the issue isn't quite the same, but symptoms are similar enough that I'll reopen
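
A cutoff like this can be located with a plain bisection over the point count. The sketch below assumes the behaviour is monotone (no gaps below the cutoff, gaps at and above it); `findCutoff` and the `hasGaps` callback are hypothetical, and in practice `hasGaps` would mean rendering the plot with that many points and checking it by eye or with an image comparison.

```javascript
// Finds the smallest n in (lo, hi] for which hasGaps(n) is true.
// Precondition (assumed): hasGaps(lo) is false and hasGaps(hi) is true.
function findCutoff(lo, hi, hasGaps) {
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (hasGaps(mid)) {
      hi = mid; // gaps already appear at mid, so the cutoff is at or below it
    } else {
      lo = mid; // still fine at mid, so the cutoff is above it
    }
  }
  return hi;
}

// Example with a stand-in predicate that mimics the number reported above.
const cutoff = findCutoff(1, 200000, n => n >= 75564);
```

Each step halves the interval, so a range of 200000 needs fewer than 20 renders.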

@etpinard
Contributor

Another example from the reports in #3413:

https://codepen.io/etpinard/pen/rPRwOy?editors=1010

@etpinard
Contributor

The problem is most likely in https://github.com/dy/point-cluster

@dy
Contributor

dy commented Feb 20, 2019

I'd suggest changing maxDepth to see if that affects the issue.

@etpinard
Contributor

Thanks for the hint @dy !!

Using https://codepen.io/alexcjohnson/pen/MPdLjV from #2334 (comment):

with maxDepth as it is right now (=255) gives:

image

with maxDepth=10, we get:

image

as expected, but with probably far worse panning perf when getting closer to 1e6 pts

@dy
Contributor

dy commented Feb 20, 2019

@etpinard not necessarily - maxDepth handles edge cases with multiple points at the same coordinate. Making that number ‘127’ should be sufficient too. In fact I rarely saw more than 20 levels for real data, for 1e6 random points we had around 13 levels.

With maxDepth=10 it is possible that the artifact is at the beginning of the dataset. Anyways that’s def a bug.
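
To illustrate why a depth cap matters for coincident points, here is a toy quadtree subdivision. It is not point-cluster's actual implementation; `quadtreeDepth` is a hypothetical function that only reports how deep a naive split goes for points in the unit square.

```javascript
// Toy model: recursively split the cell into four quadrants until every
// cell holds at most one point or maxDepth is reached; returns the depth reached.
// points: array of [x, y] with coordinates in [0, 1).
function quadtreeDepth(points, maxDepth, x0 = 0, y0 = 0, size = 1, depth = 0) {
  if (points.length <= 1 || depth >= maxDepth) return depth;
  const half = size / 2;
  const quads = [[], [], [], []];
  for (const p of points) {
    const ix = p[0] >= x0 + half ? 1 : 0;
    const iy = p[1] >= y0 + half ? 1 : 0;
    quads[iy * 2 + ix].push(p);
  }
  let deepest = depth;
  for (let q = 0; q < 4; q++) {
    const qx = x0 + (q % 2) * half;
    const qy = y0 + Math.floor(q / 2) * half;
    deepest = Math.max(deepest, quadtreeDepth(quads[q], maxDepth, qx, qy, half, depth + 1));
  }
  return deepest;
}

// Two well-separated points are split apart after one level.
const separated = quadtreeDepth([[0.1, 0.1], [0.9, 0.9]], 20);
// Two identical points can never be separated, so only maxDepth stops the recursion.
const coincident = quadtreeDepth([[0.3, 0.3], [0.3, 0.3]], 20);
```

In this toy model the duplicate pair always runs down to the cap, which matches the description of maxDepth as a guard for points at the same coordinate.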

@etpinard
Contributor

WIP branch with maxDepth: 15 (the largest value that makes https://codepen.io/alexcjohnson/pen/MPdLjV render OK):

https://github.com/plotly/plotly.js/compare/scattergl-lower-maxdepth

some image tests are failing:

image

more investigation will be needed.

@etpinard
Contributor

PR #3578 (set to be released in 1.45.0) fixes the problems reported in:

That solution probably isn't the end of this story. I suspect some graphs with more than 1e5 pts may have "missing" pts due to incorrect clustering, so I'll leave this issue open.

@Donnyvdm

Donnyvdm commented Mar 6, 2019

FWIW, I can confirm that this update has resolved the issues for me reported in #2334, using dash==0.39.0, which uses plotly 1.45.0

@etpinard reopened this on Mar 6, 2019
@deto

deto commented Mar 7, 2019

I can confirm that there are still issues around the 100k-point threshold.

See my issue #3405 for details.

If you create a trace with more than 100k points and then use Plotly.react to change it to have fewer than 100k points, many of the points that should be rendered in the second trace will not be rendered.

The threshold used to be 10k, but now appears to be 100k.
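
The sequence described above can be sketched as follows. The point counts (100001 and 50000) and the helper names `makeTrace` and `reproduceReactShrink` are illustrative choices, not taken from #3405; `plotly` is expected to be the `Plotly` global and `gd` a graph div or its id.

```javascript
// Hypothetical helper: a scattergl trace with n markers along a sine curve.
function makeTrace(n) {
  const x = new Array(n);
  const y = new Array(n);
  for (let i = 0; i < n; i++) {
    x[i] = i;
    y[i] = Math.sin(i / 1000);
  }
  return { type: 'scattergl', mode: 'markers', x: x, y: y };
}

// Step 1: draw a trace above the 100k threshold.
// Step 2: once that is done, shrink it below the threshold with react.
function reproduceReactShrink(plotly, gd) {
  return Promise.resolve(plotly.newPlot(gd, [makeTrace(100001)], {}))
    .then(() => plotly.react(gd, [makeTrace(50000)], {}));
}

// In a browser: reproduceReactShrink(Plotly, 'plot');  // 'plot' is an assumed element id
```

If the report holds, the second plot would show missing points even though it has only 50000 of them.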

@archmoj
Contributor

archmoj commented Oct 30, 2019

Minimal codepen to illustrate the bug.

@deto

deto commented Oct 30, 2019

For what it's worth, here's the code I've been using as my mitigation:

// When rendering/updating: fall back to newPlot only when the bug would occur
if (this._plotlyBug(data)) {
    Plotly.newPlot(this.node, data, layout, options)
} else {
    Plotly.react(this.node, data, layout, options)
}

// Utility function to detect when the bug would occur:
// a trace shrinking from more than 100k points to 100k or fewer.
// `_` is lodash (or underscore).
_plotlyBug = function(newData) {
    var oldData = this.node.data

    var oldSizes = _.map(oldData, trace => trace.x.length)
    var newSizes = _.map(newData, trace => trace.x.length)

    for (var i = 0; i < oldSizes.length; i++) {
        if ((oldSizes[i] > 100000) && (newSizes[i] <= 100000)) {
            return true
        }
    }

    return false
}

Basically detects the situation and then calls newPlot instead of react when appropriate. The issue disappears entirely if you always call newPlot but I wanted to take advantage of the extra performance of react for most cases.

@archmoj
Contributor

archmoj commented Oct 30, 2019

This issue seems to still be occurring:

https://codepen.io/anon/pen/zmQzOO

This should plot a point every millisecond. If you zoom in there are a few gaps (although you can hover over the ghost points):

image

Possible fix illustrated in codepen

@archmoj
Contributor

archmoj commented Oct 30, 2019

Minimal codepen to illustrate the bug.

Candidate fix demo

8 participants