
provide benchmarks for plotly.js #1511

Closed
wants to merge 6 commits

Conversation

@etpinard (Contributor) commented Mar 23, 2017

Introducing a bench/ directory for centralised benchmark suites.


Goals:

  • Help us 🔒 down performance improvements
  • Always up-to-date performance standards documentation 📚

Technology used:

  • karma-benchmark, which runs benchmark.js under the hood
  • karma-benchmarkjs-reporter, to log results to stdout during runs
  • karma-benchmark-json-reporter, to write results to a JSON file in dist/

How does this work?

  • Run the benchmarks using npm run bench
  • Add a benchmark suite in bench/suites/ and npm run bench will automatically bundle and run it, similar to our test/jasmine/ workflow (see the suite sketch below)
  • Results are logged to a single JSON file, dist/benchmarks.json, which I'm thinking we should check in to git
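For reference, a suite in bench/suites/ could look roughly like the sketch below. It assumes the suite/benchmark globals that karma-benchmark exposes; the target function and the require path are only illustrative.

```js
// bench/suites/clean_number.js -- illustrative file name and target
// `suite` and `benchmark` are the globals provided by karma-benchmark
var Lib = require('../../src/lib');  // path is an assumption; adjust to the bundling setup

suite('Lib.cleanNumber', function() {
    benchmark('plain number string', function() {
        Lib.cleanNumber('1.5');
    });

    benchmark('padded exponent string', function() {
        Lib.cleanNumber('  -1.5e3  ');
    });
});
```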

Remaining questions:

  • Should npm run bench be run on CI? Or maybe on an old laptop, as @rreusser suggested?
  • Should we test performance, i.e. compare new and old benchmark results and 🛑 PRs that slow down plotly.js too much?
  • More 💄, e.g. turn the results JSON into plotly graphs? (rough sketch below)
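On that last point, here is a rough sketch of what "results JSON into plotly graphs" could look like. It assumes dist/benchmarks.json holds an array of {suite, name, hz} records, which may differ from the actual reporter output.

```js
// tasks/bench_report.js -- hypothetical helper: node tasks/bench_report.js
var fs = require('fs');

var results = JSON.parse(fs.readFileSync('dist/benchmarks.json', 'utf8'));

var data = [{
    type: 'bar',
    x: results.map(function(r) { return r.suite + ': ' + r.name; }),
    y: results.map(function(r) { return r.hz; })
}];

var html = [
    '<html><head><script src="plotly.js"></script></head>',
    '<body><div id="bench"></div><script>',
    'Plotly.newPlot("bench", ' + JSON.stringify(data) + ', {yaxis: {title: "ops/sec", type: "log"}});',
    '</script></body></html>'
].join('\n');

// written next to the bundle so the relative script src above resolves
fs.writeFileSync('dist/benchmarks.html', html);
```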

@rreusser @alexcjohnson @monfera @n-riesco @bpostlethwaite

- use the karma-benchmark framework, which uses benchmark.js under the hood, to run the benchmarks
- use karma-benchmark-json-reporter to output benchmark results to a JSON file checked in to dist/ (good idea?)
- use karma-benchmarkjs-reporter to log output to stdout during runs
- add a few suites (more to come)
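For context, a karma config wiring those pieces together might look roughly like this; the reporter names (especially for the JSON reporter) are assumptions and would need to match the actual plugins.

```js
// bench/karma.conf.js -- rough sketch only, not the PR's actual config
module.exports = function(config) {
    config.set({
        frameworks: ['browserify', 'benchmark'],  // 'benchmark' comes from karma-benchmark
        files: ['suites/*.js'],
        preprocessors: {'suites/*.js': ['browserify']},
        // stdout reporter from karma-benchmarkjs-reporter plus a JSON reporter;
        // the exact reporter names here are assumptions
        reporters: ['benchmark', 'json'],
        browsers: ['Chrome'],
        singleRun: true
    });
};
```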
@etpinard etpinard changed the title Benchmarks 🚀 Benchmarks Mar 23, 2017
@rreusser (Contributor) commented Mar 24, 2017

Thoughts on consistency:

CI will probably be exceptionally difficult to trust since you really have no idea what else is going on on the physical machine or whether you're on the same architecture. A quick check shows a dedicated machine on AWS runs about $85/mo. In lieu of that, one approach is to normalize the benchmarks to find out whether a particular test is slower than expected, but this would probably still lead to weird/spurious quirks that might throw the whole thing into question.

One other approach is for each run to check out the previous commit, run the benchmarks, check out master, run the benchmarks again, and compare. Again, the downside is the lack of a coherent history for separating long-running trends from short regressions.

I think the most reliable option would be a VM on a physical machine in the office or something, so that at least there aren't so many unknowns, both known and unknown. If the machine dies and you can't continue on the same hardware, you can at least still check out old tags and rebuild the history.

@bpostlethwaite (Member) commented

Plotly will sponsor a cloud machine. Large cloud providers guarantee reliable tenancy performance. I don't think a dedicated machine is necessary.

We can ensure the machine has enough resources so that it remains underutilized even during the most grueling benchmark suites. Putting the machine in the cloud will allow a larger circle of developers to have access to the machine.

If you are interested let me know and I'll help set that up.

@etpinard (Contributor, Author) commented

Ping 🚨

@dfcreative and I are currently doing a lot of performance-related work, so it would be nice to finally have some sort of centralised way of doing benchmarks.

While setting up a cloud machine (as proposed in #1511 (comment)) would be ideal, I'm thinking this PR might be a nice first step that requires minimal effort.

I'm proposing the following workflow:

  • Before working on a feature branch, devs run npm run bench on master to generate baseline benchmarks.
  • The output file will be git-ignored (as the numbers will vary from machine to machine).
  • Devs would then be responsible for running npm run bench on their feature branch and writing up the performance improvements (or drawbacks 😏) relative to their baseline benchmarks.

cc @nicolaskruchten who probably has a few ideas about this ...

@alexcjohnson (Collaborator) commented

@etpinard this looks like a great framework, and with all the work you and @dy are doing it's definitely important to get something like this going.

My big question, from a practical standpoint, is how we can run matching benchmarks - including newly added ones - against older versions of the code and quickly compare performance. Ideally I don't want to compare to an older run, even one that I did on my own computer. I want to be able to check out some random older version of the code, maybe even 4 or 5 different versions, run the bench suite against each of them in turn, and then see an output table showing the % change vs the first version (or at least the first version that worked - what happens in this system when there's an error?).

I guess we could do this by somehow copying the bench folder out of the repo (or into an ignored folder in the repo) so that it's not affected by checking out various versions of the code?

@etpinard (Contributor, Author) commented

Thanks for 🖊️ your concerns @alexcjohnson. To make benchmark comparisons easier, I'm thinking of making npm run bench accept a commit hash or git tag as an argument.

For example, npm run bench -- --base v1.36.0 would (see the sketch after the list):

  1. git checkout v1.36.0
  2. npm i
  3. browserify src/index.js > build/plotly.js
  4. git checkout <initial branch>
  5. run the benchmarks for v1.36.0
  6. keep results in node process memory
  7. npm i && browserify src/index.js > build/plotly.js
  8. run the benchmarks off the initial branch
  9. output the results in percentage of v1.36.0 results
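As a rough sketch only (not the actual implementation), the npm task could shell out along these lines. How the karma run gets invoked, the bench-run script name, and the shape of dist/benchmarks.json are all assumptions here.

```js
// tasks/bench_base.js -- hypothetical sketch of the steps above,
// invoked via `npm run bench -- --base v1.36.0`
var fs = require('fs');
var execSync = require('child_process').execSync;

function run(cmd) { execSync(cmd, {stdio: 'inherit'}); }

// stand-in for however the karma benchmark run ends up being invoked;
// assumes it writes an array of {name, hz} records to dist/benchmarks.json
function runBenchmarks() {
    run('npm run bench-run');
    return JSON.parse(fs.readFileSync('dist/benchmarks.json', 'utf8'));
}

var base = process.argv[process.argv.indexOf('--base') + 1];
var initialBranch = execSync('git rev-parse --abbrev-ref HEAD').toString().trim();

run('git checkout ' + base);                        // 1
run('npm i');                                       // 2
run('browserify src/index.js > build/plotly.js');   // 3
run('git checkout ' + initialBranch);               // 4
var baseResults = runBenchmarks();                  // 5 + 6: bench the base bundle, keep results in memory

run('npm i');                                       // 7
run('browserify src/index.js > build/plotly.js');
var headResults = runBenchmarks();                  // 8

// 9: report each benchmark as a percentage of its base result
var baseHz = {};
baseResults.forEach(function(r) { baseHz[r.name] = r.hz; });
headResults.forEach(function(r) {
    console.log(r.name + ': ' + (100 * r.hz / baseHz[r.name]).toFixed(1) + '% of ' + base);
});
```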

@alexcjohnson (Collaborator) commented

That all sounds great. One possible modification for flexibility would be:

  • npm run bench -- --base XXX just runs steps 1-5, then saves the results with a versioned filename
  • user could then run this against as many different versions as she wants, getting a new output file for each one
  • then a small script to combine them into one output, npm run bench-compare -- v1.31.0 v1.36.0 master my-feature that outputs:
                     v1.31.0      v1.36.0              master               my-feature
cleanNumber          1.38e5/sec   1.41e5/sec (+2.2%)   1.37e5/sec (-0.7%)   1.40e5/sec (+1.4%)
scattergl 1e5 pts    2.10/sec     1.08/sec (-49%)      10.1/sec (+381%)     11.2/sec (+433%)
newer feature        ERROR        ERROR                1.80e3/sec           5.41e4/sec (+2900%)

Or I suppose this could all be done with a single command that takes a list of base tags/branches, though that feels dangerous, as it's a whole lot to run in sequence and expect all to go right. But with a separate bench-compare script you could run the benchmarks and leave the results there (possibly changing the filename), then later compare the same code over time (computer-to-computer or browser version), across different browsers, or just more quickly against your own branch from yesterday.
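Sketching that bench-compare step: the snippet below assumes each run was saved as dist/benchmarks-<label>.json (the filename convention is made up here), that each record carries name and hz, and that a missing entry stands in for ERROR.

```js
// tasks/bench_compare.js -- hypothetical, e.g.
//   node tasks/bench_compare.js v1.31.0 v1.36.0 master my-feature
var fs = require('fs');

var labels = process.argv.slice(2);

// one {benchmarkName: hz} map per label; missing or unreadable files become empty maps
var runs = labels.map(function(label) {
    var byName = {};
    try {
        JSON.parse(fs.readFileSync('dist/benchmarks-' + label + '.json', 'utf8'))
            .forEach(function(r) { byName[r.name] = r.hz; });
    } catch(e) {}
    return byName;
});

// union of all benchmark names seen in any run
var names = {};
runs.forEach(function(run) { Object.keys(run).forEach(function(n) { names[n] = 1; }); });

console.log(['benchmark'].concat(labels).join('\t'));
Object.keys(names).forEach(function(name) {
    var baseHz = runs[0][name];
    var row = runs.map(function(run, i) {
        var hz = run[name];
        if(hz === undefined) return 'ERROR';
        var cell = hz.toExponential(2) + '/sec';
        if(i > 0 && baseHz) cell += ' (' + ((hz / baseHz - 1) * 100).toFixed(1) + '%)';
        return cell;
    });
    console.log([name].concat(row).join('\t'));
});
```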

@antoinerg (Contributor) commented Jul 22, 2019

In order to ensure consistency across runs, it is really important to disable Turbo Boost, or the equivalent automatic overclocking technology present in modern CPUs. My laptop, for example, will continuously vary its operating frequency between 1.90 GHz and 4.20 GHz during a run, which greatly affects execution times.

Some more tips for accurate benchmarking: https://easyperf.net/blog/2019/08/02/Perf-measurement-environment-on-Linux

@gvwilson (Contributor) commented

This pull request has been sitting for a while, so I would like to close it as part of our effort to tidy up our public repositories. I've assigned it to myself to keep track of it; I'll wait until 2024-06-17 for someone to say it's still relevant and that they'll take it on, otherwise I will close it then. Thanks - @gvwilson

@gvwilson gvwilson self-assigned this Jun 10, 2024
@gvwilson gvwilson removed their assignment Aug 2, 2024
@gvwilson gvwilson added the feature label and removed the type: maintenance label Aug 8, 2024
@gvwilson gvwilson added the P3 and infrastructure labels Aug 8, 2024
@gvwilson gvwilson changed the title Benchmarks provide benchmarks for plotly.js Aug 13, 2024
@gvwilson (Contributor) commented

apologies for letting this fall behind without merging - we'll create a separate ticket for benchmarking the current version of plotly.js. thanks - @gvwilson

@gvwilson gvwilson closed this Aug 21, 2024