JVM crash with "There is insufficient memory for the Java Runtime Environment to continue" #181
We think this is due to increased memory usage by the dotty build. @adriaanm planned to reduce the job limit on each Jenkins worker (the EC2 instances have 16 GB of RAM).
Partest currently uses
https://scala-ci.typesafe.com/job/scala-2.12.x-integrate-bootstrap/ has failed several nights in a row now with this error. the failures started when we bumped STARR from M5 to RC1 in scala/scala@7507765 but that's probably a coincidence.
Could also be because we started building 2.12.0 and 2.12.x simultaneously?
I think that's likely. lately, we have not seen the memory crash causing spurious PR validation failures. but most scala-2.12.0-integrate-bootstrap and scala-2.12.x-integrate-bootstrap runs have been failing. the bootstrap jobs run on jenkins-worker-ubuntu-publish, not on the worker behemoths. does 2a83d37 only affect the behemoths?
manual 2.12.0-integrate-bootstrap run, has jenkins-worker-ubuntu-publish all to itself: https://scala-ci.typesafe.com/job/scala-2.12.0-integrate-bootstrap/103/consoleFull
it does:
assuming that test run passes, I'll try reducing publisher nodes from 2 to 1 concurrent jobs.
the 7.5 GiB RAM numbers listed at https://github.com/scala/scala-jenkins-infra/blob/master/doc/design.md seem low. I have 16 GB in my laptop
I wonder if we could detect the error in our build scripts and grab some extra diagnostics (e.g. memory used by each process: http://askubuntu.com/a/62351)
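A rough sketch of what such a check might look like, assuming a Linux worker where `ps -eo pid,rss,comm` is available and the build log has been saved to a file. The function names and the idea of calling this from a post-build step are hypothetical, not something the existing build scripts do:

```python
import subprocess
import sys

# The crash message from this issue's title, as the JVM prints it.
OOM_MARKER = "There is insufficient memory for the Java Runtime Environment to continue"


def log_has_oom_marker(log_text: str) -> bool:
    """Return True if the build log contains the JVM out-of-memory crash message."""
    return OOM_MARKER in log_text


def parse_ps_rss(ps_output: str, top: int = 10):
    """Parse `ps -eo pid,rss,comm` output into (pid, rss_kib, command) tuples,
    sorted by resident memory, largest first. Non-numeric rows (the header) are skipped."""
    rows = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        rows.append((int(parts[0]), int(parts[1]), parts[2].strip()))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:top]


def collect_diagnostics(log_path: str, top: int = 10):
    """If the log shows the crash, snapshot the biggest processes on this machine."""
    with open(log_path, errors="replace") as f:
        if not log_has_oom_marker(f.read()):
            return []
    ps = subprocess.run(
        ["ps", "-eo", "pid,rss,comm"],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_rss(ps.stdout, top)


if __name__ == "__main__" and len(sys.argv) > 1:
    for pid, rss_kib, command in collect_diagnostics(sys.argv[1]):
        print(f"{pid:>8} {rss_kib / 1024:>10.1f} MiB  {command}")
```

One caveat: the snapshot is taken after the crash, so the JVM that died will no longer be listed. It would mostly show what else was running on the worker at the time.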
oh good, my manual run passed, that gives me hope that we can put this to rest (for now) by reducing the parallelism on the publishers too
in a desperate attempt to fix the "There is insufficient memory for the Java Runtime Environment to continue" crashes, see scala#181
sigh, #188 failed with some weird Chef error
I set the executors count to 1 at https://scala-ci.typesafe.com/computer/jenkins-worker-ubuntu-publish/configure , maybe the manual setting will actually stick for a while if Chef is borked
several nights of green runs so far. let's keep monitoring it
I would say my experience in the last week or so has been that reducing the parallelism from 4 to 3 definitely helped, but didn't get everything running smoothly, either. bootstrap jobs randomly failing, community builds randomly failing, etc. though, it can be difficult to distinguish X failing randomly because X is flaky, and X failing randomly because our Jenkins config as a whole is flaky. but my intuition is that
@adriaanm and I chatted about it just now and he wants to try 2 soon
Instead, I realized it would be easiest to change the EC2 instance type from c4.2xlarge to c4.4xlarge, which doubles the RAM to 30 GB and cores to 8. Done for behemoth 1, still pending for number 2.
https://scala-ci.typesafe.com/job/scala-2.12.0-validate-test/93/
😱
Some notes from scala/scala#5430:
have we seen this lately?
I'll grep the logs tomorrow.
things had been quiet in recent weeks (is my informal impression), but there was a spate of failures today, e.g. https://scala-ci.typesafe.com/job/scala-2.12.x-validate-test/3528/, reported by @som-snytt
Intermittent? "You keep using that word. I do not think it means what you think it means."
added swap to behemoths in 47a0d79
quiet on this front lately, especially on the new larger behemoths, and anyway most stuff is moving to Travis-CI+AppVeyor
Seen by @soc, @odersky and me intermittently in the past two days.