
JVM crash with "There is insufficient memory for the Java Runtime Environment to continue" #181


Closed
retronym opened this issue Jul 28, 2016 · 27 comments

@retronym (Member) commented Jul 28, 2016

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 123207680 bytes for committing reserved memory.
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:

Seen intermittently by @soc, @odersky, and me over the past two days.

@retronym (Member Author)

We think this is due to an increase in memory usage by the dotty build. @adriaanm planned to reduce the job limit on each Jenkins worker (the EC2 instances have 16 GB of RAM).

@retronym (Member Author)

Partest currently uses _refArrayOps. We should remove the _-prefixed versions and make the original names implicit again with the new return types, as suggested in the comment. Let's do that in a separate PR, though, so we can clearly see why we're doing a bootstrap.

@SethTisue (Member)

@adriaanm, at least, has still seen PR validation failures in the last few days, even after the number of executors per behemoth was reduced in 2a83d37, including a strange EOF error, perhaps from a forked JVM that died without us having the right error reporting in place.

@SethTisue (Member)

https://scala-ci.typesafe.com/job/scala-2.12.x-integrate-bootstrap/ has failed several nights in a row now with this error. The failures started when we bumped STARR from M5 to RC1 in scala/scala@7507765, but that's probably a coincidence.

OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000a0580000, 233832448, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 233832448 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/jenkins/workspace/scala-2.12.x-integrate-bootstrap/hs_err_pid26131.log
Exception in thread "Thread-33" java.io.EOFException
    at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2626)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1321)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
    at sbt.React.react(ForkTests.scala:114)
    at sbt.ForkTests$$anonfun$mainTestTask$1$Acceptor$2$.run(ForkTests.scala:74)
    at java.lang.Thread.run(Thread.java:745)

@adriaanm (Contributor)

Could it also be because we started building 2.12.0 and 2.12.x simultaneously? I didn't check the parallelism setting, though.

@SethTisue (Member) commented Sep 21, 2016

Could also be because we started building 2.12.0 and 2.12.x simultaneously

I think that's likely.

Lately we have not seen the memory crash causing spurious PR validation failures, but most scala-2.12.0-integrate-bootstrap and scala-2.12.x-integrate-bootstrap runs have been failing. The bootstrap jobs run on jenkins-worker-ubuntu-publish, not on the worker behemoths.

Does 2a83d37 only affect the behemoths?

@SethTisue (Member) commented Sep 21, 2016

Manual 2.12.0-integrate-bootstrap run, which has jenkins-worker-ubuntu-publish all to itself: https://scala-ci.typesafe.com/job/scala-2.12.0-integrate-bootstrap/103/consoleFull

@SethTisue (Member)

Does 2a83d37 only affect the behemoths?

It does: lightWorker = publisher # TODO: better heuristic...

@SethTisue (Member)

Assuming that test run passes, I'll try reducing the publisher nodes from 2 concurrent jobs to 1.

@SethTisue (Member)

The 7.5 GiB RAM numbers listed at https://github.com/scala/scala-jenkins-infra/blob/master/doc/design.md seem low; I have 16 GB in my laptop.

@retronym (Member Author)

I wonder if we could detect the error in our build scripts and grab some extra diagnostics (e.g. memory used by each process: http://askubuntu.com/a/62351).
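
As a rough illustration (not something our scripts actually do), a post-build hook along these lines could look for HotSpot crash logs and dump memory diagnostics; the WORKSPACE variable, the file pattern, and the GNU/Linux ps/free invocations are all assumptions:

    // Hypothetical diagnostics hook, e.g. run at the end of a build script.
    // The env var, paths, and ps flags are guesses about our Linux workers.
    import java.io.File
    import scala.sys.process._

    object MemDiag {
      def main(args: Array[String]): Unit = {
        val workspace = new File(sys.env.getOrElse("WORKSPACE", "."))
        val crashLogs = Option(workspace.listFiles).getOrElse(Array.empty[File])
          .filter(_.getName.matches("hs_err_pid\\d+\\.log"))
        if (crashLogs.nonEmpty) {
          println("HotSpot crash log(s) found: " + crashLogs.map(_.getName).mkString(", "))
          // Per-process memory, largest resident set first (same idea as the askubuntu answer).
          println(Seq("ps", "-eo", "rss,vsz,pid,user,args", "--sort=-rss").!!)
          // Overall RAM/swap picture.
          println(Seq("free", "-m").!!)
        }
      }
    }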

@SethTisue (Member) commented Sep 22, 2016

Oh good, my manual run passed. That gives me hope that we can put this to rest (for now) by reducing the parallelism on the publishers too.

SethTisue added a commit to SethTisue/scala-jenkins-infra that referenced this issue Sep 22, 2016
in a desperate attempt to fix the "There is insufficient memory for
the Java Runtime Environment to continue" crashes, see
scala#181
@SethTisue (Member)

Sigh, #188 failed with some weird Chef error.

@SethTisue (Member) commented Sep 22, 2016

I set the executor count to 1 at https://scala-ci.typesafe.com/computer/jenkins-worker-ubuntu-publish/configure; maybe the manual setting will actually stick for a while if Chef is borked.

@SethTisue (Member)

Several nights of green runs so far. Let's keep monitoring it.

@SethTisue (Member) commented Sep 28, 2016

My experience over the last week or so is that reducing the parallelism from 4 to 3 definitely helped, but it didn't get everything running smoothly either: bootstrap jobs randomly failing, community builds randomly failing, etc.

It can be difficult to distinguish X failing randomly because X is flaky from X failing randomly because our Jenkins config as a whole is flaky, but my intuition is that:

  1. at minimum, it would be worth further reducing the parallel jobs from 3 to 2, waiting a week, and seeing whether overall flakiness levels drop
  2. or perhaps we've put up with this long enough and it's time to either give the nodes more RAM or add more nodes

@adriaanm and I chatted about it just now and he wants to try option 2 soon.

@adriaanm (Contributor) commented Sep 28, 2016

Instead, I realized it would be easiest to change the EC2 instance type from c4.2xlarge to c4.4xlarge, which doubles the RAM to 30 GB and the cores to 8. Done for behemoth 1, still pending for behemoth 2.

@SethTisue (Member)

https://scala-ci.typesafe.com/job/scala-2.12.0-validate-test/93/

# starting 63 tests in run
!!  1 - run/SI-4676.scala                         [compilation failed]
!!  2 - run/SI-4360.scala                         [compilation failed]
!!  3 - run/SI-4887.scala                         [compilation failed]
##### Log file '/home/jenkins/workspace/scala-2.12.0-validate-test/test/scaladoc/run/SI-4360-run.log' from failed test #####

error: Java heap space

##### Log file '/home/jenkins/workspace/scala-2.12.0-validate-test/test/scaladoc/run/SI-4887-run.log' from failed test #####

error: Java heap space

##### Log file '/home/jenkins/workspace/scala-2.12.0-validate-test/test/scaladoc/run/SI-4676-run.log' from failed test #####

error: Java heap space

😱

@retronym (Member Author) commented Sep 29, 2016

Some notes from scala/scala#5430:

  • Reduce the max heap of the forked partest test JVMs to, say, 256M. If some test cases need more, either rewrite them to be less memory-hungry, or corral them into a new partest category that is run sequentially.

  • Review the initial/max heap of the forked partest control process. It seems to me that it needn't use -Xmx2G.

  • DRY up the heap-size config between javaOptions and testOptions as much as makes sense, as per @adriaanm's suggestion (a rough combined sketch follows below):

    testOptions in IntegrationTest += Tests.Argument(s"-Dpartest.java_opts=${(javaOptions in IntegrationTest).value.mkString(" ")}")
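
A rough sbt sketch pulling those three points together (partestMemorySettings is a made-up name, and the exact values are illustrative, not the actual build definition):

    // Illustrative only: cap the forked partest test JVMs at 256M and forward the
    // same options to partest via -Dpartest.java_opts, so the heap size is
    // configured in exactly one place.
    lazy val partestMemorySettings = Seq(
      javaOptions in IntegrationTest := Seq("-Xmx256m"),
      testOptions in IntegrationTest += Tests.Argument(
        s"-Dpartest.java_opts=${(javaOptions in IntegrationTest).value.mkString(" ")}"
      )
    )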

@SethTisue (Member)

Have we seen this lately?

@adriaanm (Contributor) commented Nov 8, 2016

I'll grep the logs tomorrow.

@SethTisue (Member) commented Nov 17, 2016

Things had been quiet in recent weeks (that's my informal impression), but there was a spate of failures today, e.g. https://scala-ci.typesafe.com/job/scala-2.12.x-validate-test/3528/, reported by @som-snytt.

@som-snytt

Intermittent? "You keep using that word. I do not think it means what you think it means."

@adriaanm (Contributor) commented Nov 17, 2016

Added swap to the behemoths in 47a0d79.

@SethTisue (Member)

Quiet on this front lately, especially on the new, larger behemoths, and anyway most stuff is moving to Travis CI and AppVeyor.

