Increase the maximum heap size on Jenkins #1030

Merged
merged 1 commit into scala:master on Jan 17, 2016

Conversation

@smarter (Member) commented Jan 16, 2016

Review by @odersky or @DarkDimius

@smarter (Member, Author) commented Jan 16, 2016

[info] OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006ded80000, 2463629312, 0) failed; error='Cannot allocate memory' (errno=12)
[info] #
[info] # There is insufficient memory for the Java Runtime Environment to continue.
[info] # Native memory allocation (mmap) failed to map 2463629312 bytes for committing reserved memory.
[info] # An error report file with more information is saved as:
[info] # /home/jenkins/workspace/dotty-master-validate-partest/hs_err_pid24921.log

I think this means that we're trying to allocate more memory than available on the Jenkins VM.

@SethTisue, @adriaanm : How much memory do the VMs have? Is this something that could be increased?

In the meantime, I'll try reducing -Xmx a bit and see if I can find a middle ground.

@DarkDimius (Contributor)

@smarter

Quoting @adriaanm

behemoths are c4.2xlarge, the others c4.xlarge

c4.2xlarge = 15 GB RAM,
c4.xlarge = 7.5 GB RAM

@smarter (Member, Author) commented Jan 16, 2016

@DarkDimius : can we reduce the parallelism so that we use less memory?

@smarter force-pushed the add/more-memory branch 3 times, most recently from 977681d to 29ea17c on January 16, 2016 at 18:54
We're getting a lot of OutOfMemoryErrors when the maximum heap size is 1
GB, but we cannot increase it too much without using up all the memory
available on the Jenkins instances, so let's see if 1.1 GB is enough.

Also stop using a custom -Xss; the default of 1 MB should be good enough.
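
A minimal sketch of what this kind of change could look like in an sbt 0.13 build, assuming the test JVMs receive their flags through `javaOptions` (the actual keys and values in dotty's build are not reproduced here):

```scala
// Hypothetical sbt fragment, not dotty's actual build definition:
// cap the forked test JVMs' heap at 1.1 GB and drop the custom -Xss
// so the JVM default (about 1 MB per thread) applies.
javaOptions in Test := (javaOptions in Test).value
  .filterNot(opt => opt.startsWith("-Xmx") || opt.startsWith("-Xss")) :+
  "-Xmx1100m"
```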
@smarter force-pushed the add/more-memory branch 2 times, most recently from 1e0965f to 6e8fcde on January 16, 2016 at 20:12
@smarter changed the title from "Use the same memory options everywhere" to "Increase the maximum heap size on Jenkins" on Jan 16, 2016
@smarter (Member, Author) commented Jan 16, 2016

OK, I've done four runs with the maximum heap size increased from 1 GB to 1.1 GB for Jenkins and they all succeeded, so I propose we merge this now and worry about a longer-term solution later. I didn't change the memory settings used when running the tests locally because decreasing the heap size there would probably slow down the tests.

@odersky (Contributor) commented Jan 16, 2016

Can we give it 1.5G? That would give us some margin for more complicated tests.


@smarter (Member, Author) commented Jan 16, 2016

Can we give it 1.5G? That would give us some margin for more complicated tests.

Unfortunately not. I tried that at first but got the failure described in #1030 (comment): apparently we're using up all the RAM available on the VM instance. 1.1G is the max I could get away with; even 1.25G is too much. So this is really a short-term fix.

@smarter (Member, Author) commented Jan 16, 2016

Note that if the VM instances had a swap partition, we at least wouldn't get a crash when using too much RAM; things would just get very slow. An even better fix would be to figure out how we're using up the 15 GB of RAM on the VM instances and address that (I have no idea how many JVMs we run in parallel or what controls this).

@DarkDimius (Contributor)

I have no idea how many JVMs we run in parallel and what controls this

We run 1 JVM per core.
Those Amazon EC2 VMs have 1.875 GB RAM per core.

@smarter (Member, Author) commented Jan 16, 2016

We run 1 JVM per core.

You mean partest starts 1 JVM per core? But you also have to take into account the JVM used by sbt itself, right?

@smarter (Member, Author) commented Jan 16, 2016

If N = the number of cores, could we try running partest with N-1 JVMs instead of N?
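
If such a knob exists, a hedged sketch of how the worker count could be derived, leaving one core's worth of headroom (`partestWorkers` and the way it would reach partest are assumptions, not actual build settings):

```scala
// Hypothetical: use one fewer test worker than there are cores,
// so the sbt JVM and the OS keep some memory and CPU headroom.
val partestWorkers: Int =
  math.max(1, Runtime.getRuntime.availableProcessors() - 1)
```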

@DarkDimius (Contributor)

c4.2xlarge: 8 cores, 15 GB RAM.

Even if we run 9 JVMs, that's 1.66 GB per JVM.
And this is an upper bound: I wouldn't expect sbt to actually allocate all the memory it's allowed while compiling Dotty, nor would I expect Dotty itself to use that much memory. It simply shouldn't, even when compiling itself. Unless we have a memory leak.
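
That per-JVM budget can be checked with a quick back-of-the-envelope calculation; a small Scala sketch of the arithmetic, using only the figures quoted above:

```scala
// Memory budget for a c4.2xlarge worker: 8 cores, 15 GB RAM.
val totalRamGb = 15.0
val jvms       = 8 + 1              // one partest JVM per core, plus sbt itself
val perJvmGb   = totalRamGb / jvms  // ≈ 1.66 GB per JVM, as an upper bound
println(f"$perJvmGb%.2f GB per JVM")
// Caveat: -Xmx bounds only the heap, so each JVM's real footprint is larger.
```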

@smarter (Member, Author) commented Jan 16, 2016

You also have to take into account that some RAM is used by the OS itself, and that the total memory usage of a JVM is bigger than its heap size (though I don't know by how much), so we might be close to the limit.

@smarter (Member, Author) commented Jan 16, 2016

(One thing that might be worth investigating: in the error I showed above, the amount of memory that the JVM tried to "commit" was 2463629312 bytes ≈ 2.3 GB. This is much bigger than the maximum heap size of 1.5G I specified, so something weird may be going on here. We might have more details once scala/scala-jenkins-infra#157 is merged and we can take a look at the log file.)
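
For reference, the heap is only one of several regions the JVM commits; a hedged illustration of the flags that bound the other large ones (the values here are made up for illustration, not taken from the build):

```scala
// Illustrative only: -Xmx caps the heap, but thread stacks, class metadata
// and the JIT code cache are all committed on top of it, which may help
// explain a ~2.3 GB commit against a 1.5 GB heap.
javaOptions in Test ++= Seq(
  "-Xmx1100m",                      // max heap
  "-Xss1m",                         // stack per thread; N compiler threads add up
  "-XX:MaxMetaspaceSize=256m",      // class metadata (Java 8+)
  "-XX:ReservedCodeCacheSize=128m"  // JIT-compiled code
)
```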

@odersky merged commit 6e8fcde into scala:master on Jan 17, 2016
@odersky (Contributor) commented Jan 17, 2016

Wait: partest runs N threads in one VM; otherwise we would not have seen data races. Each thread runs a full compiler, and a full compiler needs a couple hundred MB to compile a significant chunk of sources. So it seems to me that if the max heap is fixed, we could still reduce memory pressure by reducing N. What is N at the moment?


@smarter (Member, Author) commented Jan 17, 2016

What is N at the moment?

It's the number of cores on the machine; here's an attempt at reducing it: #1032

smarter added a commit to dotty-staging/dotty that referenced this pull request Jan 17, 2016
All of our recent memory-related test failures since
scala#1030 was merged seem to be caused
by t7880.scala. It intentionally tries to trigger an OutOfMemoryError,
but since we don't pass -Xmx to our run tests, it's possible that
we fill up the host's memory before we reach the maximum heap
size of the JVM.

Ideally, we would specify a -Xmx for run tests (scalac uses 1 GB).
Unfortunately, in the version of partest we use this is tricky because we
need to set the system property "partest.java_opts". If we upgrade our
partest to the latest release, we can instead specify it by setting the
`javaOpts` argument of the `SuiteRunner` constructor, see
scala/scala-partest@7c4659e
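
A sketch of the two approaches the message describes, with the upgrade path shown schematically (the surrounding build wiring is an assumption):

```scala
// Current partest: the only hook is the partest.java_opts system property,
// which must be set in the JVM that launches the tests.
sys.props("partest.java_opts") = "-Xmx1g" // 1 GB, matching scalac's choice

// After upgrading partest, the flags could instead be passed to the
// SuiteRunner constructor via its javaOpts argument, e.g.
//   new SuiteRunner(..., javaOpts = "-Xmx1g", ...)
// (see scala/scala-partest@7c4659e for the actual signature).
```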
@SethTisue (Member)

@SethTisue, @adriaanm : How much memory do the VMs have? Is this something that could be increased?

Adriaan has some numbers at https://github.com/scala/scala-jenkins-infra/blob/master/doc/design.md, though I'm not entirely sure they're current. If you want to poke around yourself, we should probably get you access to the Jenkins nodes so you can ssh in, investigate on your own, try experiments from the command line, etc.; see https://github.com/scala/scala-jenkins-infra/blob/master/doc/client-setup.md#hosts-and-ssh-config

@smarter (Member, Author) commented Jan 18, 2016

@SethTisue: That could be useful in the future, yes. I guess you'll need me to give you an ssh key; if so, you can use: http://guillaume.martres.me/id_rsa.pub

@SethTisue (Member)

@adriaanm could you help Guillaume with this? I realize this is a part of the process I don't know about; the docs just say "Send Adriaan your public key... He will use it to encrypt your credentials".

@adriaanm (Contributor)

Here's the section where you documented how to do this: https://github.com/scala/scala-jenkins-infra/blob/master/doc/client-setup.md#get-your-public-key-added :-)

@SethTisue (Member)

You can lead a horse to documentation, but you can't make him drink. Even if you are the horse.

@SethTisue (Member)

Hmm, I merged scala/scala-jenkins-infra#159, ran `knife cookbook upload scala-jenkins-infra` as usual, and ran `chef-client` on jenkins-master, but I didn't see anything about the change in the resulting output, and I don't see Guillaume's key in `~ec2-user/.ssh/authorized_keys`. Is there something else we need to do to make the change kick in?

@adriaanm (Contributor)

`default['authorized_keys']['jenkins']` is for the jenkins user

@SethTisue (Member)

Hmm, do I need to run `chef-client` on the workers?

@adriaanm (Contributor)

Yes, but they should run that on a schedule.

@SethTisue (Member)

OK. @smarter, if this isn't already working for you, let us know.

@allanrenucci deleted the add/more-memory branch on December 14, 2017 at 16:57