docker-rootful: Increase inotify limits by default #1179

Open · wants to merge 1 commit into master

Conversation

carlosonunez-vmw

This resolves #1178.

@carlosonunez-vmw carlosonunez-vmw changed the title Increase inotify limits by default docker-rootful: Increase inotify limits by default Nov 18, 2022
@AkihiroSuda
Member

Thanks, but please sign the commit for DCO
https://github.com/apps/dco

(run git commit -a -s --amend, and make sure that the Signed-off-by: NAME <EMAIL> line with your real name is included in the commit message)
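For reference, a typical sequence for fixing this on an already-pushed PR branch (a generic git sketch, not specific to this repo):

git commit -a -s --amend      # re-creates the last commit with a Signed-off-by trailer from your git user.name/user.email
git push --force-with-lease   # replaces the PR branch with the amended commit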

# from crash looping.
echo 'fs.inotify.max_user_watches = 524288' >> /etc/sysctl.conf
echo 'fs.inotify.max_user_instances = 512' >> /etc/sysctl.conf
sysctl --system
Member

Can we replicate this to docker.yaml, podman*.yaml, k8s.yaml, k3s.yaml too?

Author

good idea!
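Once an instance is started from a template carrying this step, the new limits can be verified from the host (the instance name "docker" is an assumption here; substitute your own):

limactl shell docker sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances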

@carlosonunez-vmw
Author

carlosonunez-vmw commented Nov 20, 2022 via email

@afbjorklund
Member

afbjorklund commented Nov 20, 2022

Seems needlessly complicated for the k3s and k8s examples, since they would have VMs as nodes (not containers)?

If I understand correctly, it is only for running containerd-in-docker or containerd-in-podman, as part of "kind".

@AkihiroSuda
Member

This resolves lima-vm#1178 and allows users to create multiple local Kubernetes
clusters through Kind or the Cluster API Docker provider.

Signed-off-by: Carlos Nunez <[email protected]>
@carlosonunez-vmw
Author

✅ Please sign off the commit for DCO: https://github.com/apps/dco
✅ Please squash commits
⚠️ Please consider doing the same for podman*.yaml

I'm not sure if Podman needs this treatment, as it uses crun instead of runc, which handles nested cgroup mounting differently. This would require additional testing.

Can that be a separate pull request, given that this behavior is known for containerd-based engines?

script: |
#!/bin/bash
# Increase inotify limits to prevent nested Kubernetes control planes
# from crash looping.
Member

Is this needed for k3s? If so, it should be needed for k8s.yaml too?

Member

As far as I know, it is only needed for k3d and kind - not for k3s and k8s

@BenTheElder Feb 28, 2025

As far as I know, it is only needed for k3d and kind - not for k3s and k8s

Not necessarily. It's used any time you're using a lot of inotify, which can happen with k3s as well: anything using ConfigMaps will need one watch per ConfigMap, and user workloads of other kinds may also run into this.

@BenTheElder Feb 28, 2025

kind usage is a common way to encounter it, because you often start multiple kubelets on the same kernel plus some system workloads with ConfigMaps, but that's only one way to run up usage. A single kubelet with many ConfigMaps could hit the same limit.


BTW, with Ubuntu's defaults, Kubernetes's e2e tests created enough pods to exceed the limit while running a Kubernetes worker node on the host (not kind, and not a single-node cluster); in particular, the max_user_instances default seems to be pretty low (128).

kubernetes/kubernetes#130990

I set up my fork the other day and have been meaning to work up a new PR. That hasn't happened yet, but I'm leaving this breadcrumb in the meantime. There are also some pointers in the linked issue, with example tuning from other cluster tools in the project.
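For anyone wanting to check how close a kernel is to these limits, here is a generic sketch using the standard /proc interfaces (PID and FD below are placeholders):

# Current per-user limits:
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches
# Inotify instances open across all visible processes (one matching fd per instance):
find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l
# Watches held by one instance: each "inotify wd:..." line in fdinfo is one watch:
grep -c '^inotify' /proc/PID/fdinfo/FD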

@chancez
Contributor

chancez commented Dec 8, 2022

Ah, this looks great. I've been doing something similar for ages.

@BenTheElder

I'm not sure if Podman needs this treatment, as it uses crun instead of runc, which handles nested cgroup mounting differently. This would require additional testing.

inotify isn't namespaced in the kernel; if you start another VM/kernel you'll have separate limits, but otherwise this applies to anything using inotify (consider also things like the inotify command-line tools, IDEs, etc.).

crun/runc/... shouldn't change that. Increasing the inotify limits is probably a good idea on all the templates, at the cost of increasing the template complexity and some kmem.
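For reference, the change under discussion amounts to a provision step along these lines in each template (script body reconstructed from the diff excerpts above; the surrounding provision/mode keys follow lima's template conventions and are assumed here):

provision:
- mode: system
  script: |
    #!/bin/bash
    # Increase inotify limits to prevent nested Kubernetes control planes
    # from crash looping.
    echo 'fs.inotify.max_user_watches = 524288' >> /etc/sysctl.conf
    echo 'fs.inotify.max_user_instances = 512' >> /etc/sysctl.conf
    sysctl --system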

@carlosonunez

carlosonunez commented Mar 1, 2025

Hello! I apologize for not signing the DCO with my VMware account. I've since moved on and no longer have access to that GitHub handle.

@AkihiroSuda, should I re-raise this PR with my personal GitHub account (i.e. this one) and sign off with that? I still increase my inotify limits in my Docker template; it would be nice to save others the work.

(I have not tested whether podman + kind needs this treatment, though I likely will soon.)

@jandubois
Member

should I re-raise this PR with my personal GitHub account (i.e. this one) and sign off with that?

Yes, please! We are unable to merge any commits without a valid DCO; it is a CNCF requirement.

@jandubois
Member

Increasing the inotify limits is probably a good idea on all the templates, at the cost of increasing the template complexity and some kmem.

We could do this in the internal provisioning scripts, if we can agree that it should be done for all distros. That way it would not complicate the templates. We would also have to decide what the new limits are supposed to be, and whether they can be the same for each distro.

Is there any downside to increasing the limits? If not, why are the default limits so low?

@AkihiroSuda any opinion on this?

An alternative in the future (when we have composable templates) would be to implement this in a mix-in template, and users could add it with base: template://options/inotify or whatever we want to call it. It could even be parameterized with param settings. But we won't be there until later this year.
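A sketch of what such a mix-in could look like (entirely hypothetical: the options/inotify name and the base: mechanism come from the comment above and are not implemented yet; the values are the ones proposed in this PR):

# options/inotify.yaml (hypothetical)
provision:
- mode: system
  script: |
    #!/bin/sh
    cat >/etc/sysctl.d/99-inotify.conf <<'EOF'
    fs.inotify.max_user_watches = 524288
    fs.inotify.max_user_instances = 512
    EOF
    sysctl --system

# A user template would then pull it in with (hypothetical syntax):
# base: template://options/inotify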

@BenTheElder

Is there any downside to increasing the limits? If not, why are the default limits so low?

Setting low limits caps the memory used by the kernel for this purpose (tracking inotify watches):

https://man7.org/linux/man-pages/man7/inotify.7.html#:~:text=/proc%20interfaces%0A%20%20%20%20%20%20%20The%20following%20interfaces%20can%20be%20used%20to%20limit%20the%20amount%20of%20kernel%0A%20%20%20%20%20%20%20memory%20consumed%20by%20inotify%3A

So there's definitely a downside to high limits. Since it's kernel memory it can't be swapped.

I don't remember where 524288 comes from, but that's probably... excessive.

@BenTheElder

AIUI that memory is only consumed if the user actually creates that many watches though.

The kernel default is pretty low:

https://www.monodevelop.com/documentation/inotify-watches-limit/#:~:text=Managed%20file%20watching%20is%20less,The%20default%20is%208192.

https://fleet-support.jetbrains.com/hc/en-us/articles/8084899752722-Inotify-Watches-Limit-Linux

https://www.suse.com/support/kb/doc/?id=000020048

524288 max_user_watches seems pretty common, but I think a lower value would be fine.

There's some discussion in https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/#23713335
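For a rough sense of the worst case, using the ~1 KiB-per-watch (64-bit) figure cited in the watchexec docs linked later in the thread (the per-watch size is approximate):

max kmem ≈ max_user_watches × ~1 KiB
         ≈ 524288 × 1 KiB = 512 MiB per user, and only if that many watches actually exist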

@carlosonunez

carlosonunez commented Mar 1, 2025 via email

@BenTheElder

A workload like a remote IDE, or one cluster running as many lighter workloads as multiple clusters combined, can also do it. We tend to notice it with multiple clusters because each has a minimum number of system workloads, but just adding workloads to one cluster can hit this instead, not to mention workloads that themselves use inotify.

Bumping inotify limits is common for other developer tools that use it, see the links above.

It also shouldn't cost anything if unused. It just caps the number of entries (~inodes) that are permitted, which consume kmem.

One thing to note: these limits are per user, so they only sort of work as a defensive limit; if you run workloads as lots of users, that will multiply the effective maximum anyhow...

https://watchexec.github.io/docs/inotify-limits.html

@BenTheElder

So actually users can increase this a bit by allocating more memory, but you have to add a lot, as the default scales with 1% of memory up to a hard cap:

torvalds/linux@9289012

See also abiosoft/colima#319
It appears Docker Desktop defaults to that maximum, based on the discussion there, but I haven't had it locally since the license change and I'm not finding it documented anywhere.
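Concretely, that commit makes the default scale with RAM, roughly (the ~1 KiB-per-watch figure is approximate; the cap is from the commit):

default max_user_watches ≈ min(1% of RAM / ~1 KiB per watch, 1048576)
# e.g. a 4 GiB VM: 1% ≈ 41 MiB → ~42000 watches; the 1048576 cap is only reached around 100 GiB of RAM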

@AkihiroSuda
Member

Increasing the inotify limits is probably a good idea on all the templates, at the cost of increasing the template complexity and some kmem.

Yes, this should probably be added to https://github.com/lima-vm/lima/tree/master/pkg/cidata/cidata.TEMPLATE.d/boot.
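A sketch of what such a boot script could look like (the file name and numbering are hypothetical, following the numbered-script convention in that directory; the values are the ones proposed in this PR):

# pkg/cidata/cidata.TEMPLATE.d/boot/NN-inotify-limits.sh (hypothetical name)
#!/bin/sh
set -eu
# Raise the per-user inotify limits for every instance, regardless of template.
sysctl -w fs.inotify.max_user_watches=524288
sysctl -w fs.inotify.max_user_instances=512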


Successfully merging this pull request may close these issues.

"too many open files" error upon creating multiple Kind clusters on Lima VMs.
7 participants