docker-rootful: Increase inotify limits by default #1179
base: master
Conversation
Thanks, but please sign the commit for DCO (run …).
# from crash looping.
echo 'fs.inotify.max_user_watches = 524288' >> /etc/sysctl.conf
echo 'fs.inotify.max_user_instances = 512' >> /etc/sysctl.conf
sysctl --system
Can we replicate this to docker.yaml, podman*.yaml, k8s.yaml, k3s.yaml too?
good idea!
Sure; I'll try to make these changes some time between tomorrow and Friday. -- Carlos
On Nov 19, 2022, at 19:04, Akihiro Suda wrote:
@AkihiroSuda commented on this pull request.
In examples/docker-rootful.yaml:
@@ -54,6 +54,14 @@ provision:
fi
export DEBIAN_FRONTEND=noninteractive
curl -fsSL https://get.docker.com | sh
+- mode: system
+ script: |
+ #!/bin/bash
+ # Increase inotify limits to prevent nested Kubernetes control planes
+ # from crash looping.
+ echo 'fs.inotify.max_user_watches = 524288' >> /etc/sysctl.conf
+ echo 'fs.inotify.max_user_instances = 512' >> /etc/sysctl.conf
+ sysctl --system
Can we replicate this to docker.yaml, podman*.yaml, k8s.yaml, k3s.yaml too?
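For reference, once an instance is (re)started with this template, the applied values can be checked from the host. This is just a sketch, assuming the instance is named docker-rootful:

  limactl shell docker-rootful sysctl fs.inotify.max_user_watches fs.inotify.max_user_instances
  # expected with this patch:
  # fs.inotify.max_user_watches = 524288
  # fs.inotify.max_user_instances = 512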
Seems needlessly complicated for the k3s and k8s examples, since they would have VMs as nodes (not containers)? If I understand correctly, it is only needed for running containerd-in-docker or containerd-in-podman, as part of "kind".
This resolves lima-vm#1178 and allows users to create multiple local Kubernetes clusters through Kind or the Cluster API Docker provider. Signed-off-by: Carlos Nunez <[email protected]>
force-pushed from 0ff03e9 to 047e703
✅ Please sign off the commit for DCO: https://github.com/apps/dco
I'm not sure if Podman needs this treatment, as it uses …. Can that be a separate pull request, given that this behavior is known for containerd-based engines?
script: |
  #!/bin/bash
  # Increase inotify limits to prevent nested Kubernetes control planes
  # from crash looping.
Is this needed for k3s? If so, it should be needed for k8s.yaml too?
As far as I know, it is only needed for k3d and kind - not for k3s and k8s
As far as I know, it is only needed for k3d and kind - not for k3s and k8s
Not necessarily; it's used any time you're using a lot of inotify, which can happen with k3s as well. Anything using ConfigMaps will need one watch per ConfigMap, and user workloads of other kinds may also run into this.
kind usage is a common way to encounter it, because you often start multiple kubelets on the same kernel plus some system workloads with ConfigMaps, but that's only one way to run up usage. A single kubelet with many ConfigMaps could hit the same limit.
BTW, with Ubuntu's defaults, Kubernetes's e2e tests created enough pods to exceed the limit while running a Kubernetes worker node on the host (not kind, and not a single-node cluster); in particular, the max_user_instances default seems to be pretty low (128).
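To get a feel for how close a system is to those ceilings, current usage can be inspected via /proc. This is a generic sketch, not something from this PR, and PID is a placeholder:

  # Open inotify instances across all processes (the limit is per user,
  # so this is an upper bound for any single user):
  sudo find /proc/*/fd -lname 'anon_inode:inotify' 2>/dev/null | wc -l

  # Watches held by one of your own processes:
  grep -h '^inotify' /proc/PID/fdinfo/* 2>/dev/null | wc -l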
I set up my fork the other day and have been meaning to work up a new PR; that hasn't happened yet, but I'm leaving this breadcrumb in the meantime. There are also some pointers in the linked issue, with example tuning in other cluster tools in the project.
Ah, this looks great; I've been doing something similar for ages.
inotify isn't namespaced in the kernel. If you start another VM / kernel you'll have separate limits, but otherwise this applies to anything using inotify (consider also things like the inotify command-line tools, IDEs, etc.); crun/runc/... shouldn't change that. Increasing the inotify limits is probably a good idea on all the templates, at the cost of increasing the template complexity and some kmem.
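To illustrate the "not namespaced" point, the value read inside a container matches the VM's kernel. A sketch, assuming a Docker-capable instance is already running:

  # Run inside the Lima VM:
  sysctl -n fs.inotify.max_user_watches
  docker run --rm alpine cat /proc/sys/fs/inotify/max_user_watches
  # Both print the same value; raising it on the VM raises it for every
  # container running on that VM's kernel.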
Hello! I apologize for not signing the DCO with my …. @AkihiroSuda, should I re-raise this PR with my personal GitHub account (i.e. this one) and sign off with that? I still … increase my … (I have not tested whether podman + kind needs this treatment, though I likely will soon.)
Yes, please! We are unable to merge any commits without a valid DCO; it is a CNCF requirement.
We could do this in the internal provisioning scripts, if we can agree that it should be done for all distros. That way it would not complicate the templates. We would also have to decide what the new limits are supposed to be, and if they can be the same for each distro. Is there any downside to increasing the limits? If not, why are the default limits so low? @AkihiroSuda, any opinion on this? An alternative in the future (when we have composable templates) would be to implement this in a mix-in template, and users could add it with ….
Setting low limits caps the memory used by the kernel for this purpose (tracking inotify watches), so there's definitely a downside to high limits. Since it's kernel memory, it can't be swapped. I don't remember where 524288 comes from, but that's probably … excessive.
AIUI that memory is only consumed if the user actually creates that many watches, though. The kernel default is pretty low: https://fleet-support.jetbrains.com/hc/en-us/articles/8084899752722-Inotify-Watches-Limit-Linux and https://www.suse.com/support/kb/doc/?id=000020048. 524288 max_user_watches seems pretty common, but I think a lower value would be fine. There's some discussion in https://patchwork.kernel.org/project/linux-fsdevel/patch/[email protected]/#23713335
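For anyone experimenting with values before settling on template defaults, the limits can also be raised at runtime inside the guest without a reboot. This is a generic sysctl sketch, not part of this PR, and the numbers are just the ones discussed above, not a recommendation:

  sudo sysctl -w fs.inotify.max_user_watches=524288
  sudo sysctl -w fs.inotify.max_user_instances=512
  # These do not persist across reboots; persisting them is what the
  # /etc/sysctl.conf change in this PR does.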
I agree that upping inotify limits only makes sense when many containerized Kubernetes clusters will run on the VM (IIRC I ran into this limit when I was doing Cluster API work using the CAPD provider, which creates lots of clusters). Given this, another thing I could do is add a bool field in the Config struct, like "enableHighFSWatcherCount", that adds the appropriate sysctl commands automatically to reduce complexity within template YAMLs. Thoughts?
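For illustration only, a sketch of how such a knob might surface in a user's lima.yaml; the field name is hypothetical (borrowed from the suggestion above) and is not an existing Lima option:

  # lima.yaml (hypothetical; not an existing option)
  # When true, Lima would append the fs.inotify.* sysctls from this PR to the
  # guest's sysctl configuration during provisioning.
  enableHighFSWatcherCount: true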
A workload like a remote IDE, or one cluster with as many lighter workloads as the multiple clusters together, can also do it. We tend to notice it with multiple clusters because there are a minimum number of system workloads, but just adding workloads to one cluster can hit this instead, not to mention workloads that themselves use inotify. Bumping inotify limits is common for other developer tools that use it; see the links above. It also shouldn't cost anything if unused. It just caps the number of entries (~inodes) that are permitted, which consume kmem. One thing to note: these limits are per user, so they only sort of work as a defensive limit; if you run workloads as lots of users, that will multiply the maximum anyhow ...
So users can actually increase this a bit by allocating more memory, but you have to add a lot, as the default is capped at 1% of memory up to a maximum. See also abiosoft/colima#319
Yes, this should probably be added to https://github.com/lima-vm/lima/tree/master/pkg/cidata/cidata.TEMPLATE.d/boot.
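A rough sketch of what such a boot script could look like; the file name, numbering, and shell conventions are assumptions and would need to match the existing scripts in that directory:

  #!/bin/sh
  # hypothetical: pkg/cidata/cidata.TEMPLATE.d/boot/NN-inotify-limits.sh
  set -eu
  # Raise inotify limits on every boot so nested Kubernetes control planes
  # (kind, k3d, CAPD) do not crash-loop on watch exhaustion.
  sysctl -w fs.inotify.max_user_watches=524288
  sysctl -w fs.inotify.max_user_instances=512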
This resolves #1178.