Huge number of open file descriptors after days of running #12
Thanks for the report, I'll look into it this weekend. Sounds like there's a query being improperly closed somewhere.
Running the current release in a container as per the instructions I have in the README at the moment doesn't seem to cause any issues. My test setup:
Could you supply the output of
Another source of debugging would be to check the exporter's own introspection data - the prometheus metrics should provide
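For example, assuming the exporter is listening on its default port of 9187, something like this should pull the process-level fd gauges it exposes:

```sh
# A sketch: the exporter's listen port is assumed to be the default 9187.
curl -s http://localhost:9187/metrics | grep -E '^process_(open|max)_fds'
```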
Thank you for your reply. I've left an exporter running for the last 2 days.
At exporter start, we had:
fd count kept growing:
then
Now I have:
Over the container lifespan, I've queried the exporter; the exporter itself keeps only a few files open:
Now on docker daemon:
lsof_p_docker_after_2_days.txt is attached. Thank you.
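For reference, a listing like the attached one can be produced along these lines (assuming the daemon process is named dockerd; older releases may just call it docker):

```sh
# Dump all fds held by the docker daemon to a file for inspection.
sudo lsof -p "$(pidof dockerd)" > lsof_p_docker_after_2_days.txt
```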
So from the looks of that docker trace, what's being held open are docker's json log files for some reason. That looks like a docker issue you've run across rather than an exporter issue, as we never write to those files directly. Try running the exporter with
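For instance, one way to take the json-file logs out of the picture entirely is to switch the container's logging driver; a sketch, where the image name and DATA_SOURCE_NAME are just placeholders:

```sh
# Run the exporter with logging disabled so dockerd holds no json log files for it.
docker run -d --log-driver=none \
  -e DATA_SOURCE_NAME="postgresql://postgres@db:5432/postgres?sslmode=disable" \
  -p 9187:9187 wrouesnel/postgres_exporter
```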
moby/moby#21231 looks like this might've been a docker problem up till quite recently...
Indeed, it is odd that:
I've started a new exporter with the log driver option, and I'll leave it for the weekend.
Okay, so after a few days, I get this:
A similar run on another server gives:
Okay, I think I found the issue: the lack of an init process. I used the same binary with tini on two images based on:
And these are the results:
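For anyone wanting to repeat the comparison, a sketch using Docker's --init flag (which injects tini as PID 1) instead of baking tini into the image; the image name and DATA_SOURCE_NAME below are placeholders:

```sh
# Without an init: the exporter itself runs as PID 1 inside the container.
docker run -d --name pge_no_init -p 9187:9187 \
  -e DATA_SOURCE_NAME="postgresql://postgres@db:5432/postgres?sslmode=disable" \
  wrouesnel/postgres_exporter

# With tini as PID 1 via --init (needs a reasonably recent Docker).
docker run -d --init --name pge_with_tini -p 9188:9187 \
  -e DATA_SOURCE_NAME="postgresql://postgres@db:5432/postgres?sslmode=disable" \
  wrouesnel/postgres_exporter
```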
This still doesn't seem entirely right to me, looking at the results you posted.
EDIT: Something else I'm not clear on: what command is producing the result values you've been printing? i.e.
While checking the tini source code, we can see that it deals not only with child process management but also with signal management. The lack of the latter is what I suspect is causing the issue. A process running as PID 1 is handled differently by the kernel, both child- and signal-wise, and this has been documented and addressed in various places:
As for why this is happening, I think running a custom container that only reports signal events upon receiving them would shed more light, but I can't personally do it at present. I have to say that postgres_exporter is not the only image we're running without an init; a few other ones are running too. They're Ruby based, and the only relevant feature they have is explicit signal handling.
Regarding the command I use to report open fds, it's a script that iterates over all existing containers and counts container-specific fds by looking at the docker daemon's open file descriptors. I just ran the script now, and here's the report:
(intitiator and configurator are the Ruby based containers I'm running)
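A simplified sketch of that kind of counter (matching the daemon's fds to containers by their full id is an assumption here, and the real script may differ):

```sh
#!/bin/sh
# For each running container, count how many of the docker daemon's fds
# reference its full id (e.g. json log file paths under /var/lib/docker).
docker_pid="$(pidof dockerd)"
for id in $(docker ps -q --no-trunc); do
  name="$(docker inspect --format '{{.Name}}' "$id")"
  count="$(sudo ls -l "/proc/$docker_pid/fd" | grep -c "$id")"
  printf '%s %s\n' "$name" "$count"
done
```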
We're using Ubuntu Server 14.04 LTS. That said, the tini PR has a side benefit: I can finally run
Thank you!
So I've been trying to replicate what you're seeing, and I just can't:
It very much seems like there is something broken with docker's pipe handling on your specific setup somehow (the container not dying when killed seems to point in that direction too).
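A rough way to check the pipe theory, assuming the daemon process is dockerd, is to count how many of its fds are pipes:

```sh
# Pipe fds show up as "pipe:[inode]" targets under /proc/<pid>/fd.
sudo ls -l "/proc/$(pidof dockerd)/fd" | grep -c 'pipe:'
```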
Yes, this is definitely on our setup, but since we're running production DB servers as containers, we can't just restart or upgrade Docker.
Closing this unless it hits someone else or I can figure out a replication. |
I am having a similar issue using AWS ECS Fargate. Eventually postgres_exporter stops collecting and spams logs with:
I have Prometheus in 10 different environments, and this bug appears in all of them after a similar amount of uptime.
Got this as well on a server (no docker).
Hi Will,
We've been running pg exporters for the last 2 weeks, and we noticed that they leaked file descriptors.
Here's a list of some pg exporters running on our site; the 2nd column is the open fd count on the docker daemon, and the names have been masked:
As a result, this exposes a docker bug: even destroying those containers leaves the fds deleted but still open on the docker daemon.
Here's a list of open fd count on our DB servers:
The last one is at half a million open fds, and it runs our masters.
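One way to see those leaked descriptors directly, assuming the daemon process is named dockerd, is to count the deleted-but-still-open files it holds:

```sh
# Descriptors on already-deleted files are marked "(deleted)" in /proc/<pid>/fd.
sudo ls -l "/proc/$(pidof dockerd)/fd" | grep -c '(deleted)'
```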
docker ps just hangs indefinitely.
To reproduce this bug, just run an exporter and wait; open fds on the docker daemon will keep growing.
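Something along these lines makes the growth easy to watch (again assuming a dockerd process):

```sh
# Print the daemon's total open fd count every 30 seconds.
sudo watch -n 30 'ls /proc/$(pidof dockerd)/fd | wc -l'
```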
Thank you