Builds fail trying to create_container and/or remove_container #7583
Comments
Use |
But that's just a deprecation, not sure why it's failing on it. |
It seems it was a temporary error. It failed when trying to clean up the Docker container used to build the documentation. I just triggered a new build or Also, it could be that you are hitting the maximum time allowed for a build (your build reports 1311 seconds). So, increasing this time may help here as well. Let's see. @minrk I think I know what happened. Were you using I triggered another build https://readthedocs.org/projects/jupyterlab/builds/12146492/ (this should succeed because it's running on a bigger server). Please let me know whether the following builds succeed. |
Hrm... The new build on the bigger server didn't succeed either. I will need to take a deeper look at this because I'm not sure what's happening here and the logs do not tell me much about it. |
OK. I triggered a new build for latest and jumped into the server that started the build. I think I found something there. CPU consumption is very high while Besides, it seems that it's not one specific process being killed, but the whole VM instance. My SSH session was closed, I no longer have access to the same VM, and it does not appear under Azure instances anymore. So, it seems it's being killed by Azure for some reason? 🤷♂️ |
I locked the VM instance so that autoscale rules could not reach it, just in case, and it wasn't killed by Azure. While doing the build, I checked the memory used by Then, logs show that the Docker container used for building can't be removed (which our app handles), then ~5 minutes of a
|
I was able to reproduce this issue in my local RTD development instance. I do see different calls to the API that fail for this project:
They retry over and over again and finally fail with These URLs are hit after the This issue is not present when building any other project locally. |
OK, thanks to @stsewd, who realized this, I think we found the issue and we have a workaround for your case. Your git repository has ~15k tags and ~30 branches. Read the Docs keeps those tags/branches in sync and creates a I checked your documentation and you have Take into account that new tags won't appear in Read the Docs as versions now and you won't be able to build them for now (at least until we deploy the async PR). If you find yourselves needing these tags, please ping us back and we will try re-enabling the sync of tags with the async PR merged. All of this does not explain why sometimes it failed with OOM, though. |
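To check whether a repository is in the same situation (thousands of tags for Read the Docs to sync), the refs can be counted from the output of `git ls-remote --heads --tags`. The following is only a minimal sketch: the helper names `count_refs` and `count_remote_refs` are made up for illustration and are not part of Read the Docs.

```python
import subprocess
from collections import Counter


def count_refs(ls_remote_output: str) -> Counter:
    """Count branches and tags in `git ls-remote --heads --tags` output.

    Hypothetical helper for illustration; not Read the Docs code.
    """
    counts = Counter()
    for line in ls_remote_output.splitlines():
        if not line.strip():
            continue
        # Each line has the form "<sha>\t<ref name>".
        _sha, ref = line.split("\t", 1)
        if ref.endswith("^{}"):
            # Peeled entry of an annotated tag; the tag itself has its own line.
            continue
        if ref.startswith("refs/tags/"):
            counts["tags"] += 1
        elif ref.startswith("refs/heads/"):
            counts["branches"] += 1
    return counts


def count_remote_refs(repo_url: str) -> Counter:
    """Run git against a remote and count its refs (needs git and network access)."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", "--tags", repo_url],
        check=True,
        capture_output=True,
        text=True,
    )
    return count_refs(result.stdout)
```

Calling `count_remote_refs` with the repository URL returns a `Counter` whose `"tags"` and `"branches"` entries can be compared against the ~15k tags and ~30 branches mentioned above.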
It looks like the "sync of tags" workaround didn't fully solve the problem, we're still seeing failing builds that aren't due to OOM, e.g. https://readthedocs.org/projects/jupyterlab/builds/12211330/ |
Yeah... There is something weird happening with Docker, but I'm not sure what it is yet. Our code was already catching some exceptions for this case ( Let's see if that PR that I opened helps for now as a workaround, but the root cause of the problem is still unknown to me. I guess that after an OOM issue in the servers the Docker daemon gets dumb --maybe the OS is killing it? We will need to debug this more deeply, I'm sure. |
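The workaround PR itself is not shown in this thread, so the following is not its implementation. It is just a sketch of the general idea of tolerating a misbehaving Docker daemon during cleanup: retry the removal a few times and report failure instead of raising. The function name and parameters are hypothetical; `client` is assumed to be any object with a `remove_container(container_id)` method, such as docker-py's low-level `APIClient`.

```python
import time


def remove_container_with_retries(client, container_id, attempts=3, delay=1.0,
                                  retry_on=(Exception,)):
    """Try to remove a container, tolerating transient daemon errors.

    Hypothetical helper for illustration; not Read the Docs code.
    Returns True if removal succeeded, False if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            client.remove_container(container_id)
            return True
        except retry_on as exc:
            print(f"remove_container attempt {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                # Give the daemon some time to recover before trying again.
                time.sleep(delay)
    return False
```

In real use, `retry_on` would be narrowed to the specific exception types the Docker client raises, so that unrelated bugs are not silently swallowed.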
We just deployed #7618 and I triggered a build for |
I think we're good, thanks! The last four PR builds also passed. |
We haven't seen this in a long time now and we did a massive refactor of the build process in #8815. So, I'm closing it. We can come back to it if we see it again and it becomes a problem. |
Details:
Expected Result
Build succeeds, docs are published
Actual Result
Build succeeds, docs are not published. Or an error somewhere is not reported. The build status is
but no errors are reported in the logs.
The build log ends with:
This started failing after an update to how the project builds docs: