Builds not retrying after concurrency limit reached #9014

Closed
oTree-org opened this issue Mar 15, 2022 · 13 comments
Labels: Bug, Needed: replication

Comments

@oTree-org

oTree-org commented Mar 15, 2022

My project has translations to Chinese and Japanese. When I push a new commit, it fails with the message "Concurrency limit reached (2), retrying in 5 minutes." But 5 minutes later, nothing seems to happen.

In the past, I got the same concurrency message, but the build did retry 5 minutes later, so it was OK. That was as recently as a week ago, as you can see from the builds page: https://readthedocs.org/projects/otree/builds/

@humitos
Member

humitos commented Mar 15, 2022

Yesterday, I opened a similar issue at #9011. However, I just checked and your builds take less than 60s, so I don't really know why they are not being retried. I'll dig a little more into this and see what I find.

@humitos added the "Needed: replication" label Mar 15, 2022
@oTree-org
Author

Thanks for looking into it! FYI this issue continues to occur...

@agjohnson
Contributor

Also noticed the same issue on this project: https://readthedocs.org/projects/overte-docs/builds/16541488/

In fact this build doesn't even look like it should be concurrency limited:
https://readthedocs.org/projects/overte-docs/builds/16548600/

This build started a day after the last build, and was never retried. Same for the build before it. We might need to align execution times next, but this feels like a bug with our concurrency/retry logic.

@humitos
Member

humitos commented Apr 7, 2022

I'm doing a quick test with test-builds project.

  1. set max_concurrent_builds = 1
  2. trigger timeout version at https://readthedocs.org/projects/test-builds/builds/16590955/
  3. trigger latest immediately after: https://readthedocs.org/projects/test-builds/builds/16590960/
  4. trigger build-jobs immediately after: https://readthedocs.org/projects/test-builds/builds/16590962/
  5. at this point latest and build-jobs are concurrency limited

If everything works properly, 3) and 4) should retry several times until the build from 2) (timeout) finishes. At that point, one of them should be picked and run. The other one should remain limited and keep retrying. Finally, the limited one should be picked and run.

At this point, the three URLs linked in the steps should show all passed builds in green.
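The expected behavior described above can be sketched as a small simulation. This is only an illustration, not the actual Read the Docs retry code: the function `simulate_retries`, the build durations, and the minute-by-minute loop are all assumptions made up for the example. Only `max_concurrent_builds = 1`, the 5-minute retry delay, and the three version names come from this thread.

```python
def simulate_retries(durations, max_concurrent_builds=1, retry_delay=5):
    """Simulate builds that retry every `retry_delay` minutes while limited.

    `durations` maps a version name to a hypothetical build time in minutes,
    in trigger order. Returns {name: (start_minute, number_of_retries)}.
    """
    next_attempt = {name: 0 for name in durations}  # minute of the next attempt
    retries = {name: 0 for name in durations}
    finish_at = {}   # name -> minute at which the build finishes
    started_at = {}  # name -> minute at which the build actually started
    minute = 0
    while len(started_at) < len(durations):
        # Builds whose finish time is still in the future count as running.
        running = sum(1 for end in finish_at.values() if end > minute)
        for name in durations:
            if name in started_at or next_attempt[name] != minute:
                continue
            if running < max_concurrent_builds:
                started_at[name] = minute
                finish_at[name] = minute + durations[name]
                running += 1
            else:
                # Concurrency limited: schedule another attempt later.
                retries[name] += 1
                next_attempt[name] = minute + retry_delay
        minute += 1
    return {name: (started_at[name], retries[name]) for name in durations}


# Hypothetical durations (minutes); the real builds may differ.
result = simulate_retries({"timeout": 12, "latest": 3, "build-jobs": 3})
print(result)
```

With these made-up durations, `latest` and `build-jobs` keep retrying while `timeout` runs, then each gets picked in turn. The bug reported here corresponds to a limited build never making its next attempt.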

@ericholscher
Member

Believe this has been fixed now with #9096. Please let us know if you continue to see any issues.

Repository owner moved this from Planned to Done in 📍Roadmap Apr 12, 2022
@stsewd
Member

stsewd commented Apr 12, 2022

I have opened an issue on Celery (celery/celery#7455), in case it's a bug on their side.

@oTree-org
Author

I am still experiencing the same issue: https://readthedocs.org/projects/otree/builds/16634477/

@stsewd
Member

stsewd commented Apr 12, 2022

That looks like our API thinking that the project has active builds.

@stsewd
Member

stsewd commented Apr 12, 2022

Just checked the API, and it's correct. I have also triggered some builds on your project, and this build was queued to retry in 5 min: https://readthedocs.org/projects/otree/builds/16634866/ (if everything is okay, it will appear as successful in a few minutes).

@stsewd
Member

stsewd commented Apr 14, 2022

We are getting more reports about this problem. It seems to happen less frequently and (maybe?) randomly now :/

@stsewd reopened this Apr 14, 2022
Repository owner moved this from Done to In progress in 📍Roadmap Apr 14, 2022
@nijel
Contributor

nijel commented Apr 14, 2022

It has been happening at https://readthedocs.org/projects/weblate/builds/16650229/ for about a month. Maybe it's related to building documentation in several languages?

@agjohnson
Contributor

We've definitely noticed this more with projects that have translations. There is a possibility this is related directly to translation usage, though more likely projects with translations are just noticing the bug more frequently, as these projects always trigger multiple builds per commit.

We weren't able to reproduce the build failures that otree + translations triggered immediately following deploy, but there does seem to be something wrong here still.
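To make the translations point concrete, here is a rough back-of-the-envelope sketch. It assumes each translation triggers exactly one extra build per commit, that all builds are triggered at the same moment, and that nothing else is running; the function `limited_builds_per_commit` is hypothetical and only for illustration.

```python
def limited_builds_per_commit(translations, max_concurrent_builds):
    """Estimate how many builds hit the concurrency limit on a single commit.

    Assumes one build for the main project plus one per translation,
    all triggered simultaneously with no other builds running.
    """
    builds_triggered = 1 + translations
    return max(0, builds_triggered - max_concurrent_builds)


# A project with two translations and a limit of 2, like the original report.
print(limited_builds_per_commit(translations=2, max_concurrent_builds=2))
```

Under these assumptions, such a project would hit the limit on every commit, so it exercises the retry path far more often than a project without translations.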

@agjohnson
Contributor

Also, I raised this in another issue, but the builds that were concurrency limited are in a failed state, which seems unexpected. I noticed this happening very quickly on some projects, within 1-2 minutes after the build is set to retry. I think these builds are intended to stay as Build.state = "triggered", so the failed state seems like it could be a hint.
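A check for this symptom might look something like the sketch below. It is not based on the real Build model: `BuildRecord`, its fields, and the sample data are hypothetical, and the only detail taken from this thread is that a limited build waiting for its retry is expected to stay in the "triggered" state.

```python
from dataclasses import dataclass

TRIGGERED = "triggered"  # state a limited build is expected to keep while waiting


@dataclass
class BuildRecord:
    """Hypothetical, simplified stand-in for a build row."""
    build_id: int
    state: str
    concurrency_limited: bool  # the build was told to retry later
    retried: bool              # a retry attempt has actually run


def stuck_limited_builds(builds):
    """Return IDs of limited builds that left 'triggered' without being retried."""
    return [
        b.build_id
        for b in builds
        if b.concurrency_limited and not b.retried and b.state != TRIGGERED
    ]


# Made-up sample data, not real build IDs.
builds = [
    BuildRecord(1, "triggered", concurrency_limited=True, retried=False),  # waiting, fine
    BuildRecord(2, "failed", concurrency_limited=True, retried=False),     # suspicious
    BuildRecord(3, "finished", concurrency_limited=True, retried=True),    # retried, fine
]
print(stuck_limited_builds(builds))
```

Builds flagged this way would be candidates for the behavior described above, where something marks the build as failed before the retry ever runs.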
