
Celery: use on_retry to handle BuildMaxConcurrencyError #8917


Merged
ericholscher merged 3 commits into master from humitos/celery-max-concurrency on Feb 14, 2022

Conversation

Member

@humitos humitos commented Feb 14, 2022

Instead of handling the retry of a task manually when the project hits the max
concurrency error, we rely on Celery handlers to do it for us.

By raising any of the exceptions defined in `autoretry_for`, Celery will
call `on_retry` automatically. There, inside `on_retry`, we handle the particular
case of `BuildMaxConcurrencyError` by setting the error on the build object
and letting the task be retried.

See
https://docs.celeryproject.org/en/master/userguide/tasks.html#automatic-retry-for-known-exceptions
and https://docs.celeryproject.org/en/master/userguide/tasks.html#Task.autoretry_for

Continuation of #8905
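For readers less familiar with these Celery hooks, here's a minimal sketch of the pattern described above; the app, task, helper, and setting names are illustrative only, not the actual Read the Docs code:

```python
from celery import Celery, Task

app = Celery("builds")  # illustrative app, not the real RTD setup


class BuildMaxConcurrencyError(Exception):
    message = "Concurrency limit reached ({limit}), retrying in 5 minutes."


def concurrency_limit_reached(build_pk):
    """Hypothetical check for how many builds the project is already running."""
    return False


def mark_build_delayed(build_pk, message):
    """Hypothetical helper that stores the error message on the build object."""


class BuildTask(Task):  # hypothetical base class name
    def on_retry(self, exc, task_id, args, kwargs, einfo):
        # Celery calls this hook before every retry, whether it was triggered
        # automatically via ``autoretry_for`` or explicitly via ``self.retry()``.
        if isinstance(exc, BuildMaxConcurrencyError):
            # Record the error on the build so the user sees why it's delayed.
            mark_build_delayed(args[0], exc.message)


@app.task(
    base=BuildTask,
    # Raising any of these exceptions makes Celery retry the task for us.
    autoretry_for=(BuildMaxConcurrencyError,),
    max_retries=25,              # e.g. settings.RTD_BUILDS_MAX_RETRIES
    default_retry_delay=5 * 60,  # e.g. settings.RTD_BUILDS_RETRY_DELAY
)
def update_docs_task(build_pk):
    if concurrency_limit_reached(build_pk):
        # Celery catches this, calls ``on_retry``, and re-schedules the task.
        raise BuildMaxConcurrencyError
    ...  # run the build
```

The key point is that the task body only raises the exception; recording the error and re-scheduling happen in `on_retry` and in Celery's retry machinery.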

Define Django settings for the max concurrency retries and the seconds of delay, to re-use
them across the whole codebase.
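A rough sketch of the two settings being referenced (only `RTD_BUILDS_MAX_RETRIES` shows up in the settings diff further down; the delay value here is an assumption matching the `5 * 60` countdown it replaces):

```python
# Sketch of the new settings (assumed values).
RTD_BUILDS_MAX_RETRIES = 25      # shown in the settings diff below
RTD_BUILDS_RETRY_DELAY = 5 * 60  # seconds; assumed to match the old hard-coded countdown
```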
@humitos humitos requested a review from a team as a code owner February 14, 2022 21:20
Member

@ericholscher ericholscher left a comment

This seems reasonable, but I don't really follow what is happening here as it's currently written. Definitely needs some comments or docstrings.

@@ -186,8 +186,8 @@ def prepare_build(
project_slug=project.slug,
version_slug=version.slug,
)
options['countdown'] = 5 * 60
Member

How does this code actually cancel the build? It seems it's just setting options on the task, and an error, but still triggering the build?

Also seems like these aren't required, since it should take this from the task?

Member Author

@humitos humitos Feb 14, 2022

This PR does not change the flow of this chunk of code. I just moved these numbers into Django settings.

That said, this code runs on the web servers when the build is triggered, and it immediately sets a 5-minute delay because we already know this project is concurrency-limited at the moment. If we deleted this code, builders would grab each of these triggered tasks and retry them immediately. So, we are just short-circuiting the process here.

Note that it does not cancel the build; it just adds a countdown so the build is delayed 5 minutes. The error message communicates this delay to the user.
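To make that flow concrete, here's a sketch of the web-side short-circuit described above; the function and helper names are placeholders rather than the actual `prepare_build` code:

```python
from django.conf import settings


def project_concurrency_limit_reached(project):
    """Hypothetical check for whether the project already has too many running builds."""
    return True


def build_task_options(project, build, api_client):
    """Sketch: decide the Celery options for a newly triggered build."""
    options = {}
    if project_concurrency_limit_reached(project):
        # Don't cancel the build: just delay its start so the builders
        # don't grab the task and retry it immediately.
        options["countdown"] = settings.RTD_BUILDS_RETRY_DELAY
        options["max_retries"] = settings.RTD_BUILDS_MAX_RETRIES
        # Surface the delay to the user by storing the error on the build.
        message = "Concurrency limit reached ({limit}), retrying in 5 minutes."
        api_client.build(build["id"]).patch({
            "error": message.format(limit=settings.RTD_MAX_CONCURRENT_BUILDS),
        })
    return options
```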

Member

Ah, interesting. That makes the comment above make a lot more sense. 👍

@@ -482,7 +471,16 @@ def on_success(self, retval, task_id, args, kwargs):
self.data.build['success'] = True

def on_retry(self, exc, task_id, args, kwargs, einfo):
Member

This task could use a docstring. When in the build flow is this triggered?

Member Author

I added a small docstring explaining when this Celery handler is called, mentioning both flows: raising a known exception to auto-retry, and calling `self.retry` explicitly.
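For reference, a docstring along these lines (the wording here is illustrative, not the exact one added in the PR):

```python
def on_retry(self, exc, task_id, args, kwargs, einfo):
    """
    Celery handler called when the task is about to be retried.

    It's reached in two ways:

    - an exception listed in ``autoretry_for`` was raised from the task
      and Celery retries it automatically, or
    - the task called ``self.retry()`` explicitly.
    """
```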

exc=BuildMaxConcurrencyError,
# We want to retry this build more times
max_retries=25,
)
Contributor

Ooooh, I see what you were talking about now with autoretry_for. I wasn't familiar with this attribute when we were last talking about this. I like this syntax, it makes more sense 👍


@@ -186,8 +186,8 @@ def prepare_build(
project_slug=project.slug,
version_slug=version.slug,
)
options['countdown'] = 5 * 60
options['max_retries'] = 25
options['countdown'] = settings.RTD_BUILDS_RETRY_DELAY
Member

Suggested change
options['countdown'] = settings.RTD_BUILDS_RETRY_DELAY
# Delay the start of the build for the build retry delay.
# We're still triggering the task, but it won't run immediately,
# and the user will be alerted in the UI from the Error below.
options['countdown'] = settings.RTD_BUILDS_RETRY_DELAY

api_v2.build(self.data.build['id']).patch({
'error': BuildMaxConcurrencyError.message.format(
# By raising this exception and using ``autoretry_for``, Celery
# will handle this automatically calling ``on_retry``
Member

This is a great comment!

@@ -125,6 +125,8 @@ def SESSION_COOKIE_SAMESITE(self):
RTD_STABLE_VERBOSE_NAME = 'stable'
RTD_CLEAN_AFTER_BUILD = False
RTD_MAX_CONCURRENT_BUILDS = 4
RTD_BUILDS_MAX_RETRIES = 25
Member

Is this a reasonable default? This probably isn't the place to change it, but it feels like a lot.

Member Author

I thought about this as well. I think you are right that 25 as a default for all retry cases is not ideal, and we should have a lower number for other cases. However, right now, the only situation where we perform a retry is when the concurrency limit is reached, as far as I can tell.

But we should expand this to be:

  • RTD_BUILDS_MAX_RETRIES=5
  • RTD_BUILDS_CONCURRENT_LIMIT_MAX_RETRIES=25

or something similar, in case we need a different default value for other cases.
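A sketch of that split, using the setting names proposed above (neither exists yet):

```python
# Proposed, not yet existing, settings:
RTD_BUILDS_MAX_RETRIES = 5                    # generic retries for build tasks
RTD_BUILDS_CONCURRENT_LIMIT_MAX_RETRIES = 25  # retries while waiting for a free concurrency slot
```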

@ericholscher ericholscher merged commit bfe9b05 into master Feb 14, 2022
@ericholscher ericholscher deleted the humitos/celery-max-concurrency branch February 14, 2022 23:44
humitos added a commit that referenced this pull request Feb 15, 2022
This comment was suggested in #8917 (comment) but the PR was already merged.