Celery: use `on_retry` to handle `BuildMaxConcurrencyError` (#8917)
Changes from 2 commits
```diff
@@ -186,8 +186,8 @@ def prepare_build(
             project_slug=project.slug,
             version_slug=version.slug,
         )
-        options['countdown'] = 5 * 60
-        options['max_retries'] = 25
+        options['countdown'] = settings.RTD_BUILDS_RETRY_DELAY
+        options['max_retries'] = settings.RTD_BUILDS_MAX_RETRIES
         build.error = BuildMaxConcurrencyError.message.format(
             limit=max_concurrent_builds,
         )
```
```diff
@@ -31,6 +31,7 @@
     BUILD_STATE_CLONING,
     BUILD_STATE_FINISHED,
     BUILD_STATE_INSTALLING,
+    BUILD_STATE_TRIGGERED,
     BUILD_STATE_UPLOADING,
     BUILD_STATUS_FAILURE,
     BUILD_STATUS_SUCCESS,
```
```diff
@@ -217,8 +218,9 @@ class UpdateDocsTask(SyncRepositoryMixin, Task):
     autoretry_for = (
         BuildMaxConcurrencyError,
     )
-    max_retries = 5  # 5 per normal builds, 25 per concurrency limited
-    default_retry_delay = 7 * 60
+    max_retries = settings.RTD_BUILDS_MAX_RETRIES
+    default_retry_delay = settings.RTD_BUILDS_RETRY_DELAY
+    retry_backoff = False

     # Expected exceptions that will be logged as info only and not retried
     throws = (
```
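To make the interaction between `autoretry_for`, `max_retries`, `default_retry_delay`, and `on_retry` concrete, here is a minimal pure-Python sketch of that mechanism (not real Celery code; the function names and the "limit clears after two attempts" scenario are illustrative, with values taken from the diff above):

```python
# Minimal sketch (not real Celery) of how ``autoretry_for`` behaves for
# this task: an exception listed there triggers up to ``max_retries``
# retries, and the ``on_retry`` handler runs before each retry.

RTD_BUILDS_MAX_RETRIES = 25      # settings.RTD_BUILDS_MAX_RETRIES
RTD_BUILDS_RETRY_DELAY = 5 * 60  # settings.RTD_BUILDS_RETRY_DELAY


class BuildMaxConcurrencyError(Exception):
    message = 'Concurrency limit reached ({limit}), retrying it later.'


def run_with_autoretry(task, max_retries, on_retry):
    """Emulate Celery's autoretry loop for exceptions in ``autoretry_for``."""
    for attempt in range(max_retries + 1):
        try:
            return task(attempt)
        except BuildMaxConcurrencyError as exc:
            if attempt == max_retries:
                raise  # retries exhausted; the task fails for real
            on_retry(exc, attempt)
            # A real worker would now wait RTD_BUILDS_RETRY_DELAY seconds
            # (no exponential backoff, since ``retry_backoff = False``).


retry_log = []

def fake_build(attempt):
    # Pretend the concurrency limit clears after two attempts.
    if attempt < 2:
        raise BuildMaxConcurrencyError(
            BuildMaxConcurrencyError.message.format(limit=4)
        )
    return 'built'

result = run_with_autoretry(
    fake_build,
    RTD_BUILDS_MAX_RETRIES,
    on_retry=lambda exc, attempt: retry_log.append(attempt),
)
print(result)     # built
print(retry_log)  # [0, 1]
```

This is also why the PR can delete the manual `self.retry()` call below: raising the exception is enough for Celery to schedule the retry and invoke `on_retry`.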
```diff
@@ -263,25 +265,12 @@ def _check_concurrency_limit(self):
         max_concurrent_builds = settings.RTD_MAX_CONCURRENT_BUILDS

         if concurrency_limit_reached:
-            # TODO: this could be handled in `on_retry` probably
-            log.warning(
-                'Delaying tasks due to concurrency limit.',
-                project_slug=self.data.project.slug,
-                version_slug=self.data.version.slug,
-            )
-
-            # This is done automatically on the environment context, but
-            # we are executing this code before creating one
-            api_v2.build(self.data.build['id']).patch({
-                'error': BuildMaxConcurrencyError.message.format(
-                    limit=max_concurrent_builds,
-                ),
-                'builder': socket.gethostname(),
-            })
-            self.retry(
-                exc=BuildMaxConcurrencyError,
-                # We want to retry this build more times
-                max_retries=25,
-            )
+            # By raising this exception and using ``autoretry_for``, Celery
+            # will handle this automatically calling ``on_retry``
+            raise BuildMaxConcurrencyError(
+                BuildMaxConcurrencyError.message.format(
+                    limit=max_concurrent_builds,
+                ),
+            )

     def _check_duplicated_build(self):
```

Review comment (on the added explanatory comment): This is a great comment!

Review comment (on the removed `self.retry` call): Ooooh, I see what you were talking about now with
```diff
@@ -482,7 +471,16 @@ def on_success(self, retval, task_id, args, kwargs):
         self.data.build['success'] = True

     def on_retry(self, exc, task_id, args, kwargs, einfo):
-        log.warning('Retrying this task.')
+        log.info('Retrying this task.')
+
+        if isinstance(exc, BuildMaxConcurrencyError):
+            log.warning(
+                'Delaying tasks due to concurrency limit.',
+                project_slug=self.data.project.slug,
+                version_slug=self.data.version.slug,
+            )
+            self.data.build['error'] = exc.message
+            self.update_build(state=BUILD_STATE_TRIGGERED)

     def after_return(self, status, retval, task_id, args, kwargs, einfo):
         # Update build object
```

Review comment: This task could use a docstring. When in the build flow is this triggered?

Reply: I added a small docstring explaining when this Celery handler is called, mentioning both flows: raising a known exception to auto retry; and calling
```diff
@@ -125,6 +125,8 @@ def SESSION_COOKIE_SAMESITE(self):
     RTD_STABLE_VERBOSE_NAME = 'stable'
     RTD_CLEAN_AFTER_BUILD = False
     RTD_MAX_CONCURRENT_BUILDS = 4
+    RTD_BUILDS_MAX_RETRIES = 25
+    RTD_BUILDS_RETRY_DELAY = 5 * 60  # seconds
     RTD_BUILD_STATUS_API_NAME = 'docs/readthedocs'
     RTD_ANALYTICS_DEFAULT_RETENTION_DAYS = 30 * 3
     RTD_AUDITLOGS_DEFAULT_RETENTION_DAYS = 30 * 3
```

Review comment (on `RTD_BUILDS_MAX_RETRIES = 25`): Is this a reasonable default? This probably isn't the place to change it, but it feels like a lot.

Reply: I thought about this as well. I think you are right that 25 as the default for all retry cases is not perfect, and we should have a lower number for other cases. However, right now, the only situation where we perform a retry is when the concurrency limit is reached, as far as I can tell. But we should expand this to be: … or similar, in case we need the default value for other cases.
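As a quick sanity check on the "feels like a lot" concern, the two defaults together bound how long a concurrency-limited build keeps being re-queued:

```python
# With the defaults added in this diff, a concurrency-limited build is
# retried every 5 minutes, up to 25 times, before failing for good.
RTD_BUILDS_MAX_RETRIES = 25
RTD_BUILDS_RETRY_DELAY = 5 * 60  # seconds

total_seconds = RTD_BUILDS_MAX_RETRIES * RTD_BUILDS_RETRY_DELAY
print(total_seconds)          # 7500 seconds
print(total_seconds / 3600)   # roughly 2 hours of total retrying
```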
Review comment (on the `prepare_build` hunk): How does this code actually cancel the build? It seems it's just setting options on the task, and an `error`, but still triggering the build? Also, it seems like these aren't required, since it should take this from the task?

Reply: This PR does not change the flow of this chunk of code; I just moved these numbers into Django settings.

That said, this code runs on the webs when the build is triggered, and it immediately sets a 5 minute delay because we already know this project is concurrency limited at the moment. If we deleted this code, builders would grab each of these triggered tasks and retry them immediately, so we are just short-circuiting the process here.

Note that it does not cancel the build; it just adds a countdown so the build is delayed 5 minutes. The error message is there to communicate this delay to the user.

Reply: Ah, interesting. That makes the comment above make a lot more sense. 👍
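The "countdown, not cancellation" behavior described in that reply can be sketched as follows. This is a hedged illustration, not Celery itself: `enqueue` is a hypothetical stand-in for `apply_async(**options)`, and the timestamp is arbitrary.

```python
import datetime

# Sketch of the "countdown" idea: the task is not cancelled, it is
# enqueued with an ETA 5 minutes in the future, so no worker picks it
# up before then. ``enqueue`` is a hypothetical stand-in for Celery's
# ``apply_async(**options)``.

def enqueue(task_name, options, now):
    countdown = options.get('countdown', 0)
    eta = now + datetime.timedelta(seconds=countdown)
    return {'task': task_name, 'eta': eta}

now = datetime.datetime(2022, 2, 1, 12, 0, 0)
options = {'countdown': 5 * 60, 'max_retries': 25}
entry = enqueue('update_docs_task', options, now)
print(entry['eta'])  # 2022-02-01 12:05:00
```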