
Build: handle Celery timeout events #9387


Closed
wants to merge 5 commits

Conversation

humitos commented Jun 29, 2022

Solves the problem reported by Sentry when the Celery task times out: https://sentry.io/organizations/read-the-docs/issues/3038754375/

Each commit has a detailed explanation of the changes.

Note to myself: the latest commit tries to handle the soft-timeout cancellation properly. It's close, but it needs a little more QA because, for some reason, SIGUSR1 is triggered/caught as well.

humitos added 5 commits June 29, 2022 15:05
Move exception messages together with other definitions.

We were accessing `self.task.data`, which does not exist. `self.task` is not the same instance as the one that is running, so we don't have `.data` on it.

However, we can access the arguments the task was called with, so we log that information instead, since that's what we have.

This commit solves a problem reported on Sentry every time this happens.

This was never called.

However, after fixing the `on_timeout` issue, we need the default behavior here, because SIGTERM is used on `TimeLimitExceeded` to _hard_ kill the task and we need that (see the first sketch below).

Once Celery triggers `SoftTimeLimitExceeded`, we immediately set the build error to TIMEOUT because there is a high probability this task gets killed. So, we take advantage of this event to update the build status properly.

When `SoftTimeLimitExceeded` is raised, we revoke the task and mark it as "Cancelled due to timeout" to communicate this clearly to the user (see the second sketch below).

humitos added the "PR: work in progress" label Jun 29, 2022
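
Below is a minimal sketch of what the second and third commits describe, assuming a custom request class attached to the build task. The class names, the logging, and the exact attributes used to reach the call arguments are illustrative assumptions, not code from this PR: the point is to log what the task was called with (instead of the non-existent `self.task.data`) and then fall back to Celery's default `on_timeout` so the hard-kill path on `TimeLimitExceeded` is preserved.

```python
# Hypothetical sketch: a custom Celery Request that logs the task's call
# arguments on timeout and keeps the default hard-kill behavior.
import logging

from celery import Task
from celery.worker.request import Request

log = logging.getLogger(__name__)


class BuildRequest(Request):  # illustrative name

    def on_timeout(self, soft, timeout):
        # `self.task` here is the task *type*, not the instance that is
        # running, so instance attributes like `.data` are not available.
        # The original call arguments are (the attribute names are an
        # assumption and may differ between Celery versions).
        log.warning(
            "Build task timed out. soft=%s timeout=%s args=%s kwargs=%s",
            soft,
            timeout,
            self.args,
            self.kwargs,
        )
        # Keep the default behavior: on a hard `TimeLimitExceeded` the
        # worker process is SIGTERMed and the failure gets recorded.
        super().on_timeout(soft, timeout)


class BuildTask(Task):  # illustrative: wire the custom request class in
    Request = BuildRequest
```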
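
And a sketch of the idea behind the last two commits, assuming a bound task that knows which build it is working on. `run_build`, `mark_build_as_timed_out`, and the limit values are placeholders (the limits echo the numbers quoted later in this thread), not the project's real API: when the soft limit fires, the build is marked as timed out right away, since the hard limit will very likely kill the process shortly after, and the task is revoked so the cancellation is explicit.

```python
# Hypothetical sketch of handling the soft time limit inside the build task.
from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded


def run_build(build_pk):
    """Placeholder for the real build pipeline."""


def mark_build_as_timed_out(build_pk, error):
    """Placeholder: persist the TIMEOUT error state on the build record."""


@shared_task(bind=True, soft_time_limit=7200, time_limit=7200 + 300)
def update_docs_task(self, build_pk):
    try:
        run_build(build_pk)
    except SoftTimeLimitExceeded:
        # The hard limit will very likely kill this process soon, so record
        # the timeout while we still can.
        mark_build_as_timed_out(build_pk, error="Cancelled due to timeout")

        # Revoke the task, mirroring the "Cancelled due to timeout" handling
        # the commit describes. For a task that is already running,
        # terminate=True would be needed for Celery to actually stop it.
        # Note that Celery delivers the soft time limit by sending SIGUSR1
        # to the worker process, which may be the signal the author saw
        # being triggered/caught during QA.
        self.app.control.revoke(self.request.id)
        raise
```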
humitos commented Jun 30, 2022

Note that we are not hitting this too frequently because we have a pretty high timeout due to this:

# TODO similar to the celery task time limit, we can't infer this from
# Docker settings anymore, because Docker settings are determined on the
# build servers dynamically.
# time_limit = int(DOCKER_LIMITS['time'] * 1.2)
# Set time as maximum celery task time limit + 5m
time_limit = 7200 + 300
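
For context, a hedged sketch of how a limit computed like the one above can also be applied per call when the build task is enqueued, reusing the placeholder `update_docs_task` from the earlier sketch; the split between soft and hard limits is an assumption, not something stated in this PR.

```python
# Hypothetical sketch: per-call time limits passed at enqueue time. Both
# options are regular `apply_async` execution options in Celery.
build_pk = 1234  # placeholder

# Set time as maximum celery task time limit + 5m
time_limit = 7200 + 300

update_docs_task.apply_async(
    kwargs={"build_pk": build_pk},
    time_limit=time_limit,
    # Assumed relationship between soft and hard limits, for illustration.
    soft_time_limit=time_limit - 300,
)
```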

humitos commented Aug 9, 2023

I'm closing this PR for now. We can come back to it in the future if we consider it worth it. The idea is good, but I wasn't able to properly QA it on production to be sure it works as I expect.

humitos closed this Aug 9, 2023