-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Async github/bitbucket repository syncing. #1417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think we need to go with #2. One of the main issues with today's site slowness was around running out of web worker processes. Doing things synchronously in the web process is definitely not a scalable way of handling this. I think it needs to feed into a celery task, and have it monitoring, as you said. |
+1 to option 2. I'd say that is the correct approach. |
It should be easy enough to get the celery job ID in the response when we trigger the celery job, and then do an AJAX polling (or websocket, or other fanciness) to check status at |
ea0d2d5
to
1ccab53
Compare
So the PR implements now a cool framework for monitoring celery tasks via the API. It integrates permission checks so that only authorized people can view the progress. We use that framework now for the GitHub/BitBucket async repo syncing. I had to add some messages for progress and error states in the UI. Maybe that needs some love, but I will let the reviewer decide :) In the long run we could use the newly created monitorable The PR is featurecomplete in my sense and is therefore ready for an architectural review. However it does not include any tests so far. I'll add these next week. |
Awesome, I'll take a quick first pass at this today. I'll have some feedback on the javascript, but should get to writing documentation on our front end toolchain first. |
} | ||
}); | ||
}, 2000); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should include a timeout to display a "can't sync" error. My install executes the task fine, but doesn't seem to update the client correctly, which keeps me waiting forever for a sync to finish
Some high level feedback:
I'll have some time to delve to a lower level tomorrow afternoon, including some UX bits. This is looking good so far though. |
No, I wasn't sure were to place it and without thinking much about it, the rtd
Awesome, I looked for existing JS on the import page and media/ was were I
Previously running the task failed since a connect to the celery task backend I also changed the behaviour that en error of the API won't keep trying to |
__all__ = ('user_id_matches',) | ||
|
||
|
||
def user_id_matches(request, state, context): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have some logic around this kind of stuff in the privacy app as well -- so it might make sense to put it there.
Testing this out against my environment, it does sync, but after the task completes, and I'm notified that the task completed, the list of projects isn't automatically updated. On reload, the list is there of course. A good first pass at this would be to blow away the current list when you hit 'sync' -- or blow away when we have results -- and repopulate the list in it's entirety. |
Indeed, the updates were not working. Sorry for this, it was a little typo. Fixed that. It should work now as expected. If we want to indicate some syncing in the actual repository list then I would overlay the repo list with a spinner. Shall I work on this? Also I had problems with integrating the current master with the changes to the gulp buildsystem into this branch. I documented them in #1490 |
6c88a96
to
5790f5e
Compare
Whats the status here? Would love to see this get ready for merge. |
…o appropriate manager methods.
We are using a ajax request to the API to trigger the update and then monitor the progress. When the sync job has finished, we update the page. Also removing the synchronous sync code from the urls and view logic.
We will later add tests for the new celery based sync tasks.
5790f5e
to
0b9b520
Compare
I rebased the work on master so that it will apply cleanly. I consider the PR feature complete. @agjohnson proposed to add some loading animation to the repository list, but I wasn't sure if I shall add a spinner there. Clarifying that is the only blocking thing. |
We therefore document it in the docstring that the method must be implemented by subclasses. We do that because pylint complains that subclasses are having a differing argument list.
Will eventually fix #1370.
I have two strategies in mind doing this and would like to discuss them here.
Synchronous call via AJAX to
/import/github/sync/
Probably the simplest solution to avoid that users are seeing a timeout, is that we call the
/import/github/sync/
page with an AJAX request from/import/github/
. The syncing then still takes place in a long synchronous request. The long request runs in an AJAX call and is therefore hidden from the user. After the AJAX call finishes, we update the page with the new content.To make this work we would simply need to increase the max timeout for the sync page to a real high number (~10 Minutes?). But otherwise no big code changes are required in the backend. In the frontend we add the necessary JavaScript and remove the direct link to the sync page, so that users don't hit it manually.
Async repository syncing using Celery + monitoring
The more advanced solution is to have a celery task that is syncing the repositories. When the user hits
/import/github/
we fire the task to update the repos. The user sees the page immediatelly though since the syncing takes place asynchronously.We need a way to then monitor the progress of the celery task somehow in the frontend. We would therefore need some kind of API that we can poll using AJAX for state changes in the celery task. Once the task is completed, we update the page with the new content.
So I would like to discuss the two approaches and which one we should go here.
We should also look at these tickets while developing it: