
Async github/bitbucket repository syncing. #1417


Merged
merged 27 commits into master from async-github-repo-syncing on Aug 6, 2015

Conversation

gregmuellegger
Contributor

Will eventually fix #1370.

I have two strategies in mind for doing this and would like to discuss them here.

Synchronous call via AJAX to /import/github/sync/

Probably the simplest solution to avoid users seeing a timeout is to call the /import/github/sync/ page with an AJAX request from /import/github/. The syncing then still takes place in one long synchronous request, but since that request runs in an AJAX call, it is hidden from the user. After the AJAX call finishes, we update the page with the new content.

To make this work we would only need to increase the max timeout for the sync page to a very high number (~10 minutes?); otherwise no big code changes are required in the backend. On the frontend we add the necessary JavaScript and remove the direct link to the sync page so that users don't hit it manually.

Async repository syncing using Celery + monitoring

The more advanced solution is to have a Celery task that syncs the repositories. When the user hits /import/github/, we fire the task to update the repos. The user still sees the page immediately, since the syncing takes place asynchronously.

We then need a way to monitor the progress of the Celery task in the frontend. For that we need some kind of API that we can poll via AJAX for state changes in the Celery task. Once the task is completed, we update the page with the new content.
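For concreteness, here is a minimal sketch of what option 2 could look like on the backend. The names (sync_remote_repositories, GithubImportView) are placeholders for illustration, not a final implementation:

```python
# Sketch only: fire the sync as a Celery task and hand the task id back to
# the page so the frontend can poll for completion.
from celery import shared_task
from django.http import JsonResponse
from django.views.generic import View


@shared_task
def sync_remote_repositories(user_id):
    # The long-running GitHub/Bitbucket sync happens here, off the web
    # worker process.
    pass


class GithubImportView(View):
    def post(self, request):
        result = sync_remote_repositories.delay(request.user.id)
        # The import page responds immediately; the task id is what the
        # frontend polls against.
        return JsonResponse({'task_id': result.id})
```

Returning the task id right away is what keeps the web worker free; everything slow runs in the Celery worker.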


So I would like to discuss the two approaches and decide which one we should go with here.

We should also look at these tickets while developing it:

@gregmuellegger gregmuellegger added the PR: work in progress Pull request is not ready for full review label Jul 9, 2015
@ericholscher
Member

I think we need to go with option 2. One of the main issues with today's site slowness was running out of web worker processes. Doing things synchronously in the web process is definitely not a scalable way of handling this. I think it needs to feed into a Celery task, with monitoring, as you said.

@d0ugal
Contributor

d0ugal commented Jul 9, 2015

+1 to option 2. I'd say that is the correct approach.

@ericholscher
Member

It should be easy enough to get the Celery job ID in the response when we trigger the job, and then do AJAX polling (or websockets, or other fanciness) to check the status at /api/v2/github/import/?job={id} or something.
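Something along these lines would be enough for the client to poll. This is a sketch based on the suggestion above; the URL, view name, and response shape are assumptions, not the final API:

```python
# Sketch of a status endpoint for /api/v2/github/import/?job={id}.
from celery.result import AsyncResult
from django.http import JsonResponse


def import_job_status(request):
    job_id = request.GET.get('job')
    result = AsyncResult(job_id)
    data = {'state': result.state}
    if result.ready():
        # Only expose the payload for successful tasks; a failed task's
        # result is an exception object and not JSON-serializable as-is.
        data['result'] = result.result if result.successful() else None
    return JsonResponse(data)
```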

@gregmuellegger gregmuellegger force-pushed the async-github-repo-syncing branch from ea0d2d5 to 1ccab53 on July 17, 2015 12:35
@gregmuellegger
Contributor Author

The PR now implements a framework for monitoring Celery tasks via the API. It integrates permission checks so that only authorized people can view the progress.

We now use that framework for the GitHub/Bitbucket async repo syncing. I had to add some messages for progress and error states in the UI. Maybe that needs some love, but I'll let the reviewer decide :)

In the long run we could also use the newly created monitorable PublicTask to make the doc build status available in the API.

The PR is feature-complete in my view and is therefore ready for an architectural review. However, it does not include any tests so far; I'll add those next week.
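To illustrate how a monitorable task can expose the progress and error states mentioned above, here is a rough sketch using Celery's update_state(); it is not the actual PublicTask implementation from this PR:

```python
from celery import shared_task


@shared_task(bind=True)
def sync_remote_repositories(self, user_id):
    # Intermediate states written to the result backend are what the status
    # API reads; the frontend turns them into progress or error messages.
    self.update_state(state='PROGRESS',
                      meta={'message': 'Syncing GitHub repositories...'})
    # ... call the GitHub API and store the repositories here ...
    self.update_state(state='PROGRESS',
                      meta={'message': 'Syncing Bitbucket repositories...'})
    # ... call the Bitbucket API here ...
    return {'message': 'Repositories synced.'}
```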

@agjohnson agjohnson self-assigned this Jul 17, 2015
@agjohnson
Contributor

Awesome, I'll take a quick first pass at this today. I'll have some feedback on the JavaScript, but I should get to writing documentation on our front-end toolchain first.

}
});
}, 2000);
}
Contributor


This should include a timeout to display a "can't sync" error. My install executes the task fine, but it doesn't seem to update the client correctly, which keeps me waiting forever for a sync to finish.

@agjohnson
Contributor

Some high-level feedback:

  • Is there any reason readthedocs/rtd/* shouldn't be in readthedocs/core? Historically, this was supposed to be the application path for shared-only resources, though it has also accumulated some misplaced resources.
  • I'd like to be moving new javascript off of the media/ path, and into per-application static files. I've outlined some of the process behind bundling up files in static-src, though until we settle on that, it would be fine to just move your libraries to static paths.
  • I like the public/private task api -- this should translate well to handling build status updates as well.
  • Does this need a degradation if I am running a development server with celery eager task resolution? The task doesn't seem to resolve in this case.

I'll have some time to delve into the lower-level details tomorrow afternoon, including some UX bits. This is looking good so far, though.

@gregmuellegger
Contributor Author

Is there any reason readthedocs/rtd/* shouldn't be in readthedocs/core? Historically, this was supposed to be the application path for shared-only resources, though it has also accumulated some misplaced resources.

No, I wasn't sure where to place it, and without thinking much about it I created the rtd
directory. I have now moved it into core.utils.tasks.

I'd like to be moving new javascript off of the media/ path, and into per-application static files. I've outlined some of the process behind bundling up files in static-src, though until we settle on that, it would be fine to just move your libraries to static paths.

Awesome, I looked for existing JS on the import page and media/ was where I
found it. I moved it into core/static-src/core/js/ where it makes more sense.
Thanks for clarifying.

I like the public/private task api -- this should translate well to handling build status updates as well.
Does this need a degradation if I am running a development server with celery eager task resolution? The task doesn't seem to resolve in this case.

Previously, running the task failed because a connection to the Celery task backend
(Redis) was made before actually starting the task. I changed that, so no Redis
call is made when triggering the task while CELERY_ALWAYS_EAGER=True. However, the
task monitoring will still fail, as Redis is not reachable. I don't intend to fix
that, since mocking this for local development is not a good idea IMO. If we really
want to support local development of these features without having Celery running
properly with Redis, we could use the DB broker for Celery, which could be
preconfigured in the SQLite settings.
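As a rough illustration of the guard described here (a sketch under my own assumptions, not the code from the PR; store_result is Celery's result-backend API):

```python
from celery import current_app
from django.conf import settings


def trigger_task(task, *args, **kwargs):
    """Fire ``task`` without touching Redis when tasks resolve eagerly."""
    result = task.delay(*args, **kwargs)
    if not getattr(settings, 'CELERY_ALWAYS_EAGER', False):
        # Recording an initial state requires the result backend (Redis),
        # so skip it in eager mode, where the task has already run inline.
        current_app.backend.store_result(result.id, None, 'STARTED')
    return result
```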

I also changed the behaviour so that an error from the API won't keep the client
retrying the task status request; it will show the error message instead. I thought
it might be good practice to keep retrying in case the client has connectivity
problems, but as you said, the user would then never see any error message. That's
probably worse than an error message that appears too early.

__all__ = ('user_id_matches',)


def user_id_matches(request, state, context):
    # Body filled in for illustration: grant access only when the task
    # context was created for the requesting user.
    user_id = context.get('user_id', None)
    return user_id is not None and user_id == request.user.id
Member


We have some logic around this kind of stuff in the privacy app as well -- so it might make sense to put it there.

@agjohnson
Contributor

Testing this out in my environment, it does sync, but after the task completes and I'm notified of the completion, the list of projects isn't automatically updated. On reload, the list is there, of course. A good first pass at this would be to blow away the current list when you hit 'sync' -- or blow it away when we have results -- and repopulate the list in its entirety.

@gregmuellegger
Contributor Author

Indeed, the updates were not working. Sorry about that, it was a small typo. Fixed now; it should work as expected. If we want to indicate syncing in the actual repository list, I would overlay the repo list with a spinner. Shall I work on this?

Also, I had problems integrating the current master, with its changes to the gulp build system, into this branch. I documented them in #1490.

@gregmuellegger gregmuellegger force-pushed the async-github-repo-syncing branch from 6c88a96 to 5790f5e on August 3, 2015 10:48
@ericholscher
Member

What's the status here? Would love to see this get ready for merge.

@gregmuellegger gregmuellegger force-pushed the async-github-repo-syncing branch from 5790f5e to 0b9b520 on August 6, 2015 11:44
@gregmuellegger
Contributor Author

I rebased the work onto master so that it applies cleanly. I consider the PR feature-complete. @agjohnson proposed adding a loading animation to the repository list, but I wasn't sure whether I should add a spinner there.

Clarifying that is the only blocking issue.

gregmuellegger and others added 5 commits August 6, 2015 15:01
We therefore document in the docstring that the method must be implemented by
subclasses. We do that because pylint complains that subclasses have a differing
argument list.
@ericholscher ericholscher merged commit 417b1d5 into master Aug 6, 2015
@gregmuellegger gregmuellegger deleted the async-github-repo-syncing branch September 17, 2015 14:13
Labels
PR: work in progress Pull request is not ready for full review
Development

Successfully merging this pull request may close these issues.

Do GitHub Syncing in an Async way
4 participants