From 2dbf3c1dac153fb1c8675cec5495ddece8b04367 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Tue, 9 Jun 2020 16:03:33 +0200 Subject: [PATCH 1/5] Design document for RemoteRepository DB normalization Initial document to discuss the path forward to implement https://github.com/readthedocs/readthedocs.org/issues/2759 --- .../design/refactor-remote-repository.rst | 101 ++++++++++++++++++ 1 file changed, 101 insertions(+) create mode 100644 docs/development/design/refactor-remote-repository.rst diff --git a/docs/development/design/refactor-remote-repository.rst b/docs/development/design/refactor-remote-repository.rst new file mode 100644 index 00000000000..230390965cb --- /dev/null +++ b/docs/development/design/refactor-remote-repository.rst @@ -0,0 +1,101 @@ +====================================== + Refactor ``RemoteRepository`` object +====================================== + + +This document describes the current usage of ``RemoteRepository`` objects and proposes a new normalized modeling. + + +Goals +===== + +* De-duplicate data stored in our database. +* Save only one ``RemoteRepository`` per GitHub repository. +* Use an intermediate table between ``RemoteRepository`` and ``User`` to store associated remote data for the specific user. +* Make this model usable from our SSO implementation. +* Use Post ``JSONField`` to store associated ``json`` remote data. +* Do not disconnect ``Project`` and ``RemoteRepository`` when a user delete/disconnects their account. + + +Non-Goals +========= + +* Keep ``RemoteRepository`` in sync with GitHub repositories. +* Delete ``RemoteRepository`` objects deleted from GitHub. +* Listen to GitHub events to detect ``full_name`` changes and update our objects. + +.. note:: + + We may need/want some of these non-goals in the future. + They are just outside the scope of this document. + + +Current Implementation +====================== + +When a user connect their account to a social account, we create a + +* ``allauth.socialaccount.models.SocialAccount`` + * basic information (provider, last login, etc) + * provider's specific data saved in a JSON under ``extra_data`` +* ``allauthsocialaccount.models.SocialToken`` + * token to hit the API on behalf the user + + +We *don't create* any ``RemoteRepository`` at this point. +They are created when the user jumps into "Import Project" page and hit the circled arrows. +It triggers `sync_remote_repostories`_ task in background that updates or creates ``RemoteRepositories``, +but **it does not delete** them. +One ``RemoteRepository`` is created per repository the ``User`` has access to. + +.. _sync_remote_repositoies: https://github.com/readthedocs/readthedocs.org/blob/56253cb786945c9fe53a034a4433f10672ae8a4f/readthedocs/oauth/tasks.py#L25-L43 + + +Where ``RemoteRepository`` is used? +=================================== + +* List of available repositories to import under "Import Project" +* Show a "+", "External Arrow" or a "Lock" sign next to the element in the list + * +: it's available to be imported + * External Arrow: the repository is already imported (see `RemoteRepository.matches`_ method) + * Lock: user doesn't have (admin) permissions to import this repository + (uses ``RemoteRepository.private`` and ``RemoteRepository.admin``) +* Avatar URL in the list of project available to import +* `Update webhook`_ when user clicks "Resync webhook" from the :guilabel:`Admin` > :guilabel:`Integrations` tab +* `Send build status`_ when building Pull Requests + + +.. _RemoteRepository.matches: https://github.com/readthedocs/readthedocs.org/blob/56253cb786945c9fe53a034a4433f10672ae8a4f/readthedocs/oauth/models.py#L182-L204 +.. _Update webhook: https://github.com/readthedocs/readthedocs.org/blob/56253cb786945c9fe53a034a4433f10672ae8a4f/readthedocs/oauth/utils.py#L26-L62 +.. _Send build status: https://github.com/readthedocs/readthedocs.org/blob/56253cb786945c9fe53a034a4433f10672ae8a4f/readthedocs/projects/tasks.py#L1852-L1956 + + +New Normalized Implementation +============================= + +The ``ManyToMany`` relation ``RemoteRepository.users`` will be changed to be ``ManyToMany(through='RemoteRelation')`` +`to add extra fields in the relation`_ that are specific only for the User. +Allows us to have *only one* ``RemoteRepository`` per GitHub repository with multiple relationships to ``User``. + +.. _to add extra fields in the relation: https://docs.djangoproject.com/en/2.2/topics/db/models/#extra-fields-on-many-to-many-relationships + +With this modeling, we can avoid the disconnection ``Project`` and ``RemoteRepository`` only by removing the ``RemoteRelation``. + +.. note:: + + All the points mentioned in the previous section may need to be adapted to use the new normalized modeling. + However, it may be only field renaming or small query changes over new fields. + + +Use this modeling for SSO +------------------------- + +We can get the list of ``Project`` where a user as access: + +.. code-block:: python + + admin_remote_repositories = RemoteRepository.objects.filter( + users__contains=request.user, + users__remoterelation__json__admin=True, # False for read-only access + ) + Project.objects.filter(remote_repository__in=admin_remote_repositories) From 05f6357de65e23fff08f216a43c0e87866035249 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Thu, 30 Jul 2020 12:24:02 +0200 Subject: [PATCH 2/5] Update document with latest changes in the context --- .../design/refactor-remote-repository.rst | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/docs/development/design/refactor-remote-repository.rst b/docs/development/design/refactor-remote-repository.rst index 230390965cb..4a911158fd4 100644 --- a/docs/development/design/refactor-remote-repository.rst +++ b/docs/development/design/refactor-remote-repository.rst @@ -12,8 +12,9 @@ Goals * De-duplicate data stored in our database. * Save only one ``RemoteRepository`` per GitHub repository. * Use an intermediate table between ``RemoteRepository`` and ``User`` to store associated remote data for the specific user. -* Make this model usable from our SSO implementation. +* Make this model usable from our SSO implementation (adding ``remote_id`` field in ``Remote`` objects). * Use Post ``JSONField`` to store associated ``json`` remote data. +* Make ``Project`` connect directly to ``RemoteRepository`` without being linked to a specific ``User``. * Do not disconnect ``Project`` and ``RemoteRepository`` when a user delete/disconnects their account. @@ -45,10 +46,18 @@ When a user connect their account to a social account, we create a We *don't create* any ``RemoteRepository`` at this point. They are created when the user jumps into "Import Project" page and hit the circled arrows. It triggers `sync_remote_repostories`_ task in background that updates or creates ``RemoteRepositories``, -but **it does not delete** them. +but **it does not delete** them (after `#7183`_ and `#7310`_ got merged, they will be deleted). One ``RemoteRepository`` is created per repository the ``User`` has access to. +.. note:: + + In corporate, we are automatically syncing ``RemoteRepository`` and ``RemoteOganization`` + at signup (foreground) and login (background) via a signal. We should eventually move these to community. + + .. _sync_remote_repositoies: https://github.com/readthedocs/readthedocs.org/blob/56253cb786945c9fe53a034a4433f10672ae8a4f/readthedocs/oauth/tasks.py#L25-L43 +.. _#7183: https://github.com/readthedocs/readthedocs.org/pull/7183 +.. _#7310: https://github.com/readthedocs/readthedocs.org/pull/7310 Where ``RemoteRepository`` is used? @@ -96,6 +105,6 @@ We can get the list of ``Project`` where a user as access: admin_remote_repositories = RemoteRepository.objects.filter( users__contains=request.user, - users__remoterelation__json__admin=True, # False for read-only access + users__remoterelation__admin=True, # False for read-only access ) Project.objects.filter(remote_repository__in=admin_remote_repositories) From b2b65dc702a529657a4a4528ed8e6ffb08c31625 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Mon, 14 Dec 2020 12:32:03 +0100 Subject: [PATCH 3/5] Add a summarized version of the rollout plan --- .../design/refactor-remote-repository.rst | 27 +++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/docs/development/design/refactor-remote-repository.rst b/docs/development/design/refactor-remote-repository.rst index 4a911158fd4..22fc0684a27 100644 --- a/docs/development/design/refactor-remote-repository.rst +++ b/docs/development/design/refactor-remote-repository.rst @@ -108,3 +108,30 @@ We can get the list of ``Project`` where a user as access: users__remoterelation__admin=True, # False for read-only access ) Project.objects.filter(remote_repository__in=admin_remote_repositories) + + +Rollout Plan +============ + +Due the constraints we have in the ``RemoteRepository`` table and its size, +we can't just do the data migration at the same time of the deploy. +Because of this we need to be more creative here and find a way to re-sync the data from VCS providers, +while the site continue working. + +To achieve this, we thought on following this steps: + +1. modify all the Python code to use the new modeling in .org and .com + (will help us to find out bugs locally in an easier way) +1. QA this locally with test data +1. enable Django signal to re-sync RemoteRepository on login async + (we already have this in .com). New active users will have updated data immediately +1. spin up a new instance with the new refactored code +1. run migrations to create a new table for `RemoteRepository` +1. re-sync everything from VCS providers into the new table for 1-week or so +1. dump-n-load `Project - RemoteRepository` relations +1. create a migration to use the new table with synced data +1. deploy new code once the sync is finished + +See these issues for more context: +* https://github.com/readthedocs/readthedocs.org/pull/7536#issuecomment-724102640 +* https://github.com/readthedocs/readthedocs.org/pull/7675#issuecomment-732756118 From 8e545beab307a3d26b31996b0e439c3f29e28619 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Wed, 16 Dec 2020 18:52:47 +0100 Subject: [PATCH 4/5] Lint --- docs/development/design/refactor-remote-repository.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/development/design/refactor-remote-repository.rst b/docs/development/design/refactor-remote-repository.rst index 22fc0684a27..62af86a36e9 100644 --- a/docs/development/design/refactor-remote-repository.rst +++ b/docs/development/design/refactor-remote-repository.rst @@ -67,8 +67,7 @@ Where ``RemoteRepository`` is used? * Show a "+", "External Arrow" or a "Lock" sign next to the element in the list * +: it's available to be imported * External Arrow: the repository is already imported (see `RemoteRepository.matches`_ method) - * Lock: user doesn't have (admin) permissions to import this repository - (uses ``RemoteRepository.private`` and ``RemoteRepository.admin``) + * Lock: user doesn't have (admin) permissions to import this repository (uses ``RemoteRepository.private`` and ``RemoteRepository.admin``) * Avatar URL in the list of project available to import * `Update webhook`_ when user clicks "Resync webhook" from the :guilabel:`Admin` > :guilabel:`Integrations` tab * `Send build status`_ when building Pull Requests From f6e3b561cb1c3341baed6f17e3bea9af41f35617 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Mon, 4 Jan 2021 13:17:16 +0100 Subject: [PATCH 5/5] rstcheck lint fixes --- docs/development/design/refactor-remote-repository.rst | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/docs/development/design/refactor-remote-repository.rst b/docs/development/design/refactor-remote-repository.rst index 62af86a36e9..8d9b37a198e 100644 --- a/docs/development/design/refactor-remote-repository.rst +++ b/docs/development/design/refactor-remote-repository.rst @@ -45,7 +45,7 @@ When a user connect their account to a social account, we create a We *don't create* any ``RemoteRepository`` at this point. They are created when the user jumps into "Import Project" page and hit the circled arrows. -It triggers `sync_remote_repostories`_ task in background that updates or creates ``RemoteRepositories``, +It triggers ``sync_remote_repostories`` task in background that updates or creates ``RemoteRepositories``, but **it does not delete** them (after `#7183`_ and `#7310`_ got merged, they will be deleted). One ``RemoteRepository`` is created per repository the ``User`` has access to. @@ -119,11 +119,9 @@ while the site continue working. To achieve this, we thought on following this steps: -1. modify all the Python code to use the new modeling in .org and .com - (will help us to find out bugs locally in an easier way) +1. modify all the Python code to use the new modeling in .org and .com (will help us to find out bugs locally in an easier way) 1. QA this locally with test data -1. enable Django signal to re-sync RemoteRepository on login async - (we already have this in .com). New active users will have updated data immediately +1. enable Django signal to re-sync RemoteRepository on login async (we already have this in .com). New active users will have updated data immediately 1. spin up a new instance with the new refactored code 1. run migrations to create a new table for `RemoteRepository` 1. re-sync everything from VCS providers into the new table for 1-week or so