Skip to content

Design doc: use GitHub Apps for more granular permissions #8445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
308 changes: 308 additions & 0 deletions docs/development/design/github-application.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,308 @@
==================
GitHub Application
==================
Comment on lines +1 to +3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
==================
GitHub Application
==================
=======================
GitHub Apps integration
=======================


GitHub offers two different ways to access users' data from its own platform:
OAuth Application and GitHub Application. Depending on the applications
needings, GitHub recommends using one or the other.

Read the Docs currently uses the OAuth Application to support "Login with
GitHub" and "Import Project". They require different level of permissions, but
OAuth Application does not really allow us to manage them in a granular way.

Based on our needs and the GitHub recommendations in
https://docs.github.com/en/developers/apps/getting-started-with-apps/about-apps#determining-which-integration-to-build
we should use a GitHub Application for Read the Docs::

Only as me? -> NO
Act as an App -> NO
Access Everything? -> NO
-> GitHub Application

Or in case we need to act as an App, we end in the same route::

Only as me? -> NO
Act as an App -> YES
-> GitHub Application

This document explains some problems we have found in our current implementation
using OAuth Application and explore the recommended way using GitHub
Application. It goes through the work required and touches a little about its
implementation.


Problems with current implementation
------------------------------------

* We request permissions that are too permissive at Sign Up

Even if our application will never require some of these scopes (because it's
a read user --and will never import a project), we ask for too many permissions.

* OAuth Application does not provide granular permissions

We have to ask for ``read`` scope. It gives us read & write permissions to
*all the repositories of this user*. This makes some users/organizations to
don't want to user our platform.

Besides, ``admin:repo_hook`` scope is also required to be able to create a
webhook for pushes and trigger a new build.

* Repositories from an GitHub Organization does not appear in the "Import
Project" list

Organizations with OAuth App access restriction enabled, require users using
our OAuth Application to request permissions to the GitHub Organization's
owner to accept our OAuth App.


Goals
-----

* Reduce as much as possible granted permissions for our GitHub Application
* Do not require access to *all the users'/organizations' repositories*
* Allow Read the Docs GitHub Application only to specific repositories
* Support GitHub Application *and* OAuth Application at the same time during the
migration time
* Support all the GitHub features current using OAuth Application

* GitHub SSO
* Pull Request builder
* Trigger a build on push

* Don't require SSH key to clone private repositories


Non Goals
---------

* Escalate permissions as they are needed

* Only ask mostly for ``Read-only`` permissions
* ``Write`` permission is only required to send commit status on Pull Request
* Only ask for the smallest subset required for Read the Docs to work

* Migrate all current OAuth Application users to GitHub Application users at
once
* Implement something similar to GitHub Application for other VCS providers
(GitLab and Bitbucket)


Context related to GitHub Application
-------------------------------------

Considering there are too much confusion around GitHub Application and OAuth
Application, and copy&paste some context from its official documentation trying
to mitigate it and have a better understanding of this design doc.

GitHub Apps are first-class actors within GitHub. A GitHub App acts on its own
behalf, taking actions via the API directly using its own identity, which
means you don't need to maintain a bot or service account as a separate user.

GitHub Apps can be installed directly on organizations and user accounts and
granted access to specific repositories. They come with built-in webhooks and
narrow, specific permissions. When you set up your GitHub App, you can select
the repositories you want it to access

To install a GitHub App, you must be an organization owner or have admin
permissions in a repository.

Don't build an OAuth App if you want your application to act on a single
repository. With the repo OAuth scope, OAuth Apps can act on all of the
authenticated user's repositories.

(from https://docs.github.com/en/developers/apps/getting-started-with-apps/about-apps)

You can install GitHub Apps in your personal account or organizations you own.
If you have admin permissions in a repository, you can install GitHub Apps on
organization accounts. If a GitHub App is installed in a repository and requires
organization permissions, the organization owner must approve the application.

Account owners can use a GitHub App in one account without granting access to
another ... A GitHub App remains installed if the person who set it up leaves
the organization.

The installation token from a GitHub App loses access to resources if an admin
removes repositories from the installation.

Installation access tokens are limited to specified repositories with the
permissions chosen by the creator of the app.

GitHub Apps can request separate access to issues and pull requests without
accessing the actual contents of the repository.

A GitHub App receives a webhook event when an installation is changed or
removed. This tells the app creator when they've received more or less access to
an organization's resources.

A GitHub App can request an installation access token by using a private key
with a JSON web token format out-of-band.

An installation token identifies the app as the GitHub Apps bot, such as
@jenkins-bot.

GitHub Apps can authenticate on behalf of the user, which is called
user-to-server requests. The flow to authorize is the same as the OAuth App
authorization flow. User-to-server tokens can expire and be renewed with a
refresh token

(from https://docs.github.com/en/developers/apps/getting-started-with-apps/differences-between-github-apps-and-oauth-apps)


GitHub Application requirements
-------------------------------

:Repository permissions:
- Metadata: Read-only (mandatory)
- Contents: Read-only
- Commit statuses: Read & Write
- Pull requests: Read-only

:Organization permissions:
- Members: Read-only
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for SSO, or something else? Might be useful to escalate in this case, if it lets us not request Org permissions up front.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be outdated and it's not required 😅 I already solved this problem in "Remote* models re-sync" without requiring this permission.

However, I had an idea in mind with this permission, but I didn't document it here. If we request Organization - Members: Read-only we can subscribe to related events (Member, Membership, Team and Team add) to immediately sync permissions when they are changed in GitHub.

Currently, we are re-syncing permissions at signup and on each login. So, this permission is not strictly required, but it could be better UX, tho.

We tried to implement the re-sync based on a webhook already with our current code at #7336


:User permissions:
- Email addresses: Read-only

:Events subscribed:
- Meta (When this App is deleted and the associated hook is removed)
- Create (Branch or tag created)
- Delete (Branch or tag deleted)
- Push (Git push to a repository)
- Pull request (Pull request opened, closed, reopened, ...)


Authentication as GitHub Application
------------------------------------

https://docs.github.com/en/developers/apps/building-github-apps/authenticating-with-github-apps

Generate the JWT on Python:

.. code:: python

import datetime
import jwt
GITHUB_APP_ID = 134302

# content of PEM file downloaded from GH Application settings
private_key = b'''-----BEGIN RSA PRIVATE KEY-----'''

jwt.encode({"iat": datetime.datetime.utcnow() - datetime.timedelta(seconds=60), "exp": datetime.datetime.utcnow() + datetime.timedelta(minutes=5), "iss": GITHUB_APP_ID}, private_key, algorithm="RS256")

Use the token generated in the previous step as authorization in the cURL command:

.. code:: bash

$ curl -H "Authorization: Bearer <JWT>" -H "Accept: application/vnd.github.v3+json" https://api.github.com/app


.. note::

For some reason this is not working and I'm getting:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It says the app needs to be installed on at least 1 organization. That could be it.


.. code:: text

'Issued at' claim ('iat') must be an Integer representing the time that the assertion was issued


SSH keys are not required to clone private repositories
-------------------------------------------------------

When asking for ``contents`` (repository permission) we are able to clone
private repositories by using GitHub Application installation access tokens:

.. code:: bash

git clone https://x-access-token:<token>@github.com/owner/repo.git

(from
https://docs.github.com/en/developers/apps/building-github-apps/authenticating-with-github-apps#http-based-git-access-by-an-installation)


Handling webhooks
-----------------

GitHub Application has *only one webhook* where it will receive all the events
for all the installations. The body contains all the information about the
particular installation that triggered the event. With this data, we will create
an access token and perform the query/actions we need.

There are some events that need to map to particular ``Project`` in our
database. For example, "a push to ``main`` branch in repository
``readthedocs/blog``" should "trigger a build for ``latest`` version on
``read-the-docs-blog`` project". For these cases we can use use
``repository.id`` field from the body to get the ``RemoteRepository.remote_id``
and from there get the ``Project``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a Project mapping though, and means that if readthedocs/template gets a commit, then every project on RTD with that repo_url will be built? That takes us back to our original webhooks implementation, which is not great :/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a Project mapping though, and means that if readthedocs/template gets a commit, then every project on RTD with that repo_url will be built?

No.

This will trigger a build only on the projects that were imported by users with permissions on readthedocs/template repository. That is, it appeared on the "Import Project" page and they clicked on "+" to import it. This is where the relation between RemoteRepository -> Project is created.

All the readthedocs/template forks and, all official readthedocs/template that were imported manually won't be associated to any RemoteRepository, so it won't trigger a build for those on commits.



Remote* models re-sync
----------------------

Currently, we are using 2 endpoints to sync all the ``Remote*`` models:

* ``https://api.github.com/user/repos`` (https://docs.github.com/en/rest/reference/repos#list-repositories-for-the-authenticated-user)
* ``https://api.github.com/org/{org}/repos`` (https://docs.github.com/en/rest/reference/repos#list-organization-repositories)

However, this endpoints won't return the same data when using GitHub Application
since we won't be authenticated as a user anymore and the ``permission.admin:
boolean`` field won't come in the response.

Instead, we will have to iterate over,

* all repositories accessible to the app installation

* ``https://api.github.com/installation/repositories`` (https://docs.github.com/en/rest/reference/apps#list-repositories-accessible-to-the-app-installation)
* iterate over each repository checking for user's permission

* ``https://api.github.com/repos/{owner}/{repo}/collaborators/{username}/permission`` (https://docs.github.com/en/rest/reference/repos#get-repository-permissions-for-a-user)

By doing this, we will keep our ``Remote*`` table very small because we will
only track repositories that users gave us permissions. Then, only these
repositories will be shown in the "Import Project" page.

Once they created a new repository under their organization, if they haven't given us
access to "All repositories",
they will need to go to the GitHub Application installation configuration and grant access to the
new repository. This will send us a webhook (``installation_repositories``,
https://docs.github.com/en/developers/webhooks-and-events/webhooks/webhook-events-and-payloads#installation_repositories)
and we can automatically do a re-sync of ``Remote*`` models.

Need more research
------------------

* Does django-allauth support GitHub Application for Login?

Using the Client ID for the GitHub Application (instead the OAuth
application), should make it to work:
https://docs.github.com/en/developers/apps/building-github-apps/identifying-and-authorizing-users-for-github-apps#web-application-flow

* How do we use Client ID for GitHub Application for new users and Client ID for
OAuth application for current/existing users?

* Should we keep using our OAuth Application for login and GitHub Application
for the rest? Is it possible?

Related: https://docs.github.com/en/developers/apps/getting-started-with-apps/migrating-oauth-apps-to-github-apps
Related: https://github.com/readthedocs/readthedocs-ops/issues/532

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These seem like really important questions. The interaction with allauth in particular worries me. I really don't want to have to hack a lot of stuff, so keeping auth & repo permissions separated might be the best solution. Could even make the GH App approach optional for more security-minded people?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so keeping auth & repo permissions separated might be the best solution

I'm afraid that this is not possible. At the "Login" page, when the user clicks on "Login with GitHub" we need to redirect them to the URL to authorize the OAuth Application or the GitHub Application --so we have to make that decision before showing the button. I don't really know how to solve this problem yet. This will require a little more thinking and research.

Could even make the GH App approach optional for more security-minded people?

This question makes me think that we may need to show 2 different buttons: OAuth App and GH App. I know that would be terrible UX and will confuse a lot of users, tho.

I'd say that we should keep OAuth working as long as we have old users using this integration. However, new users should not have the option to use OAuth App and they must use GH App for all the good reasons we are migrating to it.

At some point, when no users using OAuth left, we should be able to delete it completely.

The most important thing to solve here is "How do we know that the person on the Login page is old or new to show them the appropriate button?"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed having two buttons would be bad UX, we'd confuse users a lot here.

Is there any way we can try both, starting with one and if that attempt fails, try the other automatically? Or do either allow us to query to see if it is installed for the user?

To start, we could attempt authentication with the oauth app, and back down to the GH app -- swapping once we feel more users are using the GH app.

This could leave a single button in tact but for some users there would be an ugly wait to log in as we try to auth twice

new users should not have the option to use OAuth App and they must use GH App

👍

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any way we can try both, starting with one and if that attempt fails, try the other automatically? Or do either allow us to query to see if it is installed for the user?

No, I don't think we can do this due to how GitHub login works. After clicking in "Sign in" we redirect the unknown user (yet) to a GitHub URL where we lose control. That URL has to be the OAuth App or GH App URL (same URL with different parameters).

Example (current OAuth app): https://github.com/login/oauth/authorize?client_id=0dca71bac3403aba87c1&redirect_uri=https%3A%2F%2Freadthedocs.com%2Faccounts%2Fgithub%2Flogin%2Fcallback%2F&scope=read%3Aorg+repo+user%3Aemail+admin%3Arepo_hook&response_type=code&state=111111111

There, the user will be presented the page to authorize our OAuth application --even if they have already installed our GH app. We cannot do any kind of check at this point because we don't have control and also because we don't know who is the user.

For GH app the same URL is used. However, the parameters (in particular, client_id) are differents. See https://docs.github.com/en/developers/apps/building-github-apps/identifying-and-authorizing-users-for-github-apps#web-application-flow

Note that we could check the Request user authorization (OAuth) during installation in our GH application and make this flow happen during GH app installation; however, we will get only an access token for the organization where the GH app is installed, instead of for each of its members. So, that doesn't solve our problem either 😞

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, I'm not finding a good way to solve this problem. I re-read the migration guide that GitHub provides and it's not super useful in this case.

I was thinking of a different approach:

  1. implement all the GH application code
  2. do not expose GH application to users yet, but make all the URLs to work
  3. QA internally and make sure that everything works as expected
  4. send emails + add a banner (with "call to action") telling organization owners to install our GH application (https://docs.github.com/en/developers/apps/getting-started-with-apps/migrating-oauth-apps-to-github-apps#direct-users-to-install-your-github-app-on-repositories)
  5. send another email + add a banner communicating the EOL of the OAuth and telling them that starting on X date they must have GH App installed to use Read the Docs

As always, dropping support for something at a specific date is not good; but I'm not clearly seeing what else we can do besides forcing users to migrate. This may be doable in commercial, but probably won't in community.

Anyways, I'm open to getting some feedback and ideas on this because I ran out of ideas 😞

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might have the most knowledge and best opinions here. I'm not super familiar with the implementation at a technical level so can only take some wild shots in the dark.

The thing that I find awkward about that approach is that the user will get this notification to use the GHA method, allow the application on their user/organization, but then they need both the GHA and the oauth app installed until we do the cut over to the GHA.

It feels like we want users to have a different login entirely if they want to use the GHA implementation. Does it make sense to discuss using the beta instance for GHA in addition to the new templates?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to discuss using the beta instance for GHA in addition to the new templates?

This was my suggestion that I was trying to bring up in our roadmap call today. Would this approach be worth exploring?

The unknown here that I haven't thought through is what would happen when we're ready to cut everyone over to the new template instances.

cc @humitos

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agjohnson If I understand correctly, there won't be any difference in doing this for the beta instance than for the production one. At some point, all users will need to migrate to the new GH Application (installing it on their GH Organization / GH Project) to be able to use Read the Docs.

Another thing I just realize that may work:

  • force all users to re-signing/re-signup with our new GH application (only one "Sign up/Login with Github" button)
  • that would associate a new SocialAccount (with a custom provider='github-app') to the username
  • at this point, they don't have to install our GH app on the organization/repositories
  • if we don't delete the old GH SocialAccount (provider='github') for this user, RemoteRepositoryRelation(s) won't be deleted, so users should keep access via SSO
  • we should freeze any change on permissions or "Import Project" until the user Installs our new GH application on their Organization / Project. Otherwise, it will be super confusing.
  • once they install our GH application, we can delete the old SocialAccount with all old RemoteResitoryRelation(s) for this user and re-sync them with the new GH application
  • at this point, the user should be on the new GH app and giving us only the permissions strictly needed for RTD to work

This may need more thinking on the UX, but I think with this we could keep "old permissions and webhooks" working until they migrate to the new GH app. However, the migration should be all or nothing because a mixture of both for the same user would be hard to communicate at a UI/UX level.

I'm also thinking that QAing this will probably be a nightmare 😄

Note that for the case where the user don't have permission to install the GH app on the organization/repository, we should force the re-signing/re-signup when we detect the GH application was installed in the organization they belong to. Otherwise, we will have some users accessing to old data and other users from the same RTD organization accessing to new data.

Anyways, this is still super complex.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point, all users will need to migrate to the new GH Application

Yeah, if we made the beta instance a path to using the new GH App, there are two problems:

  • We have to switch everyone over at some point. We also will switch everyone over to the new templates at some point, so perhaps the UX overall could be fairly good here. That is, users switching to the new templates would go to a different page to log in, and on first log in they'd be asked to authorize the GH App -- like it's a new application entirely. I don't hate this option.
  • However this brings up a new hiccup: the user can't easily switch back to using the current templates once they set up their account to use the new GitHub App instead.

That second point might make this not a great option.

Another thing I just realize that may work

We might need to diagram that out 😆 I think I understand most of that though, and it seems like maybe a workaround that would get us past the UX issues of multiple Github Allauth providers.

I'd need to think more on the UX around organization specially, this might be worth some discussion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the issues that I was left thinking through here, and I think @humitos echoed it, would be if we did upgrade the connection for one of the users in an organization, do we have to upgrade all of the connections for users in the same organization?


Questions
---------

* How are we going to handle other VCS providers (GitLab and Bitbucket)?

GitLab and Bitbucket does not offer another option than OAuth Application. We
need to maintain the implementation that we currently have for them.

* Does it worth the effort integrating GitHub Application without being able to
use the same for other services?

GitHub is the most VCS provider used in our platform. Because of this, if the
we can provide a better UX to most of our users I'd call it a win.

* Do we need to support both, GitHub Application and GitHub OAuth, at the same time?

Yes. Current users will keep using GitHub OAuth for a long period of time. We
could notify them encouraging them to migrate to our new GitHub Application.
However, this will take a non-trivial amount of time.