Skip to content

Add GitHub App service #12072

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 15, 2025
41 changes: 41 additions & 0 deletions docs/dev/github-app.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
GitHub App
==========

Our GitHub App integration consists of a GitHub App (one for each platform, readthedocs.org and readthedocs.com),
which can be installed on a user's account or organization.

After installing the GitHub App, users can grant acccess to all repositories or select specific repositories,
this allows Read the Docs to access the repositories and perform actions on them, such as reporting build statuses,
and subscribe to events like push and pull request events.

Unlike our other Git integrations, this doesn't rely on user's OAuth2 tokens to interact with the repositories, instead it makes use of the installation.
As long as the installation is active and has access to the repositories, Read the Docs can perform actions on them.

You can read more about GitHub Apps in `GitHub's documentation <https://docs.github.com/en/apps/overview>`__.

Modeling
--------

Since we no longer make use of user's OAuth2 tokens, we need to keep track of all the installations of our GitHub App in the database (``GitHubAppInstallation`` model).
Each remote repository has a foreign key to the installation that has access to it,
so we can use that installation to perform actions on the repository.

Keeping data in sync
--------------------

Once a GitHub app is installed, GitHub will send webhook events to our application (``readthedocs.oauth.views``) related to the installation
(even when the installation is uninstalled or revoked), repositories, and organizations.
This allows us to always have the ``RemoteRepository`` and ``RemoteOrganization`` models and its permissions in sync with GitHub.

The only use case for using the user's OAuth2 token is to get all the installations the user has access to,
this helps us to keep permissions in sync with GitHub.

Security
--------

- All webhooks are signed with a secret, which we verify before processing each event.
- Since we make use of the installation to perform actions on the repositories instead of the user's OAuth2 token,
we make sure that only users with admin permissions on the repository can link the repository to a Read the Docs project.
- Once we lose access to a repository (e.g. the installation is uninstalled or revoked, or the project was deselected from the installation),
we remove the remote repository from the database, as we don't want to keep the relation bettween the project and the repository.
This is to prevent connecting the repository to the project again without the user's consent if they grant access to the repository again.
1 change: 1 addition & 0 deletions docs/dev/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -29,5 +29,6 @@ or taking the open source Read the Docs codebase for your own custom installatio
server-side-search
search-integration
subscriptions
github-app
settings
tests
24 changes: 23 additions & 1 deletion docs/dev/install.rst
Original file line number Diff line number Diff line change
Expand Up @@ -262,10 +262,32 @@ For others, the webhook will simply fail to connect when there are new commits t

* Configure the applications on GitHub, Bitbucket, and GitLab.
For each of these, the callback URI is ``http://devthedocs.org/accounts/<provider>/login/callback/``
where ``<provider>`` is one of ``github``, ``gitlab``, or ``bitbucket_oauth2``.
where ``<provider>`` is one of ``github``, ``githubapp``, ``gitlab``, or ``bitbucket_oauth2``.
When setup, you will be given a "Client ID" (also called an "Application ID" or just "Key") and a "Secret".
* Take the "Client ID" and "Secret" for each service and set them as :ref:`environment variables <settings:Allauth secrets>`.

Configuring GitHub App
~~~~~~~~~~~~~~~~~~~~~~

- Create a new GitHub app from https://github.com/settings/apps/new.
- Callback URL should be ``http://devthedocs.org/accounts/githubapp/login/callback/``.
- Keep marked "Expire user authorization tokens"
- Activate the webhook, and set the URL to one provided by a service like `Webhook.site <https://docs.webhook.site/cli.html>`__ to forward all incoming webhooks to your local development instance.
You should forward all events to ``http://devthedocs.org/webhook/githubapp/``.
- In permissions, select the following:

- Repository permissions: Commit statuses (read and write, so we can create commit statuses),
Contents (read only, so we can clone repos with a token),
Metadata (read only, so we read the repo collaborators),
Pull requests (read and write, so we can post a comment on PRs in the future).
- Organization permissions: Members (read only so we can read the organization members).
- Account permissions: Email addresses (read only, so allauth can fetch all verified emails).
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be preferred if these permissions are on an opt-in basis. Many projects may want to start small, and i've seen plenty of them that are skeptical about every little permission. So instead it could be like

The minimum permissions necessary for this app are
- Repository permissions:
  Contents (read only, so we can clone the repo contents)

Additionally you may grand additional permissions depending on the features you wish to have
- Repository permissions:
  Commit statuses (read and write, so we can create commit statuses)

It does require the app to read the granted permissions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, don't think GH allows for this, the permissions are defined by the app, installations can't choose a subset of permissions. What can happen is that the app requires more permissions and the user can accept or reject those new permissions, but all new installations will always require granting all permissions designated by the app.

And even if GH allows for this, not sure that we will support that case, as it requires considering more cases to support, and also support time, as some users don't know why a feature isn't working.


- Subscribe to the following events: Installation target, Member, Organization, Membership, Pull request, Push, and Repository.
- Copy the "Client ID" and "Client Secret" and set them as :ref:`environment variables <settings:Allauth secrets>`.
- Generate a webhook secret and a private key from the GitHub App settings,
and set them as :ref:`environment variables <settings:GitHub App secrets>`.

Troubleshooting
---------------

Expand Down
10 changes: 10 additions & 0 deletions docs/dev/settings.rst
Original file line number Diff line number Diff line change
Expand Up @@ -167,6 +167,16 @@ providers using the following environment variables:
.. envvar:: RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_CLIENT_ID
.. envvar:: RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_SECRET

GitHub App
~~~~~~~~~~

You can use the following environment variables to set the settings used by the GitHub App:

.. envvar:: RTD_GITHUB_APP_ID
.. envvar:: RTD_GITHUB_APP_NAME
.. envvar:: RTD_GITHUB_APP_PRIVATE_KEY
.. envvar:: RTD_GITHUB_APP_WEBHOOK_SECRET

Stripe secrets
~~~~~~~~~~~~~~

Expand Down
15 changes: 15 additions & 0 deletions readthedocs/oauth/clients.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,10 @@
from datetime import datetime

import structlog
from django.conf import settings
from django.utils import timezone
from github import Auth
from github import GithubIntegration
from requests_oauthlib import OAuth2Session


Expand Down Expand Up @@ -70,3 +73,15 @@ def get_oauth2_client(account):
token_updater=_get_token_updater(token),
)
return session


def get_gh_app_client() -> GithubIntegration:
"""Return a client authenticated as the GitHub App to interact with the API."""
app_auth = Auth.AppAuth(
app_id=settings.GITHUB_APP_CLIENT_ID,
private_key=settings.GITHUB_APP_PRIVATE_KEY,
# 10 minutes is the maximum allowed by GitHub.
# PyGithub will handle the token expiration and renew it automatically.
jwt_expiry=60 * 10,
)
return GithubIntegration(auth=app_auth)
77 changes: 77 additions & 0 deletions readthedocs/oauth/models.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
"""OAuth service models."""

from functools import cached_property

import structlog
from allauth.socialaccount.models import SocialAccount
from django.contrib.auth.models import User
Expand All @@ -12,6 +14,7 @@
from readthedocs.projects.constants import REPO_CHOICES
from readthedocs.projects.models import Project

from .constants import GITHUB_APP
from .constants import VCS_PROVIDER_CHOICES
from .querysets import RemoteOrganizationQuerySet
from .querysets import RemoteRepositoryQuerySet
Expand Down Expand Up @@ -84,6 +87,80 @@ class GitHubAppInstallation(TimeStampedModel):
class Meta(TimeStampedModel.Meta):
verbose_name = _("GitHub app installation")

@cached_property
def service(self):
"""Return the service for this installation."""
from readthedocs.oauth.services import GitHubAppService

return GitHubAppService(self)

def delete(self, *args, **kwargs):
"""Override delete method to remove orphaned organizations."""
self.delete_repositories()
return super().delete(*args, **kwargs)

def delete_repositories(self, repository_ids: list[int] | None = None):
"""
Delete repositories linked to this installation.
When an installation is deleted, we delete all its remote repositories
and relations, users will need to manually link the projects to each repository again.
We also remove organizations that don't have any repositories after removing the repositories.
:param repository_ids: List of repository ids (remote ID) to delete.
If None, all repositories will be considered for deletion.
"""
# repository_ids is optionaal (None, which means all repositories),
# but if it's an empty list, we don't want to delete anything.
if repository_ids is not None and not repository_ids:
log.info("No remote repositories to delete")
return

remote_organizations = RemoteOrganization.objects.filter(
repositories__github_app_installation=self,
vcs_provider=GITHUB_APP,
)
remote_repositories = self.repositories.filter(vcs_provider=GITHUB_APP)
if repository_ids:
remote_organizations = remote_organizations.filter(
repositories__remote_id__in=repository_ids
)
remote_repositories = remote_repositories.filter(remote_id__in=repository_ids)

# Fetch all IDs before deleting the repositories, so we can filter the organizations later.
remote_organizations_ids = list(remote_organizations.values_list("id", flat=True))

count, deleted = remote_repositories.delete()
log.info(
"Deleted remote repositories",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Deleted remote repositories",
"Deleted orphaned remote repositories",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The concept of orphaned projects doesn't exist in this case, only for organizations (like we still have access to the installation of an organization, but without access to nay of its projects)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to put something not as scary as it is now. Why are we deleting these remote repositories? They are outdated/old/invalid/the user lost access to them/etc? I would mention that.

Example,

Suggested change
"Deleted remote repositories",
"Deleted outdated remote repositories",

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We delete them when our app no longer has access to them, is not that they are outdated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼 -- It's important to note that in the log.

Suggested change
"Deleted remote repositories",
"Deleted remote repositories that our app no longer has access to.",

count=count,
deleted=deleted,
installation_id=self.installation_id,
)

count, deleted = RemoteOrganization.objects.filter(
id__in=remote_organizations_ids,
repositories=None,
).delete()
log.info(
"Deleted orphaned remote organizations",
count=count,
deleted=deleted,
installation_id=self.installation_id,
)

def delete_organization(self, organization_id: int):
"""Delete an organization and all its repositories and relations from the database."""
count, deleted = RemoteOrganization.objects.filter(
remote_id=str(organization_id),
vcs_provider=GITHUB_APP,
).delete()
log.info(
"Deleted remote organization",
count=count,
deleted=deleted,
organization_id=organization_id,
installation_id=self.installation_id,
)


class RemoteOrganization(TimeStampedModel):
"""
Expand Down
13 changes: 10 additions & 3 deletions readthedocs/oauth/querysets.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
from django.db import models

from readthedocs.core.querysets import NoReprQuerySet
from readthedocs.oauth.constants import GITHUB_APP


class RelatedUserQuerySet(NoReprQuerySet, models.QuerySet):
Expand All @@ -12,7 +13,10 @@ def api(self, user=None):
"""Return objects for user."""
if not user.is_authenticated:
return self.none()
return self.filter(users=user)
queryset = self.filter(users=user)
# Exclude repositories/organizations from the GitHub App for now to avoid confusions.
queryset = queryset.exclude(vcs_provider=GITHUB_APP)
return queryset

def api_v2(self, *args, **kwargs):
# API v2 is the same as API v3 for .org, but it's
Expand All @@ -28,10 +32,13 @@ def for_project_linking(self, user):
Repositories can be linked to a project only if the user has admin access
to the repository on the VCS service.
"""
return self.filter(
queryset = self.filter(
remote_repository_relations__user=user,
remote_repository_relations__admin=True,
).distinct()
)
# Exclude repositories from the GitHub App for now to avoid confusions.
queryset = queryset.exclude(vcs_provider=GITHUB_APP)
return queryset.distinct()


class RemoteOrganizationQuerySet(RelatedUserQuerySet):
Expand Down
12 changes: 11 additions & 1 deletion readthedocs/oauth/services/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,16 @@
from readthedocs.oauth.services import bitbucket
from readthedocs.oauth.services import github
from readthedocs.oauth.services import gitlab
from readthedocs.oauth.services.githubapp import GitHubAppService


__all__ = [
"GitHubService",
"BitbucketService",
"GitLabService",
"GitHubAppService",
"registry",
]


class GitHubService(SettingsOverrideObject):
Expand All @@ -21,4 +31,4 @@ class GitLabService(SettingsOverrideObject):
_override_setting = "OAUTH_GITLAB_SERVICE"


registry = [GitHubService, BitbucketService, GitLabService]
registry = [GitHubService, BitbucketService, GitLabService, GitHubAppService]
10 changes: 9 additions & 1 deletion readthedocs/oauth/services/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ class Service:
vcs_provider_slug: str
allauth_provider = type[OAuth2Provider]

url_pattern: re.Pattern | None
url_pattern: re.Pattern | None = None
default_user_avatar_url = settings.OAUTH_AVATAR_USER_DEFAULT_URL
default_org_avatar_url = settings.OAUTH_AVATAR_ORG_DEFAULT_URL
supports_build_status = False
Expand Down Expand Up @@ -106,6 +106,10 @@ def send_build_status(self, *, build, commit, status):
"""
raise NotImplementedError

def get_clone_token(self, project):
"""Get a token used for cloning the repository."""
raise NotImplementedError

@classmethod
def is_project_service(cls, project):
"""
Expand Down Expand Up @@ -326,3 +330,7 @@ def sync_repositories(self):

def sync_organizations(self):
raise NotImplementedError

def get_clone_token(self, project):
"""User services make use of SSH keys only for cloning."""
return None
Loading