Build: use scoped credentials for interacting with S3 #12078

Open · wants to merge 20 commits into main (showing changes from 10 commits)
30 changes: 30 additions & 0 deletions docs/dev/aws-temporary-credentials.rst
@@ -0,0 +1,30 @@
AWS temporary credentials
=========================

Builders run arbitrary commands provided by the user. While we run those commands in a sandboxed environment (Docker),
that shouldn't be the only line of defense, since we still interact with the files generated by the user outside Docker for some operations.

This is why, instead of using credentials that have access to all the resources in AWS,
we use credentials generated by the AWS STS service,
which are temporary and scoped to only the resources needed for the build.
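
The following is a rough sketch (assuming boto3; the role ARN, bucket, and prefix are placeholders) of how such scoped
temporary credentials can be generated: the STS ``AssumeRole`` API is called with an extra session policy that limits
the credentials to the paths the build is allowed to touch. The actual implementation used by the API lives in
``readthedocs.storage.security_token_service``.

.. code-block:: python

   import json

   import boto3

   # Placeholder values -- the real ones come from the RTD_AWS_* settings and the build being run.
   role_arn = "arn:aws:iam::123456789012:role/RTDSTSAssumeRoleDev"
   bucket = "example-media-bucket"
   prefix = "html/example-project/latest"

   # Session policy that further restricts what the temporary credentials can do.
   policy = {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": ["s3:PutObject", "s3:DeleteObject"],
               "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
           },
       ],
   }

   sts = boto3.client("sts")
   response = sts.assume_role(
       RoleArn=role_arn,
       RoleSessionName="build-1234",
       Policy=json.dumps(policy),
       DurationSeconds=30 * 60,
   )
   credentials = response["Credentials"]
   # AccessKeyId, SecretAccessKey, and SessionToken are then handed to the builder,
   # which uses them instead of the long-lived storage credentials.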

Local development
-----------------

To make use of STS, you need to:

- Create a role in IAM with a trusted entity type set to the AWS account that will be used to generate the temporary credentials.
- Create an inline policy for the role; the policy should allow access to all the S3 buckets and paths that are going to be used (see the example policy below).
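
For illustration, a minimal inline policy could look like the following (the bucket name is a placeholder; list every bucket and path your environment uses):

.. code-block:: json

   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Action": ["s3:ListBucket"],
         "Resource": ["arn:aws:s3:::example-bucket"]
       },
       {
         "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
         "Resource": ["arn:aws:s3:::example-bucket/*"]
       }
     ]
   }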

You can use :ref:`environment variables <settings:AWS configuration>` to set the credentials for AWS; make sure to set the value of ``RTD_S3_PROVIDER`` to ``AWS``.
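
For example, your environment could look something like this (all values are placeholders):

.. code-block:: bash

   export RTD_S3_PROVIDER=AWS
   export RTD_AWS_ACCESS_KEY_ID="your-access-key-id"
   export RTD_AWS_SECRET_ACCESS_KEY="your-secret-access-key"
   export RTD_AWS_STS_ASSUME_ROLE_ARN="arn:aws:iam::123456789012:role/RTDSTSAssumeRoleDev"
   export RTD_AWS_S3_REGION_NAME="us-east-1"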

.. note::

   If you are part of the development team, you should be able to use the credentials from the ``storage-dev`` user,
   which is already configured to make use of STS, together with the ARN of the ``RTDSTSAssumeRoleDev`` role.

.. note::

   You should use AWS only when you are testing the AWS integration;
   use the default minio provider for local development.
   Otherwise, files may be overwritten if multiple developers use the same credentials.
1 change: 1 addition & 0 deletions docs/dev/index.rst
@@ -28,6 +28,7 @@ or taking the open source Read the Docs codebase for your own custom installation
migrations
server-side-search
search-integration
aws-temporary-credentials
subscriptions
settings
tests
15 changes: 15 additions & 0 deletions docs/dev/settings.rst
@@ -167,6 +167,21 @@ providers using the following environment variables:
.. envvar:: RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_CLIENT_ID
.. envvar:: RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_SECRET

AWS configuration
~~~~~~~~~~~~~~~~~

The following variables can be set to use AWS in your local environment.

.. envvar:: RTD_S3_PROVIDER
.. envvar:: RTD_AWS_ACCESS_KEY_ID
.. envvar:: RTD_AWS_SECRET_ACCESS_KEY
.. envvar:: RTD_AWS_STS_ASSUME_ROLE_ARN
Comment on lines +176 to +179
Member

This changes the current semantics of the RTD_ prefix.

We have been using the RTD_ prefix for settings specific to Read the Docs. These settings are for third-party applications, so I'd use a different prefix here. I suggest DJANGO_ for all these settings.

Member Author

All environment variables that are passed to our application are prefixed with RTD, and settings related to the platform are prefixed with RTD (settings.py).

.. envvar:: RTD_S3_MEDIA_STORAGE_BUCKET
.. envvar:: RTD_S3_BUILD_COMMANDS_STORAGE_BUCKET
.. envvar:: RTD_S3_BUILD_TOOLS_STORAGE_BUCKET
.. envvar:: RTD_S3_STATIC_STORAGE_BUCKET
.. envvar:: RTD_AWS_S3_REGION_NAME

Stripe secrets
~~~~~~~~~~~~~~

33 changes: 33 additions & 0 deletions readthedocs/api/v2/views/model_views.py
@@ -1,6 +1,7 @@
"""Endpoints for listing Projects, Versions, Builds, etc."""

import json
from dataclasses import asdict

import structlog
from allauth.socialaccount.models import SocialAccount
@@ -38,6 +39,8 @@
from readthedocs.projects.models import Domain
from readthedocs.projects.models import Project
from readthedocs.storage import build_commands_storage
from readthedocs.storage.security_token_service import AWSTemporaryCredentialsError
from readthedocs.storage.security_token_service import get_s3_scoped_credentials

from ..serializers import BuildAdminReadOnlySerializer
from ..serializers import BuildAdminSerializer
@@ -345,6 +348,36 @@ def reset(self, request, **kwargs):
def get_queryset_for_api_key(self, api_key):
return self.model.objects.filter(project=api_key.project)

@decorators.action(
detail=True,
permission_classes=[HasBuildAPIKey],
methods=["post"],
url_path="temporary-credentials",
)
def temporary_credentials(self, request, **kwargs):
"""
Generate temporary credentials for the build.

For now, this only generates temporary credentials for interacting with S3.
"""
build = self.get_object()
project = build.project
version = build.version
try:
credentials = get_s3_scoped_credentials(
project=project,
version=version,
session_id=build.pk,
# 30 minutes should be enough to upload all build artifacts.
duration=30 * 60,
)
except AWSTemporaryCredentialsError:
return Response(
{"error": "Failed to generate temporary credentials"},
status=status.HTTP_500_INTERNAL_SERVER_ERROR,
)
return Response({"s3": asdict(credentials)})


class BuildCommandViewSet(DisableListEndpoint, CreateModelMixin, UserSelectViewSet):
parser_classes = [JSONParser, MultiPartParser]
5 changes: 5 additions & 0 deletions readthedocs/projects/models.py
@@ -1932,6 +1932,7 @@ def add_features(sender, **kwargs):

# Build related features
SCALE_IN_PROTECTION = "scale_in_prtection"
USE_S3_SCOPED_CREDENTIALS_ON_BUILDERS = "use_s3_scoped_credentials_on_builders"

FEATURES = (
(
@@ -2010,6 +2011,10 @@ def add_features(sender, **kwargs):
SCALE_IN_PROTECTION,
_("Build: Set scale-in protection before/after building."),
),
(
USE_S3_SCOPED_CREDENTIALS_ON_BUILDERS,
_("Build: Use S3 scoped credentials for uploading build artifacts."),
),
)

FEATURES = sorted(FEATURES, key=lambda x: x[1])
59 changes: 58 additions & 1 deletion readthedocs/projects/tasks/builds.py
Expand Up @@ -18,6 +18,7 @@
from celery import Task
from django.conf import settings
from django.utils import timezone
from django.utils.module_loading import import_string
from slumber import API
from slumber.exceptions import HttpClientError

@@ -53,7 +54,6 @@
from readthedocs.doc_builder.exceptions import BuildUserError
from readthedocs.doc_builder.exceptions import MkDocsYAMLParseError
from readthedocs.projects.models import Feature
from readthedocs.storage import build_media_storage
from readthedocs.telemetry.collectors import BuildDataCollector
from readthedocs.telemetry.tasks import save_build_data
from readthedocs.worker import app
@@ -885,6 +885,61 @@ def set_valid_clone(self):
self.data.project.has_valid_clone = True
self.data.version.project.has_valid_clone = True

def _get_sync_media_storage(self):
"""
Get a storage class instance to use for syncing build artifacts.

.. note::

We no longer use readthedocs.storage.build_media_storage directly,
as we are now using per-build credentials for S3 storage,
so we need to dynamically create the storage class instance.
"""
storage_class = self._get_sync_media_storage_class()
extra_kwargs = {}
if settings.USING_AWS:
extra_kwargs = self._get_s3_scoped_credentials()
return storage_class(**extra_kwargs)

def _get_sync_media_storage_class(self):
"""
Get a storage class to use for syncing build artifacts.

This is done in a separate method to make it easier to mock in tests.

.. note::

We no longer use readthedocs.storage.build_media_storage directly,
as we are now using per-build credentials for S3 storage,
so we need to dynamically create the storage class instance.
"""
return import_string(settings.RTD_BUILD_MEDIA_STORAGE)

def _get_s3_scoped_credentials(self):
if not self.data.project.has_feature(Feature.USE_S3_SCOPED_CREDENTIALS_ON_BUILDERS):
# Use the default credentials defined in the settings.
return {}

build_id = self.data.build["id"]
try:
credentials = self.data.api_client.build(f"{build_id}/temporary-credentials").post()
except Exception:
log.exception(
"Error getting scoped credentials.",
build_id=build_id,
)
raise BuildAppError(
BuildAppError.GENERIC_WITH_BUILD_ID,
exception_message="Error getting scoped credentials.",
)

s3_credentials = credentials["s3"]
return {
"access_key": s3_credentials["access_key_id"],
"secret_key": s3_credentials["secret_access_key"],
"security_token": s3_credentials["session_token"],
}

def store_build_artifacts(self):
"""
Save build artifacts to "storage" using Django's storage API.
@@ -904,6 +959,8 @@ def store_build_artifacts(self):
types_to_copy = []
types_to_delete = []

build_media_storage = self._get_sync_media_storage()

for artifact_type in ARTIFACT_TYPES:
if artifact_type in valid_artifacts:
types_to_copy.append(artifact_type)
18 changes: 15 additions & 3 deletions readthedocs/projects/tests/mockers.py
@@ -7,6 +7,7 @@
from readthedocs.api.v2.serializers import VersionAdminSerializer
from readthedocs.builds.constants import BUILD_STATE_TRIGGERED
from readthedocs.projects.constants import MKDOCS
from readthedocs.projects.tasks.builds import UpdateDocsTask


class BuildEnvironmentMocker:
@@ -159,9 +160,7 @@ def _mock_environment(self):
# )

def _mock_storage(self):
self.patches["build_media_storage"] = mock.patch(
"readthedocs.projects.tasks.builds.build_media_storage",
)
self.patches["get_sync_media_storage_class"] = mock.patch.object(UpdateDocsTask, "_get_sync_media_storage_class")

def _mock_api(self):
headers = {"Content-Type": "application/json"}
@@ -225,6 +224,19 @@ def _mock_api(self):
headers=headers,
)

self.requestsmock.post(
f"{settings.SLUMBER_API_HOST}/api/v2/build/{self.build.pk}/temporary-credentials/",
status_code=201,
json={
"s3": {
"access_key_id": "some-access-key",
"secret_access_key": "some-secret-key",
"session_token": "some-session-token",
}
},
headers=headers,
)

self.requestsmock.patch(
f"{settings.SLUMBER_API_HOST}/api/v2/project/{self.project.pk}/",
status_code=201,