Build: use scoped credentials for interacting with S3 #12078

Open
wants to merge 20 commits into
base: main
32 changes: 32 additions & 0 deletions docs/dev/aws-temporary-credentials.rst
@@ -0,0 +1,32 @@
AWS temporary credentials
=========================

Builders run arbitrary commands provided by the user. While we run those commands in a sandboxed environment (Docker),
that shouldn't be the only line of defense, as we still interact with the files generated by the user outside Docker for some operations.

This is why instead of using credentials that have access to all the resources in AWS,
we are using credentials that are generated by the `AWS STS service <https://docs.aws.amazon.com/STS/latest/APIReference/welcome.html>`__,
which are temporary and scoped to the resources that are needed for the build.

Local development
-----------------

To make use of STS, you need to:

- Create a role in IAM with a trusted entity type set to the AWS account that is going to be used to generate the temporary credentials.
- Create an inline policy for the role. The policy should allow access to all S3 buckets and paths that are going to be used.
- Create an inline policy for the user that is going to be used to generate the temporary credentials.
  The policy should allow the ``sts:AssumeRole`` action for the role created in the first step.
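
As a rough illustration of the IAM setup described above, the trust policy for the role and the inline policy for the user could look like the following. This is a sketch under assumptions: the account ID ``123456789012`` and the role name ``ExampleSTSRole`` are placeholders, and both helper names are hypothetical. The bucket-access policy for the role is omitted because it depends on your buckets.

```python
import json


def build_role_trust_policy(account_id: str) -> dict:
    """Trust policy for the role: lets principals in the given account assume it."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_user_assume_role_policy(role_arn: str) -> dict:
    """Inline policy for the user: allows sts:AssumeRole on that one role only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": role_arn,
            }
        ],
    }


# Placeholder values, not real resources.
account_id = "123456789012"
role_arn = f"arn:aws:iam::{account_id}:role/ExampleSTSRole"

print(json.dumps(build_role_trust_policy(account_id), indent=2))
print(json.dumps(build_user_assume_role_policy(role_arn), indent=2))
```

Restricting ``Resource`` to a single role ARN keeps the user from assuming any other role in the account.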

You can use :ref:`environment variables <settings:AWS configuration>` to set the credentials for AWS. Make sure to set the value of ``RTD_S3_PROVIDER`` to ``AWS``.

.. note::

   If you are part of the development team, you should be able to use the credentials from the ``storage-dev`` user,
   which is already configured to make use of STS, and the ARN from the ``RTDSTSAssumeRoleDev`` role.

.. note::

   You should use AWS only when you are testing the AWS integration;
   use the default minio provider for local development.
   Otherwise, files may be overwritten if multiple developers are using the same credentials.
1 change: 1 addition & 0 deletions docs/dev/index.rst
@@ -28,6 +28,7 @@ or taking the open source Read the Docs codebase for your own custom installatio
migrations
server-side-search
search-integration
aws-temporary-credentials
subscriptions
github-app
settings
16 changes: 16 additions & 0 deletions docs/dev/settings.rst
@@ -167,6 +167,22 @@ providers using the following environment variables:
.. envvar:: RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_CLIENT_ID
.. envvar:: RTD_SOCIALACCOUNT_PROVIDERS_GOOGLE_SECRET

AWS configuration
~~~~~~~~~~~~~~~~~

The following variables configure AWS in your local environment.
They are useful for testing :doc:`temporary credentials </aws-temporary-credentials>`.

.. envvar:: RTD_S3_PROVIDER
.. envvar:: RTD_AWS_ACCESS_KEY_ID
.. envvar:: RTD_AWS_SECRET_ACCESS_KEY
.. envvar:: RTD_AWS_STS_ASSUME_ROLE_ARN
Comment on lines +176 to +179
Member

This changes the current semantic of the RTD_ prefix.

We have been using RTD_ prefix for settings specific for Read the Docs. These settings are for third-party applications, so I'd use a different prefix here. I suggest DJANGO_ for all these settings.

Member Author

All environment variables that are passed to our application are prefixed with RTD, and settings related to the platform are prefixed with RTD as well (settings.py).

.. envvar:: RTD_S3_MEDIA_STORAGE_BUCKET
.. envvar:: RTD_S3_BUILD_COMMANDS_STORAGE_BUCKET
.. envvar:: RTD_S3_BUILD_TOOLS_STORAGE_BUCKET
.. envvar:: RTD_S3_STATIC_STORAGE_BUCKET
.. envvar:: RTD_AWS_S3_REGION_NAME
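
A quick way to confirm that the variables above are set before starting the local environment is sketched below. The ``missing_aws_variables`` helper is hypothetical, and treating every variable as required is an assumption; the PR does not say which ones are optional.

```python
import os
from collections.abc import Mapping

# Names taken from the list above.
AWS_VARIABLES = (
    "RTD_S3_PROVIDER",
    "RTD_AWS_ACCESS_KEY_ID",
    "RTD_AWS_SECRET_ACCESS_KEY",
    "RTD_AWS_STS_ASSUME_ROLE_ARN",
    "RTD_S3_MEDIA_STORAGE_BUCKET",
    "RTD_S3_BUILD_COMMANDS_STORAGE_BUCKET",
    "RTD_S3_BUILD_TOOLS_STORAGE_BUCKET",
    "RTD_S3_STATIC_STORAGE_BUCKET",
    "RTD_AWS_S3_REGION_NAME",
)


def missing_aws_variables(environ: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names that are unset or empty in the given environment."""
    return [name for name in AWS_VARIABLES if not environ.get(name)]


if __name__ == "__main__":
    for name in missing_aws_variables():
        print(f"{name} is not set")
```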

GitHub App
~~~~~~~~~~

42 changes: 42 additions & 0 deletions readthedocs/api/v2/views/model_views.py
@@ -1,6 +1,7 @@
"""Endpoints for listing Projects, Versions, Builds, etc."""

import json
from dataclasses import asdict

import structlog
from allauth.socialaccount.models import SocialAccount
@@ -27,6 +28,9 @@
from readthedocs.api.v2.permissions import IsOwner
from readthedocs.api.v2.permissions import ReadOnlyPermission
from readthedocs.api.v2.utils import normalize_build_command
from readthedocs.aws.security_token_service import AWSTemporaryCredentialsError
from readthedocs.aws.security_token_service import get_s3_build_media_scoped_credentials
from readthedocs.aws.security_token_service import get_s3_build_tools_scoped_credentials
from readthedocs.builds.constants import INTERNAL
from readthedocs.builds.models import Build
from readthedocs.builds.models import BuildCommandResult
@@ -345,6 +349,44 @@ def reset(self, request, **kwargs):
    def get_queryset_for_api_key(self, api_key):
        return self.model.objects.filter(project=api_key.project)

    @decorators.action(
        detail=True,
        permission_classes=[HasBuildAPIKey],
        methods=["post"],
        url_path="credentials/storage",
    )
    def credentials_for_storage(self, request, **kwargs):
        """
        Generate temporary credentials for interacting with storage.

        This can generate temporary credentials for interacting with S3 only for now.
        """
        build = self.get_object()
        credentials_type = request.data.get("type")

        if credentials_type == "build_media":
            method = get_s3_build_media_scoped_credentials
            # 30 minutes should be enough for uploading build artifacts.
            duration = 30 * 60
        elif credentials_type == "build_tools":
            method = get_s3_build_tools_scoped_credentials
            # 30 minutes should be enough for downloading build tools.
            duration = 30 * 60
        else:
            return Response(
                {"error": "Invalid storage type"},
                status=status.HTTP_400_BAD_REQUEST,
            )

        try:
            credentials = method(build=build, duration=duration)
        except AWSTemporaryCredentialsError:
            return Response(
                {"error": "Failed to generate temporary credentials"},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR,
            )
        return Response({"s3": asdict(credentials)})
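
To see the response shape from the consumer side, here is a minimal sketch of parsing the `{"s3": {...}}` payload that `credentials_for_storage` returns. The field names mirror `AWSS3TemporaryCredentials` from this PR; the `StorageCredentials` class, the `parse_storage_credentials` helper, and any sample values are hypothetical and not part of the PR.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class StorageCredentials:
    """Hypothetical client-side mirror of the "s3" object in the response."""

    access_key_id: str
    secret_access_key: str
    session_token: Optional[str]
    bucket_name: str
    region_name: str


def parse_storage_credentials(payload: dict[str, Any]) -> StorageCredentials:
    """Build StorageCredentials from the JSON body, failing loudly on errors."""
    if "error" in payload:
        raise ValueError(payload["error"])
    s3 = payload["s3"]
    return StorageCredentials(
        access_key_id=s3["access_key_id"],
        secret_access_key=s3["secret_access_key"],
        # None when the server returned its default (non-STS) credentials.
        session_token=s3.get("session_token"),
        bucket_name=s3["bucket_name"],
        region_name=s3["region_name"],
    )
```

Both error responses from the endpoint carry an `error` key, so a single check covers the 400 and the 500 case.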


class BuildCommandViewSet(DisableListEndpoint, CreateModelMixin, UserSelectViewSet):
parser_classes = [JSONParser, MultiPartParser]
Empty file added readthedocs/aws/__init__.py
Empty file.
239 changes: 239 additions & 0 deletions readthedocs/aws/security_token_service.py
@@ -0,0 +1,239 @@
"""
Module to interact with AWS STS (Security Token Service) to assume a role and get temporary scoped credentials.

This is mainly used to generate temporary credentials to interact with S3 buckets from the builders.

To make use of STS, we need to:

- Create a role in IAM with a trusted entity type set to the AWS account that is going to be used to generate the temporary credentials.
- Create an inline policy for the role. The policy should allow access to all S3 buckets and paths that are going to be used.
- Create an inline policy for the user that is going to be used to generate the temporary credentials.
  The policy should allow the ``sts:AssumeRole`` action for the role created in the first step.

The permissions of the temporary credentials are the result of the intersection of the role policy and the inline policy that is passed to the AssumeRole API.
This means that the inline policy can be used to limit the permissions of the temporary credentials, but not to expand them.

See:

- https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html
- https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_sts-comparison.html
- https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_control-access_assumerole.html
- https://docs.readthedocs.com/dev/latest/aws-temporary-credentials.html
"""

import json
from dataclasses import dataclass

import boto3
import structlog
from django.conf import settings


log = structlog.get_logger(__name__)


class AWSTemporaryCredentialsError(Exception):
    """Exception raised when there is an error getting AWS S3 credentials."""


@dataclass
class AWSTemporaryCredentials:
    """Dataclass to hold AWS temporary credentials."""

    access_key_id: str
    secret_access_key: str
    session_token: str | None


@dataclass
class AWSS3TemporaryCredentials(AWSTemporaryCredentials):
    """Subclass of AWSTemporaryCredentials to include S3 specific fields."""

    bucket_name: str
    region_name: str


def get_sts_client():
    return boto3.client(
        "sts",
        aws_access_key_id=settings.AWS_ACCESS_KEY_ID,
        aws_secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
        region_name=settings.AWS_S3_REGION_NAME,
    )


def _get_scoped_credentials(*, session_name, policy, duration) -> AWSTemporaryCredentials:
    """
    Assume the configured role and return temporary credentials limited by ``policy``.

    :param session_name: An identifier to attach to the generated credentials, useful to identify who requested them.
     AWS limits the session name to 64 characters, so if the session_name is too long, it will be truncated.
    :param duration: The duration of the credentials in seconds.
     Note that the minimum duration time is 15 minutes and the maximum is given by the role (defaults to 1 hour).
    :param policy: The inline policy to attach to the generated credentials.

    .. note::

       If USING_AWS is set to False, this function will return
       the values of the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY settings.
       Useful for local development where we don't have a service like AWS STS.
    """
    if not settings.USING_AWS:
        if not settings.DEBUG:
            raise ValueError(
                "Not returning global credentials, AWS STS should always be used in production."
            )
        return AWSTemporaryCredentials(
            access_key_id=settings.AWS_ACCESS_KEY_ID,
            secret_access_key=settings.AWS_SECRET_ACCESS_KEY,
            # A session token is not needed for the default credentials.
            session_token=None,
        )

    # Limit to 64 characters, as per AWS limitations.
    session_name = session_name[:64]
    try:
        sts_client = get_sts_client()
        response = sts_client.assume_role(
            RoleArn=settings.AWS_STS_ASSUME_ROLE_ARN,
            RoleSessionName=session_name,
            Policy=json.dumps(policy),
            DurationSeconds=duration,
        )
    except Exception:
        log.exception(
            "Error while assuming role to generate temporary credentials",
            session_name=session_name,
            policy=policy,
            duration=duration,
        )
        raise AWSTemporaryCredentialsError

    credentials = response["Credentials"]
    return AWSTemporaryCredentials(
        access_key_id=credentials["AccessKeyId"],
        secret_access_key=credentials["SecretAccessKey"],
        session_token=credentials["SessionToken"],
    )
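
The limits mentioned in the docstring can be made explicit before calling AssumeRole. The sketch below is not part of the PR: `clamp_duration` and `truncate_session_name` are hypothetical helpers, and the one-hour ceiling assumes the role keeps its default maximum session duration.

```python
MIN_DURATION_SECONDS = 15 * 60  # AssumeRole rejects anything shorter.
DEFAULT_ROLE_MAX_SECONDS = 60 * 60  # Assumed: the role uses the default maximum.
MAX_SESSION_NAME_LENGTH = 64


def clamp_duration(duration: int, role_max: int = DEFAULT_ROLE_MAX_SECONDS) -> int:
    """Force the requested duration into the range STS accepts for the role."""
    return max(MIN_DURATION_SECONDS, min(duration, role_max))


def truncate_session_name(session_name: str) -> str:
    """Cut the session name to the 64 characters AWS allows."""
    return session_name[:MAX_SESSION_NAME_LENGTH]
```

Clamping silently changes what the caller asked for, so raising an error on out-of-range values may be the better choice when the caller should know about it.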


def get_s3_build_media_scoped_credentials(
    *,
    build,
    duration=60 * 15,
) -> AWSS3TemporaryCredentials:
    """
    Get temporary credentials with read/write access to the build media bucket.

    The credentials are scoped to the paths that the build needs to access.

    :param build: The build to get the credentials for.
    :param duration: The duration of the credentials in seconds. Default is 15 minutes.
     Note that the minimum duration time is 15 minutes and the maximum is given by the role (defaults to 1 hour).
    """
    project = build.project
    version = build.version
    bucket_arn = f"arn:aws:s3:::{settings.S3_MEDIA_STORAGE_BUCKET}"
    storage_paths = version.get_storage_paths()
    # Generate the list of allowed prefix resources
    # The resulting prefix looks like:
    # - html/project/latest/*
    # - pdf/project/latest/*
    allowed_prefixes = [f"{storage_path}/*" for storage_path in storage_paths]

    # Generate the list of allowed object resources in ARN format.
    # The resulting ARN looks like:
    # arn:aws:s3:::readthedocs-media/html/project/latest/*
    # arn:aws:s3:::readthedocs-media/pdf/project/latest/*
    allowed_objects_arn = [f"{bucket_arn}/{prefix}" for prefix in allowed_prefixes]

    # Inline policy document to limit the permissions of the temporary credentials.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                ],
                "Resource": allowed_objects_arn,
            },
            # In order to list the objects in a path, we need to allow the ListBucket action.
            # But since that action is not scoped to a path, we need to limit it using a condition.
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [
                    bucket_arn,
                ],
                "Condition": {
                    "StringLike": {
                        "s3:prefix": allowed_prefixes,
                    }
                },
            },
        ],
    }

    session_name = f"rtd-{build.id}-{project.slug}-{version.slug}"
    credentials = _get_scoped_credentials(
        session_name=session_name,
        policy=policy,
        duration=duration,
    )
    return AWSS3TemporaryCredentials(
        access_key_id=credentials.access_key_id,
        secret_access_key=credentials.secret_access_key,
        session_token=credentials.session_token,
        region_name=settings.AWS_S3_REGION_NAME,
        bucket_name=settings.S3_MEDIA_STORAGE_BUCKET,
    )
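
To see what the inline policy looks like for a concrete version, the policy construction can be reproduced as a standalone function. This sketch copies the logic of `get_s3_build_media_scoped_credentials`; the `build_media_policy` name is hypothetical, and the `readthedocs-media` bucket and `html/project/latest` style paths are illustrative values taken from the code comments, not real output of `version.get_storage_paths()`.

```python
import json


def build_media_policy(bucket_name: str, storage_paths: list[str]) -> dict:
    """Build the inline policy that scopes credentials to the given paths."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    allowed_prefixes = [f"{path}/*" for path in storage_paths]
    allowed_objects_arn = [f"{bucket_arn}/{prefix}" for prefix in allowed_prefixes]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": allowed_objects_arn,
            },
            {
                # ListBucket applies to the bucket, so the prefixes go in a condition.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
                "Condition": {"StringLike": {"s3:prefix": allowed_prefixes}},
            },
        ],
    }


policy = build_media_policy(
    "readthedocs-media",
    ["html/project/latest", "pdf/project/latest"],
)
print(json.dumps(policy, indent=2))
```

Since AssumeRole caps the plaintext of an inline session policy at 2,048 characters, printing the JSON is also a cheap way to check how close a version with many storage paths gets to that limit.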


def get_s3_build_tools_scoped_credentials(
    *,
    build,
    duration=60 * 15,
) -> AWSS3TemporaryCredentials:
    """
    Get temporary credentials with read-only access to the build-tools bucket.

    :param build: The build to get the credentials for.
    :param duration: The duration of the credentials in seconds. Default is 15 minutes.
     Note that the minimum duration time is 15 minutes and the maximum is given by the role (defaults to 1 hour).
    """
    project = build.project
    version = build.version
    bucket = settings.S3_BUILD_TOOLS_STORAGE_BUCKET
    bucket_arn = f"arn:aws:s3:::{bucket}"

    # Inline policy to limit the permissions of the temporary credentials.
    # The build-tools bucket is publicly readable, so we don't need to limit the permissions to a specific path.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:ListBucket",
                ],
                "Resource": [
                    bucket_arn,
                    f"{bucket_arn}/*",
                ],
            },
        ],
    }
    session_name = f"rtd-{build.id}-{project.slug}-{version.slug}"
    credentials = _get_scoped_credentials(
        session_name=session_name,
        policy=policy,
        duration=duration,
    )
    return AWSS3TemporaryCredentials(
        access_key_id=credentials.access_key_id,
        secret_access_key=credentials.secret_access_key,
        session_token=credentials.session_token,
        region_name=settings.AWS_S3_REGION_NAME,
        bucket_name=bucket,
    )
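
On the consuming side, the returned dataclass maps directly onto the keyword arguments that `boto3.client` accepts. A minimal sketch, assuming the caller already holds the credentials as a plain dict (for example from `asdict`); `s3_client_kwargs` is a hypothetical helper and the sample values are placeholders. boto3 itself is left out so the snippet runs without it.

```python
from typing import Optional


def s3_client_kwargs(
    *,
    access_key_id: str,
    secret_access_key: str,
    session_token: Optional[str],
    region_name: str,
    **_ignored: object,
) -> dict[str, str]:
    """Translate credential fields into boto3.client keyword arguments."""
    kwargs = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
        "region_name": region_name,
    }
    # Only STS-issued credentials carry a session token.
    if session_token is not None:
        kwargs["aws_session_token"] = session_token
    return kwargs


# Placeholder values, not real credentials.
credentials = {
    "access_key_id": "EXAMPLE_KEY_ID",
    "secret_access_key": "EXAMPLE_SECRET",
    "session_token": "EXAMPLE_TOKEN",
    "bucket_name": "readthedocs-media",
    "region_name": "us-east-1",
}
kwargs = s3_client_kwargs(**credentials)
# With boto3 installed: client = boto3.client("s3", **kwargs)
```

The bucket name is not a client argument, so it is swallowed by `**_ignored` here and would be passed to individual S3 calls instead.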
Empty file.