
s3:ListAllMyBuckets or s3:CreateBucket permission required for Session with default bucket #2910


Closed
janbe-ts opened this issue Feb 8, 2022 · 5 comments

Comments

@janbe-ts

janbe-ts commented Feb 8, 2022

Describe the bug
When I create a Session like this: Session(default_bucket=my_bucket), the library conveniently creates the bucket if it doesn't already exist. The existence check is based on reading the bucket's creation date, an operation that requires the IAM permission s3:ListAllMyBuckets. This permission shouldn't be necessary and is often not granted in environments that follow the least-privilege principle.

To reproduce
Create a Session with a default_bucket that already exists. Creating any SageMaker processing job will then fail if the IAM role in use has neither the s3:ListAllMyBuckets nor the s3:CreateBucket permission.

Expected behavior
I should be able to create a SageMaker processing job without having to attach s3:ListAllMyBuckets or s3:CreateBucket to my IAM role.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.75.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PySparkProcessor
  • Framework version: 2.4
  • Python version: 3.9.4
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): Y

Additional context
This code in the function _create_s3_bucket_if_it_does_not_exist in session.py creates the bucket if it doesn't already exist:

if bucket.creation_date is None:
    try:
        if region == "us-east-1":
            # 'us-east-1' cannot be specified because it is the default region:
            # https://github.com/boto/boto3/issues/125
            s3.create_bucket(Bucket=bucket_name)
        else:
            s3.create_bucket(
                Bucket=bucket_name,
                CreateBucketConfiguration={"LocationConstraint": region},
            )

        LOGGER.info("Created S3 bucket: %s", bucket_name)
    except ClientError as e:
        error_code = e.response["Error"]["Code"]
        message = e.response["Error"]["Message"]

        if error_code == "BucketAlreadyOwnedByYou":
            pass
        elif (
            error_code == "OperationAborted"
            and "conflicting conditional operation" in message
        ):
            # If this bucket is already being concurrently created, we don't need to create
            # it again.
            pass
        else:
            raise

bucket.creation_date is (surprisingly) resolved via an API call equivalent to aws s3 list-buckets, which requires the s3:ListAllMyBuckets permission. Weirdly, if the caller lacks this permission, the call doesn't fail but simply returns None. In that case we enter the if branch even though the bucket may already exist.

s3.create_bucket obviously requires the s3:CreateBucket permission. If the caller has this permission and the bucket already exists, the API raises a ClientError with error_code BucketAlreadyOwnedByYou, which the code above deliberately swallows.
However, if the s3:CreateBucket permission isn't granted, this code fails with an access-denied ClientError.

Therefore, the caller needs either s3:ListAllMyBuckets or s3:CreateBucket for this code to work. Neither should be required for this operation.

The head_bucket operation would be a better fit than bucket.creation_date here: it is faster and requires neither of the permissions mentioned above (only s3:ListBucket on the bucket itself). It would also allow the if error_code == "BucketAlreadyOwnedByYou": branch to be removed.

If you agree, I'm happy to help implement a change. Please let me know! :)
Cheers

@janbe-ts
Author

janbe-ts commented Feb 8, 2022

For anyone running into the same issue: a simple but hacky workaround is to override the session's _default_bucket attribute instead of passing default_bucket to the constructor:

    sagemaker_session = Session()
    sagemaker_session._default_bucket = 'my_bucket'

@janbe-ts janbe-ts changed the title s3:ListAllMyBuckets permission required for Session with default bucket s3:ListAllMyBuckets or s3:CreateBucket permission required for Session with default bucket Feb 8, 2022
@iPhra

iPhra commented Mar 14, 2022

Facing the exact same issue when overriding the default_bucket on a Session object. This is completely counterintuitive and against least-privilege principles.

@sperka

sperka commented May 29, 2022

this issue is extremely underrated. shows how many follow least-privilege principles vs using AmazonS3FullAccess

@mufaddal-rohawala
Member

Thanks for all the info. The SageMaker team is looking into this issue and will try to resolve it soon.

@mufaddal-rohawala
Member

This should be fixed now. Please upgrade to sagemaker-python-sdk >= v2.96.0.
