
Ability to specify default S3 Bucket #1258


Closed
jerrygb opened this issue Jan 23, 2020 · 3 comments
Labels
status: pending release (The fix has been merged but not yet released to PyPI), type: feature request

Comments


jerrygb commented Jan 23, 2020

Describe the bug
The SageMaker session allows us to set up a default bucket, which is great. This was part of PR #1168, but it was removed in #1175 and later added back in #1176.

The part that is missing is the ability to override the bucket with an existing one. @laurenyu alluded to this point in her review comments.

A question/suggestion: would it be best for self.default_bucket() to check whether the bucket exists, or whether we have the appropriate permissions, before trying to create a new bucket? This would help in restrictive AWS environments that explicitly prevent bucket creation.

Please let me know if my understanding is incorrect or if there is a better way to go about this.

I would have expected the function to use s3.meta.client.head_bucket to check whether the bucket exists before creating it. This is important, since functions like self.upload_data() just call self.default_bucket() to enforce a declarative state (create the bucket if it is not there) and then return the bucket name.

To reproduce

sagemaker_session = sagemaker.session.Session(default_bucket='my-bucket')
sagemaker_session.default_bucket() 

Expected behavior
I would have expected the above call to check for an existing bucket and only create it if the bucket cannot be found. Admittedly, because of the way head_bucket() reports errors, it could still attempt to create the bucket when the calling role lacks the necessary permissions. But I think that is still better than issuing the create_bucket() API call every time, given the limited IAM permissions that data scientists may have.
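To make the suggestion concrete, here is a minimal, dependency-free sketch of the check-before-create behavior described above. It is not the SDK's actual implementation; HeadBucketError and the `s3` client stand in for botocore's ClientError and a real boto3 S3 client, and the "404 means create, anything else means surface the error" policy is my assumption about the desired behavior.

```python
class HeadBucketError(Exception):
    """Stand-in for botocore.exceptions.ClientError raised by head_bucket."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code  # "404" = bucket missing, "403" = exists but forbidden

def ensure_default_bucket(s3, name):
    """Return `name`, creating the bucket only when head_bucket reports 404."""
    try:
        s3.head_bucket(name)          # cheap existence/permission probe
    except HeadBucketError as err:
        if err.code == "404":
            s3.create_bucket(name)    # bucket genuinely absent: create it
        else:
            raise                     # e.g. 403: don't mask permission issues
    return name
```

With this shape, a restrictive environment that forbids bucket creation would see an explicit permission error instead of a blind create_bucket() call on every session.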

Screenshots or logs
NA

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: master or 1.50.7
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): NA
  • Framework version: NA
  • Python version: NA
  • CPU or GPU: NA
  • Custom Docker image (Y/N): NA

Additional context
Add any other context about the problem here.

Contributor

knakad commented Jan 29, 2020

Thanks for reaching out!

I had noticed this a couple days before you opened the issue, and was waiting for bandwidth to resolve it.
This'll be fixed in the next release, scheduled to go out next week (Tue/Wed).

Please reach out if you have any other issues =)

@knakad added the status: pending release (The fix has been merged but not yet released to PyPI) and type: feature request labels and removed the type: bug label Jan 29, 2020
Contributor

knakad commented Jan 30, 2020

I released the changes early: https://pypi.org/project/sagemaker/ =)
https://pypi.org/project/sagemaker/1.50.8/

@knakad knakad closed this as completed Jan 30, 2020
Author

jerrygb commented Feb 13, 2020

Thank you @knakad
