
Ability to specify default S3 Bucket #1258


Closed
jerrygb opened this issue Jan 23, 2020 · 3 comments
Labels
status: pending release (The fix has been merged but not yet released to PyPI), type: feature request

Comments


jerrygb commented Jan 23, 2020

Describe the bug
The SageMaker session allows us to set up a default bucket, which is great. This was part of PR #1168, but it was removed in #1175 and later added back in #1176.

The part that is missing is the ability to override the bucket with an existing one. @laurenyu alluded to this point in her review comments.

A question/suggestion: would it be best for self.default_bucket() to check whether the bucket exists, or whether we have the appropriate permissions, before trying to create a new bucket? This would help in restrictive AWS environments that explicitly prevent bucket creation.

Please let me know if my understanding is incorrect or if there is a better way to go about this.

I would have expected the function to use s3.meta.client.head_bucket to check whether the bucket exists before creating it. This is important, since functions like self.upload_data() just call self.default_bucket() to enforce a declarative state (create the bucket if it is not there) and then return the bucket name.

To reproduce

sagemaker_session = sagemaker.session.Session(default_bucket='my-bucket')
sagemaker_session.default_bucket() 

Expected behavior
I would have expected the above call to check for an existing bucket and only create it if the bucket cannot be found. Admittedly, because of the way head_bucket() reports errors, it could still attempt to create the bucket when the calling role lacks the necessary permissions. But I think that is still better than issuing the create_bucket() API call every time, given the limited IAM permissions that data scientists may have.
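To make the suggestion concrete, here is a minimal, dependency-free sketch of the check-before-create behavior described above. It is not the SDK's actual implementation; HeadBucketError and the `s3` client stand in for botocore's ClientError and a real boto3 S3 client, and the "404 means create, anything else means surface the error" policy is my assumption about the desired behavior.

```python
class HeadBucketError(Exception):
    """Stand-in for botocore.exceptions.ClientError raised by head_bucket."""
    def __init__(self, code):
        super().__init__(code)
        self.code = code  # "404" = bucket missing, "403" = exists but forbidden

def ensure_default_bucket(s3, name):
    """Return `name`, creating the bucket only when head_bucket reports 404."""
    try:
        s3.head_bucket(name)          # cheap existence/permission probe
    except HeadBucketError as err:
        if err.code == "404":
            s3.create_bucket(name)    # bucket genuinely absent: create it
        else:
            raise                     # e.g. 403: don't mask permission issues
    return name
```

With this shape, a restrictive environment that forbids bucket creation would see an explicit permission error instead of a blind create_bucket() call on every session.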

Screenshots or logs
NA

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: master or 1.50.7
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): NA
  • Framework version: NA
  • Python version: NA
  • CPU or GPU: NA
  • Custom Docker image (Y/N): NA

Additional context
Add any other context about the problem here.

Contributor

knakad commented Jan 29, 2020

Thanks for reaching out!

I had noticed this a couple days before you opened the issue, and was waiting for bandwidth to resolve it.
This'll be fixed in the next release, scheduled to go out next week (Tue/Wed).

Please reach out if you have any other issues =)

@knakad added the status: pending release (The fix has been merged but not yet released to PyPI) and type: feature request labels and removed the type: bug label Jan 29, 2020
Contributor

knakad commented Jan 30, 2020

I released the changes early: https://pypi.org/project/sagemaker/ =)
https://pypi.org/project/sagemaker/1.50.8/

@knakad knakad closed this as completed Jan 30, 2020
Author

jerrygb commented Feb 13, 2020

Thank you @knakad
