
upload_data() does not create bucket when bucket is specified #371

Closed
harusametime opened this issue Aug 30, 2018 · 3 comments

Comments

@harusametime
Contributor

harusametime commented Aug 30, 2018

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): Any
  • Framework Version: N/A
  • Python Version: Any
  • CPU or GPU: Any
  • Python SDK Version: 1.9.2
  • Are you using a custom image: No

Describe the problem

According to the documentation on Read the Docs, upload_data() creates the bucket if it does not exist. However, the bucket is not created when it is specified explicitly, as in upload_data(path, bucket=not_exist_bucket, key_prefix='data').

As implemented in session.py:

  • default_bucket() creates its bucket if needed, and upload_data() calls it only when no bucket is passed
  • when a bucket is specified, upload_data() uses it as-is and does not create it
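To make the two bullets concrete, here is a minimal, self-contained model of the bucket handling as this issue describes it. It assumes upload_data() falls back to default_bucket() only when bucket is None. FakeS3, FakeSession and NoSuchBucket are hypothetical stand-ins, not SageMaker SDK classes; the real method walks a local directory and uploads through boto3 instead of taking a dict of files.

```python
# Hypothetical, simplified model of the bucket handling in upload_data()
# as described in this issue. Not the SageMaker SDK source.


class NoSuchBucket(Exception):
    """Stands in for the S3 NoSuchBucket error seen in the traceback."""


class FakeS3:
    def __init__(self):
        self.buckets = set()
        self.objects = {}

    def create_bucket(self, name):
        self.buckets.add(name)

    def put_object(self, bucket, key, body):
        # Like S3, refuse to write into a bucket that does not exist
        if bucket not in self.buckets:
            raise NoSuchBucket("The specified bucket does not exist: " + bucket)
        self.objects[(bucket, key)] = body


class FakeSession:
    def __init__(self, s3, default_name):
        self.s3 = s3
        self.default_name = default_name

    def default_bucket(self):
        # default_bucket() creates its bucket on demand
        if self.default_name not in self.s3.buckets:
            self.s3.create_bucket(self.default_name)
        return self.default_name

    def upload_data(self, files, bucket=None, key_prefix="data"):
        # The default bucket is only used (and created) when no bucket is given;
        # an explicitly specified bucket is used as-is, never created.
        bucket = bucket or self.default_bucket()
        for name, body in files.items():
            self.s3.put_object(bucket, "{}/{}".format(key_prefix, name), body)
        return "s3://{}/{}".format(bucket, key_prefix)
```

In this model, calling upload_data() without a bucket succeeds because the default bucket is created on demand, while passing a nonexistent bucket raises NoSuchBucket, mirroring the traceback in the repro.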

This behavior is inconsistent with the documentation and confusing. Could you consider fixing either the code or the documentation?

Minimal repro / logs

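import sagemaker

sagemaker_session = sagemaker.Session()

# Derive a bucket name that does not exist yet
bucket = sagemaker_session.default_bucket() + '-01'
prefix = 'sagemaker/DEMO-pytorch-mnist-00'

print('Bucket: {}'.format(bucket))

# Fails with NoSuchBucket: upload_data() does not create a user-specified bucket
inputs = sagemaker_session.upload_data(path='data', bucket=bucket, key_prefix=prefix)
print('input spec (in this case, just an S3 path): {}'.format(inputs))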

output:

Bucket: sagemaker-us-east-1-(account_ID)-01

error:

S3UploadFailedError                       Traceback (most recent call last)
<ipython-input-45-be77d611c2a1> in <module>()
      4 print('Bucket: {}'.format(bucket))
      5 
----> 6 inputs = sagemaker_session.upload_data(path='data', bucket=bucket, key_prefix=prefix)
      7 print('input spec (in this case, just an S3 path): {}'.format(inputs))

/home/ec2-user/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/sagemaker/session.pyc in upload_data(self, path, bucket, key_prefix)
    150 
    151         for local_path, s3_key in files:
--> 152             s3.Object(bucket, s3_key).upload_file(local_path)
    153 
    154         s3_uri = 's3://{}/{}'.format(bucket, key_prefix)

/home/ec2-user/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/boto3/s3/inject.pyc in object_upload_file(self, Filename, ExtraArgs, Callback, Config)
    278     return self.meta.client.upload_file(
    279         Filename=Filename, Bucket=self.bucket_name, Key=self.key,
--> 280         ExtraArgs=ExtraArgs, Callback=Callback, Config=Config)
    281 
    282 

/home/ec2-user/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/boto3/s3/inject.pyc in upload_file(self, Filename, Bucket, Key, ExtraArgs, Callback, Config)
    129         return transfer.upload_file(
    130             filename=Filename, bucket=Bucket, key=Key,
--> 131             extra_args=ExtraArgs, callback=Callback)
    132 
    133 

/home/ec2-user/anaconda3/envs/pytorch_p27/lib/python2.7/site-packages/boto3/s3/transfer.pyc in upload_file(self, filename, bucket, key, callback, extra_args)
    285             raise S3UploadFailedError(
    286                 "Failed to upload %s to %s: %s" % (
--> 287                     filename, '/'.join([bucket, key]), e))
    288 
    289     def download_file(self, bucket, key, filename, extra_args=None,

S3UploadFailedError: Failed to upload data/raw/t10k-labels-idx1-ubyte to sagemaker-us-east-1-(account_ID)-01/sagemaker/DEMO-pytorch-mnist-00/raw/t10k-labels-idx1-ubyte: An error occurred (NoSuchBucket) when calling the PutObject operation: The specified bucket does not exist

@harusametime harusametime changed the title "upload_data() does not create bucket with given bucket" to "upload_data() does not create bucket when bucket is specified" Aug 30, 2018
@nadiaya
Contributor

nadiaya commented Aug 30, 2018

The reason for not creating a user-specified bucket when it doesn't exist was to avoid creating the wrong bucket if the user made a typo in the bucket name and actually meant to use a different, existing one.
I'm not sure that reasoning is strong enough anymore, though; maybe we should change the behavior.

Good point that the documentation on Read the Docs should reflect the actual behavior regardless.
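One way to keep the typo protection while still allowing bucket creation is an explicit opt-in. This is a purely hypothetical sketch: resolve_bucket and create_if_missing are illustrative names, not part of the SageMaker SDK, and an in-memory set stands in for the buckets that exist in S3.

```python
# Hypothetical opt-in design: create a missing user-specified bucket only when
# the caller explicitly asks for it. Not SageMaker SDK API.


def resolve_bucket(existing, bucket, default_bucket, create_if_missing=False):
    """Return the bucket to upload into, creating it only when allowed.

    existing: mutable set of bucket names that already exist.
    """
    if bucket is None:
        # No bucket given: use the default bucket and create it if needed
        existing.add(default_bucket)
        return default_bucket
    if bucket in existing:
        return bucket
    if create_if_missing:
        # The caller opted in, so a missing bucket is created
        existing.add(bucket)
        return bucket
    # Refuse by default so that a typo does not silently create a new bucket
    raise ValueError(
        "Bucket {!r} does not exist; pass create_if_missing=True "
        "to create it".format(bucket)
    )
```

Defaulting create_if_missing to False preserves the current safety behavior, while callers who really want a new bucket can request it in one explicit argument.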

@nadiaya
Contributor

nadiaya commented Aug 31, 2018

The documentation update has been merged.

@nadiaya nadiaya closed this as completed Aug 31, 2018
apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018
Added: MXNet Gluon CIFAR-10 Automatic Model Tuning vs random search
@kmair

kmair commented Aug 1, 2020

Thanks @nadiaya. In general, using the default bucket name works.

bucket_name = sess.default_bucket()

Using this default bucket name instead of a custom one avoids the error.
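If a separate bucket is really needed, another option is to create it yourself before calling upload_data(). This is a sketch, assuming a boto3-style S3 client that provides list_buckets() and create_bucket(); ensure_bucket is a hypothetical helper, not an SDK function.

```python
# Hypothetical helper: make sure a custom bucket exists before upload_data().
# `s3_client` is expected to behave like a boto3 S3 client.


def ensure_bucket(s3_client, bucket, region):
    """Create `bucket` in `region` unless the account already owns it.

    Returns True if the bucket was created, False if it already existed.
    """
    # list_buckets() only returns buckets owned by the calling account
    owned = {b["Name"] for b in s3_client.list_buckets()["Buckets"]}
    if bucket in owned:
        return False
    if region == "us-east-1":
        # Buckets in us-east-1 are created without a LocationConstraint
        s3_client.create_bucket(Bucket=bucket)
    else:
        s3_client.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    return True
```

With real credentials this could be called as ensure_bucket(boto3.client('s3', region_name=region), bucket, region) before sagemaker_session.upload_data(path='data', bucket=bucket, key_prefix=prefix). Note that this deliberately gives up the typo protection discussed earlier in the thread: a misspelled name produces a new, empty bucket.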
