error using s3 location for TensorFlow entry_point #471

Closed
wqp89324 opened this issue Nov 12, 2018 · 4 comments

@wqp89324

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow
  • Framework Version: 1.11.0
  • Python Version: 3.6.5
  • CPU or GPU: CPU
  • Python SDK Version:
  • Are you using a custom image: SageMaker tensorflow_p36

Describe the problem

I'm working with the tensorflow_abalone_age_predictor_using_layers example, but when I specify the TensorFlow estimator's entry_point as an S3 location, I get the error below.
Here is link to the s3 script: https://s3.us-east-2.amazonaws.com/sagemaker-us-east-2-XXX/tensorflow_abalone_age_predictor_using_layers/abalone.py

Minimal repro / logs

FileNotFoundError Traceback (most recent call last)
in ()
10 train_instance_type='ml.c4.xlarge')
11
---> 12 abalone_estimator.fit(inputs)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/tensorflow/estimator.py in fit(self, inputs, wait, logs, job_name, run_tensorboard_locally)
259 tensorboard.join()
260 else:
--> 261 fit_super()
262
263 @classmethod

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/tensorflow/estimator.py in fit_super()
241
242 def fit_super():
--> 243 super(TensorFlow, self).fit(inputs, wait, logs, job_name)
244
245 if run_tensorboard_locally and wait is False:

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
205 based on the training image name and current timestamp.
206 """
--> 207 self._prepare_for_training(job_name=job_name)
208
209 self.latest_training_job = _TrainingJob.start_new(self, inputs)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in _prepare_for_training(self, job_name)
714 script = self.entry_point
715 else:
--> 716 self.uploaded_code = self._stage_user_code_in_s3()
717 code_dir = self.uploaded_code.s3_prefix
718 script = self.uploaded_code.script_name

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in _stage_user_code_in_s3(self)
743 s3_key_prefix=code_s3_prefix,
744 script=self.entry_point,
--> 745 directory=self.source_dir)
746
747 def _model_source_dir(self):

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/fw_utils.py in tar_and_upload_dir(session, bucket, s3_key_prefix, script, directory)
136 key = '{}/{}'.format(s3_key_prefix, 'sourcedir.tar.gz')
137
--> 138 tar_file = sagemaker.utils.create_tar_file(source_files)
139 s3.Object(bucket, key).upload_file(tar_file)
140 os.remove(tar_file)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/utils.py in create_tar_file(source_files, target)
263 for sf in source_files:
264 # Add all files from the directory into the root of the directory structure of the tar
--> 265 t.add(sf, arcname=os.path.basename(sf))
266 return filename
267

~/anaconda3/envs/tensorflow_p36/lib/python3.6/tarfile.py in add(self, name, arcname, recursive, exclude, filter)
1932
1933 # Create a TarInfo object from the file.
-> 1934 tarinfo = self.gettarinfo(name, arcname)
1935
1936 if tarinfo is None:

~/anaconda3/envs/tensorflow_p36/lib/python3.6/tarfile.py in gettarinfo(self, name, arcname, fileobj)
1801 if fileobj is None:
1802 if hasattr(os, "lstat") and not self.dereference:
-> 1803 statres = os.lstat(name)
1804 else:
1805 statres = os.stat(name)

FileNotFoundError: [Errno 2] No such file or directory: 's3://sagemaker-us-east-2-XXX/tensorflow_abalone_age_predictor_using_layers/abalone.py'

  • Exact command to reproduce:
script_loc = 's3://sagemaker-us-east-2-XXX/tensorflow_abalone_age_predictor_using_layers/abalone.py'
abalone_estimator = TensorFlow(entry_point=script_loc,
                               role=role,
                               framework_version='1.11.0',
                               training_steps=100,
                               evaluation_steps=100,
                               hyperparameters={'learning_rate': 0.001},
                               train_instance_count=1,
                               train_instance_type='ml.c4.xlarge')
abalone_estimator.fit(inputs)
@iquintero
Contributor

Hi @wqp89324

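entry_point (your script_loc) needs to be a local file. As the traceback shows, the SDK tries to tar the script from the local filesystem before uploading it to S3, so an s3:// path is treated as a local path that does not exist. This is why you are getting the FileNotFoundError.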
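The failure can be reproduced without SageMaker at all. The following is a minimal sketch, assuming only the Python standard library; try_to_tar is a hypothetical helper that imitates the t.add(...) call seen in the traceback, and the bucket name is made up.

```python
import os
import tarfile
import tempfile


def try_to_tar(path):
    """Try to add `path` to a throwaway tar archive (hypothetical helper).

    This imitates the t.add(sf, arcname=os.path.basename(sf)) call from the
    traceback; it is not the SDK's own code.
    """
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "sourcedir.tar.gz")
        with tarfile.open(archive, mode="w:gz") as t:
            try:
                # tarfile stats the path on the local filesystem, so an
                # s3:// string is just a file name that does not exist.
                t.add(path, arcname=os.path.basename(path))
            except FileNotFoundError as err:
                return "FileNotFoundError: {}".format(err)
    return "ok"


# Made-up S3 location, used only for illustration.
result = try_to_tar("s3://example-bucket/abalone.py")
print(result)
```

Passing the path of a file that exists locally makes the same helper return "ok".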

script_loc = '/path/to/abalone.py' should work.
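If the script only exists in S3, one possible workaround is to download it first and pass the local copy as entry_point. This is a rough sketch rather than anything the SDK provides: split_s3_uri and fetch_entry_point are hypothetical helper names, and the download step assumes boto3 is installed and AWS credentials are configured.

```python
import os
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError("not an S3 object URI: {!r}".format(uri))
    return parsed.netloc, parsed.path.lstrip("/")


def fetch_entry_point(uri, dest_dir="."):
    """Download the script at `uri` and return its local path. Hypothetical helper."""
    # Imported here so split_s3_uri stays usable without boto3 installed.
    import boto3

    bucket, key = split_s3_uri(uri)
    local_path = os.path.join(dest_dir, os.path.basename(key))
    boto3.client("s3").download_file(bucket, key, local_path)
    return local_path


# Usage (not executed here, since it needs AWS access):
# script_loc = fetch_entry_point("s3://<your-bucket>/<prefix>/abalone.py")
# abalone_estimator = TensorFlow(entry_point=script_loc, ...)
```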

Feel free to re-open if you have any further questions.

@BigLep

BigLep commented Nov 17, 2018

@iquintero: can we please give a self-documenting error message so customers aren't left wondering why we failed (e.g., "entry_point" needs to be a local file)?
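For illustration, a check of the kind requested here might look like the sketch below. It is an assumption about what such validation could do, not the SDK's actual code; validate_entry_point and the message wording are hypothetical.

```python
import os


def validate_entry_point(entry_point):
    """Fail early with a readable message. Hypothetical helper, not SDK code."""
    if entry_point.lower().startswith("s3://"):
        raise ValueError(
            "entry_point must be a local file, but an S3 location was given: "
            "{!r}".format(entry_point)
        )
    if not os.path.isfile(entry_point):
        raise ValueError(
            "entry_point must be a local file, but no file was found at: "
            "{!r}".format(entry_point)
        )
    return entry_point


try:
    validate_entry_point("s3://example-bucket/abalone.py")
except ValueError as err:
    message = str(err)
    print(message)
```

Running the check before any tar or upload step would surface the problem at the call site instead of deep inside tarfile.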

@laurenyu
Contributor

@BigLep I've submitted #500 to provide a better error message in this case

@jesterhazy
Contributor

#500 has been merged and will be in our next patch release.
