Local deployment is not working on Windows 10 #1297

Open
ViktorStepanukCN opened this issue Feb 19, 2020 · 5 comments
ViktorStepanukCN commented Feb 19, 2020

Describe the bug
Trained model artifacts are not downloaded from S3 during deploy on Windows 10.

To reproduce
Demonstrated using the example notebook at:
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_script_mode_training_and_serving/tensorflow_script_mode_training_and_serving.ipynb

import sagemaker
from sagemaker import get_execution_role
from sagemaker.tensorflow import TensorFlow

sagemaker_session = sagemaker.Session()
role = get_execution_role()
region = sagemaker_session.boto_session.region_name

training_data_uri = 's3://sagemaker-sample-data-{}/tensorflow/mnist'.format(region)

mnist_estimator2 = TensorFlow(entry_point='mnist2.py',
                             role=role,
                             train_instance_count=1,
                             train_instance_type='local',
                             framework_version='2.0.0',
                             py_version='py3')

mnist_estimator2.fit(training_data_uri)

predictor2 = mnist_estimator2.deploy(initial_instance_count=1, instance_type='local')

Expected behavior
Model artifacts should be downloaded from S3 and made accessible to the serving container.

Screenshots or logs
(Screenshot attached to the original issue; not reproduced here.)

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: sagemaker==1.50.10.post0
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): TensorFlow
  • Framework version: 2.0
  • Python version: 3.7.6
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context
The problem comes from how the S3ModelArtifacts path is built in
\sagemaker\local\image.py
In the method
def retrieve_artifacts(self, compose_data, output_data_config, job_name)
the artifact path is returned using a plain
return os.path.join(output_data, "model.tar.gz")
When this is called on Windows, it produces something like:
../tensorflow-training-2020-02-19-15-57-14-207\model.tar.gz
When SageMaker later tries to download the artifacts from S3 in
\sagemaker\utils.py
using the method
def download_folder(bucket_name, prefix, target, sagemaker_session):
it fails to retrieve any files when calling
bucket.objects.filter(Prefix=prefix)
because of the \ in front of model.tar.gz. S3 keys are separated by /, so a prefix containing \ matches no objects.
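
The mismatch described above can be reproduced on any operating system with a minimal sketch. It uses ntpath, the module behind os.path on Windows, and a plain string-prefix match standing in for the S3 prefix filter. The job prefix, the stored_keys list, and the keys_with_prefix helper are hypothetical and only illustrate the behaviour; they are not SDK code.

```python
import ntpath
import posixpath

# Hypothetical job prefix and bucket contents, for illustration only.
job_prefix = "tensorflow-training-2020-02-19-15-57-14-207"
stored_keys = [job_prefix + "/model.tar.gz"]


def keys_with_prefix(keys, prefix):
    """Stand-in for S3 prefix filtering: a plain string-prefix match on keys."""
    return [key for key in keys if key.startswith(prefix)]


# ntpath is what os.path resolves to on Windows, so it joins with a backslash.
windows_prefix = ntpath.join(job_prefix, "model.tar.gz")
# posixpath always joins with "/", which is what S3 keys use.
posix_prefix = posixpath.join(job_prefix, "model.tar.gz")

print(repr(windows_prefix))
print(keys_with_prefix(stored_keys, windows_prefix))
print(keys_with_prefix(stored_keys, posix_prefix))
```

The backslash-joined prefix matches nothing, while the forward-slash prefix finds the archive.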

ViktorStepanukCN changed the title from "**Describe the bug**" to "**Local deployment is now working on Windows 10**" Feb 19, 2020
ViktorStepanukCN changed the title from "**Local deployment is now working on Windows 10**" to "Local deployment is now working on Windows 10" Feb 19, 2020
ViktorStepanukCN changed the title from "Local deployment is now working on Windows 10" to "Local deployment is not working on Windows 10" Feb 24, 2020

ajaykarpur (Contributor) commented Feb 25, 2020

Thank you for submitting a detailed bug report. It appears this issue was fixed in #1302, which was released in v1.50.14.

Please try updating your version of the SageMaker Python SDK.

ViktorStepanukCN (Author) commented

Hi, thank you for the response. I tried version 1.50.16.dev0 and the problem remains. It looks like the mentioned fix addressed a similar problem, but in a different place.

The method that retrieves model artifacts (retrieve_artifacts in sagemaker-python-sdk-master/src/sagemaker/local/image.py) still uses return os.path.join(output_data, "model.tar.gz")
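
One possible direction for a fix, sketched under assumptions: build the artifact location with an explicit "/" rather than the host OS separator. The s3_join helper and the example bucket below are hypothetical and not part of the SageMaker SDK; this is only an illustration of the idea, not the change the maintainers would necessarily make.

```python
def s3_join(base, *parts):
    """Join S3 URI or key segments with '/', regardless of the host OS.

    Hypothetical helper: backslashes are normalized to forward slashes and
    duplicate separators at the segment boundaries are trimmed.
    """
    segments = [base.replace("\\", "/").rstrip("/")]
    for part in parts:
        segments.append(part.replace("\\", "/").strip("/"))
    return "/".join(segments)


# Hypothetical output location, for illustration only.
output_data = "s3://example-bucket/tensorflow-training-2020-02-19-15-57-14-207"
print(s3_join(output_data, "model.tar.gz"))
```

Because the helper never consults os.path, it gives the same result on Windows and on POSIX systems.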

nadiaya (Contributor) commented Feb 26, 2020

Windows support for Local Mode has been experimental and, unfortunately, has never been fully supported or tested.

Marking with a feature request label.

nadiaya (Contributor) commented Feb 26, 2020

Could you please provide more details about your use case for using local mode?
That would help us a lot when prioritizing the roadmap.

Thank you!

LukeHankey commented

> Could you please provide more details about your use case for using local mode? That would help us a lot when prioritizing the roadmap.
>
> Thank you!

Hi, I thought I'd bring some more attention to this.

My use case is local testing. Currently, the only way to test is by using real instances, storage, etc. The use cases noted in the blog here are also perfectly valid for Windows machines. Without local mode, even simple debugging takes extra time and adds expense.
