Default model_dir uses backslashes on Windows #1759

Closed
RobinKa opened this issue Jul 28, 2020 · 2 comments

RobinKa commented Jul 28, 2020

Describe the bug
The default model_dir contains backslashes when the following code is run on Windows to start TensorFlow training.

from sagemaker.tensorflow import TensorFlow

role = "<role>"

tf_estimator = TensorFlow(
    source_dir="src", script_mode=True,
    framework_version="1.15.2", py_version="py37",
    entry_point="main.py", role=role,
    train_instance_count=1, train_instance_type="ml.g4dn.xlarge",
    train_volume_size=50,
    output_path="<bucketpath>/exported_model"
)

tf_estimator.fit({
    "train": "<bucketpath>/<train dir>",
    "test": "<bucketpath>/<test dir>",
})

Uploading anything to the model_dir path (e.g. when TensorFlow tries to save checkpoints) then fails, because S3 requires forward slashes. Example args that are passed to the training script when executing the above code:

Args: Namespace(alpha=0.25, augment=True, batch_size=3, classes=2, crop_size=512, current_host='algo-1', epochs=3, fl_weight=0.1, gamma=2.0, hosts=['algo-1'], init_lr=0.004, layer_depth=21, mode='train', model_dir='<bucketpath>/exported_model\\tensorflow-training-2020-07-28-14-00-00-884\\model', momentum=0.9, num_gpus=1, output_dir='output', power=0.9, regularization_scale=0.0001, sm_model_dir='/opt/ml/model', stddev=0.02, test_dir='/opt/ml/input/data/test', train_dir='/opt/ml/input/data/train')

As a workaround, I added args.model_dir = args.model_dir.replace("\\", "/") to my training code.
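For anyone hitting the same problem, here is a minimal sketch of how that workaround might sit in a training script. It assumes the script reads model_dir through argparse, as the Namespace output above suggests; the helper names normalize_s3_path and parse_args are hypothetical and not part of the SageMaker SDK.

```python
import argparse


def normalize_s3_path(path: str) -> str:
    """Replace Windows-style backslashes with forward slashes (hypothetical helper)."""
    return path.replace("\\", "/")


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_dir", type=str, default="")
    # parse_known_args tolerates the other hyperparameters, which this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    # Workaround from this issue: undo the backslashes introduced on the Windows client.
    args.model_dir = normalize_s3_path(args.model_dir)
    return args
```

Doing the replacement once, right after parsing, means the rest of the training code only ever sees the normalized path.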

Expected behavior
For S3 paths, model_dir should use forward slashes regardless of which OS the training job is launched from.
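To illustrate the difference, the sketch below joins the same path segments with Windows rules (ntpath, which is what os.path resolves to on Windows) and with POSIX rules (posixpath). That the SDK builds model_dir with os.path.join is only a guess at the cause, not something confirmed in this thread; join_s3_path, the bucket name, and the job name are hypothetical placeholders.

```python
import ntpath      # the implementation behind os.path on Windows
import posixpath   # always joins with "/", whatever the host OS


def join_s3_path(*parts: str) -> str:
    """Join S3 path segments with forward slashes on every OS (hypothetical helper)."""
    return posixpath.join(*parts)


base = "s3://bucket/exported_model"   # placeholder for <bucketpath>/exported_model
job = "tensorflow-training-job"       # placeholder job name

# Windows path rules insert backslashes between the segments.
windows_style = ntpath.join(base, job, "model")

# POSIX rules keep the path valid for S3.
portable = join_s3_path(base, job, "model")

print(windows_style)
print(portable)
```

Because ntpath and posixpath are importable on any platform, this comparison can be run without a Windows machine.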

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 1.71.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): TensorFlow
  • Framework version: 1.15.2
  • Python version: 3.7.? on server, 3.7.3 locally
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

laurenyu (Contributor) commented

Thanks for the bug report! I've opened a PR to address this: #1763

laurenyu commented Aug 5, 2020

I have merged #1763 - sorry for the delay!

laurenyu closed this as completed Aug 5, 2020