Skip to content

Fixing container path for source files when using local_session #499

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 14, 2018

Conversation

bearpelican
Copy link
Contributor

@bearpelican bearpelican commented Nov 19, 2018

Issue #, if available:
Fixes: No such file or directory error when running estimator fit/predict with a local sagemaker session and a local script.

When running sagemaker in local mode, sagemaker_submit_directory is set to the the local environment path (Ex: file:///home/ec2-user/mnist/src'), when it should be set to the container path - '/opt/ml/code'. The container path is mounted in src/local/image.py.
The container does not have access to the local environment path, which is why a No such file or directory error happens.

Minimal example that breaks:
https://gist.github.com/bearpelican/440d10f28003c3653c7866c129418ad7

This was taken directly from the sagemaker example notebook:
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/pytorch_mnist

Description of changes:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-io
Copy link

codecov-io commented Nov 19, 2018

Codecov Report

Merging #499 into master will increase coverage by 0.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #499      +/-   ##
==========================================
+ Coverage   92.77%   92.79%   +0.01%     
==========================================
  Files          71       71              
  Lines        5359     5371      +12     
==========================================
+ Hits         4972     4984      +12     
  Misses        387      387
Impacted Files Coverage Δ
src/sagemaker/local/image.py 88.53% <100%> (+0.4%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4bac185...81cdfd4. Read the comment docs.

@bearpelican
Copy link
Contributor Author

bearpelican commented Nov 19, 2018

Not sure if this is the best way to fix the bug, but I wanted to put this here as an issue and proposed fix. If anyone has any other ideas, please feel free to submit a new PR and close this one!

Copy link
Contributor

@laurenyu laurenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Would you mind adding some tests?

@bearpelican
Copy link
Contributor Author

Yup will do

@bearpelican bearpelican force-pushed the local_code branch 2 times, most recently from 4118b9d to 5f54934 Compare December 11, 2018 00:18
@bearpelican
Copy link
Contributor Author

bearpelican commented Dec 11, 2018

Updated commit - 5f54934 with extended unit tests

laurenyu
laurenyu previously approved these changes Dec 11, 2018
@laurenyu
Copy link
Contributor

Thanks for updating! There are a couple flake8 errors:

flake8 runtests: commands[0] | flake8
./src/sagemaker/local/image.py:111:1: W293 blank line contains whitespace
./src/sagemaker/local/image.py:338:1: W293 blank line contains whitespace

@bearpelican
Copy link
Contributor Author

Cool. Should be fixed now!

laurenyu
laurenyu previously approved these changes Dec 11, 2018
@@ -106,6 +106,8 @@ def train(self, input_data_config, output_data_config, hyperparameters, job_name
data_dir = self._create_tmp_folder()
volumes = self._prepare_training_volumes(data_dir, input_data_config, output_data_config,
hyperparameters)
# If local, source directory needs to be updated to mounted /opt/ml/code path
hyperparameters = self._update_local_src_path(hyperparameters, key=sagemaker.estimator.DIR_PARAM_NAME)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an integ test is failing because it expects this hyperparameter to be JSON (at least for the SageMaker deep learning framework images)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good catch! Updated hyperparameters with json conversion

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unable to run integration tests on my machine, so please let me know if there are any more errors on those tests

Copying env variable instead

Adding tests forl local code
@laurenyu
Copy link
Contributor

@bearpelican the integ tests look fine now, but could you add the changelog entry to a new 1.16.3 section? (1.16.2 got released yesterday)

@jesterhazy jesterhazy merged commit 9174d5b into aws:master Dec 14, 2018
metrizable pushed a commit to metrizable/sagemaker-python-sdk that referenced this pull request Dec 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants