Add support for additional libraries in the Estimator #498

Merged
merged 7 commits into from
Nov 19, 2018

mvsusp
Contributor

@mvsusp mvsusp commented Nov 18, 2018

Description of changes:
Introduces the following argument in the Estimator and Model classes.

  • **dependencies** (list[str]) A list of paths to directories (absolute or relative) with
    any additional libraries that will be exported to the container (default: []).
    The library folders will be copied to SageMaker in the same folder where the entrypoint is copied.
    If the source_dir points to S3, code will be uploaded and the S3 location will be used
    instead. Example:

          The following call
          >>> Estimator(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
          results in the following inside the container:
    
          >>> $ ls
    
          >>> opt/ml/code
          >>>     ├── train.py
          >>>     ├── common
          >>>     └── virtual-env
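
The layout described above can be reproduced locally with a minimal sketch. It is not the SDK's implementation: `stage_code` and `demo` are hypothetical helpers, and the sketch only assumes that each dependency directory is placed next to the entry point under its own base name, as the example listing suggests.

```python
import os
import shutil
import tempfile


def stage_code(entry_point, dependencies, code_dir):
    """Copy the entry point and each dependency directory into code_dir.

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    os.makedirs(code_dir, exist_ok=True)
    shutil.copy2(entry_point, code_dir)
    for dep in dependencies:
        # Only the last path component is kept: 'my/libs/common' -> 'common'.
        name = os.path.basename(os.path.normpath(dep))
        shutil.copytree(dep, os.path.join(code_dir, name))
    return sorted(os.listdir(code_dir))


def demo():
    """Build a throwaway project on disk, stage it, and list the result."""
    with tempfile.TemporaryDirectory() as root:
        entry_point = os.path.join(root, 'train.py')
        with open(entry_point, 'w') as f:
            f.write('print("training")\n')
        deps = [
            os.path.join(root, 'my', 'libs', 'common'),
            os.path.join(root, 'virtual-env'),
        ]
        for dep in deps:
            os.makedirs(dep)
        # Stand-in for the container's code directory.
        code_dir = os.path.join(root, 'opt', 'ml', 'code')
        return stage_code(entry_point, deps, code_dir)


if __name__ == '__main__':
    print(demo())
```

Note that the nested path `my/libs/common` ends up as a top-level `common` folder, so two dependencies with the same base name would collide in this sketch.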
    

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • [x] I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@mvsusp mvsusp requested review from nadiaya, owen-t and leopd November 18, 2018 23:31

def test_source_dirs(sagemaker_session, tmpdir):
    source_dir = os.path.join(DATA_DIR, 'pytorch_source_dirs')
    lib = os.path.join(str(tmpdir), 'alexa.py')
Contributor

Any reason not to put it in a separate directory?
I think it would be closer to a real-life use case.

Contributor Author

The source dir is under test/data and alexa.py is under /tmp. We thought about the same use case =)


predict_response = predictor.predict([24])

assert predict_response == [24]
Contributor

Why not make it return 42?

owen-t
owen-t previously approved these changes Nov 19, 2018
Contributor

@owen-t owen-t left a comment

Great.

@@ -149,6 +149,23 @@ The following are optional arguments. When you create a ``Chainer`` object, you
other training source code dependencies including the entry point
file. Structure within this directory will be preserved when training
on SageMaker.
- ``lib_dirs (list[str])`` A list of paths to directories (absolute or relative) with
Contributor

We may want to call this libs as the entries do not need to be directories.

Contributor Author

sure

Contributor Author

Renaming it to dependencies.
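
The reviewer's point that entries need not be directories can be illustrated with a short sketch. `copy_dependency` and `demo` are hypothetical helpers written for this note, not SDK functions, and whether the merged SDK accepts single files is an assumption based only on the remark above.

```python
import os
import shutil
import tempfile


def copy_dependency(path, dest_dir):
    """Copy one file or one directory into dest_dir, keeping its base name.

    Hypothetical helper for illustration only; not part of the SageMaker SDK.
    """
    target = os.path.join(dest_dir, os.path.basename(os.path.normpath(path)))
    if os.path.isdir(path):
        shutil.copytree(path, target)
    elif os.path.isfile(path):
        shutil.copy2(path, target)
    else:
        raise ValueError('dependency not found: %s' % path)
    return target


def demo():
    """Stage one single-file library and one library directory."""
    with tempfile.TemporaryDirectory() as root:
        lib_file = os.path.join(root, 'alexa.py')
        with open(lib_file, 'w') as f:
            f.write('def call():\n    return 42\n')
        lib_dir = os.path.join(root, 'common')
        os.makedirs(lib_dir)
        dest = os.path.join(root, 'code')
        os.makedirs(dest)
        copy_dependency(lib_file, dest)
        copy_dependency(lib_dir, dest)
        return sorted(os.listdir(dest))


if __name__ == '__main__':
    print(demo())
```

A name such as `dependencies` covers both cases, whereas `lib_dirs` would suggest that only directories are valid.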

nadiaya
nadiaya previously approved these changes Nov 19, 2018
instead. Example:

The following call
>>> Estimator(entry_point='train.py', lib_dirs=['my/libs/common', 'virtual-env'])
Contributor

This should be Chainer(), not Estimator(). This applies to the other ones as well.

@mvsusp mvsusp dismissed stale reviews from nadiaya and owen-t via 4e02b12 November 19, 2018 01:50
nadiaya
nadiaya previously approved these changes Nov 19, 2018
owen-t
owen-t previously approved these changes Nov 19, 2018
@mvsusp mvsusp dismissed stale reviews from owen-t and nadiaya via bbd7984 November 19, 2018 04:22
nadiaya
nadiaya previously approved these changes Nov 19, 2018
from tests.integ import DATA_DIR, PYTHON_VERSION


def test_source_dirs(sagemaker_session, tmpdir, sagemaker_local_session):
Contributor

Can you remove sagemaker_session?

nadiaya
nadiaya previously approved these changes Nov 19, 2018
@mvsusp mvsusp merged commit b096cd1 into aws:master Nov 19, 2018
@mvsusp mvsusp deleted the mvs-lib-dir branch November 19, 2018 18:32