Add support for additional libraries in the Estimator #498

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Nov 19, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions CHANGELOG.rst
@@ -2,9 +2,10 @@
CHANGELOG
=========

1.15.1.dev
==========
1.15.1dev
=========

* feature: Estimators: dependencies attribute allows export of additional libraries into the container
* feature: Add APIs to export Airflow transform and deploy config

1.15.0
17 changes: 17 additions & 0 deletions src/sagemaker/chainer/README.rst
@@ -149,6 +149,23 @@ The following are optional arguments. When you create a ``Chainer`` object, you
other training source code dependencies including the entry point
file. Structure within this directory will be preserved when training
on SageMaker.
- ``dependencies (list[str])`` A list of paths to directories (absolute or relative) with
any additional libraries that will be exported to the container (default: []).
The library folders will be copied to SageMaker in the same folder where the entry point is copied.
If ``source_dir`` points to S3, no code will be uploaded and the S3 location will be used
instead. Example:

The following call
>>> Chainer(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
results in the following inside the container:

>>> $ ls

>>> opt/ml/code
>>> ├── train.py
>>> ├── common
>>> └── virtual-env

- ``hyperparameters`` Hyperparameters that will be used for training.
Will be made accessible as a dict[str, str] to the training code on
SageMaker. For convenience, accepts other types besides str, but
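
A minimal usage sketch of the new argument (the role, instance settings, and S3 paths below are illustrative, not taken from this PR): because each listed folder is repacked next to the entry point, the training script can import it as a top-level package.

    from sagemaker.chainer import Chainer

    # 'my/libs/common' and 'virtual-env' are hypothetical local paths that
    # must exist when fit() packages the source.
    estimator = Chainer(entry_point='train.py',
                        dependencies=['my/libs/common', 'virtual-env'],
                        role='SageMakerRole',
                        train_instance_count=1,
                        train_instance_type='ml.m4.xlarge',
                        framework_version='4.1.0')

    # Inside train.py, `import common` now resolves, since 'common' is
    # unpacked next to the entry point under /opt/ml/code.
    estimator.fit('s3://my-bucket/chainer/training-data')
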
2 changes: 1 addition & 1 deletion src/sagemaker/chainer/estimator.py
@@ -133,7 +133,7 @@ def create_model(self, model_server_workers=None, role=None, vpc_config_override
py_version=self.py_version, framework_version=self.framework_version,
model_server_workers=model_server_workers, image=self.image_name,
sagemaker_session=self.sagemaker_session,
vpc_config=self.get_vpc_config(vpc_config_override))
vpc_config=self.get_vpc_config(vpc_config_override), dependencies=self.dependencies)

@classmethod
def _prepare_init_params_from_job_description(cls, job_details, model_channel_name=None):
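
The create_model() change above is what carries dependencies from training into hosting: deploy() calls create_model(), which now forwards self.dependencies to the ChainerModel. Continuing the sketch from the README section (instance settings illustrative):

    # After fit() completes, the hosted container is packaged with the same
    # dependency folders that training used.
    predictor = estimator.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')
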
22 changes: 20 additions & 2 deletions src/sagemaker/estimator.py
@@ -637,7 +637,7 @@ class Framework(EstimatorBase):
LAUNCH_PS_ENV_NAME = 'sagemaker_parameter_server_enabled'

def __init__(self, entry_point, source_dir=None, hyperparameters=None, enable_cloudwatch_metrics=False,
container_log_level=logging.INFO, code_location=None, image_name=None, **kwargs):
container_log_level=logging.INFO, code_location=None, image_name=None, dependencies=None, **kwargs):
"""Base class initializer. Subclasses which override ``__init__`` should invoke ``super()``

Args:
@@ -646,6 +646,22 @@ def __init__(self, entry_point, source_dir=None, hyperparameters=None, enable_cl
source_dir (str): Path (absolute or relative) to a directory with any other training
source code dependencies aside from the entry point file (default: None). Structure within this
directory is preserved when training on Amazon SageMaker.
dependencies (list[str]): A list of paths to directories (absolute or relative) with
any additional libraries that will be exported to the container (default: []).
The library folders will be copied to SageMaker in the same folder where the entry point is copied.
Example:

The following call
>>> Estimator(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
results in the following inside the container:

>>> $ ls

>>> opt/ml/code
>>> ├── train.py
>>> ├── common
>>> └── virtual-env

hyperparameters (dict): Hyperparameters that will be used for training (default: None).
The hyperparameters are made accessible as a dict[str, str] to the training code on SageMaker.
For convenience, this accepts other types for keys and values, but ``str()`` will be called
@@ -663,6 +679,7 @@ def __init__(self, entry_point, source_dir=None, hyperparameters=None, enable_cl
"""
super(Framework, self).__init__(**kwargs)
self.source_dir = source_dir
self.dependencies = dependencies or []
self.entry_point = entry_point
if enable_cloudwatch_metrics:
warnings.warn('enable_cloudwatch_metrics is now deprecated and will be removed in the future.',
@@ -729,7 +746,8 @@ def _stage_user_code_in_s3(self):
bucket=code_bucket,
s3_key_prefix=code_s3_prefix,
script=self.entry_point,
directory=self.source_dir)
directory=self.source_dir,
dependencies=self.dependencies)

def _model_source_dir(self):
"""Get the appropriate value to pass as source_dir to model constructor on deploying
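
A side note on `self.dependencies = dependencies or []` in the constructor above: defaulting to None and normalizing in the body is the standard guard against Python's shared-mutable-default pitfall. A standalone sketch of the pattern (the class is a stand-in, not SDK code):

    class FrameworkSketch(object):
        """Illustrative stand-in for Framework's default handling."""

        def __init__(self, entry_point, dependencies=None):
            # A `dependencies=[]` default would be one list shared by every
            # instance; normalizing None here gives each instance its own.
            self.entry_point = entry_point
            self.dependencies = dependencies or []

    a = FrameworkSketch('train.py')
    b = FrameworkSketch('train.py', dependencies=['my/libs/common'])
    a.dependencies.append('extra')
    assert b.dependencies == ['my/libs/common']  # no cross-instance leakage
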
47 changes: 29 additions & 18 deletions src/sagemaker/fw_utils.py
@@ -14,11 +14,15 @@

import os
import re
import shutil
import tempfile
from collections import namedtuple
from six.moves.urllib.parse import urlparse

import sagemaker.utils

_TAR_SOURCE_FILENAME = 'source.tar.gz'

UploadedCode = namedtuple('UserCode', ['s3_prefix', 'script_name'])
"""sagemaker.fw_utils.UserCode: An object containing the S3 prefix and script name.

@@ -107,7 +111,7 @@ def validate_source_dir(script, directory):
return True


-def tar_and_upload_dir(session, bucket, s3_key_prefix, script, directory):
+def tar_and_upload_dir(session, bucket, s3_key_prefix, script, directory, dependencies=None):
"""Pack and upload source files to S3 only if directory is empty or local.

Note:
@@ -118,31 +122,38 @@ def tar_and_upload_dir(session, bucket, s3_key_prefix, script, directory):
bucket (str): S3 bucket to which the compressed file is uploaded.
s3_key_prefix (str): Prefix for the S3 key.
script (str): Script filename.
-    directory (str): Directory containing the source file. If it starts with "s3://", no action is taken.
+    directory (str or None): Directory containing the source file. If it starts with
+        "s3://", no action is taken.
+    dependencies (List[str]): A list of paths to directories (absolute or relative)
+        containing additional libraries that will be copied into
+        /opt/ml/code alongside the entry point (default: None)

Returns:
sagemaker.fw_utils.UserCode: An object with the S3 bucket and key (S3 prefix) and script name.
"""
-    if directory:
-        if directory.lower().startswith("s3://"):
-            return UploadedCode(s3_prefix=directory, script_name=os.path.basename(script))
-        else:
-            script_name = script
-            source_files = [os.path.join(directory, name) for name in os.listdir(directory)]
+    dependencies = dependencies or []
+    key = '%s/sourcedir.tar.gz' % s3_key_prefix
+
+    if directory and directory.lower().startswith('s3://'):
+        return UploadedCode(s3_prefix=directory, script_name=os.path.basename(script))
     else:
-        # If no directory is specified, the script parameter needs to be a valid relative path.
-        os.path.exists(script)
-        script_name = os.path.basename(script)
-        source_files = [script]
+        tmp = tempfile.mkdtemp()

-    s3 = session.resource('s3')
-    key = '{}/{}'.format(s3_key_prefix, 'sourcedir.tar.gz')
+        try:
+            source_files = _list_files_to_compress(script, directory) + dependencies
+            tar_file = sagemaker.utils.create_tar_file(source_files, os.path.join(tmp, _TAR_SOURCE_FILENAME))

-    tar_file = sagemaker.utils.create_tar_file(source_files)
-    s3.Object(bucket, key).upload_file(tar_file)
-    os.remove(tar_file)
+            session.resource('s3').Object(bucket, key).upload_file(tar_file)
+        finally:
+            shutil.rmtree(tmp)

-    return UploadedCode(s3_prefix='s3://{}/{}'.format(bucket, key), script_name=script_name)
+        script_name = script if directory else os.path.basename(script)
+
+        return UploadedCode(s3_prefix='s3://%s/%s' % (bucket, key), script_name=script_name)
+
+
+def _list_files_to_compress(script, directory):
+    basedir = directory if directory else os.path.dirname(script)
+    return [os.path.join(basedir, name) for name in os.listdir(basedir)]


def framework_name_from_image(image_name):
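
To make the packaging concrete, here is a stdlib-only approximation of what _list_files_to_compress plus create_tar_file amount to (an illustration, not the SDK's implementation): every top-level entry of the source directory, and then each dependency path, is added to the archive under its basename, which is why 'my/libs/common' surfaces as 'common' in /opt/ml/code.

    import os
    import tarfile
    import tempfile

    def sketch_sourcedir_tar(directory, dependencies):
        # Top-level entries of the source directory...
        source_files = [os.path.join(directory, name) for name in os.listdir(directory)]
        # ...plus the dependency folders themselves.
        source_files += dependencies

        tar_path = os.path.join(tempfile.mkdtemp(), 'sourcedir.tar.gz')
        with tarfile.open(tar_path, mode='w:gz') as tar:
            for path in source_files:
                # arcname=basename flattens 'my/libs/common' to 'common' at
                # the archive root, matching the container trees shown above.
                tar.add(path, arcname=os.path.basename(path))
        return tar_path
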
52 changes: 36 additions & 16 deletions src/sagemaker/model.py
@@ -16,10 +16,10 @@

import sagemaker

from sagemaker.local import LocalSession
from sagemaker.fw_utils import tar_and_upload_dir, parse_s3_url, model_code_key_prefix
from sagemaker.session import Session
from sagemaker.utils import name_from_image, get_config_value
from sagemaker import local
from sagemaker import fw_utils
from sagemaker import session
from sagemaker import utils


class Model(object):
@@ -96,12 +96,12 @@ def deploy(self, initial_instance_count, instance_type, endpoint_name=None, tags
"""
if not self.sagemaker_session:
if instance_type in ('local', 'local_gpu'):
self.sagemaker_session = LocalSession()
self.sagemaker_session = local.LocalSession()
else:
self.sagemaker_session = Session()
self.sagemaker_session = session.Session()

container_def = self.prepare_container_def(instance_type)
self.name = self.name or name_from_image(container_def['Image'])
self.name = self.name or utils.name_from_image(container_def['Image'])
self.sagemaker_session.create_model(self.name, self.role, container_def, vpc_config=self.vpc_config)
production_variant = sagemaker.production_variant(self.name, instance_type, initial_instance_count)
self.endpoint_name = endpoint_name or self.name
@@ -127,7 +127,7 @@ class FrameworkModel(Model):

def __init__(self, model_data, image, role, entry_point, source_dir=None, predictor_cls=None, env=None, name=None,
enable_cloudwatch_metrics=False, container_log_level=logging.INFO, code_location=None,
sagemaker_session=None, **kwargs):
sagemaker_session=None, dependencies=None, **kwargs):
"""Initialize a ``FrameworkModel``.

Args:
@@ -140,6 +140,23 @@ def __init__(self, model_data, image, role, entry_point, source_dir=None, predic
source code dependencies aside from the entry point file (default: None). Structure within this
directory will be preserved when training on SageMaker.
If the directory points to S3, no code will be uploaded and the S3 location will be used instead.
dependencies (list[str]): A list of paths to directories (absolute or relative) with
any additional libraries that will be exported to the container (default: []).
The library folders will be copied to SageMaker in the same folder where the entry point is copied.
If ``source_dir`` points to S3, no code will be uploaded and the S3 location will be used
instead. Example:

The following call
>>> FrameworkModel(model_data, image, role, entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
results in the following inside the container:

>>> $ ls

>>> opt/ml/code
>>> ├── train.py
>>> ├── common
>>> └── virtual-env

predictor_cls (callable[string, sagemaker.session.Session]): A function to call to create
a predictor (default: None). If not None, ``deploy`` will return the result of invoking
this function on the created endpoint name.
@@ -160,10 +177,11 @@ def __init__(self, model_data, image, role, entry_point, source_dir=None, predic
sagemaker_session=sagemaker_session, **kwargs)
self.entry_point = entry_point
self.source_dir = source_dir
self.dependencies = dependencies or []
self.enable_cloudwatch_metrics = enable_cloudwatch_metrics
self.container_log_level = container_log_level
if code_location:
self.bucket, self.key_prefix = parse_s3_url(code_location)
self.bucket, self.key_prefix = fw_utils.parse_s3_url(code_location)
else:
self.bucket, self.key_prefix = None, None
self.uploaded_code = None
@@ -179,22 +197,24 @@ def prepare_container_def(self, instance_type): # pylint disable=unused-argumen
Returns:
dict[str, str]: A container definition object usable with the CreateModel API.
"""
deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, self.image)
deploy_key_prefix = fw_utils.model_code_key_prefix(self.key_prefix, self.name, self.image)
self._upload_code(deploy_key_prefix)
deploy_env = dict(self.env)
deploy_env.update(self._framework_env_vars())
return sagemaker.container_def(self.image, self.model_data, deploy_env)

def _upload_code(self, key_prefix):
local_code = get_config_value('local.local_code', self.sagemaker_session.config)
local_code = utils.get_config_value('local.local_code', self.sagemaker_session.config)
if self.sagemaker_session.local_mode and local_code:
self.uploaded_code = None
else:
self.uploaded_code = tar_and_upload_dir(session=self.sagemaker_session.boto_session,
bucket=self.bucket or self.sagemaker_session.default_bucket(),
s3_key_prefix=key_prefix,
script=self.entry_point,
directory=self.source_dir)
bucket = self.bucket or self.sagemaker_session.default_bucket()
self.uploaded_code = fw_utils.tar_and_upload_dir(session=self.sagemaker_session.boto_session,
bucket=bucket,
s3_key_prefix=key_prefix,
script=self.entry_point,
directory=self.source_dir,
dependencies=self.dependencies)

def _framework_env_vars(self):
if self.uploaded_code:
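
On the hosting side, dependencies flows through FrameworkModel._upload_code into tar_and_upload_dir, so inference code is packaged with the same libraries. A sketch using the MXNet model class (artifact path, role, and script name are illustrative):

    from sagemaker.mxnet import MXNetModel

    model = MXNetModel(model_data='s3://my-bucket/model.tar.gz',
                       role='SageMakerRole',
                       entry_point='inference.py',
                       dependencies=['my/libs/common'])

    # 'common' is repacked next to inference.py, so the serving container can
    # import it exactly as training did.
    predictor = model.deploy(initial_instance_count=1,
                             instance_type='ml.m4.xlarge')
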
17 changes: 17 additions & 0 deletions src/sagemaker/mxnet/README.rst
@@ -271,6 +271,23 @@ The following are optional arguments. When you create an ``MXNet`` object, you c
other training source code dependencies including the entry point
file. Structure within this directory will be preserved when training
on SageMaker.
- ``dependencies (list[str])`` A list of paths to directories (absolute or relative) with
any additional libraries that will be exported to the container (default: []).
The library folders will be copied to SageMaker in the same folder where the entry point is copied.
If ``source_dir`` points to S3, no code will be uploaded and the S3 location will be used
instead. Example:

The following call
>>> MXNet(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
results in the following inside the container:

>>> $ ls

>>> opt/ml/code
>>> ├── train.py
>>> ├── common
>>> └── virtual-env

- ``hyperparameters`` Hyperparameters that will be used for training.
Will be made accessible as a dict[str, str] to the training code on
SageMaker. For convenience, accepts other types besides str, but
2 changes: 1 addition & 1 deletion src/sagemaker/mxnet/estimator.py
@@ -115,7 +115,7 @@ def create_model(self, model_server_workers=None, role=None, vpc_config_override
container_log_level=self.container_log_level, code_location=self.code_location,
py_version=self.py_version, framework_version=self.framework_version, image=self.image_name,
model_server_workers=model_server_workers, sagemaker_session=self.sagemaker_session,
vpc_config=self.get_vpc_config(vpc_config_override))
vpc_config=self.get_vpc_config(vpc_config_override), dependencies=self.dependencies)

@classmethod
def _prepare_init_params_from_job_description(cls, job_details, model_channel_name=None):
17 changes: 17 additions & 0 deletions src/sagemaker/pytorch/README.rst
@@ -175,6 +175,23 @@ The following are optional arguments. When you create a ``PyTorch`` object, you
other training source code dependencies including the entry point
file. Structure within this directory will be preserved when training
on SageMaker.
- ``dependencies (list[str])`` A list of paths to directories (absolute or relative) with
any additional libraries that will be exported to the container (default: []).
The library folders will be copied to SageMaker in the same folder where the entry point is copied.
If ``source_dir`` points to S3, no code will be uploaded and the S3 location will be used
instead. Example:

The following call
>>> PyTorch(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
results in the following inside the container:

>>> $ ls

>>> opt/ml/code
>>> ├── train.py
>>> ├── common
>>> └── virtual-env

- ``hyperparameters`` Hyperparameters that will be used for training.
Will be made accessible as a dict[str, str] to the training code on
SageMaker. For convenience, accepts other types besides strings, but
2 changes: 1 addition & 1 deletion src/sagemaker/pytorch/estimator.py
@@ -96,7 +96,7 @@ def create_model(self, model_server_workers=None, role=None, vpc_config_override
container_log_level=self.container_log_level, code_location=self.code_location,
py_version=self.py_version, framework_version=self.framework_version, image=self.image_name,
model_server_workers=model_server_workers, sagemaker_session=self.sagemaker_session,
vpc_config=self.get_vpc_config(vpc_config_override))
vpc_config=self.get_vpc_config(vpc_config_override), dependencies=self.dependencies)

@classmethod
def _prepare_init_params_from_job_description(cls, job_details, model_channel_name=None):
17 changes: 17 additions & 0 deletions src/sagemaker/tensorflow/README.rst
@@ -409,6 +409,23 @@ you can specify these as keyword arguments.
other training source code dependencies including the entry point
file. Structure within this directory will be preserved when training
on SageMaker.
- ``dependencies (list[str])`` A list of paths to directories (absolute or relative) with
any additional libraries that will be exported to the container (default: []).
The library folders will be copied to SageMaker in the same folder where the entry point is copied.
If ``source_dir`` points to S3, no code will be uploaded and the S3 location will be used
instead. Example:

The following call
>>> TensorFlow(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
results in the following inside the container:

>>> $ ls

>>> opt/ml/code
>>> ├── train.py
>>> ├── common
>>> └── virtual-env

- ``requirements_file (str)`` Path to a ``requirements.txt`` file. The path should
be within and relative to ``source_dir``. This is a file containing a list of items to be
installed using pip install. Details on the format can be found in the
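
A hedged TensorFlow sketch combining the two mechanisms (all paths and settings illustrative): dependencies ships whole library folders, while requirements_file pip-installs packages listed in a file inside source_dir.

    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(entry_point='train.py',
                           source_dir='src',
                           dependencies=['my/libs/common'],
                           requirements_file='requirements.txt',
                           role='SageMakerRole',
                           training_steps=1000,
                           evaluation_steps=100,
                           train_instance_count=1,
                           train_instance_type='ml.m4.xlarge')
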