
Commit f2fd196

Merge branch 'master' into script-path
2 parents 4d949e1 + ab52225 commit f2fd196

19 files changed, +201 -120 lines changed

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -1,5 +1,30 @@
 # Changelog
 
+## v1.50.8 (2020-01-30)
+
+### Bug Fixes and Other Changes
+
+* disable Debugger defaults in unsupported regions
+* modify session and kms_utils to check for S3 bucket before creation
+* update docker-compose and PyYAML dependencies
+* enable smdebug for Horovod (MPI) training setup
+* create lib dir for dependencies safely (only if it doesn't exist yet).
+* create the correct session for MultiDataModel
+
+### Documentation Changes
+
+* update links to the local mode notebooks examples.
+* Remove outdated badges from README
+* update links to TF notebook examples to link to script mode examples.
+* clean up headings, verb tenses, names, etc. in MXNet overview
+* Update SageMaker operator Helm chart installation guide
+
+### Testing and Release Infrastructure
+
+* choose faster notebook for notebook PR build
+* properly fail PR build if has-matching-changes fails
+* properly fail PR build if has-matching-changes fails
+
 ## v1.50.7 (2020-01-20)
 
 ### Bug fixes and other changes

README.rst

Lines changed: 3 additions & 3 deletions
@@ -171,7 +171,7 @@ MXNet SageMaker Estimators
 
 By using MXNet SageMaker Estimators, you can train and host MXNet models on Amazon SageMaker.
 
-Supported versions of MXNet: ``0.12.1``, ``1.0.0``, ``1.1.0``, ``1.2.1``, ``1.3.0``, ``1.4.0``, ``1.4.1``.
+Supported versions of MXNet: ``0.12.1``, ``1.0.0``, ``1.1.0``, ``1.2.1``, ``1.3.0``, ``1.4.0``, ``1.4.1``, ``1.6.0``.
 
 Supported versions of MXNet for Elastic Inference: ``1.3.0``, ``1.4.0``, ``1.4.1``.
 
@@ -187,9 +187,9 @@ TensorFlow SageMaker Estimators
 
 By using TensorFlow SageMaker Estimators, you can train and host TensorFlow models on Amazon SageMaker.
 
-Supported versions of TensorFlow: ``1.4.1``, ``1.5.0``, ``1.6.0``, ``1.7.0``, ``1.8.0``, ``1.9.0``, ``1.10.0``, ``1.11.0``, ``1.12.0``, ``1.13.1``, ``1.14.``, ``1.15.0``, ``2.0.0``.
+Supported versions of TensorFlow: ``1.4.1``, ``1.5.0``, ``1.6.0``, ``1.7.0``, ``1.8.0``, ``1.9.0``, ``1.10.0``, ``1.11.0``, ``1.12.0``, ``1.13.1``, ``1.14.0``, ``1.15.0``, ``2.0.0``.
 
-Supported versions of TensorFlow for Elastic Inference: ``1.11.0``, ``1.12.0``, ``1.13.1``, ``1.14``.
+Supported versions of TensorFlow for Elastic Inference: ``1.11.0``, ``1.12.0``, ``1.13.1``, ``1.14.0``.
 
 We recommend that you use the latest supported version, because that's where we focus most of our development efforts.

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-1.50.8.dev0
+1.50.9.dev0

doc/overview.rst

Lines changed: 4 additions & 2 deletions
@@ -818,8 +818,10 @@ Here is an end-to-end example:
 
 For detailed examples of running Docker in local mode, see:
 
-- `TensorFlow local mode example notebook <https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_local_mode_mnist.ipynb>`__.
-- `MXNet local mode example notebook <https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_mnist/mnist_with_gluon_local_mode.ipynb>`__.
+- `TensorFlow local mode example notebook <https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/tensorflow_script_mode_using_shell_commands/tensorflow_script_mode_using_shell_commands.ipynb>`__.
+- `MXNet local mode CPU example notebook <https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_mnist/mxnet_mnist_with_gluon_local_mode.ipynb>`__.
+- `MXNet local mode GPU example notebook <https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/mxnet_gluon_cifar10/mxnet_cifar10_local_mode.ipynb>`__.
+- `PyTorch local mode example notebook <https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker-python-sdk/pytorch_cnn_cifar10/pytorch_local_mode_cifar10.ipynb>`__.
 
 You can also find these notebooks in the **SageMaker Python SDK** section of the **SageMaker Examples** section in a notebook instance.
 For information about using sample notebooks in a SageMaker notebook instance, see `Use Example Notebooks <https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-nbexamples.html>`__

setup.py

Lines changed: 2 additions & 2 deletions
@@ -50,8 +50,8 @@ def read_version():
     "analytics": ["pandas"],
     "local": [
         "urllib3>=1.21.1,<1.26,!=1.25.0,!=1.25.1",
-        "docker-compose>=1.23.0",
-        "PyYAML>=3.10, <5",  # PyYAML version has to match docker-compose requirements
+        "docker-compose>=1.25.2",
+        "PyYAML>=5.3, <6",  # PyYAML version has to match docker-compose requirements
     ],
     "tensorflow": ["tensorflow>=1.3.0"],
 }
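
The tighter docker-compose and PyYAML pins above only matter when the ``local`` extra is installed. A quick, illustrative check that an existing environment still satisfies the new pins is sketched below (not part of the commit; it assumes setuptools' pkg_resources and the packaging library are available and that both dependencies are installed):

    # Illustrative only: compare installed versions against the new "local" extra pins.
    import pkg_resources
    from packaging.specifiers import SpecifierSet

    pins = {"docker-compose": SpecifierSet(">=1.25.2"), "PyYAML": SpecifierSet(">=5.3,<6")}

    for name, spec in pins.items():
        installed = pkg_resources.get_distribution(name).version
        status = "OK" if installed in spec else "outside pin {}".format(spec)
        print(name, installed, status)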

src/sagemaker/estimator.py

Lines changed: 4 additions & 1 deletion
@@ -38,6 +38,7 @@
     parse_s3_url,
     UploadedCode,
     validate_source_dir,
+    _region_supports_debugger,
 )
 from sagemaker.job import _Job
 from sagemaker.local import LocalSession
@@ -1674,7 +1675,9 @@ def _validate_and_set_debugger_configs(self):
         """
         Set defaults for debugging
         """
-        if self.debugger_hook_config is None:
+        if self.debugger_hook_config is None and _region_supports_debugger(
+            self.sagemaker_session.boto_region_name
+        ):
             self.debugger_hook_config = DebuggerHookConfig(s3_output_path=self.output_path)
         elif not self.debugger_hook_config:
             self.debugger_hook_config = None
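
Restated outside the class, the new behavior amounts to the following sketch (``resolve_debugger_defaults`` is a hypothetical standalone helper, not SDK API; the region list mirrors the constant added to fw_utils.py below, and ``DebuggerHookConfig`` is imported from ``sagemaker.debugger`` as in the SDK):

    from sagemaker.debugger import DebuggerHookConfig

    # Stand-in for the helper introduced in fw_utils.py in this commit.
    DEBUGGER_UNSUPPORTED_REGIONS = ["us-gov-west-1", "us-iso-east-1"]


    def _region_supports_debugger(region_name):
        return region_name.lower() not in DEBUGGER_UNSUPPORTED_REGIONS


    def resolve_debugger_defaults(debugger_hook_config, region, output_path):
        # Mirrors the updated _validate_and_set_debugger_configs logic.
        if debugger_hook_config is None and _region_supports_debugger(region):
            # Default hook config only when none was given and the region supports Debugger.
            return DebuggerHookConfig(s3_output_path=output_path)
        if not debugger_hook_config:
            # Covers debugger_hook_config=False and unsupported regions: Debugger stays off.
            return None
        return debugger_hook_config

With region set to "us-gov-west-1" and no explicit hook config, this returns None, which is the "disable Debugger defaults in unsupported regions" fix listed in the changelog.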

src/sagemaker/fw_utils.py

Lines changed: 15 additions & 0 deletions
@@ -84,6 +84,8 @@
     "pytorch-serving": [1, 2, 0],
 }
 
+DEBUGGER_UNSUPPORTED_REGIONS = ["us-gov-west-1", "us-iso-east-1"]
+
 
 def is_version_equal_or_higher(lowest_version, framework_version):
     """Determine whether the ``framework_version`` is equal to or higher than
@@ -504,3 +506,16 @@ def python_deprecation_warning(framework, latest_supported_version):
     return PYTHON_2_DEPRECATION_WARNING.format(
         framework=framework, latest_supported_version=latest_supported_version
     )
+
+
+def _region_supports_debugger(region_name):
+    """Returns boolean indicating whether the region supports Amazon SageMaker Debugger.
+
+    Args:
+        region_name (str): Name of the region to check against.
+
+    Returns:
+        bool: Whether or not the region supports Amazon SageMaker Debugger.
+
+    """
+    return region_name.lower() not in DEBUGGER_UNSUPPORTED_REGIONS
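
A short usage sketch of the new helper (the import path assumes an SDK build that already includes this commit; the region strings are examples only):

    from sagemaker.fw_utils import _region_supports_debugger

    # Regions outside DEBUGGER_UNSUPPORTED_REGIONS keep the Debugger defaults.
    assert _region_supports_debugger("us-west-2")
    # The helper lowercases its input, so casing does not matter.
    assert not _region_supports_debugger("US-GOV-WEST-1")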

src/sagemaker/mxnet/README.rst

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ With the SageMaker Python SDK, you can train and host MXNet models on Amazon Sag
 
 Supported versions of MXNet: ``0.12.1``, ``1.0.0``, ``1.1.0``, ``1.2.1``, ``1.3.0``, ``1.4.0``, ``1.4.1``, ``1.6.0``.
 
-Supported versions of MXNet for Elastic Inference: ``1.3.0``, ``1.4.0``, ``1.4.1``, ``1.6.0``.
+Supported versions of MXNet for Elastic Inference: ``1.3.0``, ``1.4.0``, ``1.4.1``.
 
 For information about using MXNet with the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/using_mxnet.html.

src/sagemaker/mxnet/model.py

Lines changed: 17 additions & 7 deletions
@@ -144,7 +144,9 @@ def prepare_container_def(self, instance_type, accelerator_type=None):
         deploy_image = self.image
         if not deploy_image:
             region_name = self.sagemaker_session.boto_session.region_name
-            deploy_image = self.serving_image_uri(region_name, instance_type)
+            deploy_image = self.serving_image_uri(
+                region_name, instance_type, accelerator_type=accelerator_type
+            )
 
         deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
         self._upload_code(deploy_key_prefix, self._is_mms_version())
@@ -157,24 +159,32 @@ def prepare_container_def(self, instance_type, accelerator_type=None):
             deploy_image, self.repacked_model_data or self.model_data, deploy_env
         )
 
-    def serving_image_uri(self, region_name, instance_type):
+    def serving_image_uri(self, region_name, instance_type, accelerator_type=None):
         """Create a URI for the serving image.
 
         Args:
             region_name (str): AWS region where the image is uploaded.
            instance_type (str): SageMaker instance type. Used to determine device type
                 (cpu/gpu/family-specific optimized).
+            accelerator_type (str): The Elastic Inference accelerator type to
+                deploy to the instance for loading and making inferences to the
+                model (default: None). For example, 'ml.eia1.medium'.
 
         Returns:
             str: The appropriate image URI based on the given parameters.
 
         """
         framework_name = self.__framework_name__
         if self._is_mms_version():
-            framework_name += "-serving"
+            framework_name = "{}-serving".format(framework_name)
 
         return create_image_uri(
-            region_name, framework_name, instance_type, self.framework_version, self.py_version
+            region_name,
+            framework_name,
+            instance_type,
+            self.framework_version,
+            self.py_version,
+            accelerator_type=accelerator_type,
         )
 
     def _is_mms_version(self):
@@ -184,6 +194,6 @@ def _is_mms_version(self):
         Returns:
             bool: If the framework version corresponds to an image using MMS.
         """
-        return packaging.version.Version(self.framework_version) >= packaging.version.Version(
-            self._LOWEST_MMS_VERSION
-        )
+        lowest_mms_version = packaging.version.Version(self._LOWEST_MMS_VERSION)
+        framework_version = packaging.version.Version(self.framework_version)
+        return framework_version >= lowest_mms_version
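
The refactored ``_is_mms_version`` is a plain ``packaging`` version comparison; a minimal standalone sketch follows (the 1.4.0 cutoff is an assumption standing in for ``MXNetModel._LOWEST_MMS_VERSION``, which is defined elsewhere in this file):

    import packaging.version

    LOWEST_MMS_VERSION = "1.4.0"  # assumption: stands in for MXNetModel._LOWEST_MMS_VERSION


    def is_mms_version(framework_version):
        # Same comparison as the refactored _is_mms_version, without the class wrapper.
        lowest = packaging.version.Version(LOWEST_MMS_VERSION)
        return packaging.version.Version(framework_version) >= lowest


    print(is_mms_version("1.3.0"))  # False: pre-MMS image, framework name stays "mxnet"
    print(is_mms_version("1.6.0"))  # True: serving_image_uri uses "mxnet-serving"

When the check is true, ``serving_image_uri`` switches the framework name to "mxnet-serving" and forwards ``accelerator_type`` to ``create_image_uri``, as shown in the hunk above.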

src/sagemaker/pytorch/model.py

Lines changed: 27 additions & 22 deletions
@@ -15,7 +15,6 @@
 
 import logging
 import packaging.version
-from sagemaker import fw_utils
 
 import sagemaker
 from sagemaker.fw_utils import (
@@ -137,34 +136,21 @@ def prepare_container_def(self, instance_type, accelerator_type=None):
                 For example, 'ml.p2.xlarge'.
             accelerator_type (str): The Elastic Inference accelerator type to
                 deploy to the instance for loading and making inferences to the
-                model. For example, 'ml.eia1.medium'.
+                model. Currently unsupported with PyTorch.
 
         Returns:
             dict[str, str]: A container definition object usable with the
                 CreateModel API.
         """
-        lowest_mms_version = packaging.version.Version(self._LOWEST_MMS_VERSION)
-        framework_version = packaging.version.Version(self.framework_version)
-        is_mms_version = framework_version >= lowest_mms_version
-
         deploy_image = self.image
         if not deploy_image:
             region_name = self.sagemaker_session.boto_session.region_name
-
-            framework_name = self.__framework_name__
-            if is_mms_version:
-                framework_name += "-serving"
-
-            deploy_image = create_image_uri(
-                region_name,
-                framework_name,
-                instance_type,
-                self.framework_version,
-                self.py_version,
-                accelerator_type=accelerator_type,
+            deploy_image = self.serving_image_uri(
+                region_name, instance_type, accelerator_type=accelerator_type
             )
+
         deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
-        self._upload_code(deploy_key_prefix, repack=is_mms_version)
+        self._upload_code(deploy_key_prefix, repack=self._is_mms_version())
         deploy_env = dict(self.env)
         deploy_env.update(self._framework_env_vars())
 
@@ -174,22 +160,41 @@ def prepare_container_def(self, instance_type, accelerator_type=None):
             deploy_image, self.repacked_model_data or self.model_data, deploy_env
         )
 
-    def serving_image_uri(self, region_name, instance_type):
+    def serving_image_uri(self, region_name, instance_type, accelerator_type=None):
         """Create a URI for the serving image.
 
         Args:
             region_name (str): AWS region where the image is uploaded.
             instance_type (str): SageMaker instance type. Used to determine device type
                 (cpu/gpu/family-specific optimized).
+            accelerator_type (str): The Elastic Inference accelerator type to
+                deploy to the instance for loading and making inferences to the
+                model. Currently unsupported with PyTorch.
 
         Returns:
             str: The appropriate image URI based on the given parameters.
 
         """
-        return fw_utils.create_image_uri(
+        framework_name = self.__framework_name__
+        if self._is_mms_version():
+            framework_name = "{}-serving".format(framework_name)
+
+        return create_image_uri(
             region_name,
-            "-".join([self.__framework_name__, "serving"]),
+            framework_name,
             instance_type,
             self.framework_version,
             self.py_version,
+            accelerator_type=accelerator_type,
         )
+
+    def _is_mms_version(self):
+        """Whether the framework version corresponds to an inference image using
+        the Multi-Model Server (https://github.com/awslabs/multi-model-server).
+
+        Returns:
+            bool: If the framework version corresponds to an image using MMS.
+        """
+        lowest_mms_version = packaging.version.Version(self._LOWEST_MMS_VERSION)
+        framework_version = packaging.version.Version(self.framework_version)
+        return framework_version >= lowest_mms_version

src/sagemaker/session.py

Lines changed: 47 additions & 29 deletions
@@ -342,40 +342,58 @@ def default_bucket(self):
         ).get_caller_identity()["Account"]
         default_bucket = "sagemaker-{}-{}".format(region, account)
 
-        s3 = self.boto_session.resource("s3")
-        try:
-            # 'us-east-1' cannot be specified because it is the default region:
-            # https://github.com/boto/boto3/issues/125
-            if region == "us-east-1":
-                s3.create_bucket(Bucket=default_bucket)
-            else:
-                s3.create_bucket(
-                    Bucket=default_bucket, CreateBucketConfiguration={"LocationConstraint": region}
-                )
-
-            LOGGER.info("Created S3 bucket: %s", default_bucket)
-        except ClientError as e:
-            error_code = e.response["Error"]["Code"]
-            message = e.response["Error"]["Message"]
-
-            if error_code == "BucketAlreadyOwnedByYou":
-                pass
-            elif (
-                error_code == "OperationAborted" and "conflicting conditional operation" in message
-            ):
-                # If this bucket is already being concurrently created, we don't need to create it
-                # again.
-                pass
-            elif error_code == "TooManyBuckets":
-                # Succeed if the default bucket exists
-                s3.meta.client.head_bucket(Bucket=default_bucket)
-            else:
-                raise
+        self._create_s3_bucket_if_it_does_not_exist(bucket_name=default_bucket, region=region)
 
         self._default_bucket = default_bucket
 
         return self._default_bucket
 
+    def _create_s3_bucket_if_it_does_not_exist(self, bucket_name, region):
+        """Creates an S3 Bucket if it does not exist.
+        Also swallows a few common exceptions that indicate that the bucket already exists or
+        that it is being created.
+
+        Args:
+            bucket_name (str): Name of the S3 bucket to be created.
+            region (str): The region in which to create the bucket.
+
+        Raises:
+            botocore.exceptions.ClientError: If S3 throws an unexpected exception during bucket
+                creation.
+                If the exception is due to the bucket already existing or
+                already being created, no exception is raised.
+
+        """
+        bucket = self.boto_session.resource("s3", region_name=region).Bucket(name=bucket_name)
+        if bucket.creation_date is None:
+            try:
+                s3 = self.boto_session.resource("s3", region_name=region)
+                if region == "us-east-1":
+                    # 'us-east-1' cannot be specified because it is the default region:
+                    # https://github.com/boto/boto3/issues/125
+                    s3.create_bucket(Bucket=bucket_name)
+                else:
+                    s3.create_bucket(
+                        Bucket=bucket_name, CreateBucketConfiguration={"LocationConstraint": region}
+                    )
+
+                LOGGER.info("Created S3 bucket: %s", bucket_name)
+            except ClientError as e:
+                error_code = e.response["Error"]["Code"]
+                message = e.response["Error"]["Message"]
+
+                if error_code == "BucketAlreadyOwnedByYou":
+                    pass
+                elif (
+                    error_code == "OperationAborted"
+                    and "conflicting conditional operation" in message
+                ):
+                    # If this bucket is already being concurrently created, we don't need to create
+                    # it again.
+                    pass
+                else:
+                    raise
+
     def train(  # noqa: C901
         self,
         input_mode,
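
The new existence probe relies on a boto3 ``Bucket`` resource reporting a ``creation_date`` only when the bucket is already visible to the calling account. A stripped-down sketch of the same pattern with a plain boto3 session (the bucket and region names are placeholders):

    import boto3


    def bucket_exists(boto_session, bucket_name, region):
        # Same probe used by _create_s3_bucket_if_it_does_not_exist above:
        # creation_date is None when the bucket is not visible to the caller.
        bucket = boto_session.resource("s3", region_name=region).Bucket(name=bucket_name)
        return bucket.creation_date is not None


    session = boto3.Session()
    if not bucket_exists(session, "sagemaker-us-west-2-111122223333", "us-west-2"):
        print("bucket missing; Session.default_bucket() would create it")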

src/sagemaker/tensorflow/estimator.py

Lines changed: 3 additions & 1 deletion
@@ -723,7 +723,9 @@ def _validate_and_set_debugger_configs(self):
             )
             self.debugger_hook_config = None
             self.debugger_rule_configs = None
-        elif self.debugger_hook_config is None:
+        elif self.debugger_hook_config is None and fw._region_supports_debugger(
+            self.sagemaker_session.boto_session.region_name
+        ):
             # Set defaults for debugging.
             self.debugger_hook_config = DebuggerHookConfig(s3_output_path=self.output_path)
