
Commit 1e29442

Author: Ivy Bazan
Commit message: fixed merge conflict
2 parents: d34d431 + c5789ea


61 files changed: +962, -162 lines

.github/PULL_REQUEST_TEMPLATE.md (+1, -1)
@@ -12,7 +12,7 @@ _Put an `x` in the boxes that apply. You can also fill these out after creating
 
 - [ ] I have read the [CONTRIBUTING](https://github.com/aws/sagemaker-python-sdk/blob/master/CONTRIBUTING.md) doc
 - [ ] I used the commit message format described in [CONTRIBUTING](https://github.com/aws/sagemaker-python-sdk/blob/master/CONTRIBUTING.md#committing-your-change)
-- [ ] I have used the regional endpoint when creating S3 and/or STS clients (if appropriate)
+- [ ] I have passed the region in to any/all clients that I've initialized as part of this change.
 - [ ] I have updated any necessary documentation, including [READMEs](https://github.com/aws/sagemaker-python-sdk/blob/master/README.rst) and [API docs](https://github.com/aws/sagemaker-python-sdk/tree/master/doc) (if appropriate)
 
 #### Tests
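
The reworded checklist item matches the pattern applied throughout this commit (see the record_set change in src/sagemaker/amazon/amazon_estimator.py below): pass the region explicitly to every client or resource you create. A minimal boto3 sketch of that pattern; the region value and the calls shown are illustrative, not part of this commit:

    import boto3

    region = "us-west-2"  # placeholder region

    # Passing region_name explicitly pins each client/resource to that region's
    # endpoint instead of relying on the environment's default configuration.
    s3 = boto3.resource("s3", region_name=region)
    sts = boto3.client("sts", region_name=region)

    print(sts.get_caller_identity()["Account"])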

CHANGELOG.md (+61)
@@ -1,5 +1,66 @@
 # Changelog
 
+## v1.55.0 (2020-03-31)
+
+### Features
+
+* support cn-north-1 and cn-northwest-1
+
+## v1.54.0 (2020-03-31)
+
+### Features
+
+* inferentia support
+
+## v1.53.0 (2020-03-30)
+
+### Features
+
+* Allow setting S3 endpoint URL for Local Session
+
+### Bug Fixes and Other Changes
+
+* Pass kwargs from create_model to Model constructors
+* Warn if parameter server is used with multi-GPU instance
+
+## v1.52.1 (2020-03-26)
+
+### Bug Fixes and Other Changes
+
+* Fix local _SageMakerContainer detached mode (aws#1374)
+
+## v1.52.0.post0 (2020-03-25)
+
+### Documentation Changes
+
+* Add docs for debugger job support in operator
+
+## v1.52.0 (2020-03-24)
+
+### Features
+
+* add us-gov-west-1 to neo supported regions
+
+## v1.51.4 (2020-03-23)
+
+### Bug Fixes and Other Changes
+
+* Check that session is a LocalSession when using local mode
+* add tflite to Neo-supported frameworks
+* ignore tags with 'aws:' prefix when creating an EndpointConfig based on an existing one
+* allow custom image when calling deploy or create_model with various frameworks
+
+### Documentation Changes
+
+* fix description of default model_dir for TF
+* add more details about PyTorch eia
+
+## v1.51.3 (2020-03-12)
+
+### Bug Fixes and Other Changes
+
+* make repack_model only removes py file when new entry_point provided
+
 ## v1.51.2 (2020-03-11)
 
 ### Bug Fixes and Other Changes

VERSION (+1, -1)
@@ -1 +1 @@
-1.51.3.dev0
+1.55.1.dev0

doc/using_pytorch.rst (+33, -3)
@@ -118,10 +118,22 @@ to a certain filesystem path called ``model_dir``. This value is accessible thro
 After your training job is complete, SageMaker will compress and upload the serialized model to S3, and your model data
 will be available in the S3 ``output_path`` you specified when you created the PyTorch Estimator.
 
+If you are using Elastic Inference, you must convert your models to the TorchScript format and use ``torch.jit.save`` to save the model.
+For example:
+
+.. code:: python
+
+    import os
+    import torch
+
+    # ... train `model`, then save it to `model_dir`
+    model_dir = os.path.join(model_dir, "model.pt")
+    torch.jit.save(model, model_dir)
+
 Using third-party libraries
 ---------------------------
 
-When running your training script on SageMaker, it will have access to some pre-installed third-party libraries including ``torch``, ``torchvisopm``, and ``numpy``.
+When running your training script on SageMaker, it will have access to some pre-installed third-party libraries including ``torch``, ``torchvision``, and ``numpy``.
 For more information on the runtime environment, including specific package versions, see `SageMaker PyTorch Docker containers <#id4>`__.
 
 If there are other packages you want to use with your script, you can include a ``requirements.txt`` file in the same directory as your training script to install other dependencies at runtime. Both ``requirements.txt`` and your training script should be put in the same folder. You must specify this folder in ``source_dir`` argument when creating PyTorch estimator.
@@ -303,7 +315,8 @@ It loads the model parameters from a ``model.pth`` file in the SageMaker model d
 
 However, if you are using PyTorch Elastic Inference, you do not have to provide a ``model_fn`` since the PyTorch serving
 container has a default one for you. But please note that if you are utilizing the default ``model_fn``, please save
-yor parameter file as ``model.pt`` instead of ``model.pth``. For more information on inference script, please refer to:
+your ScriptModule as ``model.pt``. If you are implementing your own ``model_fn``, please use TorchScript and ``torch.jit.save``
+to save your ScriptModule, then load it in your ``model_fn`` with ``torch.jit.load``. For more information on inference script, please refer to:
 `SageMaker PyTorch Default Inference Handler <https://github.com/aws/sagemaker-pytorch-serving-container/blob/master/src/sagemaker_pytorch_serving_container/default_inference_handler.py>`_.
 
 Serve a PyTorch Model
@@ -461,6 +474,23 @@ If you implement your own prediction function, you should take care to ensure th
 first argument to ``output_fn``. If you use the default
 ``output_fn``, this should be a torch.Tensor.
 
+The default Elastic Inference ``predict_fn`` is similar but runs the TorchScript model using ``torch.jit.optimized_execution``.
+If you are implementing your own ``predict_fn``, please also use the ``torch.jit.optimized_execution``
+block, for example:
+
+.. code:: python
+
+    import torch
+    import numpy as np
+
+    def predict_fn(input_data, model):
+        device = torch.device("cpu")
+        model = model.to(device)
+        input_data = input_data.to(device)
+        model.eval()
+        with torch.jit.optimized_execution(True, {"target_device": "eia:0"}):
+            output = model(input_data)
+
 Process Model Output
 ^^^^^^^^^^^^^^^^^^^^
 
@@ -671,6 +701,6 @@ The following are optional arguments. When you create a ``PyTorch`` object, you
 SageMaker PyTorch Docker Containers
 ***********************************
 
-For information about SageMaker PyTorch containers, see `the SageMaker PyTorch containers repository <https://github.com/aws/sagemaker-pytorch-container>`_.
+For information about SageMaker PyTorch containers, see `the SageMaker PyTorch container repository <https://github.com/aws/sagemaker-pytorch-container>`_ and `SageMaker PyTorch Serving container repository <https://github.com/aws/sagemaker-pytorch-serving-container>`__.
 
 For information about SageMaker PyTorch container dependencies, see `SageMaker PyTorch Containers <https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/pytorch#sagemaker-pytorch-docker-containers>`_.
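
The revised guidance above tells readers to save a ScriptModule with torch.jit.save and, when implementing their own model_fn, to load it with torch.jit.load, but the diff only shows the saving side. A minimal model_fn sketch under those assumptions (the model.pt filename follows the documentation's convention; this function is illustrative, not part of the commit):

    import os
    import torch

    def model_fn(model_dir):
        # The documentation above saves the TorchScript module as "model.pt"
        # inside the SageMaker model directory.
        model_path = os.path.join(model_dir, "model.pt")
        # Load on CPU; Elastic Inference acceleration is applied later inside
        # the torch.jit.optimized_execution block in predict_fn.
        model = torch.jit.load(model_path, map_location=torch.device("cpu"))
        model.eval()
        return model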

doc/using_tf.rst (+6, -72)
@@ -48,7 +48,7 @@ The training script is very similar to a training script you might run outside o
 
 * ``SM_MODEL_DIR``: A string that represents the local path where the training job writes the model artifacts to.
   After training, artifacts in this directory are uploaded to S3 for model hosting. This is different than the ``model_dir``
-  argument passed in your training script, which is an S3 location. ``SM_MODEL_DIR`` is always set to ``/opt/ml/model``.
+  argument passed in your training script, which can be an S3 location. ``SM_MODEL_DIR`` is always set to ``/opt/ml/model``.
 * ``SM_NUM_GPUS``: An integer representing the number of GPUs available to the host.
 * ``SM_OUTPUT_DATA_DIR``: A string that represents the path to the directory to write output artifacts to.
   Output artifacts might include checkpoints, graphs, and other files to save, but do not include model artifacts.
@@ -166,7 +166,7 @@ The following args are not permitted when using Script Mode:
 Where the S3 url is a path to your training data within Amazon S3.
 The constructor keyword arguments define how SageMaker runs your training script.
 
-For more information about the sagemaker.tensorflow.TensorFlow estimator, see `sagemaker.tensorflow.TensorFlow Class`_.
+For more information about the sagemaker.tensorflow.TensorFlow estimator, see `SageMaker TensorFlow Classes`_.
 
 Call the fit Method
 ===================
@@ -909,77 +909,11 @@ processing. There are 2 ways to do this:
 
 For more information, see: https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing
 
-*************************************
-sagemaker.tensorflow.TensorFlow Class
-*************************************
+****************************
+SageMaker TensorFlow Classes
+****************************
 
-The following are the most commonly used ``TensorFlow`` constructor arguments.
-
-Required:
-
-- ``entry_point (str)`` Path (absolute or relative) to the Python file which
-  should be executed as the entry point to training.
-- ``role (str)`` An AWS IAM role (either name or full ARN). The Amazon
-  SageMaker training jobs and APIs that create Amazon SageMaker
-  endpoints use this role to access training data and model artifacts.
-  After the endpoint is created, the inference code might use the IAM
-  role, if accessing AWS resource.
-- ``train_instance_count (int)`` Number of Amazon EC2 instances to use for
-  training.
-- ``train_instance_type (str)`` Type of EC2 instance to use for training, for
-  example, 'ml.c4.xlarge'.
-
-Optional:
-
-- ``source_dir (str)`` Path (absolute or relative) to a directory with any
-  other training source code dependencies including the entry point
-  file. Structure within this directory will be preserved when training
-  on SageMaker.
-- ``dependencies (list[str])`` A list of paths to directories (absolute or relative) with
-  any additional libraries that will be exported to the container (default: ``[]``).
-  The library folders will be copied to SageMaker in the same folder where the entrypoint is copied.
-  If the ``source_dir`` points to S3, code will be uploaded and the S3 location will be used
-  instead. Example:
-
-  The following call
-
-  >>> TensorFlow(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
-
-  results in the following inside the container:
-
-  >>> opt/ml/code
-  >>> ├── train.py
-  >>> ├── common
-  >>> └── virtual-env
-
-- ``hyperparameters (dict[str, ANY])`` Hyperparameters that will be used for training.
-  Will be made accessible as command line arguments.
-- ``train_volume_size (int)`` Size in GB of the EBS volume to use for storing
-  input data during training. Must be large enough to the store training
-  data.
-- ``train_max_run (int)`` Timeout in seconds for training, after which Amazon
-  SageMaker terminates the job regardless of its current status.
-- ``output_path (str)`` S3 location where you want the training result (model
-  artifacts and optional output files) saved. If not specified, results
-  are stored to a default bucket. If the bucket with the specific name
-  does not exist, the estimator creates the bucket during the ``fit``
-  method execution.
-- ``output_kms_key`` Optional KMS key ID to optionally encrypt training
-  output with.
-- ``base_job_name`` Name to assign for the training job that the ``fit``
-  method launches. If not specified, the estimator generates a default
-  job name, based on the training image name and current timestamp.
-- ``image_name`` An alternative docker image to use for training and
-  serving. If specified, the estimator will use this image for training and
-  hosting, instead of selecting the appropriate SageMaker official image based on
-  ``framework_version`` and ``py_version``. Refer to: `SageMaker TensorFlow Docker containers <https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#sagemaker-tensorflow-docker-containers>`_ for details on what the official images support
-  and where to find the source code to build your custom image.
-- ``script_mode (bool)`` Whether to use Script Mode or not. Script mode is the only available training mode in Python 3,
-  setting ``py_version`` to ``py3`` automatically sets ``script_mode`` to True.
-- ``model_dir (str)`` Location where model data, checkpoint data, and TensorBoard checkpoints should be saved during training.
-  If not specified a S3 location will be generated under the training job's default bucket. And ``model_dir`` will be
-  passed in your training script as one of the command line arguments.
-- ``distributions (dict)`` Configure your distribution strategy with this argument.
+For information about the different TensorFlow-related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html.
 
 **************************************
 SageMaker TensorFlow Docker containers
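
The long constructor-argument list removed above now lives behind the readthedocs link. For orientation, a minimal construction sketch that uses the required arguments the removed text described; every value here (script name, role ARN, instance type, versions, S3 path) is a placeholder:

    from sagemaker.tensorflow import TensorFlow

    estimator = TensorFlow(
        entry_point="train.py",               # path to the training script
        role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder IAM role
        train_instance_count=1,               # number of EC2 instances for training
        train_instance_type="ml.c4.xlarge",   # EC2 instance type
        framework_version="1.15.2",           # assumed TF version for this SDK release
        py_version="py3",                     # py3 implies Script Mode
    )

    estimator.fit("s3://my-bucket/my-training-data")  # placeholder S3 input

The full, current argument reference is the readthedocs page linked in the diff above.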

src/sagemaker/amazon/amazon_estimator.py (+34, -13)
@@ -281,7 +281,9 @@ def record_set(self, train, labels=None, channel="train", encrypt=False):
             RecordSet: A RecordSet referencing the encoded, uploading training
                 and label data.
         """
-        s3 = self.sagemaker_session.boto_session.resource("s3")
+        s3 = self.sagemaker_session.boto_session.resource(
+            "s3", region_name=self.sagemaker_session.boto_region_name
+        )
         parsed_s3_url = urlparse(self.data_location)
         bucket, key_prefix = parsed_s3_url.netloc, parsed_s3_url.path
         key_prefix = key_prefix + "{}-{}/".format(type(self).__name__, sagemaker_timestamp())
@@ -467,9 +469,14 @@ def registry(region_name, algorithm=None):
     https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/amazon
 
     Args:
-        region_name:
-        algorithm:
+        region_name (str): The region name for the account.
+        algorithm (str): The algorithm for the account.
+
+    Raises:
+        ValueError: If invalid algorithm passed in or if mapping does not exist for given algorithm
+            and region.
     """
+    region_to_accounts = {}
     if algorithm in [
         None,
         "pca",
@@ -482,7 +489,7 @@
         "object2vec",
         "ipinsights",
     ]:
-        account_id = {
+        region_to_accounts = {
             "us-east-1": "382416733822",
             "us-east-2": "404615174143",
             "us-west-2": "174872318107",
@@ -503,9 +510,11 @@
             "eu-west-3": "749696950732",
             "sa-east-1": "855470959533",
             "me-south-1": "249704162688",
-        }[region_name]
+            "cn-north-1": "390948362332",
+            "cn-northwest-1": "387376663083",
+        }
     elif algorithm in ["lda"]:
-        account_id = {
+        region_to_accounts = {
             "us-east-1": "766337827248",
             "us-east-2": "999911452149",
             "us-west-2": "266724342769",
@@ -521,9 +530,9 @@
             "eu-west-2": "644912444149",
             "us-west-1": "632365934929",
             "us-iso-east-1": "490574956308",
-        }[region_name]
+        }
     elif algorithm in ["forecasting-deepar"]:
-        account_id = {
+        region_to_accounts = {
             "us-east-1": "522234722520",
             "us-east-2": "566113047672",
             "us-west-2": "156387875391",
@@ -544,7 +553,9 @@
             "eu-west-3": "749696950732",
             "sa-east-1": "855470959533",
             "me-south-1": "249704162688",
-        }[region_name]
+            "cn-north-1": "390948362332",
+            "cn-northwest-1": "387376663083",
+        }
     elif algorithm in [
         "xgboost",
         "seq2seq",
@@ -553,7 +564,7 @@
         "object-detection",
         "semantic-segmentation",
     ]:
-        account_id = {
+        region_to_accounts = {
             "us-east-1": "811284229777",
             "us-east-2": "825641698319",
             "us-west-2": "433757028032",
@@ -574,15 +585,25 @@
             "eu-west-3": "749696950732",
             "sa-east-1": "855470959533",
             "me-south-1": "249704162688",
-        }[region_name]
+            "cn-north-1": "390948362332",
+            "cn-northwest-1": "387376663083",
+        }
     elif algorithm in ["image-classification-neo", "xgboost-neo"]:
-        account_id = NEO_IMAGE_ACCOUNT[region_name]
+        region_to_accounts = NEO_IMAGE_ACCOUNT
     else:
         raise ValueError(
             "Algorithm class:{} does not have mapping to account_id with images".format(algorithm)
         )
 
-    return get_ecr_image_uri_prefix(account_id, region_name)
+    if region_name in region_to_accounts:
+        account_id = region_to_accounts[region_name]
+        return get_ecr_image_uri_prefix(account_id, region_name)
+
+    raise ValueError(
+        "Algorithm ({algorithm}) is unsupported for region ({region_name}).".format(
+            algorithm=algorithm, region_name=region_name
+        )
+    )
 
 
 def get_image_uri(region_name, repo_name, repo_version=1):
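
With this change, registry builds a region-to-account mapping per algorithm family, then either returns the ECR prefix or raises the new ValueError when the region is missing from the mapping. A hedged usage sketch; the algorithm and region values are chosen for illustration (pca and cn-north-1 appear in the diff, while the failing region is deliberately made up):

    from sagemaker.amazon.amazon_estimator import get_image_uri, registry

    # cn-north-1 is one of the newly added regions for the built-in algorithms.
    print(registry("cn-north-1", "pca"))          # ECR prefix for account 390948362332
    print(get_image_uri("cn-north-1", "pca", 1))  # full image URI built from that prefix

    # A region absent from every mapping now fails fast with the new
    # "unsupported for region" ValueError instead of an opaque KeyError.
    try:
        registry("mars-north-1", "pca")  # made-up region name, for illustration only
    except ValueError as err:
        print(err)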

src/sagemaker/chainer/estimator.py (+4, -1)
@@ -203,6 +203,9 @@ def create_model(
             sagemaker.chainer.model.ChainerModel: A SageMaker ``ChainerModel``
                 object. See :func:`~sagemaker.chainer.model.ChainerModel` for full details.
         """
+        if "image" not in kwargs:
+            kwargs["image"] = self.image_name
+
         return ChainerModel(
             self.model_data,
             role or self.role,
@@ -215,10 +218,10 @@
             py_version=self.py_version,
             framework_version=self.framework_version,
             model_server_workers=model_server_workers,
-            image=self.image_name,
             sagemaker_session=self.sagemaker_session,
             vpc_config=self.get_vpc_config(vpc_config_override),
             dependencies=(dependencies or self.dependencies),
+            **kwargs
         )
 
     @classmethod
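
The create_model change above only injects the estimator's own image_name when the caller has not supplied one, and forwards the remaining kwargs to ChainerModel, which is what enables the "allow custom image when calling deploy or create_model" entry in the CHANGELOG. A sketch of the call pattern this enables; the estimator arguments and the image URI are placeholders:

    from sagemaker.chainer import Chainer

    estimator = Chainer(
        entry_point="train.py",               # placeholder training script
        role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder IAM role
        train_instance_count=1,
        train_instance_type="ml.c4.xlarge",
        framework_version="5.0.0",            # assumed Chainer version for this SDK era
    )
    estimator.fit("s3://my-bucket/chainer-training-data")  # placeholder input

    # Because kwargs now reach ChainerModel, an explicit image overrides the
    # estimator's image_name instead of being silently discarded.
    model = estimator.create_model(
        image="111122223333.dkr.ecr.us-west-2.amazonaws.com/my-chainer-serving:latest"
    )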
