Commit 3ee4b44

Merge branch 'master' into sklearn_update_1.0-1
2 parents 5722c45 + 255a339 commit 3ee4b44

33 files changed: +641 -260 lines changed

CHANGELOG.md

+32
@@ -1,5 +1,37 @@
 # Changelog
 
+## v2.91.1 (2022-05-19)
+
+### Bug Fixes and Other Changes
+
+* Revert Prevent passing PipelineVariable object into image_uris.retrieve
+
+## v2.91.0 (2022-05-19)
+
+### Features
+
+* Support Properties for StepCollection
+
+### Bug Fixes and Other Changes
+
+* Prevent passing PipelineVariable object into image_uris.retrieve
+* support image_uri being property ref for model
+* ResourceConflictException from AWS Lambda on pipeline upsert
+
+### Documentation Changes
+
+* release notes for SMDDP 1.4.1 and SMDMP 1.9.0
+
+## v2.90.0 (2022-05-16)
+
+### Features
+
+* Add ModelStep for SageMaker Model Building Pipeline
+
+### Bug Fixes and Other Changes
+
+* update setup.py to add minimum python requirement of 3.6
+
 ## v2.89.0 (2022-05-11)
 
 ### Features

VERSION

+1 -1
@@ -1 +1 @@
-2.89.1.dev0
+2.91.2.dev0

doc/api/training/sdp_versions/latest.rst

+2 -2
@@ -26,8 +26,8 @@ depending on the version of the library you use.
 <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
 for more information.
 
-Version 1.4.0 (Latest)
-======================
+Version 1.4.0, 1.4.1 (Latest)
+=============================
 
 .. toctree::
    :maxdepth: 1

doc/api/training/sdp_versions/v1.2.x/smd_data_parallel_pytorch.rst

+1 -1
@@ -266,7 +266,7 @@ PyTorch API
 .. note::
 
    The ``no_sync()`` context manager is available from smdistributed-dataparallel v1.2.2.
-   To find the release note, see :ref:`sdp_1.2.2_release_note`.
+   To find the release note, see :ref:`sdp_release_note`.
 
 **Example:**

doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst

+38 -7
@@ -1,4 +1,4 @@
-.. _sdp_1.2.2_release_note:
+.. _sdp_release_note:
 
 #############
 Release Notes
@@ -7,9 +7,45 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed data parallel library.
 
-SageMaker Distributed Data Parallel 1.4.0 Release Notes
+SageMaker Distributed Data Parallel 1.4.1 Release Notes
 =======================================================
 
+*Date: May. 3. 2022*
+
+**Currency Updates**
+
+* Added support for PyTorch 1.11.0
+
+**Known Issues**
+
+* The library currently does not support the PyTorch sub-process groups API (torch.distributed.new_group (https://pytorch.org/docs/stable/distributed.html#torch.distributed.new_group)).
+
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- PyTorch 1.11.0 DLC
+
+.. code::
+
+   763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for custom container users:
+
+.. code::
+
+   https://smdataparallel.s3.amazonaws.com/binary/pytorch/1.11.0/cu113/2022-04-14/smdistributed_dataparallel-1.4.1-cp38-cp38-linux_x86_64.whl
+
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Data Parallel 1.4.0 Release Notes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 *Date: Feb. 24. 2022*
 
 **New Features**
@@ -72,11 +108,6 @@ This version passed benchmark testing and is migrated to the following AWS Deep
    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker
 
 
-----
-
-Release History
-===============
-
 SageMaker Distributed Data Parallel 1.2.2 Release Notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

+34 -7
@@ -5,9 +5,41 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.
 
-SageMaker Distributed Model Parallel 1.8.1 Release Notes
+SageMaker Distributed Model Parallel 1.9.0 Release Notes
 ========================================================
 
+*Date: May. 3. 2022*
+
+**Currency Updates**
+
+* Added support for PyTorch 1.11.0
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- PyTorch 1.11.0 DLC
+
+.. code::
+
+   763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for custom container users:
+
+.. code::
+
+   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+
+
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.8.1 Release Notes
+--------------------------------------------------------
+
 *Date: April. 23. 2022*
 
 **New Features**
@@ -59,11 +91,6 @@ This version passed benchmark testing and is migrated to the following AWS Deep
    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-04-14-03-58/smdistributed_modelparallel-1.8.1-cp38-cp38-linux_x86_64.whl
 
 
-----
-
-Release History
-===============
-
 SageMaker Distributed Model Parallel 1.8.0 Release Notes
 --------------------------------------------------------
 
@@ -91,7 +118,7 @@ This version passed benchmark testing and is migrated to the following AWS Deep
    763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
 
 
-* The binary file of this version of the library for custom container users
+The binary file of this version of the library for custom container users:
 
 .. code::

doc/api/training/smp_versions/latest.rst

+2 -2
@@ -10,8 +10,8 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
-Version 1.7.0, 1.8.0, 1.8.1 (Latest)
-====================================
+Version 1.7.0, 1.8.0, 1.8.1, 1.9.0 (Latest)
+===========================================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.

src/sagemaker/fw_utils.py

+6 -2
@@ -16,6 +16,7 @@
 import logging
 import os
 import re
+import time
 import shutil
 import tempfile
 from collections import namedtuple
@@ -24,6 +25,7 @@
 import sagemaker.image_uris
 from sagemaker.session_settings import SessionSettings
 import sagemaker.utils
+from sagemaker.workflow import is_pipeline_variable
 
 from sagemaker.deprecations import renamed_warning
 
@@ -395,8 +397,10 @@ def model_code_key_prefix(code_location_key_prefix, model_name, image):
     Returns:
         str: the key prefix to be used in uploading code
     """
-    training_job_name = sagemaker.utils.name_from_image(image)
-    return "/".join(filter(None, [code_location_key_prefix, model_name or training_job_name]))
+    name_from_image = f"/model_code/{int(time.time())}"
+    if not is_pipeline_variable(image):
+        name_from_image = sagemaker.utils.name_from_image(image)
+    return "/".join(filter(None, [code_location_key_prefix, model_name or name_from_image]))
 
 
 def warn_if_parameter_server_with_multi_gpu(training_instance_type, distribution):
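
Reviewer sketch for the `model_code_key_prefix` hunk: the change falls back to a time-based placeholder when `image` is a pipeline variable and only derives a name from the image when it is a concrete string. The following is a minimal, self-contained illustration of that branching. `FakePipelineVariable`, `is_pipeline_variable_stub`, and `name_from_image_stub` are hypothetical stand-ins; the behavior of the real SageMaker helpers is not shown in this diff and is only assumed here.

```python
import time


class FakePipelineVariable:
    """Hypothetical stand-in for a value that is only resolved at pipeline run time."""


def is_pipeline_variable_stub(value):
    # Assumption: the real helper performs some kind of type check like this.
    return isinstance(value, FakePipelineVariable)


def name_from_image_stub(image):
    # Assumption: the real helper derives a job-style name from the image URI.
    # Here the repository name is extracted and a timestamp is appended.
    repo = image.split("/")[-1].split(":")[0]
    return f"{repo}-{int(time.time())}"


def model_code_key_prefix(code_location_key_prefix, model_name, image):
    # Same control flow as the hunk: start from a time-based placeholder and
    # only inspect `image` when it is a concrete string.
    name_from_image = f"/model_code/{int(time.time())}"
    if not is_pipeline_variable_stub(image):
        name_from_image = name_from_image_stub(image)
    return "/".join(filter(None, [code_location_key_prefix, model_name or name_from_image]))
```

With a concrete URI such as `"123.dkr.ecr.us-west-2.amazonaws.com/my-image:latest"` and prefix `"prefix"`, the sketch returns a key starting with `prefix/my-image-`. With a `FakePipelineVariable` and no `model_name`, the placeholder is used; because the placeholder begins with `/`, joining it onto a non-empty prefix yields a double slash (`prefix//model_code/...`), which may be worth checking in review.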

src/sagemaker/lambda_helper.py

+26 -22
@@ -15,6 +15,7 @@
 
 from io import BytesIO
 import zipfile
+import time
 from botocore.exceptions import ClientError
 from sagemaker.session import Session
 
@@ -134,32 +135,35 @@ def update(self):
         Returns: boto3 response from Lambda's update_function method.
         """
         lambda_client = _get_lambda_client(self.session)
-
-        if self.script is not None:
-            try:
-                response = lambda_client.update_function_code(
-                    FunctionName=self.function_name, ZipFile=_zip_lambda_code(self.script)
-                )
-                return response
-            except ClientError as e:
-                error = e.response["Error"]
-                raise ValueError(error)
-        else:
+        retry_attempts = 7
+        for i in range(retry_attempts):
             try:
-                response = lambda_client.update_function_code(
-                    FunctionName=(self.function_name or self.function_arn),
-                    S3Bucket=self.s3_bucket,
-                    S3Key=_upload_to_s3(
-                        s3_client=_get_s3_client(self.session),
-                        function_name=self.function_name,
-                        zipped_code_dir=self.zipped_code_dir,
-                        s3_bucket=self.s3_bucket,
-                    ),
-                )
+                if self.script is not None:
+                    response = lambda_client.update_function_code(
+                        FunctionName=self.function_name, ZipFile=_zip_lambda_code(self.script)
+                    )
+                else:
+                    response = lambda_client.update_function_code(
+                        FunctionName=(self.function_name or self.function_arn),
+                        S3Bucket=self.s3_bucket,
+                        S3Key=_upload_to_s3(
+                            s3_client=_get_s3_client(self.session),
+                            function_name=self.function_name,
+                            zipped_code_dir=self.zipped_code_dir,
+                            s3_bucket=self.s3_bucket,
+                        ),
+                    )
                 return response
             except ClientError as e:
                 error = e.response["Error"]
-                raise ValueError(error)
+                code = error["Code"]
+                if code == "ResourceConflictException":
+                    if i == retry_attempts - 1:
+                        raise ValueError(error)
+                    # max wait time = 2**0 + 2**1 + .. + 2**6 = 127 seconds
+                    time.sleep(2**i)
+                else:
+                    raise ValueError(error)
 
     def upsert(self):
         """Method to create a lambda function or update it if it already exists

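Reviewer sketch for the `update()` retry loop in `lambda_helper.py`: the hunk retries `update_function_code` on `ResourceConflictException` with waits of `2**i` seconds and re-raises on the last attempt. The following minimal sketch reproduces only that control flow. `ConflictError` and `call_with_backoff` are hypothetical names introduced for illustration, and the `sleep` parameter is an assumption added so the waits can be recorded instead of performed.

```python
import time


class ConflictError(Exception):
    """Hypothetical stand-in for a ClientError with code ResourceConflictException."""


def call_with_backoff(operation, retry_attempts=7, sleep=time.sleep):
    """Call `operation`, retrying on ConflictError with waits of 2**i seconds."""
    for i in range(retry_attempts):
        try:
            return operation()
        except ConflictError as error:
            # The final attempt re-raises instead of sleeping, as in the diff.
            if i == retry_attempts - 1:
                raise ValueError(str(error))
            sleep(2**i)


# Usage: an operation that always conflicts, with sleeps recorded rather than performed.
waits = []


def always_conflicts():
    raise ConflictError("update in progress")


try:
    call_with_backoff(always_conflicts, sleep=waits.append)
except ValueError:
    pass

total_wait = sum(waits)  # 1 + 2 + 4 + 8 + 16 + 32
```

In this sketch the seventh attempt raises without sleeping, so the worst-case cumulative wait is 63 seconds. The 127-second figure in the diff's comment would apply only if a sleep of `2**6` seconds also followed the last attempt.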
src/sagemaker/workflow/_utils.py

+12 -7
@@ -17,7 +17,7 @@
 import shutil
 import tarfile
 import tempfile
-from typing import List, Union, Optional
+from typing import List, Union, Optional, TYPE_CHECKING
 from sagemaker import image_uris
 from sagemaker.inputs import TrainingInput
 from sagemaker.estimator import EstimatorBase
@@ -34,6 +34,9 @@
 from sagemaker.utils import _save_model, download_file_from_url
 from sagemaker.workflow.retry import RetryPolicy
 
+if TYPE_CHECKING:
+    from sagemaker.workflow.step_collections import StepCollection
+
 FRAMEWORK_VERSION = "0.23-1"
 INSTANCE_TYPE = "ml.m5.large"
 REPACK_SCRIPT = "_repack_model.py"
@@ -57,7 +60,7 @@ def __init__(
         description: str = None,
         source_dir: str = None,
         dependencies: List = None,
-        depends_on: Union[List[str], List[Step]] = None,
+        depends_on: Optional[List[Union[str, Step, "StepCollection"]]] = None,
         retry_policies: List[RetryPolicy] = None,
         subnets=None,
         security_group_ids=None,
@@ -124,8 +127,9 @@ def __init__(
                 >>> |------ virtual-env
 
                 This is not supported with "local code" in Local Mode.
-            depends_on (List[str] or List[Step]): A list of step names or instances
-                this step depends on (default: None).
+            depends_on (List[Union[str, Step, StepCollection]]): The list of `Step`/`StepCollection`
+                names or `Step` instances or `StepCollection` instances that the current `Step`
+                depends on (default: None).
             retry_policies (List[RetryPolicy]): The list of retry policies for the current step
                 (default: None).
             subnets (list[str]): List of subnet ids. If not specified, the re-packing
@@ -274,7 +278,7 @@ def __init__(
         compile_model_family=None,
         display_name: str = None,
         description=None,
-        depends_on: Optional[Union[List[str], List[Step]]] = None,
+        depends_on: Optional[List[Union[str, Step, "StepCollection"]]] = None,
         retry_policies: Optional[List[RetryPolicy]] = None,
         tags=None,
         container_def_list=None,
@@ -311,8 +315,9 @@ def __init__(
                 if specified, a compiled model will be used (default: None).
             display_name (str): The display name of this `_RegisterModelStep` step (default: None).
             description (str): Model Package description (default: None).
-            depends_on (List[str] or List[Step]): A list of step names or instances
-                this step depends on (default: None).
+            depends_on (List[Union[str, Step, StepCollection]]): The list of `Step`/`StepCollection`
+                names or `Step` instances or `StepCollection` instances that the current `Step`
+                depends on (default: None).
             retry_policies (List[RetryPolicy]): The list of retry policies for the current step
                 (default: None).
             tags (List[dict[str, str]]): A list of dictionaries containing key-value pairs used to
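
Reviewer sketch for the `TYPE_CHECKING` import in `_utils.py`: the hunk imports `StepCollection` only under `typing.TYPE_CHECKING` and refers to it as the quoted name `"StepCollection"` in annotations. The commit does not say why; a common reason for this pattern is avoiding an import cycle, which is an assumption here. The example below shows the pattern in isolation, with `decimal.Decimal` as an arbitrary stand-in class and `describe` as a hypothetical function.

```python
from typing import TYPE_CHECKING, List, Optional, Union

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at run time.
    # `Decimal` is an arbitrary stand-in for a class whose module
    # should not be imported eagerly.
    from decimal import Decimal


def describe(depends_on: Optional[List[Union[str, "Decimal"]]] = None) -> List[str]:
    """Return a string form of each dependency; the quoted name defers lookup."""
    return [str(item) for item in (depends_on or [])]
```

Calling `describe(["step-a", "step-b"])` works at run time even though `Decimal` was never imported, because the quoted annotation is stored as a forward reference rather than resolved when the function is defined.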

src/sagemaker/workflow/callback_step.py

+6 -4
@@ -13,7 +13,7 @@
 """The step definitions for workflow."""
 from __future__ import absolute_import
 
-from typing import List, Dict, Union
+from typing import List, Dict, Union, Optional
 from enum import Enum
 
 import attr
@@ -27,6 +27,7 @@
 from sagemaker.workflow.entities import (
     DefaultEnumMeta,
 )
+from sagemaker.workflow.step_collections import StepCollection
 from sagemaker.workflow.steps import Step, StepTypeEnum, CacheConfig
 
 
@@ -86,7 +87,7 @@ def __init__(
         display_name: str = None,
         description: str = None,
         cache_config: CacheConfig = None,
-        depends_on: Union[List[str], List[Step]] = None,
+        depends_on: Optional[List[Union[str, Step, StepCollection]]] = None,
     ):
         """Constructs a CallbackStep.
 
@@ -99,8 +100,9 @@ def __init__(
             display_name (str): The display name of the callback step.
             description (str): The description of the callback step.
             cache_config (CacheConfig): A `sagemaker.workflow.steps.CacheConfig` instance.
-            depends_on (List[str] or List[Step]): A list of step names or step instances
-                this `sagemaker.workflow.steps.CallbackStep` depends on
+            depends_on (List[Union[str, Step, StepCollection]]): A list of `Step`/`StepCollection`
+                names or `Step` instances or `StepCollection` instances that this `CallbackStep`
+                depends on.
         """
         super(CallbackStep, self).__init__(
             name, display_name, description, StepTypeEnum.CALLBACK, depends_on
