Commit b91ac25

Merge branch 'master' into ptddp-launcher
2 parents c63391c + 95bbe7a commit b91ac25

114 files changed, +3117 -334 lines changed

CHANGELOG.md

+20

@@ -1,5 +1,25 @@
 # Changelog

+## v2.101.0 (2022-07-27)
+
+### Features
+
+* Algorithms region launch on CGK
+* enhance-bucket-override-support
+* infer framework and version
+* support clarify bias detection when facets not included
+* Add CGK region to frameworks by DLC
+
+### Bug Fixes and Other Changes
+
+* Make repack step output path align with model repack path
+* Support parameterized source code input for TrainingStep
+
+### Documentation Changes
+
+* heterogeneous cluster api doc fix
+* smdmp v1.10 release note
+
 ## v2.100.0 (2022-07-18)

 ### Features

VERSION

+1 -1

@@ -1 +1 @@
-2.100.1.dev0
+2.101.1.dev0

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

+50 -6

@@ -5,14 +5,31 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.

-SageMaker Distributed Model Parallel 1.9.0 Release Notes
-========================================================
+SageMaker Distributed Model Parallel 1.10.0 Release Notes
+=========================================================

-*Date: May. 3. 2022*
+*Date: July. 19. 2022*

-**Currency Updates**
+**New Features**

-* Added support for PyTorch 1.11.0
+The following new features are added for PyTorch.
+
+* Added support for FP16 training by implementing smdistributed.modelparallel
+  modification of Apex FP16_Module and FP16_Optimizer. To learn more, see
+  `FP16 Training with Model Parallelism
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-fp16.html>`_.
+* New checkpoint APIs for CPU memory usage optimization. To learn more, see
+  `Checkpointing Distributed Models and Optimizer States
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-checkpoint.html>`_.
+
+**Improvements**
+
+* The SageMaker distributed model parallel library manages and optimizes CPU
+  memory by garbage-collecting non-local parameters in general and during checkpointing.
+* Changes in the `GPT-2 translate functions
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-hugging-face.html>`_
+  (``smdistributed.modelparallel.torch.nn.huggingface.gpt2``)
+  to save memory by not maintaining two copies of weights at the same time.

 **Migration to AWS Deep Learning Containers**

@@ -28,7 +45,7 @@ Binary file of this version of the library for custom container users:

 .. code::

-   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-07-11-19-23/smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl



@@ -37,6 +54,33 @@ Binary file of this version of the library for custom container users:
 Release History
 ===============

+SageMaker Distributed Model Parallel 1.9.0 Release Notes
+--------------------------------------------------------
+
+*Date: May. 3. 2022*
+
+**Currency Updates**
+
+* Added support for PyTorch 1.11.0
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- PyTorch 1.11.0 DLC
+
+.. code::
+
+   763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for custom container users:
+
+.. code::
+
+   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+
+
+
 SageMaker Distributed Model Parallel 1.8.1 Release Notes
 --------------------------------------------------------

doc/overview.rst

+7 -3

@@ -1713,11 +1713,15 @@ in the AWS documentation.
 SageMaker Workflow
 ******************

-You can use Apache Airflow to author, schedule and monitor SageMaker workflow.
+You can use the following machine learning frameworks to author, schedule and monitor SageMaker workflow.

-For more information, see `SageMaker Workflow in Apache Airflow`_.
+.. toctree::
+    :maxdepth: 2

-.. _SageMaker Workflow in Apache Airflow: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/workflow/README.rst
+    workflows/airflow/index
+    workflows/step_functions/index
+    workflows/pipelines/index
+    workflows/lineage/index

 ************************************
 SageMaker Model Building Pipeline

src/sagemaker/amazon/factorization_machines.py

+13 -1

@@ -13,6 +13,8 @@
 """Placeholder docstring"""
 from __future__ import absolute_import

+from typing import Union, Optional
+
 from sagemaker import image_uris
 from sagemaker.amazon.amazon_estimator import AmazonAlgorithmEstimatorBase
 from sagemaker.amazon.common import RecordSerializer, RecordDeserializer
@@ -21,7 +23,9 @@
 from sagemaker.predictor import Predictor
 from sagemaker.model import Model
 from sagemaker.session import Session
+from sagemaker.utils import pop_out_unused_kwarg
 from sagemaker.vpc_utils import VPC_CONFIG_DEFAULT
+from sagemaker.workflow.entities import PipelineVariable


 class FactorizationMachines(AmazonAlgorithmEstimatorBase):
@@ -319,7 +323,13 @@ class FactorizationMachinesModel(Model):
     returns :class:`FactorizationMachinesPredictor`.
     """

-    def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
+    def __init__(
+        self,
+        model_data: Union[str, PipelineVariable],
+        role: str,
+        sagemaker_session: Optional[Session] = None,
+        **kwargs
+    ):
         """Initialization for FactorizationMachinesModel class.

         Args:
@@ -343,6 +353,8 @@ def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
             sagemaker_session.boto_region_name,
             version=FactorizationMachines.repo_version,
         )
+        pop_out_unused_kwarg("predictor_cls", kwargs, FactorizationMachinesPredictor.__name__)
+        pop_out_unused_kwarg("image_uri", kwargs, image_uri)
         super(FactorizationMachinesModel, self).__init__(
             image_uri,
             model_data,
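
The two `pop_out_unused_kwarg` calls in this hunk run just before the base `Model` constructor is invoked, presumably so that a caller-supplied `predictor_cls` or `image_uri` in `kwargs` does not collide with the values the algorithm model sets itself. The helper's body is not part of this diff, so the following is only a minimal sketch of what such a helper might do, assuming it logs a warning and removes the key. The function name `pop_unused_kwarg_sketch` and the warning text are hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)


def pop_unused_kwarg_sketch(
    arg_name: str, kwargs: dict, override_val: Optional[str] = None
) -> None:
    """Hypothetical stand-in for sagemaker.utils.pop_out_unused_kwarg.

    Assumption: the real helper drops `arg_name` from `kwargs` and warns
    the caller that the supplied value is ignored.
    """
    if arg_name not in kwargs:
        return  # nothing supplied by the caller, nothing to drop
    msg = f"{arg_name} supplied in kwargs will be ignored"
    if override_val:
        msg += f" and overridden with {override_val}"
    logger.warning(msg)
    kwargs.pop(arg_name)


# Illustrative kwargs a caller might pass to an algorithm model constructor.
kwargs = {"image_uri": "my-custom-image", "name": "fm-model"}
pop_unused_kwarg_sketch("image_uri", kwargs, "algorithm-image-uri")
pop_unused_kwarg_sketch("predictor_cls", kwargs, "FactorizationMachinesPredictor")
print(kwargs)
```

Under these assumptions only the unrelated `name` entry remains in `kwargs`, so forwarding `**kwargs` to the parent constructor cannot pass `image_uri` twice.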

src/sagemaker/amazon/hyperparameter.py

+13 -5

@@ -15,6 +15,8 @@

 import json

+from sagemaker.workflow import is_pipeline_variable
+

 class Hyperparameter(object):
     """An algorithm hyperparameter with optional validation.
@@ -98,8 +100,14 @@ def serialize_all(obj):
         """
         if "_hyperparameters" not in dir(obj):
             return {}
-        return {
-            k: json.dumps(v) if isinstance(v, list) else str(v)
-            for k, v in obj._hyperparameters.items()
-            if v is not None
-        }
+        hps = {}
+        for k, v in obj._hyperparameters.items():
+            if v is not None:
+                if isinstance(v, list):
+                    v = json.dumps(v)
+                elif is_pipeline_variable(v):
+                    v = v.to_string()
+                else:
+                    v = str(v)
+                hps[k] = v
+        return hps
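
The rewritten `serialize_all` replaces a dict comprehension with an explicit loop so that a third case can be handled: a pipeline variable is serialized through its own `to_string()` method rather than `str()`, presumably because `str()` on a value that is only resolved at pipeline execution time would not produce anything usable. The sketch below mirrors that branching outside the SDK. `FakePipelineVariable`, its placeholder string, and `serialize_hyperparameters` are hypothetical names used for illustration, and the `isinstance` check stands in for `is_pipeline_variable`.

```python
import json


class FakePipelineVariable:
    """Hypothetical stand-in for a value resolved at pipeline execution time."""

    def __init__(self, name):
        self.name = name

    def to_string(self):
        # Assumption: the real SDK returns an expression object here;
        # a plain placeholder string is used for illustration.
        return f"<resolved at runtime: {self.name}>"


def serialize_hyperparameters(hyperparameters):
    """Mirror the three-way branching of the updated serialize_all."""
    hps = {}
    for k, v in hyperparameters.items():
        if v is None:
            continue  # unset hyperparameters are omitted
        if isinstance(v, list):
            v = json.dumps(v)  # lists are JSON-encoded
        elif isinstance(v, FakePipelineVariable):
            v = v.to_string()  # deferred values serialize themselves
        else:
            v = str(v)  # everything else becomes a plain string
        hps[k] = v
    return hps


result = serialize_hyperparameters(
    {
        "k": 10,
        "dims": [1, 2],
        "epochs": FakePipelineVariable("Epochs"),
        "unused": None,
    }
)
print(result)
```

The order of the checks matters in this sketch: the list test runs first, as in the diff, so a list is always JSON-encoded even if it contains deferred values.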

src/sagemaker/amazon/ipinsights.py

+13 -1

@@ -13,6 +13,8 @@
 """Placeholder docstring"""
 from __future__ import absolute_import

+from typing import Union, Optional
+
 from sagemaker import image_uris
 from sagemaker.amazon.amazon_estimator import AmazonAlgorithmEstimatorBase
 from sagemaker.amazon.hyperparameter import Hyperparameter as hp  # noqa
@@ -22,7 +24,9 @@
 from sagemaker.model import Model
 from sagemaker.serializers import CSVSerializer
 from sagemaker.session import Session
+from sagemaker.utils import pop_out_unused_kwarg
 from sagemaker.vpc_utils import VPC_CONFIG_DEFAULT
+from sagemaker.workflow.entities import PipelineVariable


 class IPInsights(AmazonAlgorithmEstimatorBase):
@@ -222,7 +226,13 @@ class IPInsightsModel(Model):
     Predictor that calculates anomaly scores for data points.
     """

-    def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
+    def __init__(
+        self,
+        model_data: Union[str, PipelineVariable],
+        role: str,
+        sagemaker_session: Optional[Session] = None,
+        **kwargs
+    ):
         """Creates object to get insights on S3 model data.

         Args:
@@ -246,6 +256,8 @@ def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
             sagemaker_session.boto_region_name,
             version=IPInsights.repo_version,
         )
+        pop_out_unused_kwarg("predictor_cls", kwargs, IPInsightsPredictor.__name__)
+        pop_out_unused_kwarg("image_uri", kwargs, image_uri)
         super(IPInsightsModel, self).__init__(
             image_uri,
             model_data,

src/sagemaker/amazon/kmeans.py

+13 -1

@@ -13,6 +13,8 @@
 """Placeholder docstring"""
 from __future__ import absolute_import

+from typing import Union, Optional
+
 from sagemaker import image_uris
 from sagemaker.amazon.amazon_estimator import AmazonAlgorithmEstimatorBase
 from sagemaker.amazon.common import RecordSerializer, RecordDeserializer
@@ -21,7 +23,9 @@
 from sagemaker.predictor import Predictor
 from sagemaker.model import Model
 from sagemaker.session import Session
+from sagemaker.utils import pop_out_unused_kwarg
 from sagemaker.vpc_utils import VPC_CONFIG_DEFAULT
+from sagemaker.workflow.entities import PipelineVariable


 class KMeans(AmazonAlgorithmEstimatorBase):
@@ -246,7 +250,13 @@ class KMeansModel(Model):
     Predictor to performs k-means cluster assignment.
     """

-    def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
+    def __init__(
+        self,
+        model_data: Union[str, PipelineVariable],
+        role: str,
+        sagemaker_session: Optional[Session] = None,
+        **kwargs
+    ):
         """Initialization for KMeansModel class.

         Args:
@@ -270,6 +280,8 @@ def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
             sagemaker_session.boto_region_name,
             version=KMeans.repo_version,
         )
+        pop_out_unused_kwarg("predictor_cls", kwargs, KMeansPredictor.__name__)
+        pop_out_unused_kwarg("image_uri", kwargs, image_uri)
         super(KMeansModel, self).__init__(
             image_uri,
             model_data,

src/sagemaker/amazon/knn.py

+13 -1

@@ -13,6 +13,8 @@
 """Placeholder docstring"""
 from __future__ import absolute_import

+from typing import Union, Optional
+
 from sagemaker import image_uris
 from sagemaker.amazon.amazon_estimator import AmazonAlgorithmEstimatorBase
 from sagemaker.amazon.common import RecordSerializer, RecordDeserializer
@@ -21,7 +23,9 @@
 from sagemaker.predictor import Predictor
 from sagemaker.model import Model
 from sagemaker.session import Session
+from sagemaker.utils import pop_out_unused_kwarg
 from sagemaker.vpc_utils import VPC_CONFIG_DEFAULT
+from sagemaker.workflow.entities import PipelineVariable


 class KNN(AmazonAlgorithmEstimatorBase):
@@ -238,7 +242,13 @@ class KNNModel(Model):
     and returns :class:`KNNPredictor`.
     """

-    def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
+    def __init__(
+        self,
+        model_data: Union[str, PipelineVariable],
+        role: str,
+        sagemaker_session: Optional[Session] = None,
+        **kwargs
+    ):
         """Function to initialize KNNModel.

         Args:
@@ -262,6 +272,8 @@ def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
             sagemaker_session.boto_region_name,
             version=KNN.repo_version,
         )
+        pop_out_unused_kwarg("predictor_cls", kwargs, KNNPredictor.__name__)
+        pop_out_unused_kwarg("image_uri", kwargs, image_uri)
         super(KNNModel, self).__init__(
             image_uri,
             model_data,

src/sagemaker/amazon/lda.py

+13 -1

@@ -13,6 +13,8 @@
 """Placeholder docstring"""
 from __future__ import absolute_import

+from typing import Union, Optional
+
 from sagemaker import image_uris
 from sagemaker.amazon.amazon_estimator import AmazonAlgorithmEstimatorBase
 from sagemaker.amazon.common import RecordSerializer, RecordDeserializer
@@ -21,7 +23,9 @@
 from sagemaker.predictor import Predictor
 from sagemaker.model import Model
 from sagemaker.session import Session
+from sagemaker.utils import pop_out_unused_kwarg
 from sagemaker.vpc_utils import VPC_CONFIG_DEFAULT
+from sagemaker.workflow.entities import PipelineVariable


 class LDA(AmazonAlgorithmEstimatorBase):
@@ -220,7 +224,13 @@ class LDAModel(Model):
     Predictor that transforms vectors to a lower-dimensional representation.
     """

-    def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
+    def __init__(
+        self,
+        model_data: Union[str, PipelineVariable],
+        role: str,
+        sagemaker_session: Optional[Session] = None,
+        **kwargs
+    ):
         """Initialization for LDAModel class.

         Args:
@@ -244,6 +254,8 @@ def __init__(self, model_data, role, sagemaker_session=None, **kwargs):
             sagemaker_session.boto_region_name,
             version=LDA.repo_version,
         )
+        pop_out_unused_kwarg("predictor_cls", kwargs, LDAPredictor.__name__)
+        pop_out_unused_kwarg("image_uri", kwargs, image_uri)
         super(LDAModel, self).__init__(
             image_uri,
             model_data,
