Commit b472b3f

Merge branch 'aws:master' into model_config
2 parents 39b2e65 + 8d2c16b commit b472b3f

46 files changed: +2227 -423 lines changed

CHANGELOG.md

Lines changed: 50 additions & 0 deletions
@@ -1,5 +1,55 @@
 # Changelog

+## v2.131.1 (2023-02-03)
+
+### Bug Fixes and Other Changes
+
+* test dub gpu integs with p3
+* fix(experiments/run.py): Stop duplication of RUN_TC_TAG on Consecutive Experiment Runs
+* Enable load_run without name args in Transform env
+* Remove confusing log line emitted during feature group ingestion
+* Enable Experiment integ test on beta clients
+* Make test_processor_with_role_as_pipeline_parameter more concrete
+
+### Documentation Changes
+
+* add security note for the estimator hyperparameter arg
+* SageMaker distributed - model parallelism library release note
+* Add a deprecation note for DetailedProfileConfig
+
+## v2.131.0 (2023-01-31)
+
+### Features
+
+* Display file diff on black-check
+* Support for environment variables in the HPO
+* Support role as PipelineParameter in Processor class
+* Add TrainingImageConfig support for SageMaker training jobs
+
+### Bug Fixes and Other Changes
+
+* use FeatureGroup's Session in nonconcurrency ingestion
+* Update feature_group.py ingest() description
+* Do not use print function. Use logger instead
+* Add batch_get_record and search API for FeatureStore
+* hashing problem for framework processors with identical source dirs
+
+## v2.130.0 (2023-01-26)
+
+### Features
+
+* Add PyTorch 1.13.1 to SDK
+* Adding image_uri config for DJL containers
+* Support specifying env-vars when creating model from model package
+* local download dir for Model and Estimator classes
+
+### Bug Fixes and Other Changes
+
+* increase creation time slack minutes
+* Enable load_run auto pass in experiment config
+* Add us-isob-east-1 accounts and configs
+* Clean up Pipeline unit tests
+
 ## v2.129.0 (2023-01-19)

 ### Features

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.129.1.dev0
+2.131.2.dev0

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

Lines changed: 41 additions & 7 deletions
@@ -6,9 +6,47 @@ New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.


-SageMaker Distributed Model Parallel 1.13.0 Release Notes
+SageMaker Distributed Model Parallel 1.14.0 Release Notes
 =========================================================

+*Date: Jan. 30. 2023*
+
+**Currency Updates**
+
+* Added support for PyTorch v1.13.1
+
+**Improvements**
+
+* Upgraded the flash-attention (https://github.com/HazyResearch/flash-attention) library to v0.2.6.post1
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- SageMaker training container for PyTorch v1.13.1
+
+.. code::
+
+763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker
+
+
+Binary file of this version of the library for `custom container
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html#model-parallel-bring-your-own-container>`_ users:
+
+- For PyTorch 1.13.1
+
+.. code::
+
+https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.13.1/build-artifacts/2023-01-19-18-35/smdistributed_modelparallel-1.14.0-cp39-cp39-linux_x86_64.whl
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.13.0 Release Notes
+---------------------------------------------------------
+
 *Date: Dec. 15. 2022*

 **New Features**
@@ -46,16 +84,12 @@ This version passed benchmark testing and is migrated to the following AWS Deep
 Binary file of this version of the library for `custom container
 <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html#model-parallel-bring-your-own-container>`_ users:

-- For PyTorch 1.12.0
+- For PyTorch 1.12.1

 .. code::

 https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.1/build-artifacts/2022-12-08-21-34/smdistributed_modelparallel-1.13.0-cp38-cp38-linux_x86_64.whl

-----
-
-Release History
-===============

 SageMaker Distributed Model Parallel 1.11.0 Release Notes
 ---------------------------------------------------------
@@ -92,7 +126,7 @@ Binary file of this version of the library for `custom container

 .. code::

-https://sagemaker-distribu
+https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-08-12-16-58/smdistributed_modelparallel-1.11.0-cp38-cp38-linux_x86_64.whl

 SageMaker Distributed Model Parallel 1.10.1 Release Notes
 ---------------------------------------------------------
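
The 1.14.0 release notes in this file give the DLC image URI with a `<region>` placeholder. A minimal sketch of filling it in follows. The helper name `dlc_image_uri` is hypothetical and not part of the SageMaker SDK, and the sketch assumes the account ID shown in the notes applies to the chosen region, which may not hold for every AWS region.

```python
# Template copied from the 1.14.0 release notes, with <region> turned
# into a format field. Assumption: the registry account ID in the notes
# is valid for the region you pass in.
DLC_IMAGE_TEMPLATE = (
    "763104351884.dkr.ecr.{region}.amazonaws.com/"
    "pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker"
)


def dlc_image_uri(region: str) -> str:
    """Return the image URI for a region (hypothetical helper)."""
    # Reject empty strings and strings containing whitespace.
    if not region or any(ch.isspace() for ch in region):
        raise ValueError(f"invalid region name: {region!r}")
    return DLC_IMAGE_TEMPLATE.format(region=region)


if __name__ == "__main__":
    print(dlc_image_uri("us-west-2"))
```

Keeping the template as a single constant means the tag only has to change in one place when the release notes move to a new container version.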

doc/api/training/smp_versions/latest.rst

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.

-Version 1.11.0, 1.13.0 (Latest)
-===============================
+Version 1.11.0, 1.13.0, 1.14.0 (Latest)
+=======================================

 To use the library, reference the Common API documentation alongside the framework specific API documentation.

src/sagemaker/debugger/framework_profile.py

Lines changed: 6 additions & 0 deletions
@@ -143,6 +143,12 @@ def __init__(
 profiling. Configure it using the
 :class:`~sagemaker.debugger.metrics_config.DetailedProfilingConfig` class.
 Pass ``DetailedProfilingConfig()`` to use the default configuration.
+
+.. warning::
+This detailed framework profiling feature discontinues support for TensorFlow v2.11
+and later. To use the detailed profiling feature, use previous versions of
+TensorFlow between v2.3.1 and v2.10.0.
+
 dataloader_profiling_config (DataloaderProfilingConfig): The configuration for
 dataloader metrics profiling. Configure it using the
 :class:`~sagemaker.debugger.metrics_config.DataloaderProfilingConfig` class.

src/sagemaker/debugger/metrics_config.py

Lines changed: 6 additions & 2 deletions
@@ -203,8 +203,7 @@ def __init__(
 ):
 """Specify target steps or a target duration to profile.

-By default, it profiles step 5
-of training.
+By default, it profiles step 5 of the training job.

 If **profile_default_steps** is set to `True` and none of the other
 range parameters is specified,
@@ -224,6 +223,11 @@ def __init__(
 if one of the two pairs is used. If both pairs are specified, a
 conflict error occurs.

+.. warning::
+This detailed framework profiling feature discontinues support for TensorFlow v2.11
+and later. To use the detailed profiling feature, use previous versions of
+TensorFlow between v2.3.1 and v2.10.0.
+
 """
 assert isinstance(
 profile_default_steps, bool
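
The warning added in both docstrings limits detailed framework profiling to TensorFlow v2.3.1 through v2.10.0. A caller could check a version string against that range before enabling the feature, roughly as sketched below. The function names are hypothetical and not part of the SageMaker SDK. The sketch reads the stated range literally, so a patch release such as 2.10.1, which the warning does not address, is treated as unsupported.

```python
import re

# Bounds taken from the docstring warning (inclusive, literal reading).
MIN_SUPPORTED = (2, 3, 1)
MAX_SUPPORTED = (2, 10, 0)


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from the start of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    # A missing patch component is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def supports_detailed_profiling(tf_version: str) -> bool:
    """Hypothetical check: is tf_version inside the range from the warning?"""
    return MIN_SUPPORTED <= parse_version(tf_version) <= MAX_SUPPORTED


if __name__ == "__main__":
    for candidate in ("2.3.0", "2.3.1", "2.10.0", "2.11.0"):
        print(candidate, supports_detailed_profiling(candidate))
```

Comparing integer tuples avoids the string-comparison trap where "2.10.0" sorts before "2.3.1".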
