Commit 06d1ea0

Author: Talia Chopra
Merge remote-tracking branch 'upstream/master'
2 parents 82cc58e + 4e7a957 commit 06d1ea0

14 files changed, +192 -562 lines changed

doc/api/training/sdp_versions/latest/smd_data_parallel_pytorch.rst

+2 -2
@@ -153,9 +153,9 @@ you will have for distributed training with the distributed data parallel librar
 PyTorch API
 ===========

-**Supported versions:**
+.. rubric:: Supported versions

-- PyTorch 1.6.0, 1.8.0
+**PyTorch 1.7.1, 1.8.0**


 .. function:: smdistributed.dataparallel.torch.distributed.is_available()

doc/api/training/sdp_versions/latest/smd_data_parallel_tensorflow.rst

+7 -4
@@ -16,8 +16,9 @@ The following steps show you how to convert a TensorFlow 2.x training
 script to utilize the distributed data parallel library.

 The distributed data parallel library APIs are designed to be close to Horovod APIs.
-See `SageMaker distributed data parallel TensorFlow examples <https://sagemaker-examples.readthedocs.io/en/latest/training/distributed_training/index.html#tensorflow-distributed>`__ for additional details on how to implement the data parallel library
-API offered for TensorFlow.
+See `SageMaker distributed data parallel TensorFlow examples
+<https://sagemaker-examples.readthedocs.io/en/latest/training/distributed_training/index.html#tensorflow-distributed>`__
+for additional details on how to implement the data parallel library.

 - First import the distributed data parallel library’s TensorFlow client and initialize it:

@@ -156,8 +157,10 @@ TensorFlow API

 .. rubric:: Supported versions

-- TensorFlow 2.x - 2.3.1
-
+TensorFlow is supported in version 1.0.0 of ``sagemakerdistributed.dataparallel``.
+Reference version 1.0.0 `TensorFlow API documentation
+<https://sagemaker.readthedocs.io/en/stable/api/training/sdp_versions/latest/smd_data_parallel_tensorflow.html#tensorflow-sdp-api>`_
+for supported TensorFlow versions.

 .. function:: smdistributed.dataparallel.tensorflow.init()

doc/api/training/sdp_versions/v1.0.0/smd_data_parallel_pytorch.rst

+6 -8
@@ -4,11 +4,10 @@ PyTorch Guide to SageMaker's distributed data parallel library

 .. admonition:: Contents

-- :ref:`pytorch-sdp-modify`
-- :ref:`pytorch-sdp-api`
+- :ref:`pytorch-sdp-modify-1.0.0`
+- :ref:`pytorch-sdp-api-1.0.0`

-.. _pytorch-sdp-modify:
-:noindex:
+.. _pytorch-sdp-modify-1.0.0:

 Modify a PyTorch training script to use SageMaker data parallel
 ======================================================================
@@ -149,15 +148,14 @@ you will have for distributed training with the distributed data parallel librar
     main()


-.. _pytorch-sdp-api:
-:noindex:
+.. _pytorch-sdp-api-1.0.0:

 PyTorch API
 ===========

-**Supported versions:**
+.. rubric:: Supported versions

-- PyTorch 1.6.0
+**PyTorch 1.6.0, 1.7.1**


 .. function:: smdistributed.dataparallel.torch.distributed.is_available()

doc/api/training/sdp_versions/v1.0.0/smd_data_parallel_tensorflow.rst

+5 -7
@@ -4,11 +4,10 @@ TensorFlow Guide to SageMaker's distributed data parallel library

 .. admonition:: Contents

-- :ref:`tensorflow-sdp-modify`
-- :ref:`tensorflow-sdp-api`
+- :ref:`tensorflow-sdp-modify-1.0.0`
+- :ref:`tensorflow-sdp-api-1.0.0`

-.. _tensorflow-sdp-modify:
-:noindex:
+.. _tensorflow-sdp-modify-1.0.0:

 Modify a TensorFlow 2.x training script to use SageMaker data parallel
 ======================================================================
@@ -150,15 +149,14 @@ script you will have for distributed training with the library.
     checkpoint.save(checkpoint_dir)


-.. _tensorflow-sdp-api:
-:noindex:
+.. _tensorflow-sdp-api-1.0.0:

 TensorFlow API
 ==============

 .. rubric:: Supported versions

-- TensorFlow 2.x - 2.3.1
+**TensorFlow 2.3.x - 2.4.1**


 .. function:: smdistributed.dataparallel.tensorflow.init()

doc/api/training/smp_versions/latest/smd_model_parallel_pytorch.rst

+1 -1
@@ -6,7 +6,7 @@
 PyTorch API
 ===========

-**Supported versions: 1.7.1, 1.8.0**
+**Supported versions: 1.6.0, 1.7.1, 1.8.0**

 This API document assumes you use the following import statements in your training scripts.

doc/frameworks/huggingface/index.rst

+1
@@ -9,3 +9,4 @@ For general information about using the SageMaker Python SDK, see :ref:`overview
 :maxdepth: 2

 sagemaker.huggingface
+Use Hugging Face with the SageMaker Python SDK <https://huggingface.co/transformers/sagemaker.html>

doc/requirements.txt

+1
@@ -1,2 +1,3 @@
 sphinx==3.1.1
 sphinx-rtd-theme==0.5.0
+docutils==0.15.2

src/sagemaker/clarify.py

+6 -3
@@ -123,6 +123,7 @@ def __init__(
 content_type=None,
 content_template=None,
 custom_attributes=None,
+accelerator_type=None,
 ):
 """Initializes a configuration of a model and the endpoint to be created for it.

@@ -151,6 +152,9 @@ def __init__(
 Section 3.3.6. Field Value Components (
 https://tools.ietf.org/html/rfc7230#section-3.2.6) of the Hypertext Transfer
 Protocol (HTTP/1.1).
+accelerator_type (str): The Elastic Inference accelerator type to deploy to the model
+endpoint instance for making inferences to the model, see
+https://docs.aws.amazon.com/sagemaker/latest/dg/ei.html.
 """
 self.predictor_config = {
 "model_name": model_name,
@@ -178,9 +182,8 @@ def __init__(
 f" Please include a placeholder $features."
 )
 self.predictor_config["content_template"] = content_template
-
-if custom_attributes is not None:
-self.predictor_config["custom_attributes"] = custom_attributes
+_set(custom_attributes, "custom_attributes", self.predictor_config)
+_set(accelerator_type, "accelerator_type", self.predictor_config)

 def get_predictor_config(self):
 """Returns part of the predictor dictionary of the analysis config."""

tests/conftest.py

+12 -7
@@ -190,7 +190,7 @@ def pytorch_inference_py_version(pytorch_inference_version, request):
 return "py3"


-def _huggingface_pytorch_version(huggingface_vesion):
+def _huggingface_base_fm_version(huggingface_vesion, base_fw):
 config = image_uris.config_for_framework("huggingface")
 training_config = config.get("training")
 original_version = huggingface_vesion
@@ -200,21 +200,26 @@ def _huggingface_pytorch_version(huggingface_vesion):
 )
 version_config = training_config.get("versions").get(huggingface_vesion)
 for key in list(version_config.keys()):
-if key.startswith("pytorch"):
-pt_version = key[7:]
+if key.startswith(base_fw):
+base_fw_version = key[len(base_fw) :]
 if len(original_version.split(".")) == 2:
-pt_version = ".".join(pt_version.split(".")[:-1])
-return pt_version
+base_fw_version = ".".join(base_fw_version.split(".")[:-1])
+return base_fw_version


 @pytest.fixture(scope="module")
 def huggingface_pytorch_version(huggingface_training_version):
-return _huggingface_pytorch_version(huggingface_training_version)
+return _huggingface_base_fm_version(huggingface_training_version, "pytorch")


 @pytest.fixture(scope="module")
 def huggingface_pytorch_latest_version(huggingface_training_latest_version):
-return _huggingface_pytorch_version(huggingface_training_latest_version)
+return _huggingface_base_fm_version(huggingface_training_latest_version, "pytorch")
+
+
+@pytest.fixture(scope="module")
+def huggingface_tensorflow_latest_version(huggingface_training_latest_version):
+return _huggingface_base_fm_version(huggingface_training_latest_version, "tensorflow")


 @pytest.fixture(scope="module")
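
The refactored helper generalizes the hard-coded `"pytorch"` prefix and `key[7:]` slice to any base framework name. A self-contained sketch of that prefix-stripping logic follows. It makes several assumptions: `TRAINING_CONFIG` is a hypothetical stand-in for the result of `image_uris.config_for_framework("huggingface")`, the function name `base_fw_version` is illustrative, and returning on the first matching key (or `None` when nothing matches) is a guess, since indentation is lost in the diff above.

```python
def base_fw_version(training_config, huggingface_version, base_fw):
    """Return the base framework version paired with a Hugging Face version.

    Sketch only: assumes keys look like "<framework><version>", e.g. "pytorch1.6.0".
    """
    original_version = huggingface_version
    # Resolve a short alias such as "4.4" to a full version such as "4.4.2".
    aliases = training_config.get("version_aliases", {})
    huggingface_version = aliases.get(huggingface_version, huggingface_version)
    version_config = training_config["versions"][huggingface_version]
    for key in version_config:
        if key.startswith(base_fw):
            version = key[len(base_fw):]  # strip the framework prefix
            # A two-part request ("4.4") yields a two-part answer ("1.6").
            if len(original_version.split(".")) == 2:
                version = ".".join(version.split(".")[:-1])
            return version
    return None  # assumed fallback; not shown in the diff


# Hypothetical config data, not the real image URI configuration.
TRAINING_CONFIG = {
    "version_aliases": {"4.4": "4.4.2"},
    "versions": {"4.4.2": {"pytorch1.6.0": {}, "tensorflow2.4.1": {}}},
}

print(base_fw_version(TRAINING_CONFIG, "4.4", "pytorch"))       # 1.6
print(base_fw_version(TRAINING_CONFIG, "4.4.2", "tensorflow"))  # 2.4.1
```

Using `len(base_fw)` instead of a literal offset is what lets the same helper back both the PyTorch fixtures and the new TensorFlow fixture.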
