Commit 0fc458e

mchoi8739 authored and jerrypeng7773 committed
documentation: documentation for heterogeneous cluster
1 parent 031db8f commit 0fc458e

4 files changed

+112
-57
lines changed

doc/api/utility/instance_group.rst

Lines changed: 8 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,8 @@
1+
Instance Group
2+
--------------
3+
4+
.. automodule:: sagemaker.instance_group
5+
:members:
6+
:undoc-members:
7+
:show-inheritance:
8+
:private-members:

src/sagemaker/estimator.py

Lines changed: 58 additions & 31 deletions
@@ -160,7 +160,7 @@ def __init__(
160160
instance_count (int): Number of Amazon EC2 instances to use
161161
for training. Required if instance_groups is not set.
162162
instance_type (str): Type of EC2 instance to use for training,
163-
for example, 'ml.c4.xlarge'. Required if instance_groups is
163+
for example, ``'ml.c4.xlarge'``. Required if instance_groups is
164164
not set.
165165
volume_size (int): Size in GB of the EBS volume to use for
166166
storing input data during training (default: 30). Must be large
@@ -235,7 +235,6 @@ def __init__(
235235
use_spot_instances (bool): Specifies whether to use SageMaker
236236
Managed Spot instances for training. If enabled then the
237237
``max_wait`` arg should also be set.
238-
239238
More information:
240239
https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html
241240
(default: ``False``).
@@ -313,40 +312,38 @@ def __init__(
313312
when training on Amazon SageMaker. If 'git_config' is provided,
314313
'source_dir' should be a relative location to a directory in the Git
315314
repo.
315+
With the following GitHub repo directory structure:
316316
317-
.. admonition:: Example
318-
319-
With the following GitHub repo directory structure:
317+
.. code::
320318
321-
>>> |----- README.md
322-
>>> |----- src
323-
>>> |----- train.py
324-
>>> |----- test.py
319+
|----- README.md
320+
|----- src
321+
|----- train.py
322+
|----- test.py
325323
326-
if you need 'train.py' as the entry point and 'test.py' as
327-
the training source code, you can assign
328-
entry_point='train.py' and source_dir='src'.
324+
if you need 'train.py' as the entry point and 'test.py' as
325+
the training source code, you can assign
326+
entry_point='train.py' and source_dir='src'.
329327
git_config (dict[str, str]): Git configurations used for cloning
330328
files, including ``repo``, ``branch``, ``commit``,
331329
``2FA_enabled``, ``username``, ``password``, and ``token``. The
332330
``repo`` field is required. All other fields are optional.
333331
``repo`` specifies the Git repository where your training script
334332
is stored. If you don't provide ``branch``, the default value
335333
'master' is used. If you don't provide ``commit``, the latest
336-
commit in the specified branch is used.
334+
commit in the specified branch is used. For example, the following config:
337335
338-
.. admonition:: Example
339-
340-
The following config:
341-
342-
>>> git_config = {'repo': 'https://github.com/aws/sagemaker-python-sdk.git',
343-
>>> 'branch': 'test-branch-git-config',
344-
>>> 'commit': '329bfcf884482002c05ff7f44f62599ebc9f445a'}
336+
.. code:: python
345337
346-
results in cloning the repo specified in 'repo', then
347-
checking out the 'master' branch, and checking out the specified
348-
commit.
338+
git_config = {
339+
'repo': 'https://github.com/aws/sagemaker-python-sdk.git',
340+
'branch': 'test-branch-git-config',
341+
'commit': '329bfcf884482002c05ff7f44f62599ebc9f445a'
342+
}
349343
344+
results in cloning the repo specified in 'repo', then
345+
checking out the 'test-branch-git-config' branch, and checking out the specified
346+
commit.
350347
``2FA_enabled``, ``username``, ``password``, and ``token`` are
351348
used for authentication. For GitHub (or other Git) accounts, set
352349
``2FA_enabled`` to 'True' if two-factor authentication is
@@ -427,10 +424,25 @@ def __init__(
427424
>>> |------ virtual-env
428425
429426
This is not supported with "local code" in Local Mode.
430-
instance_groups (list[InstanceGroup]): Optional. List of InstanceGroup
431-
for specifying different instance groups for heterogeneous cluster.
432-
For example: [sagemaker.InstanceGroup('worker','ml.p3dn.24xlarge',64),
433-
sagemaker.InstanceGroup('server','ml.c5n.18xlarge',64)]
427+
instance_groups (list[:class:`sagemaker.instance_group.InstanceGroup`]):
428+
Optional. A list of ``InstanceGroup`` objects
429+
for launching a training job with a heterogeneous cluster.
430+
For example:
431+
432+
.. code:: python
433+
434+
instance_groups=[
435+
sagemaker.InstanceGroup(
436+
'instance_group_name_1', 'ml.p3dn.24xlarge', 64),
437+
sagemaker.InstanceGroup(
438+
'instance_group_name_2', 'ml.c5n.18xlarge', 64)]
439+
440+
For instructions on how to use ``InstanceGroup`` objects
441+
to configure a heterogeneous cluster
442+
through the SageMaker generic and framework estimator classes, see
443+
`Train Using a Heterogeneous Cluster
444+
<https://docs.aws.amazon.com/sagemaker/latest/dg/train-heterogeneous-cluster.html>`_
445+
in the *Amazon SageMaker developer guide*.
434446
"""
435447
instance_count = renamed_kwargs(
436448
"train_instance_count", "instance_count", instance_count, kwargs
@@ -2418,10 +2430,25 @@ def __init__(
24182430
>>> |------ virtual-env
24192431
24202432
This is not supported with "local code" in Local Mode.
2421-
instance_groups (list[InstanceGroup]): Optional. List of InstanceGroup
2422-
for specifying different instance groups for heterogeneous cluster.
2423-
For example: [sagemaker.InstanceGroup('worker','ml.p3dn.24xlarge',64),
2424-
sagemaker.InstanceGroup('server','ml.c5n.18xlarge',64)]
2433+
instance_groups (list[:class:`sagemaker.instance_group.InstanceGroup`]):
2434+
Optional. A list of ``InstanceGroup`` objects
2435+
for launching a training job with a heterogeneous cluster.
2436+
For example:
2437+
2438+
.. code:: python
2439+
2440+
instance_groups=[
2441+
sagemaker.InstanceGroup(
2442+
'instance_group_name_1', 'ml.p3dn.24xlarge', 64),
2443+
sagemaker.InstanceGroup(
2444+
'instance_group_name_2', 'ml.c5n.18xlarge', 64)]
2445+
2446+
For instructions on how to use ``InstanceGroup`` objects
2447+
to configure a heterogeneous cluster
2448+
through the SageMaker generic and framework estimator classes, see
2449+
`Train Using a Heterogeneous Cluster
2450+
<https://docs.aws.amazon.com/sagemaker/latest/dg/train-heterogeneous-cluster.html>`_
2451+
in the *Amazon SageMaker developer guide*.
24252452
"""
24262453
self.image_uri = image_uri
24272454
self._hyperparameters = hyperparameters.copy() if hyperparameters else {}
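
The docstring changes in this file state that ``instance_count`` and ``instance_type`` are required only if ``instance_groups`` is not set. A minimal sketch of that rule follows. It does not import the SageMaker SDK: ``InstanceGroup`` here is a local stand-in that mirrors the three constructor arguments shown in this commit, and ``check_instance_config`` is a hypothetical helper written for illustration, not a function from the library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InstanceGroup:
    """Local stand-in mirroring the three constructor arguments in this commit."""

    instance_group_name: Optional[str] = None
    instance_type: Optional[str] = None
    instance_count: Optional[int] = None


def check_instance_config(
    instance_count: Optional[int] = None,
    instance_type: Optional[str] = None,
    instance_groups: Optional[List[InstanceGroup]] = None,
) -> str:
    """Hypothetical helper: report which kind of cluster the arguments describe."""
    if instance_groups:
        # Heterogeneous cluster: each group carries its own type and count.
        return "heterogeneous"
    if instance_count is None or instance_type is None:
        # Per the docstring, both are required if instance_groups is not set.
        raise ValueError(
            "instance_count and instance_type are required "
            "if instance_groups is not set"
        )
    return "homogeneous"


# Same group names and instance types as the docstring example.
groups = [
    InstanceGroup("instance_group_name_1", "ml.p3dn.24xlarge", 64),
    InstanceGroup("instance_group_name_2", "ml.c5n.18xlarge", 64),
]
print(check_instance_config(instance_groups=groups))
print(check_instance_config(instance_count=1, instance_type="ml.c4.xlarge"))
```

The sketch deliberately says nothing about what happens when both styles of argument are passed together, because the diff does not document that case.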

src/sagemaker/inputs.py

Lines changed: 23 additions & 14 deletions
@@ -41,28 +41,37 @@ def __init__(
4141
target_attribute_name=None,
4242
shuffle_config=None,
4343
):
44-
"""Create a definition for input data used by an SageMaker training job.
44+
r"""Create a definition for input data used by a SageMaker training job.
4545
46-
See AWS documentation on the ``CreateTrainingJob`` API for more details on the parameters.
46+
See AWS documentation on the ``CreateTrainingJob`` API for more details
47+
on the parameters.
4748
4849
Args:
49-
s3_data (str): Defines the location of s3 data to train on.
50-
distribution (str): Valid values: 'FullyReplicated', 'ShardedByS3Key'
51-
(default: 'FullyReplicated').
52-
compression (str): Valid values: 'Gzip', None (default: None). This is used only in
50+
s3_data (str): Defines the location of S3 data to train on.
51+
distribution (str): Valid values: ``'FullyReplicated'``,
52+
``'ShardedByS3Key'``
53+
(default: ``'FullyReplicated'``).
54+
compression (str): Valid values: ``'Gzip'``, ``None`` (default: None).
55+
This is used only in
5356
Pipe input mode.
5457
content_type (str): MIME type of the input data (default: None).
5558
record_wrapping (str): Valid values: 'RecordIO' (default: None).
56-
s3_data_type (str): Valid values: 'S3Prefix', 'ManifestFile', 'AugmentedManifestFile'.
57-
If 'S3Prefix', ``s3_data`` defines a prefix of s3 objects to train on.
59+
s3_data_type (str): Valid values: ``'S3Prefix'``, ``'ManifestFile'``,
60+
``'AugmentedManifestFile'``.
61+
If ``'S3Prefix'``, ``s3_data`` defines a prefix of s3 objects to train on.
5862
All objects with s3 keys beginning with ``s3_data`` will be used to train.
59-
If 'ManifestFile' or 'AugmentedManifestFile', then ``s3_data`` defines a
60-
single S3 manifest file or augmented manifest file (respectively),
63+
If ``'ManifestFile'`` or ``'AugmentedManifestFile'``,
64+
then ``s3_data`` defines a
65+
single S3 manifest file or augmented manifest file respectively,
6166
listing the S3 data to train on. Both the ManifestFile and
62-
AugmentedManifestFile formats are described in the SageMaker API documentation:
63-
https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
64-
instance_groups (list[str]): Optional. List of InstanceGroupNames to send data to
65-
(default: None). By default, data will be sent to all groups.
67+
AugmentedManifestFile formats are described at `S3DataSource
68+
<https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html>`_
69+
in the *Amazon SageMaker API reference*.
70+
instance_groups (list[str]): Optional. A list of ``instance_group_name``\ s
71+
of a heterogeneous cluster that's configured using the
72+
:class:`sagemaker.instance_group.InstanceGroup`.
73+
S3 data will be sent to all instance groups in the specified list.
74+
(default: None)
6675
input_mode (str): Optional override for this channel's input mode (default: None).
6776
By default, channels will use the input mode defined on
6877
``sagemaker.estimator.EstimatorBase.input_mode``, but they will ignore
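
The ``instance_groups`` argument documented in this file restricts an S3 channel to named instance groups. The sketch below shows one way such a list could end up in a ``CreateTrainingJob`` data source. It is an assumption-laden illustration: ``build_channel_data_source`` is a hypothetical function, the bucket URI is a placeholder, and the field names follow the ``S3DataSource`` API type rather than any SDK code shown in this diff.

```python
from typing import Dict, List, Optional


def build_channel_data_source(
    s3_data: str,
    distribution: str = "FullyReplicated",
    s3_data_type: str = "S3Prefix",
    instance_groups: Optional[List[str]] = None,
) -> Dict[str, dict]:
    """Hypothetical helper: build the data source part of one input channel."""
    s3_source = {
        "S3DataType": s3_data_type,
        "S3Uri": s3_data,
        "S3DataDistributionType": distribution,
    }
    if instance_groups is not None:
        # Only the listed instance groups receive this channel's data.
        s3_source["InstanceGroupNames"] = list(instance_groups)
    return {"S3DataSource": s3_source}


# Placeholder bucket; the group name matches the estimator docstring example.
channel = build_channel_data_source(
    "s3://example-bucket/train/",
    instance_groups=["instance_group_name_1"],
)
print(channel)
```

Leaving ``instance_groups`` as ``None`` omits the field entirely in this sketch, which matches the documented default of ``None``.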

src/sagemaker/instance_group.py

Lines changed: 23 additions & 12 deletions
@@ -10,32 +10,43 @@
1010
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
1111
# ANY KIND, either express or implied. See the License for the specific
1212
# language governing permissions and limitations under the License.
13-
"""This file defines instance group for heterogeneous cluster."""
13+
"""Defines the InstanceGroup class that configures a heterogeneous cluster."""
1414
from __future__ import absolute_import
1515

1616

1717
class InstanceGroup(object):
18-
"""Accepts instance group parameters for conversion to request dict.
19-
20-
The `_to_request_dict` provides a method to turn the parameters into a dict.
21-
"""
18+
"""The class to create instance groups for a heterogeneous cluster."""
2219

2320
def __init__(
2421
self,
2522
instance_group_name=None,
2623
instance_type=None,
2724
instance_count=None,
2825
):
29-
"""Initialize a ``InstanceGroup`` instance.
26+
"""It initializes an ``InstanceGroup`` instance.
27+
28+
You can create an instance group object of the ``InstanceGroup`` class
29+
by specifying the instance group configuration arguments.
3030
31-
InstanceGroup accepts instance group parameters and provides a method to turn
32-
these parameters into a dictionary.
31+
For instructions on how to use InstanceGroup objects
32+
to configure a heterogeneous cluster
33+
through the SageMaker generic and framework estimator classes, see
34+
`Train Using a Heterogeneous Cluster
35+
<https://docs.aws.amazon.com/sagemaker/latest/dg/train-heterogeneous-cluster.html>`_
36+
in the *Amazon SageMaker developer guide*.
3337
3438
Args:
35-
instance_group_name (str): Name of the instance group.
36-
instance_type (str): Type of EC2 instance to use in the instance group,
37-
for example, 'ml.c4.xlarge'.
38-
instance_count (int): Number of EC2 instances to use in the instance group.
39+
instance_group_name (str): The name of the instance group.
40+
instance_type (str): The instance type to use in the instance group.
41+
instance_count (int): The number of instances to use in the instance group.
42+
43+
.. tip::
44+
45+
For more information about available values for the arguments,
46+
see `InstanceGroup
47+
<https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_InstanceGroup.html>`_
48+
API in the *Amazon SageMaker API reference*.
49+
3950
"""
4051
self.instance_group_name = instance_group_name
4152
self.instance_type = instance_type
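
The removed docstring mentions a ``_to_request_dict`` method that turns the constructor arguments into a dictionary, but the diff does not show its body. The following stand-alone sketch shows what such a conversion could look like. The class is a local stand-in, and the dictionary keys are an assumption taken from the ``InstanceGroup`` type in the SageMaker API reference, not from the SDK source.

```python
class InstanceGroup(object):
    """Local stand-in for the class in this diff, for illustration only."""

    def __init__(
        self,
        instance_group_name=None,
        instance_type=None,
        instance_count=None,
    ):
        self.instance_group_name = instance_group_name
        self.instance_type = instance_type
        self.instance_count = instance_count

    def _to_request_dict(self):
        """Return the group as a dict (key names are an assumption)."""
        return {
            "InstanceGroupName": self.instance_group_name,
            "InstanceType": self.instance_type,
            "InstanceCount": self.instance_count,
        }


group = InstanceGroup("instance_group_name_1", "ml.p3dn.24xlarge", 64)
request = group._to_request_dict()
print(request)
```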
