This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 0b2a16c

committed Aug 30, 2022
Merge remote-tracking branch 'aws/master' into trcomp-hf-pt-111
2 parents 8a7827d + 29fc70e commit 0b2a16c

9 files changed, +123 -29 lines changed


‎CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -1,5 +1,19 @@
 # Changelog
 
+## v2.107.0 (2022-08-29)
+
+### Features
+
+* support python 3.10, update airflow dependency
+
+### Bug Fixes and Other Changes
+
+* Add retry in session.py to check if training is finished
+
+### Documentation Changes
+
+* remove Other tab in Built-in algorithms section and mi…
+
 ## v2.106.0 (2022-08-24)
 
 ### Features

‎CONTRIBUTING.md

Lines changed: 4 additions & 4 deletions
@@ -78,9 +78,9 @@ Before sending us a pull request, please ensure that:
 1. cd into the sagemaker-python-sdk folder: `cd sagemaker-python-sdk` or `cd /environment/sagemaker-python-sdk`
 1. Run the following tox command and verify that all code checks and unit tests pass: `tox tests/unit`
 
-You can also run a single test with the following command: `tox -e py36 -- -s -vv <path_to_file><file_name>::<test_function_name>`
+You can also run a single test with the following command: `tox -e py310 -- -s -vv <path_to_file><file_name>::<test_function_name>`
 * Note that the coverage test will fail if you only run a single test, so make sure to surround the command with `export IGNORE_COVERAGE=-` and `unset IGNORE_COVERAGE`
-* Example: `export IGNORE_COVERAGE=- ; tox -e py36 -- -s -vv tests/unit/test_estimator.py::test_sagemaker_model_s3_uri_invalid ; unset IGNORE_COVERAGE`
+* Example: `export IGNORE_COVERAGE=- ; tox -e py310 -- -s -vv tests/unit/test_estimator.py::test_sagemaker_model_s3_uri_invalid ; unset IGNORE_COVERAGE`
 
 
 ### Run the Integration Tests
@@ -89,9 +89,9 @@ Our CI system runs integration tests (the ones in the `tests/integ` directory),
 You should only worry about manually running any new integration tests that you write, or integration tests that test an area of code that you've modified.
 
 1. Follow the instructions at [Set Up the AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/polly/latest/dg/setup-aws-cli.html).
-1. To run a test, specify the test file and method you want to run per the following command: `tox -e py36 -- -s -vv <path_to_file><file_name>::<test_function_name>`
+1. To run a test, specify the test file and method you want to run per the following command: `tox -e py310 -- -s -vv <path_to_file><file_name>::<test_function_name>`
 * Note that the coverage test will fail if you only run a single test, so make sure to surround the command with `export IGNORE_COVERAGE=-` and `unset IGNORE_COVERAGE`
-* Example: `export IGNORE_COVERAGE=- ; tox -e py36 -- -s -vv tests/integ/test_tf_script_mode.py::test_mnist ; unset IGNORE_COVERAGE`
+* Example: `export IGNORE_COVERAGE=- ; tox -e py310 -- -s -vv tests/integ/test_tf_script_mode.py::test_mnist ; unset IGNORE_COVERAGE`
 
 If you are writing or modifying a test that creates a SageMaker job (training, tuner, or transform) or endpoint, it's important to assign a concurrency-friendly `job_name` (or `endpoint_name`), or your tests may fail randomly due to name collisions. We have a helper method `sagemaker.utils.unique_name_from_base(base, max_length)` that makes test-friendly names. You can find examples of how to use it [here](https://github.com/aws/sagemaker-python-sdk/blob/3816a5658d3737c9767e01bc8d37fc3ed5551593/tests/integ/test_tfs.py#L37) and
 [here](https://github.com/aws/sagemaker-python-sdk/blob/3816a5658d3737c9767e01bc8d37fc3ed5551593/tests/integ/test_tuner.py#L616), or by searching for "unique\_name\_from\_base" in our test code.
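
Reviewer note: the CONTRIBUTING text above recommends concurrency-friendly job and endpoint names. The sketch below illustrates the general idea with a stand-in function. `unique_name_from_base_sketch`, its default `max_length` of 63, and the timestamp-plus-hex layout are assumptions made for illustration; the body of `sagemaker.utils.unique_name_from_base` is not part of this commit.

```python
import random
import time


def unique_name_from_base_sketch(base: str, max_length: int = 63) -> str:
    """Illustrative stand-in (not the SDK helper): base + timestamp + random hex."""
    # Four random hex digits separate tests that start within the same second.
    suffix = "%04x" % random.randrange(16**4)
    timestamp = str(int(time.time()))
    # Reserve room for the two hyphens, the timestamp, and the suffix.
    available = max_length - 2 - len(timestamp) - len(suffix)
    return "{}-{}-{}".format(base[:available], timestamp, suffix)


job_name = unique_name_from_base_sketch("test-tf-script-mode")
endpoint_name = unique_name_from_base_sketch("a" * 100, max_length=63)
```

In a real test, the name returned by the SDK helper would be passed as `job_name` or `endpoint_name`, so that parallel runs do not collide.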

‎README.rst

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@ SageMaker Python SDK is tested on:
 - Python 3.7
 - Python 3.8
 - Python 3.9
+- Python 3.10
 
 AWS Permissions
 ~~~~~~~~~~~~~~~

‎VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.106.1.dev0
+2.107.1.dev0

‎requirements/extras/test_requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ contextlib2==21.6.0
 awslogs==0.14.0
 black==22.3.0
 stopit==1.1.2
-apache-airflow==2.2.4
+apache-airflow==2.3.4
 apache-airflow-providers-amazon==4.0.0
 attrs==20.3.0
 fabric==2.6.0

‎setup.py

Lines changed: 1 addition & 0 deletions
@@ -94,6 +94,7 @@ def read_requirements(filename):
         "Programming Language :: Python :: 3.7",
         "Programming Language :: Python :: 3.8",
         "Programming Language :: Python :: 3.9",
+        "Programming Language :: Python :: 3.10",
     ],
     install_requires=required_packages,
     extras_require=extras,

‎src/sagemaker/estimator.py

Lines changed: 77 additions & 9 deletions
@@ -167,10 +167,44 @@ def __init__(
             instance_type (str): Type of EC2 instance to use for training,
                 for example, ``'ml.c4.xlarge'``. Required if instance_groups is
                 not set.
-            volume_size (int): Size in GB of the EBS volume to use for
-                storing input data during training (default: 30). Must be large
-                enough to store training data if File Mode is used (which is the
-                default).
+            volume_size (int): Size in GB of the storage volume to use for
+                storing input and output data during training (default: 30).
+
+                Must be large enough to store training data if File mode is
+                used, which is the default mode.
+
+                When you use an ML instance with the EBS-only storage option
+                such as ``ml.c5`` and ``ml.p2``,
+                you must define the size of the EBS
+                volume through the ``volume_size`` parameter in the estimator class.
+
+                .. note::
+
+                    When you use an ML instance with `NVMe SSD volumes
+                    <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#nvme-ssd-volumes>`_
+                    such as ``ml.p4d``, ``ml.g4dn``, and ``ml.g5``,
+                    do not include this parameter in the estimator configuration.
+                    If you use one of those ML instance types,
+                    SageMaker doesn't provision Amazon EBS General Purpose SSD
+                    (gp2) storage nor take this parameter to adjust the NVMe instance storage.
+                    Available storage is fixed to the NVMe instance storage
+                    capacity. SageMaker configures storage paths for training
+                    datasets, checkpoints, model artifacts, and outputs to use the
+                    entire capacity of the instance storage.
+
+                    Note that if you include this parameter and specify a number that
+                    exceeds the size of the NVMe volume attached to the instance type,
+                    SageMaker returns an ``Invalid VolumeSizeInGB`` error.
+
+                To look up instance types and their instance storage types
+                and volumes, see `Amazon EC2 Instance Types
+                <http://aws.amazon.com/ec2/instance-types/>`_.
+
+                To find the default local paths defined by the SageMaker
+                training platform, see `Amazon SageMaker Training Storage
+                Folders for Training Datasets, Checkpoints, Model Artifacts,
+                and Outputs
+                <https://docs.aws.amazon.com/sagemaker/latest/dg/model-train-storage.html>`_.
             volume_kms_key (str): Optional. KMS key ID for encrypting EBS
                 volume attached to the training instance (default: None).
             max_run (int): Timeout in seconds for training (default: 24 *
@@ -2233,12 +2267,46 @@ def __init__(
             instance_count (int): Number of Amazon EC2 instances to use
                 for training. Required if instance_groups is not set.
             instance_type (str): Type of EC2 instance to use for training,
-                for example, 'ml.c4.xlarge'. Required if instance_groups is
+                for example, ``'ml.c4.xlarge'``. Required if instance_groups is
                 not set.
-            volume_size (int): Size in GB of the EBS volume to use for
-                storing input data during training (default: 30). Must be large
-                enough to store training data if File Mode is used (which is the
-                default).
+            volume_size (int): Size in GB of the storage volume to use for
+                storing input and output data during training (default: 30).
+
+                Must be large enough to store training data if File mode is
+                used, which is the default mode.
+
+                When you use an ML instance with the EBS-only storage option
+                such as ``ml.c5`` and ``ml.p2``,
+                you must define the size of the EBS
+                volume through the ``volume_size`` parameter in the estimator class.
+
+                .. note::
+
+                    When you use an ML instance with `NVMe SSD volumes
+                    <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#nvme-ssd-volumes>`_
+                    such as ``ml.p4d``, ``ml.g4dn``, and ``ml.g5``,
+                    do not include this parameter in the estimator configuration.
+                    If you use one of those ML instance types,
+                    SageMaker doesn't provision Amazon EBS General Purpose SSD
+                    (gp2) storage nor take this parameter to adjust the NVMe instance storage.
+                    Available storage is fixed to the NVMe instance storage
+                    capacity. SageMaker configures storage paths for training
+                    datasets, checkpoints, model artifacts, and outputs to use the
+                    entire capacity of the instance storage.
+
+                    Note that if you include this parameter and specify a number that
+                    exceeds the size of the NVMe volume attached to the instance type,
+                    SageMaker returns an ``Invalid VolumeSizeInGB`` error.
+
+                To look up instance types and their instance storage types
+                and volumes, see `Amazon EC2 Instance Types
+                <http://aws.amazon.com/ec2/instance-types/>`_.
+
+                To find the default local paths defined by the SageMaker
+                training platform, see `Amazon SageMaker Training Storage
+                Folders for Training Datasets, Checkpoints, Model Artifacts,
+                and Outputs
+                <https://docs.aws.amazon.com/sagemaker/latest/dg/model-train-storage.html>`_.
             volume_kms_key (str): Optional. KMS key ID for encrypting EBS
                 volume attached to the training instance (default: None).
             max_run (int): Timeout in seconds for training (default: 24 *
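
Reviewer note: the new `volume_size` docstring distinguishes EBS-only instance families, where the size must be given, from NVMe-backed families, where the parameter should be left out. A minimal sketch of that rule follows. `storage_kwargs` and `NVME_FAMILIES` are hypothetical names introduced here, and the family list only repeats the three examples from the docstring, so it is not exhaustive.

```python
from typing import Any, Dict

# Families the docstring names as NVMe-backed; illustrative, not a complete list.
NVME_FAMILIES = ("ml.p4d", "ml.g4dn", "ml.g5")


def storage_kwargs(instance_type: str, volume_size: int = 30) -> Dict[str, Any]:
    """Hypothetical helper: build the storage-related estimator keyword arguments."""
    # "ml.g5.2xlarge" -> "ml.g5"
    family = ".".join(instance_type.split(".")[:2])
    kwargs: Dict[str, Any] = {"instance_type": instance_type}
    if family in NVME_FAMILIES:
        # NVMe instance storage is fixed in size, so volume_size is omitted.
        return kwargs
    # EBS-only families take the requested EBS volume size in GB.
    kwargs["volume_size"] = volume_size
    return kwargs


ebs_kwargs = storage_kwargs("ml.c5.xlarge", volume_size=100)
nvme_kwargs = storage_kwargs("ml.g5.2xlarge", volume_size=100)
```

The resulting dictionaries could then be unpacked into an estimator constructor alongside its other arguments. Leaving `volume_size` out for NVMe families also avoids the ``Invalid VolumeSizeInGB`` error the docstring mentions.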

‎src/sagemaker/session.py

Lines changed: 22 additions & 12 deletions
@@ -41,6 +41,7 @@
     secondary_training_status_changed,
     secondary_training_status_message,
     sts_regional_endpoint,
+    retries,
 )
 from sagemaker import exceptions
 from sagemaker.session_settings import SessionSettings
@@ -4699,21 +4700,30 @@ def _train_done(sagemaker_client, job_name, last_desc):
     """Placeholder docstring"""
     in_progress_statuses = ["InProgress", "Created"]
 
-    desc = sagemaker_client.describe_training_job(TrainingJobName=job_name)
-    status = desc["TrainingJobStatus"]
+    for _ in retries(
+        max_retry_count=10,  # 10*30 = 5min
+        exception_message_prefix="Waiting for schedule to leave 'Pending' status",
+        seconds_to_sleep=30,
+    ):
+        try:
+            desc = sagemaker_client.describe_training_job(TrainingJobName=job_name)
+            status = desc["TrainingJobStatus"]
 
-    if secondary_training_status_changed(desc, last_desc):
-        print()
-        print(secondary_training_status_message(desc, last_desc), end="")
-    else:
-        print(".", end="")
-    sys.stdout.flush()
+            if secondary_training_status_changed(desc, last_desc):
+                print()
+                print(secondary_training_status_message(desc, last_desc), end="")
+            else:
+                print(".", end="")
+            sys.stdout.flush()
 
-    if status in in_progress_statuses:
-        return desc, False
+            if status in in_progress_statuses:
+                return desc, False
 
-    print()
-    return desc, True
+            print()
+            return desc, True
+        except botocore.exceptions.ClientError as err:
+            if err.response["Error"]["Code"] == "AccessDeniedException":
+                pass
 
 
 def _processing_job_status(sagemaker_client, job_name):
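
Reviewer note: the `_train_done` hunk wraps the describe call in a `for _ in retries(...)` loop and swallows access-denied errors so that the next iteration tries again. The sketch below reproduces that pattern in isolation. `retries_sketch` is an assumed stand-in for the imported `retries` helper, whose body is not shown in this commit; `AccessDenied` and `FlakyClient` are hypothetical test doubles. One difference is deliberate. As written in the hunk, the `except` branch has no `else`, so a `ClientError` with any other code also falls through to the next attempt. The sketch catches only the stand-in denial.

```python
import time
from typing import Iterator, Tuple


class AccessDenied(Exception):
    """Stand-in for a ClientError whose code is AccessDeniedException."""


def retries_sketch(
    max_retry_count: int, exception_message_prefix: str, seconds_to_sleep: float
) -> Iterator[int]:
    """Yield one value per attempt, sleep between attempts, raise when exhausted."""
    for attempt in range(max_retry_count):
        yield attempt
        time.sleep(seconds_to_sleep)
    raise Exception(
        "'{}' has reached the maximum retry count of {}".format(
            exception_message_prefix, max_retry_count
        )
    )


class FlakyClient:
    """Fake client: denies access twice, then reports a completed job."""

    def __init__(self) -> None:
        self.calls = 0

    def describe_training_job(self, TrainingJobName: str) -> dict:
        self.calls += 1
        if self.calls <= 2:
            raise AccessDenied(TrainingJobName)
        return {"TrainingJobStatus": "Completed"}


def train_done_sketch(client, job_name: str, seconds_to_sleep: float = 0.0) -> Tuple[dict, bool]:
    """Return (description, finished?) and retry while access is denied."""
    for _ in retries_sketch(10, "Waiting for training job description", seconds_to_sleep):
        try:
            desc = client.describe_training_job(TrainingJobName=job_name)
        except AccessDenied:
            continue  # transient denial: move on to the next attempt
        return desc, desc["TrainingJobStatus"] not in ("InProgress", "Created")


client = FlakyClient()
desc, done = train_done_sketch(client, "demo-job")
```

Because the generator raises after the last attempt, the loop can never finish silently without returning a description.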

‎tests/integ/test_airflow_config.py

Lines changed: 2 additions & 2 deletions
@@ -14,9 +14,9 @@
 
 import os
 
-import airflow
 import pytest
 import numpy as np
+from airflow import utils
 from airflow import DAG
 from airflow.contrib.operators.sagemaker_training_operator import SageMakerTrainingOperator
 from airflow.contrib.operators.sagemaker_transform_operator import SageMakerTransformOperator
@@ -624,7 +624,7 @@ def _build_airflow_workflow(estimator, instance_type, inputs=None, mini_batch_si
 
     default_args = {
         "owner": "airflow",
-        "start_date": airflow.utils.dates.days_ago(2),
+        "start_date": utils.dates.days_ago(2),
         "provide_context": True,
     }
