
Commit 346e6b9

Merge branch 'master' into master
2 parents 4132632 + a35a093 commit 346e6b9

79 files changed, +5632 -655 lines changed


.gitignore

Lines changed: 3 additions & 1 deletion
@@ -29,4 +29,6 @@ venv/
 env/
 .vscode/
 **/tmp
-.python-version
+.python-version
+**/_repack_model.py
+**/_repack_script_launcher.sh

CHANGELOG.md

Lines changed: 64 additions & 0 deletions
@@ -1,5 +1,69 @@
 # Changelog
 
+## v2.117.0 (2022-11-15)
+
+### Features
+
+* add support for PT1.12.1
+
+## v2.116.0 (2022-10-28)
+
+### Features
+
+* support customized timeout for model data download and inference container startup health check for Hosting Endpoints
+* Trainium Neuron support for PyTorch
+* Pipelines cache keys update
+* Caching Improvements for SM Pipeline Workflows
+
+## v2.115.0 (2022-10-27)
+
+### Features
+
+* Add support for TF 2.10 training
+* Disable profiler for Trainium instance type
+* support the Hyperband strategy with the StrategyConfig
+* support the GridSearch strategy for hyperparameter optimization
+
+### Bug Fixes and Other Changes
+
+* Update Graviton supported instance families
+
+## v2.114.0 (2022-10-26)
+
+### Features
+
+* Graviton support for XGB and SKLearn frameworks
+* Graviton support for PyTorch and Tensorflow frameworks
+* do not expand estimator role when it is pipeline parameter
+* added support for batch transform with model monitoring
+
+### Bug Fixes and Other Changes
+
+* regex in tuning integs
+* remove debugger environment var set up
+* adjacent slash in s3 key
+* Fix Repack step auto install behavior
+* Add retry for airflow ParsingError
+
+### Documentation Changes
+
+* doc fix
+
+## v2.113.0 (2022-10-21)
+
+### Features
+
+* support torch_distributed distribution for Trainium instances
+
+### Bug Fixes and Other Changes
+
+* bump apache-airflow from 2.4.0 to 2.4.1 in /requirements/extras
+
+### Documentation Changes
+
+* fix kwargs and descriptions of the smdmp checkpoint function
+* add the doc for the MonitorBatchTransformStep
+
 ## v2.112.2 (2022-10-11)
 
 ### Bug Fixes and Other Changes

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.112.3.dev0
+2.117.1.dev0

doc/amazon_sagemaker_model_building_pipeline.rst

Lines changed: 163 additions & 7 deletions
@@ -322,29 +322,34 @@ Example:
 
 .. code-block:: python
 
+    bucket = "my-bucket"
+    model_prefix = "my-model"
+
     step_tune = TuningStep(...)
     # tuning step can launch multiple training jobs, thus producing multiple model artifacts
     # we can create a model with the best performance
     best_model = Model(
         model_data=Join(
             on="/",
             values=[
-                "s3://my-bucket",
+                f"s3://{bucket}/{model_prefix}",
                 # from DescribeHyperParameterTuningJob
                 step_tune.properties.BestTrainingJob.TrainingJobName,
                 "output/model.tar.gz",
             ],
+        )
     )
     # we can also access any top-k best as we wish
     second_best_model = Model(
         model_data=Join(
             on="/",
             values=[
-                "s3://my-bucket",
+                f"s3://{bucket}/{model_prefix}",
                 # from ListTrainingJobsForHyperParameterTuningJob
                 step_tune.properties.TrainingJobSummaries[1].TrainingJobName,
                 "output/model.tar.gz",
             ],
+        )
     )
 
 :class:`sagemaker.workflow.steps.TuningStep` also has a helper function to generate any :code:`top-k` model data URI easily:
@@ -353,7 +358,8 @@ Example:
 
     model_data = step_tune.get_top_model_s3_uri(
        top_k=0, # best model
-       s3_bucket="s3://my-bucekt",
+       s3_bucket=bucket,
+       prefix=model_prefix
     )
 
 CreateModelStep
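For reference, the URI returned by the helper shown above can feed a model object directly, mirroring the :code:`Join`-based construction in the earlier hunk. A minimal sketch; the :code:`image_uri`, :code:`role`, and :code:`pipeline_session` names are assumed user-supplied placeholders, not part of this commit:

    best_model = Model(
        image_uri=image_uri,              # inference container image (user-supplied)
        model_data=step_tune.get_top_model_s3_uri(
            top_k=0,                      # best-performing training job
            s3_bucket=bucket,
            prefix=model_prefix,
        ),
        sagemaker_session=pipeline_session,
        role=role,
    )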
@@ -833,9 +839,9 @@ The following example uses :class:`sagemaker.workflow.parallelism_config.Paralle
 
 Caching Configuration
 ==============================
-Executing the step without changing its configurations, inputs, or outputs can be a waste. Thus, we can enable caching for pipeline steps. When caching is enabled, an expiration time (in `ISO8601 duration string format`_) needs to be supplied. The expiration time indicates how old a previous execution can be to be considered for reuse.
+Executing the step without changing its configurations, inputs, or outputs can be a waste. Thus, we can enable caching for pipeline steps. When you use step signature caching, SageMaker Pipelines tries to use a previous run of your current pipeline step instead of running the step again. When previous runs are considered for reuse, certain arguments from the step are evaluated to see if any have changed. If any of these arguments have been updated, the step will execute again with the new configuration.
 
-.. _ISO8601 duration string format: https://en.wikipedia.org/wiki/ISO_8601#Durations
+When you turn on caching, you supply an expiration time (in `ISO8601 duration string format <https://en.wikipedia.org/wiki/ISO_8601#Durations>`__). The expiration time indicates how old a previous execution can be to be considered for reuse.
 
 .. code-block:: python
 
@@ -844,13 +850,13 @@ Executing the step without changing its configurations, inputs, or outputs can b
         expire_after="P30d" # 30-day
     )
 
-Here are few sample ISO8601 duration strings:
+You can format your ISO8601 duration strings like the following examples:
 
 - :code:`p30d`: 30 days
 - :code:`P4DT12H`: 4 days and 12 hours
 - :code:`T12H`: 12 hours
 
-Caching is supported for the following step type:
+Caching is supported for the following step types:
 
 - :class:`sagemaker.workflow.steps.TrainingStep`
 - :class:`sagemaker.workflow.steps.ProcessingStep`
@@ -860,6 +866,156 @@ Caching is supported for the following step type:
 - :class:`sagemaker.workflow.clarify_check_step.ClarifyCheckStep`
 - :class:`sagemaker.workflow.emr_step.EMRStep`
 
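The :code:`cache_config` object used by the step examples in this hunk corresponds to the snippet shown only partially above. A minimal sketch of how it is typically constructed (the 30-day window is just an illustrative value):

    from sagemaker.workflow.steps import CacheConfig

    # turn on step signature caching; reuse results from runs up to 30 days old
    cache_config = CacheConfig(
        enable_caching=True,
        expire_after="P30d",
    )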
+In order to create pipeline steps and eventually construct a SageMaker pipeline, you provide parameters within a Python script or notebook. The SageMaker Python SDK creates a pipeline definition by translating these parameters into SageMaker job attributes. Some of these attributes, when changed, cause the step to re-run (see `Caching Pipeline Steps <https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html>`__ for a detailed list). Therefore, if you update an SDK parameter that is used to create such an attribute, the step will rerun. See the following discussion for examples of this in processing and training steps, which are commonly used steps in Pipelines.
+
+The following example creates a processing step:
+
+.. code-block:: python
+
+    from sagemaker.workflow.pipeline_context import PipelineSession
+    from sagemaker.sklearn.processing import SKLearnProcessor
+    from sagemaker.workflow.steps import ProcessingStep
+    from sagemaker.dataset_definition.inputs import S3Input
+    from sagemaker.processing import ProcessingInput, ProcessingOutput
+
+    pipeline_session = PipelineSession()
+
+    framework_version = "0.23-1"
+
+    sklearn_processor = SKLearnProcessor(
+        framework_version=framework_version,
+        instance_type="ml.m5.xlarge",
+        instance_count=processing_instance_count,
+        role=role,
+        sagemaker_session=pipeline_session
+    )
+
+    processor_args = sklearn_processor.run(
+        inputs=[
+            ProcessingInput(
+                source="artifacts/data/abalone-dataset.csv",
+                input_name="abalone-dataset",
+                s3_input=S3Input(
+                    local_path="/opt/ml/processing/input",
+                    s3_uri="artifacts/data/abalone-dataset.csv",
+                    s3_data_type="S3Prefix",
+                    s3_input_mode="File",
+                    s3_data_distribution_type="FullyReplicated",
+                    s3_compression_type="None",
+                )
+            )
+        ],
+        outputs=[
+            ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
+            ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
+            ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
+        ],
+        code="artifacts/code/process/preprocessing.py",
+    )
+
+    processing_step = ProcessingStep(
+        name="Process",
+        step_args=processor_args,
+        cache_config=cache_config
+    )
+
+The following parameters from the example cause additional processing step iterations when you change them:
+
+- :code:`framework_version`: This parameter is used to construct the :code:`image_uri` for the `AppSpecification <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AppSpecification.html>`__ attribute of the processing job (see the sketch following this list).
+- :code:`inputs`: Any :class:`ProcessingInputs` are passed through directly as job `ProcessingInputs <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProcessingInput.html>`__. Input :code:`source` files that exist in the container’s local file system are uploaded to S3 and given a new :code:`S3_Uri`. If the S3 path changes, a new processing job is initiated. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
+- :code:`code`: The code parameter is also packaged as a job `ProcessingInput <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProcessingInput.html>`__. For local files, a unique hash is created from the file. The file is then uploaded to S3 with the hash included in the path. When a different local file is used, a new hash is created and the S3 path for that `ProcessingInput <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProcessingInput.html>`__ changes, initiating a new step run. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
+
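To see why :code:`framework_version` matters for caching, note that the SDK resolves it to a concrete container image URI before building the job definition; a different version yields a different :code:`AppSpecification` and therefore a cache miss. A minimal sketch, where the region and instance type are illustrative values rather than anything this commit specifies:

    from sagemaker import image_uris

    # the framework version is resolved to a version-specific container image URI
    uri = image_uris.retrieve(
        framework="sklearn",
        region="us-east-1",          # illustrative region
        version="0.23-1",
        py_version="py3",
        instance_type="ml.m5.xlarge",
    )
    print(uri)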
+The following example creates a training step:
+
+.. code-block:: python
+
+    import sagemaker
+    from sagemaker.inputs import TrainingInput
+    from sagemaker.sklearn.estimator import SKLearn
+    from sagemaker.workflow.steps import TrainingStep
+
+    pipeline_session = PipelineSession()
+
+    image_uri = sagemaker.image_uris.retrieve(
+        framework="xgboost",
+        region=region,
+        version="1.0-1",
+        py_version="py3",
+        instance_type="ml.m5.xlarge",
+    )
+
+    hyperparameters = {
+        "dataset_frequency": "H",
+        "timestamp_format": "yyyy-MM-dd hh:mm:ss",
+        "number_of_backtest_windows": "1",
+        "role_arn": role_arn,
+        "region": region,
+    }
+
+    sklearn_estimator = SKLearn(
+        entry_point="train.py",
+        role=role_arn,
+        image_uri=image_uri,
+        instance_type=training_instance_type,
+        sagemaker_session=pipeline_session,
+        base_job_name="training_job",
+        hyperparameters=hyperparameters,
+        enable_sagemaker_metrics=True,
+    )
+
+    train_args = sklearn_estimator.fit(
+        inputs={
+            "train": TrainingInput(
+                s3_data=processing_step.properties.ProcessingOutputConfig.Outputs[
+                    "train"
+                ].S3Output.S3Uri,
+                content_type="text/csv",
+            ),
+            "validation": TrainingInput(
+                s3_data=processing_step.properties.ProcessingOutputConfig.Outputs[
+                    "validation"
+                ].S3Output.S3Uri,
+                content_type="text/csv",
+            ),
+        }
+    )
+
+    training_step = TrainingStep(
+        name="Train",
+        step_args=train_args,
+        cache_config=cache_config
+    )
+
+The following parameters from the example cause additional training step iterations when you change them:
+
+- :code:`image_uri`: The :code:`image_uri` parameter defines the image used for training, and is used directly in the `AlgorithmSpecification <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html>`__ attribute of the training job.
+- :code:`hyperparameters`: All of the hyperparameters are used directly in the `HyperParameters <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html#API_DescribeTrainingJob_ResponseSyntax>`__ attribute for the training job.
+- :code:`entry_point`: The entry point file is included in the training job’s `InputDataConfig Channel <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ array. A unique hash is created from the file (and any other dependencies), and then the file is uploaded to S3 with the hash included in the path. When a different entry point file is used, a new hash is created and the S3 path for that `InputDataConfig Channel <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ object changes, initiating a new step run. For examples of what the S3 paths look like, see the **S3 Artifact Folder Structure** section.
+- :code:`inputs`: The inputs are also included in the training job’s `InputDataConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__. Local inputs are uploaded to S3. If the S3 path changes, a new training job is initiated. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
+
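Both step examples above attach the same :code:`cache_config`. A minimal sketch of assembling them into a pipeline so cached results can be reused across executions; the pipeline name is an illustrative placeholder and :code:`role_arn` is the same user-supplied value as in the examples:

    from sagemaker.workflow.pipeline import Pipeline

    pipeline = Pipeline(
        name="MyCachedPipeline",                  # illustrative name
        steps=[processing_step, training_step],
        sagemaker_session=pipeline_session,
    )

    # create or update the pipeline definition, then start an execution;
    # a later execution within the expiry window may reuse cached step results
    pipeline.upsert(role_arn=role_arn)
    execution = pipeline.start()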
+S3 Artifact Folder Structure
+----------------------------
+
+You use the following S3 paths when uploading local input and code artifacts, and when saving output artifacts.
+
+*Processing*
+
+- Code: :code:`s3://bucket_name/pipeline_name/code/<code_hash>/file.py`. The file could also be a tar.gz of source_dir and dependencies.
+- Input Data: :code:`s3://bucket_name/pipeline_name/step_name/input/input_name/file.csv`
+- Configuration: :code:`s3://bucket_name/pipeline_name/step_name/input/conf/<configuration_hash>/configuration.json`
+- Output: :code:`s3://bucket_name/pipeline_name/<execution_id>/step_name/output/output_name`
+
+*Training*
+
+- Code: :code:`s3://bucket_name/code_location/pipeline_name/code/<code_hash>/code.tar.gz`
+- Output: The output paths for Training jobs can vary - the default output path is the root of the s3 bucket: :code:`s3://bucket_name`. For Training jobs created from a Tuning job, the default path includes the Training job name created by the Training platform, formatted as :code:`s3://bucket_name/<training_job_name>/output/model.tar.gz`.
+
+*Transform*
+
+- Output: :code:`s3://bucket_name/pipeline_name/<execution_id>/step_name`
+
+.. warning::
+    For input artifacts such as data or code files, the actual content of the artifacts is not tracked, only the S3 path. This means that if a file in S3 is updated and re-uploaded directly with an identical name and path, then the step does NOT run again.
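A practical consequence of this warning is that a step can reuse a stale result after an artifact is overwritten in place in S3. A minimal sketch of one possible remedy (an assumption, not behavior prescribed by this commit): rebuild the affected step with caching disabled so it executes again:

    # force the step to run even though its arguments and S3 paths are unchanged
    no_cache = CacheConfig(enable_caching=False)
    processing_step = ProcessingStep(
        name="Process",
        step_args=processor_args,
        cache_config=no_cache,
    )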
+
+
 Retry Policy
 ===============
 
