:class:`sagemaker.workflow.steps.TuningStep` also has a helper function to generate any :code:`top-k` model data URI easily:
@@ -353,7 +358,8 @@ Example:

     model_data = step_tune.get_top_model_s3_uri(
         top_k=0, # best model
-        s3_bucket="s3://my-bucekt",
+        s3_bucket=bucket,
+        prefix=model_prefix
     )

CreateModelStep
@@ -833,9 +839,9 @@ The following example uses :class:`sagemaker.workflow.parallelism_config.Paralle

 Caching Configuration
 ==============================
-Executing the step without changing its configurations, inputs, or outputs can be a waste. Thus, we can enable caching for pipeline steps. When caching is enabled, an expiration time (in `ISO8601 duration string format`_) needs to be supplied. The expiration time indicates how old a previous execution can be to be considered for reuse.
+Executing the step without changing its configurations, inputs, or outputs can be a waste. Thus, we can enable caching for pipeline steps. When you use step signature caching, SageMaker Pipelines tries to use a previous run of your current pipeline step instead of running the step again. When previous runs are considered for reuse, certain arguments from the step are evaluated to see if any have changed. If any of these arguments have been updated, the step will execute again with the new configuration.
+When you turn on caching, you supply an expiration time (in `ISO8601 duration string format <https://en.wikipedia.org/wiki/ISO_8601#Durations>`__). The expiration time indicates how old a previous execution can be to be considered for reuse.

 .. code-block:: python

@@ -844,13 +850,13 @@ Executing the step without changing its configurations, inputs, or outputs can b
         expire_after="P30d" # 30-day
     )

-Here are few sample ISO8601 duration strings:
+You can format your ISO8601 duration strings like the following examples:

 - :code:`p30d`: 30 days
 - :code:`P4DT12H`: 4 days and 12 hours
 - :code:`T12H`: 12 hours

-Caching is supported for the following step type:
+Caching is supported for the following step types:
To create pipeline steps and eventually construct a SageMaker pipeline, you provide parameters within a Python script or notebook. The SageMaker Python SDK creates a pipeline definition by translating these parameters into SageMaker job attributes. Some of these attributes, when changed, cause the step to re-run (see `Caching Pipeline Steps <https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html>`__ for a detailed list). Therefore, if you update an SDK parameter that is used to create such an attribute, the step will rerun. See the following discussion for examples of this in processing and training steps, which are commonly used steps in Pipelines.
+
+The following example creates a processing step:
+
+.. code-block:: python
+
+    from sagemaker.workflow.pipeline_context import PipelineSession
+    from sagemaker.sklearn.processing import SKLearnProcessor
+    from sagemaker.workflow.steps import ProcessingStep
+    from sagemaker.dataset_definition.inputs import S3Input
+    from sagemaker.processing import ProcessingInput, ProcessingOutput
The following parameters from the example cause additional processing step iterations when you change them:
+
+- :code:`framework_version`: This parameter is used to construct the :code:`image_uri` for the `AppSpecification <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AppSpecification.html>`__ attribute of the processing job.
+- :code:`inputs`: Any :class:`ProcessingInputs` are passed through directly as job `ProcessingInputs <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProcessingInput.html>`__. Input :code:`source` files that exist in the container’s local file system are uploaded to S3 and given a new :code:`S3_Uri`. If the S3 path changes, a new processing job is initiated. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
+- :code:`code`: The code parameter is also packaged as a `ProcessingInput <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProcessingInput.html>`__ job. For local files, a unique hash is created from the file. The file is then uploaded to S3 with the hash included in the path. When a different local file is used, a new hash is created and the S3 path for that `ProcessingInput <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ProcessingInput.html>`__ changes, initiating a new step run. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
The following parameters from the example cause additional training step iterations when you change them:
+
+- :code:`image_uri`: The :code:`image_uri` parameter defines the image used for training, and is used directly in the `AlgorithmSpecification <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AlgorithmSpecification.html>`__ attribute of the training job.
+- :code:`hyperparameters`: All of the hyperparameters are used directly in the `HyperParameters <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html#API_DescribeTrainingJob_ResponseSyntax>`__ attribute for the training job.
+- :code:`entry_point`: The entry point file is included in the training job’s `InputDataConfig Channel <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ array. A unique hash is created from the file (and any other dependencies), and then the file is uploaded to S3 with the hash included in the path. When a different entry point file is used, a new hash is created and the S3 path for that `InputDataConfig Channel <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__ object changes, initiating a new step run. For examples of what the S3 paths look like, see the **S3 Artifact Folder Structure** section.
+- :code:`inputs`: The inputs are also included in the training job’s `InputDataConfig <https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Channel.html>`__. Local inputs are uploaded to S3. If the S3 path changes, a new training job is initiated. For examples of S3 paths, see the **S3 Artifact Folder Structure** section.
+
+S3 Artifact Folder Structure
+----------------------------
+
+You use the following S3 paths when uploading local input and code artifacts, and when saving output artifacts.
+
+*Processing*
+
+- Code: :code:`s3://bucket_name/pipeline_name/code/<code_hash>/file.py`. The file could also be a tar.gz of source_dir and dependencies.
- Output: The output paths for Training jobs can vary. The default output path is the root of the S3 bucket: :code:`s3://bucket_name`. For Training jobs created from a Tuning job, the default path includes the Training job name created by the Training platform, formatted as :code:`s3://bucket_name/<training_job_name>/output/model.tar.gz`.
For input artifacts such as data or code files, the actual content of the artifacts is not tracked, only the S3 path. This means that if a file in S3 is updated and re-uploaded directly with an identical name and path, then the step does NOT run again.
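
The path-based behavior described above can be illustrated with a small standalone sketch. It does not call the SageMaker SDK and is not the SDK's implementation: the helper names (``file_sha256``, ``code_s3_uri``, ``demo``) are hypothetical, and the use of SHA-256 is an assumption made only for illustration. It shows why editing a local code file yields a new S3 path shaped like :code:`s3://bucket_name/pipeline_name/code/<code_hash>/file.py`, while unchanged content maps to the same path.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Tuple


def file_sha256(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks.

    SHA-256 is an assumption for this sketch, not a statement about
    which hash the SDK uses.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def code_s3_uri(bucket_name: str, pipeline_name: str, local_path: str) -> str:
    """Build a URI shaped like s3://bucket_name/pipeline_name/code/<code_hash>/file.py."""
    code_hash = file_sha256(local_path)
    file_name = Path(local_path).name
    return f"s3://{bucket_name}/{pipeline_name}/code/{code_hash}/{file_name}"


def demo() -> Tuple[str, str, str]:
    """Return the URI for a file, for the same file again, and after an edit."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "file.py"
        script.write_text("print('v1')\n")
        first = code_s3_uri("bucket_name", "pipeline_name", str(script))
        same = code_s3_uri("bucket_name", "pipeline_name", str(script))
        # Editing the local file changes the hash, and therefore the path.
        script.write_text("print('v2')\n")
        changed = code_s3_uri("bucket_name", "pipeline_name", str(script))
    return first, same, changed


if __name__ == "__main__":
    first, same, changed = demo()
    print(first == same)     # unchanged content maps to the same path
    print(first != changed)  # edited content maps to a different path
```

Because only the path is compared, a file that is overwritten directly in S3 at an existing path has no hash segment that changes, which is consistent with the note that such a step does not run again.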