
Commit 300cd17

feat: Add detail profiler V2 options and tests (#4078)
1 parent 697c465 commit 300cd17

16 files changed: +686 -84 lines

doc/api/training/debugger.rst

+1 -45
@@ -20,7 +20,7 @@ Debugger Rule APIs
 .. autoclass:: get_rule_container_image_uri
     :show-inheritance:
 
-.. autoclass:: get_default_profiler_rule
+.. autoclass:: get_default_profiler_processing_job
     :show-inheritance:
 
 .. class:: sagemaker.debugger.rule_configs
@@ -45,10 +45,6 @@ Debugger Rule APIs
     :show-inheritance:
     :inherited-members:
 
-.. autoclass:: ProfilerRule
-    :show-inheritance:
-    :inherited-members:
-
 Debugger Configuration APIs
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -60,43 +56,3 @@ Debugger Configuration APIs
 
 .. autoclass:: TensorBoardOutputConfig
     :show-inheritance:
-
-.. autoclass:: ProfilerConfig
-    :show-inheritance:
-
-Debugger Configuration APIs for Framework Profiling (Deprecated)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. warning::
-
-    SageMaker Debugger deprecates the framework profiling feature starting from TensorFlow 2.11 and PyTorch 2.0. You can still use the feature in the previous versions of the frameworks and SDKs as follows.
-
-    * SageMaker Python SDK <= v2.130.0
-    * PyTorch >= v1.6.0, < v2.0
-    * TensorFlow >= v2.3.1, < v2.11
-
-    With the deprecation, SageMaker Debugger discontinues support for the APIs below this note.
-
-    See also `Amazon SageMaker Debugger Release Notes: March 16, 2023 <https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-release-notes.html#debugger-release-notes-20230315>`_.
-
-.. autoclass:: FrameworkProfile
-    :show-inheritance:
-
-.. autoclass:: DetailedProfilingConfig
-    :show-inheritance:
-
-.. autoclass:: DataloaderProfilingConfig
-    :show-inheritance:
-
-.. autoclass:: PythonProfilingConfig
-    :show-inheritance:
-
-.. autoclass:: PythonProfiler
-    :show-inheritance:
-
-.. autoclass:: cProfileTimer
-    :show-inheritance:
-
-.. automodule:: sagemaker.debugger.metrics_config
-    :members: StepRange, TimeRange
-    :undoc-members:

doc/api/training/index.rst

+2 -1
@@ -5,11 +5,12 @@ Training APIs
 .. toctree::
     :maxdepth: 4
 
+    algorithm
     analytics
     automl
     debugger
     estimators
-    algorithm
     tuner
     parameter
     processing
+    profiler

doc/api/training/profiler.rst

+102
@@ -0,0 +1,102 @@
+Profiler
+--------
+
+Amazon SageMaker Profiler provides full visibility
+into provisioned compute resources for training
+state-of-the-art deep learning models.
+Use the following SageMaker Profiler classes to
+activate SageMaker Profiler when creating
+an estimator object of :class:`sagemaker.pytorch.estimator.PyTorch`
+or :class:`sagemaker.tensorflow.estimator.TensorFlow`.
+
+.. contents::
+
+.. currentmodule:: sagemaker.debugger
+
+Profiler configuration modules
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. class:: sagemaker.Profiler(cpu_profiling_duration=3600)
+
+    A configuration class to activate
+    `Amazon SageMaker Profiler <https://docs.aws.amazon.com/sagemaker/latest/dg/train-profile-computational-performance.html>`_.
+
+    To adjust the Profiler configuration instead of using the default configuration, use the following parameters.
+
+    **Parameters:**
+
+    - **cpu_profiling_duration** (*str*): Specify the time duration in seconds for
+      profiling CPU activities. The default value is 3600 seconds.
+
+    **Example usage:**
+
+    .. code:: python
+
+        import sagemaker
+        from sagemaker.pytorch import PyTorch
+        from sagemaker import ProfilerConfig, Profiler
+
+        profiler_config = ProfilerConfig(
+            profiler_params=Profiler(cpu_profiling_duration=3600)
+        )
+
+        estimator = PyTorch(
+            framework_version="2.0.0",
+            # Set up other essential parameters for the estimator class here
+            profiler_config=profiler_config,
+        )
+
+    For complete instructions on activating and using SageMaker Profiler, see
+    `Use Amazon SageMaker Profiler to profile activities on AWS compute resources
+    <https://docs.aws.amazon.com/sagemaker/latest/dg/train-profile-computational-performance.html>`_.
+
+.. autoclass:: sagemaker.ProfilerConfig
+
+
+Profiler Rule APIs
+~~~~~~~~~~~~~~~~~~
+
+Use the following API to set up SageMaker Debugger profiler rules
+that detect computational performance issues in training jobs.
+
+.. autoclass:: ProfilerRule
+    :inherited-members:
+
+
+Debugger Configuration APIs for Framework Profiling (Deprecated)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. warning::
+
+    In favor of `Amazon SageMaker Profiler <https://docs.aws.amazon.com/sagemaker/latest/dg/train-profile-computational-performance.html>`_,
+    SageMaker Debugger deprecates the framework profiling feature starting from TensorFlow 2.11 and PyTorch 2.0. You can still use the feature with the following earlier versions of the frameworks and the SDK.
+
+    * SageMaker Python SDK <= v2.130.0
+    * PyTorch >= v1.6.0, < v2.0
+    * TensorFlow >= v2.3.1, < v2.11
+
+    With the deprecation, SageMaker Debugger discontinues support for the APIs below this note.
+
+    See also `Amazon SageMaker Debugger Release Notes: March 16, 2023 <https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-release-notes.html#debugger-release-notes-20230315>`_.
+
+.. autoclass:: FrameworkProfile
+    :show-inheritance:
+
+.. autoclass:: DetailedProfilingConfig
+    :show-inheritance:
+
+.. autoclass:: DataloaderProfilingConfig
+    :show-inheritance:
+
+.. autoclass:: PythonProfilingConfig
+    :show-inheritance:
+
+.. autoclass:: PythonProfiler
+    :show-inheritance:
+
+.. autoclass:: cProfileTimer
+    :show-inheritance:
+
+.. automodule:: sagemaker.debugger.metrics_config
+    :members: StepRange, TimeRange
+    :undoc-members:
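The deprecation warning in the new `profiler.rst` pins the legacy framework profiling APIs to specific SDK and framework versions. As a minimal sketch, assuming plain numeric version strings such as `2.10.1`, those ranges can be encoded as a compatibility check. `parse_version` and `framework_profiling_supported` are hypothetical helpers, not part of the SageMaker Python SDK; a real project would more likely compare versions with `packaging.version`.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as "2.10.1" into a tuple padded to three parts."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # "2.0" -> (2, 0, 0) so tuples compare cleanly
    return tuple(parts)


# Ranges copied from the deprecation note: (inclusive lower bound, exclusive upper bound).
FRAMEWORK_RANGES = {
    "pytorch": ((1, 6, 0), (2, 0, 0)),      # PyTorch >= v1.6.0, < v2.0
    "tensorflow": ((2, 3, 1), (2, 11, 0)),  # TensorFlow >= v2.3.1, < v2.11
}
MAX_SDK_VERSION = (2, 130, 0)  # SageMaker Python SDK <= v2.130.0


def framework_profiling_supported(sdk_version: str, framework: str, framework_version: str) -> bool:
    """Return True if the deprecated framework profiling APIs still cover this combination."""
    if framework not in FRAMEWORK_RANGES:
        return False
    lower, upper = FRAMEWORK_RANGES[framework]
    sdk_ok = parse_version(sdk_version) <= MAX_SDK_VERSION
    framework_ok = lower <= parse_version(framework_version) < upper
    return sdk_ok and framework_ok


print(framework_profiling_supported("2.130.0", "pytorch", "1.13.1"))
print(framework_profiling_supported("2.130.0", "tensorflow", "2.11.0"))
```

Padding to three components avoids the trap where a short tuple such as `(2,)` compares as smaller than `(2, 0)`.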

src/sagemaker/__init__.py

+2
@@ -62,4 +62,6 @@
 from sagemaker.automl.automl import AutoML, AutoMLJob, AutoMLInput  # noqa: F401
 from sagemaker.automl.candidate_estimator import CandidateEstimator, CandidateStep  # noqa: F401
 
+from sagemaker.debugger import ProfilerConfig, Profiler  # noqa: F401
+
 __version__ = importlib_metadata.version("sagemaker")

src/sagemaker/debugger/__init__.py

+2 -1
@@ -18,7 +18,7 @@
     DEBUGGER_FLAG,
     DebuggerHookConfig,
     framework_name,
-    get_default_profiler_rule,
+    get_default_profiler_processing_job,
     get_rule_container_image_uri,
     ProfilerRule,
     Rule,
@@ -27,6 +27,7 @@
     TensorBoardOutputConfig,
 )
 from sagemaker.debugger.framework_profile import FrameworkProfile  # noqa: F401
+from sagemaker.debugger.profiler import Profiler  # noqa: F401
 from sagemaker.debugger.metrics_config import (  # noqa: F401
     DataloaderProfilingConfig,
     DetailedProfilingConfig,

src/sagemaker/debugger/debugger.py

+44 -12
@@ -20,8 +20,6 @@
 """
 from __future__ import absolute_import
 
-import time
-
 from abc import ABC
 
 from typing import Union, Optional, List, Dict
@@ -31,14 +29,31 @@
 import smdebug_rulesconfig as rule_configs
 
 from sagemaker import image_uris
-from sagemaker.utils import build_dict
+from sagemaker.utils import build_dict, name_from_base
 from sagemaker.workflow.entities import PipelineVariable
+from sagemaker.debugger.profiler_constants import (
+    DETAIL_PROF_PROCESSING_DEFAULT_INSTANCE_TYPE,
+    DETAIL_PROF_PROCESSING_DEFAULT_VOLUME_SIZE,
+)
 
 framework_name = "debugger"
+detailed_framework_name = "detailed-profiler"
 DEBUGGER_FLAG = "USE_SMDEBUG"
 
 
-def get_rule_container_image_uri(region):
+class DetailedProfilerProcessingJobConfig:
+    """ProfilerRule like class.
+
+    Serves as a vehicle to pass info through to the processing instance.
+
+    """
+
+    def __init__(self):
+        self.rule_name = self.__class__.__name__
+        self.rule_parameters = {"rule_to_invoke": "DetailedProfilerProcessing"}
+
+
+def get_rule_container_image_uri(name, region):
     """Return the Debugger rule image URI for the given AWS Region.
 
     For a full list of rule image URIs,
@@ -52,19 +67,28 @@ def get_rule_container_image_uri(region):
         str: Formatted image URI for the given AWS Region and the rule container type.
 
     """
+    if name is not None and name.startswith("DetailedProfilerProcessingJobConfig"):
+        # should have the format like "123456789012.dkr.ecr.us-west-2.amazonaws.com/detailed-profiler-processing:latest"
+        return image_uris.retrieve(detailed_framework_name, region)
+
     return image_uris.retrieve(framework_name, region)
 
 
-def get_default_profiler_rule():
-    """Return the default built-in profiler rule with a unique name.
+def get_default_profiler_processing_job(instance_type=None, volume_size_in_gb=None):
+    """Return the default profiler processing job (a rule) with a unique name.
 
     Returns:
         sagemaker.debugger.ProfilerRule: The instance of the built-in ProfilerRule.
 
     """
-    default_rule = rule_configs.ProfilerReport()
-    custom_name = f"{default_rule.rule_name}-{int(time.time())}"
-    return ProfilerRule.sagemaker(default_rule, name=custom_name)
+    default_rule = DetailedProfilerProcessingJobConfig()
+    custom_name = name_from_base(default_rule.rule_name)
+    return ProfilerRule.sagemaker(
+        default_rule,
+        name=custom_name,
+        instance_type=instance_type,
+        volume_size_in_gb=volume_size_in_gb,
+    )
 
 
 @attr.s
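Both changed functions in `debugger.py` key off the rule name. `get_default_profiler_processing_job` builds the name with `name_from_base`, which appends a timestamp to `DetailedProfilerProcessingJobConfig`; that is presumably why `get_rule_container_image_uri` tests the prefix with `startswith` rather than equality. The sketch below reproduces that branching without the SDK. `select_image_framework` and `unique_name` are hypothetical stand-ins, and `unique_name` only approximates `name_from_base`: the millisecond UTC timestamp format and the 63-character limit are assumptions. For comparison, the removed code used `int(time.time())`, which has whole-second resolution.

```python
import datetime
from typing import Optional

DETAILED_PREFIX = "DetailedProfilerProcessingJobConfig"


def select_image_framework(name: Optional[str]) -> str:
    """Mirror the branch in get_rule_container_image_uri: choose the image_uris framework key."""
    if name is not None and name.startswith(DETAILED_PREFIX):
        return "detailed-profiler"  # detailed_framework_name in the diff
    return "debugger"  # framework_name in the diff


def unique_name(base: str, max_length: int = 63) -> str:
    """Approximate name_from_base (assumed format): base plus a millisecond UTC timestamp."""
    now = datetime.datetime.now(datetime.timezone.utc)
    timestamp = now.strftime("%Y-%m-%d-%H-%M-%S-%f")[:-3]  # trim microseconds to milliseconds
    trimmed_base = base[: max_length - len(timestamp) - 1]  # leave room for "-" + timestamp
    return f"{trimmed_base}-{timestamp}"


rule_name = unique_name(DETAILED_PREFIX)
print(rule_name, select_image_framework(rule_name))
```

Because the timestamp is appended as a suffix, the prefix test still recognizes the generated name.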
@@ -482,6 +506,8 @@ def sagemaker(
         name=None,
         container_local_output_path=None,
         s3_output_path=None,
+        instance_type=None,
+        volume_size_in_gb=None,
     ):
         """Initialize a ``ProfilerRule`` object for a *built-in* profiling rule.
 
@@ -510,13 +536,19 @@ def sagemaker(
             The instance of the built-in ProfilerRule.
 
         """
+        used_name = name or base_config.rule_name
+        if used_name.startswith("DetailedProfilerProcessingJobConfig"):
+            if volume_size_in_gb is None:
+                volume_size_in_gb = DETAIL_PROF_PROCESSING_DEFAULT_VOLUME_SIZE
+            if instance_type is None:
+                instance_type = DETAIL_PROF_PROCESSING_DEFAULT_INSTANCE_TYPE
         return cls(
-            name=name or base_config.rule_name,
+            name=used_name,
             image_uri="DEFAULT_RULE_EVALUATOR_IMAGE",
-            instance_type=None,
+            instance_type=instance_type,
             container_local_output_path=container_local_output_path,
             s3_output_path=s3_output_path,
-            volume_size_in_gb=None,
+            volume_size_in_gb=volume_size_in_gb,
             rule_parameters=base_config.rule_parameters,
         )
 
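The new branch in `ProfilerRule.sagemaker` fills in the processing instance type and volume size only when the effective rule name starts with `DetailedProfilerProcessingJobConfig` and the caller left those arguments as `None`. As far as this diff shows, an explicit `name` without that prefix therefore skips the defaults entirely. The sketch below isolates that logic. `RuleSettings` and `resolve_rule_settings` are hypothetical, and the two default values are placeholders because the real `DETAIL_PROF_PROCESSING_DEFAULT_*` constants in `profiler_constants.py` are not shown here.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholders only: the real DETAIL_PROF_PROCESSING_DEFAULT_* values live in
# sagemaker/debugger/profiler_constants.py and are not shown in this diff.
DEFAULT_INSTANCE_TYPE = "placeholder-instance-type"
DEFAULT_VOLUME_SIZE_GB = 100

DETAILED_PREFIX = "DetailedProfilerProcessingJobConfig"


@dataclass
class RuleSettings:
    """Hypothetical container for the values ProfilerRule.sagemaker passes to cls(...)."""

    name: str
    instance_type: Optional[str]
    volume_size_in_gb: Optional[int]


def resolve_rule_settings(name, base_rule_name, instance_type=None, volume_size_in_gb=None):
    """Apply the same defaulting rules as the new branch in ProfilerRule.sagemaker."""
    used_name = name or base_rule_name
    if used_name.startswith(DETAILED_PREFIX):
        # Only fill in values the caller did not set explicitly.
        if volume_size_in_gb is None:
            volume_size_in_gb = DEFAULT_VOLUME_SIZE_GB
        if instance_type is None:
            instance_type = DEFAULT_INSTANCE_TYPE
    return RuleSettings(used_name, instance_type, volume_size_in_gb)


# Default name taken from the base config: both defaults are filled in.
print(resolve_rule_settings(None, DETAILED_PREFIX))
# Explicit name without the prefix: the defaults are skipped and both stay None.
print(resolve_rule_settings("my-profiler-job", DETAILED_PREFIX))
```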

src/sagemaker/debugger/profiler.py

+42
@@ -0,0 +1,42 @@
+# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License"). You
+# may not use this file except in compliance with the License. A copy of
+# the License is located at
+#
+#     http://aws.amazon.com/apache2.0/
+#
+# or in the "license" file accompanying this file. This file is
+# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
+# ANY KIND, either express or implied. See the License for the specific
+# language governing permissions and limitations under the License.
+
+"""Configuration for collecting profiler v2 metrics in SageMaker training jobs."""
+from __future__ import absolute_import
+
+from sagemaker.debugger.profiler_constants import (
+    FILE_ROTATION_INTERVAL_DEFAULT,
+    CPU_PROFILING_DURATION,
+    DETAIL_PROF_PROCESSING_DEFAULT_INSTANCE_TYPE,
+    DETAIL_PROF_PROCESSING_DEFAULT_VOLUME_SIZE,
+)
+
+
+class Profiler:
+    """A configuration class to activate SageMaker Profiler."""
+
+    def __init__(
+        self,
+        cpu_profiling_duration: str = str(CPU_PROFILING_DURATION),
+        file_rotation_interval: str = str(FILE_ROTATION_INTERVAL_DEFAULT),
+    ):
+        """To specify values to adjust the Profiler configuration, use the following parameters.
+
+        :param cpu_profiling_duration: Specify the time duration in seconds for
+            profiling CPU activities. The default value is 3600 seconds.
+        """
+        self.profiling_parameters = {}
+        self.profiling_parameters["CPUProfilingDuration"] = str(cpu_profiling_duration)
+        self.profiling_parameters["SMPFileRotationSecs"] = str(file_rotation_interval)
+        self.instanceType = DETAIL_PROF_PROCESSING_DEFAULT_INSTANCE_TYPE
+        self.volumeSizeInGB = DETAIL_PROF_PROCESSING_DEFAULT_VOLUME_SIZE
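`Profiler.__init__` annotates both arguments as `str` but passes them through `str()` before storing them. An integer such as the `3600` used in the `profiler.rst` example therefore ends up as the string `"3600"` in `profiling_parameters`. The sketch below reproduces that mapping outside the SDK. `build_profiling_parameters` is a hypothetical helper. The value `3600` matches the default stated in the docstring, while the file rotation default is a placeholder because the value of `FILE_ROTATION_INTERVAL_DEFAULT` is not shown in this diff.

```python
from typing import Dict, Union

CPU_PROFILING_DURATION = 3600  # default stated in the Profiler docstring
FILE_ROTATION_INTERVAL_DEFAULT = 600  # placeholder; the real constant is not shown in the diff


def build_profiling_parameters(
    cpu_profiling_duration: Union[int, str] = CPU_PROFILING_DURATION,
    file_rotation_interval: Union[int, str] = FILE_ROTATION_INTERVAL_DEFAULT,
) -> Dict[str, str]:
    """Build the same key/value pairs that Profiler stores in self.profiling_parameters."""
    return {
        # str() makes int and str inputs interchangeable, as in Profiler.__init__.
        "CPUProfilingDuration": str(cpu_profiling_duration),
        "SMPFileRotationSecs": str(file_rotation_interval),
    }


print(build_profiling_parameters(cpu_profiling_duration=1800))
```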
