src/patterns/gen-ai/aws-model-deployment-sagemaker/README_custom_sagemaker_endpoint.md
+9-1
Original file line number
Diff line number
Diff line change
@@ -62,7 +62,9 @@ new CustomSageMakerEndpoint(this, 'customModel', {
62
62
modelDataUrl: 's3://{Bucket}/{Key}/model.tar.gz',
63
63
endpointName: 'testbgebase',
64
64
instanceCount: 1,
65
-
volumeSizeInGb: 100
65
+
volumeSizeInGb: 100,
66
+
minCapacity: 1,
67
+
maxCapacity: 2,
66
68
});
67
69
```
68
70
@@ -92,6 +94,8 @@ CustomSageMakerEndpoint(
92
94
endpoint_name='testbgebase',
93
95
instance_count=1,
94
96
volume_size_in_gb=100,
97
+
min_capacity=1,
98
+
max_capacity=2,
95
99
)
96
100
```
97
101
@@ -132,6 +136,8 @@ Parameters
132
136
| modelDataDownloadTimeoutInSeconds | Integer || The timeout value, in seconds, to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this production variant. |
133
137
| volumeSizeInGb | Integer || The size, in GB, of the ML storage volume attached to individual inference instance associated with the production variant. Currently only Amazon EBS gp2 storage volumes are supported. |
134
138
| asyncInference | AsyncInferenceConfig || Specifies configuration for how an endpoint performs asynchronous inference. Refer to [AsyncInferenceConfig](#asyncinferenceconfig) for details. If not defined, the endpoint will be configured as real-time.|
139
+
| minCapacity | Integer || Specifies the minimum value that Application Auto Scaling can use to scale a target during a scaling activity. |
140
+
| maxCapacity | Integer || Specifies the maximum value that Application Auto Scaling can use to scale a target during a scaling activity. |
135
141
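The `minCapacity` and `maxCapacity` rows refer to Application Auto Scaling, which addresses a SageMaker endpoint variant through a resource ID and a scalable dimension. The following is a minimal sketch of how those bounds could be attached to such a target; it is not taken from the construct's source. The helper `scalableTargetFor`, the `ScalableTargetSpec` interface, and the variant name `AllTraffic` are hypothetical names used only for illustration.

```typescript
// Hypothetical shape describing a scalable target; not part of the construct's API.
interface ScalableTargetSpec {
  serviceNamespace: string;
  resourceId: string;
  scalableDimension: string;
  minCapacity: number;
  maxCapacity: number;
}

// Builds the identifiers Application Auto Scaling uses for a SageMaker
// endpoint variant and pairs them with the capacity bounds.
function scalableTargetFor(
  endpointName: string,
  variantName: string, // assumption: the real variant name depends on the construct
  minCapacity: number,
  maxCapacity: number,
): ScalableTargetSpec {
  return {
    serviceNamespace: 'sagemaker',
    resourceId: 'endpoint/' + endpointName + '/variant/' + variantName,
    scalableDimension: 'sagemaker:variant:DesiredInstanceCount',
    minCapacity: minCapacity,
    maxCapacity: maxCapacity,
  };
}

const target = scalableTargetFor('testbgebase', 'AllTraffic', 1, 2);
console.log(target.resourceId);
```

With these bounds, a scaling activity would keep the variant's instance count between 1 and 2.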
136
142
### AsyncInferenceConfig
137
143
@@ -167,6 +173,8 @@ If defined, the SageMaker endpoint will perform asynchronous inference.
167
173
- startupHealthCheckTimeoutInSeconds: 600 if not provided
168
174
- modelDataDownloadTimeoutInSeconds: 600 if not provided
169
175
- instanceCount: 1 if not provided
176
+
- minCapacity: 1 if not provided
177
+
- maxCapacity: 2 if not provided
170
178
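The capacity defaults listed above can be sketched as a small resolution step. This is an illustrative sketch under stated assumptions, not the construct's implementation: `resolveCapacity`, `CapacityProps`, and `ResolvedCapacity` are hypothetical names, and the validation rules (non-negative integers, `minCapacity` not greater than `maxCapacity`) are assumptions rather than documented behavior.

```typescript
// Hypothetical subset of the construct props that relate to capacity.
interface CapacityProps {
  instanceCount?: number;
  minCapacity?: number;
  maxCapacity?: number;
}

interface ResolvedCapacity {
  instanceCount: number;
  minCapacity: number;
  maxCapacity: number;
}

// Assumed validation: each value must be a non-negative integer.
function assertNonNegativeInteger(name: string, value: number): void {
  if (!Number.isInteger(value) || value < 0) {
    throw new RangeError(name + ' must be a non-negative integer, got ' + value);
  }
}

// Applies the documented defaults: instanceCount 1, minCapacity 1, maxCapacity 2.
function resolveCapacity(props: CapacityProps = {}): ResolvedCapacity {
  const resolved: ResolvedCapacity = {
    instanceCount: props.instanceCount !== undefined ? props.instanceCount : 1,
    minCapacity: props.minCapacity !== undefined ? props.minCapacity : 1,
    maxCapacity: props.maxCapacity !== undefined ? props.maxCapacity : 2,
  };
  assertNonNegativeInteger('instanceCount', resolved.instanceCount);
  assertNonNegativeInteger('minCapacity', resolved.minCapacity);
  assertNonNegativeInteger('maxCapacity', resolved.maxCapacity);
  // Assumed rule: the lower bound may not exceed the upper bound.
  if (resolved.minCapacity > resolved.maxCapacity) {
    throw new RangeError('minCapacity must not exceed maxCapacity');
  }
  return resolved;
}

const defaults = resolveCapacity();
const widened = resolveCapacity({ maxCapacity: 4 });
console.log(defaults, widened);
```

Resolving defaults in one place keeps the fallback values and the bounds check together, so a configuration with an inverted range fails early instead of at deployment time.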
171
179
If async configuration is enabled:
172
180
- Enable server-side encryption for SNS Topics using AWS managed KMS Key