This is usually not a problem, but in internet-disabled notebooks that rely on VPC endpoints it causes an issue: the global endpoint resolves to a public IP, so the two methods (get_execution_role() and default_bucket()) hang and time out.
Because of how boto3 is implemented, even if you pass a region when creating the STS client it still uses the global endpoint. The only workaround is to pass the regional endpoint to the client as endpoint_url.
The boto3 issue that tracks this problem is boto/boto3#1859
###################################
I am currently working around get_execution_role() by overriding the method on the Session object with the code below. This can also be avoided by passing the role ARN directly, as .fit seems to work regardless:
```python
import logging
import re

from botocore.exceptions import ClientError
from sagemaker.session import Session

LOGGER = logging.getLogger("sagemaker")

# Build the regional STS endpoint so calls go through the STS VPC endpoint.
region = Session().boto_region_name
endpoint_url = "https://sts.{}.amazonaws.com".format(region)


def get_execution_role_override(sagemaker_session=None):
    if not sagemaker_session:
        sagemaker_session = Session()
    arn = sagemaker_session.get_caller_identity_arn()
    if ":role/" in arn:
        return arn
    # Fail loudly instead of silently returning None for non-role identities.
    raise ValueError("The current AWS identity is not a role: {}".format(arn))


def get_caller_identity_arn_override(self):
    # The STS client is pinned to the regional endpoint via endpoint_url.
    assumed_role = self.boto_session.client(
        "sts", region_name=region, endpoint_url=endpoint_url
    ).get_caller_identity()["Arn"]

    if "AmazonSageMaker-ExecutionRole" in assumed_role:
        role = re.sub(
            r"^(.+)sts::(\d+):assumed-role/(.+?)/.*$",
            r"\1iam::\2:role/service-role/\3",
            assumed_role,
        )
        return role

    role = re.sub(r"^(.+)sts::(\d+):assumed-role/(.+?)/.*$", r"\1iam::\2:role/\3", assumed_role)

    # Call IAM to get the role's path
    role_name = role[role.rfind("/") + 1 :]
    try:
        role = self.boto_session.client("iam").get_role(RoleName=role_name)["Role"]["Arn"]
    except ClientError:
        LOGGER.warning(
            "Couldn't call 'get_role' to get Role ARN from role name {} to get Role path.".format(
                role_name
            )
        )
    return role


# Patch the Session class so every later caller uses the regional STS endpoint.
Session.get_caller_identity_arn = get_caller_identity_arn_override
role = get_execution_role_override()
bucket = "YOUR_BUCKET_NAME"
```
Minimal repro / logs
a) Create an internet-disabled SageMaker notebook.
b) Add an STS VPC endpoint and a sagemaker.api VPC endpoint (plus others if required, such as S3 and CloudWatch).
c) Run any notebook that calls either of the above functions; the call hangs.
d) From the notebook terminal, `nslookup sts.amazonaws.com` returns a public IP rather than the private IP required by the STS VPC endpoint, whereas `nslookup sts.us-west-2.amazonaws.com` returns a private IP that routes through the STS VPC endpoint.
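The DNS check in step d) can also be run from a notebook cell. Below is a minimal sketch using only the Python standard library. The helper names `is_private_ip`, `resolve_ips` and `report` are hypothetical, and the hostnames in the usage comment are the ones from the steps above.

```python
import ipaddress
import socket


def is_private_ip(address):
    """Return True if the IP address literal falls in a private range."""
    return ipaddress.ip_address(address).is_private


def resolve_ips(hostname, port=443):
    """Return the sorted, de-duplicated IP addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def report(hostname):
    """Print each resolved address with a private/public label."""
    for address in resolve_ips(hostname):
        label = "private" if is_private_ip(address) else "public"
        print("{} -> {} ({})".format(hostname, address, label))


# Usage inside the notebook (needs DNS, so it is left commented out):
# report("sts.amazonaws.com")
# report("sts.us-west-2.amazonaws.com")
```

If the regional hostname is labelled private while the global one is labelled public, the notebook can only reach STS through the regional endpoint.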
Can you check whether this needs to be fixed in the SageMaker SDK or followed up in boto3?
Please fill out the form below.
System Information
Describe the problem
These methods in session.py:
get_execution_role()
sagemaker-python-sdk/src/sagemaker/session.py
Line 194 in 6aa409b
OR
default_bucket()
sagemaker-python-sdk/src/sagemaker/session.py
Line 1243 in 6aa409b
Both use an STS client that does not use regional endpoints. This means that in all regions except the newer ones, the call goes to https://sts.amazonaws.com instead of, say, https://sts.us-west-2.amazonaws.com
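Example: an STS client created with only a region name (for instance region_name="us-west-2") still calls the global endpoint.
A way to fix this would be to also pass the regional endpoint explicitly as endpoint_url when creating the client.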
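A minimal sketch of that fix, assuming the standard `aws` partition (hostnames under `amazonaws.com`) and that boto3 is installed. The helper names `regional_sts_endpoint` and `make_regional_sts_client` are hypothetical; `region_name` and `endpoint_url` are regular arguments of the boto3 session `client()` call.

```python
def regional_sts_endpoint(region):
    """Build the regional STS endpoint URL (assumes the standard aws partition)."""
    if not region:
        raise ValueError("a region name is required")
    return "https://sts.{}.amazonaws.com".format(region)


def make_regional_sts_client(region, boto_session=None):
    """Create an STS client pinned to the regional endpoint."""
    import boto3  # imported here so the URL helper works without boto3

    session = boto_session or boto3.Session(region_name=region)
    return session.client(
        "sts",
        region_name=region,
        endpoint_url=regional_sts_endpoint(region),
    )


# Usage (needs credentials and a route to the STS VPC endpoint):
# sts = make_regional_sts_client("us-west-2")
# print(sts.get_caller_identity()["Arn"])
```

Passing `endpoint_url` bypasses the SDK's own endpoint resolution, which is why it works regardless of how the global endpoint is chosen by default.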