Using boto3 in local mode? #403
Comments
Hello @derekhh,

When local mode starts, it uses […]. Please make sure that […].

Another thing: SageMaker has KMS support as well: https://aws.amazon.com/about-aws/whats-new/2018/01/aws-kms-based-encryption-is-now-available-in-amazon-sagemaker-training-and-hosting/

Thanks for using SageMaker!
Hi @mvsusp, I'm afraid I am running into the same issue, though I have a different use case where I am using MXNet. Here are a few code snippets to give you an idea of what I am doing:

I then start the training as follows:

It fails (I'll paste the trace below), but one thing worth noting is that it succeeded in zipping and uploading the source directory to S3. The trace is:
What kind of credentials are you using? Is it static credentials (an ACCESS and SECRET key), or do you have a session token as well? If there is a session token, then local mode will NOT pass the credentials to the container. The reason is that tokens are short-lived credentials that will expire at some point; if you run a long training job, the credentials can become invalid partway through. On EC2 instances or SageMaker Notebook instances that use an AWS Role, the container can access the EC2 Metadata Service to fetch its own credentials dynamically. I think this is most likely what is happening.
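The rule described above can be sketched as a small decision function. This is a hypothetical illustration, not the SDK's actual code; the names `Creds` and `docker_env_for` are made up:

```python
from collections import namedtuple

# Minimal stand-in for a botocore credentials object (hypothetical).
Creds = namedtuple("Creds", "access_key secret_key token")

def docker_env_for(creds):
    """Return the env vars local mode would pass to the container,
    or None when credentials are missing or short-lived (have a token)."""
    if creds is None or creds.token is not None:
        # Short-lived STS credentials are withheld: the container is expected
        # to fetch its own credentials from the EC2 Metadata Service.
        return None
    return [
        "AWS_ACCESS_KEY_ID=%s" % creds.access_key,
        "AWS_SECRET_ACCESS_KEY=%s" % creds.secret_key,
    ]
```

For example, `docker_env_for(Creds("AKIA...", "secret", None))` yields the two env vars, while any credentials with a session token yield `None`.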
Hi @iquintero, I was considering sending a PR changing https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/local/image.py#L638 to enable passing the […]. I am now trying to think about how best to change this such that the sensible default of falling back to the EC2 Metadata Service is respected while also enabling my use case. Maybe the checks should be something like:

What do you think? If that sounds right to you, I can try to proceed with that, but I have yet to figure out how to check for the presence of credentials using the provider chain.
@iquintero Based on boto/boto3#222 and checking botocore's code, here's a more concrete suggestion:

```python
def _aws_credentials(session):
    try:
        creds = session.get_credentials()
        access_key = creds.access_key
        secret_key = creds.secret_key
        token = creds.token

        # The presence of a token indicates the credentials are short-lived and
        # as such are risky to use, as they might expire while running.
        # Long-lived credentials are available through either:
        #   1. the boto session
        #   2. the EC2 Metadata Service (SageMaker Notebook instances or EC2
        #      instances with roles attached to them)
        # Short-lived credentials available via the boto session are permitted,
        # to support running on machines with no EC2 Metadata Service, but a
        # warning is logged about the danger.
        if token is None:
            logger.info("Using the long-lived AWS credentials found in session")
            return [
                'AWS_ACCESS_KEY_ID=%s' % (str(access_key)),
                'AWS_SECRET_ACCESS_KEY=%s' % (str(secret_key))
            ]
        elif not _aws_credentials_available_in_metadata_service():
            logger.warning("Using the short-lived AWS credentials found in session. They might expire while running.")
            return [
                'AWS_ACCESS_KEY_ID=%s' % (str(access_key)),
                'AWS_SECRET_ACCESS_KEY=%s' % (str(secret_key)),
                'AWS_SESSION_TOKEN=%s' % (str(token))
            ]
        else:
            logger.info("No long-lived AWS credentials found in session, but credentials from the EC2 Metadata Service are available.")
            return None
    except Exception as e:
        logger.info('Could not get AWS credentials: %s' % e)
        return None


def _aws_credentials_available_in_metadata_service():
    import botocore.session
    from botocore.credentials import InstanceMetadataProvider
    from botocore.utils import InstanceMetadataFetcher

    session = botocore.session.Session()
    instance_metadata_provider = InstanceMetadataProvider(
        iam_role_fetcher=InstanceMetadataFetcher(
            timeout=session.get_config_variable('metadata_service_timeout'),
            num_attempts=session.get_config_variable('metadata_service_num_attempts'),
            user_agent=session.user_agent())
    )
    return not (instance_metadata_provider.load() is None)
```

What do you think?
That seems like a great solution @humanzz. It still works for the regular EC2 Metadata Service case, but allows your use case to work.
Merged!
System Information
Describe the problem
I'm building my own container which needs to use some Boto3 clients, e.g. to sync TensorFlow Summary data to S3 and to get a KMS client to decrypt some credentials. The code runs fine in SageMaker, but if I try to run the same code like:

in local mode, I always see errors like this:

Is this something that can be fixed in local mode, or am I using it wrong?
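For what it's worth, one way to make in-container client code tolerant of both environments is to build clients from explicitly supplied credentials when they are present and otherwise fall back to boto3's default provider chain. This is a hedged sketch only; the helper `client_kwargs_from_env` is hypothetical and not part of boto3 or the SageMaker SDK:

```python
import os

def client_kwargs_from_env(env=os.environ):
    """Collect explicit credential kwargs (suitable for boto3.client) from the
    environment. Returns {} when nothing is set, so boto3's default provider
    chain (including the EC2 Metadata Service) is used instead."""
    kwargs = {}
    if env.get("AWS_ACCESS_KEY_ID") and env.get("AWS_SECRET_ACCESS_KEY"):
        kwargs["aws_access_key_id"] = env["AWS_ACCESS_KEY_ID"]
        kwargs["aws_secret_access_key"] = env["AWS_SECRET_ACCESS_KEY"]
        if env.get("AWS_SESSION_TOKEN"):
            kwargs["aws_session_token"] = env["AWS_SESSION_TOKEN"]
    return kwargs
```

A client could then be created as `boto3.client('s3', **client_kwargs_from_env())`, picking up whatever credentials local mode injected into the container, if any.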
Minimal repro / logs