
Local mode: role chaining/assumed role on notebook instances does not forward correct credentials #3464


Closed
jmahlik opened this issue Nov 9, 2022 · 4 comments

Comments

@jmahlik
Contributor

jmahlik commented Nov 9, 2022

Describe the bug
When using role chaining/an assumed role in a notebook, the correct credentials are not forwarded to the training container via environment variables. Credentials are always obtained from the EC2 metadata service when running on a SageMaker notebook with it enabled.

The credentials from the boto session's role should be used by default, even if they are temporary, since an assumed role may have different permissions than the notebook's base role.

Somewhat related to #403, which assumed users on a notebook would not be using temporary credentials.

Relevant code is linked below. A fix could be to forward the session token whenever it exists, regardless of the environment. The correct role could be obtained from the session, or from the role passed to the image? The current workaround is to pass the credentials manually as environment variables.

https://github.com/aws/sagemaker-python-sdk/blob/885423c26ce7288283bbca7d9c1c53c4d0ccf103/src/sagemaker/local/image.py#L989-1035
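For illustration, a minimal sketch of what "forward the token whenever it exists" could look like. The helper name _aws_credentials and the list-of-strings return format mirror the linked file, but the control flow below is an assumption about a possible fix, not the SDK's actual implementation:

def _aws_credentials(session):
    """Sketch: build "KEY=value" strings for the container environment."""
    creds = session.get_credentials()
    env = [
        "AWS_ACCESS_KEY_ID=%s" % creds.access_key,
        "AWS_SECRET_ACCESS_KEY=%s" % creds.secret_key,
    ]
    # Proposed change: when a session token exists (temporary /
    # assumed-role credentials), always forward it instead of falling
    # back to the EC2 metadata service inside the container.
    if creds.token is not None:
        env.append("AWS_SESSION_TOKEN=%s" % creds.token)
    return env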

To reproduce

Have a ~/.aws/config file that assumes a role via the metadata service. This works fine for code outside the container; boto will refresh the credentials whenever it needs to via the metadata service.

[default]
...

[profile my-profile]
role_arn =  arn:...:...
role_session_name = session-name
credential_source = Ec2InstanceMetadata

export AWS_DEFAULT_PROFILE=my-profile

The notebook's base role has different permissions than the role assumed via the profile. Run a script that uses boto outside the container, then use local mode.

script -- the notebook's base role doesn't have permission to list certain buckets, but the assumed role from the profile does.

import boto3
boto3.client("s3").list_buckets()

Inside the container, this results in the bucket not being found or permission errors, since the base role from the EC2 instance metadata gets picked up instead.
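To confirm which role is actually in effect in each environment, a quick check with the standard STS API helps; run it both on the notebook and inside a local-mode container:

import boto3

# Prints the ARN of the identity whose credentials boto3 resolved.
# On the notebook this shows the assumed role from the profile; inside
# a local-mode container it shows the notebook's base role instead,
# which is the bug described above.
print(boto3.client("sts").get_caller_identity()["Arn"])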

Expected behavior
The same role/credentials from the assumed role should be passed to the container at runtime.


System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.116.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): script processor
  • Framework version: n/a
  • Python version: 3.9
  • CPU or GPU: cpu
  • Custom Docker image (Y/N): y


@Ransaka

Ransaka commented Nov 11, 2022

As per the documentation: "Once you are ready to execute the pipeline on the managed SageMaker Pipelines service, you can do so by replacing LocalPipelineSession in the previous code snippet with PipelineSession (as shown in the following code sample) and rerunning the code." But with local mode, I can't access resources such as S3 files and Secrets Manager inside the containers. Previously this worked well.

@naomine-biz

I recently hit a similar problem, but in my case the NAT settings for the docker network sagemaker-local and the link-local address of the notebook instance were broken, so I fixed that and the problem was solved. I don't know if it will solve all cases because I haven't dived deep yet. FYI.

Run ip link show and correct br-xxxx in the rule below to the bridge interface ID it shows.

$ sudo iptables -S PREROUTING -t nat
-A PREROUTING -d 169.254.169.254/32 -i br-xxxx -p tcp -m tcp --dport 80 -j DNAT --to-destination 169.254.0.2:9081

@jmahlik
Contributor Author

jmahlik commented Dec 14, 2022

Sharing a workaround: pass the credentials directly as environment variables to the local job. Just be careful not to pass them to a real job, otherwise they will show up in CloudTrail logs, etc.

Something like:

import boto3

if instance_type == "local":
    creds = boto3.Session().get_credentials()
    # env goes into the job's env kwarg
    env = {
        "AWS_ACCESS_KEY_ID": str(creds.access_key),
        "AWS_SECRET_ACCESS_KEY": str(creds.secret_key),
        "AWS_SESSION_TOKEN": str(creds.token),
    }
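For context, a sketch of where that env dict goes when using a script processor in local mode; the image URI, role, and script name below are placeholders, not values from this issue:

from sagemaker.processing import ScriptProcessor

# Hypothetical local-mode processor; env is the relevant part here.
processor = ScriptProcessor(
    image_uri="<custom-image-uri>",   # placeholder
    role="<execution-role-arn>",      # placeholder
    command=["python3"],
    instance_count=1,
    instance_type="local",
    env=env,  # the credentials built above are forwarded into the container
)
processor.run(code="script.py")       # placeholder script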

#3501 looks promising as a fix

@jmahlik
Contributor Author

jmahlik commented Apr 12, 2023

Was fixed by #3501. Setting the environment variable USE_SHORT_LIVED_CREDENTIALS works. Thanks a ton @wcarpenter1-godaddy!
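For anyone landing here later, a minimal sketch of enabling that flag before launching the local job; the value "1" is an assumption based on #3501, so verify it against your SDK version:

import os

# Ask local mode to forward the boto session's short-lived
# (assumed-role) credentials into the container environment.
os.environ["USE_SHORT_LIVED_CREDENTIALS"] = "1"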

@jmahlik jmahlik closed this as completed Apr 12, 2023