SageMaker Bring Your Own Container on local mode - ProcessingOutput is not linked to local filesystem #3083

Open
idanmoradarthas opened this issue Apr 27, 2022 · 7 comments

Comments

@idanmoradarthas

Describe the feature you'd like
While working with SageMaker BYOC (bring your own container) in local mode with the Python SDK, we found that the container's outputs are staged in the SageMaker default artifact bucket, and the SDK does not then download those artifacts to the local file system.

How would this feature be used? Please describe.
We want the SDK to automatically download the generated artifacts to the local file system.

Describe alternatives you've considered
We had to build our own mechanism to download those files:
[Image: Lack_of_impl_on_S3_outputs_download]
For this snippet, we used https://github.com/aws-samples/amazon-sagemaker-local-mode/blob/main/scikit_learn_bring_your_own_container_local_processing/scikit_learn_bring_your_own_container_local_processing.py as a reference (and also used its output_config dictionary).
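The workaround itself is only available as a screenshot, so here is a minimal sketch of what such a download step might look like. It is not the author's code: the helper names `parse_s3_uri` and `download_prefix` are hypothetical, and the sketch assumes the caller passes in a boto3 S3 client and already knows the S3 URI of each processing output.

```python
import os
from urllib.parse import urlparse


def parse_s3_uri(uri):
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_prefix(s3_client, s3_uri, local_dir):
    """Download every object under s3_uri into local_dir.

    s3_client is assumed to be a boto3 S3 client (or anything exposing
    get_paginator and download_file with the same signatures).
    Returns the list of local file paths that were written.
    """
    bucket, prefix = parse_s3_uri(s3_uri)
    downloaded = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip "directory" placeholder objects
            # Keep the path relative to the prefix; fall back to the file
            # name when the prefix points at a single object.
            relative = key[len(prefix):].lstrip("/") or os.path.basename(key)
            target = os.path.join(local_dir, relative)
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            s3_client.download_file(bucket, key, target)
            downloaded.append(target)
    return downloaded
```

With boto3 installed, this could be called as `download_prefix(boto3.client("s3"), uri, "./output")`, where `uri` is assumed to be read from each entry of the job description's `ProcessingOutputConfig`, as the referenced sample does. Note that this still requires S3 access, which is exactly the limitation this issue is about.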

@shreyapandit
Contributor

Hi @idanmoradarthas

Thank you for your feedback! We will bring this to the team and will work on discussing and prioritizing this enhancement as part of our roadmap.

Regards,
Shreya

@idanmoradarthas
Author

Hi @shreyapandit
Thank you so much for the response.

I do want to emphasize that Outseer, my company, would benefit greatly from a local mode that is completely local, with no internet connection, as we use local mode for our test suite.
You solved the initial stage for us in issue #3084. I know this issue is about the auto-download, but in your solution we would very much appreciate it if, after the container finishes its run, the output were copied to the local output directory on the PC without reaching S3 or opening an internet connection.

@dsradecki

dsradecki commented Oct 19, 2022

Hi @shreyapandit. Would you be able to share any insight into when this could be resolved? Specifically, I'm referring to the fact that sagemaker.processing.Processor seems to completely ignore a session initialised like this:

from sagemaker.local import LocalSession

sagemaker_session = LocalSession()
sagemaker_session.config = {'local': {'local_code': True}}

and this is because it still requires default_bucket to be specified while sagemaker.estimator.Estimator works without it.

@clausagerskov

The core of this issue seems to be the default_bucket definition in the local session. Even though it is specified when creating the session, the SageMaker SDK still runs the whole _create_s3_bucket_if_it_does_not_exist step, which requires an internet connection and configured credentials. This blocks solutions such as LocalStack for mocking S3.

@clausagerskov

This is marked as fixed in #3084 but is not actually fixed.

@clausagerskov

@shreyapandit

@josh-gree

Without the ability to map outputs back to the local filesystem, local mode is basically unusable. It can't even be that hard: there is a volume in the docker-compose for the output directory, and it's just thrown away on completion...
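To illustrate the idea in the comment above, here is a rough sketch of copying the host directory behind the output volume to a local destination before it is discarded. This is not the SDK's implementation: `copy_outputs_locally` is a hypothetical helper, the location of the volume directory is an assumption the caller must supply, and accepting `file://` destinations is an illustrative choice.

```python
import shutil
from pathlib import Path
from urllib.parse import urlparse


def copy_outputs_locally(volume_output_dir, destination):
    """Copy container outputs from a host volume directory to destination.

    destination may be a plain path or a file:// URI. Anything with another
    scheme (for example s3://) is rejected, so no network access is needed.
    Returns the sorted list of files present under the destination.
    """
    dest_str = str(destination)
    if dest_str.startswith("file://"):
        parsed = urlparse(dest_str)
        dest = Path(parsed.netloc + parsed.path)
    elif "://" in dest_str:
        raise ValueError(f"only local destinations are supported: {dest_str!r}")
    else:
        dest = Path(dest_str)
    # dirs_exist_ok (Python 3.8+) lets repeated runs reuse the same directory.
    shutil.copytree(volume_output_dir, dest, dirs_exist_ok=True)
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

Because this only touches the local filesystem, it would also fit the fully offline test-suite scenario described earlier in the thread.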

@nargokul nargokul added the component: pysdk-team Related to SageMaker Python SDK Core Issues label Apr 14, 2025
7 participants