Role for batch transform jobs #303


Closed
dennis-ec opened this issue Jul 18, 2018 · 3 comments

@dennis-ec

Hello,

First of all, thank you so much for constantly improving SageMaker. It's really interesting to dive into and think of the many use cases that can be implemented with it.

I noticed that in the process of creating a batch transform job you don't need to specify a role. When creating a training job, you need to specify a role that has access to the resources SageMaker uses to set up the training. Also, if you want to use other AWS services inside your training, you can control the permissions with the same role.

So my question is: why can't I specify a role for the batch transform job, and am I not allowed to call other AWS services from within it?

In case you wonder why I need this, here is my use case:
I have a production environment in which I trigger a daily training. For the training I use a GPU-enabled instance. Before the training I need to gather my data from outside AWS via long API calls (>5 min, so a Lambda function is not possible) and then do a little preprocessing. Because this process can take a while, I don't want to do it on a GPU machine.

So I have a training job on a non-GPU machine which doesn't actually produce a model, but instead does the API calls and the preprocessing and then uploads the files to S3. It then starts the real training job on a GPU-enabled instance.

I figured that a batch transform job is a little closer to what I want to accomplish in my preprocessing step and would therefore be a more elegant approach. But for this to happen, I need to specify a role so that my batch transform job is able to start training jobs and can access secrets (which I use to store credentials for my API calls).
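
For reference, this is roughly what the chaining looks like today via boto3 (a minimal sketch; all names, ARNs, and S3 paths are placeholders, and the job's execution role must allow `sagemaker:CreateTrainingJob`):

```python
import boto3

# Sketch only: every name, ARN, and S3 path below is a placeholder.
sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="real-gpu-training-job",
    RoleArn="arn:aws:iam::123456789012:role/MyTrainingRole",
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest",
        "TrainingInputMode": "File",
    },
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-bucket/preprocessed/",  # uploaded by the preprocessing job
                }
            },
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/model-output/"},
    ResourceConfig={
        "InstanceType": "ml.p3.2xlarge",  # the GPU instance for the real training
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 24 * 60 * 60},
)
```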

Also, it would be nice to be able to optionally pass hyperparameters to the batch transform job. (But I could also use environment variables if this is too far from the intended use case.)

@tomfaulhaber

tomfaulhaber commented Jul 19, 2018

Hi @dennis-ec,

Thanks for your question. The short answer is that SageMaker Batch Transform does use roles to access data and services.

Batch Transform uses the named model object created with the CreateModel call. When you create this object, you can specify a role for the model to run as, and the model container in Batch Transform will use that role.

Since you're posting this in the Python SDK repo, I'll point out that you can do this from the SDK with the Session.create_model() or Session.create_model_from_job() functions. The latter defaults the role to the one used for training, but you can override that by specifying the role argument explicitly.
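
For example, a minimal sketch (the training job name, role ARN, and S3 paths are placeholders):

```python
import sagemaker
from sagemaker.transformer import Transformer

session = sagemaker.Session()

# Create a model from a finished training job, overriding the role it runs as.
model_name = session.create_model_from_job(
    "my-training-job",
    role="arn:aws:iam::123456789012:role/MyBatchTransformRole",
)

# The transform job's container then runs under the role attached to the model.
transformer = Transformer(
    model_name=model_name,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://my-bucket/transform-output/",
    sagemaker_session=session,
)
transformer.transform("s3://my-bucket/transform-input/")
transformer.wait()
```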

If you create a transformer directly from the estimator with Estimator.transformer(), it automatically uses the role from the training job. Currently there is no way to override this with a role you specify, but that seems like a very reasonable thing to add.

On to your second question: we don't support hyperparameters in SageMaker Batch Transform, but you can use environment variables, either when you create the model or when you create the transform job (or both, with the transform job "winning" if you specify the same variable in both places). This is our recommended way to specify run-time parameters. If you look at our DeepAR algorithm, you'll see this technique in practice.
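
A small sketch of that, assuming estimator is the fitted Estimator from your training job (the variable name and value are made up for illustration):

```python
# Environment variables can be set on the model, on the transform job, or both;
# if the same key appears in both places, the transform job's value wins.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m4.xlarge",
    env={"MY_RUNTIME_PARAM": "42"},  # hypothetical run-time parameter
)
transformer.transform("s3://my-bucket/transform-input/")
```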

Hope that helps!

@andremoeller

Hi @dennis-ec,

I'm going to close this issue, but please feel free to reopen it if you have more questions. Thanks!

@Munazir

Munazir commented Sep 5, 2018

Hi,
Can you please explain the use of batch transform jobs in SageMaker?

Suppose I have taken a dataset, performed certain transformations on it, and then trained SageMaker's built-in XGBoost algorithm on it.

Now I want to feed the test data into the model predictor and get the predictions back. How can I do that?

Let's assume my test data is in CSV form. Can you please let me know a way to put the test data in an S3 folder, call the model predictor, and automatically save the predictions back to S3?

It would be a great help if you could point me in the right direction.

Thanks a lot
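
A minimal sketch of that flow with the Python SDK's Transformer, assuming a model has already been created from the XGBoost training job (the model name and S3 URIs below are placeholders):

```python
from sagemaker.transformer import Transformer

# Placeholders throughout: the model name comes from the XGBoost training job
# (e.g. via Session.create_model_from_job()), and the S3 URIs are your own.
transformer = Transformer(
    model_name="my-xgboost-model",
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://my-bucket/predictions/",
)
transformer.transform(
    "s3://my-bucket/test-data/test.csv",
    content_type="text/csv",
    split_type="Line",  # treat each CSV line as one record
)
transformer.wait()
# Predictions are written to s3://my-bucket/predictions/test.csv.out
```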

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018
Fix undefined variable reference
knakad pushed a commit to knakad/sagemaker-python-sdk that referenced this issue Dec 4, 2019
- Remove `smdebug_rulesconfig` module from the `src/sagemaker` directory.
- Remove `pip install` from wheel in `tox.ini`.
- Install `smdebug-rulesconfig` in `setup.py`, pinned to version 0.1.2.
knakad pushed a commit that referenced this issue Dec 4, 2019
- Remove `smdebug_rulesconfig` module from the `src/sagemaker` directory.
- Remove `pip install` from wheel in `tox.ini`.
- Install `smdebug-rulesconfig` in `setup.py`, pinned to version 0.1.2.