Role for batch transform jobs #303


Closed
dennis-ec opened this issue Jul 18, 2018 · 3 comments

@dennis-ec

Hello,

First of all, thank you so much for constantly improving SageMaker. It's really interesting to dive into and think of the many use cases that can be implemented with it.

I noticed that in the process of creating a batch transform job you don't need to specify a role. When creating a training job, you need to specify a role that has access to the resources SageMaker uses to set up the training. Also, if you want to use other AWS services inside your training, you can control the permissions with the same role.

So my question is: why can't I specify a role for the batch transform job, and am I not allowed to call other AWS services from within it?

In case you wonder why I need this, here is my use case:
I have a production environment in which I trigger a daily training. For the training I use a GPU-enabled instance. Before the training I need to gather my data from outside AWS via long API calls (>5 min, so a Lambda function is not possible) and then do a little preprocessing. Because this process can take a while, I don't want to do it on a GPU machine.

So I have a training job on a non-GPU machine which doesn't actually produce a model, but instead does the API calls and the preprocessing and then uploads the files to S3. It then starts the real training job on a GPU-enabled instance.

I figured that a batch transform job is a little closer to what I want to accomplish in my preprocessing step and would therefore be a more elegant approach. But for this to happen, I need to specify a role so that my batch transform job is able to start training jobs and can access secrets (which I use to store credentials for my API calls).
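
For reference, this is roughly what the chaining looks like today via boto3 (a minimal sketch; all names, ARNs, and S3 paths are placeholders, and the job's execution role must allow `sagemaker:CreateTrainingJob`):

```python
import boto3

# Sketch only: every name, ARN, and S3 path below is a placeholder.
sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="real-gpu-training-job",
    RoleArn="arn:aws:iam::123456789012:role/MyTrainingRole",
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-image:latest",
        "TrainingInputMode": "File",
    },
    InputDataConfig=[
        {
            "ChannelName": "train",
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": "s3://my-bucket/preprocessed/",  # uploaded by the preprocessing job
                }
            },
        }
    ],
    OutputDataConfig={"S3OutputPath": "s3://my-bucket/model-output/"},
    ResourceConfig={
        "InstanceType": "ml.p3.2xlarge",  # the GPU instance for the real training
        "InstanceCount": 1,
        "VolumeSizeInGB": 50,
    },
    StoppingCondition={"MaxRuntimeInSeconds": 24 * 60 * 60},
)
```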

Also, it would be nice to be able to optionally pass hyperparameters to the batch transform job. (But I could also use environment variables if this is too far from the intended use case.)

@tomfaulhaber

tomfaulhaber commented Jul 19, 2018

Hi @dennis-ec,

Thanks for your question. The short answer is that SageMaker Batch Transform does use roles to access data and services.

Batch Transform uses the named model object created with the CreateModel call. When you create this object, you can specify a role for the model to run as, and the model container in Batch Transform will use that role.

Since you're posting this in the Python SDK repo, I'll point out that you can do this from the SDK with the Session.create_model() or Session.create_model_from_job() functions. The latter defaults the role to the one used for training, but you can override that by specifying the role argument explicitly.
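
For example, a minimal sketch (the training job name, role ARN, and S3 paths are placeholders):

```python
import sagemaker
from sagemaker.transformer import Transformer

session = sagemaker.Session()

# Create a model from a finished training job, overriding the role it runs as.
model_name = session.create_model_from_job(
    "my-training-job",
    role="arn:aws:iam::123456789012:role/MyBatchTransformRole",
)

# The transform job's container then runs under the role attached to the model.
transformer = Transformer(
    model_name=model_name,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://my-bucket/transform-output/",
    sagemaker_session=session,
)
transformer.transform("s3://my-bucket/transform-input/")
transformer.wait()
```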

If you create a transformer directly from the estimator with Estimator.transformer(), it automatically uses the role from the training job. Currently there is no way to override this with a role you specify, but that seems like a very reasonable thing to add.

On to your second question: we don't support hyperparameters in SageMaker Batch Transform, but you can use environment variables, either when you create the model or when you create the transform job (or both, with the transform job "winning" if you specify the same variable in both places). This is our recommended way to specify run-time parameters. If you look at our DeepAR algorithm, you'll see this technique in practice.
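
A small sketch of that, assuming estimator is the fitted Estimator from your training job (the variable name and value are made up for illustration):

```python
# Environment variables can be set on the model, on the transform job, or both;
# if the same key appears in both places, the transform job's value wins.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.m4.xlarge",
    env={"MY_RUNTIME_PARAM": "42"},  # hypothetical run-time parameter
)
transformer.transform("s3://my-bucket/transform-input/")
```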

Hope that helps!

@andremoeller

Hi @dennis-ec,

I'm going to close this issue, but please feel free to reopen it if you have more questions. Thanks!

@Munazir

Munazir commented Sep 5, 2018

Hi,
Can you please explain the use of batch transform jobs in SageMaker?

Suppose I have taken a dataset, performed certain transformations on it, and then trained SageMaker's built-in XGBoost algorithm on it.

Now I want to feed the test data into the model predictor and get the predictions back. How can I do that?

Let's assume my test data is in CSV form. Can you please let me know a way to put the test data in an S3 folder, call the model predictor, and automatically save the predictions back to S3?

It would be a great help if you could point me in the right direction.

Thanks a lot
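
A minimal sketch of that flow with the Python SDK's Transformer, assuming a model has already been created from the XGBoost training job (the model name and S3 URIs below are placeholders):

```python
from sagemaker.transformer import Transformer

# Placeholders throughout: the model name comes from the XGBoost training job
# (e.g. via Session.create_model_from_job()), and the S3 URIs are your own.
transformer = Transformer(
    model_name="my-xgboost-model",
    instance_count=1,
    instance_type="ml.m4.xlarge",
    output_path="s3://my-bucket/predictions/",
)
transformer.transform(
    "s3://my-bucket/test-data/test.csv",
    content_type="text/csv",
    split_type="Line",  # treat each CSV line as one record
)
transformer.wait()
# Predictions are written to s3://my-bucket/predictions/test.csv.out
```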

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018
Fix undefined variable reference
knakad pushed a commit to knakad/sagemaker-python-sdk that referenced this issue Dec 4, 2019
- Remove `smdebug_rulesconfig` module from the `src/sagemaker` directory.
- Remove `pip install` from wheel in `tox.ini`.
- Install `smdebug-rulesconfig` in `setup.py`, pinned to version 0.1.2.
knakad pushed a commit that referenced this issue Dec 4, 2019
- Remove `smdebug_rulesconfig` module from the `src/sagemaker` directory.
- Remove `pip install` from wheel in `tox.ini`.
- Install `smdebug-rulesconfig` in `setup.py`, pinned to version 0.1.2.