Role for batch transform jobs #303

Comments
Hi @dennis-ec, thanks for your question. The short answer is that SageMaker Batch Transform does use roles to access data and services. Batch Transform uses the named model object that is created with the CreateModel call. When you create this object, you can specify a role that the model should run as, and the model container in Batch Transform will use that role. Since you're posting this in the Python SDK repo, I'll point out that you can do this from the SDK by specifying a role when you create the model; if you create a transformer directly from the Estimator, the estimator's role is reused.

On to your second question: we don't support hyperparameters in SageMaker Batch Transform, but you can use environment variables, either when you create the model or when you create the transform job (or both, with the transform job "winning" if you specify the same variable in both places). This is our recommendation for specifying run-time parameters. If you look at our DeepAR algorithm, you'll see this technique in practice. Hope that helps!
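To make the two places concrete, here is a minimal sketch of where the role and the environment variables sit in the low-level request shapes (as passed to boto3's SageMaker client via `create_model` / `create_transform_job`). The function names, ARNs, and values are illustrative, not from this thread:

```python
def build_create_model_request(model_name, image_uri, model_data_url,
                               role_arn, env=None):
    """Build a CreateModel request. ExecutionRoleArn is the role the
    model container runs as, including during batch transform."""
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
            "Environment": env or {},
        },
        "ExecutionRoleArn": role_arn,
    }

def effective_environment(model_env, transform_env):
    """Variables can be set on the model and on the transform job;
    per the comment above, the transform job wins on conflicts."""
    merged = dict(model_env)
    merged.update(transform_env)
    return merged
```

For example, `effective_environment({"MODE": "a", "X": "1"}, {"MODE": "b"})` yields `{"MODE": "b", "X": "1"}` — the transform-job setting overrides the model's.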
Hi @dennis-ec, I'm going to close this issue, but please feel free to reopen it if you have more questions. Thanks!
Hi, suppose I have taken a dataset, performed certain transformations on it, and then trained SageMaker's built-in XGBoost algorithm on it. Now I want to feed the testing data into the model predictor and get the predictions — how can I do that? Let's assume my testing data is in CSV form. Can you please let me know a way to put the testing data in an S3 folder, call the model predictor, and save the predictions back to S3 automatically? It would be a great help if you could point out the way. Thanks a lot.
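One way to do this is a batch transform job over the CSV test set. The sketch below builds a CreateTransformJob request (as passed to boto3's SageMaker client) for a previously created XGBoost model; the job, model, and bucket names are placeholders. For the built-in XGBoost algorithm, the input CSV should contain no header row and no target column:

```python
def build_csv_transform_request(job_name, model_name, input_s3, output_s3):
    """Build a CreateTransformJob request for line-delimited CSV input.
    Predictions are written under output_s3 as <input-file>.out files."""
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,   # the model created from the XGBoost training job
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": "S3Prefix",
                    "S3Uri": input_s3,        # e.g. the S3 folder holding test CSVs
                },
            },
            "ContentType": "text/csv",
            "SplitType": "Line",              # one record per CSV line
        },
        "TransformOutput": {
            "S3OutputPath": output_s3,
            "AssembleWith": "Line",           # one prediction per output line
        },
        "TransformResources": {
            "InstanceType": "ml.m5.xlarge",
            "InstanceCount": 1,
        },
    }
```

The same can be done more compactly with the SDK's `Transformer` class; either way, SageMaker reads the CSVs from S3, invokes the model on each line, and writes the predictions back to S3 without any manual endpoint calls.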
Hello,
first of all, thank you so much for constantly improving SageMaker. It's really interesting to dive into, and to think of the many use cases that can be implemented with it.
I noticed that in the process of creating a batch transform job you don't need to specify a role. When creating a training job, you do need to specify a role, which needs access to the resources SageMaker uses to set up the training. Also, if you want to use other AWS services inside your training, you can control the permissions through the same role.
So my question is: why can't I specify a role for the batch transform job, and am I not allowed to call other AWS services from within it?
In case you wonder why I need this, here is my use case:
I have a production environment in which I trigger a daily training. For my training I use a GPU-enabled instance. Before the training I need to gather my data from outside AWS via long API calls (>5 min, so a Lambda function is not possible) and then do a little preprocessing. Because this process can take a while, I don't want to do it on a GPU machine.
So I have a training job on a non-GPU machine which doesn't actually produce a model, but does the API calls and the preprocessing and then uploads the files to S3. It then starts the real training job on a GPU-enabled instance.
I figured that a batch transform job is a little closer to what I want to accomplish in my preprocessing step and would therefore be a more elegant approach. But for this to happen, I need to specify a role so that my batch transform job is able to start training jobs and has access to secrets (which I use to store credentials for my API calls).
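The role described above would need permissions along these lines. This is a rough IAM policy sketch (written as a Python dict); the resource ARNs and the secret/role names are placeholders, and `iam:PassRole` is included because CreateTrainingJob requires passing the training role:

```python
# Hypothetical policy for a job that starts training jobs and reads
# API credentials from Secrets Manager. Tighten the Resource fields
# to real ARNs in practice.
PREPROCESS_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["sagemaker:CreateTrainingJob"],
            "Resource": "*",
        },
        {
            # Needed to hand the training role to the new training job
            "Effect": "Allow",
            "Action": ["iam:PassRole"],
            "Resource": "arn:aws:iam::*:role/my-training-role",
        },
        {
            "Effect": "Allow",
            "Action": ["secretsmanager:GetSecretValue"],
            "Resource": "arn:aws:secretsmanager:*:*:secret:my-api-credentials-*",
        },
    ],
}
```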
Also, it would be nice to be able to optionally pass hyperparameters to the batch transform job. (But I could also use environment variables if this is too far from the intended use case.)
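The environment-variable fallback can be sketched from the container side: set the variables on the model or on the transform job, then read them in the serving code with defaults. The parameter names here (`NUM_SAMPLES`, `PREDICTION_MODE`) are made up for illustration:

```python
import os

def get_runtime_params():
    """Read run-time parameters from environment variables, with
    defaults used when the transform job sets nothing."""
    return {
        "num_samples": int(os.environ.get("NUM_SAMPLES", "100")),
        "mode": os.environ.get("PREDICTION_MODE", "mean"),
    }
```

This gives per-job configurability without hyperparameter support, since a transform job created with a different `Environment` map changes the behavior of the same model.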