sagemaker job failing in transformation step #753
Comments
hi @NEIA20, thanks for using SageMaker! Could you provide your entry point code and data (or something similar if your code/data is private) for us to reproduce the error?
Hi @laurenyu, thanks for getting back to me. Here's some of our TensorFlow entry point code:
And here's one of our testing files:
@NEIA20 thanks for including all of that! Would you also be able to include either the training code or a model artifact? (I'm trying to run this on my end.)
Sure thing :) Let me know if you need anything else.
@NEIA20 thanks! The batch transform job succeeded for me (I ran it twice just to be sure). Here's what I ran (should be very similar to what you posted):

import sagemaker
from sagemaker.tensorflow import TensorFlowModel

ROLE = 'role-name'  # replaced with dummy string
JOB_PATH = 's3-bucket-and-path'  # replaced with dummy string
PREDICTION_TABLE_NAME = 'predict'

model = TensorFlowModel(f's3://{JOB_PATH}/model.tar.gz',
                        ROLE,
                        'entry.py',
                        sagemaker_session=sagemaker.session.Session())

transformer = model.transformer(instance_count=10,
                                instance_type='ml.c5.2xlarge',
                                strategy='SingleRecord',
                                assemble_with='Line',
                                max_payload=1,
                                max_concurrent_transforms=100)
transformer.output_path = f's3://{JOB_PATH}/predictions/{PREDICTION_TABLE_NAME}'

transformer.transform(f's3://{JOB_PATH}/testing_data_encoded_mean_aligned_0_0_1.csv',
                      content_type='text/csv',
                      split_type='Line')
transformer.wait()

I also copied your entry point code above into a file. Are you consistently encountering the error?
@laurenyu thanks for trying it out. Yes we are consistently getting the error. I thought I had given you everything relevant but perhaps it's an issue in our
I created a new batch transform job based on the full entry point code, and it still succeeded for me. I'm going to reach out to the relevant service team to see if they have any insight. Thanks for your patience!
Hey @NEIA20, it looks like the algorithm isn't responding to some of the requests. Could you try reducing
hi @andremoeller @laurenyu - sorry for the late response. We had previously tried lowering
hi @andremoeller @laurenyu, any updates on what might be the issue?
Hey @NEIA20, we're actively looking into this with your most recent Transform Job. It seems like we aren't handling errors correctly in certain cases, but we don't have a fix yet. We will keep this issue updated, and once we know when a fix will be deployed, we will let you know. Thanks!
thanks @andremoeller, glad to hear it. If you need any more information from us, just let me know.
Hi @NEIA20, it seems like the problem is both that the container isn't handling multiple requests well and that we aren't handling certain types of errors. There's a newer TensorFlow Serving container that you can deploy to with Would using the newer TensorFlow Serving container satisfy your use case? Thank you.
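To put the "multiple requests" point in numbers: in the SageMaker API, `MaxConcurrentTransforms` is the maximum number of parallel requests sent to each instance, so the settings from the snippet earlier in this thread (`instance_count=10`, `max_concurrent_transforms=100`, `max_payload=1`) allow up to 100 in-flight requests per container. The following is only a sketch of that arithmetic; the function name, the returned keys, and the reduced value of 4 are made up for illustration.

```python
def transform_load(instance_count, max_concurrent_transforms, max_payload_mb):
    """Rough upper bounds on the load a batch transform configuration allows.

    Hypothetical helper for illustration; it is not part of the SageMaker SDK.
    """
    return {
        # Parallel requests each container may receive at once.
        "requests_per_instance": max_concurrent_transforms,
        # Parallel requests across all instances of the job.
        "requests_across_fleet": instance_count * max_concurrent_transforms,
        # Upper bound on request data in flight on one instance, in MB.
        "payload_mb_in_flight_per_instance": max_concurrent_transforms * max_payload_mb,
    }


# Settings from the snippet in this thread.
current = transform_load(instance_count=10, max_concurrent_transforms=100, max_payload_mb=1)
# An arbitrary, much lower concurrency for comparison.
reduced = transform_load(instance_count=10, max_concurrent_transforms=4, max_payload_mb=1)

print(current)
print(reduced)
```

If the container only copes with a handful of simultaneous requests, lowering the per-instance concurrency while keeping the instance count is one way to reduce pressure on each container, at the cost of a longer job.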
hi @andremoeller - the thing is that we're not generating predictions using a SageMaker Endpoint, we're using batch transform instead
Hi @NEIA20, got it -- that doc is missing some information relevant to Batch Transform, but it's still possible to use Batch Transform with the newer TensorFlow Serving container. If you have already trained a model, you can use the new TensorFlow Serving container like this:
|
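The snippet attached to the comment above is not preserved here. Purely as an illustration, and not the commenter's original code, a batch transform against the TensorFlow Serving container might look roughly like the following with version 2 of the SageMaker Python SDK. Assumptions: `TensorFlowModel` created without an entry point and with a `framework_version` (the value `"1.15"` is a placeholder), the conservative concurrency defaults, and both helper names, which are hypothetical. The SDK import is kept inside the function so the settings helper can be used without the SDK installed.

```python
def build_transformer_settings(output_path, instance_count=1,
                               instance_type="ml.c5.2xlarge",
                               max_concurrent_transforms=1, max_payload=1):
    """Keyword arguments for Model.transformer(); values are illustrative."""
    return {
        "instance_count": instance_count,
        "instance_type": instance_type,
        "strategy": "SingleRecord",
        "assemble_with": "Line",
        "output_path": output_path,
        "max_concurrent_transforms": max_concurrent_transforms,
        "max_payload": max_payload,
    }


def run_batch_transform(model_data, role, input_path, output_path):
    """Create a model from an existing artifact and run a transform job."""
    # Lazy import: requires the SageMaker Python SDK (v2 naming assumed).
    from sagemaker.tensorflow import TensorFlowModel

    # framework_version is a placeholder; use the version the model was trained with.
    model = TensorFlowModel(model_data=model_data, role=role,
                            framework_version="1.15")
    transformer = model.transformer(**build_transformer_settings(output_path))
    transformer.transform(input_path, content_type="text/csv", split_type="Line")
    transformer.wait()
    return transformer


settings = build_transformer_settings("s3://bucket-and-path/predictions/predict")
print(settings)
```

Calling `run_batch_transform` needs valid AWS credentials, a role, and real S3 locations; the S3 path printed above is a dummy value.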
It looks like the question has been solved, so I'm closing this issue.
I'm having this issue as well. The "solution" @andremoeller offered is unhelpful. |
System Information
Describe the problem
We're using SageMaker to parallelize a TensorFlow job. We create a model using TensorFlow, and training completes successfully. When the job moves on to transformation, it fails with the error: “Unable to get response from algorithm.”
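One way to rule out the input data as the cause is to check locally what each request would look like. With `split_type='Line'` and `strategy='SingleRecord'` (the transform settings used elsewhere in this thread), each line of the CSV is sent as its own request, and `max_payload` caps the request size in MB. The sketch below mimics that splitting with the standard library only. The helper names are hypothetical, skipping empty lines is a simplification, and treating 1 MB as 1024 * 1024 bytes is an assumption.

```python
import io


def iter_single_records(csv_text):
    """Yield one payload per non-empty line, one record per request."""
    for line in io.StringIO(csv_text):
        record = line.rstrip("\r\n")
        if record:
            yield record


def oversized_records(csv_text, max_payload_mb=1):
    """Return the indexes of records whose UTF-8 size exceeds the payload cap."""
    limit = max_payload_mb * 1024 * 1024  # assumed interpretation of "MB"
    return [
        index
        for index, record in enumerate(iter_single_records(csv_text))
        if len(record.encode("utf-8")) > limit
    ]


sample = "1.0,2.0\n3.0,4.0\n"
records = list(iter_single_records(sample))
too_big = oversized_records(sample)
print(records, too_big)
```

Each yielded record can also be passed to the entry point's input handling locally, which helps separate data problems from container or service problems.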
Minimal repro / logs
Stack trace:
Transformation logs:
Tensorflow model settings:
Transformation settings: