sagemaker-containers ERROR ExecuteUserScriptError without error details #1225
Comments
I've reached out to the team that owns SKLearn to see if they have any insight. Thanks for the detailed information.
It was a computing resource issue: it succeeded in local mode on an [...]. The logs are not meaningful when the training job fails for this reason.
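One plausible reading of the comment above is that the training script was killed by the operating system, for example by the out-of-memory killer. This is an assumption, since the thread does not say which resource ran out. A process terminated with SIGKILL never gets to run Python's exception handling, so no traceback is written, which would explain the empty logs. The sketch below reproduces that behaviour with the standard library only; `run_killed_child` is a hypothetical helper name, and a POSIX system is assumed.

```python
import signal
import subprocess
import sys

# Child program that kills itself with SIGKILL, standing in for a script
# terminated by the kernel (assumption: this mimics an out-of-memory kill).
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"


def run_killed_child():
    """Run the child process and return (return_code, stderr_text)."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    code, err = run_killed_child()
    # On POSIX, a negative return code -N means "terminated by signal N".
    print("return code:", code)
    print("stderr is empty:", err == "")
```

The parent only sees a signal-based return code and an empty stderr, which is consistent with a wrapper reporting a failed user script without error details.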
Does anyone know the root cause of this issue? I ran into a similar issue in another notebook.
I am also having much the same issue. I tried changing the GPU instance type for my region, but the issue persists.
I am unable to resolve the error even when using a more powerful training instance (ml.m5.2xlarge). How can it be solved? Any suggestion is appreciated.
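When a larger instance does not help, it can be useful to make the training script report its own progress and memory use, so that the last log line shows which step was running when the job died. The following is a minimal sketch using only the standard library, not a SageMaker feature; `peak_memory_mb` and `run_with_diagnostics` are hypothetical helper names, and the script is assumed to run on Linux or macOS (the `resource` module is not available on Windows).

```python
import resource
import sys
import traceback


def peak_memory_mb():
    """Peak resident memory of this process in MB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def run_with_diagnostics(step_name, func, *args, **kwargs):
    """Run one step, logging peak memory around it and any traceback."""
    print(f"[{step_name}] start, peak memory {peak_memory_mb():.1f} MB", flush=True)
    try:
        result = func(*args, **kwargs)
    except Exception:
        # Print the full traceback and flush before re-raising, so the
        # details reach the logs even if the wrapper truncates output.
        traceback.print_exc()
        sys.stderr.flush()
        raise
    print(f"[{step_name}] done, peak memory {peak_memory_mb():.1f} MB", flush=True)
    return result
```

In a training script this could wrap each expensive call, for example `run_with_diagnostics("fit", model.fit, X, y)` with hypothetical `model`, `X` and `y`. It cannot catch a kill signal, but a "start" line with no matching "done" line narrows down where the process stopped.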
System Information
Describe the problem
The SageMaker training job fails without any traceback or error details; the only message is like the one below:
Minimal repro / logs
Please provide any logs and a bare minimum reproducible test case, as this will be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
I instantiate an SKLearn estimator:
The provided training script mailswitch-train-and-calibrate-lsvc.py is:
The error occurs when fitting the estimator:
All the input paths are valid locations on S3.
Note that the training succeeds in local mode when specifying train_instance_type='local' in the SKLearn estimator instantiation (and commenting out the line defining sagemaker_session).
The complete log of the failed fit execution is:
And the traceback is: