Describe the problem or feature request clearly here.
Minimal repro / logs
Please provide any logs and a bare minimum reproducible test case, as this will be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.
The crash is reproducible on any completed TensorFlow job that was executed in script mode.
After attaching to such a job and requesting its hyperparameters, the SDK crashes.
Analysis
When submitting a TensorFlow job in script mode, it is no longer possible to specify checkpoint_path (if it is specified, the SDK raises an exception stating that the parameter is not supported in script mode).
When attaching to a completed job, the SDK does not set a value for checkpoint_path (for obvious reasons), nor does it populate the private variable _current_job_name, which is supposed to hold the job name.
When hyperparameters() is called, it first tries to get checkpoint_path (which is not used further down in script mode, since it sits behind an if). Because checkpoint_path is not defined, it calls _default_s3_path() to recreate one. That function simply concatenates a series of sub-paths, including _current_job_name, which is None at this point. This is where the SDK crashes.
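The failure mode described above can be illustrated with a small stand-in class. This is only a sketch: AttachedEstimatorSketch, its output_path value, the "checkpoints" directory name, and the use of "/".join are assumptions made for illustration, not the SDK's actual code. Only the names checkpoint_path, _current_job_name, _default_s3_path() and hyperparameters() come from the report.

```python
class AttachedEstimatorSketch:
    """Hypothetical stand-in for an estimator attached to a completed job."""

    def __init__(self, script_mode, current_job_name=None, checkpoint_path=None):
        self.script_mode = script_mode
        # After attaching, neither of these is populated (per the report).
        self._current_job_name = current_job_name
        self.checkpoint_path = checkpoint_path
        self.output_path = "s3://example-bucket/output"  # assumed placeholder

    def _default_s3_path(self, directory):
        # Concatenates sub-paths; raises TypeError if _current_job_name is None.
        return "/".join([self.output_path, self._current_job_name, directory])

    def hyperparameters(self):
        hyperparameters = {}
        # Problem: evaluated unconditionally, even though only the
        # non-script-mode branch below uses the result.
        checkpoint_path = self.checkpoint_path or self._default_s3_path("checkpoints")
        if self.script_mode:
            hyperparameters["script_mode"] = True
        else:
            hyperparameters["checkpoint_path"] = checkpoint_path
        return hyperparameters


attached = AttachedEstimatorSketch(script_mode=True)
try:
    attached.hyperparameters()
except TypeError as error:
    print("crash while building the default path:", error)
```

In this sketch the exception comes from joining None with strings; the real SDK may fail in a slightly different call, but the chain of events is the same one described above.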
To fix this, the line that queries the checkpoint path should be moved into the else branch, where it is actually used.
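A minimal sketch of the proposed change, using a hypothetical stand-in class rather than the SDK's real estimator. PatchedEstimatorSketch, the output_path value, the "checkpoints" directory name, and the "/".join concatenation are all assumptions for illustration; the point is only that the checkpoint path lookup now happens inside the else branch.

```python
class PatchedEstimatorSketch:
    """Hypothetical stand-in showing the lookup moved into the else branch."""

    def __init__(self, script_mode, current_job_name=None, checkpoint_path=None):
        self.script_mode = script_mode
        self._current_job_name = current_job_name
        self.checkpoint_path = checkpoint_path
        self.output_path = "s3://example-bucket/output"  # assumed placeholder

    def _default_s3_path(self, directory):
        # Still requires _current_job_name, but is no longer reached in script mode.
        return "/".join([self.output_path, self._current_job_name, directory])

    def hyperparameters(self):
        hyperparameters = {}
        if self.script_mode:
            # Script mode does not use checkpoint_path, so nothing is queried here.
            hyperparameters["script_mode"] = True
        else:
            # The lookup lives where its result is consumed.
            checkpoint_path = self.checkpoint_path or self._default_s3_path("checkpoints")
            hyperparameters["checkpoint_path"] = checkpoint_path
        return hyperparameters


# An attached script-mode job has no job name set, yet no longer crashes.
attached = PatchedEstimatorSketch(script_mode=True)
print(attached.hyperparameters())
```

With this arrangement, an attached script-mode job never touches _current_job_name, while the non-script-mode path behaves as before.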