Cannot colocate nodes, Cannot merge devices with incompatible jobs: '/job:master/task:0' and '/job:ps/task:1' #328
Comments
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True, log_device_placement=False))
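To illustrate what `allow_soft_placement=True` asks for: the `tf.ConfigProto` documentation describes soft placement as letting an op fall back to the CPU when, for example, it has no GPU kernel or no GPU device is available, instead of failing. The toy Python model below shows only that idea. `place_op` and `PlacementError` are hypothetical names, not TensorFlow APIs, and treating devices as plain strings is a simplifying assumption.

```python
# Toy model of soft placement (illustration only; not TensorFlow's code).
# Assumption: devices are plain strings such as "/device:GPU:0", and we
# know up front which devices exist on the machine.


class PlacementError(Exception):
    """Hypothetical error raised when an op cannot be placed."""


def place_op(requested, available, allow_soft_placement=False):
    """Return the device an op would run on under this simplified model."""
    if requested in available:
        return requested  # the requested device exists: use it as-is
    if not allow_soft_placement:
        raise PlacementError(f"Device {requested!r} is not available")
    # Soft placement: fall back to a CPU device instead of failing.
    cpus = [device for device in available if "CPU" in device]
    if not cpus:
        raise PlacementError("No CPU device to fall back to")
    return cpus[0]


# Usage: a machine with only a CPU, and an op pinned to a GPU.
machine = ["/device:CPU:0"]
print(place_op("/device:GPU:0", machine, allow_soft_placement=True))  # prints "/device:CPU:0"
```

With `allow_soft_placement=False` the same call raises `PlacementError`; in real TensorFlow 1.x code the equivalent switch is the `allow_soft_placement` field of `tf.ConfigProto`, as in the snippet above.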
Hello, this will be difficult to diagnose without a minimal repro. Thanks!
Distributed TensorFlow training is not currently supported if you use the keras_model_fn. See the following:
Please fill out the form below.
System Information
Describe the problem
I created a keras_model_fn and am trying to train the model on 3 c4 instances. Unfortunately, I get the following error (detailed below).
Stack Overflow suggests using soft placement (I don't know what that means or how to use it).
Help!
Minimal repro / logs
InvalidArgumentError (see above for traceback): Cannot colocate nodes 'embedding_1/embeddings' and 'training/Adam/gradients/embedding_1/GatherV2_grad/Shape': Cannot merge devices with incompatible jobs: '/job:master/task:0' and '/job:ps/task:1'
[[Node: embedding_1/embeddings = VariableV2[_class=["loc:@embedding_1/embeddings"], container="", dtype=DT_FLOAT, shape=[28,300], shared_name="", _device="/job:ps/task:1"]()]]
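Going by the log, the variable `embedding_1/embeddings` is pinned to `/job:ps/task:1`, while the other node, the gradient op `training/Adam/gradients/embedding_1/GatherV2_grad/Shape`, appears to have ended up on `/job:master/task:0`. Colocating the two means merging those device strings, and the job names disagree. The simplified Python model below reproduces that check. `parse_device` and `merge_devices` are hypothetical helpers, not TensorFlow APIs, and real TensorFlow merging has extra rules (for example for device types) that this sketch omits.

```python
# Simplified model of merging two device strings (illustration only).
# Assumption: strings use the "/job:NAME/task:N" form seen in the log, and
# any field present in both strings must match exactly.


def parse_device(spec):
    """Split '/job:ps/task:1' into {'job': 'ps', 'task': '1'}."""
    fields = {}
    for part in spec.strip("/").split("/"):
        if part:
            key, _, value = part.partition(":")
            fields[key] = value
    return fields


def merge_devices(a, b):
    """Combine two device specs; a field set to different values is an error."""
    left, right = parse_device(a), parse_device(b)
    for key, value in left.items():
        if key in right and right[key] != value:
            raise ValueError(
                f"Cannot merge devices with incompatible {key}s: '{a}' and '{b}'"
            )
    merged = {**left, **right}  # fields from either side, no conflicts left
    return "".join(f"/{key}:{value}" for key, value in merged.items())


# Usage: the two devices from the error message.
try:
    merge_devices("/job:master/task:0", "/job:ps/task:1")
except ValueError as err:
    print(err)
```

Partial specs such as `/job:ps` and `/task:1` merge cleanly into `/job:ps/task:1`; only fields that are set on both sides with different values cause the failure, and here the `job` field is compared first, so the message names the jobs.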