NCHW format not supported in c5.xlarge deployment #771

Closed
gautiese opened this issue Apr 29, 2019 · 6 comments

@gautiese


System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow
  • Framework Version: 1.12
  • Python Version: 2.7
  • CPU or GPU: CPU
  • Python SDK Version:
  • Are you using a custom image: No

Describe the problem


We trained a model in TensorFlow that consumes images in NCHW format. Training was done on GPUs (I believe NCHW is only supported on GPUs and MKL-capable Intel processors).

When I try to run inference with this model on an ml.c5.xlarge endpoint, I get the following error:

E external/org_tensorflow/tensorflow/core/common_runtime/executor.cc:623] Executor failed to create kernel. Invalid argument: Conv2DCustomBackpropInputOp only supports NHWC.

[[{{node Gs/cond/8x8/Conv0_up/conv2d_transpose}} = Conv2DBackpropInput[T=DT_FLOAT, _output_shapes=[[1,512,8,8]], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Gs/cond/8x8/Conv0_up/conv2d_transpose/output_shape, Gs/cond/8x8/Conv0_up/AddN, Gs/cond/ToRGB_lod6/Conv2D/Switch:1)]]

Strangely, when I deploy the same model on my local notebook instance (which is also an ml.c5.xlarge), it works just fine!

@jesterhazy
Contributor

Hi @gautiese, thanks for using SageMaker!

Yes, it looks like you are using a non-MKL build of TensorFlow in SageMaker, and an MKL build in EC2.
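One way to confirm which build you have on each side (this relies on a TF 1.x internal API, so treat it as a best-effort check):

```python
# Best-effort check for an MKL-enabled TensorFlow build (TF 1.x).
# IsMklEnabled() is an internal symbol and may move between versions.
from tensorflow.python import pywrap_tensorflow

print(pywrap_tensorflow.IsMklEnabled())  # True on MKL builds
```

Running it in the notebook and inside the endpoint's container should show the mismatch.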

Which "framework_version" (or container image uri) are you using when you create the endpoint?

@gautiese
Author

gautiese commented Apr 30, 2019 via email

@jesterhazy
Contributor

jesterhazy commented Apr 30, 2019

We haven't released a TensorFlow 1.13 container yet; are you sure that's the right version? The original post says 1.12, so I'm going to assume that's still correct.

Right now our TensorFlow containers do not include an MKL build of TensorFlow Serving. We plan to add it, but we don't have a target release date yet.

In the meantime, your best bet would be to change your model so it accepts NHWC inputs.
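A rough, untested sketch of what that could look like: rebuild the inference graph with channels-last layout, restore the trained weights, and re-export. Here `build_model()` and all tensor names, shapes, and paths are placeholders for whatever your code actually uses:

```python
# Untested sketch: rebuild the inference graph with channels-last
# (NHWC) layout, restore the trained weights, and re-export.
# build_model() and all names/paths are placeholders for your code.
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    # Channels-last input that the CPU conv kernels support
    images = tf.placeholder(tf.float32, [None, 256, 256, 3], name="images")
    # Same network definition used for training, but with NHWC layout
    outputs = build_model(images, data_format="channels_last")

    tf.train.Saver().restore(sess, "checkpoints/model.ckpt")
    tf.saved_model.simple_save(sess, "export/Servo/1",
                               inputs={"images": images},
                               outputs={"outputs": outputs})
```

For plain conv stacks the checkpoint usually restores cleanly, since conv kernels are stored in a layout-independent [H, W, in, out] order; dense layers that follow a flatten of an NCHW tensor are the main thing to double-check.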

I'm going to tag this as a feature request (for MKL support) and leave it open.

@gautiese
Author

gautiese commented May 1, 2019

You are right, I am on 1.12.
I'll be waiting eagerly for the MKL containers!
Retraining this model would be very expensive at the moment.

@gautiese
Author

If I were to create an MKL-DNN SageMaker TensorFlow Serving container myself, how would I go about it? I am being pressed to deploy my model on a CPU instance soon (budget constraints). From Intel's TensorFlow Serving build guides, it looks like the core step is compiling the model server with `--config=mkl` and swapping the binary into the serving image; see the sketch below.
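An untested sketch of what I have in mind (the image tags, the `<sagemaker-tensorflow-serving-image>` placeholder, and the binary paths are my guesses):

```dockerfile
# Stage 1: build an MKL-enabled tensorflow_model_server from source.
# The -devel image ships the TF Serving sources and a bazel toolchain.
FROM tensorflow/serving:1.12.0-devel as builder
WORKDIR /tensorflow-serving
RUN bazel build -c opt --config=mkl \
    tensorflow_serving/model_servers:tensorflow_model_server

# Stage 2: overwrite the stock binary in the SageMaker serving image.
# <sagemaker-tensorflow-serving-image> is a placeholder for the real
# ECR URI, and the destination path may differ in the actual image.
FROM <sagemaker-tensorflow-serving-image>:1.12-cpu
COPY --from=builder \
    /tensorflow-serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    /usr/bin/tensorflow_model_server
```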

@martinRenou
Collaborator

More recent containers now come with the MKL optimization. Closing as fixed.
