NCHW format not supported in c5.xlarge deployment #771

Closed
gautiese opened this issue Apr 29, 2019 · 6 comments

@gautiese


System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow
  • Framework Version: 1.12
  • Python Version: 2.7
  • CPU or GPU: CPU
  • Python SDK Version:
  • Are you using a custom image: No

Describe the problem


We trained a model in TensorFlow that consumes images in NCHW format. Training was done on GPUs (I believe NCHW is only supported on GPUs and MKL-capable Intel processors).

When I try to run inference with this model on an ml.c5.xlarge endpoint, I get the following error:

E external/org_tensorflow/tensorflow/core/common_runtime/executor.cc:623] Executor failed to create kernel. Invalid argument: Conv2DCustomBackpropInputOp only supports NHWC.

[[{{node Gs/cond/8x8/Conv0_up/conv2d_transpose}} = Conv2DBackpropInput[T=DT_FLOAT, _output_shapes=[[1,512,8,8]], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](Gs/cond/8x8/Conv0_up/conv2d_transpose/output_shape, Gs/cond/8x8/Conv0_up/AddN, Gs/cond/ToRGB_lod6/Conv2D/Switch:1)]]

Strangely, when I deploy the same model on my local notebook instance (which is also an ml.c5.xlarge), it works just fine!

@jesterhazy
Contributor

Hi @gautiese, thanks for using SageMaker!

Yes, it looks like you are using a non-MKL build of TensorFlow in SageMaker, and an MKL build in EC2.
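One way to confirm which build you have on each side (this relies on a TF 1.x internal API, so treat it as a best-effort check):

```python
# Best-effort check for an MKL-enabled TensorFlow build (TF 1.x).
# IsMklEnabled() is an internal symbol and may move between versions.
from tensorflow.python import pywrap_tensorflow

print(pywrap_tensorflow.IsMklEnabled())  # True on MKL builds
```

Running it in the notebook and inside the endpoint's container should show the mismatch.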

Which "framework_version" (or container image uri) are you using when you create the endpoint?

@gautiese
Author

gautiese commented Apr 30, 2019 via email

@jesterhazy
Contributor

jesterhazy commented Apr 30, 2019

We haven't released a TensorFlow 1.13 container yet; are you sure that's the right version? The original post says 1.12, so I'm going to assume that's still correct.

Right now our TensorFlow containers do not include an MKL build of TensorFlow Serving. We plan to add it, but we don't have a target release date yet.

In the meantime, your best bet would be to change your model so it accepts NHWC inputs.
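A rough, untested sketch of what that could look like: rebuild the inference graph with channels-last layout, restore the trained weights, and re-export. Here `build_model()` and all tensor names, shapes, and paths are placeholders for whatever your code actually uses:

```python
# Untested sketch: rebuild the inference graph with channels-last
# (NHWC) layout, restore the trained weights, and re-export.
# build_model() and all names/paths are placeholders for your code.
import tensorflow as tf

with tf.Session(graph=tf.Graph()) as sess:
    # Channels-last input that the CPU conv kernels support
    images = tf.placeholder(tf.float32, [None, 256, 256, 3], name="images")
    # Same network definition used for training, but with NHWC layout
    outputs = build_model(images, data_format="channels_last")

    tf.train.Saver().restore(sess, "checkpoints/model.ckpt")
    tf.saved_model.simple_save(sess, "export/Servo/1",
                               inputs={"images": images},
                               outputs={"outputs": outputs})
```

For plain conv stacks the checkpoint usually restores cleanly, since conv kernels are stored in a layout-independent [H, W, in, out] order; dense layers that follow a flatten of an NCHW tensor are the main thing to double-check.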

I'm going to tag this as a feature request (for MKL support) and leave it open.

@gautiese
Author

gautiese commented May 1, 2019

You are right, I am on 1.12.
I'll be waiting eagerly for the MKL containers!
Retraining this model would be very expensive at the moment.

@gautiese
Author

If I were to create an MKL-DNN SageMaker TensorFlow Serving container myself, how would I go about it? I am being pressed to deploy my model on a CPU instance soon (budget constraints). From Intel's TensorFlow Serving build guides, it looks like the core step is compiling the model server with `--config=mkl` and swapping the binary into the serving image; see the sketch below.
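An untested sketch of what I have in mind (the image tags, the `<sagemaker-tensorflow-serving-image>` placeholder, and the binary paths are my guesses):

```dockerfile
# Stage 1: build an MKL-enabled tensorflow_model_server from source.
# The -devel image ships the TF Serving sources and a bazel toolchain.
FROM tensorflow/serving:1.12.0-devel as builder
WORKDIR /tensorflow-serving
RUN bazel build -c opt --config=mkl \
    tensorflow_serving/model_servers:tensorflow_model_server

# Stage 2: overwrite the stock binary in the SageMaker serving image.
# <sagemaker-tensorflow-serving-image> is a placeholder for the real
# ECR URI, and the destination path may differ in the actual image.
FROM <sagemaker-tensorflow-serving-image>:1.12-cpu
COPY --from=builder \
    /tensorflow-serving/bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server \
    /usr/bin/tensorflow_model_server
```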

@martinRenou
Collaborator

More recent containers now come with the MKL optimization. Closing as fixed.
