
Commit fbe1802

documentation: TFS support for pre/processing functions (aws#807)
1 parent c921ead commit fbe1802

1 file changed: +186 -0 lines changed

src/sagemaker/tensorflow/deploying_tensorflow_serving.rst

Lines changed: 186 additions & 0 deletions
@@ -269,6 +269,192 @@ More information on how to create ``export_outputs`` can be found in `specifying
refer to TensorFlow's `Save and Restore <https://www.tensorflow.org/guide/saved_model>`_ documentation for other ways to control the
inference-time behavior of your SavedModels.

Providing Python scripts for pre/post-processing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can add your customized Python code to process your input and output data:

.. code::

    from sagemaker.tensorflow.serving import Model

    model = Model(entry_point='inference.py',
                  model_data='s3://mybucket/model.tar.gz',
                  role='MySageMakerRole')
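
After the model object is created, it can be deployed in the usual way; a minimal
sketch (the instance type and count below are placeholders):

.. code::

    predictor = model.deploy(initial_instance_count=1,
                             instance_type='ml.c5.xlarge')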

How to implement the pre- and/or post-processing handler(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Your entry point file should implement either a pair of ``input_handler``
and ``output_handler`` functions or a single ``handler`` function.
Note that if a ``handler`` function is implemented, ``input_handler``
and ``output_handler`` are ignored.

To implement pre- and/or post-processing handler(s), use the Context
object that the Python service creates. The Context object is a namedtuple with the following attributes:

- ``model_name (string)``: the name of the model to use for
  inference. For example, 'half-plus-three'

- ``model_version (string)``: version of the model. For example, '5'

- ``method (string)``: inference method. For example, 'predict',
  'classify' or 'regress'. For more information on methods, see the
  `Classify and Regress
  API <https://www.tensorflow.org/tfx/serving/api_rest#classify_and_regress_api>`__
  and the `Predict
  API <https://www.tensorflow.org/tfx/serving/api_rest#predict_api>`__

- ``rest_uri (string)``: the TFS REST URI generated by the Python
  service. For example,
  'http://localhost:8501/v1/models/half_plus_three:predict'

- ``grpc_uri (string)``: the GRPC port number generated by the Python
  service. For example, '9000'

- ``custom_attributes (string)``: content of the
  'X-Amzn-SageMaker-Custom-Attributes' header from the original
  request. For example,
  'tfs-model-name=half_plus_three,tfs-method=predict' (see the parsing
  sketch after this list)

- ``request_content_type (string)``: the original request content type,
  defaulted to 'application/json' if not provided

- ``accept_header (string)``: the original request accept type,
  defaulted to 'application/json' if not provided

- ``content_length (int)``: content length of the original request
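
For example, ``custom_attributes`` arrives as a single comma-separated string,
so a handler that needs individual values has to parse it itself. A minimal
sketch (the helper name and attribute keys are only illustrative):

.. code::

    def _parse_custom_attributes(context):
        """Split a 'key=value,key=value' custom attributes header into a dict."""
        header = context.custom_attributes or ''
        return dict(item.split('=', 1) for item in header.split(',') if '=' in item)

    # inside a handler:
    #   attributes = _parse_custom_attributes(context)
    #   model_name = attributes.get('tfs-model-name', context.model_name)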

The following code example implements ``input_handler`` and
``output_handler``. By providing these, the Python service posts the
request to the TFS REST URI with the data pre-processed by ``input_handler``
and passes the response to ``output_handler`` for post-processing.

.. code::

    import json


    def input_handler(data, context):
        """Pre-process request input before it is sent to TensorFlow Serving REST API.

        Args:
            data (obj): the request data, in format of dict or string
            context (Context): an object containing request and configuration details

        Returns:
            (dict or str): a JSON-serializable dict or string that contains the request body
        """
        if context.request_content_type == 'application/json':
            # pass through json (assumes it's correctly formed)
            d = data.read().decode('utf-8')
            return d if len(d) else ''

        if context.request_content_type == 'text/csv':
            # very simple csv handler
            return json.dumps({
                'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
            })

        raise ValueError('{{"error": "unsupported content type {}"}}'.format(
            context.request_content_type or "unknown"))


    def output_handler(data, context):
        """Post-process TensorFlow Serving output before it is returned to the client.

        Args:
            data (obj): the TensorFlow Serving response
            context (Context): an object containing request and configuration details

        Returns:
            (bytes, string): data to return to client, response content type
        """
        if data.status_code != 200:
            raise ValueError(data.content.decode('utf-8'))

        response_content_type = context.accept_header
        prediction = data.content
        return prediction, response_content_type
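
Once a model with this entry point is deployed, the endpoint accepts both
content types handled above. A minimal client sketch using ``boto3`` (the
endpoint name and payload are placeholders):

.. code::

    import boto3

    runtime = boto3.client('sagemaker-runtime')

    # CSV request: input_handler converts it to a TFS 'instances' payload
    response = runtime.invoke_endpoint(
        EndpointName='my-tfs-endpoint',   # placeholder endpoint name
        ContentType='text/csv',           # becomes context.request_content_type
        Accept='application/json',        # becomes context.accept_header
        Body='1.0,2.0,5.0')
    print(response['Body'].read())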

You might want to have complete control over the request.
For example, you might want to make a TFS request (REST or GRPC) to the first model,
inspect the results, and then make a request to a second model. In this case, implement
the ``handler`` method instead of the ``input_handler`` and ``output_handler`` methods, as demonstrated
in the following code:

.. code::

    import json
    import requests


    def handler(data, context):
        """Handle request.

        Args:
            data (obj): the request data
            context (Context): an object containing request and configuration details

        Returns:
            (bytes, string): data to return to client, (optional) response content type
        """
        processed_input = _process_input(data, context)
        response = requests.post(context.rest_uri, data=processed_input)
        return _process_output(response, context)


    def _process_input(data, context):
        if context.request_content_type == 'application/json':
            # pass through json (assumes it's correctly formed)
            d = data.read().decode('utf-8')
            return d if len(d) else ''

        if context.request_content_type == 'text/csv':
            # very simple csv handler
            return json.dumps({
                'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
            })

        raise ValueError('{{"error": "unsupported content type {}"}}'.format(
            context.request_content_type or "unknown"))


    def _process_output(data, context):
        if data.status_code != 200:
            raise ValueError(data.content.decode('utf-8'))

        response_content_type = context.accept_header
        prediction = data.content
        return prediction, response_content_type
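
To chain two models as mentioned above, one option is to call a second TFS REST
URI from inside ``handler``. The following is only a rough sketch: it assumes a
second model named 'classifier' is also loaded in the container (see the later
section on deploying more than one model) and that its REST URI follows the same
pattern as the ``rest_uri`` example shown earlier:

.. code::

    import json
    import requests


    def handler(data, context):
        """Call the configured model, then feed its predictions to a second model."""
        payload = data.read().decode('utf-8')

        # first model: context.rest_uri points at the model configured for this endpoint
        first = requests.post(context.rest_uri, data=payload)
        if first.status_code != 200:
            raise ValueError(first.content.decode('utf-8'))

        # second model: assumed URI pattern 'http://localhost:8501/v1/models/<name>:predict'
        second_uri = 'http://localhost:8501/v1/models/classifier:predict'
        second = requests.post(second_uri, data=json.dumps(
            {'instances': first.json()['predictions']}))
        return second.content, context.accept_header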

You can also bring in external dependencies to help with your data
processing. There are two ways to do this (a sketch of an entry point that
uses such a dependency follows the list):

1. If you included ``requirements.txt`` in your ``source_dir`` or in
   your dependencies, the container installs the Python dependencies at runtime using ``pip install -r``:

   .. code::

       from sagemaker.tensorflow.serving import Model

       model = Model(entry_point='inference.py',
                     dependencies=['requirements.txt'],
                     model_data='s3://mybucket/model.tar.gz',
                     role='MySageMakerRole')

2. If you are working in a network-isolation situation or if you don't
   want to install dependencies at runtime every time your endpoint starts or a batch
   transform job runs, you might want to put
   pre-downloaded dependencies under a ``lib`` directory and include this
   directory as a dependency. The container adds the modules to the Python
   path. Note that if both ``lib`` and ``requirements.txt``
   are present in the model archive, ``requirements.txt`` is ignored:

   .. code::

       from sagemaker.tensorflow.serving import Model

       model = Model(entry_point='inference.py',
                     dependencies=['/path/to/folder/named/lib'],
                     model_data='s3://mybucket/model.tar.gz',
                     role='MySageMakerRole')
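
In either case, the entry point can then import the dependency directly. A
minimal sketch, assuming ``numpy`` was listed in ``requirements.txt`` or
vendored under ``lib`` (the scaling shown is only an example):

.. code::

    import json

    import numpy as np  # assumed external dependency


    def input_handler(data, context):
        """Scale CSV input to [0, 1] with numpy before sending it to TFS."""
        if context.request_content_type == 'text/csv':
            values = np.array(
                [float(x) for x in data.read().decode('utf-8').split(',')])
            # assumes the row is not constant, so max() != min()
            scaled = (values - values.min()) / (values.max() - values.min())
            return json.dumps({'instances': [scaled.tolist()]})

        raise ValueError('{{"error": "unsupported content type {}"}}'.format(
            context.request_content_type or "unknown"))


    def output_handler(data, context):
        if data.status_code != 200:
            raise ValueError(data.content.decode('utf-8'))
        return data.content, context.accept_header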

Deploying more than one model to your Endpoint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
