
How to use Sagemaker to create an Inference Pipeline that tokenizes and converts text to word indices? #520


Closed
mshseek opened this issue Dec 1, 2018 · 3 comments


@mshseek

mshseek commented Dec 1, 2018

I have a SageMaker endpoint running TensorFlow Serving that expects a serialized TF Example as input.

Now I need to convert a string into a sequence of word indices by doing a lookup against a txt file, convert that into a serialized TF Example, and pass it to the endpoint above.

Is SageMaker the best tool to use to build the above preprocessing step? The closest example I could find is the inference_pipeline_sparkml_blazingtext_dbpedia notebook, but that notebook does not show how to convert the tokenized text into word indices.

I was thinking of using scikit-learn to do the feature transformation, but I am unsure how to do the word index lookup and whether the Scikit-learn estimator in SageMaker will allow me to call TensorFlow functions to create the TF Example.
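
For context, a minimal sketch of the serialization step I mean (the feature name and indices are just illustrative):

```python
import tensorflow as tf

# Hypothetical word indices produced by looking the tokens up in the vocab txt file.
word_indices = [12, 7, 341]

example = tf.train.Example(features=tf.train.Features(feature={
    "words": tf.train.Feature(int64_list=tf.train.Int64List(value=word_indices)),
}))
serialized = example.SerializeToString()  # bytes to send to the TF Serving endpoint
```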

@orchidmajumder
Contributor

orchidmajumder commented Dec 6, 2018

Hi, thanks for using Amazon SageMaker. For your use-case, the example that you looked at is indeed the right approach.

Continue to use Spark

In order to adapt the same example to your use-case, you need the StringIndexer feature transformer from Spark ML. In the Pipeline, after the Tokenizer, add a StringIndexer and a OneHotEncoder to build your feature-processing step.

Take a look at the notebook mentioned below. Though the task is not related to text analytics, the dataset has categorical columns and the example uses the two feature processors I mentioned above.

https://github.com/awslabs/amazon-sagemaker-examples/tree/master/advanced_functionality/inference_pipeline_sparkml_xgboost_abalone
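
For illustration, a minimal sketch of a Spark ML pipeline along those lines (assuming Spark 2.x, where OneHotEncoder still takes single inputCol/outputCol arguments; the column names are hypothetical):

```python
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, StringIndexer, OneHotEncoder

# Hypothetical columns: "text" holds the raw sentence, "category" a categorical feature.
tokenizer = Tokenizer(inputCol="text", outputCol="words")
indexer = StringIndexer(inputCol="category", outputCol="category_index")
encoder = OneHotEncoder(inputCol="category_index", outputCol="category_vec")

pipeline = Pipeline(stages=[tokenizer, indexer, encoder])
# model = pipeline.fit(df)
# transformed = model.transform(df)
```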

Use Scikit-learn instead of Spark

However, if you want to use Scikit-learn instead of Spark, you can use that as well. Here is a notebook on how to use Scikit-learn for an Inference Pipeline:
https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-python-sdk/scikit_learn_inference_pipeline

Again, in that case you would use Scikit-learn's LabelEncoder to map the tokens to integer word indices.
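
As a rough sketch of that idea (the vocabulary file name and tokens are hypothetical):

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical vocabulary file: one token per line, as in the original question.
with open("vocab.txt") as f:
    vocab = [line.strip() for line in f]

encoder = LabelEncoder()
encoder.fit(vocab)

tokens = ["quick", "brown", "fox"]
word_indices = encoder.transform(tokens)  # one integer index per token
# Note: transform() raises on tokens that were not in the vocabulary,
# so unknown words need to be mapped to a reserved index beforehand.
```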

And you can indeed use Scikit-learn together with SageMaker TensorFlow in an Inference Pipeline setup.
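
A minimal sketch of chaining the two with the SageMaker Python SDK's PipelineModel (it assumes sklearn_model, tf_model, and role already exist):

```python
from sagemaker.pipeline import PipelineModel

# Assumes sklearn_model and tf_model are SageMaker Model objects built from the
# trained preprocessing artifact and the TensorFlow Serving artifact, respectively.
pipeline_model = PipelineModel(
    name="preprocess-then-tf-serving",
    role=role,
    models=[sklearn_model, tf_model],
)

predictor = pipeline_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m4.xlarge",
)
```

At inference time, a request to the pipeline endpoint is handled by the preprocessing container first and its output is forwarded to the TensorFlow container.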

@laurenyu
Contributor

closing due to inactivity. feel free to reopen if necessary.

@ansh997

ansh997 commented Jun 25, 2019

In my use case, I have to feed a sparse matrix to an algorithm that is already deployed. I used a one-hot encoder to convert an alphanumeric string into a sparse matrix in order to train the model. But once the model is deployed, I can't use that one-hot encoder inside the model. So I tried to use a pipeline to process the input before feeding it into the deployed model, for which I have to deploy another model whose endpoint would act as a feed for my previous model. How should I achieve this?

I have already tried to make a preprocessing model, but after deploying it I am unable to get the desired result. I need my preprocessing model to return a sparse matrix, but I am getting a 500 error. Is there another way to tackle this?
Note: My model needs input in the form of a sparse matrix to make predictions.
ERR: ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "500 Internal Server Error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application."

ChoiByungWook pushed a commit that referenced this issue Dec 8, 2020