Improve error logging when invoking custom handler methods #164

Merged
merged 5 commits from improve-error-logging into aws:master on Mar 15, 2024

Conversation

@namannandan (Contributor) commented Mar 13, 2024

Issue #163

Description of changes:
Improve debuggability of model load and inference failures caused by custom handler method implementations.
This is done by logging the exception traceback in addition to sending the traceback in the response back to the client. Although the traceback is sent back to the client in the response body, the client may sometimes fail to load the entire response body, for example:
botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary and could not load the entire response body. See https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-pytorch-serving-**********-**** in account ************ for more information.
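
The pattern is roughly the following sketch (not the toolkit's actual code; run_handler_safely and handler_fn are hypothetical names used only for illustration): the handler invocation is wrapped so that the traceback is logged server-side and the exception is re-raised so the traceback still reaches the client response.

import logging
import traceback

logger = logging.getLogger(__name__)

def run_handler_safely(handler_fn, *args):
    try:
        return handler_fn(*args)
    except Exception:
        # Format the traceback as a list of strings so it is emitted as a
        # single log record (see the note under Testing below).
        logger.error("Transform failed. Error traceback: %s", traceback.format_exc().splitlines())
        # Re-raise so the error and its traceback are still returned to the
        # client in the response body.
        raise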

Testing:
Using a custom handler with an expected error:

.....
.....
def predict_fn(input_data, model_pack):

    print("predict_fn got input Data: {}".format(input_data))
    model = model_pack[0]
    tokenizer = model_pack[1]
    mapping_file_path = model_pack[2]

    with open(mapping_file_path) as f:
        mapping = json.load(f)

    # Intentionally fail here to trigger the improved error logging
    assert False

    inputs = tokenizer.encode_plus(
        input_data,
        max_length=128,
        pad_to_max_length=True,
        add_special_tokens=True,
        return_tensors="pt",
    )
.....
.....

On deploying the model and making an inference request, the CloudWatch logs contain the following log line:
2024-03-15T00:54:26,721 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Transform failed for model: model. Error traceback: ['Traceback (most recent call last):', ' File "/sagemaker-pytorch-inference-toolkit/src/sagemaker_inference/transformer.py", line 150, in transform', ' result = self._run_handler_function(', ' File "/sagemaker-pytorch-inference-toolkit/src/sagemaker_inference/transformer.py", line 284, in _run_handler_function', ' result = func(*argv_context)', ' File "/sagemaker-pytorch-inference-toolkit/src/sagemaker_inference/transformer.py", line 268, in _default_transform_fn', ' prediction = self._run_handler_function(self._predict_fn, *(data, model))', ' File "/sagemaker-pytorch-inference-toolkit/src/sagemaker_inference/transformer.py", line 280, in _run_handler_function', ' result = func(*argv)', ' File "/opt/ml/model/code/custom_inference.py", line 52, in predict_fn', ' assert False', 'AssertionError']

Note that the traceback is logged as a list of strings rather than a multi-line string, because a multi-line string can cause other log statements to get interleaved with the exception traceback. A brief illustration of the difference is shown below.
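
As a minimal illustration, assuming only the standard logging and traceback modules:

import logging
import traceback

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

try:
    assert False
except AssertionError:
    # Multi-line string: other log statements can end up interleaved
    # between its lines in aggregated logs such as CloudWatch.
    logger.error("Error traceback: %s", traceback.format_exc())
    # List of strings: the whole traceback is emitted as one log record.
    logger.error("Error traceback: %s", traceback.format_exc().splitlines())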

@namannandan requested review from nikhil-sk and lxning on Mar 14, 2024 18:26
@namannandan force-pushed the improve-error-logging branch from 10375dc to 88d7eb5 on Mar 15, 2024 00:44
nikhil-sk previously approved these changes Mar 15, 2024
@namannandan merged commit d49082e into aws:master Mar 15, 2024