Add new unit and integration tests #155


Merged: 2 commits from add_new_tests into aws:master on Oct 16, 2023

Conversation

@sachanub (Contributor) commented Oct 10, 2023

Issue #, if available:

Description of changes:

The objective of this PR is to add new unit and integration tests. Specifically, we include the following:

  • Unit test for ts_environment.py: test/unit/test_ts_environment.py
    • Added unit test to create a mock TorchServe environment and check the values of the environment variables (see the sketch after this list).
  • SageMaker integration tests for MNIST inductor: test/integration/sagemaker/test_mnist_inductor.py
    • test/resources/mnist/model_inductor:
      • Included model files from the DLC repo, i.e. code/mnist.py and torch_model.pth.
    • test/integration/__init__.py:
      • Added logic to define variables for MNIST inductor script and MNIST inductor model artifact.
    • test/integration/sagemaker/test_mnist_inductor.py:
      • Added CPU test to create and test endpoint with ml.c5.9xlarge instance type.
      • Added GPU test to create and test endpoints with ml.p3.2xlarge, ml.g4dn.4xlarge and ml.g5.4xlarge instance types.
  • SageMaker integration tests for multi-model endpoint (MME): test/integration/sagemaker/test_multi_model_endpoint_sagemaker.py
    • test/resources/mme:
      • Copied ResNet-18 model artifact from test/resources/resnet18/default_model.
      • Copied traced ResNet-18 model artifact from test/resources/resnet18/default_traced_resnet.
    • test/integration/__init__.py:
      • Add logic to create model artifacts for the MME tests.
    • test/integration/sagemaker/test_multi_model_endpoint_sagemaker.py:
      • Added test to check only ResNet-18 model is available on the multi-model endpoint.
      • Added test to check only traced ResNet-18 model is available on the multi-model endpoint.
      • Added test to check both models are available on the multi-model endpoint.
      • Added test to check that no models are available on the multi-model endpoint.
      • Added test to invoke both the ResNet-18 and traced ResNet-18 models and verify output length.
  • Checks to verify that dependencies from requirements.txt are installed:
    • Added a requirements.txt file containing the transformers dependency:
      • test/resources/mnist/model_cpu/code/requirements.txt.
    • Added logic to try to import transformers in the following script:
      • test/resources/mnist/model_cpu/code/mnist.py.
    • test/integration/__init__.py:
      • Added logic to include requirements.txt in MNIST tar file.
    • test/utils/file_utils.py:
      • Added logic to include requirements.txt in tar file creation.
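
As referenced in the first bullet above, a minimal sketch of what such a mock-environment unit test can look like. The module path, class name, attribute names and environment-variable names below are illustrative assumptions; only is_env_set() is confirmed later in this PR.

import os
from unittest.mock import patch

# Hypothetical import path and class name; the real ones live in the
# ts_environment module exercised by test/unit/test_ts_environment.py.
from sagemaker_pytorch_serving_container.ts_environment import TorchServeEnvironment

MOCK_ENV = {
    "SAGEMAKER_TS_BATCH_SIZE": "4",
    "SAGEMAKER_TS_RESPONSE_TIMEOUT": "120",
}

@patch.dict(os.environ, MOCK_ENV)
def test_env_vars_are_read():
    env = TorchServeEnvironment()
    assert env.is_env_set()             # is_env_set() is mentioned later in this PR
    assert env.batch_size == 4          # attribute names are assumptions
    assert env.response_timeout == 120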

UPDATE:

  • Refactored test/integration/__init__.py to obtain model information from test/integration/all_models_info.json and iteratively create attributes for the model directory, model script name and model tar file (see the sketch below).
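
A rough sketch of that refactor, assuming a simple JSON schema; the key names, the schema itself and the make_tarfile signature are assumptions for illustration.

import json
import os

from test.utils import file_utils

RESOURCES_PATH = os.path.join(os.path.dirname(__file__), "..", "resources")

# Hypothetical schema: one entry per model with its directory, script and weights file.
with open(os.path.join(os.path.dirname(__file__), "all_models_info.json")) as f:
    all_models_info = json.load(f)

for name, info in all_models_info.items():
    model_dir = os.path.join(RESOURCES_PATH, info["model_dir"])
    script = os.path.join(model_dir, "code", info["script_name"])
    # Assumed make_tarfile(script, model, output_path) signature.
    tar_file = file_utils.make_tarfile(script, os.path.join(model_dir, info["model_file"]), model_dir)
    # Expose module-level attributes, e.g. mnist_model_dir, mnist_script, mnist_tar, ...
    globals()[name + "_model_dir"] = model_dir
    globals()[name + "_script"] = script
    globals()[name + "_tar"] = tar_file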

Added the following local integration tests:

  • Local integration test for multi-model endpoint:
    • test/integration/sagemaker/test_multi_model_endpoint_local.py:
      • Added test to send a successful ping.
      • Added test to check that list of models is empty.
      • Added test to check that the ResNet-18 and traced ResNet-18 models are successfully loaded.
      • Added test to check that the ResNet-18 model is successfully unloaded.
      • Added test to assert failure on trying to unload a model which has already been unloaded i.e. ResNet-18.
      • Added test to assert failure on trying to load a model which has already been loaded i.e. traced ResNet-18.
      • Added test to invoke both the ResNet-18 and traced ResNet-18 models and verify output length.
  • Local integration test to check that all GPU IDs are returned on multi-GPU host:
    • test/integration/__init__.py:
      • Define directory path for model_gpu_context.
    • test/resources/model_gpu_context/code/inference.py:
      • Created custom model_fn, input_fn, predict_fn and output_fn functions that dynamically get the GPU IDs from the context object, along with their corresponding PIDs and thread IDs, and write them to different CSV files; the functions return dummy values (see the sketch after this list).
    • test/integration/local/test_model_fn_context.py:
      • Added test to check all GPU IDs are returned by the context object in the model_fn, input_fn, predict_fn and output_fn functions by reading from the CSV files created in test/resources/model_gpu_context/code/inference.py.
      • Added test to check that for a given device ID, PID and thread ID are same for the model_fn, input_fn, predict_fn, output_fn functions.
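
As referenced above, a condensed sketch of such an inference.py. The device lookup mirrors the snippet quoted later in this review; the _record helper, CSV file names and dummy return values are illustrative.

import csv
import os
import threading

import torch

def _record(context, file_name):
    # Look up this worker's GPU ID from the TorchServe context, as in the PR.
    device = torch.device("cuda:" + str(context.system_properties.get("gpu_id")))
    device_id = str(device)[-1]
    with open(file_name, "a") as f:
        # One row per call: device ID, process ID, thread ID.
        csv.writer(f).writerow([device_id, os.getpid(), threading.get_ident()])

def model_fn(model_dir, context):
    _record(context, "model_fn_device_info.csv")
    return "dummy-model"  # dummy value, per the PR description

def input_fn(request_body, content_type, context):
    _record(context, "input_fn_device_info.csv")
    return "dummy-input"

def predict_fn(data, model, context):
    _record(context, "predict_fn_device_info.csv")
    return "dummy-prediction"

def output_fn(prediction, accept, context):
    _record(context, "output_fn_device_info.csv")
    return "dummy-output"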

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@@ -0,0 +1,39 @@
# Copyright 2019-2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Contributor Author:

Added test for is_env_set() in test_ts_environment.py.

@chen3933 (Contributor):

I notice we don't have local integration tests. Are there any use cases we want to cover in local integration tests?

@chen3933 (Contributor):

I didn't see a user script that overwrites input_fn or output_fn. Do we want to add those tests?

@sachanub (Contributor Author):

> I notice we don't have local integration tests. Are there any use cases we want to cover in local integration tests?

Added the following local integration tests:

  • MME tests.
  • Test to check all GPU device IDs are dynamically returned by the context object in model_fn.

@sachanub (Contributor Author):

> I didn't see a user script that overwrites input_fn or output_fn. Do we want to add those tests?

We define custom input_fn and output_fn for this test: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/test/integration/local/test_serving.py#L87. This script defines custom input_fn and output_fn functions: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/test/resources/mnist/model_cpu/code/call_model_fn_once.py

device = torch.device("cuda:" + str(context.system_properties.get("gpu_id")))
device_str = str(device)[-1]
with open(file_path, "a") as file:
    file.write(device_str + "\n")  # append this worker's GPU device ID
Contributor:

We can add both the PID/thread ID and the device_id. With that identifier, we can verify that the device_id is consistent when calling input_fn etc.

Contributor Author:

Acknowledged. Fixed in latest commit.

input_fn_device_info, output_fn_device_info, predict_fn_device_info
):

device_id_input_fn, pid_input_fn, threadid_input_fn = input_fn_row
Contributor:

We cannot guarantee that the row order of input_fn_device_info.csv, output_fn_device_info.csv and predict_fn_device_info.csv is the same, right?

E.g.:
input_fn_device_info.csv

pid=0 device_id=0
pid=1 device_id=1

output_fn_device_info.csv

pid=1 device_id=1
pid=0 device_id=0

In the above, the behavior is correct but the test would fail.
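
One way to address this is to compare the CSV rows as sets rather than line by line; a hypothetical helper, not necessarily the fix adopted in this PR:

import csv

def read_rows(file_name):
    # Collect (device_id, pid, thread_id) rows without caring about order.
    with open(file_name) as f:
        return {tuple(row) for row in csv.reader(f)}

assert read_rows("input_fn_device_info.csv") == read_rows("output_fn_device_info.csv")
assert read_rows("input_fn_device_info.csv") == read_rows("predict_fn_device_info.csv")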

Contributor:

Also, we want to check device_id consistency across all four functions, namely model_fn, input_fn, output_fn and predict_fn.

Contributor Author:

Acknowledged. Fixed in latest commit.

@sachanub force-pushed the add_new_tests branch 3 times, most recently from dfc96bc to 79e49d2 on October 12, 2023 at 18:44.
chen3933 previously approved these changes Oct 12, 2023.

traced_resnet18_model_dir = os.path.join(mme_path, traced_resnet18_sub_dir)
traced_resnet18_script = os.path.join(traced_resnet18_model_dir, code_sub_dir, "inference.py")
traced_resnet18_tar = file_utils.make_tarfile(
Contributor:

Nit: it seems that we're repeating the process of making the tarfile for each model. Can we loop through this process to make the file cleaner?

Contributor Author:

Acknowledged. Updated in latest commit.

def container(image_uri, use_gpu):
    try:
        gpu_option = "--gpus device=0" if use_gpu else ""
        resnet18_path = os.path.join(mme_path, 'resnet18')
Contributor:

The fixture is using resnet18_path as the default; it would be good to make the model path name generic or to parametrize it.
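
For illustration, indirect parametrization is one way to make the fixture's model path generic; MME_PATH and the test below are assumptions based on the excerpt above, not the PR's actual code:

import os

import pytest

MME_PATH = "test/resources/mme"  # assumed to mirror mme_path above

@pytest.fixture
def container(request):
    # Tests pick the model sub-directory via indirect parametrization;
    # fall back to 'resnet18' when no parameter is supplied.
    model_sub_dir = getattr(request, "param", "resnet18")
    yield os.path.join(MME_PATH, model_sub_dir)

@pytest.mark.parametrize("container", ["traced_resnet18"], indirect=True)
def test_traced_model_path(container):
    assert container.endswith("traced_resnet18")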

Contributor Author:

Acknowledged. Updated in latest commit.

try:
    import transformers
except ImportError:
    raise ImportError("The 'transformers' module was not found.")
Contributor:

Curious why this explicit try-catch is necessary here.

Contributor Author:

Acknowledged. Removed the try-catch block. If transformers is not present, the error will show up anyway. Added a comment above the import statement to explain the reason for importing transformers, i.e. to check that the dependencies in requirements.txt get installed.
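
The resulting code is presumably something like the following (exact comment wording assumed):

# Import transformers to verify that the dependencies listed in
# requirements.txt were installed in the container.
import transformers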

@sachanub sachanub merged commit bae1816 into aws:master Oct 16, 2023
@sachanub sachanub deleted the add_new_tests branch October 20, 2023 18:08