Add new unit and integration tests #155


Merged: 2 commits from add_new_tests into aws:master on Oct 16, 2023

Conversation

@sachanub (Contributor) commented Oct 10, 2023

Issue #, if available:

Description of changes:

The objective of this PR is to add new unit and integration tests. Specifically, we include the following:

  • Unit test for ts_environment.py: test/unit/test_ts_environment.py
    • Added unit test to create a mock TorchServe environment and check the values of the environment variables (see the sketch after this list).
  • SageMaker integration tests for MNIST inductor: test/integration/sagemaker/test_mnist_inductor.py
    • test/resources/mnist/model_inductor:
      • Included model files from the DLC repo, i.e. code/mnist.py and torch_model.pth.
    • test/integration/__init__.py:
      • Added logic to define variables for MNIST inductor script and MNIST inductor model artifact.
    • test/integration/sagemaker/test_mnist_inductor.py:
      • Added CPU test to create and test endpoint with ml.c5.9xlarge instance type.
      • Added GPU test to create and test endpoints with ml.p3.2xlarge, ml.g4dn.4xlarge and ml.g5.4xlarge instance types.
  • SageMaker integration tests for multi-model endpoint (MME): test/integration/sagemaker/test_multi_model_endpoint_sagemaker.py
    • test/resources/mme:
      • Copied ResNet-18 model artifact from test/resources/resnet18/default_model.
      • Copied traced ResNet-18 model artifact from test/resources/resnet18/default_traced_resnet.
    • test/integration/__init__.py:
      • Add logic to create model artifacts for the MME tests.
    • test/integration/sagemaker/test_multi_model_endpoint_sagemaker.py:
      • Added test to check only ResNet-18 model is available on the multi-model endpoint.
      • Added test to check only traced ResNet-18 model is available on the multi-model endpoint.
      • Added test to check both models are available on the multi-model endpoint.
      • Added test to check that no models are available on the multi-model endpoint.
      • Added test to invoke both the ResNet-18 and traced ResNet-18 models and verify output length.
  • Checks to verify that dependencies from requirements.txt are installed:
    • Added a requirements.txt file containing the transformers dependency:
      • test/resources/mnist/model_cpu/code/requirements.txt.
    • Added logic to try to import transformers in the following script:
      • test/resources/mnist/model_cpu/code/mnist.py.
    • test/integration/__init__.py:
      • Added logic to include requirements.txt in MNIST tar file.
    • test/utils/file_utils.py:
      • Added logic to include requirements.txt in tar file creation.
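
As referenced in the first bullet above, a minimal sketch of what such a mock-environment unit test can look like. The module path, class name, attribute names and environment-variable names below are illustrative assumptions; only is_env_set() is confirmed later in this PR.

import os
from unittest.mock import patch

# Hypothetical import path and class name; the real ones live in the
# ts_environment module exercised by test/unit/test_ts_environment.py.
from sagemaker_pytorch_serving_container.ts_environment import TorchServeEnvironment

MOCK_ENV = {
    "SAGEMAKER_TS_BATCH_SIZE": "4",
    "SAGEMAKER_TS_RESPONSE_TIMEOUT": "120",
}

@patch.dict(os.environ, MOCK_ENV)
def test_env_vars_are_read():
    env = TorchServeEnvironment()
    assert env.is_env_set()             # is_env_set() is mentioned later in this PR
    assert env.batch_size == 4          # attribute names are assumptions
    assert env.response_timeout == 120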

UPDATE:

  • Refactored test/integration/__init__.py to obtain model information from test/integration/all_models_info.json and iteratively create attributes for the model directory, model script name and model tar file (see the sketch below).
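
A rough sketch of that refactor, assuming a simple JSON schema; the key names, the schema itself and the make_tarfile signature are assumptions for illustration.

import json
import os

from test.utils import file_utils

RESOURCES_PATH = os.path.join(os.path.dirname(__file__), "..", "resources")

# Hypothetical schema: one entry per model with its directory, script and weights file.
with open(os.path.join(os.path.dirname(__file__), "all_models_info.json")) as f:
    all_models_info = json.load(f)

for name, info in all_models_info.items():
    model_dir = os.path.join(RESOURCES_PATH, info["model_dir"])
    script = os.path.join(model_dir, "code", info["script_name"])
    # Assumed make_tarfile(script, model, output_path) signature.
    tar_file = file_utils.make_tarfile(script, os.path.join(model_dir, info["model_file"]), model_dir)
    # Expose module-level attributes, e.g. mnist_model_dir, mnist_script, mnist_tar, ...
    globals()[name + "_model_dir"] = model_dir
    globals()[name + "_script"] = script
    globals()[name + "_tar"] = tar_file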

Added the following local integration tests:

  • Local integration test for multi-model endpoint:
    • test/integration/sagemaker/test_multi_model_endpoint_local.py:
      • Added test to send a successful ping.
      • Added test to check that list of models is empty.
      • Added test to check that the ResNet-18 and traced ResNet-18 models are successfully loaded.
      • Added test to check that the ResNet-18 model is successfully unloaded.
      • Added test to assert failure on trying to unload a model which has already been unloaded i.e. ResNet-18.
      • Added test to assert failure on trying to load a model which has already been loaded i.e. traced ResNet-18.
      • Added test to invoke both the ResNet-18 and traced ResNet-18 models and verify output length.
  • Local integration test to check that all GPU IDs are returned on multi-GPU host:
    • test/integration/__init__.py:
      • Define directory path for model_gpu_context.
    • test/resources/model_gpu_context/code/inference.py:
      • Created custom model_fn, input_fn, predict_fn and output_fn functions that dynamically get the GPU IDs from the context object, along with their corresponding PIDs and thread IDs, and write them to different CSV files; the functions return dummy values (see the sketch after this list).
    • test/integration/local/test_model_fn_context.py:
      • Added test to check all GPU IDs are returned by the context object in the model_fn, input_fn, predict_fn and output_fn functions by reading from the CSV files created in test/resources/model_gpu_context/code/inference.py.
      • Added test to check that for a given device ID, PID and thread ID are same for the model_fn, input_fn, predict_fn, output_fn functions.
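
As referenced above, a condensed sketch of such an inference.py. The device lookup mirrors the snippet quoted later in this review; the _record helper, CSV file names and dummy return values are illustrative.

import csv
import os
import threading

import torch

def _record(context, file_name):
    # Look up this worker's GPU ID from the TorchServe context, as in the PR.
    device = torch.device("cuda:" + str(context.system_properties.get("gpu_id")))
    device_id = str(device)[-1]
    with open(file_name, "a") as f:
        # One row per call: device ID, process ID, thread ID.
        csv.writer(f).writerow([device_id, os.getpid(), threading.get_ident()])

def model_fn(model_dir, context):
    _record(context, "model_fn_device_info.csv")
    return "dummy-model"  # dummy value, per the PR description

def input_fn(request_body, content_type, context):
    _record(context, "input_fn_device_info.csv")
    return "dummy-input"

def predict_fn(data, model, context):
    _record(context, "predict_fn_device_info.csv")
    return "dummy-prediction"

def output_fn(prediction, accept, context):
    _record(context, "output_fn_device_info.csv")
    return "dummy-output"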

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@@ -0,0 +1,39 @@
# Copyright 2019-2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
Contributor Author:

Added test for is_env_set() in test_ts_environment.py.

@chen3933 (Contributor):

I notice we don't have local integration tests. Are there any use cases we want to cover in local integration tests?

@chen3933 (Contributor):

I didn't see a user script that overwrites input_fn or output_fn. Do we want to add those tests?

@sachanub (Contributor Author):

> I notice we don't have local integration tests. Are there any use cases we want to cover in local integration tests?

Added the following local integration tests:

  • MME tests.
  • Test to check all GPU device IDs are dynamically returned by the context object in model_fn.

@sachanub (Contributor Author):

> I didn't see a user script that overwrites input_fn or output_fn. Do we want to add those tests?

We define custom input_fn and output_fn for this test: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/test/integration/local/test_serving.py#L87. This script defines custom input_fn and output_fn functions: https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/master/test/resources/mnist/model_cpu/code/call_model_fn_once.py

device = torch.device("cuda:" + str(context.system_properties.get("gpu_id")))
device_str = str(device)[-1]
with open(file_path, "a") as file:
    file.write(device_str + "\n")  # append this worker's GPU device ID
Contributor:

We can add both the PID/thread ID and the device_id. With that identifier, we can verify that the device_id is consistent when calling input_fn etc.

Contributor Author:

Acknowledged. Fixed in latest commit.

input_fn_device_info, output_fn_device_info, predict_fn_device_info
):

device_id_input_fn, pid_input_fn, threadid_input_fn = input_fn_row
Contributor:

We cannot guarantee that the row order of input_fn_device_info.csv, output_fn_device_info.csv and predict_fn_device_info.csv is the same, right?

E.g.:
input_fn_device_info.csv

pid=0 device_id=0
pid=1 device_id=1

output_fn_device_info.csv

pid=1 device_id=1
pid=0 device_id=0

In the above, the behavior is correct but the test would fail.
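
One way to address this is to compare the CSV rows as sets rather than line by line; a hypothetical helper, not necessarily the fix adopted in this PR:

import csv

def read_rows(file_name):
    # Collect (device_id, pid, thread_id) rows without caring about order.
    with open(file_name) as f:
        return {tuple(row) for row in csv.reader(f)}

assert read_rows("input_fn_device_info.csv") == read_rows("output_fn_device_info.csv")
assert read_rows("input_fn_device_info.csv") == read_rows("predict_fn_device_info.csv")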

Contributor:

Also, we want to check device_id consistency across all four functions, namely model_fn, input_fn, output_fn and predict_fn.

Contributor Author:

Acknowledged. Fixed in latest commit.

@sachanub force-pushed the add_new_tests branch 3 times, most recently from dfc96bc to 79e49d2 on October 12, 2023 at 18:44.
chen3933 previously approved these changes Oct 12, 2023.

traced_resnet18_model_dir = os.path.join(mme_path, traced_resnet18_sub_dir)
traced_resnet18_script = os.path.join(traced_resnet18_model_dir, code_sub_dir, "inference.py")
traced_resnet18_tar = file_utils.make_tarfile(
Contributor:

Nit: it seems that we're repeating the process of making the tarfile for each model. Can we loop through this process to make the file cleaner?

Contributor Author:

Acknowledged. Updated in latest commit.

def container(image_uri, use_gpu):
    try:
        gpu_option = "--gpus device=0" if use_gpu else ""
        resnet18_path = os.path.join(mme_path, 'resnet18')
Contributor:

The fixture is using resnet18_path as the default; it would be good to make the model path name generic or to parametrize it.
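
For illustration, indirect parametrization is one way to make the fixture's model path generic; MME_PATH and the test below are assumptions based on the excerpt above, not the PR's actual code:

import os

import pytest

MME_PATH = "test/resources/mme"  # assumed to mirror mme_path above

@pytest.fixture
def container(request):
    # Tests pick the model sub-directory via indirect parametrization;
    # fall back to 'resnet18' when no parameter is supplied.
    model_sub_dir = getattr(request, "param", "resnet18")
    yield os.path.join(MME_PATH, model_sub_dir)

@pytest.mark.parametrize("container", ["traced_resnet18"], indirect=True)
def test_traced_model_path(container):
    assert container.endswith("traced_resnet18")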

Contributor Author:

Acknowledged. Updated in latest commit.

try:
    import transformers
except ImportError:
    raise ImportError("The 'transformers' module was not found.")
Contributor:

Curious why this explicit try-catch is necessary here.

Contributor Author:

Acknowledged. Removed the try-catch block. If transformers is not present, the error will show up anyway. Added a comment above the import statement to explain the reason for importing transformers, i.e. to check that the dependencies in requirements.txt get installed.
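
The resulting code is presumably something like the following (exact comment wording assumed):

# Import transformers to verify that the dependencies listed in
# requirements.txt were installed in the container.
import transformers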

@sachanub sachanub merged commit bae1816 into aws:master Oct 16, 2023
@sachanub sachanub deleted the add_new_tests branch October 20, 2023 18:08