Server reruns same task multiple times #133

Open
kurtgdl opened this issue Dec 4, 2024 · 0 comments
kurtgdl commented Dec 4, 2024

I used

from sagemaker.huggingface import HuggingFaceModel

deploy = HuggingFaceModel(
  name=model_name,
  role=role,
  code_location="abc",
  model_data=path_to_s3,
  transformers_version="4.37",
  pytorch_version="2.1",
  py_version="py310",
  model_server_workers=1,
)
emb = deploy.deploy(
  endpoint_name=model_name,
  initial_instance_count=1,
  instance_type="ml.c5.4xlarge",
  container_startup_health_check_timeout=300,
)

The custom inference script was

def model_fn(model_dir):
    processor = DataProcess() # A class that contains logic for processing each file.
    return processor

def predict_fn(data, model):
    text = model.process_file(data)
    return {"output": text}
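For context, here is a minimal sketch of how the base64 payload could be decoded before it reaches predict_fn. This input_fn is my assumption, not part of the original script, and the "inputs" field name is hypothetical:

```python
import base64
import json

def input_fn(request_body, content_type="application/json"):
    # Parse the JSON request body and decode the base64 string
    # back into the raw file bytes that predict_fn will process.
    payload = json.loads(request_body)
    return base64.b64decode(payload["inputs"])
```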

The input data is a base64 string of a file content.
It's strange that when the file is pretty small, under 1 MB, the server runs model_fn and predict_fn once, and the process takes around 30 seconds. But when I input a large file of around 1.5 MB, the server runs model_fn and predict_fn multiple times, each run taking around 2 minutes. I know this because the same request produces multiple copies of

 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Preprocess time - 5.128383636474609 ms
 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Predict time - 162199.17178153992 ms
 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - Postprocess time - 0.00762939453125 ms
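For reference, the request payload was built roughly like this. This is a hypothetical sketch of the client side; build_payload and the "inputs" field name are my assumptions, not from the original report:

```python
import base64
import json

def build_payload(file_bytes: bytes) -> str:
    # Encode the raw file bytes as a base64 string and wrap
    # them in a JSON body for the endpoint.
    encoded = base64.b64encode(file_bytes).decode("utf-8")
    return json.dumps({"inputs": encoded})

# The endpoint would then be invoked with something like (requires AWS
# credentials and a deployed endpoint, so it is commented out here):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(
#     EndpointName=model_name,
#     ContentType="application/json",
#     Body=build_payload(open("input_file", "rb").read()),
# )
```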

It's probably unorthodox to use the server for a data-processing job, but which configs did I miss?

Related: aws/amazon-sagemaker-examples#1073
