AutoMLStep Does Not Support Constant-Valued problem_type #3908

Closed
mbbourgo opened this issue Jun 6, 2023 · 8 comments · Fixed by #4303
Assignees
Labels
component: auto-ml Relates to SageMaker AutoML type: bug

Comments

@mbbourgo

mbbourgo commented Jun 6, 2023

Describe the bug
When using AutoMLStep, specifying the problem type results in an error upon pipeline upsert.

To reproduce
Run the attached file, automlstep-test.py.txt, as a Python script.

Expected behavior
The named SageMaker Pipeline is updated.

Screenshots or logs
If applicable, add screenshots or logs to help explain your problem.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.159.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): SageMaker Pipeline
  • Framework version:
  • Python version: SageMaker Studio Python 3 (Data Science) kernel
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context
Add any other context about the problem here.
automlstep-test.py.txt

@mufaddal-rohawala
Member

@mbbourgo can you please provide additional logs or the error traceback for further debugging? What exact error is received in this case?

@mufaddal-rohawala mufaddal-rohawala added the component: pipelines Relates to the SageMaker Pipeline Platform label Jun 6, 2023
@mbbourgo
Author

mbbourgo commented Jun 7, 2023

@mufaddal-rohawala
The base error message is
ClientError: An error occurred (ValidationException) when calling the CreatePipeline operation: Unable to parse pipeline definition. Expecting start of Json Object.
The complete error message is
`---------------------------------------------------------------------------
ClientError Traceback (most recent call last)
in
41 )
42
---> 43 pipeline.upsert(role_arn=execution_role)

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in upsert(self, role_arn, description, tags, parallelism_config)
280 error_message = ce.response["Error"]["Message"]
281 if not (error_code == "ValidationException" and "already exists" in error_message):
--> 282 raise ce
283 # already exists
284 response = self.update(role_arn, description)

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in upsert(self, role_arn, description, tags, parallelism_config)
275 raise ValueError("An AWS IAM role is required to create or update a Pipeline.")
276 try:
--> 277 response = self.create(role_arn, description, tags, parallelism_config)
278 except ClientError as ce:
279 error_code = ce.response["Error"]["Code"]

/opt/conda/lib/python3.7/site-packages/sagemaker/workflow/pipeline.py in create(self, role_arn, description, tags, parallelism_config)
149 Tags=tags,
150 )
--> 151 return self.sagemaker_session.sagemaker_client.create_pipeline(**kwargs)
152
153 def _create_args(

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
528 )
529 # The "self" in this scope is referring to the BaseClient.
--> 530 return self._make_api_call(operation_name, kwargs)
531
532 _api_call.name = str(py_operation_name)

/opt/conda/lib/python3.7/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
962 error_code = parsed_response.get("Error", {}).get("Code")
963 error_class = self.exceptions.from_code(error_code)
--> 964 raise error_class(parsed_response, operation_name)
965 else:
966 return parsed_response

ClientError: An error occurred (ValidationException) when calling the CreatePipeline operation: Unable to parse pipeline definition. Expecting start of Json Object.`

@nmadan
Member

nmadan commented Jun 12, 2023

Hi, what's the pipeline definition?
pipeline.definition()

@mbbourgo
Author

@nmadan Here it is:
{
  "Version": "2020-12-01",
  "Metadata": {},
  "Parameters": [],
  "PipelineExperimentConfig": {
    "ExperimentName": {"Get": "Execution.PipelineName"},
    "TrialName": {"Get": "Execution.PipelineExecutionId"}
  },
  "Steps": [{
    "Name": "AutoMLTrainingStep",
    "Type": "AutoML",
    "Arguments": {
      "InputDataConfig": [{
        "DataSource": {
          "S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://sagemaker-us-east-1-XXXXXXXXXXXX/loan-data/data.csv"
          }
        },
        "TargetAttributeName": "target",
        "ChannelType": "training"
      }],
      "OutputDataConfig": {
        "S3OutputPath": "s3://sagemaker-us-east-1-XXXXXXXXXXXX/"
      },
      "AutoMLJobConfig": {
        "CompletionCriteria": {"MaxAutoMLJobRuntimeInSeconds": 3600},
        "SecurityConfig": {"EnableInterContainerTrafficEncryption": false},
        "Mode": "ENSEMBLING"
      },
      "RoleArn": "arn:aws:iam::XXXXXXXXXXXX:role/service-role/AmazonSageMaker-ExecutionRole-20201208T160993",
      "AutoMLJobObjective": "F1",
      "ProblemType": "BinaryClassification"
    }
  }]
}

@jerrypeng7773
Contributor

If we look at the pipeline definition, the JSON structure of AutoMLJobObjective is wrong. Instead, it should be:

"AutoMLJobObjective": {
        "MetricName": "F1"
      },

https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AutoMLJobObjective.html
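
A malformed objective like this can be caught locally before calling upsert. The following is a minimal sketch, not SDK code: find_bad_objectives is a hypothetical helper, and it assumes the input is the JSON string returned by pipeline.definition() and that AutoML steps carry "Type": "AutoML", as in the definition posted above.

```python
import json


def find_bad_objectives(definition_json):
    """Return names of AutoML steps whose AutoMLJobObjective is not a dict
    containing "MetricName".

    Hypothetical debugging helper; not part of the SageMaker SDK.
    """
    definition = json.loads(definition_json)
    bad_steps = []
    for step in definition.get("Steps", []):
        if step.get("Type") != "AutoML":
            continue
        objective = step.get("Arguments", {}).get("AutoMLJobObjective")
        if objective is None:
            continue  # no objective set on this step; nothing to check
        if not (isinstance(objective, dict) and "MetricName" in objective):
            bad_steps.append(step["Name"])
    return bad_steps


# Trimmed version of the definition from this thread.
example = json.dumps({
    "Version": "2020-12-01",
    "Steps": [{
        "Name": "AutoMLTrainingStep",
        "Type": "AutoML",
        "Arguments": {
            "AutoMLJobObjective": "F1",
            "ProblemType": "BinaryClassification",
        },
    }],
})
print(find_bad_objectives(example))  # ['AutoMLTrainingStep']
```

An empty list means every AutoML step in the definition already uses the {"MetricName": ...} shape.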

@jerrypeng7773
Contributor

This should be an easy fix here:

auto_ml_job_request["AutoMLJobObjective"] = job_objective

@jerrypeng7773
Contributor

@mbbourgo When you create the AutoML instance, what value did you pass for job_objective? Can you pass

{"MetricName": "F1"}

and retry?
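
For reference, a rough sketch of what that retry could look like. normalize_job_objective is a hypothetical helper, not an SDK function; it only wraps a bare metric name in the dict shape shown above. The commented-out call assumes the job_objective argument of the sagemaker.automl.automl.AutoML constructor and reuses names from the posted definition and traceback.

```python
def normalize_job_objective(job_objective):
    """Wrap a bare metric name such as "F1" as {"MetricName": "F1"}.

    Hypothetical helper; None and dict inputs are passed through unchanged.
    """
    if job_objective is None or isinstance(job_objective, dict):
        return job_objective
    if isinstance(job_objective, str):
        return {"MetricName": job_objective}
    raise TypeError(
        f"Unsupported job_objective type: {type(job_objective).__name__}"
    )


objective = normalize_job_objective("F1")
print(objective)  # {'MetricName': 'F1'}

# Assumed usage with the SDK (not executed here):
# from sagemaker.automl.automl import AutoML
# auto_ml = AutoML(
#     role=execution_role,
#     target_attribute_name="target",
#     problem_type="BinaryClassification",
#     job_objective=objective,
# )
```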

@jerrypeng7773
Contributor

jerrypeng7773 commented Jun 30, 2023

I don't think this is a bug in Pipeline. The above solution might unblock @mbbourgo right away. However, I did find that the shape of job_objective is used inconsistently in the AutoML package.

In session.py, job_objective is treated as a dict here, like {"MetricName": "F1"}, whereas in the attach class method here, MetricName is extracted and its value is assigned to job_objective, so job_objective="F1".

Here is the suggested fix for AutoML team.

  1. Remove the extraction here.
  2. Add a check on the user-given job_objective to make sure it's a dict of shape {"MetricName": "F1"}
  3. Or, alternatively, we can create a python class abstracting this {"MetricName": "F1"}
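
Options 2 and 3 could be sketched roughly as follows. This only illustrates the idea and is not the change that was eventually merged: JobObjective and validate_job_objective are hypothetical names, and the check covers only the MetricName key discussed in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobObjective:
    """Hypothetical wrapper for the AutoMLJobObjective request shape (option 3)."""

    metric_name: str

    def to_request(self):
        # Shape discussed above: {"MetricName": "<metric>"}
        return {"MetricName": self.metric_name}


def validate_job_objective(job_objective):
    """Option 2: reject anything that is not a dict like {"MetricName": "F1"}."""
    if not isinstance(job_objective, dict):
        raise ValueError(
            'job_objective must be a dict like {"MetricName": "F1"}, '
            f"got {type(job_objective).__name__}"
        )
    metric = job_objective.get("MetricName")
    if not isinstance(metric, str) or not metric:
        raise ValueError('job_objective must contain a non-empty "MetricName" string')
    return job_objective


# A wrapped objective passes the check.
validate_job_objective(JobObjective("F1").to_request())

# A bare string, as in the failing pipeline definition, is rejected early.
try:
    validate_job_objective("F1")
except ValueError as err:
    print(err)
```

Failing at construction time like this would surface the problem with a readable message instead of the "Unable to parse pipeline definition" error returned by CreatePipeline.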

@jerrypeng7773 jerrypeng7773 added component: auto-ml Relates to SageMaker AutoML and removed component: pipelines Relates to the SageMaker Pipeline Platform labels Jun 30, 2023
@martinRenou martinRenou self-assigned this Dec 4, 2023
5 participants