
ValueError: instance_type should not be a pipeline variable in SKLearnProcessor #3201


Closed
zerualem opened this issue Jun 28, 2022 · 3 comments
Labels
component: pipelines Relates to the SageMaker Pipeline Platform type: bug

Comments

@zerualem

zerualem commented Jun 28, 2022

Describe the bug
The SKLearnProcessor class in sagemaker.sklearn.processing raises a ValueError when a sagemaker.workflow.parameters.ParameterString is passed as instance_type.
I have been running the exact same script and never had this issue before.

To reproduce

from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat
)
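import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor

# Assumption: running inside SageMaker (e.g. a Studio notebook), where the execution role can be looked up.
role = sagemaker.get_execution_role()

processing_instance_count = ParameterInteger(name="ProcessingInstanceCount", default_value=1)
processing_instance_type = ParameterString(name="ProcessingInstanceType", default_value="ml.t3.large")
framework_version = "0.23-1"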

sklearn_processor = SKLearnProcessor(
    framework_version=framework_version,
    instance_type=processing_instance_type,
    instance_count=processing_instance_count,
    base_job_name="sk_preprocess",
    role=role,
)

Screenshots or logs

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-9898fc9aebc0> in <module>
      8     instance_count=processing_instance_count,
      9     base_job_name="sk_preprocess",
---> 10     role=role,
     11 )

/opt/conda/lib/python3.7/site-packages/sagemaker/sklearn/processing.py in __init__(self, framework_version, role, instance_type, instance_count, command, volume_size_in_gb, volume_kms_key, output_kms_key, max_runtime_in_seconds, base_job_name, sagemaker_session, env, tags, network_config)
     89 
     90         image_uri = image_uris.retrieve(
---> 91             defaults.SKLEARN_NAME, region, version=framework_version, instance_type=instance_type
     92         )
     93 

/opt/conda/lib/python3.7/site-packages/sagemaker/image_uris.py in retrieve(framework, region, version, py_version, instance_type, accelerator_type, image_scope, container_version, distribution, base_framework_version, training_compiler_config, model_id, model_version, tolerate_vulnerable_model, tolerate_deprecated_model, sdk_version, inference_tool, serverless_inference_config)
    115     for name, val in args.items():
    116         if is_pipeline_variable(val):
--> 117             raise ValueError("%s should not be a pipeline variable (%s)" % (name, type(val)))
    118 
    119     if is_jumpstart_model_input(model_id, model_version):

ValueError: instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>)
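
The check that raises here can be illustrated with a minimal, self-contained sketch. It mirrors only the loop visible in the traceback above; `FakeParameterString` and `check_retrieve_args` are hypothetical stand-ins introduced for illustration, and the body of `is_pipeline_variable` is an assumption rather than the SDK's actual implementation.

```python
class FakeParameterString:
    """Hypothetical stand-in for sagemaker.workflow.parameters.ParameterString."""

    def __init__(self, name, default_value=None):
        self.name = name
        self.default_value = default_value


def is_pipeline_variable(value):
    # Assumption: the real helper checks against the SDK's own pipeline variable types.
    return isinstance(value, FakeParameterString)


def check_retrieve_args(**args):
    """Reject pipeline variables, following the loop shown in the traceback."""
    for name, val in args.items():
        if is_pipeline_variable(val):
            raise ValueError("%s should not be a pipeline variable (%s)" % (name, type(val)))


param = FakeParameterString(name="ProcessingInstanceType", default_value="ml.t3.large")

# Plain Python strings pass the check without error.
check_retrieve_args(version="0.23-1", instance_type="ml.t3.large")

# A pipeline parameter triggers the same kind of error as in the report.
try:
    check_retrieve_args(version="0.23-1", instance_type=param)
except ValueError as err:
    message = str(err)
    print(message)
```

Under this sketch, any argument that is a pipeline parameter is rejected before the image lookup starts, which matches the 2.94.0 behavior reported here.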

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.94.0
  • I am working in a SageMaker Studio notebook
@bobbywlindsey
Contributor

@zerualem I had this problem too, but a new release that came out a few minutes ago (2.97.0) fixed it.

@navaj0 navaj0 added the component: pipelines Relates to the SageMaker Pipeline Platform label Jun 29, 2022
@zerualem
Author

@bobbywlindsey thanks for the suggestion. After upgrading to SageMaker 2.97, I now get a warning instead of an error.

WARNING:root:instance_type should not be a pipeline variable (<class 'sagemaker.workflow.parameters.ParameterString'>). The default_value of this Parameter object will be used to override it. Please remove this pipeline variable and use python primitives instead.

@qidewenwhen
Member

Hi @zerualem and @bobbywlindsey, sorry for the confusing warning message. I've opened a PR (see above) to improve the warning message.

FYI: the warning you're seeing is emitted when the image_uri is retrieved based on instance_type.

  • If no image_uri is passed to the SKLearnProcessor, the default value of instance_type (a plain string) is used to retrieve the image_uri for the processor/estimator. This retrieval step cannot be parameterized; the only way to parameterize the image is for the user to pass in the image_uri directly as a ParameterString.
  • On the other hand, the instance_type of the processor/estimator itself can still be parameterized, e.g. by passing a ParameterString. This behavior still works.
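
The two points above can be sketched roughly as follows. This is a simplified illustration, not the SDK's code: `FakeParameterString`, `value_for_image_lookup`, and the `job_config` dictionary are hypothetical names used only to show that the default value feeds the image lookup while the parameter itself stays attached to the job.

```python
import logging

logger = logging.getLogger("image_lookup_sketch")


class FakeParameterString:
    """Hypothetical stand-in for sagemaker.workflow.parameters.ParameterString."""

    def __init__(self, name, default_value=None):
        self.name = name
        self.default_value = default_value


def value_for_image_lookup(name, value):
    """Return a plain value for the image lookup, warning if a parameter was given."""
    if isinstance(value, FakeParameterString):
        logger.warning(
            "%s is a pipeline variable; using its default_value (%r) for the image lookup only",
            name,
            value.default_value,
        )
        return value.default_value
    return value


instance_type = FakeParameterString(name="ProcessingInstanceType", default_value="ml.t3.large")

# The image lookup sees only the plain default string ...
lookup_instance_type = value_for_image_lookup("instance_type", instance_type)

# ... while the job configuration keeps the parameter, so it can still be
# overridden when the pipeline is executed.
job_config = {"InstanceType": instance_type}

print(lookup_instance_type)
```

In this sketch the warning is informational: the instance type of the job stays parameterized, and only the image selection is pinned to the default value.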
