
Passing ParameterString for Processor arguments fails in Pipeline definition #3323


Closed
amitpeshwani opened this issue Aug 22, 2022 · 8 comments
Labels
component: pipelines · type: bug

Comments

@amitpeshwani

Describe the bug
Passing a ParameterString parameter in the arguments for Processor throws an error when defining it as part of a pipeline.

The call fails with TypeError: Object of type ParameterString is not JSON serializable.

To reproduce

import os

from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.parameters import ParameterString

# role, processing_instance_type, and sagemaker_session are defined elsewhere.

train_or_infer = ParameterString(name="TrainOrInfer", default_value="infer")

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=sagemaker_session,
)

processor_step_args = script_processor.run(
    inputs=[
        ProcessingInput(source="s3://path", destination="/opt/ml/processing/payload"),
    ],
    outputs=[
        ProcessingOutput(output_name="payload_train", source="/opt/ml/processing/processed_payload_train"),
        ProcessingOutput(output_name="payload_infer", source="/opt/ml/processing/processed_payload_infer"),
    ],
    code=os.path.join("localpath", "processing.py"),
    arguments=[
        '--train_or_infer', train_or_infer,
    ],
)

Expected behavior
The pipeline definition is printed out without error.


System information

  • SageMaker Python SDK version: 2.105.0
  • Framework name (e.g. PyTorch) or algorithm (e.g. KMeans): SKLearnProcessor
  • Framework version: 0.20.0
  • Python version: 3.8
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Additional context

  • I'm passing LocalPipelineSession as sagemaker_session (from sagemaker.workflow.pipeline_context import LocalPipelineSession).
  • Removing the pipeline parameter from arguments creates the pipeline definition successfully.
@george-chenvibes

george-chenvibes commented Aug 23, 2022

Seems similar to my problem. I would also add the label "component: pipelines". Have you tried just passing in a string rather than a ParameterString?
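For example (an illustrative sketch of that workaround; it fixes the serialization error but gives up run-time parameterization):

    # Workaround sketch: pass plain strings instead of a pipeline parameter.
    processor_step_args = script_processor.run(
        code="processing.py",  # hypothetical path
        arguments=["--train_or_infer", "infer"],  # plain string, not ParameterString
    )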

qidewenwhen added the component: pipelines label Aug 24, 2022
@qidewenwhen
Member

qidewenwhen commented Sep 2, 2022

Hi @amitpeshwani , thanks for using SageMaker!

I've tried your code snippet with both LocalPipelineSession and PipelineSession under v2.105.0 but I'm not able to reproduce the issue. Both cases passed on my side and the parameterized arguments worked.

To help with further investigation, could you please provide the information below:

  1. The code snippet showing how you define the session. On my side, I defined the session as:

     local_session = LocalPipelineSession()
     pipeline_session = PipelineSession()

     script_processor = SKLearnProcessor(
         framework_version='0.20.0',
         instance_count=1,
         instance_type=processing_instance_type,
         role=role,
         sagemaker_session=local_session,  # or pipeline_session
     )

  2. Could you provide us with the entire error log trace?

@qidewenwhen
Member

I tried one more time to test with the normal session:

from sagemaker import Session

sagemaker_session = Session()

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=sagemaker_session,  # <<<<<<<<<<<<
)

This time I can reproduce the error you've seen; see below.

.tox/py39/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py:248: in wrapper
    return run_func(*args, **kwargs)
.tox/py39/lib/python3.9/site-packages/sagemaker/processing.py:569: in run
    self.latest_job = ProcessingJob.start_new(
.tox/py39/lib/python3.9/site-packages/sagemaker/processing.py:793: in start_new
    processor.sagemaker_session.process(**process_args)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:943: in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:4304: in _intercept_create_request
    return create(request)
.tox/py39/lib/python3.9/site-packages/sagemaker/session.py:940: in submit
    LOGGER.debug("process request: %s", json.dumps(request, indent=4))
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py:234: in dumps
    return cls(
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:201: in encode
    chunks = list(chunks)
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:431: in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
    yield from chunks
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
    yield from chunks
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:405: in _iterencode_dict
    yield from chunks
/usr/local/Cellar/[email protected]/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/encoder.py:438: in _iterencode
    o = _default(o)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <json.encoder.JSONEncoder object at 0x129d27910>, o = ParameterString(name='MyPI', parameter_type=<ParameterTypeEnum.STRING: 'String'>, default_value=None)

    def default(self, o):
...
>       raise TypeError(f'Object of type {o.__class__.__name__} '
                        f'is not JSON serializable')
E       TypeError: Object of type ParameterString is not JSON serializable

@qidewenwhen
Member

Note: only PipelineSession or LocalPipelineSession supports generating the step arguments (processor_step_args = script_processor.run(...)), as these sessions prevent .run from creating a processing job at SDK compile time and instead return the request body as step arguments for the pipeline.

Please replace the sagemaker_session below with a local pipeline session and let us know if it works for you:

local_session = LocalPipelineSession()

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=local_session,  # <<<<<<<<<<<<
)
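As an illustrative aside (a sketch, not code from this thread): because a pipeline session intercepts the call at compile time, .run(...) returns step arguments instead of launching a job:

    # With a pipeline session, .run() does not start a processing job; it
    # returns the request body as step arguments for the pipeline.
    step_args = script_processor.run(
        code="processing.py",  # hypothetical local script path
        arguments=["--train_or_infer", train_or_infer],
    )
    # step_args can then be handed to a ProcessingStep:
    # ProcessingStep(name="MyProcessingStep", step_args=step_args)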

@amitpeshwani
Author

Hey @qidewenwhen,

Thanks for looking into the issue. As specified in the Additional context section, I'm passing LocalPipelineSession as sagemaker_session (from sagemaker.workflow.pipeline_context import LocalPipelineSession).

The issue occurs because a pipeline ParameterString parameter (train_or_infer) is passed in the arguments of script_processor.run().

@amitpeshwani
Author

amitpeshwani commented Sep 2, 2022

Following is the code snippet:

import os

import boto3
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline_context import LocalPipelineSession
from sagemaker.workflow.steps import ProcessingStep

# role and processing_instance_type are defined elsewhere.

boto_session = boto3.Session(region_name="eu-west-1", profile_name='dev')

local_pipeline_session = LocalPipelineSession(
    boto_session=boto_session
)

train_or_infer = ParameterString(name="TrainOrInfer", default_value='infer')

script_processor = SKLearnProcessor(
    framework_version='0.20.0',
    instance_count=1,
    instance_type=processing_instance_type,
    role=role,
    sagemaker_session=local_pipeline_session,
)

processor_step_args = script_processor.run(
    inputs=[
        ProcessingInput(source="s3://path", destination="/opt/ml/processing/payload"),
    ],
    outputs=[
        ProcessingOutput(output_name="payload_train", source="/opt/ml/processing/processed_payload_train"),
        ProcessingOutput(output_name="payload_infer", source="/opt/ml/processing/processed_payload_infer"),
    ],
    code=os.path.join("/localpath", "processing.py"),
    arguments=[
        '--train_or_infer', train_or_infer,  # <<<<<<<<<<< Pipeline Parameter
    ],
)

step_process = ProcessingStep(
    name="PayloadGenerator",
    step_args=processor_step_args,
)

Error Message:

Pipeline step 'PayloadGenerator' FAILED. Failure message is: TypeError: Object of type ParameterString is not JSON serializable
Pipeline execution dd309799-85bb-4ea3-b7f2-735e5c6eb0be FAILED because step 'PayloadGenerator' failed.
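(For completeness: the snippet above stops at the ProcessingStep, so the pipeline was presumably assembled and started roughly as below; this is a sketch under that assumption, and the pipeline name is hypothetical.)

    from sagemaker.workflow.pipeline import Pipeline

    pipeline = Pipeline(
        name="PayloadPipeline",                    # hypothetical name
        parameters=[train_or_infer],
        steps=[step_process],
        sagemaker_session=local_pipeline_session,  # local mode session
    )

    pipeline.upsert(role_arn=role)
    execution = pipeline.start()  # the local-mode execution fails here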

@qidewenwhen
Member

qidewenwhen commented Sep 2, 2022

Thanks for the details! It seems the error was raised during Pipeline local mode execution rather than at compile time. I can reproduce the issue when starting an execution.

The issue may relate to these code lines:

        elif isinstance(v, list):
            list_copy = []
            for item in v:
                list_copy.append(self._parse_arguments(item, step_name))
            obj_copy[k] = list_copy
        elif isinstance(v, PipelineVariable):
            obj_copy[k] = self.evaluate_pipeline_variable(v, step_name)
    return obj_copy
return obj

  • As v (the ContainerArguments entry) is a list, the code recursively invokes _parse_arguments on each item, passing the parameterized train_or_infer into the function.
  • However, as train_or_infer is not a dict, it is returned directly without being evaluated (see the sketch below).

Engaging the feature owner to look into it.
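A minimal, self-contained sketch of the recursion gap described above (illustrative names, simplified from the SDK's actual code):

    class PipelineVariable:
        """Stand-in for the SDK's PipelineVariable; illustrative only."""
        def __init__(self, name, default_value):
            self.name = name
            self.default_value = default_value

    def evaluate_pipeline_variable(v):
        # In the real SDK this resolves the variable against execution inputs.
        return v.default_value

    def parse_arguments(obj):
        if isinstance(obj, dict):
            obj_copy = dict(obj)
            for k, v in obj.items():
                if isinstance(v, dict):
                    obj_copy[k] = parse_arguments(v)
                elif isinstance(v, list):
                    # Each list item is re-fed to parse_arguments...
                    obj_copy[k] = [parse_arguments(item) for item in v]
                elif isinstance(v, PipelineVariable):
                    obj_copy[k] = evaluate_pipeline_variable(v)
            return obj_copy
        # ...but a PipelineVariable list item is neither a dict nor handled
        # above, so it comes back unchanged and later breaks json.dumps().
        return obj

    request = {"ContainerArguments": ["--train_or_infer",
                                      PipelineVariable("TrainOrInfer", "infer")]}
    print(parse_arguments(request))  # the PipelineVariable survives unevaluated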

@qidewenwhen
Member

The fix was merged in v2.109.0 a month ago.
Closing this issue. Feel free to reopen if you have any questions.
