Exception with ParameterString in PySparkProcessor.run() Method #3425

Open
dipanjank opened this issue Oct 19, 2022 · 9 comments
Assignees
Labels
component: processing Relates to the SageMaker Processing Platform PySpark type: bug

Comments

@dipanjank (Contributor)

Describe the bug
If I use a ParameterString (or any other PipelineVariable object) in the list passed to the arguments parameter of the PySparkProcessor.run() method, I get a TypeError: Object of type ParameterString is not JSON serializable.

According to the documentation, arguments can be a list of PipelineVariables, so I expected this to work. Is this not supported?

To reproduce

    from sagemaker.spark.processing import PySparkProcessor
    from sagemaker.workflow.parameters import ParameterString

    spark_processor = PySparkProcessor(
        base_job_name="sagemaker-spark",
        framework_version="3.1",
        role=role,
        instance_count=2,
        instance_type="ml.m5.xlarge",
        sagemaker_session=sagemaker_session,
        max_runtime_in_seconds=1200,
    )

    spark_processor.run(
        submit_app="spark_processing/preprocess.py",
        arguments=[
            "--s3_input_bucket",
            ParameterString(name="s3-input-bucket", default_value=bucket),
            "--s3_input_key_prefix",
            input_prefix_abalone,
            "--s3_output_bucket",
            bucket,
            "--s3_output_key_prefix",
            input_preprocessed_prefix_abalone,
        ],
    )
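The crash can be reproduced without SageMaker at all: json.dumps raises the same TypeError for any object it has no encoder for. A minimal sketch (the ParameterString class below is a hypothetical stand-in, not the SDK's class):

```python
import json

class ParameterString:
    """Hypothetical stand-in for sagemaker.workflow.parameters.ParameterString."""
    def __init__(self, name, default_value=None):
        self.name = name
        self.default_value = default_value

# A request dict shaped loosely like the one the SDK builds internally.
request = {
    "AppSpecification": {
        "ContainerArguments": [
            "--s3_input_bucket",
            ParameterString(name="s3-input-bucket", default_value="my-bucket"),
        ]
    }
}

try:
    json.dumps(request, indent=4)
except TypeError as exc:
    print(exc)  # Object of type ParameterString is not JSON serializable
```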

Expected behavior

Expect a SageMaker ProcessingJob to be created.

Screenshots or logs

Traceback (most recent call last):
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 63, in <module>
    run_sagemaker_spark_job(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/run_pyspark_processor.py", line 37, in run_sagemaker_spark_job
    spark_processor.run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 902, in run
    return super().run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/spark/processing.py", line 265, in run
    return super().run(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/workflow/pipeline_context.py", line 248, in wrapper
    return run_func(*args, **kwargs)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 572, in run
    self.latest_job = ProcessingJob.start_new(
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/processing.py", line 796, in start_new
    processor.sagemaker_session.process(**process_args)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 956, in process
    self._intercept_create_request(process_request, submit, self.process.__name__)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 4317, in _intercept_create_request
    return create(request)
  File "/Users/[email protected]/PycharmProjects/sagemaker-sdk-test/venv/lib/python3.9/site-packages/sagemaker/session.py", line 953, in submit
    LOGGER.debug("process request: %s", json.dumps(request, indent=4))
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/__init__.py", line 234, in dumps
    return cls(
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 201, in encode
    chunks = list(chunks)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/Users/[email protected]/opt/anaconda3/lib/python3.9/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type ParameterString is not JSON serializable
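The last frames are telling: the exception is raised inside a debug-logging call (json.dumps(request, indent=4)) in session.py, before the request ever reaches the service. As a sketch of one generic mitigation (not the SDK's actual fix), json.dumps accepts a default= hook that is invoked for every object the encoder cannot handle:

```python
import json

def safe_default(obj):
    # Fall back to repr() for anything the JSON encoder cannot serialize
    # natively, e.g. placeholder objects such as pipeline variables.
    return repr(obj)

request = {"ContainerArguments": ["--s3_input_bucket", object()]}

# Serializes cleanly instead of raising TypeError.
print(json.dumps(request, default=safe_default, indent=4))
```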

System information

  • SageMaker Python SDK version: 2.112.2
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PySpark
  • Framework version: 3.1
  • Python version: default
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N


@martinRenou martinRenou added PySpark component: processing Relates to the SageMaker Processing Platform labels Oct 6, 2023
@martinRenou martinRenou self-assigned this Nov 17, 2023
@OwenAshton

Any update on this issue? I'm getting the same problem when using any ScriptProcessor.

The only workaround is to go back to a loaded ProcessingStep(), which has now been marked as deprecated.

@DavidRooney

Hi @martinRenou, this is causing some pretty big issues for us at the moment. Do you have any helpful updates on this, please?

@martinRenou (Collaborator)

I'm not working with the SageMaker team at the moment; you may have better luck pinging people who work on this code base these days.

@DavidRooney

Thanks for getting back. I tagged you because it says you are assigned to it; can you assign it to someone on the team? There are 425 contributors, so any help knowing who to link to this would be greatly appreciated. The best I can think of is to ping people who have made recent commits 🤷

@martinRenou (Collaborator)

Friendly ping @knikure

@DavidRooney

DavidRooney commented Feb 22, 2024

Any response at all? We would really like to continue using SageMaker, but working around this issue is taking its toll. @knikure

@DavidRooney

@martinRenou Is there anyone else to friendly-ping on this? knikure was unassigned 👎

@rsareddy0329 (Contributor)

Closing this, as the exception is fixed for PipelineVariables here: #5122

@rsareddy0329 (Contributor)

Reopening this issue, as that change has now been reverted in #5134.
