FrameworkProcessor is broken with SageMaker Pipelines #2656


Closed

dgallitelli opened this issue Sep 23, 2021 · 24 comments
Labels
component: pipelines · component: processing · type: bug

Comments

@dgallitelli

Describe the bug
Any Processor derived from FrameworkProcessor is broken when used with SageMaker Pipelines. There is a problem with the command and entrypoint parameters: command does not pass python3, so the processing script is executed as a shell script and fails with the following error:

line 2: import: command not found

To reproduce

  1. Create a FrameworkProcessor (e.g. PyTorchProcessor, TensorFlowProcessor)
  2. Create a ProcessingStep and a Pipeline
  3. Execute it
  4. See it fail

Expected behavior
The pipeline should run successfully.

Screenshots or logs

Screenshot from Pipelines: (image omitted)

Logs from CloudWatch:

/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 2: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 3: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 4: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 5: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 6: from: command not found

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.57.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): Every Framework
  • Framework version: Every version supported by SM
  • Python version: 3.8
  • CPU or GPU: CPU and GPU
  • Custom Docker image (Y/N): N

Additional context
N/A

@athewsey
Collaborator

athewsey commented Sep 23, 2021

Thanks for raising this @dgallitelli

As discussed offline & detailed further on the linked PR, the integration between FrameworkProcessor and ProcessingStep is currently broken: The error you see is caused by the framework processor incorrectly trying to treat your processing script (Python) as the framework bootstrap script (shell).

We're actively working on a solution, but here are some things you could try in the interim if you need a workaround:

  • If your job is a single script and you don't need requirements.txt dependencies, you could try using the ScriptProcessor instead and explicitly passing in the PyTorch/TensorFlow container URI.
  • I wonder if adding a shebang line, something like #!/usr/bin/python3 (not sure whether this is the actual installed location of Python in these containers), at the very top of your script file could persuade bash to run the script through the Python interpreter instead (see the sketch below).
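
For illustration, a script with that shebang on the very first line might look something like this (just a sketch; /usr/bin/env python3 is a guess at the interpreter location inside the container, and the script contents are hypothetical):

#!/usr/bin/env python3
# hypothetical processing script submitted to the ProcessingStep
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-dir", default="/opt/ml/processing/input")
    args, _ = parser.parse_known_args()
    print(f"Preprocessing data from {args.input_dir}")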

@dgallitelli
Author

UPDATE 1: Adding a shebang does not currently force ProcessingStep into using python3 in the command.

@dgallitelli
Author

dgallitelli commented Sep 23, 2021

UPDATE 2: ScriptProcessor does work; however, there is no support for the source_dir parameter (as noted above by @athewsey). If you need custom dependencies or a multi-file script, create your own custom container by extending the SageMaker images for TF/PyTorch/HuggingFace/MXNet.

For those who need directions on how to switch from FrameworkProcessor to ScriptProcessor, here is an example for TensorFlow 2.3:

##### COMMENT THE TENSORFLOWPROCESSOR
 
# from sagemaker.tensorflow import TensorFlowProcessor
# tp = TensorFlowProcessor(
#     framework_version='2.3',
#     role = get_execution_role(),
#     instance_count=1,
#     instance_type='ml.m5.large',
#     base_job_name='DSM-TF-Demo-Process',
#     py_version='py37'
# )
 
 
##### AND REPLACE WITH
 
from sagemaker.image_uris import retrieve
from sagemaker.processing import ScriptProcessor
from sagemaker import get_execution_role
 
image_uri = retrieve(
    framework='tensorflow', 
    region='eu-west-1', 
    version='2.3', 
    py_version='py37', 
    image_scope='training',
    instance_type='ml.m5.xlarge'
)
sp = ScriptProcessor(
    role=get_execution_role(),
    image_uri=image_uri,
    command=['python3'],
    instance_count=1,
    instance_type='ml.m5.xlarge'
)
# Now, either run sp.run() or create a sagemaker.workflow.steps.ProcessingStep() , as needed
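
For reference, a minimal sketch of wiring the ScriptProcessor above into a ProcessingStep (the step name and script file name below are placeholders):

from sagemaker.workflow.steps import ProcessingStep

processing_step = ProcessingStep(
    name="MyProcessingStep",      # placeholder step name
    processor=sp,                 # the ScriptProcessor defined above
    code="processing.py",         # a single script; ScriptProcessor has no source_dir support
    inputs=[],                    # add ProcessingInput objects as needed
    outputs=[]                    # add ProcessingOutput objects as needed
)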

A very short example of a Dockerfile to extend the default TF container and install dependencies (not tested yet):

FROM 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.3-cpu-py37
COPY requirements.txt /opt/ml/processing/input/code/requirements.txt
RUN pip install -r /opt/ml/processing/input/code/requirements.txt
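
Once that extended image is built and pushed to ECR, its URI can be used in place of the retrieved one (a sketch; the account, repository name, and tag are placeholders):

custom_image_uri = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-tf-processing:latest"  # placeholder ECR URI

sp_custom = ScriptProcessor(
    role=get_execution_role(),
    image_uri=custom_image_uri,
    command=['python3'],
    instance_count=1,
    instance_type='ml.m5.xlarge'
)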

@khalidbourhaba

Any updates on this?

@darkreapyre

Is there an ETA on this fix?

@AakankshCTR

Is this issue fixed?

@dgallitelli
Author

This is still not fixed as of today (May 10th 2022).

@morfaer

morfaer commented May 16, 2022

Any update on this issue? Facing the same problem.

@dgallitelli
Author

dgallitelli commented May 16, 2022

Still the case for now.

However, there is now a possibility to use the new sagemaker.workflow.pipeline_context.PipelineSession to have .run() generate the step arguments without actually running the Processing job. I tried it in a Jupyter notebook with a custom FrameworkProcessor, but it should work with any FrameworkProcessor. Your code would look like:

from sagemaker import get_execution_role
from sagemaker.sklearn import SKLearn
from sagemaker.processing import FrameworkProcessor  # or change with any other FrameworkProcessor like HuggingFaceProcessor
from sagemaker.workflow.pipeline_context import PipelineSession

session = PipelineSession()

skpv2 = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version='0.23-1',
    role=get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.large',
    sagemaker_session=session
)

step_args = skpv2.run(
    code='processing.py',
    source_dir='code',  # add processing.py and requirements.txt here
    inputs=[...], outputs=[...]
)

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

processing_step = ProcessingStep(
    name="MyProcessingStep",
    step_args=step_args
)

# [ define the other steps if any ]

pipeline = Pipeline(name="MyPipeline", steps=[processing_step, ...])  # placeholder pipeline name

Just make sure to update the SageMaker Python SDK to the latest version :)
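
(For completeness, once the steps are defined the pipeline can be registered and started as sketched below; the pipeline name above and the role here are placeholders:)

# create or update the pipeline definition, then start an execution
pipeline.upsert(role_arn=get_execution_role())
execution = pipeline.start()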

@jerrypeng7773
Contributor

Thanks @dgallitelli

We would encourage users to adopt this new way to construct TrainingStep, ProcessingStep, TransformStep, TuningStep, and ModelStep.

We have a Read the Docs page about to be released that introduces all the improvements we made to the SageMaker Python SDK Pipeline module.

@mohamed-ali
Contributor

mohamed-ali commented May 19, 2022

The FrameworkProcessor has a method called get_run_args (doc here) that is designed to help integrate this processor with the ProcessingStep, which can then be put within a SageMaker pipeline. If you want to add pip dependencies, you can add a requirements.txt file under BASE_DIR.

Here is simplified code that helps connect the dots between FrameworkProcessor, get_run_args, ProcessingStep, and Pipeline.

import os

from sagemaker.processing import (
    ProcessingInput,
    ProcessingOutput,
    FrameworkProcessor
)

from sagemaker.workflow.functions import Join
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

from sagemaker.tensorflow import TensorFlow

# role, data_s3_bucket and data_s3_key are assumed to be defined elsewhere
BASE_DIR = os.path.dirname(os.path.realpath(__file__))

preprocessing_processor = FrameworkProcessor(
    estimator_cls=TensorFlow,
    framework_version='2.4.3',
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    py_version='py37',
    command=["python3"],
    base_job_name="some-preprocessing-step"
)

train_data_in_s3 = ProcessingOutput(
    source="/opt/ml/processing/output/train/",
    destination=Join(
        on="/",
        values=[
            "s3:/",
            data_s3_bucket,
            os.environ["SAGEMAKER_PROJECT_NAME"],
            data_s3_key,
            'train/'
        ],
    ),
    output_name='train',
    s3_upload_mode='Continuous',
)

test_data_in_s3 = ProcessingOutput(
    source="/opt/ml/processing/output/test/",
    destination=Join(
        on="/",
        values=[
            "s3:/",
            data_s3_bucket,
            os.environ["SAGEMAKER_PROJECT_NAME"],
            data_s3_key,
            'test/'
        ],
    ),
    output_name='test',
    s3_upload_mode='Continuous',
)

data_s3_key_in_project = Join(
    on="/",
    values=[
        os.environ["SAGEMAKER_PROJECT_NAME"],
        data_s3_key
    ],
)

preprocessing_run_args = preprocessing_processor.get_run_args(
    code="preprocess.py",
    source_dir=BASE_DIR,
    inputs=[],
    outputs=[train_data_in_s3, test_data_in_s3],
    arguments=[
        '--data-s3-bucket', "your bucket name",
        '--data-s3-key', "your key"
    ]
)

preprocessing_step = ProcessingStep(
    name="your-preprocessing-step-name",
    processor=preprocessing_processor,
    inputs=preprocessing_run_args.inputs,
    outputs=preprocessing_run_args.outputs,
    job_arguments=preprocessing_run_args.arguments,
    code=preprocessing_run_args.code
)

pipeline_name = "your-pipeline-name"

distributed_ml_training_pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        # your pipeline parameters here
    ],
    steps=[preprocessing_step, ...]
)

If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file to be shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/.

@marianokamp
Collaborator

Is there any running example for a ProcessingStep with PyTorch that allows source_dir?

@morfaer

morfaer commented Jun 7, 2022

Thanks @dgallitelli

We would encourage users to adopt this new way to construct TrainingStep, ProcessingStep, TransformStep, TuningStep, and ModelStep.

We have a Read the Docs page about to be released that introduces all the improvements we made to the SageMaker Python SDK Pipeline module.

Any update on when this updated Read the Docs page will be released?

@morfaer

morfaer commented Jun 9, 2022

I tried the ModelStep example from docs here: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step

Assuming that the Model object in this example is sagemaker.model.Model, the register method returns a ModelPackage | None type, but the step_args argument in the ModelStep call expects an object of type _ModelStepArguments. So type-wise this example looks fishy to me.

Also, this example produces the following error for me:

  File "/home/.../run_pipeline.py", line 261, in model_registration_step
    register_model_step_args = model.register(
  File "/home/.../.venv/lib/python3.8/site-packages/sagemaker/workflow/pipeline_context.py", line 209, in wrapper
    return run_func(*args, **kwargs)
  File "/home/.../.venv/lib/python3.8/site-packages/sagemaker/model.py", line 373, in register
    model_package = self.sagemaker_session.create_model_package_from_containers(
AttributeError: 'NoneType' object has no attribute 'create_model_package_from_containers'

So it seems like this example does not work.

I'm using version 2.94.0 of the SageMaker SDK from a local PC (not a SageMaker notebook) to start the Pipeline. Any ideas how this is supposed to work?

@jerrypeng7773
Contributor

jerrypeng7773 commented Jun 9, 2022

I tried the ModelStep example from docs here: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step [...] So it seems like this example does not work. I'm using version 2.94.0 of the SageMaker SDK from a local PC (not a SageMaker notebook) to start the Pipeline. Any ideas how this is supposed to work?

Can you please confirm that sagemaker.workflow.pipeline_context.PipelineSession() is used? More specifically:

from sagemaker.workflow.pipeline_context import PipelineSession
pipeline_session = PipelineSession()
....
model = Model(
    image_uri=pytorch_estimator.training_image_uri(),
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,  # <- this is the important part
    role=role,
)
register_args = model.register(...)
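
For reference, under a PipelineSession the register_args returned above are not executed directly; they are handed to a ModelStep (a minimal sketch; the step name is a placeholder):

from sagemaker.workflow.model_step import ModelStep

step_model_registration = ModelStep(
    name="MyModelRegistration",  # placeholder step name
    step_args=register_args
)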

@mstfldmr

mstfldmr commented Jun 9, 2022

If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file to be shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/.

Should MANIFEST.in be located in source_dir or somewhere else?

@mohamed-ali
Contributor

If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file to be shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/.

Should MANIFEST.in be located in source_dir or somewhere else?

It should be at the same level as the setup.py.
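
For illustration, a minimal MANIFEST.in might look like this (the package path is a placeholder):

# MANIFEST.in, placed at the same level as setup.py
include my_package/pipelines/requirements.txt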

@morfaer

morfaer commented Jun 10, 2022

@jerrypeng7773 The additional pipeline_session parameter worked for me. Thanks. What's weird though is that the ModelStep appends a RegisterModel suffix to my pipeline step in SageMaker Studio.

I guess the reason is the _append_register_model_step method in model_step.py in the workflow module.

The other step types don't seem to add such a suffix, so this looks inconsistent to me. Is there a specific reason to do this for ModelStep?

@qidewenwhen
Member

qidewenwhen commented Jul 15, 2022

Hi @morfaer, both ModelStep and RegisterModel are StepCollections. A StepCollection is not an actual step/graph node that shows up in the pipeline graph; instead, it is a collection of steps/graph nodes.
Both ModelStep and RegisterModel can contain the following list of steps:

  • _RepackModelStep (optional): this step is a TrainingStep under the hood, which does model repacking at execution time if specific conditions are met. Model repacking bundles the model artifacts with custom inference entry points and generates new repacked model artifacts for registration.
  • _RegisterModelStep: the step that performs the model registration during execution.

What's weird though is that the ModelStep appends a RegisterModel suffix to my pipeline step

Compared with RegisterModel, ModelStep explicitly appends a suffix to its sub-step names, so that they look like:

  • "-RepackModel-" for _RepackModelStep
  • "-RegisterModel" for _RegisterModelStep

This gives users a clear hint about what each sub-step is doing.

In addition, we recently pushed PR #3240 to apply ModelStep's naming convention to RegisterModel as well, because RegisterModel's previous naming scheme could cause some issues.

@pjbhaumik

Still the case for now.

However, there is now a possibility to use the new sagemaker.workflow.pipeline_context.PipelineSession to have .run() generate the step arguments without actually running the Processing job. [...] Just make sure to update the SageMaker Python SDK to the latest version :)

This method works... the .run() call returns step arguments that the ProcessingStep accepts in the step_args parameter.

The method proposed by mohamed-ali did not work for me... it creates a list of arguments for the job_arguments parameter.

It may be worth noting here that each method works on a different parameter. When using the step_args parameter, you cannot use the processor argument, because step_args already supplies the processor for the ProcessingStep.

When using the job_arguments parameter, there were no conflicts with the processor parameter, but the job still failed citing the missing 'code', without any reference to the missing 'source_dir'... this suggests it may have solved the source directory issue, but there was still an issue retrieving the .py file from it.

From this experience, it appears that using the step_args method via the PipelineSession is a good idea.

@MetelStairs

Is there any update on this issue? Has it been fixed? I'm currently running into the same problem.

@pjbhaumik

Is there any update on this issue? Has it been fixed? I'm currently running into the same problem.

Yes, using a PipelineSession() will work... follow the example dgallitelli posted on May 16.

@rlamba89

I am using PipelineSession(), but I'm getting the error below:

ValueError: No file named "processing.py" was found in directory "code".

import sagemaker
from sagemaker.sklearn import SKLearn
from sagemaker.processing import FrameworkProcessor  # or change with any other FrameworkProcessor like HuggingFaceProcessor
from sagemaker.workflow.pipeline_context import PipelineSession

# role and processing_dir are defined elsewhere
session = PipelineSession()

skpv2 = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version='0.23-1',
    role = role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    sagemaker_session = session
)

step_args = skpv2.run(
    code='processing.py',
    source_dir="code", # add processing.py and requirements.txt here
    outputs=[
        sagemaker.processing.ProcessingOutput(
            output_name="train", source=f"{processing_dir}/output/train"
        ),
        sagemaker.processing.ProcessingOutput(
            output_name="validation", source=f"{processing_dir}/output/validation"
        ),
        sagemaker.processing.ProcessingOutput(
            output_name="test", source=f"{processing_dir}/output/test"
        ),
    ],
)



from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

step_process = ProcessingStep(name="ModelPreProcess", step_args=step_args)
