FrameworkProcessor is broken with SageMaker Pipelines #2656


Closed

dgallitelli opened this issue Sep 23, 2021 · 24 comments
Labels
component: pipelines · component: processing · type: bug

Comments

@dgallitelli

Describe the bug
Any Processor derived from FrameworkProcessor is broken when used with SageMaker Pipelines. There is a problem with the command and entrypoint parameters: command does not pass python3, so the processing script is executed as a shell script and fails with the following error:

line 2: import: command not found

To reproduce

  1. Create a FrameworkProcessor (e.g. PyTorchProcessor, TensorFlowProcessor)
  2. Create a ProcessingStep and a Pipeline
  3. Execute it
  4. See it fail

Expected behavior
The pipeline should run successfully.

Screenshots or logs

Screenshot from Pipelines: (image omitted)

Logs from CloudWatch:

/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 2: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 3: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 4: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 5: import: command not found
/opt/ml/processing/input/entrypoint/inference_with_processing.py: line 6: from: command not found

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.57.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): Every Framework
  • Framework version: Every version supported by SM
  • Python version: 3.8
  • CPU or GPU: CPU and GPU
  • Custom Docker image (Y/N): N

Additional context
N/A

@athewsey
Collaborator

athewsey commented Sep 23, 2021

Thanks for raising this @dgallitelli

As discussed offline & detailed further on the linked PR, the integration between FrameworkProcessor and ProcessingStep is currently broken: The error you see is caused by the framework processor incorrectly trying to treat your processing script (Python) as the framework bootstrap script (shell).

We're actively working on a solution, but here are some things you could try in the interim if you need a workaround:

  • If your job is a single script and you don't need requirements.txt dependencies, you could try using the ScriptProcessor instead and explicitly passing in the PyTorch/TensorFlow container URI.
  • I wonder if adding a shebang line, something like #!/usr/bin/python3 (not sure whether this is the actual installed location of Python in these containers), at the very top of your script file could persuade bash to run the script through the Python interpreter instead (see the sketch below).
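
For illustration, a script with that shebang on the very first line might look something like this (just a sketch; /usr/bin/env python3 is a guess at the interpreter location inside the container, and the script contents are hypothetical):

#!/usr/bin/env python3
# hypothetical processing script submitted to the ProcessingStep
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--input-dir", default="/opt/ml/processing/input")
    args, _ = parser.parse_known_args()
    print(f"Preprocessing data from {args.input_dir}")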

@dgallitelli
Author

UPDATE 1: Adding a shebang does not currently force ProcessingStep into using python3 in the command.

@dgallitelli
Author

dgallitelli commented Sep 23, 2021

UPDATE 2: ScriptProcessor does work; however, there is no support for the source_dir parameter (as noted above by @athewsey). If you need custom dependencies or a multi-file script, create your own custom container by extending the SageMaker images for TF/PyTorch/HuggingFace/MXNet.

For those who need directions on how to switch from FrameworkProcessor to ScriptProcessor, here is an example for TensorFlow 2.3:

##### COMMENT THE TENSORFLOWPROCESSOR
 
# from sagemaker.tensorflow import TensorFlowProcessor
# tp = TensorFlowProcessor(
#     framework_version='2.3',
#     role = get_execution_role(),
#     instance_count=1,
#     instance_type='ml.m5.large',
#     base_job_name='DSM-TF-Demo-Process',
#     py_version='py37'
# )
 
 
##### AND REPLACE WITH
 
from sagemaker.image_uris import retrieve
from sagemaker.processing import ScriptProcessor
from sagemaker import get_execution_role
 
image_uri = retrieve(
    framework='tensorflow', 
    region='eu-west-1', 
    version='2.3', 
    py_version='py37', 
    image_scope='training',
    instance_type='ml.m5.xlarge'
)
sp = ScriptProcessor(
    role=get_execution_role(),
    image_uri=image_uri,
    command=['python3'],
    instance_count=1,
    instance_type='ml.m5.xlarge'
)
# Now, either run sp.run() or create a sagemaker.workflow.steps.ProcessingStep() , as needed
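
For reference, a minimal sketch of wiring the ScriptProcessor above into a ProcessingStep (the step name and script file name below are placeholders):

from sagemaker.workflow.steps import ProcessingStep

processing_step = ProcessingStep(
    name="MyProcessingStep",      # placeholder step name
    processor=sp,                 # the ScriptProcessor defined above
    code="processing.py",         # a single script; ScriptProcessor has no source_dir support
    inputs=[],                    # add ProcessingInput objects as needed
    outputs=[]                    # add ProcessingOutput objects as needed
)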

A very short example of a Dockerfile to extend the default TF container and install dependencies (not tested yet):

FROM 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.3-cpu-py37
COPY requirements.txt /opt/ml/processing/input/code/requirements.txt
RUN pip install -r /opt/ml/processing/input/code/requirements.txt
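
Once that extended image is built and pushed to ECR, its URI can be used in place of the retrieved one (a sketch; the account, repository name, and tag are placeholders):

custom_image_uri = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/my-tf-processing:latest"  # placeholder ECR URI

sp_custom = ScriptProcessor(
    role=get_execution_role(),
    image_uri=custom_image_uri,
    command=['python3'],
    instance_count=1,
    instance_type='ml.m5.xlarge'
)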

@khalidbourhaba

Any updates on this?

@darkreapyre

Is there an ETA on this fix?

@AakankshCTR

Is this issue fixed?

@dgallitelli
Author

This is still not fixed as of today (May 10th 2022).

@morfaer

morfaer commented May 16, 2022

Any update on this issue? Facing the same problem.

@dgallitelli
Author

dgallitelli commented May 16, 2022

Still the case for now.

However, there is now a possibility to use the new sagemaker.workflow.pipeline_context.PipelineSession to have .run() generate the step arguments without actually running the Processing job. I tried it in a Jupyter notebook with a custom FrameworkProcessor, but it should work with any FrameworkProcessor. Your code would look like:

from sagemaker import get_execution_role
from sagemaker.sklearn import SKLearn
from sagemaker.processing import FrameworkProcessor  # or change with any other FrameworkProcessor like HuggingFaceProcessor
from sagemaker.workflow.pipeline_context import PipelineSession

session = PipelineSession()

skpv2 = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version='0.23-1',
    role=get_execution_role(),
    instance_count=1,
    instance_type='ml.m5.large',
    sagemaker_session=session
)

step_args = skpv2.run(
    code='processing.py',
    source_dir='code',  # add processing.py and requirements.txt here
    inputs=[...], outputs=[...]
)

from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

processing_step = ProcessingStep(
    name="MyProcessingStep",
    step_args=step_args
)

# [ define the other steps if any ]

pipeline = Pipeline(name="MyPipeline", steps=[processing_step, ...])  # placeholder pipeline name

Just make sure to update the SageMaker Python SDK to the latest version :)
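
(For completeness, once the steps are defined the pipeline can be registered and started as sketched below; the pipeline name above and the role here are placeholders:)

# create or update the pipeline definition, then start an execution
pipeline.upsert(role_arn=get_execution_role())
execution = pipeline.start()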

@jerrypeng7773
Contributor

Thanks @dgallitelli

We would encourage users to adopt this new way to construct TrainingStep, ProcessingStep, TransformStep, TuningStep, and ModelStep.

We have a Read the Docs page about to be released that introduces all the improvements we made to the SageMaker Python SDK Pipeline module.

@mohamed-ali
Contributor

mohamed-ali commented May 19, 2022

The FrameworkProcessor has a method called get_run_args (doc here) that is designed to help integrate this processor with the ProcessingStep, which can then be put within a SageMaker pipeline. If you want to add pip dependencies, you can add a requirements.txt file under BASE_DIR.

Here is simplified code that helps connect the dots between FrameworkProcessor, get_run_args, ProcessingStep, and Pipeline.

import os

from sagemaker.processing import (
    ProcessingInput,
    ProcessingOutput,
    FrameworkProcessor
)

from sagemaker.workflow.functions import Join
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

from sagemaker.tensorflow import TensorFlow

# role, data_s3_bucket and data_s3_key are assumed to be defined elsewhere
BASE_DIR = os.path.dirname(os.path.realpath(__file__))

preprocessing_processor = FrameworkProcessor(
    estimator_cls=TensorFlow,
    framework_version='2.4.3',
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    py_version='py37',
    command=["python3"],
    base_job_name="some-preprocessing-step"
)

train_data_in_s3 = ProcessingOutput(
    source="/opt/ml/processing/output/train/",
    destination=Join(
        on="/",
        values=[
            "s3:/",
            data_s3_bucket,
            os.environ["SAGEMAKER_PROJECT_NAME"],
            data_s3_key,
            'train/'
        ],
    ),
    output_name='train',
    s3_upload_mode='Continuous',
)

test_data_in_s3 = ProcessingOutput(
    source="/opt/ml/processing/output/test/",
    destination=Join(
        on="/",
        values=[
            "s3:/",
            data_s3_bucket,
            os.environ["SAGEMAKER_PROJECT_NAME"],
            data_s3_key,
            'test/'
        ],
    ),
    output_name='test',
    s3_upload_mode='Continuous',
)

data_s3_key_in_project = Join(
    on="/",
    values=[
        os.environ["SAGEMAKER_PROJECT_NAME"],
        data_s3_key
    ],
)

preprocessing_run_args = preprocessing_processor.get_run_args(
    code="preprocess.py",
    source_dir=BASE_DIR,
    inputs=[],
    outputs=[train_data_in_s3, test_data_in_s3],
    arguments=[
        '--data-s3-bucket', "your bucket name",
        '--data-s3-key', "your key"
    ]
)

preprocessing_step = ProcessingStep(
    name="your-preprocessing-step-name",
    processor=preprocessing_processor,
    inputs=preprocessing_run_args.inputs,
    outputs=preprocessing_run_args.outputs,
    job_arguments=preprocessing_run_args.arguments,
    code=preprocessing_run_args.code
)

pipeline_name = "your-pipeline-name"

distributed_ml_training_pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        # your pipeline parameters here
    ],
    steps=[preprocessing_step, ...]
)

If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file to be shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/.

@marianokamp
Collaborator

Is there any running example for a ProcessingStep with PyTorch that allows source_dir?

@morfaer

morfaer commented Jun 7, 2022

Thanks @dgallitelli

We would encourage users to adopt this new way to construct TrainingStep, ProcessingStep, TransformStep, TuningStep, and ModelStep.

We have a Read the Docs page about to be released that introduces all the improvements we made to the SageMaker Python SDK Pipeline module.

Any update on when this updated Read the Docs page will be released?

@morfaer

morfaer commented Jun 9, 2022

I tried the ModelStep example from docs here: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step

Assuming that the Model object in this example is sagemaker.model.Model, the register method returns a ModelPackage | None type, but the step_args argument in the ModelStep call expects an object of type _ModelStepArguments. So type-wise this example looks fishy to me.

Also, this example produces the following error for me:

  File "/home/.../run_pipeline.py", line 261, in model_registration_step
    register_model_step_args = model.register(
  File "/home/.../.venv/lib/python3.8/site-packages/sagemaker/workflow/pipeline_context.py", line 209, in wrapper
    return run_func(*args, **kwargs)
  File "/home/.../.venv/lib/python3.8/site-packages/sagemaker/model.py", line 373, in register
    model_package = self.sagemaker_session.create_model_package_from_containers(
AttributeError: 'NoneType' object has no attribute 'create_model_package_from_containers'

So it seems like this example does not work.

I'm using version 2.94.0 of the SageMaker SDK from a local PC (not a SageMaker notebook) to start the Pipeline. Any ideas how this is supposed to work?

@jerrypeng7773
Contributor

jerrypeng7773 commented Jun 9, 2022

I tried the ModelStep example from docs here: https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step [...] So it seems like this example does not work. I'm using version 2.94.0 of the SageMaker SDK from a local PC (not a SageMaker notebook) to start the Pipeline. Any ideas how this is supposed to work?

Can you please confirm that sagemaker.workflow.pipeline_context.PipelineSession() is used? More specifically:

from sagemaker.workflow.pipeline_context import PipelineSession
pipeline_session = PipelineSession()
....
model = Model(
    image_uri=pytorch_estimator.training_image_uri(),
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,  # <- this is the important part
    role=role,
)
register_args = model.register(...)
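
For reference, under a PipelineSession the register_args returned above are not executed directly; they are handed to a ModelStep (a minimal sketch; the step name is a placeholder):

from sagemaker.workflow.model_step import ModelStep

step_model_registration = ModelStep(
    name="MyModelRegistration",  # placeholder step name
    step_args=register_args
)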

@mstfldmr

mstfldmr commented Jun 9, 2022

If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file to be shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/.

Should MANIFEST.in be located in source_dir or somewhere else?

@mohamed-ali
Contributor

If you are using this inside a SageMaker Studio MLOps Project, make sure to declare your requirements.txt inside a MANIFEST.in file to be shipped with the library: https://packaging.python.org/en/latest/guides/using-manifest-in/.

Should MANIFEST.in be located in source_dir or somewhere else?

It should be at the same level as the setup.py.
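
For illustration, a minimal MANIFEST.in might look like this (the package path is a placeholder):

# MANIFEST.in, placed at the same level as setup.py
include my_package/pipelines/requirements.txt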

@morfaer

morfaer commented Jun 10, 2022

@jerrypeng7773 The additional pipeline_session parameter worked for me. Thanks. What's weird though is that the ModelStep appends a RegisterModel suffix to my pipeline step in SageMaker Studio.

I guess the reason is the _append_register_model_step method in model_step.py in the workflow module.

The other step types don't seem to add such a suffix, so this looks inconsistent to me. Is there a specific reason to do this for ModelStep?

@qidewenwhen
Member

qidewenwhen commented Jul 15, 2022

Hi @morfaer, both ModelStep and RegisterModel are StepCollections. A StepCollection is not an actual step/graph node that shows up in the pipeline graph; instead, it is a collection of steps/graph nodes.
Both ModelStep and RegisterModel can contain the following list of steps:

  • _RepackModelStep (optional): this step is a TrainingStep under the hood, which does model repacking at execution time if specific conditions are met. Model repacking bundles the model artifacts with custom inference entry points and generates new repacked model artifacts for registration.
  • _RegisterModelStep: the step that performs the model registration during execution.

What's weird though is that the ModelStep appends a RegisterModel suffix to my pipeline step

Compared with RegisterModel, ModelStep explicitly appends a suffix to its sub-step names, so that they look like:

  • "-RepackModel-" for _RepackModelStep
  • "-RegisterModel" for _RegisterModelStep

This gives users a clear hint about what each sub-step is doing.

In addition, we recently pushed PR #3240 to apply ModelStep's naming convention to RegisterModel as well, because RegisterModel's previous naming scheme could cause some issues.

@pjbhaumik

Still the case for now.

However, there is now a possibility to use the new sagemaker.workflow.pipeline_context.PipelineSession to have .run() generate the step arguments without actually running the Processing job. [...] Just make sure to update the SageMaker Python SDK to the latest version :)

This method works... the .run() call returns step arguments that the ProcessingStep accepts in the step_args parameter.

The method proposed by mohamed-ali did not work for me... it creates a list of arguments for the job_arguments parameter.

It may be worth noting here that each method works on a different parameter. When using the step_args parameter, you cannot use the processor argument, because step_args already supplies the processor for the ProcessingStep.

When using the job_arguments parameter, there were no conflicts with the processor parameter, but the job still failed citing the missing 'code', without any reference to the missing 'source_dir'... this suggests it may have solved the source directory issue, but there was still an issue retrieving the .py file from it.

From this experience, it appears that using the step_args method via the PipelineSession is a good idea.

@MetelStairs

Is there any update on this issue? Has it been fixed? I'm currently running into the same problem.

@pjbhaumik

Is there any update on this issue? Has it been fixed? I'm currently running into the same problem.

Yes, using a PipelineSession() will work... follow the example dgallitelli posted on May 16.

@rlamba89

I am using PipelineSession(), but I'm getting the error below:

ValueError: No file named "processing.py" was found in directory "code".

import sagemaker
from sagemaker.sklearn import SKLearn
from sagemaker.processing import FrameworkProcessor  # or change with any other FrameworkProcessor like HuggingFaceProcessor
from sagemaker.workflow.pipeline_context import PipelineSession

# role and processing_dir are defined elsewhere
session = PipelineSession()

skpv2 = FrameworkProcessor(
    estimator_cls=SKLearn,
    framework_version='0.23-1',
    role = role,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    sagemaker_session = session
)

step_args = skpv2.run(
    code='processing.py',
    source_dir="code", # add processing.py and requirements.txt here
    outputs=[
        sagemaker.processing.ProcessingOutput(
            output_name="train", source=f"{processing_dir}/output/train"
        ),
        sagemaker.processing.ProcessingOutput(
            output_name="validation", source=f"{processing_dir}/output/validation"
        ),
        sagemaker.processing.ProcessingOutput(
            output_name="test", source=f"{processing_dir}/output/test"
        ),
    ],
)



from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

step_process = ProcessingStep(name="ModelPreProcess", step_args=step_args)
