SageMaker Framework Processing step not finding /opt/ml/processing/input/code/ #4272

Closed
vickytang1993 opened this issue Nov 25, 2023 · 2 comments
Labels: component: pipelines (Relates to the SageMaker Pipeline Platform), type: bug

Comments

@vickytang1993

Hi,

I'm using FrameworkProcessor from the SageMaker Python SDK for a ProcessingStep in my SageMaker pipeline. When running the pipeline from a Jupyter notebook in SageMaker Studio, I'm getting the following error:

2023-11-25T19:16:07.921+05:30  File "/opt/ml/processing/input/entrypoint/runproc.sh", line 3
    cd /opt/ml/processing/input/code/
                                     ^
2023-11-25T19:16:07.921+05:30  SyntaxError: invalid syntax

The error comes from runproc.sh, the script that FrameworkProcessor generates. It looks like the script is trying to change to the directory "/opt/ml/processing/input/code/" to find the entry-point Python file for the processing job, but cannot find it. Here is the Python code for my pipeline:

from sagemaker import get_execution_role
from sagemaker.processing import FrameworkProcessor, ProcessingInput, ProcessingOutput
from sagemaker.pytorch import PyTorch
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import ProcessingStep
import sagemaker.processing as sm_processing

# # Create a FrameworkProcessorArgs object
# framework_processor_args = sm_processing.FrameworkProcessorArgs(
#     framework_entrypoint_command=["Python"]
# )

session = PipelineSession()
role = get_execution_role()  # assumed: `role` was not defined in the posted snippet


image_uri = "763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.8-cpu-py3"
s3_bucket = "di-dev-sagemaker"
inference_output = f"s3://{s3_bucket}/Flanges-Corrosion/BatchTransformOutputJson"
input_json_dir = "Flanges-Corrosion/BatchTransformOutputJson"
script_eval = FrameworkProcessor(
    framework_version="1.8",
    estimator_cls=PyTorch,
    image_uri=image_uri,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="job-args-processor",
    role=role,
    sagemaker_session=session,
)
script_eval.framework_entrypoint_command = ["python3"]
# Run the processing job
step_args = script_eval.run(
    code="pp_main.py",
    # code="entrypoint.sh",
    source_dir="VT_postprocessing",
    inputs=None,
    # [
    #     ProcessingInput(source="VT_postprocessing/data_handler.py", destination="/opt/ml/processing/input/code/"),
    # ],
    outputs=None,
)




step_postprocess = ProcessingStep(
    name="Framework-PostProcessing-v1",
    # processor=script_eval,
    inputs=None,
    # [
    #     ProcessingInput(source="VT_postprocessing/data_handler.py", destination="/opt/ml/processing/input/code/data_handler.py"),
    # ],
    outputs=None,
    # code="VT_postprocessing/pp_main.py",
    job_arguments=["--s3_bucket", s3_bucket, "--input_json_dir", input_json_dir],
    step_args=step_args,
)
step_postprocess.add_depends_on([step_transform_corr_seg, step_transform_langsam])

I would appreciate any help with this. I found an issue regarding the broken integration between FrameworkProcessor and ProcessingStep (#2909). Is it related?
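
A possible reading of the traceback, offered as a hypothesis rather than a confirmed diagnosis: a "SyntaxError: invalid syntax" with a caret is what the Python interpreter prints when it fails to parse a file, so runproc.sh may have been executed by Python instead of a shell. That would fit the line `script_eval.framework_entrypoint_command = ["python3"]` in the snippet. The sketch below only illustrates that idea. `build_entrypoint`, `syntax_error_line`, and the script text are hypothetical stand-ins rather than SDK code, and the "/bin/bash" default is an assumption about the SDK.

```python
# Hypothetical stand-in for the generated runproc.sh. Only the position of the
# `cd` command on line 3 is taken from the traceback in this issue.
RUNPROC_SH = """#!/bin/bash

cd /opt/ml/processing/input/code/
# ... remaining shell commands elided
"""

SCRIPT_PATH = "/opt/ml/processing/input/entrypoint/runproc.sh"


def build_entrypoint(entrypoint_command, script_path):
    """Assumed model: the container runs <entrypoint command> <script path>."""
    return list(entrypoint_command) + [script_path]


def syntax_error_line(source, filename="runproc.sh"):
    """Return the line at which Python's parser rejects `source`, or None."""
    try:
        compile(source, filename, "exec")  # parse only, nothing is executed
    except SyntaxError as err:
        return err.lineno
    return None


# With the override from the snippet, Python itself would be handed the script.
overridden = build_entrypoint(["python3"], SCRIPT_PATH)
# Assumed SDK default: a shell runs the script.
default = build_entrypoint(["/bin/bash"], SCRIPT_PATH)

print(overridden)
print(default)
# Line number where Python gives up on the shell script.
print(syntax_error_line(RUNPROC_SH))
```

If this reading is right, the failure has nothing to do with a missing directory: Python stops at the `cd` command because a shell command is not valid Python. Removing the override and, where a specific interpreter is needed, passing it through the `command` argument that the FrameworkProcessor constructor appears to accept might be worth trying. This is untested against the pipeline in this issue.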

@trungleduc
Collaborator

Hi @vickytang1993, could you provide a fully reproducible example (including external code, data, etc.)?

@aoguo64 added the "component: pipelines" (Relates to the SageMaker Pipeline Platform) label on Dec 19, 2023

@martinRenou
Collaborator

Closing for triaging. Feel free to continue the discussion or reopen the issue if needed.

Thanks!
