_repack_script_launcher.sh is overwriting runproc.sh on subsequent pipeline steps, causing the pipeline to fail #3467
Comments
Hi @RicardoHS thanks for using AWS SageMaker!
Hi @qidewenwhen, see the answers below
My first ProcessingStep fails due to a strange error (one that makes no sense for the script). After inspecting the logs, this first step must call a custom script
Note the final line: this is the run file for the first step, so that line must be
When searching the S3 path for the last ProcessingStep, I realized it's the same S3 path as the first step (both steps use the same path). This only happens in v2.116.0; in v2.112.0 the S3 path is different for each step:
Hope it's more clear with the previous explanation
I'm using ProcessingStep, ConditionStep, LambdaStep, ModelStep and TrainingStep
I'm sorry, but the whole code is split across multiple files and merging it all together would be a lot of work. I can provide you the pipeline.describe() output if it's useful.
Finally, in case it's not clear: when reverting the version to 2.112.0, everything starts working again.
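The shared-path symptom reported in this comment can be checked mechanically. The following is a minimal sketch, assuming the pipeline definition is available as a JSON string (for example from `pipeline.definition()`) and that each processing step lists its inputs under `Arguments.ProcessingInputs` with an `InputName` and an `S3Input.S3Uri`. The input names `code` and `entrypoint`, the helper name `find_shared_code_uris`, and the sample bucket are assumptions made for illustration, not values taken from this issue.

```python
import json
from collections import defaultdict


def find_shared_code_uris(definition_json, input_names=("code", "entrypoint")):
    """Return {s3_uri: [step names]} for code URIs used by more than one step."""
    definition = json.loads(definition_json)
    steps_by_uri = defaultdict(list)
    for step in definition.get("Steps", []):
        inputs = step.get("Arguments", {}).get("ProcessingInputs", [])
        for proc_input in inputs:
            if proc_input.get("InputName") not in input_names:
                continue
            uri = proc_input.get("S3Input", {}).get("S3Uri")
            # Skip URIs that are not plain strings (e.g. computed at run time).
            if isinstance(uri, str):
                steps_by_uri[uri].append(step.get("Name"))
    return {uri: names for uri, names in steps_by_uri.items() if len(names) > 1}


def _step(name, uri):
    # Hypothetical, heavily trimmed step entry used only for this example.
    return {
        "Name": name,
        "Arguments": {
            "ProcessingInputs": [
                {"InputName": "entrypoint", "S3Input": {"S3Uri": uri}}
            ]
        },
    }


SAMPLE_DEFINITION = json.dumps(
    {
        "Steps": [
            _step("StepA", "s3://example-bucket/shared/runproc.sh"),
            _step("StepB", "s3://example-bucket/shared/runproc.sh"),
            _step("StepC", "s3://example-bucket/step-c/runproc.sh"),
        ]
    }
)

print(find_shared_code_uris(SAMPLE_DEFINITION))
```

An empty result means every step has its own upload location; any entry in the result names the steps whose scripts would land on the same S3 key.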
Thanks @RicardoHS! Based on all this info, the issue seems to be related to our recently released Cache changes. To help us confirm this and take the next step toward a fix, could you please provide the following detailed info:
@qidewenwhen here it is. Some information is inside
pipe.describe() from v2.116.0
Hi @RicardoHS, thanks a lot for providing all the information. One other clarifying question to confirm the issue: When you first encountered this behavior, were all the code files located in the same local directory, like |
Hi @brockwade633 . That's exactly the situation in the code when the behaviour was encountered. |
Hi @RicardoHS, thanks for your reply. The behavior that you describe does represent a bug - the
Workaround
In the meantime, I was able to produce a valid pipeline with distinct upload locations by using the path structures you included in the last code snippet, using additional subdirectories such as
As a temporary workaround, the folder separations produce different S3 upload folders for each step, and prevent
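Before relying on the folder-separation workaround, it may help to confirm that no two steps keep their scripts in the same local directory. The sketch below uses only the standard library; the helper name `steps_sharing_a_directory`, the step names, and the paths are hypothetical and do not come from the reporter's project.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def steps_sharing_a_directory(step_scripts):
    """Map each directory holding scripts for several steps to those step names.

    step_scripts: {step name: relative path of that step's entrypoint script}
    """
    by_dir = defaultdict(list)
    for step_name, script_path in step_scripts.items():
        by_dir[str(PurePosixPath(script_path).parent)].append(step_name)
    return {d: sorted(names) for d, names in by_dir.items() if len(names) > 1}


# Hypothetical flat layout: both scripts live in the same folder.
flat_layout = {
    "preprocess": "src/preprocess.py",
    "evaluate": "src/evaluate.py",
}

# Hypothetical separated layout: one subdirectory per step.
separated_layout = {
    "preprocess": "src/preprocess/preprocess.py",
    "evaluate": "src/evaluate/evaluate.py",
}

print(steps_sharing_a_directory(flat_layout))
print(steps_sharing_a_directory(separated_layout))
```

The check only inspects the local layout; whether a given SDK version derives distinct S3 prefixes from it still has to be confirmed against the generated pipeline definition.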
Hi @brockwade633, I have tried putting every script in a separate folder, and if I use v2.116.0 the pipeline throws this new error:
@RicardoHS, thanks for letting us know. We have experienced a couple issues specifically around FrameworkProcessors and SparkProcessors recently and this looks like an example of an idempotency problem that's being tracked here: #3451. In your code, are you calling pipeline.definition(), pipeline.create(), pipeline.update(), pipeline.upsert(), or step.arguments, multiple times? |
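To make the idempotency question concrete: a definition builder is idempotent if calling it repeatedly yields the same result every time. The sketch below is a generic illustration of that property, not a reproduction of the SDK's behavior; `is_idempotent`, `StatefulBuilder`, and `StatelessBuilder` are hypothetical names invented for this example.

```python
def is_idempotent(build_definition, calls=3):
    """Call build_definition several times and report whether all results match."""
    results = [build_definition() for _ in range(calls)]
    return all(result == results[0] for result in results)


class StatefulBuilder:
    """Toy builder whose output drifts because each call mutates shared state."""

    def __init__(self):
        self.inputs = ["code"]

    def definition(self):
        # Bug being illustrated: state is appended to on every call.
        self.inputs.append("entrypoint")
        return {"inputs": list(self.inputs)}


class StatelessBuilder:
    """Toy builder that derives its output without touching shared state."""

    def __init__(self):
        self.inputs = ["code"]

    def definition(self):
        return {"inputs": self.inputs + ["entrypoint"]}


print(is_idempotent(StatefulBuilder().definition))
print(is_idempotent(StatelessBuilder().definition))
```

Applied to a real pipeline object, the same comparison could be run on the strings returned by repeated `pipeline.definition()` calls, assuming those calls have no side effects you care about.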
Hi @RicardoHS, a new sagemaker package version is available ( |
Hi @brockwade633 after the update to |
Okay, great! You're welcome. Closing out the issue. |
Describe the bug
Not sure if it was introduced in this commit d5261f2. I have found that my pipelines fail when using v2.116.0 instead of v2.112.0. Comparing the pipeline.describe() output from when I upsert the pipeline from my local machine (using v2.112.0) with the one executed on my CI/CD (which uses the latest version available) shows that if a step uses a custom entrypoint, the S3 path used is the same for all steps. Because runproc.sh is stored there, only the runproc.sh of the last step with a custom entrypoint survives.
In v2.112.0 pipeline.describe() output:
My first step:
My second step
In v2.116.0 pipeline.describe() output:
My first step:
My second step
To reproduce
Use two steps with a custom entrypoint. In 2.112.0 it should work; in 2.116.0 all steps will use the same entrypoint code (the one from the last step defined).
Expected behavior
Custom entrypoint code is not overwritten
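The overwrite described in this report can be illustrated with a toy model in which a dictionary stands in for the S3 bucket. This is only a sketch of the failure mode under stated assumptions: the prefixes, step names, and the one-line `runproc.sh` contents are placeholders, and `upload_entrypoints` is a hypothetical helper, not part of the SageMaker SDK.

```python
def upload_entrypoints(steps, shared_prefix=None):
    """Simulate uploading one runproc.sh per step; a dict stands in for S3.

    steps: list of (step name, entrypoint script) pairs, in definition order.
    shared_prefix: if given, every step uploads under this single prefix.
    """
    bucket = {}
    for name, entrypoint in steps:
        prefix = shared_prefix if shared_prefix is not None else f"pipeline/{name}"
        # Writing to an existing key silently replaces the earlier object.
        bucket[f"{prefix}/runproc.sh"] = f"python3 {entrypoint}"
    return bucket


steps = [("first_step", "first.py"), ("second_step", "second.py")]

# One shared prefix: the second upload replaces the first step's runproc.sh.
shared = upload_entrypoints(steps, shared_prefix="pipeline/code")

# One prefix per step: both launcher scripts survive.
distinct = upload_entrypoints(steps)

print(shared)
print(distinct)
```

With the shared prefix only one key remains and it points at the last step's script, which matches the reported symptom of the first step running the wrong entrypoint.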
Screenshots or logs
System information
Additional context