multiple calls to pipeline.definition() causes extra "code" ProcessingInput to the appended to ProcessingStep #3451
We are seeing this issue not just when …
**More Context**

This issue stems from the lack of idempotency within the `step.arguments` property. Before the caching improvements were implemented, …

**Work Around**

The issue can be avoided by building pipelines and running notebooks in the previous fashion, using sdk version <= 2.115. If using sdk version 2.116, this issue can be mitigated by re-running the cell that first defines the processing input list and then re-building the pipeline. For example (the cell contents were lost in extraction; see the sketch after this comment):

cell 1: …
cell 2: …
cell 3: …
cell 4: …
cell 5: …
cell 6: …

Run cells 1-4, then re-run cell 2. This will re-define the original, correct inputs. Then move on to cells 5-6.

Another iteration of this issue occurs in the case of …

**Fix**

In this PR, existing unit tests have been extended to test for idempotency in Processing, Training, Transform and Tuning steps. The …
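The cell contents above did not survive extraction; below is a minimal sketch of the pattern being described, assuming a `ScriptProcessor`-based step. The image URI, role, bucket paths, step names, and script file are hypothetical placeholders, not the original poster's values.

```python
from sagemaker.processing import ProcessingInput, ScriptProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

# cell 1: create the processor (placeholder image, role, and instance settings)
processor = ScriptProcessor(
    image_uri="<image-uri>",
    command=["python3"],
    role="<execution-role>",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# cell 2: define the processing input list -- the cell to re-run later
inputs = [
    ProcessingInput(
        source="s3://<bucket>/input",
        destination="/opt/ml/processing/input",
    )
]

# cell 3: build the step from those inputs
step = ProcessingStep(
    name="Process",
    processor=processor,
    inputs=inputs,
    code="preprocess.py",
)

# cell 4: build the pipeline and render its definition
# (in v2.116 this call mutates the shared input list)
pipeline = Pipeline(name="MyPipeline", steps=[step])
pipeline.definition()

# --> re-run cell 2 at this point to restore the original, correct inputs

# cell 5: create or update the pipeline
pipeline.upsert(role_arn="<execution-role>")

# cell 6: start an execution
pipeline.start()
```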
The mentioned workaround doesn't seem to work when using a script instead of Colab notebooks.
Closing this issue as PR #3460 is merged.
Hi @harshraj22, yes, the workaround will be a little more difficult in the context of a regular Python script file rather than a Jupyter notebook. Do you have multiple calls to `pipeline.definition()` or `step.arguments()` in your script? One thing you can try is to limit those to just a single call. I will provide an update as soon as I have more info to share.
No, there are no multiple calls.
Hi @harshraj22, I had the same issue; it was still there after upgrading, when running locally from a script. It looks like inside `pipeline.start()`, `step.arguments` is called multiple times. I found an `if self.step_args:` branch there that then calls `execute_job_functions`, which uploads the code. To mitigate the issue, one workaround that works for me is to not create `step_args` outside the step and to not call `run()`. You can pass the processor, inputs, outputs, job arguments, and code directly to the `ProcessingStep`. In your case, something like this:
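The commenter's original code block was not preserved; this is a minimal sketch of the suggested workaround, with placeholder image URI, role, S3 paths, job arguments, and script name:

```python
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor
from sagemaker.workflow.steps import ProcessingStep

processor = ScriptProcessor(
    image_uri="<image-uri>",
    command=["python3"],
    role="<execution-role>",
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# Do not call processor.run(...) and do not build step_args;
# pass everything directly to the ProcessingStep instead.
step_process = ProcessingStep(
    name="Process",
    processor=processor,
    inputs=[
        ProcessingInput(
            source="s3://<bucket>/input",
            destination="/opt/ml/processing/input",
        )
    ],
    outputs=[
        ProcessingOutput(
            source="/opt/ml/processing/output",
            destination="s3://<bucket>/output",
        )
    ],
    job_arguments=["--mode", "train"],
    code="preprocess.py",
)
```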
This uploaded the code only once.
Hi @harshraj22 and @Cayd3Nine, thanks for your comments. There have been several fixes included in the latest package version …
Describe the bug
I have multiple calls to `pipeline.definition()` in my notebook. Every time I make this call, an extra "code" ProcessingInput gets appended to my ProcessingStep argument definition.

To reproduce
After one call to `pipeline.definition()`, my step JSON looks like:
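The original JSON was not captured; this is a representative sketch of the relevant ProcessingInputs portion, with placeholder S3 URIs and script name:

```json
"ProcessingInputs": [
  {
    "InputName": "code",
    "S3Input": {
      "S3Uri": "s3://<bucket>/<prefix>/code/preprocess.py",
      "LocalPath": "/opt/ml/processing/input/code",
      "S3DataType": "S3Prefix",
      "S3InputMode": "File"
    }
  }
]
```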
After every subsequent call, another "code" ProcessingInput gets added:
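Again a sketch with placeholder values: after a second call to `pipeline.definition()`, a duplicate "code" entry appears:

```json
"ProcessingInputs": [
  {
    "InputName": "code",
    "S3Input": { "S3Uri": "s3://<bucket>/<prefix>/code/preprocess.py" }
  },
  {
    "InputName": "code",
    "S3Input": { "S3Uri": "s3://<bucket>/<prefix>/code/preprocess.py" }
  }
]
```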
Expected behavior
No extra ProcessingInputs are appended
Additional context
This issue was noticed in sdk version 2.116. It does not exist in v2.115.