TransformStep in Pipelines mishandling source_dir attribute (PyTorch/Script Mode) #2549
Hi @marianokamp, thanks for using Amazon SageMaker. @jerrypeng7773, can you take a look at this pipeline bug?
Hi, I'm getting the same issue. It seems like the
We are working on this.
#3091 is possibly related to this.
@marianokamp, can you please retry? I think this should be addressed. The
Meanwhile I created my own container and such to deal with that issue from last year. I would like to re-test, @jerrypeng7773, but my code evolved and I am not sure what the equivalent with the BatchTransformStep is. The processing step that I created as a workaround for the broken transform creates a named output, but I leave it to Pipelines to name the location. I then use this output as an input in a subsequent step:
This works. But for that I had to build my own container (no source_dir support for PyTorch) and re-implement the batch transform logic. So, back to the fixed version of the batch transform step: I tried to replace my workaround with it, but could not find much control over the outputs, especially not how to name them so that I can look them up later. It looks like I can leave output_path=None, which should work, as the docs say the default bucket will be used. Except it doesn't. And anyway, output_path is on the Transformer, while it should, IMHO, be on the Step. But the docs say no: no output control as far as I could find.
I get the following error during the upsert.
@marianokamp Sorry for the inconvenience; we released a new way to construct the step in 2.89.0. Can you try it out to see if it fixes the issue? For processing, we can do
For transformer, you can do
In summary, we introduced the
This works for estimator, transformer, and tuner too.
When creating a Transform, the source_dir attribute points to the local directory containing the source. It is used to pack up the sources and put them onto S3; when the Transform is executed, it uses the code that has been transferred from that S3 location to the running Transform container. This works with the SDK directly, but not with SM Pipelines, where source_dir and the consequent upload of the source are not handled properly.

What is working?
(Plain use of the SDK to instantiate model and transformer)

This is working in the SDK. The source dir is packaged up, transferred to S3, and the resulting model contains the property:

SAGEMAKER_SUBMIT_DIRECTORY = s3://.../model.tar.gz
Output of a describe-model and inspection of the tar ball:
^^^ Please recognize the existence of code/inference.py and see the SAGEMAKER_SUBMIT_DIRECTORY.

What is not working?
(Equivalent steps with SM Pipelines)

When running the Transform I get the dreaded "Please provide a model_fn implementation" error. The main difference seems to be that the SAGEMAKER_SUBMIT_DIRECTORY property has not been updated to s3://.../model.tar.gz as in the working case above, but still shows the local path file://. instead, and the tarball does not contain the inference code.

Full output:
System information
A description of your system. Please provide:
You can slack me for details, I am mkamp@.