DataConfig: no way to parameterize s3_analysis_config_output_path using Sagemaker Pipeline parameters #3879

Closed
maslick opened this issue May 23, 2023 · 3 comments
Labels
component: pipelines Relates to the SageMaker Pipeline Platform Pending information type: question

Comments

maslick commented May 23, 2023

It looks like it's not possible to parameterize s3_analysis_config_output_path using SageMaker Pipeline parameters:

model_explainability_data_config = DataConfig(
    s3_data_input_path=step_process.properties.ProcessingOutputConfig.Outputs[
        "shap"
    ].S3Output.S3Uri,
    s3_output_path=ParameterString(name="s3_output_path", default_value="s3://helloworld/"),
    s3_analysis_config_output_path=ParameterString(name="s3_analysis_config_output_path", default_value="s3://helloworld/analasys_config"),
    label='target',
    dataset_type="text/csv",
)
Exception: s3_analysis_config_output_path cannot be of type ExecutionVariable/Expression/Parameter/Properties

Nor is it possible to leave s3_analysis_config_output_path at its default value, i.e. None:

model_explainability_data_config = DataConfig(
    s3_data_input_path=step_process.properties.ProcessingOutputConfig.Outputs[
        "shap"
    ].S3Output.S3Uri,
    s3_output_path=ParameterString(name="s3_output_path", default_value="s3://helloworld/"),
    label='target',
    dataset_type="text/csv",
)
Exception: `s3_output_path` cannot be of type ExecutionVariable/Expression/Parameter/Properties if `s3_analysis_config_output_path` is none or empty

The documentation states:

If this field is None, then the s3_output_path will be used to store the analysis_config output.

SageMaker Python SDK version: sagemaker>=2.158.0

Originally posted by @maslick in #2698 (comment)

@thbrooks22 thbrooks22 added the component: pipelines Relates to the SageMaker Pipeline Platform label May 23, 2023
@qidewenwhen
Member

Hi @maslick, thanks for reaching out!

The behavior you reported is by design. Here's the rationale behind it:

  • SageMaker Pipeline parameters, properties, Expressions, and ExecutionVariables are runtime variables, which are resolved only at execution time (when pipeline.start is triggered).
  • However, s3_analysis_config_output_path is used only at compile time (when invoking pipeline.create or pipeline.definition), as the S3 path to which the analysis config is uploaded.

Thus a ParameterString cannot be used for s3_analysis_config_output_path.

The same applies to DataConfig.s3_output_path when s3_analysis_config_output_path is None, which is the second exception you posted. In that case, DataConfig.s3_output_path is used as a fallback at compile time to upload the analysis config, so it must not be a pipeline parameter, property, etc.
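To make the compile-time versus run-time distinction concrete, here is a minimal toy model in plain Python. It is not SageMaker SDK code: `ToyParameter`, `resolve_upload_path`, and `compile_definition` are hypothetical names invented for illustration, and the placeholder dictionary is only a stand-in for however a real pipeline definition references a parameter. The sketch only assumes what is described above: the analysis config must be uploaded while the definition is being built, so its destination has to be a concrete string at that moment.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class ToyParameter:
    """Hypothetical stand-in for a pipeline parameter: its value is known only at run time."""
    name: str
    default_value: str


PathLike = Union[str, ToyParameter]


def resolve_upload_path(
    s3_output_path: PathLike,
    s3_analysis_config_output_path: Optional[PathLike] = None,
) -> str:
    """Pick the concrete path the analysis config is uploaded to at compile time."""
    if isinstance(s3_analysis_config_output_path, ToyParameter):
        # An upload happens now, so a run-time placeholder cannot be used here.
        raise ValueError("s3_analysis_config_output_path must be a plain string")
    if s3_analysis_config_output_path:
        return s3_analysis_config_output_path
    # Fallback: s3_output_path doubles as the upload location, so it must be concrete too.
    if isinstance(s3_output_path, ToyParameter):
        raise ValueError(
            "s3_output_path must be a plain string when "
            "s3_analysis_config_output_path is None or empty"
        )
    return s3_output_path


def compile_definition(
    s3_output_path: PathLike,
    s3_analysis_config_output_path: Optional[PathLike] = None,
) -> Dict[str, object]:
    """Toy 'compile' step: resolve the upload path now, defer run-time values."""
    upload_path = resolve_upload_path(s3_output_path, s3_analysis_config_output_path)
    if isinstance(s3_output_path, ToyParameter):
        # Run-time values stay as placeholders (shape chosen for illustration only).
        output: object = {"Get": f"Parameters.{s3_output_path.name}"}
    else:
        output = s3_output_path
    return {"analysis_config_uploaded_to": upload_path, "output_path": output}
```

In this model, passing a `ToyParameter` for the output path works as long as the analysis config path is a plain string, and fails in the two situations that correspond to the exceptions reported in the issue.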

@qidewenwhen
Member

I'd recommend keeping a local variable for s3_analysis_config_output_path and updating it as needed. This way you can keep DataConfig.s3_output_path parameterized. See the example below:

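from sagemaker.clarify import DataConfig
from sagemaker.workflow.parameters import ParameterString

# Plain string: resolved at compile time, when the analysis config is uploaded.
s3_analysis_config_output_path = "s3://helloworld/analasys_config"
...
model_explainability_data_config = DataConfig(
    s3_data_input_path=step_process.properties.ProcessingOutputConfig.Outputs[
        "shap"
    ].S3Output.S3Uri,
    # Unchanged: still a pipeline parameter, resolved at execution time.
    s3_output_path=ParameterString(name="s3_output_path", default_value="s3://helloworld/"),
    # Local variable instead of a ParameterString.
    s3_analysis_config_output_path=s3_analysis_config_output_path,
    label='target',
    dataset_type="text/csv",
)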
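One possible way to "update it as needed" is to build the local variable with a small helper when the pipeline definition is created. The sketch below is only an illustration: `build_s3_uri` is a hypothetical helper, not part of the SageMaker SDK, and the bucket and key parts are placeholders. The point is that the result is an ordinary `str`, so it is known at compile time.

```python
def build_s3_uri(bucket: str, *key_parts: str) -> str:
    """Join a bucket and key parts into an s3:// URI, returned as a plain str."""
    # Drop empty parts and stray slashes so the pieces join cleanly.
    cleaned = [part.strip("/") for part in key_parts if part.strip("/")]
    return "/".join([f"s3://{bucket.strip('/')}"] + cleaned)


# Example: choose the location in ordinary Python before building DataConfig.
s3_analysis_config_output_path = build_s3_uri("helloworld", "analasys_config")
```

The key parts could equally come from an environment variable or a config file read at definition time; any source works as long as the value is available before pipeline.create or pipeline.definition is invoked.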

@nmadan
Member

nmadan commented Aug 22, 2023

Resolving for now. Please reach out if you still have questions regarding this.

@nmadan nmadan closed this as completed Aug 22, 2023