Skip to content

TuningStep changing sagemaker_job_name in StaticHyperParameters when run get pipeline definition and pipeline upsert and kill cache #3651

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ZhangPei25 opened this issue Feb 10, 2023 · 2 comments
Labels
component: pipelines Relates to the SageMaker Pipeline Platform type: feature request

Comments

@ZhangPei25
Copy link

Describe the bug
During get pipeline definition and pipeline upsert, a sagemaker_job_name in Tuning step is created with base name and currenttimestamp, and added into "TrainingJobDefinition": {"StaticHyperParameters": xxx }. This will kill cache configuration even nothing has changed. This sagemaker_job_name is modified each time we run get pipeline definition or upsert, even with no change at all.

To reproduce

  1. create a pipeline with a tuningstep in it and also cache configuration for tuning step
  2. create a function can do get pipeline definition and pipeline upsert
  3. run pipeline.definition() to get the pipeline json output many times without changes
  4. compare pipeline definition Json file for different runs, you can see sagemaker_job_name is change with current timestap added
  5. or run pipeline upsert multiple times, you can see every time, it will start from tuning again even it succeed at first run.

Expected behavior
This sagemaker_job_name should not change when do upsert, and should cause cache get killed.

Screenshots or logs
No

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.130.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): xgboost (I am using my own image and scripts)
  • Framework version: xgboost-1.5-1
  • Python version: 3.8
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): Y

Additional context
There is a similar bug for training and fixed last year
#2940

see definition of job name :

base_tuning_job_name (str): Prefix for the hyperparameter tuning job

and fix for trianing job :
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/workflow/steps.py#L493

@aoguo64 aoguo64 added the component: pipelines Relates to the SageMaker Pipeline Platform label Feb 13, 2023
@qidewenwhen
Copy link
Member

qidewenwhen commented Mar 2, 2023

Thanks for pointing this out!
The sagemaker_job_name added into TrainingJobDefinition-StaticHyperParameters is actually generated by the estimator object which is supplied to the tuner.
In this previous fix: https://github.com/aws/sagemaker-python-sdk/pull/2950/files, we removed sagemaker_job_name from hyperparameters in the TrainingStep, we may need to do similar thing to clean it up in the TuningStep as well.

Will open a PR to fix the issue

Update:

Even if we make the similar change of removing the sagemaker_job_name in the TuningStep request dict, it's still ending up with cache miss under the script mode (i.e. user supplied script in the estimator object), since the sagemaker_submit_directory in the TrainingJobDefinition-StaticHyperParameters is still changing each time with the timestamp suffix.

When looking into the current cache improvement logic, it seems to support TrainingStep and ProcessingStep only (see here)
Thus this issue is acctually a feature request for cache improvement on TuningStep.
cc the feature expert @brockwade633

Steps to reproduce

  1. Start from this integ test
  2. Update the TuningStep by:
    step_args = tuner.fit(inputs=inputs)
    step_cache_config = CacheConfig(enable_caching=True, expire_after="T12H")
    step_tune = TuningStep(
        name="my-tuning-step",
        step_args=step_args,
        cache_config=step_cache_config
    )
  1. After the first execution, update the pipeline and trigger another one:
        pipeline.update(role)
        execution = pipeline.start(parameters={})
        execution_steps = execution.list_steps()
        print(execution_steps)

@ShwetaSingh801
Copy link
Collaborator

Hi @ZhangPei25,

Thanks for using SageMaker and taking the time to suggest ways to improve SageMaker Python SDK. We have added your feature request it to our backlog of feature requests and may consider putting it into future SDK versions. I will go ahead and close the issue now, please let me know if you have any more feedback. Let me know if you have any other questions.

Best,
Shweta

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: pipelines Relates to the SageMaker Pipeline Platform type: feature request
Projects
None yet
Development

No branches or pull requests

4 participants