Processing job running in a pipeline can't load default experiment #4114
I came up with the following workaround to obtain the experiment and run name:

```python
from sagemaker.utils import retry_with_backoff
import sagemaker.experiments

environment = sagemaker.experiments._environment._RunEnvironment.load()
print(f"{environment.source_arn=}")
job_name = environment.source_arn.split("/")[-1]
print(f"job_name: {job_name}")
experiment_config = retry_with_backoff(
    lambda: sagemaker.Session().describe_processing_job(job_name).get("ExperimentConfig"),
    num_attempts=4,
)
run_name = experiment_config['TrialName']
experiment_name = experiment_config['ExperimentName']
```

Then, I loaded the run:

```python
with sagemaker.experiments.load_run(experiment_name=experiment_name, run_name=run_name) as run:
    products = [models.Product[product_str.upper()] for product_str in products]
    run.log_parameter('testparam', 'schnell')
    run.log_metric('testmetric', 89)
```

In a second attempt, I used:

```python
from sagemaker.utils import retry_with_backoff
import sagemaker.experiments

environment = sagemaker.experiments._environment._RunEnvironment.load()
print(f"{environment.source_arn=}")
job_name = environment.source_arn.split("/")[-1]
print(f"job_name: {job_name}")
response = retry_with_backoff(
    lambda: sagemaker.Session().describe_processing_job(job_name), num_attempts=4
)
run_name = job_name + "-aws-processing-job"
experiment_config = response.get("ExperimentConfig")
run_group_name = experiment_config["TrialName"]
experiment_name = experiment_config["ExperimentName"]
```

And further down:

```python
sagemaker.experiments.run.TRIAL_NAME_TEMPLATE = run_group_name
```

However, this does give me the following error:
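The job-name extraction in the workaround relies only on the shape of the source ARN, so it can be sketched in isolation. The ARN value below is a made-up example, not a real resource:

```python
# Minimal sketch of the job-name extraction used in the workaround above.
# The ARN is a hypothetical example value.
source_arn = (
    "arn:aws:sagemaker:eu-west-1:123456789012:processing-job/"
    "my-pipeline-step-abc123"
)

# A processing-job ARN ends in ".../processing-job/<job name>",
# so the job name is everything after the last slash.
job_name = source_arn.split("/")[-1]
print(job_name)  # my-pipeline-step-abc123
```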
This can also be circumvented, since it seems that the SDK for some reason restricts the length in
However, it still won't work:
Hence, it seems impossible to log to the run name that was already created with the processing job, even when trying manually. The docs promise to do all of that automatically.
@lorenzwalthert I'm facing the exact same issue. I'm creating a TrainingStep for an estimator, and similar to your experience above, I wasn't able to access the trial_name or any of the specified ExperimentConfig inside my run. Were you able to solve this issue any differently?
@lorenzwalthert how about treating the step (be it a TrainingStep, ProcessingStep) as if it was a job: use
That may require a container re-build or complex passing of job names and creation of unique names. My example shows how you can extract all relevant information manually in the container; it's just impossible to log to exactly the same name that was already created.
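The workaround above wraps the DescribeProcessingJob call in retry_with_backoff because the ExperimentConfig may not be available on the first call. The retry pattern itself can be sketched generically; this is a toy stand-in for illustration, not the SDK's actual implementation:

```python
import time

def retry_with_backoff(callable_func, num_attempts=4, base_delay=0.01):
    """Toy stand-in for sagemaker.utils.retry_with_backoff:
    retry a callable with exponentially growing delays."""
    for attempt in range(num_attempts):
        try:
            return callable_func()
        except Exception:
            if attempt == num_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)

# Simulate a call that fails twice before succeeding, the way a
# describe call might before the ExperimentConfig is populated.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("not ready yet")
    return {"ExperimentConfig": {"ExperimentName": "exp", "TrialName": "trial"}}

result = retry_with_backoff(flaky, num_attempts=4)
print(result["ExperimentConfig"]["TrialName"])  # trial
```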
Are there any updates on this issue? This is an important problem for me. |
I think upvoting the initial post and (if you have it) forwarding a link to this issue to AWS Premium Support is the way to go...
Hi @lorenzwalthert, thanks for using SageMaker and taking the time to suggest ways to improve the SageMaker Python SDK. We have added your feature request to our backlog and may consider putting it into future SDK versions. I will go ahead and close the issue now; please let me know if you have any more feedback. Best,
Describe the bug
When I try to load the currently active run with
with sagemaker.experiments.load_run()
in a processing job that is part of a pipeline, without specifying the experiment config anywhere, I get an error from my processing script. When I pass an experiment name and run name to
sagemaker.experiments.load_run()
the problem disappears. However, this is impractical, as I want to log to the current execution ID, which I can't retrieve easily in the processing job. From the intro blog post, it says:
According to the doc page Experiment + Pipeline integration, I don't need to specify an experiment when creating a pipeline, when starting a run, within a run, etc.
To reproduce
I wanted to create a minimal reproducible example, but your latest scikit-learn container does not even contain the SageMaker SDK, so it's a bit difficult...
But here is my abbreviated pipeline script:
And my processing script
script.py
contains:
Expected behavior
I expect that, in accordance with the docs, an experiment config does not have to be specified, neither when creating the pipeline nor in the processing scripts, in order to log to the current execution ID of the pipeline.
Screenshots or logs
System information
A description of your system. Please provide:
Additional context
This bug basically means that sagemaker experiments can't be used in processing steps in pipelines.
Everything seems to work fine if I try to run the same processing job outside of a pipeline using the syntax introduced in the intro blog post: