ValueError(f"Bad value for instance type: '{instance_type}'") #3993

Closed
mingchun-liu opened this issue Jul 12, 2023 · 7 comments · Fixed by #4065
Labels
component: pipelines Relates to the SageMaker Pipeline Platform type: bug

Comments

@mingchun-liu
Contributor

mingchun-liu commented Jul 12, 2023

Describe the bug
Good Morning, I get an error when trying to create a pipeline execution, after updating sagemaker, as follows:

...Traceback Messages...
  File "blah/blah", line xxxx, in get_estimator
    estimator = Estimator(
  File "/var/task/sagemaker/estimator.py", line 2896, in __init__
    super(Estimator, self).__init__(
  File "/var/task/sagemaker/estimator.py", line 622, in __init__
    raise ValueError(f"Bad value for instance type: '{instance_type}'")
  File "/var/task/sagemaker/workflow/entities.py", line 86, in __str__
    raise TypeError(
TypeError: Pipeline variables do not support __str__ operation. Please use `.to_string()` to convert it to string type in execution time or use `.expr` to translate it to Json for display purpose in Python SDK.
+ [[ False = True ]]
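For anyone triaging: the mechanics of the traceback can be reproduced with toy stand-ins (this is a minimal self-contained sketch, not the real SDK classes). The key point is that building the ValueError message with an f-string implicitly calls __str__ on the pipeline variable, which is what raises the TypeError:

```python
class PipelineVariable:
    """Toy stand-in for sagemaker.workflow.entities.PipelineVariable."""

    def __str__(self):
        raise TypeError(
            "Pipeline variables do not support __str__ operation. "
            "Please use `.to_string()` instead."
        )


def validate_instance_type(instance_type):
    # Mirrors the failing check: anything that isn't a plain
    # instance-type string is rejected, and the rejection message
    # itself stringifies the offending value.
    if not (isinstance(instance_type, str) and instance_type.startswith("ml.")):
        raise ValueError(f"Bad value for instance type: '{instance_type}'")


validate_instance_type("ml.m5.xlarge")  # plain string: passes

try:
    validate_instance_type(PipelineVariable())
except TypeError as exc:
    # We get the TypeError from __str__, not the intended ValueError.
    print(f"TypeError: {exc}")
```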

First, the TypeError's suggestion doesn't help here: I don't want to call .to_string() on the pipeline variable right off the bat, because I don't yet know which instance type I'll need. I want to dynamically assign an instance_type after I've estimated the resources required to run a job. This resource estimation is part of our pipeline.

To reproduce
Try to create an Estimator with instance_type set to a PipelineVariable using a recent version of sagemaker (2.171.0).

Expected behavior
Before recent changes I was able to dynamically assign an instance type. Now I have to use some kind of workaround, such as a ConditionStep combined with JsonGet'ing the type, or some other workaround if you can suggest one.

Is this expected behavior? Is the intent that each step be assigned an instance_type at pipeline execution start, or is there another reason? If not, assigning an instance_type late is important to us: we want to accurately estimate the resources for a training job rather than overprovision or guess too early. Some of our files may have high compression ratios, so we cannot accurately gauge their size early on.

@eugeneyarovoi

eugeneyarovoi commented Jul 12, 2023

To summarize the problem a bit more compactly: as of this writing, the newer versions of the sagemaker Python package (e.g. 2.171.0) require the instance_type passed to Estimator to be a str or ParameterString. But ours is a different kind of PipelineVariable, namely a JsonGet, where we read the value from a property file. This causes an error when we try to create our pipeline as before; it was valid in earlier versions. I don't see any clear reason the value shouldn't be allowed to be a JsonGet, ExecutionVariable, or one of the other kinds of PipelineVariable.

Is there a reason it was changed now, or is it a bug?

@svia3 svia3 added the component: pipelines Relates to the SageMaker Pipeline Platform label Jul 18, 2023
@mingchun-liu
Contributor Author

mingchun-liu commented Jul 19, 2023

Hi @evakravi, pinging you because you're the author of the change mentioned above. We noticed the recent change doesn't evaluate the ParameterString's runtime value; it only checks the default_value. This can let an invalid instance_type pass as long as the default is valid for the volume.
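To illustrate what we mean (a toy simplification, assumed behavior rather than the SDK's actual code): a check that looks only at default_value never sees the value substituted at execution start, so it passes whenever the default happens to look valid.

```python
class ParameterString:
    """Toy stand-in for a pipeline string parameter."""

    def __init__(self, name, default_value=None):
        self.name = name
        self.default_value = default_value


def default_only_check(param):
    # Passes whenever the *default* looks like a valid instance type,
    # regardless of what the pipeline execution later substitutes.
    return isinstance(param.default_value, str) and param.default_value.startswith("ml.")


param = ParameterString("TrainingInstanceType", default_value="ml.m5.xlarge")
print(default_only_check(param))  # True at pipeline-definition time
# At execution start the caller may override the parameter with a type
# whose volume limits differ; default_only_check never sees that value.
```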

We're considering creating a PR to modify this logic. We want to use a PipelineVariable akin to a JsonGet (as we've done before), allowing us to assign an instance_type later in the pipeline execution instead of at the start. So basically we need to bypass the check altogether because we can't know the volume ahead of time.
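The logic we have in mind looks roughly like this (a hedged sketch of our proposal with toy stand-in classes, not an actual SDK patch): skip client-side validation whenever instance_type is a pipeline variable, deferring any check to execution time when the value is resolved.

```python
class PipelineVariable:
    """Toy stand-in for the SDK's pipeline-variable base class."""


class JsonGet(PipelineVariable):
    """Toy stand-in for a JsonGet reading a value from a property file."""

    def __init__(self, json_path):
        self.json_path = json_path


def validate_instance_type(instance_type):
    if isinstance(instance_type, PipelineVariable):
        # Resolved at execution time (e.g. from a property file via
        # JsonGet); nothing meaningful to check on the client.
        return
    if not (isinstance(instance_type, str) and instance_type.startswith("ml.")):
        raise ValueError(f"Bad value for instance type: '{instance_type}'")


validate_instance_type(JsonGet("$.recommended_instance_type"))  # no error
validate_instance_type("ml.m5.xlarge")                          # no error
```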

Could you please provide more context for your change, to help us understand if our approach is okay?

@qidewenwhen
Member

A note for future on-calls and contributors: this appears to be a similar issue to #3917.

A fix (#3972) was merged and released in v2.171.0, but it does not seem to cover this issue's (#3993) case.

@qidewenwhen
Member

Putting a summary here for the regression resulting from #3870:

Tagging the change owner @evakravi for more context: can we fix this issue and make your change backward compatible?

@qidewenwhen
Member

qidewenwhen commented Aug 10, 2023

Also, it seems we're missing test cases for passing pipeline variables other than ParameterString (e.g. JsonGet) as instance_type.

Hi @mlsquareup, as you've mentioned:

We're considering creating a PR to modify this logic.

Have you opened a PR, or made the change locally already?

Thanks!

@eugeneyarovoi

I work with @mlsquareup. We haven't created such a PR yet, as we were first looking for feedback on why the change was made in the first place. We'd still prefer that someone more familiar with the codebase make the change (this is our first time looking at the source), if that's possible and something folks are interested in addressing.

@qidewenwhen
Member

Thanks @eugeneyarovoi for confirming!
I was asking this to avoid duplicate work. I'll reach out to the change owner offline to understand the change and work on a fix.
