Limited size of parameters #314

Closed
PedroCardoso opened this issue Jul 26, 2018 · 15 comments
Comments

@PedroCardoso

Please fill out the form below.

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow
  • Python Version: 2.7
  • Python SDK Version: 1.7.0

Describe the problem

When calling TensorFlow from the SDK, we are limited in the size of the parameters:

ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: Value '{sagemaker_requirements="", batch_size=32, evaluation_steps=null, ... sagemaker_job_name="train-image-nature-2018-07-26-11-05-33-968", epochs=10, training_steps=3450}' at 'hyperParameters' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 256, Member must have length greater than or equal to 0]

256 characters is small, in particular if you send a list of labels or have many parameters.

Minimal repro / logs

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/sagemaker/session.pyc in train(self, image, input_mode, input_config, role, job_name, output_config, resource_config, hyperparameters, stop_condition, tags)
    262         LOGGER.info('Creating training-job with name: {}'.format(job_name))
    263         LOGGER.debug('train request: {}'.format(json.dumps(train_request, indent=4)))
--> 264         self.sagemaker_client.create_training_job(**train_request)
    265 
    266     def tune(self, job_name, strategy, objective_type, objective_metric_name,

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/botocore/client.pyc in _api_call(self, *args, **kwargs)
    312                     "%s() only accepts keyword arguments." % py_operation_name)
    313             # The "self" in this scope is referring to the BaseClient.
--> 314             return self._make_api_call(operation_name, kwargs)
    315 
    316         _api_call.__name__ = str(py_operation_name)

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/botocore/client.pyc in _make_api_call(self, operation_name, api_params)
    610             error_code = parsed_response.get("Error", {}).get("Code")
    611             error_class = self.exceptions.from_code(error_code)
--> 612             raise error_class(parsed_response, operation_name)
    613         else:
    614             return parsed_response
@yangaws
Contributor

yangaws commented Jul 30, 2018

Hi @PedroCardoso ,

For each hyperparameter in the map, there is a limit: each key and each value must be no more than 256 characters long.

Regarding what you mentioned: having many hyperparameters won't hit this limit as long as each key and value stays within 256 characters. If a map value is a list with a lot of things in it, though, it can be a problem.

So could you give me a specific example? Then we can either recommend a better practice to you or increase the limit to a more reasonable number.

Thanks

@PedroCardoso
Author

Hi @yangaws

I believe that my particular problem is with sending a list of labels as a parameter. I do need those to build the Estimator.

As an example, think of a parameter that contains a list of 30 or 40 string objects.

@yangaws
Contributor

yangaws commented Jul 31, 2018

@PedroCardoso

I am not confident that we will increase that limit in the near future. I can put in a feature request here. If we keep receiving such issues, we will definitely prioritize this feature.

For now my suggestion is: for your list of 30-40 labels, pass all the labels in as a separate channel, in some common format like JSON.
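
A minimal sketch of that approach with the 1.x Python SDK, assuming a hypothetical labels.json file, bucket/prefix, IAM role, and training script (the channel name 'labels' is arbitrary):

import json

import sagemaker
from sagemaker.tensorflow import TensorFlow

session = sagemaker.Session()
role = 'arn:aws:iam::123456789012:role/SageMakerRole'  # hypothetical role ARN

# Write the labels to a local JSON file instead of a hyperparameter value.
with open('labels.json', 'w') as f:
    json.dump(['beach', 'forest', 'mountain'], f)  # hypothetical labels

# Upload the file to S3; SageMaker will mount it into the container as its own channel.
labels_s3 = session.upload_data('labels.json', key_prefix='train-image-nature/labels')

estimator = TensorFlow(entry_point='train.py',            # hypothetical training script
                       role=role,
                       train_instance_count=1,
                       train_instance_type='ml.p2.xlarge',
                       hyperparameters={'epochs': 10, 'batch_size': 32})

# Pass the JSON file as a second channel next to the training data.
estimator.fit({'training': 's3://my-bucket/train', 'labels': labels_s3})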

@PedroCardoso
Author

Is the channel information present in the parameters passed to estimator_fn()?

@ChoiByungWook
Contributor

Hello,

I don't think the channel information is exposed to estimator_fn(), as seen here: https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L92

I believe only the train_input_fn and eval_input_fn have access to the channels.
https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L116
https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L153

A workaround for this is to use the hyperparameters to store the channel metadata, like:
hp = {'my_channel': 's3://url/labels.json'}
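
For completeness, a rough sketch of how the training script could then resolve that URI (the _load_labels helper is hypothetical; the estimator_fn signature below assumes the legacy-mode TensorFlow container):

import json

import boto3

def _load_labels(s3_uri):
    # Download and parse a JSON file given an 's3://bucket/key' URI.
    bucket, key = s3_uri.replace('s3://', '', 1).split('/', 1)
    body = boto3.client('s3').get_object(Bucket=bucket, Key=key)['Body'].read()
    return json.loads(body)

def estimator_fn(run_config, hyperparameters):
    # The hyperparameter value is only a short S3 URI, so it stays well under the 256-character limit.
    labels = _load_labels(hyperparameters['my_channel'])
    ...  # build and return the tf.estimator.Estimator using `labels`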

@laurenyu
Contributor

Closing due to inactivity. Feel free to reopen if necessary.

@zkghost

zkghost commented Jan 25, 2019

Just hit this issue: I'm using a custom Docker container to train a model and I can't specify the features I want to train on. 👎

@dahnny012

Hitting the same thing too. It's odd that this notebook shows a value larger than 256 in the hyperparameters, but it's actually not supported:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/autogluon-tabular/AutoGluon_Tabular_SageMaker.ipynb

@PedroCardoso
Author

For those hitting this: my solution was to pass the big parameters as a JSON file and send it to the job with a manifest file.

@dahnny012

For those hitting this: my solution was to pass the big parameters as a JSON file and send it to the job with a manifest file.

Do you have a sample for that?

@Lambik

Lambik commented Nov 17, 2020

For those hitting this: my solution was to pass the big parameters as a JSON file and send it to the job with a manifest file.

I too am interested in learning about this, since I'm currently using the hyperparameters for all my image annotation labels in an object recognition case, and apparently there are too many labels.

@pnadolny13

I'm also stuck here. My use case is that I need to set the SAGEMAKER_SPARKML_SCHEMA environment variable when using https://github.com/aws/sagemaker-sparkml-serving-container (required for CSV input), and I also have ~40 features to pass. I don't think this is an uncommon pattern.

@sebasarango1180

Having this issue in the same context as @pnadolny13. CSV inference requires passing the schema as an environment variable to the sagemaker-sparkml-serving Docker container in the SparkMLModel constructor, but I'm getting the following error:

ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: ... failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0].

I'd dare say that 1024 characters is still a very small limit when dealing with high-dimensional schemas (in my case, I must indicate 350+ features along with their data types). Is there any suggested workaround for this?

@PedroCardoso
Author

I send the long parameters as JSON in an S3 blob in a parameters channel.
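
A rough sketch of that pattern, for those asking for a sample (not the exact code from this thread; names and paths are hypothetical): upload the JSON file and pass it as its own input channel as in the earlier example, then read it back inside the training container, where SageMaker mounts each channel under /opt/ml/input/data/<channel name>.

# Launcher side (same idea as the earlier sketch):
#   params_s3 = sagemaker.Session().upload_data('params.json', key_prefix='jobs/params')
#   estimator.fit({'training': train_s3, 'parameters': params_s3})

# Training-script side: read the large parameters from the channel's mount point.
import json

with open('/opt/ml/input/data/parameters/params.json') as f:
    big_params = json.load(f)  # e.g. the full label list or feature schema

labels = big_params['labels']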


@eugeneyarovoi

eugeneyarovoi commented Nov 15, 2023

I have this problem in a new context:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreatePipeline operation: Unable to parse pipeline definition. Model Validation failed: string 's3://long_path/1.jar,s3://long_path/2.jar,s3://long_path/3.jar' with length=260 cannot be greater than max=256.0 defined for ContainerEntrypointString.

The long_path parts are really some long S3 paths specific to me. This happens with this code:

pyspark_processor = PySparkProcessor(
    base_job_name="spark-processor",
    image_uri=image_uri,
    role=role,
)
run_args = pyspark_processor.get_run_args(
    submit_app=entry_point,
    inputs=inputs,
    outputs=outputs,
    submit_py_files=[code_package],
    submit_jars=[jar1, jar2, jar3],
    arguments=arguments,
)
preprocessing_step = ProcessingStep(
    name=step_name,
    processor=pyspark_processor,
    inputs=run_args.inputs,
    outputs=run_args.outputs,
    code=run_args.code,
    job_arguments=run_args.arguments,
)

This is a bit problematic since it's not immediately clear whether there's a workaround. If I need to specify a long list of JARs, there may be no other way to pass this information. Each jar1, jar2, etc. string has the form s3://bucket_name/path_to_jar, so this effectively puts a very small cap on how many JARs there can be.
