Limited size of parameters #314

Closed
PedroCardoso opened this issue Jul 26, 2018 · 15 comments
Comments

@PedroCardoso

Please fill out the form below.

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow
  • Python Version: 2.7
  • Python SDK Version: 1.7.0

Describe the problem

When calling TensorFlow from the SDK, we are limited in the size of the parameters:

ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: Value '{sagemaker_requirements="", batch_size=32, evaluation_steps=null, ... sagemaker_job_name="train-image-nature-2018-07-26-11-05-33-968", epochs=10, training_steps=3450}' at 'hyperParameters' failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 256, Member must have length greater than or equal to 0]

256 characters is small, in particular if you send a list of labels or have many parameters.

Minimal repro / logs

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/sagemaker/session.pyc in train(self, image, input_mode, input_config, role, job_name, output_config, resource_config, hyperparameters, stop_condition, tags)
    262         LOGGER.info('Creating training-job with name: {}'.format(job_name))
    263         LOGGER.debug('train request: {}'.format(json.dumps(train_request, indent=4)))
--> 264         self.sagemaker_client.create_training_job(**train_request)
    265 
    266     def tune(self, job_name, strategy, objective_type, objective_metric_name,

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/botocore/client.pyc in _api_call(self, *args, **kwargs)
    312                     "%s() only accepts keyword arguments." % py_operation_name)
    313             # The "self" in this scope is referring to the BaseClient.
--> 314             return self._make_api_call(operation_name, kwargs)
    315 
    316         _api_call.__name__ = str(py_operation_name)

.../envs/sagemaker_tf_27/lib/python2.7/site-packages/botocore/client.pyc in _make_api_call(self, operation_name, api_params)
    610             error_code = parsed_response.get("Error", {}).get("Code")
    611             error_class = self.exceptions.from_code(error_code)
--> 612             raise error_class(parsed_response, operation_name)
    613         else:
    614             return parsed_response
@yangaws
Contributor

yangaws commented Jul 30, 2018

Hi @PedroCardoso ,

For each hyperparameter in the map, there is a limit: each key and each value must be no more than 256 characters long.

Regarding what you mentioned: having many hyperparameters won't hit this limit as long as each key and value stays within 256 characters. If a map value is a list with a lot of things in it, though, it can be a problem.

So could you give me a specific example? Then we can either recommend a better practice to you or increase the limit to a more reasonable number.

Thanks

@PedroCardoso
Author

Hi @yangaws

I believe that my particular problem is with sending a list of labels as a parameter. I do need those to build the Estimator.

As an example, think of a parameter that contains a list of 30 or 40 string objects.

@yangaws
Contributor

yangaws commented Jul 31, 2018

@PedroCardoso

I am not confident that we will increase that limit in the near future. I can put in a feature request here. If we keep receiving such issues, we will definitely prioritize this feature.

For now my suggestion is: for your list of 30-40 labels, pass all the labels in as a separate channel, in some common format like JSON.
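
A minimal sketch of that approach with the 1.x Python SDK, assuming a hypothetical labels.json file, bucket/prefix, IAM role, and training script (the channel name 'labels' is arbitrary):

import json

import sagemaker
from sagemaker.tensorflow import TensorFlow

session = sagemaker.Session()
role = 'arn:aws:iam::123456789012:role/SageMakerRole'  # hypothetical role ARN

# Write the labels to a local JSON file instead of a hyperparameter value.
with open('labels.json', 'w') as f:
    json.dump(['beach', 'forest', 'mountain'], f)  # hypothetical labels

# Upload the file to S3; SageMaker will mount it into the container as its own channel.
labels_s3 = session.upload_data('labels.json', key_prefix='train-image-nature/labels')

estimator = TensorFlow(entry_point='train.py',            # hypothetical training script
                       role=role,
                       train_instance_count=1,
                       train_instance_type='ml.p2.xlarge',
                       hyperparameters={'epochs': 10, 'batch_size': 32})

# Pass the JSON file as a second channel next to the training data.
estimator.fit({'training': 's3://my-bucket/train', 'labels': labels_s3})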

@PedroCardoso
Author

Is the channel information present in the parameters passed to estimator_fn()?

@ChoiByungWook
Contributor

Hello,

I don't think the channel information is exposed to estimator_fn(), as seen here: https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L92

I believe only the train_input_fn and eval_input_fn have access to the channels.
https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L116
https://github.com/aws/sagemaker-tensorflow-container/blob/master/src/tf_container/trainer.py#L153

A workaround for this is to use the hyperparameters to store the channel metadata, like:
hp = {'my_channel': 's3://url/labels.json'}
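
For completeness, a rough sketch of how the training script could then resolve that URI (the _load_labels helper is hypothetical; the estimator_fn signature below assumes the legacy-mode TensorFlow container):

import json

import boto3

def _load_labels(s3_uri):
    # Download and parse a JSON file given an 's3://bucket/key' URI.
    bucket, key = s3_uri.replace('s3://', '', 1).split('/', 1)
    body = boto3.client('s3').get_object(Bucket=bucket, Key=key)['Body'].read()
    return json.loads(body)

def estimator_fn(run_config, hyperparameters):
    # The hyperparameter value is only a short S3 URI, so it stays well under the 256-character limit.
    labels = _load_labels(hyperparameters['my_channel'])
    ...  # build and return the tf.estimator.Estimator using `labels`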

@laurenyu
Contributor

Closing due to inactivity. Feel free to reopen if necessary.

@zkghost

zkghost commented Jan 25, 2019

Just hit this issue: I'm using a custom Docker container to train a model and I can't specify the features I want to train on. 👎

@dahnny012

Hitting the same thing too. It's odd that this notebook shows a value larger than 256 in the hyperparameters, but it's actually not supported:

https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/autogluon-tabular/AutoGluon_Tabular_SageMaker.ipynb

@PedroCardoso
Author

For those hitting this: my solution was to pass the big parameters as a JSON file and send it to the job with a manifest file.

@dahnny012

For those hitting this: my solution was to pass the big parameters as a JSON file and send it to the job with a manifest file.

Do you have a sample for that?

@Lambik

Lambik commented Nov 17, 2020

For those hitting this: my solution was to pass the big parameters as a JSON file and send it to the job with a manifest file.

I too am interested in learning about this, since I'm currently using the hyperparameters for all my image annotation labels in an object recognition case, and apparently there are too many labels.

@pnadolny13

I'm also stuck here. My use case is that I need to set the SAGEMAKER_SPARKML_SCHEMA environment variable when using https://github.com/aws/sagemaker-sparkml-serving-container (required for CSV input), and I also have ~40 features to pass. I don't think this is an uncommon pattern.

@sebasarango1180

Having this issue in the same context as @pnadolny13. CSV inference requires passing the schema as an environment variable to the sagemaker-sparkml-serving Docker container in the SparkMLModel constructor, but I'm getting the following error:

ClientError: An error occurred (ValidationException) when calling the CreateTrainingJob operation: 1 validation error detected: ... failed to satisfy constraint: Map value must satisfy constraint: [Member must have length less than or equal to 1024, Member must have length greater than or equal to 0].

I'd dare say that 1024 characters is still a very small limit when dealing with high-dimensional schemas (in my case, I must indicate 350+ features along with their data types). Is there any suggested workaround for this?

@PedroCardoso
Author

I send the long parameters as JSON in an S3 blob in a parameters channel.
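
A rough sketch of that pattern, for those asking for a sample (not the exact code from this thread; names and paths are hypothetical): upload the JSON file and pass it as its own input channel as in the earlier example, then read it back inside the training container, where SageMaker mounts each channel under /opt/ml/input/data/<channel name>.

# Launcher side (same idea as the earlier sketch):
#   params_s3 = sagemaker.Session().upload_data('params.json', key_prefix='jobs/params')
#   estimator.fit({'training': train_s3, 'parameters': params_s3})

# Training-script side: read the large parameters from the channel's mount point.
import json

with open('/opt/ml/input/data/parameters/params.json') as f:
    big_params = json.load(f)  # e.g. the full label list or feature schema

labels = big_params['labels']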


@eugeneyarovoi

eugeneyarovoi commented Nov 15, 2023

I have this problem in a new context:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the CreatePipeline operation: Unable to parse pipeline definition. Model Validation failed: string 's3://long_path/1.jar,s3://long_path/2.jar,s3://long_path/3.jar' with length=260 cannot be greater than max=256.0 defined for ContainerEntrypointString.

The long_path parts are really some long S3 paths specific to me. This happens with this code:

pyspark_processor = PySparkProcessor(
    base_job_name="spark-processor",
    image_uri=image_uri,
    role=role,
)
run_args = pyspark_processor.get_run_args(
    submit_app=entry_point,
    inputs=inputs,
    outputs=outputs,
    submit_py_files=[code_package],
    submit_jars=[jar1, jar2, jar3],
    arguments=arguments,
)
preprocessing_step = ProcessingStep(
    name=step_name,
    processor=pyspark_processor,
    inputs=run_args.inputs,
    outputs=run_args.outputs,
    code=run_args.code,
    job_arguments=run_args.arguments,
)

This is a bit problematic since it's not immediately clear whether there's a workaround. If I need to specify a long list of JARs, there may be no other way to pass this information. Each jar1, jar2, etc. string has the form s3://bucket_name/path_to_jar, so this effectively puts a very small cap on how many JARs there can be.
