error with hyperparameters tuning #224

Closed
guilloufre opened this issue Jun 11, 2018 · 3 comments

Comments

@guilloufre

Please fill out the form below.

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): Factorization Machine
  • Framework Version:
  • Python Version: 2.7
  • CPU or GPU: CPU
  • Python SDK Version: 1.4.1
  • Are you using a custom image:

Describe the problem

Hello,
I tried to use the newly released hyperparameter tuning feature described on the AWS blog, but an error was thrown when I launched the following command:
tuner.fit({'train': records_train, 'test': records_val})

This is the error:

Traceback (most recent call last):

  File "<ipython-input-20-9c3ba798ac1a>", line 1, in <module>
    tuner.fit({'train': s3_train_data, 'test': s3_val_data})

  File "/usr/local/lib/python2.7/dist-packages/sagemaker/tuner.py", line 144, in fit
    self.estimator._prepare_for_training(job_name)

  File "/usr/local/lib/python2.7/dist-packages/sagemaker/amazon/amazon_estimator.py", line 117, in _prepare_for_training
    feature_dim = records.feature_dim

AttributeError: 'NoneType' object has no attribute 'feature_dim'

Both records_train and records_val are RecordSet objects. For example, this is records_train:
(<class 'sagemaker.amazon.amazon_estimator.RecordSet'>, {'s3_data_type': 'S3Prefix', 'feature_dim': 2229, 'num_records': 7923, 'channel': 'train', 's3_data': 's3://###############'})

Training the Factorization Machine works if I launch:
fm_estimator.fit(records_train, mini_batch_size=1000)
I also tried providing S3 URIs instead of RecordSet objects with
tuner.fit({'train': s3_train_data, 'test': s3_val_data})
as in the example on the blog, but it throws the same error.

Thanks for helping me with this issue!

@laurenyu
Contributor

hi @guilloufre, thanks for trying out the new hyperparameter tuning feature!

The error occurs because you need to pass a list of RecordSet objects to fit() instead of a dict. (The channel names are already specified in each RecordSet, so there is no need to write them again.)
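
For example, with the two RecordSet objects from the question above, the call would look roughly like this (a minimal sketch reusing the names records_train and records_val from the original snippet):

# Pass the RecordSet objects as a list; each one already carries its channel name
# ('train' / 'test'), so no dict keys are needed.
tuner.fit([records_train, records_val])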

@guilloufre
Author

Thanks, it was actually easy! :)

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018
Fixed: payload was larger than SageMaker limit
@phschimm

phschimm commented Feb 9, 2022

When using the record_set() method referenced here, the RecordSet it creates does not use the required S3DataDistributionType=FullyReplicated; it uses ShardedByS3Key instead.

I'm trying to use the high-level Python API to get metrics about the models I create:

from sagemaker import RandomCutForest
from sagemaker.tuner import HyperparameterTuner, IntegerParameter

rcf = RandomCutForest(..., eval_metrics=['accuracy', 'precision_recall_fscore'])

train_set = rcf.record_set(features,
                           channel='train')

test_set = rcf.record_set(features,
                          labels=labels,
                          channel='test')

tuner = HyperparameterTuner(estimator=rcf,
                            objective_metric_name='test:f1',
                            hyperparameter_ranges={'num_samples_per_tree': IntegerParameter(32, 512),
                                                   'num_trees': IntegerParameter(50, 1000)},
                            max_jobs=1,
                            max_parallel_jobs=1)

tuner.fit([train_set, test_set])

When I execute this code, I get the following error in the AWS SageMaker console:

Failure reason:
ClientError: Unable to initialize the algorithm. Failed to validate input data configuration. (caused by ValidationError)
Caused by: 'ShardedByS3Key' is not one of ['FullyReplicated']
Failed validating 'enum' in schema['properties']['test']['properties']['S3DistributionType']:
    {'enum': ['FullyReplicated'], 'type': 'string'}
On instance['test']['S3DistributionType']:
    'ShardedByS3Key'

Is there a manual way to create a RecordSet with the correct S3DataDistributionType via the Python API?
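
One possible workaround (an untested sketch; it assumes the SDK still builds each RecordSet's channel through its internal records_s3_input() helper, which may differ between SDK versions) is to rewrap the test RecordSet in a small subclass that forces FullyReplicated:

from sagemaker.amazon.amazon_estimator import RecordSet
from sagemaker.inputs import TrainingInput

# Hypothetical helper, not part of the SageMaker SDK: it overrides the
# hard-coded ShardedByS3Key distribution that RecordSet normally requests.
class FullyReplicatedRecordSet(RecordSet):
    def records_s3_input(self):
        return TrainingInput(self.s3_data,
                             distribution='FullyReplicated',
                             s3_data_type=self.s3_data_type)

# Upload the data with record_set() as before, then rewrap the result.
uploaded = rcf.record_set(features, labels=labels, channel='test')
test_set = FullyReplicatedRecordSet(uploaded.s3_data,
                                    num_records=uploaded.num_records,
                                    feature_dim=uploaded.feature_dim,
                                    s3_data_type=uploaded.s3_data_type,
                                    channel='test')

tuner.fit([train_set, test_set]) can then be called exactly as above.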
