Unable to pass eval_metrics to KMeans estimator #889

rddefauw · 2019-06-27T23:12:44Z


System Information

Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): KMeans (Sagemaker built-in algorithm)
Framework Version: n/a
Python Version: Python 3.6.5 :: Anaconda, Inc.
CPU or GPU: CPU
Python SDK Version: sagemaker==1.28.3
Are you using a custom image: no

Describe the problem

I am trying to pass in the eval_metrics parameter to the KMeans estimator:

kmeans = KMeans(role=role,
                train_instance_count=1,
                train_instance_type='ml.c4.xlarge',
                output_path='s3://____',
                k=3,
                eval_metrics=["msd", "ssd"])

That's the example value for eval_metrics used in the unit test for KMeans. However, when I run the training job I get this error:

[06/27/2019 22:30:58 ERROR 139696703387456] Customer Error: Hyperparameter must be valid json, but found eval_metrics: (caused by ValueError)

Caused by: No JSON object could be decoded

I tried several formats including 'eval_metrics': '[\"msd\",\"ssd\"]'.

However I am able to pass in the parameters if I use boto3:

import boto3
client = boto3.client('sagemaker')
response = client.create_training_job(
    TrainingJobName='rdevalmetrics',
    HyperParameters={
        'feature_dim': '34',
        'k': '3',
        'eval_metrics': '[\"msd\",\"ssd\"]'
    },
    AlgorithmSpecification={
        'TrainingImage': '174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:1',
        'TrainingInputMode': 'File'
    },
    RoleArn='arn:aws:iam::____:role/service-role/AmazonSageMaker-ExecutionRole-20180717T085401',
    InputDataConfig=[
        {
            'ChannelName': 'train',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'ManifestFile',
                    'S3Uri': 's3://sagemaker-us-west-2-____/sagemaker-record-sets/KMeans-2019-06-27-22-48-57-424/.amazon.manifest',
                    'S3DataDistributionType': 'FullyReplicated'
                }
            }
        }
    ],
    OutputDataConfig={
        'S3OutputPath': 's3://___/kmeanstest'
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 600
    },
    ResourceConfig={
        'InstanceType': 'ml.m4.xlarge',
        'InstanceCount': 1,
        'VolumeSizeInGB': 50,
    }
)

That job completed successfully.
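For comparison, the hand-escaped string in the boto3 call appears to be exactly what json.dumps produces for the list when compact separators are used. A minimal sketch, assuming only the standard library; the variable names are illustrative and nothing here calls SageMaker:

```python
import json

# Build the same hyperparameter value as in the boto3 call, but let
# json.dumps produce the JSON text instead of escaping quotes by hand.
eval_metrics = json.dumps(["msd", "ssd"], separators=(",", ":"))

hyperparameters = {
    "feature_dim": "34",
    "k": "3",
    "eval_metrics": eval_metrics,
}

# The value round-trips through a JSON parser back to the original list.
decoded = json.loads(hyperparameters["eval_metrics"])
print(hyperparameters["eval_metrics"], decoded)
```

Generating the value this way avoids quoting mistakes when the list of metrics changes.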

Minimal repro / logs

See above code.


icywang86rui · 2019-07-01T20:31:15Z

@rddefauw When did you see the error: after creating the estimator or during training? I ran the same code and it seems to work fine:

~/work/test$ cat test_kmeans.py
from sagemaker.amazon.kmeans import KMeans, KMeansPredictor

kmeans = KMeans(role='role',
                train_instance_count=1,
                train_instance_type='ml.c4.xlarge',
                output_path='s3://____',
                k=3,
                eval_metrics=["msd", "ssd"])

print(kmeans.hyperparameters())
~/work/test$ python test_kmeans.py
{'force_dense': 'True', 'k': '3', 'eval_metrics': "['msd', 'ssd']"}

rddefauw · 2019-07-01T23:10:41Z

I saw it during training.


chuyang-deng · 2019-07-08T20:56:35Z

Hi @rddefauw, can you create a hyperparameter object and assign it to eval_metrics? From the information you provided, I think we need to make some changes in the Python SDK code. In the meantime, assigning a hyperparameter object to eval_metrics might be a workaround.

Thanks.

rddefauw · 2019-07-08T21:55:16Z

I'm not entirely sure what you mean. An object of what type?

I tried passing in a dictionary, as with the low-level SDK:

kmeans = KMeans(role=role,
                train_instance_count=1,
                train_instance_type='ml.c4.xlarge',
                output_path='s3://rdnocdata/counties/',
                k=num_clusters,
                eval_metrics={
                    'eval_metrics': '[\"msd\",\"ssd\"]'
                })

But I got the same output from kmeans.fit:

[07/08/2019 21:54:14 ERROR 139978699167552] Customer Error: Hyperparameter must be valid json, but found eval_metrics: (caused by ValueError)

Caused by: No JSON object could be decoded

chuyang-deng · 2019-07-08T22:51:23Z

Sorry, I should have been more specific: I meant a hyperparameter object like the ones defined here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/amazon/kmeans.py#L43

rddefauw · 2019-07-09T15:50:55Z

I guess I'm not clear on what you're suggesting. The HP object is a descriptor so I tried just using the set method:

kmeans.eval_metrics = ["msd","ssd"]

But that resulted in the same error.

ChoiByungWook · 2019-07-10T01:03:36Z

Hello @rddefauw,

I just ran an example with the same eval_metrics hyperparameter value and was able to reproduce the error. Going to investigate now.

[07/10/2019 01:00:56 ERROR 139636592990016] Customer Error: Hyperparameter must be valid json, but found eval_metrics: (caused by ValueError)

Caused by: No JSON object could be decoded

ChoiByungWook · 2019-07-10T02:07:17Z

@rddefauw,

It looks like a bug on our end. We end up sending the list as a plain string, as defined here: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/amazon/hyperparameter.py#L70. The algorithm expects incoming list input to be JSON formatted, but we only convert the list with str(), which produces single-quoted items that are not valid JSON.

Here is an easy repro showcasing this, which can be run using python -s.

CURRENT

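import json
incorrect_list_json = {'eval_metrics': str(['msd', 'ssd'])}
raw_value = incorrect_list_json['eval_metrics']  # "['msd', 'ssd']" (single quotes)
json.loads(raw_value)  # ERROR: raises ValueError, single-quoted items are not valid JSON

CORRECT

import json
correct_list_json = {'eval_metrics': json.dumps(['msd', 'ssd'])}
raw_value = correct_list_json['eval_metrics']  # '["msd", "ssd"]' (double quotes)
json.loads(raw_value)  # CORRECT: returns ['msd', 'ssd']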

json.dumps() works with int, float, boolean, and list values, which should handle all of the existing hyperparameters correctly.
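A rough sketch of what list-aware serialization could look like. This is an illustration only, not the code in the SDK or in any PR: serialize_hyperparameters is a hypothetical helper, and it assumes that only lists are JSON-encoded while every other value keeps its str() form, so that non-list values stay as they appeared in the hyperparameters() output earlier in this thread.

```python
import json


def serialize_hyperparameters(hyperparameters):
    """Hypothetical helper: JSON-encode lists, str() everything else."""
    return {
        name: json.dumps(value) if isinstance(value, list) else str(value)
        for name, value in hyperparameters.items()
    }


raw = {'force_dense': True, 'k': 3, 'eval_metrics': ['msd', 'ssd']}
serialized = serialize_hyperparameters(raw)

# The list value is now valid JSON and decodes back to the original list.
assert json.loads(serialized['eval_metrics']) == ['msd', 'ssd']
print(serialized)
```

Whether booleans and numbers should also go through json.dumps depends on what the algorithm container accepts, which this sketch does not try to settle.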

I'll submit a PR.

Thank you for bringing this to our attention.

ChoiByungWook · 2019-07-10T23:01:53Z

PR: #922

laurenyu · 2019-07-19T20:39:56Z

released in 1.34.0: https://github.com/aws/sagemaker-python-sdk/blob/master/CHANGELOG.md#v1340-2019-07-18
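To check whether an installed SDK version is at or past that release, a small comparison like the following may help. It is a sketch that assumes plain numeric MAJOR.MINOR.PATCH version strings; includes_list_fix is a hypothetical helper, not part of the SDK.

```python
def version_tuple(version):
    """Turn a string such as '1.28.3' into (1, 28, 3) for comparison."""
    return tuple(int(part) for part in version.split('.')[:3])


def includes_list_fix(installed_version):
    """Hypothetical check: True if the version is 1.34.0 or later."""
    return version_tuple(installed_version) >= (1, 34, 0)


# 1.28.3 is the version from the original report.
print(includes_list_fix('1.28.3'))
print(includes_list_fix('1.34.0'))
```

Comparing tuples of integers avoids the string-comparison pitfall where '1.4.0' would sort after '1.34.0'.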


chuyang-deng added the type: bug and contributions welcome labels Jul 8, 2019

ChoiByungWook added the In progress label Jul 10, 2019

ChoiByungWook mentioned this issue Jul 10, 2019

change: fix list serialization for 1P algos #922

Merged

4 tasks

ChoiByungWook added the status: pending release label (the fix has been merged but not yet released to PyPI) and removed the contributions welcome and In progress labels Jul 11, 2019

laurenyu closed this as completed Jul 19, 2019

nmadan pushed a commit to nmadan/sagemaker-python-sdk that referenced this issue Apr 18, 2023

Set AWS_DEFAULT_REGION environment variable (aws#889)

c6d1c1a

Co-authored-by: Zhankui Lu <[email protected]>

nmadan pushed a commit to nmadan/sagemaker-python-sdk that referenced this issue Apr 18, 2023

Set AWS_DEFAULT_REGION environment variable (aws#889)

3ad8eb9

Co-authored-by: Zhankui Lu <[email protected]>
