tests/integ failed #35

Closed
anfeng opened this issue Jan 1, 2018 · 1 comment

anfeng commented Jan 1, 2018

I am trying to follow the README. The unit tests pass, but the integration tests fail with the errors below. Any suggestions?
`$ tox tests/integ
GLOB sdist-make: /Users/andyfeng/dev/sagemaker-python-sdk/setup.py
py27 inst-nodeps: /Users/andyfeng/dev/sagemaker-python-sdk/.tox/dist/sagemaker-1.0.1.zip
py27 installed: apipkg==1.4,attrs==17.4.0,backports.weakref==1.0.post1,bleach==1.5.0,boto3==1.5.7,botocore==1.8.21,contextlib2==0.5.5,coverage==4.4.2,docutils==0.14,enum34==1.1.6,execnet==1.5.0,funcsigs==1.0.2,futures==3.2.0,html5lib==0.9999999,jmespath==0.9.3,Markdown==2.6.10,mock==2.0.0,numpy==1.13.3,pbr==3.1.1,pluggy==0.6.0,protobuf==3.5.1,py==1.5.2,pytest==3.3.1,pytest-cov==2.5.1,pytest-forked==0.2,pytest-xdist==1.21.0,python-dateutil==2.6.1,s3transfer==0.1.12,sagemaker==1.0.1,scipy==1.0.0,six==1.11.0,teamcity-messages==1.21,tensorflow==1.4.1,tensorflow-tensorboard==0.4.0rc3,Werkzeug==0.14
py27 runtests: PYTHONHASHSEED='3746448766'
py27 runtests: commands[0] | pytest tests/integ
================================================================ test session starts =================================================================
platform darwin -- Python 2.7.14, pytest-3.3.1, py-1.5.2, pluggy-0.6.0 -- /Users/andyfeng/dev/sagemaker-python-sdk/.tox/py27/bin/python2.7
cachedir: .cache
rootdir: /Users/andyfeng/dev/sagemaker-python-sdk, inifile: setup.cfg
plugins: teamcity-messages-1.21, xdist-1.21.0, forked-0.2, cov-2.5.1
collected 7 items

tests/integ/test_kmeans.py::test_kmeans FAILED [ 14%]
tests/integ/test_linear_learner.py::test_linear_learner FAILED [ 28%]
tests/integ/test_mxnet_train.py::test_attach_deploy ERROR [ 42%]
tests/integ/test_mxnet_train.py::test_deploy_model ERROR [ 57%]
tests/integ/test_pca.py::test_pca FAILED [ 71%]
tests/integ/test_tf.py::test_tf FAILED [ 85%]
tests/integ/test_tf_cifar.py::test_cifar FAILED [100%]

=================================================================================== ERRORS ===================================================================================
____________________________________________________________________ ERROR at setup of test_attach_deploy ____________________________________________________________________

sagemaker_session = <sagemaker.session.Session object at 0x10f20b890>

@pytest.fixture(scope='module')
def mxnet_training_job(sagemaker_session):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'mxnet_mnist', 'mnist.py')
        data_path = os.path.join(DATA_DIR, 'mxnet_mnist')

        mx = MXNet(entry_point=script_path, role='SageMakerRole',
                   train_instance_count=1, train_instance_type='ml.c4.xlarge',
                   sagemaker_session=sagemaker_session)

        train_input = mx.sagemaker_session.upload_data(path=os.path.join(data_path, 'train'),
                                                       key_prefix='integ-test-data/mxnet_mnist/train')
        test_input = mx.sagemaker_session.upload_data(path=os.path.join(data_path, 'test'),
                                                      key_prefix='integ-test-data/mxnet_mnist/test')
        mx.fit({'train': train_input, 'test': test_input})

tests/integ/test_mxnet_train.py:47:


.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:517: in fit
super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
self._check_job_status(job_name, description)


self = <sagemaker.session.Session object at 0x10f20b890>, job = 'sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859'
desc = {'AlgorithmSpecification': {'TrainingImage': '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet-py2-cpu:1.0...sagemaker_job_name': '"sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859"', 'sagemaker_program': '"mnist.py"', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
        raise a ValueError.

        Args:
            job (str): The name of the job to check.
            desc (dict[str, str]): The result of ``describe_training_job()``.

        Raises:
            ValueError: If the training job fails.
        """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
--------------------------------------------------------------------------- Captured stdout setup ----------------------------------------------------------------------------
..........................
--------------------------------------------------------------------------- Captured stderr setup ----------------------------------------------------------------------------
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:sagemaker:Creating training-job with name: sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
----------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------
credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
session.py 237 INFO Creating training-job with name: sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859
connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
____________________________________________________________________ ERROR at setup of test_deploy_model _____________________________________________________________________

sagemaker_session = <sagemaker.session.Session object at 0x10f20b890>

@pytest.fixture(scope='module')
def mxnet_training_job(sagemaker_session):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'mxnet_mnist', 'mnist.py')
        data_path = os.path.join(DATA_DIR, 'mxnet_mnist')

        mx = MXNet(entry_point=script_path, role='SageMakerRole',
                   train_instance_count=1, train_instance_type='ml.c4.xlarge',
                   sagemaker_session=sagemaker_session)

        train_input = mx.sagemaker_session.upload_data(path=os.path.join(data_path, 'train'),
                                                       key_prefix='integ-test-data/mxnet_mnist/train')
        test_input = mx.sagemaker_session.upload_data(path=os.path.join(data_path, 'test'),
                                                      key_prefix='integ-test-data/mxnet_mnist/test')
        mx.fit({'train': train_input, 'test': test_input})

tests/integ/test_mxnet_train.py:47:


.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:517: in fit
super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
self._check_job_status(job_name, description)


self = <sagemaker.session.Session object at 0x10f20b890>, job = 'sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859'
desc = {'AlgorithmSpecification': {'TrainingImage': '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-mxnet-py2-cpu:1.0...sagemaker_job_name': '"sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859"', 'sagemaker_program': '"mnist.py"', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
        raise a ValueError.

        Args:
            job (str): The name of the job to check.
            desc (dict[str, str]): The result of ``describe_training_job()``.

        Raises:
            ValueError: If the training job fails.
        """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training sagemaker-mxnet-py2-cpu-2018-01-01-03-36-55-859: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
================================================================================== FAILURES ==================================================================================
________________________________________________________________________________ test_kmeans _________________________________________________________________________________

def test_kmeans():

    with timeout(minutes=15):
        sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name=REGION))
        data_path = os.path.join(DATA_DIR, 'one_p_mnist', 'mnist.pkl.gz')
        pickle_args = {} if sys.version_info.major == 2 else {'encoding': 'latin1'}

        # Load the data into memory as numpy arrays
        with gzip.open(data_path, 'rb') as f:
            train_set, _, _ = pickle.load(f, **pickle_args)

        kmeans = KMeans(role='SageMakerRole', train_instance_count=1,
                        train_instance_type='ml.c4.xlarge',
                        k=10, sagemaker_session=sagemaker_session, base_job_name='test-kmeans')

        kmeans.init_method = 'random'
        kmeans.max_iterators = 1
        kmeans.tol = 1
        kmeans.num_trials = 1
        kmeans.local_init_method = 'kmeans++'
        kmeans.half_life_time_size = 1
        kmeans.epochs = 1
        kmeans.center_factor = 1
        kmeans.fit(kmeans.record_set(train_set[0][:100]))

tests/integ/test_kmeans.py:51:


.tox/py27/lib/python2.7/site-packages/sagemaker/amazon/amazon_estimator.py:96: in fit
super(AmazonAlgorithmEstimatorBase, self).fit(data, **kwargs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
self._check_job_status(job_name, description)


self = <sagemaker.session.Session object at 0x10f2c4e50>, job = 'test-kmeans-2018-01-01-03-32-56-860'
desc = {'AlgorithmSpecification': {'TrainingImage': '174872318107.dkr.ecr.us-west-2.amazonaws.com/kmeans:1', 'TrainingInputMo... 'HyperParameters': {'epochs': '1', 'extra_center_factor': '1', 'feature_dim': '784', 'force_dense': 'True', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
        raise a ValueError.

        Args:
            job (str): The name of the job to check.
            desc (dict[str, str]): The result of ``describe_training_job()``.

        Raises:
            ValueError: If the training job fails.
        """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training test-kmeans-2018-01-01-03-32-56-860: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
---------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------
....................
---------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:sagemaker:Created S3 bucket: sagemaker-us-west-2-379899735384
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:sagemaker:Creating training-job with name: test-kmeans-2018-01-01-03-32-56-860
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------
credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
session.py 163 INFO Created S3 bucket: sagemaker-us-west-2-379899735384
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
session.py 237 INFO Creating training-job with name: test-kmeans-2018-01-01-03-32-56-860
connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
____________________________________________________________________________ test_linear_learner _____________________________________________________________________________

def test_linear_learner():
    with timeout(minutes=15):
        sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name=REGION))
        data_path = os.path.join(DATA_DIR, 'one_p_mnist', 'mnist.pkl.gz')
        pickle_args = {} if sys.version_info.major == 2 else {'encoding': 'latin1'}

        # Load the data into memory as numpy arrays
        with gzip.open(data_path, 'rb') as f:
            train_set, _, _ = pickle.load(f, **pickle_args)

        train_set[1][:100] = 1
        train_set[1][100:200] = 0
        train_set = train_set[0], train_set[1].astype(np.dtype('float32'))

        ll = LinearLearner('SageMakerRole', 1, 'ml.c4.2xlarge', base_job_name='test-linear-learner',
                           sagemaker_session=sagemaker_session)
        ll.binary_classifier_model_selection_criteria = 'accuracy'
        ll.target_reacall = 0.5
        ll.target_precision = 0.5
        ll.positive_example_weight_mult = 0.1
        ll.epochs = 1
        ll.predictor_type = 'binary_classifier'
        ll.use_bias = True
        ll.num_models = 1
        ll.num_calibration_samples = 1
        ll.init_method = 'uniform'
        ll.init_scale = 0.5
        ll.init_sigma = 0.2
        ll.init_bias = 5
        ll.optimizer = 'adam'
        ll.loss = 'logistic'
        ll.wd = 0.5
        ll.l1 = 0.5
        ll.momentum = 0.5
        ll.learning_rate = 0.1
        ll.beta_1 = 0.1
        ll.beta_2 = 0.1
        ll.use_lr_scheduler = True
        ll.lr_scheduler_step = 2
        ll.lr_scheduler_factor = 0.5
        ll.lr_scheduler_minimum_lr = 0.1
        ll.normalize_data = False
        ll.normalize_label = False
        ll.unbias_data = True
        ll.unbias_label = False
        ll.num_point_for_scala = 10000
        ll.fit(ll.record_set(train_set[0][:200], train_set[1][:200]))

tests/integ/test_linear_learner.py:74:


.tox/py27/lib/python2.7/site-packages/sagemaker/amazon/amazon_estimator.py:96: in fit
super(AmazonAlgorithmEstimatorBase, self).fit(data, **kwargs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
self._check_job_status(job_name, description)


self = <sagemaker.session.Session object at 0x113a8c450>, job = 'test-linear-learner-2018-01-01-03-34-54-936'
desc = {'AlgorithmSpecification': {'TrainingImage': '174872318107.dkr.ecr.us-west-2.amazonaws.com/linear-learner:1', 'Trainin...ta_1': '0.1', 'binary_classifier_model_selection_criteria': 'accuracy', 'epochs': '1', 'feature_dim': '784', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
        raise a ValueError.

        Args:
            job (str): The name of the job to check.
            desc (dict[str, str]): The result of ``describe_training_job()``.

        Raises:
            ValueError: If the training job fails.
        """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training test-linear-learner-2018-01-01-03-34-54-936: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
---------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------
....................
---------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:sagemaker:Creating training-job with name: test-linear-learner-2018-01-01-03-34-54-936
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------
credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
session.py 237 INFO Creating training-job with name: test-linear-learner-2018-01-01-03-34-54-936
connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
__________________________________________________________________________________ test_pca __________________________________________________________________________________

def test_pca():
    with timeout(minutes=15):
        sagemaker_session = sagemaker.Session(boto_session=boto3.Session(region_name=REGION))
        data_path = os.path.join(DATA_DIR, 'one_p_mnist', 'mnist.pkl.gz')
        pickle_args = {} if sys.version_info.major == 2 else {'encoding': 'latin1'}

        # Load the data into memory as numpy arrays
        with gzip.open(data_path, 'rb') as f:
            train_set, _, _ = pickle.load(f, **pickle_args)

        pca = sagemaker.amazon.pca.PCA(role='SageMakerRole', train_instance_count=1,
                                       train_instance_type='ml.m4.xlarge',
                                       num_components=48, sagemaker_session=sagemaker_session, base_job_name='test-pca')

        pca.algorithm_mode = 'randomized'
        pca.subtract_mean = True
        pca.extra_components = 5
        pca.fit(pca.record_set(train_set[0][:100]))

tests/integ/test_pca.py:44:


.tox/py27/lib/python2.7/site-packages/sagemaker/amazon/amazon_estimator.py:96: in fit
super(AmazonAlgorithmEstimatorBase, self).fit(data, **kwargs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
self._check_job_status(job_name, description)


self = <sagemaker.session.Session object at 0x113c51ed0>, job = 'test-pca-2018-01-01-03-39-15-456'
desc = {'AlgorithmSpecification': {'TrainingImage': '174872318107.dkr.ecr.us-west-2.amazonaws.com/pca:1', 'TrainingInputMode'...': {'algorithm_mode': 'randomized', 'extra_components': '5', 'feature_dim': '784', 'mini_batch_size': '100', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
        raise a ValueError.

        Args:
            job (str): The name of the job to check.
            desc (dict[str, str]): The result of ``describe_training_job()``.

        Raises:
            ValueError: If the training job fails.
        """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training test-pca-2018-01-01-03-39-15-456: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
---------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------
....................
---------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:sagemaker:Creating training-job with name: test-pca-2018-01-01-03-39-15-456
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------
credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
session.py 237 INFO Creating training-job with name: test-pca-2018-01-01-03-39-15-456
connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
__________________________________________________________________________________ test_tf ___________________________________________________________________________________

sagemaker_session = <sagemaker.session.Session object at 0x1134ce350>

def test_tf(sagemaker_session):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'iris', 'iris-dnn-classifier.py')
        data_path = os.path.join(DATA_DIR, 'iris', 'data')

        estimator = TensorFlow(entry_point=script_path,
                               role='SageMakerRole',
                               training_steps=1,
                               evaluation_steps=1,
                               hyperparameters={'input_tensor_name': 'inputs'},
                               train_instance_count=1,
                               train_instance_type='ml.c4.xlarge',
                               sagemaker_session=sagemaker_session,
                               base_job_name='test-tf')

        inputs = estimator.sagemaker_session.upload_data(path=data_path, key_prefix='integ-test-data/tf_iris')
        estimator.fit(inputs)

tests/integ/test_tf.py:44:


.tox/py27/lib/python2.7/site-packages/sagemaker/tensorflow/estimator.py:166: in fit
fit_super()
.tox/py27/lib/python2.7/site-packages/sagemaker/tensorflow/estimator.py:154: in fit_super
super(TensorFlow, self).fit(inputs, wait, logs, job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:517: in fit
super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:154: in fit
self.latest_training_job.wait(logs=logs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:323: in wait
self.sagemaker_session.logs_for_job(self.job_name, wait=True)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:647: in logs_for_job
self._check_job_status(job_name, description)


self = <sagemaker.session.Session object at 0x1134ce350>, job = 'test-tf-2018-01-01-03-41-00-415'
desc = {'AlgorithmSpecification': {'TrainingImage': '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-py2-cp...ckpoints"', 'evaluation_steps': '1', 'input_tensor_name': '"inputs"', 'sagemaker_container_log_level': '20', ...}, ...}

def _check_job_status(self, job, desc):
    """Check to see if the job completed successfully and, if not, construct and
        raise a ValueError.

        Args:
            job (str): The name of the job to check.
            desc (dict[str, str]): The result of ``describe_training_job()``.

        Raises:
            ValueError: If the training job fails.
        """
    status = desc['TrainingJobStatus']

    if status != 'Completed':
        reason = desc.get('FailureReason', '(No reason provided)')
        raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))

E ValueError: Error training test-tf-2018-01-01-03-41-00-415: Failed Reason: ClientError: SageMaker was unable to assume the role 'arn:aws:iam::379899735384:role/SageMakerRole'

.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:390: ValueError
--------------------------------------------------------------------------- Captured stderr setup ----------------------------------------------------------------------------
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
----------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------
credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials
---------------------------------------------------------------------------- Captured stdout call ----------------------------------------------------------------------------
....................
---------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:sagemaker:Creating training-job with name: test-tf-2018-01-01-03-41-00-415
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Resetting dropped connection: logs.us-west-2.amazonaws.com
----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
session.py 237 INFO Creating training-job with name: test-tf-2018-01-01-03-41-00-415
connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
connectionpool.py 238 INFO Resetting dropped connection: logs.us-west-2.amazonaws.com
_________________________________________________________________________________ test_cifar _________________________________________________________________________________

sagemaker_session = <sagemaker.session.Session object at 0x1150fd850>

def test_cifar(sagemaker_session):
    with timeout(minutes=15):
        script_path = os.path.join(DATA_DIR, 'cifar_10', 'source')

        dataset_path = os.path.join(DATA_DIR, 'cifar_10', 'data')

        estimator = TensorFlow(entry_point='resnet_cifar_10.py', source_dir=script_path, role='SageMakerRole',
                               training_steps=20, evaluation_steps=5,
                               train_instance_count=2, train_instance_type='ml.p2.xlarge',
                               sagemaker_session=sagemaker_session,
                               base_job_name='test-cifar')

        inputs = estimator.sagemaker_session.upload_data(path=dataset_path, key_prefix='data/cifar10')
        estimator.fit(inputs)

tests/integ/test_tf_cifar.py:54:


.tox/py27/lib/python2.7/site-packages/sagemaker/tensorflow/estimator.py:166: in fit
fit_super()
.tox/py27/lib/python2.7/site-packages/sagemaker/tensorflow/estimator.py:154: in fit_super
super(TensorFlow, self).fit(inputs, wait, logs, job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:517: in fit
super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:152: in fit
self.latest_training_job = _TrainingJob.start_new(self, inputs)
.tox/py27/lib/python2.7/site-packages/sagemaker/estimator.py:263: in start_new
hyperparameters=hyperparameters, stop_condition=stop_condition)
.tox/py27/lib/python2.7/site-packages/sagemaker/session.py:239: in train
self.sagemaker_client.create_training_job(**train_request)
.tox/py27/lib/python2.7/site-packages/botocore/client.py:317: in _api_call
return self._make_api_call(operation_name, kwargs)


self = <botocore.client.SageMaker object at 0x1147f4210>, operation_name = 'CreateTrainingJob'
api_params = {'AlgorithmSpecification': {'TrainingImage': '520713654638.dkr.ecr.us-west-2.amazonaws.com/sagemaker-tensorflow-py2-gp...-2-379899735384/data/cifar10'}}}], 'OutputDataConfig': {'S3OutputPath': 's3://sagemaker-us-west-2-379899735384/'}, ...}

def _make_api_call(self, operation_name, api_params):
    operation_model = self._service_model.operation_model(operation_name)
    service_name = self._service_model.service_name
    history_recorder.record('API_CALL', {
        'service': service_name,
        'operation': operation_name,
        'params': api_params,
    })
    if operation_model.deprecated:
        logger.debug('Warning: %s.%s() is deprecated',
                     service_name, operation_name)
    request_context = {
        'client_region': self.meta.region_name,
        'client_config': self.meta.config,
        'has_streaming_input': operation_model.has_streaming_input,
        'auth_type': operation_model.auth_type,
    }
    request_dict = self._convert_to_request_dict(
        api_params, operation_model, context=request_context)

    handler, event_response = self.meta.events.emit_until_response(
        'before-call.{endpoint_prefix}.{operation_name}'.format(
            endpoint_prefix=self._service_model.endpoint_prefix,
            operation_name=operation_name),
        model=operation_model, params=request_dict,
        request_signer=self._request_signer, context=request_context)

    if event_response is not None:
        http, parsed_response = event_response
    else:
        http, parsed_response = self._endpoint.make_request(
            operation_model, request_dict)

    self.meta.events.emit(
        'after-call.{endpoint_prefix}.{operation_name}'.format(
            endpoint_prefix=self._service_model.endpoint_prefix,
            operation_name=operation_name),
        http_response=http, parsed=parsed_response,
        model=operation_model, context=request_context
    )

    if http.status_code >= 300:
        error_code = parsed_response.get("Error", {}).get("Code")
        error_class = self.exceptions.from_code(error_code)
        raise error_class(parsed_response, operation_name)

E ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit for training-job/ml.p2.xlarge is 0 Instances, with current utilization of 0 Instances and a request delta of 2 Instances. Please contact AWS support to request an increase for this limit.

.tox/py27/lib/python2.7/site-packages/botocore/client.py:615: ResourceLimitExceeded
--------------------------------------------------------------------------- Captured stderr setup ----------------------------------------------------------------------------
INFO:botocore.credentials:Found credentials in shared credentials file: ~/.aws/credentials
----------------------------------------------------------------------------- Captured log setup -----------------------------------------------------------------------------
credentials.py 1031 INFO Found credentials in shared credentials file: ~/.aws/credentials
---------------------------------------------------------------------------- Captured stderr call ----------------------------------------------------------------------------
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sts.amazonaws.com
INFO:sagemaker:Creating training-job with name: test-cifar-2018-01-01-03-42-43-090
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
----------------------------------------------------------------------------- Captured log call ------------------------------------------------------------------------------
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): s3.us-west-2.amazonaws.com
connectionpool.py 735 INFO Starting new HTTPS connection (1): sts.amazonaws.com
session.py 237 INFO Creating training-job with name: test-cifar-2018-01-01-03-42-43-090
connectionpool.py 735 INFO Starting new HTTPS connection (1): sagemaker.us-west-2.amazonaws.com
==================================================================== 5 failed, 2 error in 599.21 seconds =====================================================================
ERROR: InvocationError: '/Users/andyfeng/dev/sagemaker-python-sdk/.tox/py27/bin/pytest tests/integ'`

@andremoeller
Contributor

Hi @anfeng,

The test_cifar integration test (tests/integ/test_tf_cifar.py) trains on two ml.p2.xlarge instances, but your AWS account currently has a limit of zero ml.p2.xlarge training instances:

ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateTrainingJob operation: The account-level service limit for training-job/ml.p2.xlarge is 0 Instances, with current utilization of 0 Instances and a request delta of 2 Instances. Please contact AWS support to request an increase for this limit.

You can request a limit increase through AWS Support, or you can modify the integration test to use a different instance type, such as ml.m4.xlarge.
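
As a rough sketch of the second option, the instance type could be read from an environment variable, so the test falls back to an instance type your account has capacity for. The variable name SAGEMAKER_INTEG_INSTANCE_TYPE and the helper integ_instance_type are hypothetical; neither is part of the SDK or its test suite, and ml.m4.xlarge is only the example mentioned above.

```python
import os

# Hypothetical environment variable; not defined by the SDK or its tests.
INSTANCE_TYPE_ENV_VAR = 'SAGEMAKER_INTEG_INSTANCE_TYPE'
# Fallback for accounts with no ml.p2.xlarge training capacity.
DEFAULT_INSTANCE_TYPE = 'ml.m4.xlarge'


def integ_instance_type(environ=None):
    """Return the training instance type the integration tests should use."""
    environ = os.environ if environ is None else environ
    # Treat an unset or blank variable as "use the default".
    value = environ.get(INSTANCE_TYPE_ENV_VAR, '').strip()
    return value or DEFAULT_INSTANCE_TYPE


# In tests/integ/test_tf_cifar.py the estimator could then be created with
# train_instance_type=integ_instance_type() instead of 'ml.p2.xlarge'.
print(integ_instance_type({}))
```

When running under tox, the variable would probably also need to be passed through to the test environment (for example with passenv in tox.ini). This sketch does not guarantee that the CIFAR job finishes within the test's 15-minute timeout on a CPU instance.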

Thanks for using Amazon SageMaker! Please let us know if you have more questions.

laurenyu added a commit to laurenyu/sagemaker-python-sdk that referenced this issue May 31, 2018
apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018
athewsey pushed a commit to athewsey/sagemaker-python-sdk that referenced this issue May 28, 2021
Integration tests for MXNetProcessor and PyTorchProcessor