Failed Reason: AlgorithmError: uncaught exception during training: features should be a dictionary of Tensors. Given type: <type 'function'> #153

Closed
ghost opened this issue Apr 17, 2018 · 8 comments

@ghost

ghost commented Apr 17, 2018

I'm not exactly sure what happened. All of a sudden, all of my training jobs fail even though I made no code changes. The log definitely shows authentication issues with AWS credentials, even though I am training from the hosted Jupyter notebook and my session is active.

This is how I am constructing the classifier.

classifier = TensorFlow(entry_point='sm_transcript_classifier_ep.py',
                        role=role,
                        training_steps=1e4,
                        evaluation_steps=100,
                        train_instance_count=1,
                        train_instance_type=INSTANCE_TYPE,
                        hyperparameters={
                            "question": QUESTION,
                            "n_words": _get_n_words()
                        })

model function:

def estimator_fn(run_config, params):
    bow_column = tf.feature_column.categorical_column_with_identity(
        WORDS_FEATURE, num_buckets=params["n_words"])
    bow_embedding_column = tf.feature_column.embedding_column(
        bow_column, dimension=EMBEDDING_SIZE, combiner="sqrtn")
    return tf.estimator.LinearClassifier(
        feature_columns=[bow_embedding_column],
        config=run_config
        #loss_reduction=tf.losses.Reduction.SUM_BY_NONZERO_WEIGHTS #this doesn't work even though SageMaker should support TF 1.6??
    )

Full error log:

...........................................................
2018-04-17 20:34:49,194 INFO - root - running container entrypoint
2018-04-17 20:34:49,194 INFO - root - starting train task
2018-04-17 20:34:49,199 INFO - container_support.training - Training starting
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-04-17 20:34:51,095 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTP connection (1): 169.254.170.2
2018-04-17 20:34:51,305 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): sagemaker-us-east-1-245511257894.s3.amazonaws.com
2018-04-17 20:34:51,983 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com
2018-04-17 20:34:52,246 INFO - tf_container - ----------------------TF_CONFIG--------------------------
2018-04-17 20:34:52,246 INFO - tf_container - {"environment": "cloud", "cluster": {"master": ["algo-1:2222"]}, "task": {"index": 0, "type": "master"}}
2018-04-17 20:34:52,246 INFO - tf_container - ---------------------------------------------------------
2018-04-17 20:34:52,246 INFO - tf_container - creating RunConfig:
2018-04-17 20:34:52,246 INFO - tf_container - {'save_checkpoints_secs': 300}
2018-04-17 20:34:52,247 INFO - tensorflow - TF_CONFIG environment variable: {u'environment': u'cloud', u'cluster': {u'master': [u'algo-1:2222']}, u'task': {u'index': 0, u'type': u'master'}}
2018-04-17 20:34:52,247 INFO - tf_container - invoking estimator_fn
2018-04-17 20:34:52,247 INFO - tensorflow - Using config: {'_save_checkpoints_secs': 300, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': u'master', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fb3b40d4190>, '_model_dir': u's3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-17-20-30-05-729/checkpoints', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}
2018-04-17 20:34:52,248 INFO - tensorflow - Skip starting Tensorflow server as there is only one node in the cluster.
2018-04-17 20:34:52.265465: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/config and using profilePrefix = 1
2018-04-17 20:34:52.267103: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/credentials and using profilePrefix = 0
2018-04-17 20:34:52.267120: I tensorflow/core/platform/s3/aws_logging.cc:54] Setting provider to read credentials from /root//.aws/credentials for credentials file and /root//.aws/config for the config file , for use with profile default
2018-04-17 20:34:52.267133: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating HttpClient with max connections2 and scheme http
2018-04-17 20:34:52.267154: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 2
2018-04-17 20:34:52.267175: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating TaskRole with default ECSCredentialsClient and refresh rate 900000
2018-04-17 20:34:52.267213: I tensorflow/core/platform/s3/aws_logging.cc:54] Unable to open config file /root//.aws/credentials for reading.
2018-04-17 20:34:52.267228: I tensorflow/core/platform/s3/aws_logging.cc:54] Failed to reload configuration.
2018-04-17 20:34:52.267238: I tensorflow/core/platform/s3/aws_logging.cc:54] Unable to open config file /root//.aws/config for reading.
2018-04-17 20:34:52.267244: I tensorflow/core/platform/s3/aws_logging.cc:54] Failed to reload configuration.
2018-04-17 20:34:52.267255: I tensorflow/core/platform/s3/aws_logging.cc:54] Credentials have expired or will expire, attempting to repull from ECS IAM Service.
2018-04-17 20:34:52.267342: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2018-04-17 20:34:52.267357: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-17 20:34:52.271264: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 25
2018-04-17 20:34:52.275164: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2018-04-17 20:34:52.275184: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-17 20:34:52.337301: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-17 20:34:52.337347: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-17 20:34:52.338141: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-17 20:34:56,292 INFO - tensorflow - Calling model_fn.
2018-04-17 20:34:56,293 ERROR - container_support.training - uncaught exception during training: features should be a dictionary of `Tensor`s. Given type: <type 'function'>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 38, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train.py", line 139, in train
    train_wrapper.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 73, in train
    tf.estimator.train_and_evaluate(estimator=estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 421, in train_and_evaluate
    executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 522, in run
    getattr(self, task_to_run)()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 577, in run_master
    self._start_distributed_training(saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 715, in _start_distributed_training
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 352, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 793, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/linear.py", line 316, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/linear.py", line 138, in _linear_model_fn
    'Given type: {}'.format(type(features)))
ValueError: features should be a dictionary of `Tensor`s. Given type: <type 'function'>


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-2a854a24dd88> in <module>()
     17                                })
     18 
---> 19 classifier.fit(inputs)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/tensorflow/estimator.py in fit(self, inputs, wait, logs, job_name, run_tensorboard_locally)
    234                 tensorboard.event.set()
    235         else:
--> 236             fit_super()
    237 
    238     @classmethod

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/tensorflow/estimator.py in fit_super()
    219         """
    220         def fit_super():
--> 221             super(TensorFlow, self).fit(inputs, wait, logs, job_name)
    222 
    223         if run_tensorboard_locally and wait is False:

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
    608         self._hyperparameters[JOB_NAME_PARAM_NAME] = self._current_job_name
    609         self._hyperparameters[SAGEMAKER_REGION_PARAM_NAME] = self.sagemaker_session.boto_session.region_name
--> 610         super(Framework, self).fit(inputs, wait, logs, self._current_job_name)
    611 
    612     def hyperparameters(self):

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name)
    163         self.latest_training_job = _TrainingJob.start_new(self, inputs)
    164         if wait:
--> 165             self.latest_training_job.wait(logs=logs)
    166 
    167     @classmethod

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
    396     def wait(self, logs=True):
    397         if logs:
--> 398             self.sagemaker_session.logs_for_job(self.job_name, wait=True)
    399         else:
    400             self.sagemaker_session.wait_for_job(self.job_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/session.py in logs_for_job(self, job_name, wait, poll)
    649 
    650         if wait:
--> 651             self._check_job_status(job_name, description)
    652             if dot:
    653                 print()

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc)
    393         if status != 'Completed':
    394             reason = desc.get('FailureReason', '(No reason provided)')
--> 395             raise ValueError('Error training {}: {} Reason: {}'.format(job, status, reason))
    396 
    397     def wait_for_endpoint(self, endpoint, poll=5):

ValueError: Error training sagemaker-tensorflow-2018-04-17-20-30-05-729: Failed Reason: AlgorithmError: uncaught exception during training: features should be a dictionary of `Tensor`s. Given type: <type 'function'>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 38, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train.py", line 139, in train
    train_wrapper.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 73, in train
    tf.estimator.train_and_evaluate(estimator=estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 421, in train_and_evaluate
    executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 522, in run
    getattr(self, task_to_run)()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 577, in run_master
    self._start_distributed_training(saving_liste

@ChoiByungWook
Contributor

Hello,

Thanks for using SageMaker!

I am looking into this currently.

Can you please provide the following:

  1. A minimal repro case.
  2. The Python SDK version and the TensorFlow container version.

Thanks!

@ghost
Author

ghost commented Apr 18, 2018

OK, that pointed me in the right direction. AWS-SageMaker-Python-SDK is at version 1.2.1. Adding framework_version="1.5" to the constructor fixed my issue. It looks like it is now defaulting to 1.6 which causes the issues above. However, I've been using TF 1.6 and 1.7 on my local machine so the issue is probably SDK related. How do I go about updating the SDK version on the instance notebooks?
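
For readers hitting the same error, here is a minimal sketch of the workaround described in this comment. It assumes the same constructor arguments as the original post; the role ARN and instance type are placeholders, `build_estimator` is a hypothetical helper, and the only real change is `framework_version="1.5"`.

```python
# Sketch: pin the TensorFlow container version instead of relying on the SDK default.
# Concrete values below are placeholders, not taken from a real account.

estimator_kwargs = {
    "entry_point": "sm_transcript_classifier_ep.py",
    "role": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # placeholder ARN
    "training_steps": 10000,
    "evaluation_steps": 100,
    "train_instance_count": 1,
    "train_instance_type": "ml.m4.xlarge",  # placeholder instance type
    "framework_version": "1.5",  # the one-line workaround from this comment
}


def build_estimator(kwargs):
    """Hypothetical helper: construct the estimator (requires the sagemaker SDK)."""
    # Imported lazily so the dict above can be inspected without the SDK installed.
    from sagemaker.tensorflow import TensorFlow
    return TensorFlow(**kwargs)
```

Calling `build_estimator(estimator_kwargs).fit(inputs)` would then start a job on the pinned container, assuming `inputs` points at the training data as in the original notebook.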

@ChoiByungWook
Contributor

Hello,

Thank you for that information.

It would be extremely helpful if you could provide a minimal repro case.

In addition, here is the source code to our TensorFlow containers. https://github.com/aws/sagemaker-tensorflow-containers

As for updating the SDK version on the notebook instance, you can do that from the notebook by running the following command in a standalone cell:
! pip install --upgrade sagemaker
Then restart the kernel; the SDK version should be updated for that notebook.
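
After restarting the kernel, one way to confirm which SDK version the notebook actually picked up is sketched below. This is an assumption-laden example, not part of the SDK: `installed_version` is a hypothetical helper, and `importlib.metadata` requires Python 3.8 or newer, so an older kernel would need a different lookup.

```python
# Sketch: report the installed version of a package, or None if it is missing.
# importlib.metadata is in the standard library from Python 3.8 onward.
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None when the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


print("sagemaker:", installed_version("sagemaker"))
```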

@ghost
Author

ghost commented Apr 19, 2018

OK updating the SDK made no difference.

minimal repo: https://github.com/david-bishai/sagemaker-python-sdk_issue-31

@ChoiByungWook
Contributor

Hello,

Thanks for providing the repro case. I was able to reproduce the error on a SageMaker notebook instance using local mode.

2018-04-24 07:02:23,101 ERROR - container_support.training - uncaught exception during training: features should be a dictionary of `Tensor`s. Given type: <type 'function'>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 38, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train.py", line 120, in train
    train_wrapper.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/trainer.py", line 84, in train
    tf.estimator.train_and_evaluate(estimator=estimator, train_spec=train_spec, eval_spec=eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 421, in train_and_evaluate
    executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 522, in run
    getattr(self, task_to_run)()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 577, in run_master
    self._start_distributed_training(saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 715, in _start_distributed_training
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 352, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 812, in _train_model
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 793, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/linear.py", line 316, in _model_fn
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/canned/linear.py", line 138, in _linear_model_fn
    'Given type: {}'.format(type(features)))
ValueError: features should be a dictionary of `Tensor`s. Given type: <type 'function'>

I am not sure yet what is causing this issue and am still investigating. I'll post when I have an update.

@ChoiByungWook
Contributor

Hello @david-bishai ,

The error occurs because the train_input_fn and eval_input_fn in your user script, entry_point.py, must return a (features, labels) tuple, not a function.

tf.estimator.inputs.numpy_input_fn returns an input function that feeds a dict of NumPy arrays into the model; it does not itself return a (features, labels) tuple.

In our TF 1.4 and 1.5 containers, accepting a function in place of the (features, labels) tuple was an undocumented feature. We apologize for the experience and have documented this.

To make your entry point script work with all versions, invoke the function before returning its result by adding () after the numpy_input_fn(...) call. I was able to run your minimal repo successfully with this change.

import numpy as np
import tensorflow as tf

# get_data, MAX_DOCUMENT_LENGTH and WORDS_FEATURE are defined elsewhere in the user script.

def train_input_fn(training_dir, params):
    """Returns the (features, labels) tuple that feeds the model during training"""
    x_train, x_test, y_train, y_test = get_data(training_dir)
    vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(
        MAX_DOCUMENT_LENGTH)

    x_transform_train = vocab_processor.fit_transform(x_train)
    x_train = np.array(list(x_transform_train))

    # The trailing () invokes the input function so a tuple is returned, not a function.
    return tf.estimator.inputs.numpy_input_fn(
        x={WORDS_FEATURE: x_train},
        y=y_train,
        batch_size=len(x_train),
        num_epochs=None,
        shuffle=True)()

def eval_input_fn(training_dir, params):
    """Returns the (features, labels) tuple that feeds the model during evaluation"""
    x_train, x_test, y_train, y_test = get_data(training_dir)
    vocab_processor = tf.contrib.learn.preprocessing.VocabularyProcessor(
        MAX_DOCUMENT_LENGTH)
    vocab_processor.fit(x_train)

    x_transform_test = vocab_processor.transform(x_test)
    x_test = np.array(list(x_transform_test))

    # Same fix here: call the function returned by numpy_input_fn.
    return tf.estimator.inputs.numpy_input_fn(
        x={WORDS_FEATURE: x_test}, y=y_test, num_epochs=1, shuffle=False)()
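
To illustrate the difference without needing TensorFlow, here is a minimal sketch of the same mistake in plain Python. `make_input_fn` is a hypothetical stand-in for `tf.estimator.inputs.numpy_input_fn` (the real one yields tensors rather than the raw inputs), and `check_features` only mimics the type check that produced the error in this issue.

```python
# Sketch: returning an input function vs. returning what it produces.
# make_input_fn and check_features are hypothetical stand-ins, not TensorFlow APIs.

def make_input_fn(x, y):
    """Return a zero-argument function that yields (features, labels) when called."""
    def input_fn():
        return x, y
    return input_fn


def check_features(features):
    """Mimic the canned estimator's check: features must be a dict."""
    if not isinstance(features, dict):
        raise ValueError(
            "features should be a dictionary of `Tensor`s. "
            "Given type: {}".format(type(features)))
    return True


x = {"words": [[1, 2, 3], [4, 5, 6]]}
y = [0, 1]

not_invoked = make_input_fn(x, y)         # a function object, as in the failing script
features, labels = make_input_fn(x, y)()  # invoked: a (features, labels) tuple

check_features(features)                  # passes: features is a dict
try:
    check_features(not_invoked)           # fails the same way the training job did
except ValueError as err:
    print(err)
```

The only difference between the two calls is the trailing `()`, which is exactly the change applied to train_input_fn and eval_input_fn.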

Please let me know if this works for you.

Thanks!

@ghost
Author

ghost commented Apr 25, 2018

OK thanks, it seems to be training now! I'm not sure what this is about though:

.........................................................................
2018-04-25 20:07:24,466 INFO - root - running container entrypoint
2018-04-25 20:07:24,466 INFO - root - starting train task
2018-04-25 20:07:24,472 INFO - container_support.training - Training starting
/usr/local/lib/python2.7/dist-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
2018-04-25 20:07:26,862 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTP connection (1): 169.254.170.2
2018-04-25 20:07:27,160 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): sagemaker-us-east-1-245511257894.s3.amazonaws.com
2018-04-25 20:07:27,618 INFO - botocore.vendored.requests.packages.urllib3.connectionpool - Starting new HTTPS connection (1): s3.amazonaws.com
2018-04-25 20:07:27,670 INFO - tf_container - ----------------------TF_CONFIG--------------------------
2018-04-25 20:07:27,670 INFO - tf_container - {"environment": "cloud", "cluster": {"master": ["algo-1:2222"]}, "task": {"index": 0, "type": "master"}}
2018-04-25 20:07:27,670 INFO - tf_container - ---------------------------------------------------------
2018-04-25 20:07:27,670 INFO - tf_container - creating RunConfig:
2018-04-25 20:07:27,671 INFO - tf_container - {'save_checkpoints_secs': 300}
2018-04-25 20:07:27,671 INFO - tensorflow - TF_CONFIG environment variable: {u'environment': u'cloud', u'cluster': {u'master': [u'algo-1:2222']}, u'task': {u'index': 0, u'type': u'master'}}
2018-04-25 20:07:27,671 INFO - tf_container - invoking estimator_fn
2018-04-25 20:07:27,671 INFO - tensorflow - Using config: {'_save_checkpoints_secs': 300, '_session_config': None, '_keep_checkpoint_max': 5, '_tf_random_seed': None, '_task_type': u'master', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f81a21ae050>, '_model_dir': u's3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-25-20-01-28-862/checkpoints', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}
2018-04-25 20:07:27,672 INFO - tensorflow - Skip starting Tensorflow server as there is only one node in the cluster.
2018-04-25 20:07:27.681011: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/config and using profilePrefix = 1
2018-04-25 20:07:27.681592: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing config loader against fileName /root//.aws/credentials and using profilePrefix = 0
2018-04-25 20:07:27.681607: I tensorflow/core/platform/s3/aws_logging.cc:54] Setting provider to read credentials from /root//.aws/credentials for credentials file and /root//.aws/config for the config file , for use with profile default
2018-04-25 20:07:27.681620: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating HttpClient with max connections2 and scheme http
2018-04-25 20:07:27.681642: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 2
2018-04-25 20:07:27.681658: I tensorflow/core/platform/s3/aws_logging.cc:54] Creating TaskRole with default ECSCredentialsClient and refresh rate 900000
2018-04-25 20:07:27.681694: I tensorflow/core/platform/s3/aws_logging.cc:54] Unable to open config file /root//.aws/credentials for reading.
2018-04-25 20:07:27.681711: I tensorflow/core/platform/s3/aws_logging.cc:54] Failed to reload configuration.
2018-04-25 20:07:27.681724: I tensorflow/core/platform/s3/aws_logging.cc:54] Unable to open config file /root//.aws/config for reading.
2018-04-25 20:07:27.681734: I tensorflow/core/platform/s3/aws_logging.cc:54] Failed to reload configuration.
2018-04-25 20:07:27.681745: I tensorflow/core/platform/s3/aws_logging.cc:54] Credentials have expired or will expire, attempting to repull from ECS IAM Service.
2018-04-25 20:07:27.681820: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2018-04-25 20:07:27.681840: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:27.685588: I tensorflow/core/platform/s3/aws_logging.cc:54] Initializing CurlHandleContainer with size 25
2018-04-25 20:07:27.687830: I tensorflow/core/platform/s3/aws_logging.cc:54] Pool grown by 2
2018-04-25 20:07:27.687851: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:27.745158: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:27.745196: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:27.746132: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32,507 INFO - tensorflow - Calling model_fn.
2018-04-25 20:07:32,508 DEBUG - tensorflow - Transforming feature_column _IdentityCategoricalColumn(key='words', num_buckets=2092, default_value=None).
2018-04-25 20:07:32,768 INFO - tensorflow - Done calling model_fn.
2018-04-25 20:07:32,768 INFO - tensorflow - Create CheckpointSaverHook.
2018-04-25 20:07:32.769101: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.778035: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:32.778067: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:32.778223: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.787828: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.794657: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:32.794688: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:32.794856: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.805357: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.812592: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:32.812626: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:32.812783: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.904749: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.912901: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007321524686852903
2018-04-25 20:07:32.952605: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.964337: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.975972: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.984361: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:32.994832: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.022984: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.045033: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.054378: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33,262 INFO - tensorflow - Graph was finalized.
2018-04-25 20:07:33.266518: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.274858: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:33.274892: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:33.275052: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33,379 INFO - tensorflow - Running local_init_op.
2018-04-25 20:07:33,382 INFO - tensorflow - Done running local_init_op.
2018-04-25 20:07:33.424724: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.496396: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:33.496449: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:33.496617: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.674614: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.762400: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007331524686853674
2018-04-25 20:07:33.763544: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.779001: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:33.886115: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34,361 INFO - tensorflow - Saving checkpoints for 1 into s3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-25-20-01-28-862/checkpoints/model.ckpt.
2018-04-25 20:07:34.380768: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.391870: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854380
2018-04-25 20:07:34.397628: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.422301: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854391
2018-04-25 20:07:34.423217: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.439377: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.529052: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.541050: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.583608: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854540
2018-04-25 20:07:34.583811: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.599446: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.629216: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.641439: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.651125: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854641
2018-04-25 20:07:34.651328: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.660595: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.672229: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.683251: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.698314: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.708896: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.726261: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.806692: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.816075: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.923903: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854815
2018-04-25 20:07:34.924160: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.935098: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.948491: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.960220: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.979797: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007341524686854959
2018-04-25 20:07:34.980019: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:34.994514: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.037873: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.079588: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.086553: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:35.086585: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:35.086744: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.108749: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.150794: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007351524686855108
2018-04-25 20:07:35.151074: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.165596: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.230282: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.247681: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.255765: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.266954: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.273601: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.284126: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.296354: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.313319: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.322861: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.331801: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:35.340025: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.

2018-04-25 20:07:39,147 INFO - tensorflow - Calling model_fn.
2018-04-25 20:07:39,147 DEBUG - tensorflow - Transforming feature_column _IdentityCategoricalColumn(key='words', num_buckets=2092, default_value=None).
2018-04-25 20:07:39,831 INFO - tensorflow - Done calling model_fn.
2018-04-25 20:07:39,853 INFO - tensorflow - Starting evaluation at 2018-04-25-20:07:39
2018-04-25 20:07:39,918 INFO - tensorflow - Graph was finalized.
2018-04-25 20:07:39,918 INFO - tensorflow - Restoring parameters from s3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-25-20-01-28-862/checkpoints/model.ckpt-1
2018-04-25 20:07:39.973564: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:39.999326: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.008086: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.020798: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.028749: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.039653: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.047401: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.057312: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.069649: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.082472: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.093253: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40,125 INFO - tensorflow - Running local_init_op.
2018-04-25 20:07:40,152 INFO - tensorflow - Done running local_init_op.
2018-04-25 20:07:40,948 INFO - tensorflow - Finished evaluation at 2018-04-25-20:07:40
2018-04-25 20:07:40,948 INFO - tensorflow - Saving dict for global step 1: accuracy = 0.7, accuracy_baseline = 0.7, auc = 0.5032116, auc_precision_recall = 0.648808, average_loss = 0.6735603, global_step = 1, label/mean = 0.3, loss = 0.6726951, prediction/mean = 0.47389776
2018-04-25 20:07:40.948570: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.960667: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:40.960699: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:40.960861: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.975202: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.985922: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:40.985959: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:40.986117: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:40.997728: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.007964: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:41.008000: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:41.008158: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.021124: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.031989: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007411524686861020
2018-04-25 20:07:41.178749: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.193412: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.205561: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.214413: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.224619: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.273470: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.294372: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.301845: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.345282: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.373571: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.538971: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.586359: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41,608 INFO - tensorflow - Calling model_fn.
2018-04-25 20:07:41,609 DEBUG - tensorflow - Transforming feature_column _IdentityCategoricalColumn(key='words', num_buckets=2092, default_value=None).
2018-04-25 20:07:41,708 INFO - tensorflow - Done calling model_fn.
2018-04-25 20:07:41,708 INFO - tensorflow - Signatures INCLUDED in export for Classify: ['serving_default', 'classification']
2018-04-25 20:07:41,709 INFO - tensorflow - Signatures INCLUDED in export for Regress: ['regression']
2018-04-25 20:07:41,709 INFO - tensorflow - Signatures INCLUDED in export for Predict: ['predict']
2018-04-25 20:07:41.709477: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.718035: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:41.718070: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:41.718252: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41,740 INFO - tensorflow - Restoring parameters from s3://sagemaker-us-east-1-245511257894/sagemaker-tensorflow-2018-04-25-20-01-28-862/checkpoints/model.ckpt-1
2018-04-25 20:07:41.754121: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.771398: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.814798: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.833340: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.843000: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.854231: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.864706: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.965630: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:41.981404: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.084626: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.113944: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.131216: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.140608: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.140641: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.140812: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.154724: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.160731: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.160767: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.160933: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.175357: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.189820: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.189856: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.190042: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.202545: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.211210: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.211240: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.211394: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.225958: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.235843: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.235873: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.236044: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.250989: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.263674: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862250
2018-04-25 20:07:42.263882: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.274792: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862263
2018-04-25 20:07:42.275004: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.286674: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862274
2018-04-25 20:07:42,286 INFO - tensorflow - Assets added to graph.
2018-04-25 20:07:42,287 INFO - tensorflow - No assets to write.
2018-04-25 20:07:42.287315: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.297594: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.297632: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.297800: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.309773: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.445939: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.445976: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.446170: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.473934: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.481774: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:42.481839: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:42.482005: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.502119: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.514847: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862501
2018-04-25 20:07:42.540087: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.726431: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862539
2018-04-25 20:07:42.730663: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.797579: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007421524686862726
2018-04-25 20:07:42.797810: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:42.886932: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.003175: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.015325: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.037305: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007431524686863015
2018-04-25 20:07:43.037516: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.140125: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.175190: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.187236: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.197812: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007431524686863187
2018-04-25 20:07:43.198016: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.211355: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.308609: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.326101: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.359362: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.385707: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.405636: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.505584: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.518001: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.543803: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007431524686863517
2018-04-25 20:07:43.544067: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.554117: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.582661: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.622197: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.632210: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
2018-04-25 20:07:43.632267: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.
2018-04-25 20:07:43.632441: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.
2018-04-25 20:07:43.653777: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.

2018-04-25 20:07:43.795255: I tensorflow/core/platform/s3/aws_logging.cc:54] Deleting file: /tmp/s3_filesystem_XXXXXX20180425T2007431524686863653

@laurenyu
Contributor

laurenyu commented May 1, 2018

That logging output comes from interactions with S3, and it doesn't look like there are any errors in this particular run.
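
For anyone triaging a similar log, here is a minimal Python sketch that tallies the warning and error lines by message. It is not part of the SageMaker SDK: the regular expression and the `summarize_log` helper are assumptions based only on the line format in the log pasted above. Run against that excerpt, a tally like this makes it easy to see that the `E` lines are all `Response code: 404` entries from the S3 logger rather than a training failure.

```python
import re
from collections import Counter

# Assumed format of the TensorFlow C++ log lines pasted in this thread, e.g.
# 2018-04-25 20:07:35.086553: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404
LOG_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+: "
    r"(?P<level>[IWEF]) (?P<source>\S+)\] (?P<message>.*)$"
)


def summarize_log(lines, levels=("W", "E", "F")):
    """Count (level, message) pairs for lines whose level is in `levels`.

    Lines that do not match LOG_LINE (for example the Python-side
    "INFO - tensorflow - ..." lines) are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            counts[(match.group("level"), match.group("message"))] += 1
    return counts


# A few lines copied from the log above, used here as sample input.
sample = [
    "2018-04-25 20:07:35.079588: I tensorflow/core/platform/s3/aws_logging.cc:54] Connection has been released. Continuing.",
    "2018-04-25 20:07:35.086553: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404",
    "2018-04-25 20:07:35.086585: W tensorflow/core/platform/s3/aws_logging.cc:57] If the signature check failed. This could be because of a time skew. Attempting to adjust the signer.",
    "2018-04-25 20:07:39,147 INFO - tensorflow - Calling model_fn.",
    "2018-04-25 20:07:40.960667: E tensorflow/core/platform/s3/aws_logging.cc:60] No response body. Response code: 404",
]

counts = summarize_log(sample)
for (level, message), n in counts.most_common():
    print(level, n, message)
```

To check a full run, pass the lines of the downloaded CloudWatch log file instead of `sample`. Any `E` or `F` message other than the 404 entries would be worth a closer look.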

It seems like the original issue has been resolved, so I'm going to close this issue. Feel free to reopen if necessary, though.

laurenyu closed this as completed May 1, 2018
knakad pushed a commit to knakad/sagemaker-python-sdk that referenced this issue Nov 27, 2019
* Set eureka VERSION file

* Eureka master (aws#145)

* Eureka trial tracking interface

* Add experiments developer guides (aws#147)

* Add experiments developer guides

* Eureka master (aws#148)

* Add experiments developer guides

* Move to new experiment / trial / trial run data model. Add TrialRun class.

* Eureka master (aws#149)

* Add Trial class

* Experiment class (aws#151)

* Introduce active-record design, include first concrete type - Experiment.

* Add Trial and TrialRun active record classes (aws#152)


* Add Trial and TrialRun active records. Add common created_time / last_modified_time to Record

* List method (aws#153)


* Add general list classmethod to Record. Add concrete impl to experiment.Experiment

* Eureka master (aws#154)

* Use general list method in all list* methods in experiment, trial, trial_run

* Add Trial Run Tracker (aws#156)

* Add Trial Run Tracker

* Add helper methods to Experiment and Trial for fast-creating Trials and TrialRunTrackers

* Set Eureka build to just run linters, doc build, and unit tests

* Add integration tests (aws#159)

* Add experiment and trial integration tests

* TrialRun bug fixes (aws#160)

Introduce UpdatedData property on TrialRun

* Adapt Python SDK to Experiments Api changes (aws#164)

* Move experiment to new api
* Pin pytest version to 4.4.1

* Fix list trials by experiment without passing experiment name. (aws#166)

* Pass experiment name for list trial api call.

* Eureka master (aws#167)

Flatten trial name input for create_trial

* Make providing the step name optional when creating a trial tracker (aws#169)

* Make step name optional when creating tracker

* Make create tracker obtain TRAINING_JOB_ARN from the environment. (aws#186)

* Make create tracker obtain TRAINING_JOB_ARN from the environment.

* when trial creates a tracker, the training_job_arn can be automatically set as source arn when creating the trial stage.
* corresponding unit test.

* Add source_arn back as optional param for create_tracker

* make source_arn an optional param for create_tracker function.

* Resolving source arn skeleton for jobs.

* Changing _resolve_job_arn to _resolve_source_arn.

* Using generator to resolve source arn from the environment.

* TrialAnalytics class to convert trial step data to pandas dataframe (aws#188)

* TrialAnalytics class to convert trial stage data to pandas data frame
* pin version flake8-future-import to 0.4.5 to avoid build failure on import annotations missing
* pandas column ordering is different in py27 and py37. Sorting columns to make the order deterministic

* Rename step to component (aws#194)

* Use new TrialComponent API structure for metrics, artifacts and parameters

* Improve documentation (aws#199)

* Improve documentation and remove unsupported parameters to list_trial_components

* minor doc update

* remove hardcoded alpha endpoint for experiments (aws#201)

* Generate sphinx docs for experiment classes (aws#204)

* Update Sphinx RST files to generate documentation for experiment classes

* Merging master branch in to eureka-master (aws#206)

* prepare release v1.18.16

* update development version to v1.18.17.dev0

* fix: use unique names for test training jobs (aws#765)

* prepare release v1.18.17

* update development version to v1.18.18.dev0

* change: add automatic model tuning integ test for TF script mode (aws#766)

* prepare release v1.18.18

* update development version to v1.18.19.dev0

* change: skip p2/p3 tests in eu-central-1 (aws#769)

* prepare release v1.18.19

* update development version to v1.18.20.dev0

* feature: add document embedding support to Object2Vec algorithm (aws#772)

* prepare release v1.19.0

* update development version to v1.19.1.dev0

* change: add py2 deprecation message for the deep learning framework images (aws#768)

* prepare release v1.19.1

* update development version to v1.19.2.dev0

* feature: add RL Ray 0.6.5 support (aws#779)

* fix: adjust Ray test script for Ray 0.6.5 (aws#781)

* fix: prevent false positive PR test results (aws#783)

* prepare release v1.20.0

* update development version to v1.20.1.dev0

* fix: update TrainingInputMode with s3_input InputMode (aws#776)

* prepare release v1.20.1

* update development version to v1.20.2.dev0

* fix: pin pytest version to 4.4.1 to avoid pluggy version conflict (aws#788)

* prepare release v1.20.2

* update development version to v1.20.3.dev0

* documentation: fix docs in regards to transform_fn for mxnet (aws#790)

* fix: skip local file check for TF requirements file when source_dir is an S3 URI (aws#798)

* fix: run tests if buildspec.yml has been modified (aws#786)

* prepare release v1.20.3

* update development version to v1.20.4.dev0

* feature: Support for TFS preprocessing (aws#797)

* prepare release v1.21.0

* update development version to v1.21.1.dev0

* fix: repack model function works without source directory (aws#804)

* prepare release v1.21.1

* update development version to v1.21.2.dev0

* fix: emit training jobs tags to estimator (aws#803)

* fix: set _current_job_name in attach() (aws#808)

* prepare release v1.21.2

* update development version to v1.21.3.dev0

* fix: honor source_dir from S3 (aws#811)

* feature: add encryption option to "record_set" (aws#794)

* feature: add encryption option to "record_set"

* prepare release v1.22.0

* update development version to v1.22.1.dev0

* documentation: update using_sklearn.rst parameter name (aws#814)

Incorrect parameter name in docs. Updated to match what is implemented in the method and what is used in other estimators.

* feature: support MXNet 1.4 with MMS (aws#812)

* prepare release v1.23.0

* update development version to v1.23.1.dev0

* feature: add region check for Neo service (aws#806)

* prepare release v1.24.0

* update development version to v1.24.1.dev0

* fix: add better default transform job name handling within Transformer (aws#822)

* feature: repack_model support dependencies and code location (aws#821)

* documentation: TFS support for pre/processing functions (aws#807)

* change: skip p2 tests in ap-south-east (aws#823)

* prepare release v1.25.0

* update development version to v1.25.1.dev0

* fix: use unique job name in hyperparameter tuning test (aws#829)

* prepare release v1.25.1

* update development version to v1.25.2.dev0

* feature: Add extra_args to enable encrypted objects upload (aws#836)

* change: downgrade c5 in integ tests and test all TF Script Mode images (aws#840)

* feature: emit estimator transformer tags to model (aws#815)

* doc: include FrameworkModel and ModelPackage in API docs (aws#833)

* prepare release v1.26.0

* update development version to v1.26.1.dev0

* fix: fix logger creation in Chainer integ test script (aws#843)

Only one test failed due to a timeout. (The corresponding test failed with the other Python version.) Talked to Rui offline.

* feature: add wait argument to estimator deploy (aws#842)

* prepare release v1.27.0

* update development version to v1.27.1.dev0

* feature: Add DataProcessing Fields for Batch Transform (aws#827)

* prepare release v1.28.0

* update development version to v1.28.1.dev0

* Update setup.py (aws#859)

* prepare release v1.28.1

* update development version to v1.28.2.dev0

* fix: prevent race condition in vpc tests (aws#863)

* prepare release v1.28.2

* update development version to v1.28.3.dev0

* doc: clean up MXNet and TF documentation (aws#865)

* doc: fix punctuation in MXNet version list (aws#866)

* change: update SageMaker Neo regions and instance families (aws#862)

* prepare release v1.28.3

* update development version to v1.28.4.dev0

* feature: network isolation mode in training (aws#791)

* feature: network isolation mode in training

* feature: network isolation mode in tar support training

* change: documentation and check describe training job network isolation

* doc update

* doc update, remove inference section

* sourcedir

* type error fix

* change: moving not canary TFS tests to local mode (aws#870)

* Integrate black into development process (aws#873)

* change: Add Black formatting tool as dependency

As of this commit, Black formatting tool can be run with 'tox -e black-format'.
Black does not run as part of any automated process, yet.

Black is pulled in as a test dependency only if the Python version
is greater than 3.6, as the tool is not vended as part of any
earlier Python version.

* change: Resolve Black formatting failures

Black is unable to handle the trailing 'L' or 'l' suffix on integer literals,
which is no longer supported in Python 3.

This commit removes those unnecessary 'long' identifiers.

https://www.python.org/dev/peps/pep-0237/

* change: Format all files using Black

This commit contains no functional changes.

* change: Manually resolve flake8 violations after formatting

* change: Manually resolve pylint violations after formatting

* change: Enable black locally and in automated build.

This commit enables black-format as part of "tox tests/unit", in order to
format all files.
It also enables black-check as part of the remote builds, in order to
verify that all files are properly formatted.

* prepare release v1.29.0

* update development version to v1.29.1.dev0

* feature: add git_config and git_clone, validate method (aws#832)

* fix: add pytest.mark.local_mode annotation to broken tests (aws#876)

* fix: add pytest.mark.local_mode annotation to tests

* feature: add TensorFlow 1.13 support (aws#860)

* prepare release v1.30.0

* update development version to v1.30.1.dev0

* fix: add pytest.mark.local_mode annotation to broken tests (aws#884)

* change: remove unnecessary P3 tests from TFS integration tests (aws#885)

* change: allow only one integration test run per time (aws#880)

* change: Update buildspec.yml (aws#887)

* feature: use deep learning images (aws#883)

* prepare release v1.31.0

* update development version to v1.31.1.dev0

* change: build spec improvements. (aws#888)

* fix: remove unnecessary failure case tests (aws#892)

* change: print build execution time (aws#890)

* prepare release v1.31.1

* update development version to v1.31.2.dev0

* fix git test in test_estimator.py (aws#894)

* feature: support Endpoint_type for TF transform (aws#881)

* prepare release v1.32.0

* update development version to v1.32.1.dev0

* change: separate unit, local mode, and notebook tests in different buildspecs (aws#898)

* change: fix notebook tests (aws#900)

* Update displaytime.sh (aws#901)

* doc: refactor the overview topic in the sphinx project (aws#877)

* change: tighten pylint config and expand C and R exceptions (aws#899)

This commit tightens the pylint config with
inspiration from several of Google's pylint
configs.

This commit also expands the C and R exceptions
and disables the specific rules that cause issues
in this package.

* change: correct code per len-as-condition Pylint check (aws#902)

The Pylint check is not actually enabled in this commit as it conflicts
directly with NumPy. Pylint has corrected this, and it will be included
in their next release (2.4.0):
pylint-dev/pylint#2684

Once Pylint 2.4.0 is released, we can consume it and remove this check.
A summary of this information is included in a TODO near the relevant
Pylint disable rule (len-as-condition).

* prepare release v1.32.1

* update development version to v1.32.2.dev0

* change: remove superfluous parens per Pylint rule (aws#903)

* change: enable logging-format-interpolation pylint check (aws#904)

* documentation: add pypi, rtd, black badges to readme (aws#910)

* prepare release v1.32.2

* update development version to v1.32.3.dev0

* feature: allow custom model name during deploy (aws#792)

* feature: allow custom model name during deploy

* black check

* feature: git support for hosting models (aws#878)

* git integration for serving

* fix: Add ap-northeast-1 to Neo algorithms region map (aws#897)

* fix: reset default output path in Transformer.transform  (aws#905)

* fix: reset default output path on create transform job

* Unit and integration tests

* change: enable logging-not-lazy pylint check (aws#909)

* change: enable wrong-import-position pylint check (aws#907)

* change: enable wrong-import-position pylint check

* change: updating import pattern for sagemaker.tensorflow

* fix: fixing integration tests

* change: reformatting

* change: enable signature-differs pylint check (aws#915)

* Revert "change: enable wrong-import-position pylint check (aws#907)" (aws#916)

This reverts commit 8489f86.

* change: enable wrong-import-position pylint check (aws#917)

* change: remove TODO comment on import-error Pylint check (aws#918)

By running Pylint before any of the unit tests (and dependency
installs), the import-error check will always fail since the
dependencies are not yet installed.

We could move Pylint to a later stage to resolve this, but there's
value in this quick check occurring before the unit tests.

As a result, this Pylint check is being disabled.

* prepare release v1.33.0

* update development version to v1.33.1.dev0

* change: enable unidiomatic-typecheck pylint check (aws#921)

* change: enable no-else-return and no-else-raise pylint checks (aws#925)

* change: fix list serialization for 1P algos (aws#922)

* change: enable simplifiable-if-expression pylint checks (aws#926)

* feature: deal with credentials for Git support for GitHub (aws#914)

add authentication info

* feature: Git integration for CodeCommit (aws#927)

* add functions, tests and doc for CodeCommit

* change: enable inconsistent-return-statements Pylint check (aws#930)

Note that this commit also raises ValueErrors in situations that would
previously have returned None.

Per PEP8: Be consistent in return statements. Either all return
statements in a function should return an expression, or none of them
should. If any return statement returns an expression, any return
statements where no value is returned should explicitly state this as
return None, and an explicit return statement should be present at the
end of the function (if reachable).
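
A minimal sketch of the pattern this check targets, using a hypothetical lookup helper rather than code from this package. The first version falls off the end and returns None implicitly for unknown input; the second makes every exit explicit by raising ValueError, which is the kind of behavior change noted above.

```python
UNIT_SIZES = {"kb": 1024, "mb": 1024 ** 2}  # illustrative data only


def size_before(unit):
    """Flagged by Pylint: one path returns a value, the other returns None implicitly."""
    if unit in UNIT_SIZES:
        return UNIT_SIZES[unit]


def size_after(unit):
    """Consistent: every path either returns a value or raises."""
    if unit in UNIT_SIZES:
        return UNIT_SIZES[unit]
    raise ValueError("Unknown unit: {}".format(unit))
```

Callers that relied on a None result would need to catch ValueError instead.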

* change: enable consider-merging-isinstance Pylint check (aws#932)

Note that this commit will also enable simplifiable-if-statement, as
there are no code changes needed for it.

* change: enable attribute-defined-outside-init Pylint check (aws#933)

The logic behind this rule is to improve readability by defining all
the attributes of a class inside the init function, even if it simply
sets them to None.
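
A small sketch of what the rule asks for. The class and attribute names are hypothetical and exist only to illustrate the check.

```python
class JobTracker(object):
    """Hypothetical class, used only to illustrate the Pylint rule."""

    def __init__(self, name):
        self.name = name
        # Declared up front so readers see every attribute in one place,
        # even though the real value is only assigned later in start().
        self.latest_job = None

    def start(self, job_id):
        """Record the most recently started job."""
        self.latest_job = job_id
```

Without the `self.latest_job = None` line in `__init__`, Pylint would report the assignment in `start()` as attribute-defined-outside-init.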

* change: enable wrong-import-order Pylint check (aws#935)

Per PEP8:
Imports should be grouped in the following order:
1- Standard library imports.
2- Related third party imports.
3- Local application/library specific imports.

* change: enable ungrouped-imports Pylint check (aws#936)

* change: enable wrong-import-order Pylint check

Per PEP8:
Imports should be grouped in the following order:
1- Standard library imports.
2- Related third party imports.
3- Local application/library specific imports.

* change: fix attach for 1P algorithm estimators (aws#931)

* change: set num_processes_per_host only if provided by user (aws#928)

* change: enable consider-using-in Pylint check (aws#938)

* change: enable consider-using-in Pylint check

* change: enable too-many-public-methods Pylint check (aws#939)

* change: enable too-many-public-methods Pylint check

This is a useful check to have, but is a lot of work to retroactively
enforce.
Enabling it while ignoring the single violation allows the validation
to run for future code.

* change: enable chained-comparison Pylint check (aws#940)

* change: enable consider-using-ternary Pylint check (aws#942)

This commit will add an exclusion for all auto-generated files.

I chose to ignore the single violation, because the alternative is
confusingly convoluted:
`(hasattr(obj, '__getitem__') if hasattr(obj, '__iter__')
else isinstance(obj, str))`
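
For context, a sketch of the trade-off described above. The `and`/`or` form is an assumption about what the flagged expression looked like (the commit message shows only the ternary alternative), and both helper names are hypothetical.

```python
def is_sequence_and_or(obj):
    # Assumed shape of the original expression that Pylint flags.
    return hasattr(obj, "__iter__") and hasattr(obj, "__getitem__") or isinstance(obj, str)


def is_sequence_ternary(obj):
    # The rewrite quoted in the commit message.
    return hasattr(obj, "__getitem__") if hasattr(obj, "__iter__") else isinstance(obj, str)


# The two spellings agree on these sample inputs.
samples = [[1, 2], (1, 2), "abc", {1, 2}, 5, None]
for sample in samples:
    assert bool(is_sequence_and_or(sample)) == bool(is_sequence_ternary(sample))
```

The two forms are not equivalent in general: `A and B or C` falls through to `C` whenever `B` is falsy, while the ternary returns `B` as-is.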

* change: modify TODO on disabled Pylint check (aws#943)

The check recommendations are only valid for packages that exclusively
support Python 3. The changes cannot be made in Python 2.
The TODO was updated to clarify this.

* prepare release v1.34.0

* update development version to v1.34.1.dev0

* change: add MXNet 1.4.1 support (aws#886)

* change: format and add missing docstring placeholders (aws#945)

This commit will format all existing docstrings to follow Google
style: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
This commit will also add docstring placeholders to any class or method
that was previously missing one.

An ideal approach would be to take the time to include meaningful
docstrings in every file. However, since that is not a task that will
be prioritized, I've declared docstring bankruptcy on this package, in
order to enforce docstrings on all future code changes to this package.

* change: allow serving script to be defined for deploy() and transformer() with frameworks (aws#944)

* change: update PyTorch version (aws#947)

* change: improve documentation of some functions (aws#864)

[pr-827][followups] Improve documentation of some functions.
Also some unit test fixes. See comments from marcio in
aws#827

* doc: update using_tensorflow topic (aws#946)

* fix: update TensorFlow script mode dependency list (aws#869)

* change: improving Chainer integ tests (aws#872)

* change: enable line-too-long Pylint check (aws#948)

* doc: add instructions for setting up Cloud9 environment. (aws#949)

Added instructions that allow for a low-cost ~10min environment setup.

* prepare release v1.34.1

* update development version to v1.34.2.dev0

* change: Replaced generic ValueError with custom subclass when reporting unexpected resource status (aws#919)

* doc: correct wording for Cloud9 environment setup instructions (aws#952)

package => repo

* change: removing unnecessary tests cases (aws#951)

* prepare release v1.34.2

* update development version to v1.34.3.dev0

* change: waiting for training tags to propagate in the test (aws#955)

* prepare release v1.34.3

* update development version to v1.34.4.dev0

* feature: allow serving image to be specified when calling MXNet.deploy (aws#959)

* prepare release v1.35.0

* update development version to v1.35.1.dev0

* doc: refactor and edit using_mxnet topic (aws#956)

* doc: refactor overview section per improvement plan

* Update doc/overview.rst

Co-Authored-By: Marcio Vinicius dos Santos <[email protected]>

* Update doc/overview.rst

Co-Authored-By: Marcio Vinicius dos Santos <[email protected]>

* doc: made changes per feedback comments

* doc: remove duplicate faq section and fixed heading

* doc: fix heading levels in overview.rst

* doc: update TensorFlow using topic

* doc: Update using_tf.rst to address feedback

* doc: fix comment in conf.py per build log

* doc: add newline to conf.py to fix error

* doc: addressed feedback for PR

* doc: update conf.py

* doc: remove duplicate byom section in overview.rst

* doc: remove duplicate headings in several rst files

* doc: Restructure and update Using MXNet topic

* doc: fix link

* doc: add link to mxnet readme container section in using_mxnet.rst topic

* fix: update sklearn document to include 3p dependency installation (aws#960)

* prepare release v1.35.1

* update development version to v1.35.2.dev0

* fix: allow Airflow enabled estimators to use absolute path entry_point (aws#965)

* change: ignore FI18 flake8 rule (aws#969)

* feature: support for TensorFlow 1.14 (aws#967)

* flake8 fixes

* black reformat

* Revert "Merging master branch in to eureka-master (aws#206)"

This reverts commit 080d06d561aa88a177c67f08114902ab292f3883.

* Black + Pylint fixes

* add latest api service model

* skip eureka integ tests temporarily

* Fix integ tests to work with preview SDK model (aws#215)

* Fix integ tests to work with preview SDK model.

* Use search to find trial components for analytics dataframe (aws#219)

* move boto client arg to end of the arg list for all eureka APIs

* Use search to find trial components in TrialAnalytics

* add test to verify value error is thrown if no component filter specified

* drop trial name from analytics frame as trial components won't have trial name in them in the future

* remove trial name column for analytics frame

* Eureka master (aws#236)

* Add ExperimentConfig for estimator.fit and transformer.transform

* experiment_config can be passed to estimator.fit
* experiment_config can be passed to transformer.transform
* unit tests for corresponding changes.

* Remove include only experiment integ tests from tox.ini

* Delete experiments integ tests

* Update the service-2.json.

* Bring in latest sagemaker models
* Remove internal-only shapes and internal operations

* Doc the three optional keys for ExperimentConfig dictionary
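
As a hedged sketch, the dictionary might be assembled like this before being handed to `estimator.fit` or `transformer.transform`. The three key names are an assumption drawn from the SageMaker API's ExperimentConfig shape, not from this commit message, and the helper `build_experiment_config` and all example values are hypothetical.

```python
# Assumed key names; all three are optional.
ALLOWED_KEYS = ("ExperimentName", "TrialName", "TrialComponentDisplayName")


def build_experiment_config(**kwargs):
    """Return a dict holding only the optional keys that were supplied."""
    unknown = set(kwargs) - set(ALLOWED_KEYS)
    if unknown:
        raise ValueError("Unsupported ExperimentConfig keys: {}".format(sorted(unknown)))
    return {key: value for key, value in kwargs.items() if value is not None}


config = build_experiment_config(ExperimentName="my-experiment", TrialName="my-trial")
# Passing it on would then look like this (not executed here, needs an estimator):
# estimator.fit(inputs, experiment_config=config)
```

Validating the keys locally is a design choice for this sketch; it surfaces a typo before any training job is requested.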

* Eureka master (aws#237)

* Add ExperimentConfig for estimator.fit and transformer.transform

* experiment_config can be passed to estimator.fit
* experiment_config can be passed to transformer.transform
* unit tests for corresponding changes.

* Remove include only experiment integ tests from tox.ini

* Delete experiments integ tests

* Update the service-2.json.

* Bring in latest sagemaker models
* Remove internal-only shapes and internal operations

* Doc the three optional keys for ExperimentConfig dictionary

* Fix analytics component and search functionality

* Delete all experiments related classes and their tests.
* Change TrialAnalytics to ExperimentAnalytics.
* Fix ExperimentAnalytics for m-n model change.
* Fix/Modify Search functionality

* Fix Docs

* Remove exp management doc from index

* Fix pass None type ExperimentConfig to transform request.

* Fix formatting in test_session.py

* Do not build empty filters list when experiment name is not given

* Add DisplayName to analytics table

* Fix formatting.

* Add sortBy and sortOrder for ExperimentAnalytics

* Eureka master (aws#259)

Fix bad merge

* Add ExperimentConfig to Processor  (aws#260)

* Add ExperimentConfig to Processor

* Remove broken experiment config from processor test (aws#261)


* Add ExperimentConfig to Processor

* Eureka master (aws#262)


* Remove old setup file and Eureka specific files.

* Eureka master (aws#264)


* Add back missing factorization machines integration test

* Minor style fixes (aws#265)



* Minor style fixes

* Fix broken SageMaker Experiments analytics integration tests (aws#267)


* Fix broken experiments_analytics integration tests

* Eureka master (aws#270)



* Remove experiment_config from analytics test

knakad pushed a commit to knakad/sagemaker-python-sdk that referenced this issue Dec 4, 2019

* Set eureka VERSION file

* Eureka master (aws#145)

* Eureka trial tracking interface

* Add experiments developer guides (aws#147)

* Add experiments developer guides

* Eureka master (aws#148)

* Add experiments developer guides

* Move to new experiment / trial / trial run data model. Add TrialRun class.

* Eureka master (aws#149)

* Add Trial class

* Experiment class (aws#151)

* Introduce active-record design, include first concrete type - Experiment.

* Add Trial and TrialRun active record classes (aws#152)


* Add Trial and TrialRun active records. Add common created_time / last_modified_time to Record

* List method (aws#153)


* Add general list classmethod to Record. Add concrete impl to experiment.Experiment

* Eureka master (aws#154)

* Use general list method in all list* methods in experiment, trial, trial_run

* Add Trial Run Tracker (aws#156)

* Add Trial Run Tracker

* Add helper methods to Experiment and Trial for fast-creating Trials and TrialRunTrackers

* Set Eureka build to just run linters, doc build, and unit tests

* Add integration tests (aws#159)

* Add experiment and trial integration tests

* TrialRun bug fixes (aws#160)

Introduce UpdatedData property on TrialRun

* Adapt Python SDK to Experiments Api changes (aws#164)

* Move experiment to new api
* Pin pytest version to 4.4.1

* Fix list trials by experiment without passing experiment name. (aws#166)

* Pass experiment name for list trial api call.

* Eureka master (aws#167)

Flatten trial name input for create_trial

* Make providing the step name optional when creating a trial tracker (aws#169)

* Make step name optional when creating tracker

* Make create tracker obtain TRAINING_JOB_ARN from the environment. (aws#186)

* Make create tracker obtain TRAINING_JOB_ARN from the environment.

* when a trial creates a tracker, the training_job_arn can be automatically set as the source arn when creating the trial stage.
* corresponding unit test.

* Add source_arn back as optional param for create_tracker

* make source_arn an optional param for create_tracker function.

* Resolving source arn skeleton for jobs.

* Changing _resolve_job_arn to _resolve_source_arn.

* Using generator to resolve source arn from the environment.

* TrialAnalytics class to convert trial step data to pandas dataframe (aws#188)

* TrialAnalytics class to convert trial stage data to pandas data frame
* pin version flake8-future-import to 0.4.5 to avoid build failure on import annotations missing
* pandas column ordering is different in py27 and py37. Sorting columns to make the order deterministic

* Rename step to component (aws#194)

* Use new TrialComponent API structure for metrics, artifacts and parameters

* Improve documentation (aws#199)

* Improve documentation and remove unsupported parameters to list_trial_components

* minor doc update

* remove hardcoded alpha endpoint for experiments (aws#201)

* Generate sphinx docs for experiment classes (aws#204)

* Update Sphinx RST files to generate documentation for experiment classes

* Merging master branch in to eureka-master (aws#206)

* prepare release v1.18.16

* update development version to v1.18.17.dev0

* fix: use unique names for test training jobs (aws#765)

* prepare release v1.18.17

* update development version to v1.18.18.dev0

* change: add automatic model tuning integ test for TF script mode (aws#766)

* prepare release v1.18.18

* update development version to v1.18.19.dev0

* change: skip p2/p3 tests in eu-central-1 (aws#769)

* prepare release v1.18.19

* update development version to v1.18.20.dev0

* feature: add document embedding support to Object2Vec algorithm (aws#772)

* prepare release v1.19.0

* update development version to v1.19.1.dev0

* change: add py2 deprecation message for the deep learning framework images (aws#768)

* prepare release v1.19.1

* update development version to v1.19.2.dev0

* feature: add RL Ray 0.6.5 support (aws#779)

* fix: adjust Ray test script for Ray 0.6.5 (aws#781)

* fix: prevent false positive PR test results (aws#783)

* prepare release v1.20.0

* update development version to v1.20.1.dev0

* fix: update TrainingInputMode with s3_input InputMode (aws#776)

* prepare release v1.20.1

* update development version to v1.20.2.dev0

* fix: pin pytest version to 4.4.1 to avoid pluggy version conflict (aws#788)

* prepare release v1.20.2

* update development version to v1.20.3.dev0

* documentation: fix docs in regards to transform_fn for mxnet (aws#790)

* fix: skip local file check for TF requirements file when source_dir is an S3 URI (aws#798)

* fix: run tests if buildspec.yml has been modified (aws#786)

* prepare release v1.20.3

* update development version to v1.20.4.dev0

* feature: Support for TFS preprocessing (aws#797)

* prepare release v1.21.0

* update development version to v1.21.1.dev0

* fix: repack model function works without source directory (aws#804)

* prepare release v1.21.1

* update development version to v1.21.2.dev0

* fix: emit training jobs tags to estimator (aws#803)

* fix: set _current_job_name in attach() (aws#808)

* prepare release v1.21.2

* update development version to v1.21.3.dev0

* fix: honor source_dir from S3 (aws#811)

* feature: add encryption option to "record_set" (aws#794)

* feature: add encryption option to "record_set"

* prepare release v1.22.0

* update development version to v1.22.1.dev0

* documentation: update using_sklearn.rst parameter name (aws#814)

Incorrect parameter name in docs. Updated to match what is implemented in the method and what is used in other estimators.

* feature: support MXNet 1.4 with MMS (aws#812)

* prepare release v1.23.0

* update development version to v1.23.1.dev0

* feature: add region check for Neo service (aws#806)

* prepare release v1.24.0

* update development version to v1.24.1.dev0

* fix: add better default transform job name handling within Transformer (aws#822)

* feature: repack_model support dependencies and code location (aws#821)

* documentation: TFS support for pre/processing functions (aws#807)

* change: skip p2 tests in ap-south-east (aws#823)

* prepare release v1.25.0

* update development version to v1.25.1.dev0

* fix: use unique job name in hyperparameter tuning test (aws#829)

* prepare release v1.25.1

* update development version to v1.25.2.dev0

* feature: Add extra_args to enable encrypted objects upload (aws#836)

* change: downgrade c5 in integ tests and test all TF Script Mode images (aws#840)

* feature: emit estimator transformer tags to model (aws#815)

* doc: include FrameworkModel and ModelPackage in API docs (aws#833)

* prepare release v1.26.0

* update development version to v1.26.1.dev0

* fix: fix logger creation in Chainer integ test script (aws#843)

only one test failed due to a timeout. (the corresponding test failed with the other Python version.) talked to Rui offline.

* feature: add wait argument to estimator deploy (aws#842)

* prepare release v1.27.0

* update development version to v1.27.1.dev0

* feature: Add DataProcessing Fields for Batch Transform (aws#827)

* prepare release v1.28.0

* update development version to v1.28.1.dev0

* Update setup.py (aws#859)

* prepare release v1.28.1

* update development version to v1.28.2.dev0

* fix: prevent race condition in vpc tests (aws#863)

* prepare release v1.28.2

* update development version to v1.28.3.dev0

* doc: clean up MXNet and TF documentation (aws#865)

* doc: fix punctuation in MXNet version list (aws#866)

* change: update Sagemaker Neo regions and instance families (aws#862)

* prepare release v1.28.3

* update development version to v1.28.4.dev0

* feature: network isolation mode in training (aws#791)

* feature: network isolation mode in training

* feature: network isolation mode in tar support training

* change: documentation and check describe training job network isolation

* doc update

* doc update, remove inference section

* sourcedir

* type error fix

* change: moving not canary TFS tests to local mode (aws#870)

* Integrate black into development process (aws#873)

* change: Add Black formatting tool as dependency

As of this commit, Black formatting tool can be run with 'tox -e black-format'.
Black does not run as part of any automated process, yet.

Black is pulled in as a test dependency only if the Python version
is greater than 3.6, as the tool is not vended as part of any
earlier Python version.

* change: Resolve Black formatting failures

Black is unable to handle trailing 'L' or 'l' which is no longer
supported as of python 3.8.

This commit removes those unnecessary 'long' identifiers.

https://www.python.org/dev/peps/pep-0237/

* change: Format all files using Black

This commit contains no functional changes.

* change: Manually resolve flake8 violations after formatting

* change: Manually resolve pylint violations after formatting

* change: Enable black locally and in automated build.

This commit enables black-format as part of "tox tests/unit", in order to
format all files.
It also enables black-check as part of the remote builds, in order to
verify that all files are properly formatted.

* prepare release v1.29.0

* update development version to v1.29.1.dev0

* feature: add git_config and git_clone, validate method (aws#832)

* fix: add pytest.mark.local_mode annotation to broken tests (aws#876)

* fix: add pytest.mark.local_mode annotation to tests

* feature: add TensorFlow 1.13 support (aws#860)

* prepare release v1.30.0

* update development version to v1.30.1.dev0

* fix: add pytest.mark.local_mode annotation to broken tests (aws#884)

* change: remove unnecessary P3 tests from TFS integration tests (aws#885)

* change: allow only one integration test run per time (aws#880)

* change: Update buildspec.yml (aws#887)

* feature: use deep learning images (aws#883)

* prepare release v1.31.0

* update development version to v1.31.1.dev0

* change: build spec improvements. (aws#888)

* fix: remove unnecessary failure case tests (aws#892)

* change: print build execution time (aws#890)

* prepare release v1.31.1

* update development version to v1.31.2.dev0

* fix git test in test_estimator.py (aws#894)

* feature: support Endpoint_type for TF transform (aws#881)

* prepare release v1.32.0

* update development version to v1.32.1.dev0

* change: separate unit, local mode, and notebook tests in different buildspecs (aws#898)

* change: fix notebook tests (aws#900)

* Update displaytime.sh (aws#901)

* doc: refactor the overview topic in the sphinx project (aws#877)

* change: tighten pylint config and expand C and R exceptions (aws#899)

This commit tightens the pylint config with
inspiration from several of Google's pylint
configs.

This commit also expands the C and R exceptions
and disables the specific rules that cause issues
in this package.

* change: correct code per len-as-condition Pylint check (aws#902)

The Pylint check is not actually enabled in this commit as it conflicts
directly with NumPy. Pylint has corrected this, and it will be included
in their next release (2.4.0):
pylint-dev/pylint#2684

Once Pylint 2.4.0 is released, we can consume it and remove this check.
A summary of this information is included in a TODO near the relevant
Pylint disable rule (len-as-condition).

* prepare release v1.32.1

* update development version to v1.32.2.dev0

* change: remove superfluous parens per Pylint rule (aws#903)

* change: enable logging-format-interpolation pylint check (aws#904)

* documentation: add pypi, rtd, black badges to readme (aws#910)

* prepare release v1.32.2

* update development version to v1.32.3.dev0

* feature: allow custom model name during deploy (aws#792)

* feature: allow custom model name during deploy

* black check

* feature: git support for hosting models (aws#878)

* git integration for serving

* fix: Add ap-northeast-1 to Neo algorithms region map (aws#897)

* fix: reset default output path in Transformer.transform  (aws#905)

* fix: reset default output path on create transform job

* Unit and integration tests

* change: enable logging-not-lazy pylint check (aws#909)

* change: enable wrong-import-position pylint check (aws#907)

* change: enable wrong-import-position pylint check

* change: updating import pattern for sagemaker.tensorflow

* fix: fixing integration tests

* change: reformatting

* change: enable signature-differs pylint check (aws#915)

* Revert "change: enable wrong-import-position pylint check (aws#907)" (aws#916)

This reverts commit 8489f86.

* change: enable wrong-import-position pylint check (aws#917)

* change: remove TODO comment on import-error Pylint check (aws#918)

By running Pylint before any of the unit tests (and dependency
installs), the import-error check will always fail since the
dependencies are not yet installed.

We could move Pylint to a later stage to resolve this, but there's
value in this quick check occurring before the unit tests.

As a result, this Pylint check is being disabled.

* prepare release v1.33.0

* update development version to v1.33.1.dev0

* change: enable unidiomatic-typecheck pylint check (aws#921)

* change: enable no-else-return and no-else-raise pylint checks (aws#925)

* change: fix list serialization for 1P algos (aws#922)

* change: enable simplifiable-if-expression pylint checks (aws#926)

* feature: deal with credentials for Git support for GitHub (aws#914)

add authentication info

* feature: Git integration for CodeCommit (aws#927)

* add functions, tests and doc for CodeCommit

* change: enable inconsistent-return-statements Pylint check (aws#930)

Note that this commit also raises ValueErrors in situations that would
previously have returned None.

Per PEP8: Be consistent in return statements. Either all return
statements in a function should return an expression, or none of them
should. If any return statement returns an expression, any return
statements where no value is returned should explicitly state this as
return None, and an explicit return statement should be present at the
end of the function (if reachable).

* change: enable consider-merging-isinstance Pylint check (aws#932)

Note that this commit will also enable simplifiable-if-statement, as
there are no code changes needed for it.

* change: enable attribute-defined-outside-init Pylint check (aws#933)

The logic behind this rule is to improve readability by defining all
the attributes of a class inside the init function, even if it simply
sets them to None.

* change: enable wrong-import-order Pylint check (aws#935)

Per PEP8:
Imports should be grouped in the following order:
1- Standard library imports.
2- Related third party imports.
3- Local application/library specific imports.

*  change: enable ungrouped-imports Pylint check (aws#936)

* change: enable wrong-import-order Pylint check

Per PEP8:
Imports should be grouped in the following order:
1- Standard library imports.
2- Related third party imports.
3- Local application/library specific imports.

* change: fix attach for 1P algorithm estimators (aws#931)

* change: set num_processes_per_host only if provided by user (aws#928)

*  change: enable consider-using-in Pylint check (aws#938)

* change: enable consider-using-in Pylint check

*  change: enable too-many-public-methods Pylint check (aws#939)

* change: enable too-many-public-methods Pylint check

This is a useful check to have, but is a lot of work to retroactively
enforce.
Enabling it while ignoring the single violation allows the validation
to run for future code.

* change: enable chained-comparison Pylint check (aws#940)

* change: enable consider-using-ternary Pylint check (aws#942)

This commit will add an exclusion for all auto-generated files.

I chose to ignore the single violation, because the alternative is
confusingly convoluted:
`(hasattr(obj, '__getitem__') if hasattr(obj, '__iter__')
else isinstance(obj, str))`

* change: modify TODO on disabled Pylint check (aws#943)

The check recommendations are only valid for packages that exclusively
support Python 3. The changes cannot be made in Python 2.
The TODO was updated to clarify this.

* prepare release v1.34.0

* update development version to v1.34.1.dev0

* change: add MXNet 1.4.1 support (aws#886)

* change: format and add missing docstring placeholders (aws#945)

This commit will format all existing docstring to follow Google
style: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
This commit will also add docstring placeholders to any class or method
previously missing it.

An ideal approach would be to take the time to include meaningful
docstrings in every file. However, since that is not a task that will
be prioritized, I've declared docstring bankruptcy on this package, in
order to enforce docstring on all future code changes to this package.

* change: allow serving script to be defined for deploy() and transformer() with frameworks (aws#944)

* change: update PyTorch version (aws#947)

* change: improve documentation of some functions (aws#864)

[pr-827][followups] Improve documentation of some functions.
Also some unit test fixes. See comments from marcio in
aws#827

* doc: update using_tensorflow topic (aws#946)

* fix: update TensorFlow script mode dependency list (aws#869)

* change: improving Chainer integ tests (aws#872)

* change: enable line-too-long Pylint check (aws#948)

* doc: add instructions for setting up Cloud9 environment. (aws#949)

Added instructions that allow for a low-cost ~10min environment setup.

* prepare release v1.34.1

* update development version to v1.34.2.dev0

* change: Replaced generic ValueError with custom subclass when reporting unexpected resource status (aws#919)

* doc: correct wording for Cloud9 environment setup instructions (aws#952)

package => repo

* change: removing unnecessary tests cases (aws#951)

* prepare release v1.34.2

* update development version to v1.34.3.dev0

* change: waiting for training tags to propagate in the test (aws#955)

* prepare release v1.34.3

* update development version to v1.34.4.dev0

* feature: allow serving image to be specified when calling MXNet.deploy (aws#959)

* prepare release v1.35.0

* update development version to v1.35.1.dev0

* doc: refactor and edit using_mxnet topic (aws#956)

* doc: refactor overview section per improvement plan

* Update doc/overview.rst

Co-Authored-By: Marcio Vinicius dos Santos <[email protected]>

* Update doc/overview.rst

Co-Authored-By: Marcio Vinicius dos Santos <[email protected]>

* doc: made changes per feedback comments

* doc: remove duplicate faq section and fixed heading

* doc: fix heading levels in overview.rst

* doc: update TensorFlow using topic

* doc: Update using_tf.rst to address feedback

* doc: fix comment in conf.py per build log

* doc: add newline to conf.py to fix error

* doc: addressed feedback for PR

* doc: update conf.py

* doc: remove duplicate byom section in overview.rst

* doc: remove duplicate headings in several rst files

* doc: Restructure and update Using MXNet topic

* doc: fix link

* doc: add link to mxnet readme container section in using_mxnet.rst topic

* fix: update sklearn document to include 3p dependency installation (aws#960)

* prepare release v1.35.1

* update development version to v1.35.2.dev0

* fix: allow Airflow enabled estimators to use absolute path entry_point (aws#965)

* change: ignore FI18 flake8 rule (aws#969)

* feature: support for TensorFlow 1.14 (aws#967)

* flake8 fixes

* black reformat

* Revert "Merging master branch in to eureka-master (aws#206)"

This reverts commit 080d06d561aa88a177c67f08114902ab292f3883.

* Black + Pylint fixes

* add latest api service model

* skip eureka integ tests temporarily

* Fix integ tests to work with preview SDK model (aws#215)

* Fix integ tests to work with preview SDK model.

* Use search to find trial components for analytics dataframe (aws#219)

* move boto client arg to end of the arg list for all eureka APIs

* Use search to find trial components in TrialAnalytics

* add test to verify value error is thrown if no component filter specified

* drop trial name from analytics frame as trial components won't have trial name in them in the future

* remove trial name column for analytics frame

* Eureka master (aws#236)

* Add ExperimentConfig for estimator.fit and transformer.transform

* experiment_config can be passed to estimator.fit
* experiment_config can be passed to transformer.transform
* unit tests for corresponding changes.

* Remove include only experiment integ tests from tox.ini

* Delete experiments integ tests

* Update the service-2.json.

* Bring in latest sagemaker models
* Remove internal-only shapes and internal operations

* Doc the three optional keys for ExperimentConfig dictionary

* Eureka master (aws#237)

* Add ExperimentConfig for estimator.fit and transformer.transform

* experiment_config can be passed to estimator.fit
* experiment_config can be passed to transformer.transform
* unit tests for corresponding changes.

* Remove include only experiment integ tests from tox.ini

* Delete experiments integ tests

* Update the service-2.json.

* Bring in latest sagemaker models
* Remove internal-only shapes and internal operations

* Doc the three optional keys for ExperimentConfig dictionary

* Fix analytics component and search functionality

* Delete all experiments related classes and their tests.
* Change TrialAnalytics to ExperimentAnalytics.
* Fix ExperimentAnalytics for m-n model change.
* Fix/Modify Search functionality

* Fix Docs

* Remove exp management doc from index

* Fix pass None type ExperimentConfig to transform request.

* Fix formatting in test_session.py

* Do not build empty filters list when experiment name is not given

* Add DisplayName to analytics table

* Fix formatting.

* Add sortBy and sortOrder for ExperimentAnalytics

* Eureka master (aws#259)

Fix bad merge

* Add ExperimentConfig to Processor  (aws#260)

* Add ExperimentConfig to Processor

* Remove broken experiment config from processor test (aws#261)


* Add ExperimentConfig to Processor

* Eureka master (aws#262)


* Remove old setup file and Eureka specific files.

* Eureka master (aws#264)


* Add back missing factorization machines integration test

* Minor style fixes (aws#265)



* Minor style fixes

* Fix broken SageMaker Experiments analytics integration tests (aws#267)


* Fix broken experiments_analytics integration tests

* Eureka master (aws#270)



* Remove experiment_config from analytics test
knakad pushed a commit that referenced this issue Dec 4, 2019
* Set eureka VERSION file

* Eureka master (#145)

* Eureka trial tracking interface

* Add experiments developer guides (#147)

* Add experiments developer guides

* Eureka master (#148)

* Add experiments developer guides

* Move to new experiment / trial / trial run data model. Add TrialRun class.

* Eureka master (#149)

* Add Trial class

* Experiment class (#151)

* Introduce active-record design, include first concrete type - Experiment.

* Add Trial and TrialRun active record classes (#152)


* Add Trial and TrialRun active records. Add common created_time / last_modified_time to Record

* List method (#153)


* Add general list classmethod to Record. Add concrete impl to experiment.Experiment

* Eureka master (#154)

* Use general list method in all list* methods in experiment, trial, trial_run

* Add Trial Run Tracker (#156)

* Add Trial Run Tracker

* Add helper methods to Experiment and Trial for fast-creating Trials and TrialRunTrackers

* Set Eureka build to just run linters, doc build, and unit tests

* Add integration tests (#159)

* Add experiment and trial integration tests

* TrialRun bug fixes (#160)

Introduce UpdatedData property on TrialRun

* Adapt Python SDK to Experiments Api changes (#164)

* Move experiment to new api
* Pin pytest version to 4.4.1

* Fix list trials by experiment without passing experiment name. (#166)

* Pass experiment name for list trial api call.

* Eureka master (#167)

Flatten trial name input for create_trial

* Make providing the step name optional when creating a trial tracker (#169)

* Make step name optional when creating tracker

* Make create tracker obtain TRAINING_JOB_ARN from the environment. (#186)

* Make create tracker obtain TRAINING_JOB_ARN from the environment.

* when trial creates a tracker, the training_job_arn can be automatically set as source arn when creating the trial stage.
* corresponding unit test.

* Add source_arn back as optional param for create_tracker

* make source_arn an optional param for create_tracker function.

* Resolving source arn skeleton for jobs.

* Changing _resolve_job_arn to _resolve_source_arn.

* Using generator to resolve source arn from the environment.

* TrialAnalytics class to convert trial step data to pandas dataframe (#188)

* TrialAnalytics class to convert trial stage data to pandas data frame
* pin version flake8-future-import to 0.4.5 to avoid build failure on import annotations missing
* pandas column ordering is different in py27 and py37. Sorting columns to make the order deterministic

* Rename step to component (#194)

* Use new TrialComponent API structure for metrics, artifacts and parameters

* Improve documentation (#199)

* Improve documentation and remove unsupported parameters to list_trial_components

* minor doc update

* remove hardcoded alpha endpoint for experiments (#201)

* Generate sphinx docs for experiment classes (#204)

* Update Sphinx RST files to generate documentation for experiment classes

* Merging master branch in to eureka-master (#206)

* prepare release v1.18.16

* update development version to v1.18.17.dev0

* fix: use unique names for test training jobs (#765)

* prepare release v1.18.17

* update development version to v1.18.18.dev0

* change: add automatic model tuning integ test for TF script mode (#766)

* prepare release v1.18.18

* update development version to v1.18.19.dev0

* change: skip p2/p3 tests in eu-central-1 (#769)

* prepare release v1.18.19

* update development version to v1.18.20.dev0

* feature: add document embedding support to Object2Vec algorithm (#772)

* prepare release v1.19.0

* update development version to v1.19.1.dev0

* change: add py2 deprecation message for the deep learning framework images (#768)

* prepare release v1.19.1

* update development version to v1.19.2.dev0

* feature: add RL Ray 0.6.5 support (#779)

* fix: adjust Ray test script for Ray 0.6.5 (#781)

* fix: prevent false positive PR test results (#783)

* prepare release v1.20.0

* update development version to v1.20.1.dev0

* fix: update TrainingInputMode with s3_input InputMode (#776)

* prepare release v1.20.1

* update development version to v1.20.2.dev0

* fix: pin pytest version to 4.4.1 to avoid pluggy version conflict (#788)

* prepare release v1.20.2

* update development version to v1.20.3.dev0

* documentation: fix docs in regards to transform_fn for mxnet (#790)

* fix: skip local file check for TF requirements file when source_dir is an S3 URI (#798)

* fix: run tests if buildspec.yml has been modified (#786)

* prepare release v1.20.3

* update development version to v1.20.4.dev0

* feature: Support for TFS preprocessing (#797)

* prepare release v1.21.0

* update development version to v1.21.1.dev0

* fix: repack model function works without source directory (#804)

* prepare release v1.21.1

* update development version to v1.21.2.dev0

* fix: emit training jobs tags to estimator (#803)

* fix: set _current_job_name in attach() (#808)

* prepare release v1.21.2

* update development version to v1.21.3.dev0

* fix: honor source_dir from S3 (#811)

* feature: add encryption option to "record_set" (#794)

* feature: add encryption option to "record_set"

* prepare release v1.22.0

* update development version to v1.22.1.dev0

* documentation: update using_sklearn.rst parameter name (#814)

Incorrect parameter name in docs. Updated to match what is implemented in the method and what is used in other estimators.

* feature: support MXNet 1.4 with MMS (#812)

* prepare release v1.23.0

* update development version to v1.23.1.dev0

* feature: add region check for Neo service (#806)

* prepare release v1.24.0

* update development version to v1.24.1.dev0

* fix: add better default transform job name handling within Transformer (#822)

* feature: repack_model support dependencies and code location (#821)

* documentation: TFS support for pre/processing functions (#807)

* change: skip p2 tests in ap-south-east (#823)

* prepare release v1.25.0

* update development version to v1.25.1.dev0

* fix: use unique job name in hyperparameter tuning test (#829)

* prepare release v1.25.1

* update development version to v1.25.2.dev0

* feature: Add extra_args to enable encrypted objects upload (#836)

* change: downgrade c5 in integ tests and test all TF Script Mode images (#840)

* feature: emit estimator transformer tags to model (#815)

* doc: include FrameworkModel and ModelPackage in API docs (#833)

* prepare release v1.26.0

* update development version to v1.26.1.dev0

* fix: fix logger creation in Chainer integ test script (#843)

only one test failed due to a timeout. (the corresponding test failed with the other Python version.) talked to Rui offline.

* feature: add wait argument to estimator deploy (#842)

* prepare release v1.27.0

* update development version to v1.27.1.dev0

* feature: Add DataProcessing Fields for Batch Transform (#827)

* prepare release v1.28.0

* update development version to v1.28.1.dev0

* Update setup.py (#859)

* prepare release v1.28.1

* update development version to v1.28.2.dev0

* fix: prevent race condition in vpc tests (#863)

* prepare release v1.28.2

* update development version to v1.28.3.dev0

* doc: clean up MXNet and TF documentation (#865)

* doc: fix punctuation in MXNet version list (#866)

* change: update SageMaker Neo regions and instance families (#862)

* prepare release v1.28.3

* update development version to v1.28.4.dev0

* feature: network isolation mode in training (#791)

* feature: network isolation mode in training

* feature: network isolation mode in tar support training

* change: documentation and check describe training job network isolation

* doc update

* doc update, remove inference section

* sourcedir

* type error fix

* change: moving not canary TFS tests to local mode (#870)

* Integrate black into development process (#873)

* change: Add Black formatting tool as dependency

As of this commit, Black formatting tool can be run with 'tox -e black-format'.
Black does not run as part of any automated process, yet.

Black is pulled in as a test dependency only if the Python version
is greater than 3.6, as the tool is not vended as part of any
earlier Python version.

* change: Resolve Black formatting failures

Black is unable to handle trailing 'L' or 'l' which is no longer
supported as of python 3.8.

This commit removes those unnecessary 'long' identifiers.

https://www.python.org/dev/peps/pep-0237/

* change: Format all files using Black

This commit contains no functional changes.

* change: Manually resolve flake8 violations after formatting

* change: Manually resolve pylint violations after formatting

* change: Enable black locally and in automated build.

This commit enables black-format as part of "tox tests/unit", in order to
format all files.
It also enables black-check as part of the remote builds, in order to
verify that all files are properly formatted.

* prepare release v1.29.0

* update development version to v1.29.1.dev0

* feature: add git_config and git_clone, validate method (#832)

* fix: add pytest.mark.local_mode annotation to broken tests (#876)

* fix: add pytest.mark.local_mode annotation to tests

* feature: add TensorFlow 1.13 support (#860)

* prepare release v1.30.0

* update development version to v1.30.1.dev0

* fix: add pytest.mark.local_mode annotation to broken tests (#884)

* change: remove unnecessary P3 tests from TFS integration tests (#885)

* change: allow only one integration test run per time (#880)

* change: Update buildspec.yml (#887)

* feature: use deep learning images (#883)

* prepare release v1.31.0

* update development version to v1.31.1.dev0

* change: build spec improvements. (#888)

* fix: remove unnecessary failure case tests (#892)

* change: print build execution time (#890)

* prepare release v1.31.1

* update development version to v1.31.2.dev0

* fix git test in test_estimator.py (#894)

* feature: support Endpoint_type for TF transform (#881)

* prepare release v1.32.0

* update development version to v1.32.1.dev0

* change: separate unit, local mode, and notebook tests in different buildspecs (#898)

* change: fix notebook tests (#900)

* Update displaytime.sh (#901)

* doc: refactor the overview topic in the sphinx project (#877)

* change: tighten pylint config and expand C and R exceptions (#899)

This commit tightens the pylint config with
inspiration from several of Google's pylint
configs.

This commit also expands the C and R exceptions
and disables the specific rules that cause issues
in this package.

* change: correct code per len-as-condition Pylint check (#902)

The Pylint check is not actually enabled in this commit as it conflicts
directly with NumPy. Pylint has corrected this, and it will be included
in their next release (2.4.0):
pylint-dev/pylint#2684

Once Pylint 2.4.0 is released, we can consume it and remove this check.
A summary of this information is included in a TODO near the relevant
Pylint disable rule (len-as-condition).
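
For illustration, a minimal sketch of the kind of rewrite the len-as-condition check asks for. The `describe` helper is hypothetical and not taken from the SDK. The conflict with NumPy mentioned above arises because this idiom relies on truthiness, and the truth value of a NumPy array with more than one element is ambiguous and raises ValueError, so the sketch assumes plain built-in sequences only.

```python
def describe(items):
    """Summarize a list or tuple (hypothetical helper, not from the SDK)."""
    # Flagged by len-as-condition: `if not len(items):`
    # Preferred form: an empty built-in sequence is already falsy.
    if not items:
        return "empty"
    return "{} item(s)".format(len(items))
```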

* prepare release v1.32.1

* update development version to v1.32.2.dev0

* change: remove superfluous parens per Pylint rule (#903)

* change: enable logging-format-interpolation pylint check (#904)

* documentation: add pypi, rtd, black badges to readme (#910)

* prepare release v1.32.2

* update development version to v1.32.3.dev0

* feature: allow custom model name during deploy (#792)

* feature: allow custom model name during deploy

* black check

* feature: git support for hosting models (#878)

* git integration for serving

* fix: Add ap-northeast-1 to Neo algorithms region map (#897)

* fix: reset default output path in Transformer.transform  (#905)

* fix: reset default output path on create transform job

* Unit and integration tests

* change: enable logging-not-lazy pylint check (#909)

* change: enable wrong-import-position pylint check (#907)

* change: enable wrong-import-position pylint check

* change: updating import pattern for sagemaker.tensorflow

* fix: fixing integration tests

* change: reformatting

* change: enable signature-differs pylint check (#915)

* Revert "change: enable wrong-import-position pylint check (#907)" (#916)

This reverts commit 8489f86.

* change: enable wrong-import-position pylint check (#917)

* change: remove TODO comment on import-error Pylint check (#918)

By running Pylint before any of the unit tests (and dependency
installs), the import-error check will always fail since the
dependencies are not yet installed.

We could move Pylint to a later stage to resolve this, but there's
value in this quick check occurring before the unit tests.

As a result, this Pylint check is being disabled.

* prepare release v1.33.0

* update development version to v1.33.1.dev0

* change: enable unidiomatic-typecheck pylint check (#921)

* change: enable no-else-return and no-else-raise pylint checks (#925)

* change: fix list serialization for 1P algos (#922)

* change: enable simplifiable-if-expression pylint checks (#926)

* feature: deal with credentials for Git support for GitHub (#914)

add authentication info

* feature: Git integration for CodeCommit (#927)

* add functions, tests and doc for CodeCommit

* change: enable inconsistent-return-statements Pylint check (#930)

Note that this commit also raises ValueErrors in situations that would
previously have returned None.

Per PEP8: Be consistent in return statements. Either all return
statements in a function should return an expression, or none of them
should. If any return statement returns an expression, any return
statements where no value is returned should explicitly state this as
return None, and an explicit return statement should be present at the
end of the function (if reachable).
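
A minimal sketch of what this looks like in practice, assuming a hypothetical `find_port` helper that is not part of the SDK: every exit path either returns an expression or raises ValueError, rather than falling through and implicitly returning None.

```python
def find_port(ports, name):
    """Return the port number registered under ``name``.

    Hypothetical helper used only to illustrate the rule. ``ports`` is
    an iterable of (name, number) pairs.
    """
    for port_name, number in ports:
        if port_name == name:
            return number
    # Without this line the function would implicitly return None here,
    # which is what inconsistent-return-statements reports.
    raise ValueError("unknown port name: {}".format(name))
```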

* change: enable consider-merging-isinstance Pylint check (#932)

Note that this commit will also enable simplifiable-if-statement, as
there are no code changes needed for it.

* change: enable attribute-defined-outside-init Pylint check (#933)

The logic behind this rule is to improve readability by defining all
the attributes of a class inside the init function, even if it simply
sets them to None.
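
As a small illustration, with a hypothetical `JobTracker` class that does not come from the SDK: the attribute that is only filled in later is still declared in `__init__`, so a reader can see the full set of attributes in one place.

```python
class JobTracker(object):
    """Hypothetical class used only to illustrate the rule."""

    def __init__(self, name):
        self.name = name
        # Declared up front, even though start() is what actually sets it.
        self.job_arn = None

    def start(self, job_arn):
        # Assigning here alone (without the line in __init__) would be
        # reported as attribute-defined-outside-init.
        self.job_arn = job_arn
```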

* change: enable wrong-import-order Pylint check (#935)

Per PEP8:
Imports should be grouped in the following order:
1- Standard library imports.
2- Related third party imports.
3- Local application/library specific imports.

* change: enable ungrouped-imports Pylint check (#936)

* change: enable wrong-import-order Pylint check

Per PEP8:
Imports should be grouped in the following order:
1- Standard library imports.
2- Related third party imports.
3- Local application/library specific imports.

* change: fix attach for 1P algorithm estimators (#931)

* change: set num_processes_per_host only if provided by user (#928)

* change: enable consider-using-in Pylint check (#938)

* change: enable consider-using-in Pylint check

* change: enable too-many-public-methods Pylint check (#939)

* change: enable too-many-public-methods Pylint check

This is a useful check to have, but is a lot of work to retroactively
enforce.
Enabling it while ignoring the single violation allows the validation
to run for future code.

* change: enable chained-comparison Pylint check (#940)

* change: enable consider-using-ternary Pylint check (#942)

This commit will add an exclusion for all auto-generated files.

I chose to ignore the single violation, because the alternative is
confusingly convoluted:
`(hasattr(obj, '__getitem__') if hasattr(obj, '__iter__')
else isinstance(obj, str))`

* change: modify TODO on disabled Pylint check (#943)

The check recommendations are only valid for packages that exclusively
support Python 3. The changes cannot be made in Python 2.
The TODO was updated to clarify this.

* prepare release v1.34.0

* update development version to v1.34.1.dev0

* change: add MXNet 1.4.1 support (#886)

* change: format and add missing docstring placeholders (#945)

This commit will format all existing docstring to follow Google
style: https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html
This commit will also add docstring placeholders to any class or method
previously missing it.

An ideal approach would be to take the time to include meaningful
docstrings in every file. However, since that is not a task that will
be prioritized, I've declared docstring bankruptcy on this package, in
order to enforce docstring on all future code changes to this package.

* change: allow serving script to be defined for deploy() and transformer() with frameworks (#944)

* change: update PyTorch version (#947)

* change: improve documentation of some functions (#864)

[pr-827][followups] Improve documentation of some functions.
Also some unit test fixes. See comments from marcio in
#827

* doc: update using_tensorflow topic (#946)

* fix: update TensorFlow script mode dependency list (#869)

* change: improving Chainer integ tests (#872)

* change: enable line-too-long Pylint check (#948)

* doc: add instructions for setting up Cloud9 environment. (#949)

Added instructions that allow for a low-cost ~10min environment setup.

* prepare release v1.34.1

* update development version to v1.34.2.dev0

* change: Replaced generic ValueError with custom subclass when reporting unexpected resource status (#919)

* doc: correct wording for Cloud9 environment setup instructions (#952)

package => repo

* change: removing unnecessary test cases (#951)

* prepare release v1.34.2

* update development version to v1.34.3.dev0

* change: waiting for training tags to propagate in the test (#955)

* prepare release v1.34.3

* update development version to v1.34.4.dev0

* feature: allow serving image to be specified when calling MXNet.deploy (#959)

* prepare release v1.35.0

* update development version to v1.35.1.dev0

* doc: refactor and edit using_mxnet topic (#956)

* doc: refactor overview section per improvement plan

* Update doc/overview.rst

Co-Authored-By: Marcio Vinicius dos Santos <[email protected]>

* Update doc/overview.rst

Co-Authored-By: Marcio Vinicius dos Santos <[email protected]>

* doc: made changes per feedback comments

* doc: remove duplicate faq section and fixed heading

* doc: fix heading levels in overview.rst

* doc: update TensorFlow using topic

* doc: Update using_tf.rst to address feedback

* doc: fix comment in conf.py per build log

* doc: add newline to conf.py to fix error

* doc: addressed feedback for PR

* doc: update conf.py

* doc: remove duplicate byom section in overview.rst

* doc: remove duplicate headings in several rst files

* doc: Restructure and update Using MXNet topic

* doc: fix link

* doc: add link to mxnet readme container section in using_mxnet.rst topic

* fix: update sklearn document to include 3p dependency installation (#960)

* prepare release v1.35.1

* update development version to v1.35.2.dev0

* fix: allow Airflow enabled estimators to use absolute path entry_point (#965)

* change: ignore FI18 flake8 rule (#969)

* feature: support for TensorFlow 1.14 (#967)

* flake8 fixes

* black reformat

* Revert "Merging master branch in to eureka-master (#206)"

This reverts commit 080d06d561aa88a177c67f08114902ab292f3883.

* Black + Pylint fixes

* add latest api service model

* skip eureka integ tests temporarily

* Fix integ tests to work with preview SDK model (#215)

* Fix integ tests to work with preview SDK model.

* Use search to find trial components for analytics dataframe (#219)

* move boto client arg to end of the arg list for all eureka APIs

* Use search to find trial components in TrialAnalytics

* add test to verify value error is thrown if no component filter specified

* drop trial name from analytics frame as trial components won't have trial name in them in the future

* remove trial name column for analytics frame

* Eureka master (#236)

* Add ExperimentConfig for estimator.fit and transformer.transform

* experiment_config can be passed to estimator.fit
* experiment_config can be passed to transformer.transform
* unit tests for corresponding changes.

* Remove include only experiment integ tests from tox.ini

* Delete experiments integ tests

* Update the service-2.json.

* Bring in latest sagemaker models
* Remove internal-only shapes and internal operations

* Doc the three optional keys for ExperimentConfig dictionary

* Eureka master (#237)

* Add ExperimentConfig for estimator.fit and transformer.transform

* experiment_config can be passed to estimator.fit
* experiment_config can be passed to transformer.transform
* unit tests for corresponding changes.

* Remove include only experiment integ tests from tox.ini

* Delete experiments integ tests

* Update the service-2.json.

* Bring in latest sagemaker models
* Remove internal-only shapes and internal operations

* Doc the three optional keys for ExperimentConfig dictionary

* Fix analytics component and search functionality

* Delete all experiments related classes and their tests.
* Change TrialAnalytics to ExperimentAnalytics.
* Fix ExperimentAnalytics for m-n model change.
* Fix/Modify Search functionality

* Fix Docs

* Remove exp management doc from index

* Fix pass None type ExperimentConfig to transform request.

* Fix formatting in test_session.py

* Do not build empty filters list when experiment name is not given

* Add DisplayName to analytics table

* Fix formatting.

* Add sortBy and sortOrder for ExperimentAnalytics

* Eureka master (#259)

Fix bad merge

* Add ExperimentConfig to Processor  (#260)

* Add ExperimentConfig to Processor

* Remove broken experiment config from processor test (#261)


* Add ExperimentConfig to Processor

* Eureka master (#262)


* Remove old setup file and Eureka specific files.

* Eureka master (#264)


* Add back missing factorization machines integration test

* Minor style fixes (#265)



* Minor style fixes

* Fix broken SageMaker Experiments analytics integration tests (#267)


* Fix broken experiments_analytics integration tests

* Eureka master (#270)



* Remove experiment_config from analytics test