Commit dbeaa95

Author: Chuyang Deng
Commit message: fix flake8 error
1 parent d4c151d commit dbeaa95

17 files changed: +167 -121 lines changed

CHANGELOG.md

Lines changed: 23 additions & 0 deletions

@@ -1,5 +1,28 @@
 # Changelog
 
+## v1.58.3 (2020-05-19)
+
+### Bug Fixes and Other Changes
+
+* update DatasetFormat key name for sagemakerCaptureJson
+
+### Documentation Changes
+
+* update Processing job max_runtime_in_seconds docstring
+
+## v1.58.2.post0 (2020-05-18)
+
+### Documentation Changes
+
+* specify S3 source_dir needs to point to a tar file
+* update PyTorch BYOM topic
+
+## v1.58.2 (2020-05-13)
+
+### Bug Fixes and Other Changes
+
+* address flake8 error
+
 ## v1.58.1 (2020-05-11)
 
 ### Bug Fixes and Other Changes

VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-1.58.2.dev0
+1.58.4.dev0

doc/using_pytorch.rst

Lines changed: 69 additions & 68 deletions

@@ -90,7 +90,7 @@ Note that SageMaker doesn't support argparse actions. If you want to use, for ex
 you need to specify `type` as `bool` in your script and provide an explicit `True` or `False` value for this hyperparameter
 when instantiating PyTorch Estimator.
 
-For more on training environment variables, please visit `SageMaker Containers <https://github.com/aws/sagemaker-containers>`_.
+For more on training environment variables, see the `SageMaker Training Toolkit <https://github.com/aws/sagemaker-training-toolkit/blob/master/ENVIRONMENT_VARIABLES.md>`_.
 
 Save the Model
 --------------
@@ -115,7 +115,7 @@ to a certain filesystem path called ``model_dir``. This value is accessible thro
     with open(os.path.join(args.model_dir, 'model.pth'), 'wb') as f:
         torch.save(model.state_dict(), f)
 
-After your training job is complete, SageMaker will compress and upload the serialized model to S3, and your model data
+After your training job is complete, SageMaker compresses and uploads the serialized model to S3, and your model data
 will be available in the S3 ``output_path`` you specified when you created the PyTorch Estimator.
 
 If you are using Elastic Inference, you must convert your models to the TorchScript format and use ``torch.jit.save`` to save the model.
@@ -566,12 +566,76 @@ The function should return a byte array of data serialized to content_type.
 The default implementation expects ``prediction`` to be a torch.Tensor and can serialize the result to JSON, CSV, or NPY.
 It accepts response content types of "application/json", "text/csv", and "application/x-npy".
 
-Working with Existing Model Data and Training Jobs
-==================================================
 
-Attach to existing training jobs
+Bring your own model
+====================
+
+You can deploy a PyTorch model that you trained outside of SageMaker by using the ``PyTorchModel`` class.
+Typically, you save a PyTorch model as a file with extension ``.pt`` or ``.pth``.
+To do this, you need to:
+
+* Write an inference script.
+* Create the directory structure for your model files.
+* Create the ``PyTorchModel`` object.
+
+Write an inference script
+-------------------------
+
+You must create an inference script that implements (at least) the ``model_fn`` function that calls the loaded model to get a prediction.
+
+**Note**: If you use Elastic Inference with PyTorch, you can use the default ``model_fn`` implementation provided in the serving container.
+
+Optionally, you can also implement ``input_fn`` and ``output_fn`` to process input and output,
+and ``predict_fn`` to customize how the model server gets predictions from the loaded model.
+For information about how to write an inference script, see `Serve a PyTorch Model <#serve-a-pytorch-model>`_.
+Save the inference script in the same folder where you saved your PyTorch model.
+Pass the filename of the inference script as the ``entry_point`` parameter when you create the ``PyTorchModel`` object.
+
+Create the directory structure for your model files
+---------------------------------------------------
+
+You have to create a directory structure and place your model files in the correct location.
+The ``PyTorchModel`` constructor packs the files into a ``tar.gz`` file and uploads it to S3.
+
+The directory structure where you saved your PyTorch model should look something like the following:
+
+**Note:** This directory structure is for PyTorch versions 1.2 and higher.
+For the directory structure for versions 1.1 and lower,
+see `For versions 1.1 and lower <#for-versions-1.1-and-lower>`_.
+
+::
+
+    | my_model
+    | |--model.pth
+    |
+    | code
+    | |--inference.py
+    | |--requirements.txt
+
+Where ``requirements.txt`` is an optional file that specifies dependencies on third-party libraries.
+
+Create a ``PyTorchModel`` object
 --------------------------------
 
+Now call the :class:`sagemaker.pytorch.model.PyTorchModel` constructor to create a model object, and then call its ``deploy()`` method to deploy your model for inference.
+
+.. code:: python
+
+    from sagemaker import get_execution_role
+    role = get_execution_role()
+
+    pytorch_model = PyTorchModel(model_data='s3://my-bucket/my-path/model.tar.gz', role=role,
+                                 entry_point='inference.py')
+
+    predictor = pytorch_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)
+
+
+Now you can call the ``predict()`` method to get predictions from your deployed model.
+
+***********************************************
+Attach an estimator to an existing training job
+***********************************************
+
 You can attach a PyTorch Estimator to an existing training job using the
 ``attach`` method.
 
@@ -592,69 +656,6 @@ The ``attach`` method accepts the following arguments:
 - ``sagemaker_session:`` The Session used
   to interact with SageMaker
 
-Deploy Endpoints from model data
---------------------------------
-
-In addition to attaching to existing training jobs, you can deploy models directly from model data in S3.
-The following code sample shows how to do this, using the ``PyTorchModel`` class.
-
-.. code:: python
-
-    pytorch_model = PyTorchModel(model_data='s3://bucket/model.tar.gz', role='SageMakerRole',
-                                 entry_point='transform_script.py')
-
-    predictor = pytorch_model.deploy(instance_type='ml.c4.xlarge', initial_instance_count=1)
-
-The PyTorchModel constructor takes the following arguments:
-
-- ``model_dat:`` An S3 location of a SageMaker model data
-  .tar.gz file
-- ``image:`` A Docker image URI
-- ``role:`` An IAM role name or Arn for SageMaker to access AWS
-  resources on your behalf.
-- ``predictor_cls:`` A function to
-  call to create a predictor. If not None, ``deploy`` will return the
-  result of invoking this function on the created endpoint name
-- ``env:`` Environment variables to run with
-  ``image`` when hosted in SageMaker.
-- ``name:`` The model name. If None, a default model name will be
-  selected on each ``deploy.``
-- ``entry_point:`` Path (absolute or relative) to the Python file
-  which should be executed as the entry point to model hosting.
-- ``source_dir:`` Optional. Path (absolute or relative) to a
-  directory with any other training source code dependencies including
-  the entry point file. Structure within this directory will be
-  preserved when training on SageMaker.
-- ``enable_cloudwatch_metrics:`` Optional. If true, training
-  and hosting containers will generate Cloudwatch metrics under the
-  AWS/SageMakerContainer namespace.
-- ``container_log_level:`` Log level to use within the container.
-  Valid values are defined in the Python logging module.
-- ``code_location:`` Optional. Name of the S3 bucket where your
-  custom code will be uploaded to. If not specified, will use the
-  SageMaker default bucket created by sagemaker.Session.
-- ``sagemaker_session:`` The SageMaker Session
-  object, used for SageMaker interaction
-
-Your model data must be a .tar.gz file in S3. SageMaker Training Job model data is saved to .tar.gz files in S3,
-however if you have local data you want to deploy, you can prepare the data yourself.
-
-Assuming you have a local directory containg your model data named "my_model" you can tar and gzip compress the file and
-upload to S3 using the following commands:
-
-::
-
-    tar -czf model.tar.gz my_model
-    aws s3 cp model.tar.gz s3://my-bucket/my-path/model.tar.gz
-
-This uploads the contents of my_model to a gzip compressed tar file to S3 in the bucket "my-bucket", with the key
-"my-path/model.tar.gz".
-
-To run this command, you'll need the AWS CLI tool installed. Please refer to our `FAQ`_ for more information on
-installing this.
-
-.. _FAQ: ../../../README.rst#faq
 
 *************************
 PyTorch Training Examples
 *************************
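The BYOM directory layout and packaging described in this diff can be produced programmatically; the following is a minimal sketch using only the Python standard library (all paths and file contents are hypothetical placeholders, not part of the SDK):

```python
# Build the documented BYOM layout (my_model/model.pth, my_model/code/inference.py)
# and pack it into a model.tar.gz suitable for passing as PyTorchModel model_data.
import os
import tarfile
import tempfile

def package_model(workdir):
    """Create the my_model directory structure and tar it into model.tar.gz."""
    model_dir = os.path.join(workdir, "my_model")
    code_dir = os.path.join(model_dir, "code")
    os.makedirs(code_dir)

    # Placeholder weights; a real model.pth would come from
    # torch.save(model.state_dict(), ...).
    with open(os.path.join(model_dir, "model.pth"), "wb") as f:
        f.write(b"\x00placeholder-weights")
    # Placeholder inference script implementing (at least) model_fn.
    with open(os.path.join(code_dir, "inference.py"), "w") as f:
        f.write("def model_fn(model_dir):\n    raise NotImplementedError\n")

    archive = os.path.join(workdir, "model.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname="." keeps member paths relative so they unpack in place.
        tar.add(model_dir, arcname=".")
    return archive

workdir = tempfile.mkdtemp()
archive = package_model(workdir)
with tarfile.open(archive) as tar:
    names = tar.getnames()
print(sorted(names))
```

The resulting archive could then be uploaded to S3 (for example with the AWS CLI) and its S3 URI passed as ``model_data`` when constructing a ``PyTorchModel``.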

src/sagemaker/estimator.py

Lines changed: 8 additions & 6 deletions

@@ -1481,12 +1481,14 @@ def __init__(
             >>> |----- test.py
 
             You can assign entry_point='src/train.py'.
-        source_dir (str): Path (absolute, relative, or an S3 URI) to a directory with
-            any other training source code dependencies aside from the entry
-            point file (default: None). Structure within this directory are
-            preserved when training on Amazon SageMaker. If 'git_config' is
-            provided, 'source_dir' should be a relative location to a
-            directory in the Git repo. .. admonition:: Example
+        source_dir (str): Path (absolute, relative or an S3 URI) to a directory
+            with any other training source code dependencies aside from the entry
+            point file (default: None). If ``source_dir`` is an S3 URI, it must
+            point to a tar.gz file. Structure within this directory is preserved
+            when training on Amazon SageMaker. If 'git_config' is provided,
+            'source_dir' should be a relative location to a directory in the Git
+            repo.
+            .. admonition:: Example
 
             With the following GitHub repo directory structure:
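The constraint this docstring change documents (an S3 ``source_dir`` must point to a tar.gz file) can be checked before submitting a job; the helper below is a hypothetical pre-flight check, not part of the SDK:

```python
# Hypothetical validation mirroring the documented source_dir rule:
# an S3 URI must end in .tar.gz; local paths pass through unchanged.
def validate_source_dir(source_dir):
    if source_dir is None:
        return source_dir
    if source_dir.startswith("s3://") and not source_dir.endswith(".tar.gz"):
        raise ValueError(
            "source_dir given as an S3 URI must point to a tar.gz file: %s"
            % source_dir
        )
    return source_dir

print(validate_source_dir("s3://my-bucket/sourcedir.tar.gz"))  # accepted as-is
print(validate_source_dir("src/"))                             # local path, accepted
```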

src/sagemaker/model.py

Lines changed: 8 additions & 7 deletions

@@ -663,13 +663,14 @@ def __init__(
             >>> |----- test.py
 
             You can assign entry_point='src/inference.py'.
-        source_dir (str): Path (absolute or relative) to a directory with
-            any other training source code dependencies aside from the entry
-            point file (default: None). Structure within this directory will
-            be preserved when training on SageMaker. If 'git_config' is
-            provided, 'source_dir' should be a relative location to a
-            directory in the Git repo. If the directory points to S3, no
-            code will be uploaded and the S3 location will be used instead.
+        source_dir (str): Path (absolute, relative or an S3 URI) to a directory
+            with any other training source code dependencies aside from the entry
+            point file (default: None). If ``source_dir`` is an S3 URI, it must
+            point to a tar.gz file. Structure within this directory is preserved
+            when training on Amazon SageMaker. If 'git_config' is provided,
+            'source_dir' should be a relative location to a directory in the Git repo.
+            If the directory points to S3, no code will be uploaded and the S3 location
+            will be used instead.
         .. admonition:: Example
 
         With the following GitHub repo directory structure:

src/sagemaker/model_monitor/dataset_format.py

Lines changed: 1 addition & 1 deletion

@@ -58,4 +58,4 @@ def sagemaker_capture_json():
             dict: JSON string containing DatasetFormat to be used by DefaultModelMonitor.
 
         """
-        return {"sagemaker_capture_json": {}}
+        return {"sagemakerCaptureJson": {}}
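This one-line bug fix renames the dataset-format key from snake_case to camelCase, presumably so the dict matches the casing the Model Monitor API expects. A minimal reproduction of the patched return value:

```python
# Mirrors the patched DatasetFormat.sagemaker_capture_json(): the key is the
# camelCase "sagemakerCaptureJson", not the earlier "sagemaker_capture_json".
def sagemaker_capture_json():
    return {"sagemakerCaptureJson": {}}

print(sagemaker_capture_json())
```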

src/sagemaker/mxnet/estimator.py

Lines changed: 5 additions & 4 deletions

@@ -72,10 +72,11 @@ def __init__(
         entry_point (str): Path (absolute or relative) to the Python source
             file which should be executed as the entry point to training.
             This should be compatible with either Python 2.7 or Python 3.5.
-        source_dir (str): Path (absolute or relative) to a directory with
-            any other training source code dependencies aside from the entry
-            point file (default: None). Structure within this directory are
-            preserved when training on Amazon SageMaker.
+        source_dir (str): Path (absolute, relative or an S3 URI) to a directory
+            with any other training source code dependencies aside from the entry
+            point file (default: None). If ``source_dir`` is an S3 URI, it must
+            point to a tar.gz file. Structure within this directory is preserved
+            when training on Amazon SageMaker.
         hyperparameters (dict): Hyperparameters that will be used for
             training (default: None). The hyperparameters are made
             accessible as a dict[str, str] to the training code on

src/sagemaker/processing.py

Lines changed: 4 additions & 2 deletions

@@ -70,7 +70,8 @@ def __init__(
         output_kms_key (str): The KMS key ID for processing job outputs (default: None).
         max_runtime_in_seconds (int): Timeout in seconds (default: None).
             After this amount of time, Amazon SageMaker terminates the job,
-            regardless of its current status.
+            regardless of its current status. If `max_runtime_in_seconds` is not
+            specified, the default value is 24 hours.
         base_job_name (str): Prefix for processing job name. If not specified,
             the processor generates a default job name, based on the
             processing image name and current timestamp.
@@ -309,7 +310,8 @@ def __init__(
         output_kms_key (str): The KMS key ID for processing job outputs (default: None).
         max_runtime_in_seconds (int): Timeout in seconds (default: None).
             After this amount of time, Amazon SageMaker terminates the job,
-            regardless of its current status.
+            regardless of its current status. If `max_runtime_in_seconds` is not
+            specified, the default value is 24 hours.
         base_job_name (str): Prefix for processing name. If not specified,
             the processor generates a default job name, based on the
             processing image name and current timestamp.
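The docstring change documents a service-side default: when ``max_runtime_in_seconds`` is left unset, the job times out after 24 hours. The service applies that fallback itself; the sketch below only makes the documented value explicit on the caller side (the constant and helper are ours, not the SDK's):

```python
# 24 hours expressed in seconds, as documented for max_runtime_in_seconds.
DEFAULT_MAX_RUNTIME_SECONDS = 24 * 60 * 60  # 86400

def effective_max_runtime(max_runtime_in_seconds=None):
    """Return the timeout a processing job would effectively get."""
    return max_runtime_in_seconds or DEFAULT_MAX_RUNTIME_SECONDS

print(effective_max_runtime())      # 86400 (the documented default)
print(effective_max_runtime(3600))  # 3600 (explicit one-hour timeout)
```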

src/sagemaker/pytorch/estimator.py

Lines changed: 5 additions & 4 deletions

@@ -68,10 +68,11 @@ def __init__(
         entry_point (str): Path (absolute or relative) to the Python source
             file which should be executed as the entry point to training.
             This should be compatible with either Python 2.7 or Python 3.5.
-        source_dir (str): Path (absolute or relative) to a directory with
-            any other training source code dependencies aside from the entry
-            point file (default: None). Structure within this directory are
-            preserved when training on Amazon SageMaker.
+        source_dir (str): Path (absolute, relative or an S3 URI) to a directory
+            with any other training source code dependencies aside from the entry
+            point file (default: None). If ``source_dir`` is an S3 URI, it must
+            point to a tar.gz file. Structure within this directory is preserved
+            when training on Amazon SageMaker.
         hyperparameters (dict): Hyperparameters that will be used for
             training (default: None). The hyperparameters are made
             accessible as a dict[str, str] to the training code on

src/sagemaker/rl/estimator.py

Lines changed: 5 additions & 4 deletions

@@ -109,10 +109,11 @@ def __init__(
         framework (sagemaker.rl.RLFramework): Framework (MXNet or
             TensorFlow) you want to be used as a toolkit backed for
             reinforcement learning training.
-        source_dir (str): Path (absolute or relative) to a directory with
-            any other training source code dependencies aside from the entry
-            point file (default: None). Structure within this directory is
-            preserved when training on Amazon SageMaker.
+        source_dir (str): Path (absolute, relative or an S3 URI) to a directory
+            with any other training source code dependencies aside from the entry
+            point file (default: None). If ``source_dir`` is an S3 URI, it must
+            point to a tar.gz file. Structure within this directory is preserved
+            when training on Amazon SageMaker.
         hyperparameters (dict): Hyperparameters that will be used for
             training (default: None). The hyperparameters are made
             accessible as a dict[str, str] to the training code on

src/sagemaker/session.py

Lines changed: 12 additions & 0 deletions

@@ -2583,6 +2583,18 @@ def wait_for_tuning_job(self, job, poll=5):
         self._check_job_status(job, desc, "HyperParameterTuningJobStatus")
         return desc
 
+    def describe_transform_job(self, job_name):
+        """Calls the DescribeTransformJob API for the given job name
+        and returns the response.
+
+        Args:
+            job_name (str): The name of the transform job to describe.
+
+        Returns:
+            dict: A dictionary response with the transform job description.
+        """
+        return self.sagemaker_client.describe_transform_job(TransformJobName=job_name)
+
     def wait_for_transform_job(self, job, poll=5):
         """Wait for an Amazon SageMaker transform job to complete.
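The new ``describe_transform_job`` is a thin delegation to the underlying SageMaker client's ``describe_transform_job`` call. The pattern can be exercised offline against a stub client; the stub class and its canned response below are hypothetical stand-ins, not real SDK objects:

```python
# Demonstrates the delegation pattern added to Session: the method simply
# forwards job_name to sagemaker_client.describe_transform_job.
class StubSageMakerClient:
    """Hypothetical stand-in for the boto3 SageMaker client."""

    def describe_transform_job(self, TransformJobName):
        # A real client would call the DescribeTransformJob API here.
        return {
            "TransformJobName": TransformJobName,
            "TransformJobStatus": "Completed",
        }

class MiniSession:
    """Minimal sketch of the Session wrapper around the client."""

    def __init__(self, sagemaker_client):
        self.sagemaker_client = sagemaker_client

    def describe_transform_job(self, job_name):
        return self.sagemaker_client.describe_transform_job(
            TransformJobName=job_name
        )

session = MiniSession(StubSageMakerClient())
desc = session.describe_transform_job("my-transform-job")
print(desc["TransformJobStatus"])
```

Keeping Session methods as thin pass-throughs like this makes them easy to mock in tests while giving callers one consistent entry point.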

src/sagemaker/sklearn/estimator.py

Lines changed: 5 additions & 4 deletions

@@ -69,10 +69,11 @@ def __init__(
         framework_version (str): Scikit-learn version you want to use for
             executing your model training code. List of supported versions
             https://github.com/aws/sagemaker-python-sdk#sklearn-sagemaker-estimators
-        source_dir (str): Path (absolute or relative) to a directory with
-            any other training source code dependencies aside from the entry
-            point file (default: None). Structure within this directory are
-            preserved when training on Amazon SageMaker.
+        source_dir (str): Path (absolute, relative or an S3 URI) to a directory
+            with any other training source code dependencies aside from the entry
+            point file (default: None). If ``source_dir`` is an S3 URI, it must
+            point to a tar.gz file. Structure within this directory is preserved
+            when training on Amazon SageMaker.
         hyperparameters (dict): Hyperparameters that will be used for
             training (default: None). The hyperparameters are made
             accessible as a dict[str, str] to the training code on

src/sagemaker/tensorflow/estimator.py

Lines changed: 6 additions & 5 deletions

@@ -570,11 +570,12 @@ def create_model(
             should be executed as the entry point to training. If not specified and
             ``endpoint_type`` is 'tensorflow-serving', no entry point is used. If
             ``endpoint_type`` is also ``None``, then the training entry point is used.
-        source_dir (str): Path (absolute or relative) to a directory with any other serving
-            source code dependencies aside from the entry point file. If not specified and
-            ``endpoint_type`` is 'tensorflow-serving', no source_dir is used. If
-            ``endpoint_type`` is also ``None``, then the model source directory from training
-            is used.
+        source_dir (str): Path (absolute, relative or an S3 URI) to a directory with any
+            other serving source code dependencies aside from the entry point file. If
+            ``source_dir`` is an S3 URI, it must point to a tar.gz file. If not specified
+            and ``endpoint_type`` is 'tensorflow-serving', no source_dir is used. If
+            ``endpoint_type`` is also ``None``, then the model source directory from
+            training is used.
         dependencies (list[str]): A list of paths to directories (absolute or relative) with
             any additional libraries that will be exported to the container.
             If not specified and ``endpoint_type`` is 'tensorflow-serving', ``dependencies`` is
