Commit 8b7be01

Merge branch 'master' into zwei
2 parents: 666910c + e774fcf


46 files changed: +809 −153 lines (only a subset of the changed files is shown below)

CHANGELOG.md

Lines changed: 74 additions & 0 deletions
@@ -1,5 +1,79 @@
 # Changelog
 
+## v1.72.0 (2020-07-29)
+
+### Features
+
+ * Neo: Add Granular Target Description support for compilation
+
+### Documentation Changes
+
+ * Add xgboost doc on bring your own model
+ * fix typos on processing docs
+
+## v1.71.1 (2020-07-27)
+
+### Bug Fixes and Other Changes
+
+ * remove redundant information from the user_agent string.
+
+### Testing and Release Infrastructure
+
+ * use unique model name in TFS integ tests
+ * use pytest-cov instead of coverage
+
+## v1.71.0 (2020-07-23)
+
+### Features
+
+ * Add mpi support for mxnet estimator api
+
+### Bug Fixes and Other Changes
+
+ * use 'sagemaker' logger instead of root logger
+ * account for "py36" and "py37" in image tag parsing
+
+## v1.70.2 (2020-07-22)
+
+### Bug Fixes and Other Changes
+
+ * convert network_config in processing_config to dict
+
+### Documentation Changes
+
+ * Add ECR URI Estimator example
+
+## v1.70.1 (2020-07-21)
+
+### Bug Fixes and Other Changes
+
+ * Nullable fields in processing_config
+
+## v1.70.0 (2020-07-20)
+
+### Features
+
+ * Add model monitor support for us-gov-west-1
+ * support TFS 2.2
+
+### Bug Fixes and Other Changes
+
+ * reshape Artifacts into data frame in ExperimentsAnalytics
+
+### Documentation Changes
+
+ * fix MXNet version info for requirements.txt support
+
+## v1.69.0 (2020-07-09)
+
+### Features
+
+ * Add ModelClientConfig Fields for Batch Transform
+
+### Documentation Changes
+
+ * add KFP Processing component
+
 ## v2.0.0.rc1 (2020-07-08)
 
 ### Breaking Changes
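
For readers scanning the changelog, the v1.71.0 entry "Add mpi support for mxnet estimator api" is the least self-explanatory. The following is a hedged sketch of what enabling MPI on the v1 MXNet estimator might look like (instance types, counts, script name, and S3 path are illustrative assumptions, not taken from this commit):

.. code:: python

    from sagemaker.mxnet import MXNet

    # Hedged sketch: MPI-based distributed training is requested through the
    # v1 SDK's ``distributions`` argument; all concrete values are assumptions.
    estimator = MXNet(
        entry_point="train.py",
        role="SageMakerRole",
        train_instance_count=2,
        train_instance_type="ml.p3.2xlarge",
        framework_version="1.6.0",
        py_version="py3",
        distributions={"mpi": {"enabled": True, "processes_per_host": 2}},
    )
    estimator.fit("s3://my-bucket/train-data")  # hypothetical S3 path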

buildspec-unittests.yml

Lines changed: 2 additions & 2 deletions
@@ -7,11 +7,11 @@ phases:
       - TOX_PARALLEL_NO_SPINNER=1
       - PY_COLORS=0
       - start_time=`date +%s`
-      - tox -e flake8,pylint,twine,black-check
+      - tox -e flake8,pylint,twine,black-check --parallel all
       - ./ci-scripts/displaytime.sh 'flake8,pylint,twine,black-check' $start_time
 
       - start_time=`date +%s`
-      - tox -e sphinx,doc8
+      - tox -e sphinx,doc8 --parallel all
       - ./ci-scripts/displaytime.sh 'sphinx,doc8' $start_time
 
       # run unit tests

doc/amazon_sagemaker_processing.rst

Lines changed: 2 additions & 2 deletions
@@ -10,14 +10,14 @@ Amazon SageMaker Processing allows you to run steps for data pre- or post-proces
 Background
 ==========
 
-Amazon SageMaker lets developers and data scientists train and deploy machine learning models. With Amazon SageMaker Processing, you can run processing jobs on for data processing steps in your machine learning pipeline, which accept data from Amazon S3 as input, and put data into Amazon S3 as output.
+Amazon SageMaker lets developers and data scientists train and deploy machine learning models. With Amazon SageMaker Processing, you can run processing jobs for data processing steps in your machine learning pipeline. Processing jobs accept data from Amazon S3 as input and store data into Amazon S3 as output.
 
 .. image:: ./amazon_sagemaker_processing_image1.png
 
 Setup
 =====
 
-The fastest way to run get started with Amazon SageMaker Processing is by running a Jupyter notebook. You can follow the `Getting Started with Amazon SageMaker`_ guide to start running notebooks on Amazon SageMaker.
+The fastest way to get started with Amazon SageMaker Processing is by running a Jupyter notebook. You can follow the `Getting Started with Amazon SageMaker`_ guide to start running notebooks on Amazon SageMaker.
 
 .. _Getting Started with Amazon SageMaker: https://docs.aws.amazon.com/sagemaker/latest/dg/gs.html
 
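To ground the Background paragraph, a minimal hedged sketch of a processing job that reads from and writes to Amazon S3, using the SDK's ``SKLearnProcessor`` (bucket names, script name, role, and instance settings are assumptions):

.. code:: python

    from sagemaker.processing import ProcessingInput, ProcessingOutput
    from sagemaker.sklearn.processing import SKLearnProcessor

    # Hedged sketch: run a preprocessing script as a SageMaker Processing job.
    processor = SKLearnProcessor(
        framework_version="0.20.0",
        role="SageMakerRole",
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )
    processor.run(
        code="preprocess.py",  # hypothetical script
        inputs=[ProcessingInput(source="s3://my-bucket/raw",
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                                  destination="s3://my-bucket/processed")],
    )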

doc/frameworks/mxnet/using_mxnet.rst

Lines changed: 4 additions & 3 deletions
@@ -159,13 +159,14 @@ If there are other packages you want to use with your script, you can include a
 Both ``requirements.txt`` and your training script should be put in the same folder.
 You must specify this folder in ``source_dir`` argument when creating an MXNet estimator.
 
-The function of installing packages using ``requirements.txt`` is supported for all MXNet versions during training.
+Installing packages using ``requirements.txt`` is supported for MXNet versions 1.3.0 and higher during training.
+
 When serving an MXNet model, support for this function varies with MXNet versions.
 For MXNet 1.6.0 or newer, ``requirements.txt`` must be under folder ``code``.
 The SageMaker MXNet Estimator automatically saves ``code`` in ``model.tar.gz`` after training (assuming you set up your script and ``requirements.txt`` correctly as stipulated in the previous paragraph).
 In the case of bringing your own trained model for deployment, you must save ``requirements.txt`` under folder ``code`` in ``model.tar.gz`` yourself or specify it through ``dependencies``.
-For MXNet 1.4.1, ``requirements.txt`` is not supported for inference.
-For MXNet 0.12.1-1.3.0, ``requirements.txt`` must be in ``source_dir``.
+For MXNet 0.12.1-1.2.1 and 1.4.0-1.4.1, ``requirements.txt`` is not supported for inference.
+For MXNet 1.3.0, ``requirements.txt`` must be in ``source_dir``.
 
 A ``requirements.txt`` file is a text file that contains a list of items that are installed by using ``pip install``.
 You can also specify the version of an item to install.
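
A hedged sketch of the layout this doc change describes: training with a ``requirements.txt`` placed next to the entry point inside ``source_dir`` (folder, script, role, and S3 names are assumptions):

.. code:: python

    from sagemaker.mxnet import MXNet

    # source_dir/ holds both train.py and requirements.txt, so the listed
    # packages are pip-installed in the container before training starts.
    estimator = MXNet(
        entry_point="train.py",
        source_dir="source_dir",
        role="SageMakerRole",
        train_instance_count=1,
        train_instance_type="ml.m5.xlarge",
        framework_version="1.6.0",  # requirements.txt needs MXNet >= 1.3.0
        py_version="py3",
    )
    estimator.fit("s3://my-bucket/train-data")  # hypothetical S3 path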

doc/frameworks/tensorflow/using_tf.rst

Lines changed: 22 additions & 0 deletions
@@ -178,6 +178,28 @@ To use Python 3.7, please specify both of the args:
 Where the S3 url is a path to your training data within Amazon S3.
 The constructor keyword arguments define how SageMaker runs your training script.
 
+Specify a Docker image using an Estimator
+-----------------------------------------
+
+Some use cases, such as extending an existing pre-built Amazon SageMaker image, require specifying a Docker image when creating an Estimator by passing the ECR URI directly instead of the Python and framework versions. For a full list of available container URIs, see `Available Deep Learning Containers Images <https://github.com/aws/deep-learning-containers/blob/master/available_images.md>`__. For more information on using Docker containers, see `Use Your Own Algorithms or Models with Amazon SageMaker <https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html>`__.
+
+When specifying the image, you must use the ``image_name=''`` arg to replace the following arg:
+
+- ``py_version=''``
+
+You should still specify the ``framework_version=''`` arg, because the SageMaker Python SDK accommodates differences in the images based on the version.
+
+The following example uses the ``image_name=''`` arg to specify the container image, Python version, and framework version.
+
+.. code:: python
+
+    tf_estimator = TensorFlow(entry_point='tf-train.py',
+                              role='SageMakerRole',
+                              train_instance_count=1,
+                              train_instance_type='ml.p2.xlarge',
+                              image_name='763104351884.dkr.ecr.<region>.amazonaws.com/<framework>-<job type>:<framework version>-<cpu/gpu>-<python version>-ubuntu18.04',
+                              script_mode=True)
+
 For more information about the sagemaker.tensorflow.TensorFlow estimator, see `SageMaker TensorFlow Classes`_.
 
 Call the fit Method
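
As a usage note on the snippet above (not part of the diff): an estimator constructed with ``image_name`` trains and deploys like any other. A hedged continuation, with the S3 path and endpoint sizing as assumptions:

.. code:: python

    # Continuing the diff's example; path and instance settings are assumptions.
    tf_estimator.fit("s3://my-bucket/train-data")
    predictor = tf_estimator.deploy(initial_instance_count=1,
                                    instance_type="ml.c5.xlarge")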

doc/frameworks/xgboost/using_xgboost.rst

Lines changed: 50 additions & 1 deletion
@@ -390,6 +390,56 @@ The function should return a byte array of data serialized to ``content_type``.
 The default implementation expects ``prediction`` to be a NumPy array and can serialize the result to JSON, CSV, or NPY.
 It accepts response content types of "application/json", "text/csv", and "application/x-npy".
 
+Bring Your Own Model
+--------------------
+
+You can deploy an XGBoost model that you trained outside of SageMaker by using the Amazon SageMaker XGBoost container.
+Typically, you save an XGBoost model by pickling the ``Booster`` object or calling ``booster.save_model``.
+The XGBoost `built-in algorithm mode <https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html#xgboost-modes>`_
+supports both a pickled ``Booster`` object and a model produced by ``booster.save_model``.
+You can also deploy an XGBoost model by using XGBoost as a framework,
+which gives you more flexibility.
+To deploy an XGBoost model by using XGBoost as a framework, you need to:
+
+- Write an inference script.
+- Create the XGBoostModel object.
+
+Write an Inference Script
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You must create an inference script that implements (at least) the ``model_fn`` function, which calls the loaded model to get a prediction.
+
+Optionally, you can also implement ``input_fn`` and ``output_fn`` to process input and output,
+and ``predict_fn`` to customize how the model server gets predictions from the loaded model.
+For information about how to write an inference script, see `SageMaker XGBoost Model Server <#sagemaker-xgboost-model-server>`_.
+Pass the filename of the inference script as the ``entry_point`` parameter when you create the ``XGBoostModel`` object.
+
+Create an XGBoostModel Object
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To create a model object, call the ``sagemaker.xgboost.model.XGBoostModel`` constructor,
+and then call its ``deploy()`` method to deploy your model for inference.
+
+.. code:: python
+
+    xgboost_model = XGBoostModel(
+        model_data="s3://my-bucket/my-path/model.tar.gz",
+        role="my-role",
+        entry_point="inference.py",
+        framework_version="1.0-1"
+    )
+
+    predictor = xgboost_model.deploy(
+        instance_type='ml.c4.xlarge',
+        initial_instance_count=1
+    )
+
+    # If payload is a string in LIBSVM format, we need to change the serializer.
+    predictor.serializer = str
+    predictor.predict("<label> <index1>:<value1> <index2>:<value2>")
+
+To get predictions from your deployed model, call the ``predict()`` method, as shown above.
+
 Host Multiple Models with Multi-Model Endpoints
 -----------------------------------------------
 

@@ -401,7 +451,6 @@ in the AWS documentation.
 For a sample notebook that uses Amazon SageMaker to deploy multiple XGBoost models to an endpoint, see the
 `Multi-Model Endpoint XGBoost Sample Notebook <https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/multi_model_xgboost_home_value/xgboost_multi_model_endpoint_home_value.ipynb>`_.
 
-
 *************************
 SageMaker XGBoost Classes
 *************************
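
The "Write an Inference Script" subsection added above stops short of showing one. A minimal hedged sketch of an ``inference.py`` implementing only ``model_fn`` (the saved-model file name ``xgboost-model`` is an assumption; match it to how you saved the booster):

.. code:: python

    # inference.py -- hedged sketch of the minimal handler described above.
    import os

    import xgboost as xgb

    def model_fn(model_dir):
        """Load a booster saved with booster.save_model at training time."""
        booster = xgb.Booster()
        booster.load_model(os.path.join(model_dir, "xgboost-model"))  # assumed name
        return booster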

doc/workflows/kubernetes/amazon_sagemaker_components_for_kubeflow_pipelines.rst

Lines changed: 4 additions & 0 deletions
@@ -89,6 +89,10 @@ Pipelines workflow. For more information, see \ `SageMaker
 hyperparameter optimization Kubeflow Pipeline
 component <https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/hyperparameter_tuning>`__.
 
+**Processing**
+
+The Processing component enables you to submit processing jobs to Amazon SageMaker directly from a Kubeflow Pipelines workflow. For more information, see \ `SageMaker Processing Kubeflow Pipeline component <https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/process>`__.
+
 Inference components
 ^^^^^^^^^^^^^^^^^^^^
 
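
For orientation, a hedged sketch of how a component like this is typically pulled into a pipeline with the KFP SDK; the raw component URL is an assumption inferred from the linked repository path:

.. code:: python

    from kfp import components

    # Hedged sketch: load the SageMaker Processing component definition.
    # The exact URL is an assumption; see the linked repository for the
    # authoritative path.
    sagemaker_process_op = components.load_component_from_url(
        "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
        "components/aws/sagemaker/process/component.yaml"
    )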

setup.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ def read_version():
 
 # Declare minimal set for installation
 required_packages = [
-    "boto3>=1.13.24",
+    "boto3>=1.14.12",
     "google-pasta",
     "numpy>=1.9.0",
     "protobuf>=3.1",

src/sagemaker/analytics.py

Lines changed: 39 additions & 6 deletions
@@ -23,11 +23,12 @@
 from sagemaker.session import Session
 from sagemaker.utils import DeferredError
 
+logger = logging.getLogger(__name__)
 
 try:
     import pandas as pd
 except ImportError as e:
-    logging.warning("pandas failed to import. Analytics features will be impaired or broken.")
+    logger.warning("pandas failed to import. Analytics features will be impaired or broken.")
     # Any subsequent attempt to use pandas will raise the ImportError
     pd = DeferredError(e)
 
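
A short note on this hunk: switching from the root logger to ``logging.getLogger(__name__)`` lets applications tune the SDK's verbosity in isolation. A minimal sketch using only the standard library:

.. code:: python

    import logging

    # Adjust only the SDK's loggers; the root logger is left untouched.
    logging.getLogger("sagemaker").setLevel(logging.DEBUG)
    logging.getLogger("sagemaker.analytics").setLevel(logging.WARNING)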

@@ -251,15 +252,13 @@ def training_job_summaries(self, force_refresh=False):
         output = []
         next_args = {}
         for count in range(100):
-            logging.debug("Calling list_training_jobs_for_hyper_parameter_tuning_job %d", count)
+            logger.debug("Calling list_training_jobs_for_hyper_parameter_tuning_job %d", count)
             raw_result = self._sage_client.list_training_jobs_for_hyper_parameter_tuning_job(
                 HyperParameterTuningJobName=self.name, MaxResults=100, **next_args
             )
             new_output = raw_result["TrainingJobSummaries"]
             output.extend(new_output)
-            logging.debug(
-                "Got %d more TrainingJobs. Total so far: %d", len(new_output), len(output)
-            )
+            logger.debug("Got %d more TrainingJobs. Total so far: %d", len(new_output), len(output))
             if ("NextToken" in raw_result) and (len(new_output) > 0):
                 next_args["NextToken"] = raw_result["NextToken"]
             else:
@@ -373,7 +372,7 @@ def _fetch_metric(self, metric_name):
         }
         raw_cwm_data = self._cloudwatch.get_metric_statistics(**request)["Datapoints"]
         if len(raw_cwm_data) == 0:
-            logging.warning("Warning: No metrics called %s found", metric_name)
+            logger.warning("Warning: No metrics called %s found", metric_name)
             return
 
         # Process data: normalize to starting time, and sort.
@@ -431,6 +430,8 @@ def __init__(
         metric_names=None,
         parameter_names=None,
         sagemaker_session=None,
+        input_artifact_names=None,
+        output_artifact_names=None,
     ):
         """Initialize a ``ExperimentAnalytics`` instance.
 
@@ -450,6 +451,11 @@ def __init__(
             sagemaker_session (sagemaker.session.Session): Session object which manages interactions
                 with Amazon SageMaker APIs and any other AWS services needed. If not specified,
                 one is created using the default AWS configuration chain.
+            input_artifact_names (dict, optional): The input artifacts for the experiment. Examples of
+                input artifacts are datasets, algorithms, hyperparameters, source code, and instance
+                types.
+            output_artifact_names (dict, optional): The output artifacts for the experiment. Examples
+                of output artifacts are metrics, snapshots, logs, and images.
         """
         sagemaker_session = sagemaker_session or Session()
         self._sage_client = sagemaker_session.sagemaker_client
@@ -463,6 +469,8 @@ def __init__(
         self._sort_order = sort_order
         self._metric_names = metric_names
         self._parameter_names = parameter_names
+        self._input_artifact_names = input_artifact_names
+        self._output_artifact_names = output_artifact_names
         self._trial_components = None
         super(ExperimentAnalytics, self).__init__()
         self.clear_cache()
@@ -516,6 +524,21 @@ def _reshape_metrics(self, metrics):
             out["{} - {}".format(metric_name, stat_type)] = stat_value
         return out
 
+    def _reshape_artifacts(self, artifacts, _artifact_names):
+        """Reshape trial component input/output artifacts to a pandas column
+        Args:
+            artifacts: trial component input/output artifacts
+        Returns:
+            dict: Key: artifacts name, Value: artifacts value
+        """
+        out = OrderedDict()
+        for name, value in sorted(artifacts.items()):
+            if _artifact_names and (name not in _artifact_names):
+                continue
+            out["{} - {}".format(name, "MediaType")] = value.get("MediaType")
+            out["{} - {}".format(name, "Value")] = value.get("Value")
+        return out
+
     def _reshape(self, trial_component):
         """Reshape trial component data to pandas columns
         Args:
@@ -533,6 +556,16 @@ def _reshape(self, trial_component):
 
         out.update(self._reshape_parameters(trial_component.get("Parameters", [])))
         out.update(self._reshape_metrics(trial_component.get("Metrics", [])))
+        out.update(
+            self._reshape_artifacts(
+                trial_component.get("InputArtifacts", []), self._input_artifact_names
+            )
+        )
+        out.update(
+            self._reshape_artifacts(
+                trial_component.get("OutputArtifacts", []), self._output_artifact_names
+            )
+        )
         return out
 
     def _fetch_dataframe(self):
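
Putting the new parameters together, a hedged sketch of filtering artifact columns in the resulting DataFrame (experiment and artifact names are hypothetical):

.. code:: python

    from sagemaker.analytics import ExperimentAnalytics

    # Hedged sketch: restrict which input/output artifacts become columns.
    analytics = ExperimentAnalytics(
        experiment_name="my-experiment",                    # hypothetical
        input_artifact_names=["SageMaker.ImageUri"],        # hypothetical names
        output_artifact_names=["SageMaker.ModelArtifact"],
    )
    df = analytics.dataframe()  # one row per trial component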
