Skip to content

Enable setting VPC config when creating/deploying models #412

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Oct 5, 2018

Conversation

mmeidl
Copy link
Contributor

@mmeidl mmeidl commented Sep 29, 2018

Issue #, if available: N/A

Description of changes: Previous commits (#353 , #370 ) enabled users to set VpcConfig on Estimators/Training Jobs. The same VpcConfig can be set on Models in order to protect network traffic in inference use cases (see docs).

In this PR I've added the optional parameter vpc_config to Model constructors and Session methods, and modified most create_model and deploy implementations to pass forward the same VPC config used in training, or take an override from the optional parameter vpc_config.

To ensure the functionality works for both training and inference, I've improved the integ tests to cover VPC features for a variety of models. Note that name and resource configurations belonging to the test VPC have changed. Before running the integ tests in your account, you should delete the existing test VPC.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Michael Meidl added 5 commits September 25, 2018 01:58
Models created from Estimator or Training Job use the same VpcConfig
Configure subnets and security groups to support multi-node VPC jobs and endpoints
Implement more VPC integ tests covering several Estimators and Predictors
@@ -47,25 +49,43 @@ def _get_route_table_id(ec2_client, vpc_id):
return desc['RouteTables'][0]['RouteTableId']


def create_vpc_with_name(ec2_client, name):
def create_vpc_with_name(ec2_client, region, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this called outside of the module? We should make it private as well. I think it was my fault.


subnet_id = ec2_client.create_subnet(CidrBlock='10.0.0.0/24', VpcId=vpc_id)['Subnet']['SubnetId']
# sagemaker endpoints require subnets in at least 2 different AZs for vpc mode
subnet_id_a = ec2_client.create_subnet(CidrBlock='10.0.0.0/24', VpcId=vpc_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional - I would create another method for creating subnets. YMMV

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll keep this under one create_vpc_with_name method. Most of the create_subnet parameters are coupled to the vpc's ID and CIDR range anyway.

subnets=subnet_ids,
security_group_ids=[security_group_id])

with timeout(minutes=TRAINING_DEFAULT_TIMEOUT_MINUTES):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code from this line to the end of the function is duplicated multiple times in this test module. I would have a function called something like this:

def run_training_job_and_verify(estimator) with timeout(...): estimator.fit()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, though one difficulty is that each Estimator/Predictor has some slightly different input shapes and serializers that would need to be set by the generic function. I'll give it a try.

@@ -489,6 +496,11 @@ def create_model_from_job(self, training_job_name, name=None, role=None, primary
model_data_url (str): S3 location of the model data (default: None). If None, defaults to
the ``ModelS3Artifacts`` of ``training_job_name``.
env (dict[string,string]): Model environment variables (default: {}).
vpc_config (dict[str, list[str]]):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete

@@ -671,6 +684,10 @@ def endpoint_from_job(self, job_name, initial_instance_count, instance_type,
wait (bool): Whether to wait for the endpoint deployment to complete before returning (default: True).
model_environment_vars (dict[str, str]): Environment variables to set on the model container
(default: None).
model_vpc_config (dict[str, list[str]]): Overrides VpcConfig set on the model.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tom mentioned he wants us to use a marker like USE_MODEL_VPC. Are we still doing that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tom clarified that he was asking for a global constant like this which would be used as the default value. It makes it more clear for a user to reuse training's vpc_config, or else override to some different value, or even None. I will add this.

@@ -721,7 +743,8 @@ def endpoint_from_model_data(self, model_s3_location, deployment_image, initial_
role=role,
primary_container=container_def(image=deployment_image,
model_data_url=model_s3_location,
env=model_environment_vars))
env=model_environment_vars),
vpc_config=model_vpc_config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

@@ -163,11 +164,19 @@ def __init__(self, role, train_instance_count, train_instance_type,
self.factors_init_sigma = factors_init_sigma
self.factors_init_value = factors_init_value

def create_model(self):
def create_model(self, vpc_config=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add a link to the documentation about vpc config in the docstring.l

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may add a doc link in the Model/Session methods, but I don't see the need to add it to every Estimator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That might be fine. This project follows the Python Google Style standards https://github.com/google/styleguide/blob/gh-pages/pyguide.md#383-functions-and-methods. Please take them in consideration.

@@ -88,6 +88,13 @@ def wrapper(*args, **kwargs):
return wrapper


def vpc_config_dict(subnets, security_group_ids):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should not be in utils. Maybe in session.py where container_def is defined? https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/session.py#L917

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, add docstrings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is imported and used in all Estimators too. It's how they translate the subnets and security_group_ids attributes into a vpc_config passed to the Model or Session method. For this reason I put it in utils so it can be imported from a central module.

Would you rather see

from sagemaker.session import Session, vpc_config_dict

or

from sagemaker.session import Session
from sagemaker.utils import vpc_config_dict

either way, yes I will add docstring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also considered moving to a totally new module vpc_utils which would include the following:

VPC_CONFIG_DEFAULT = {}


def vpc_config_dict(subnets, security_group_ids):
    ...


def vpc_config_unpack(vpc_config):
    ...

Any guiding principles for when to split off a new utility module?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on https://github.com/google/styleguide/blob/gh-pages/pyguide.md#224-decision. I would prefer

from sagemaker import session

session.Session ...
session.vpc_config_dict

I am ok if you decide to move the vpc to a separate file thought.

DATA_PATH = os.path.join(DATA_DIR, 'iris', 'data')


@pytest.mark.skipif(PYTHON_VERSION != 'py2', reason="TensorFlow image supports only python 2.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use not instead of !=

@codecov-io
Copy link

codecov-io commented Oct 4, 2018

Codecov Report

Merging #412 into master will increase coverage by 0.07%.
The diff coverage is 98.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #412      +/-   ##
==========================================
+ Coverage   93.29%   93.36%   +0.07%     
==========================================
  Files          52       53       +1     
  Lines        3655     3724      +69     
==========================================
+ Hits         3410     3477      +67     
- Misses        245      247       +2
Impacted Files Coverage Δ
src/sagemaker/model.py 95.58% <100%> (+0.06%) ⬆️
src/sagemaker/amazon/randomcutforest.py 100% <100%> (ø) ⬆️
src/sagemaker/amazon/ntm.py 100% <100%> (ø) ⬆️
src/sagemaker/tensorflow/estimator.py 93.43% <100%> (+0.04%) ⬆️
src/sagemaker/estimator.py 89.51% <100%> (+0.45%) ⬆️
src/sagemaker/amazon/lda.py 97.56% <100%> (+0.06%) ⬆️
src/sagemaker/session.py 89.43% <100%> (+0.32%) ⬆️
src/sagemaker/amazon/knn.py 100% <100%> (ø) ⬆️
src/sagemaker/pytorch/estimator.py 100% <100%> (ø) ⬆️
src/sagemaker/amazon/factorization_machines.py 100% <100%> (ø) ⬆️
... and 8 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4d6a4a4...aedd5cd. Read the comment docs.

mvsusp
mvsusp previously approved these changes Oct 4, 2018
Copy link
Contributor

@mvsusp mvsusp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

mvsusp
mvsusp previously approved these changes Oct 4, 2018
@mvsusp mvsusp merged commit dd44d25 into aws:master Oct 5, 2018
pdasamzn pushed a commit to pdasamzn/sagemaker-python-sdk that referenced this pull request Nov 1, 2018
* Create SageMaker Models including optional VpcConfig

Models created from Estimator or Training Job use the same VpcConfig

* Adjust naming, preparing, and passing vpc_config in base classes Estimator, Model, Session

* Add optional vpc_config parameter for Amazon/Frameworks models

* Improve integ test coverage of VPC features

Configure subnets and security groups to support multi-node VPC jobs and endpoints
Implement more VPC integ tests covering several Estimators and Predictors

* Update changelog

* Add vpc_utils with constants and utilities for handling VpcConfig

* Rename/refactor parameter vpc_config_override for models created from estimators

* Consolidate VPC integ tests to a single TensorFlow multi-instance case

* Tweaks to vpc_utils: make VPC_CONFIG_DEFAULT immutable and rename validate to sanitize

* Add integ test for transform with vpc config

* Update README with SageMaker VPC documentation and examples

* Remove tests/integ/test_vpc.py
apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this pull request Nov 15, 2018
Updated: notebook name from HPO to tune
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants