-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Enable setting VPC config when creating/deploying models #412
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Models created from Estimator or Training Job use the same VpcConfig
…mator, Model, Session
Configure subnets and security groups to support multi-node VPC jobs and endpoints Implement more VPC integ tests covering several Estimators and Predictors
tests/integ/vpc_utils.py
Outdated
@@ -47,25 +49,43 @@ def _get_route_table_id(ec2_client, vpc_id): | |||
return desc['RouteTables'][0]['RouteTableId'] | |||
|
|||
|
|||
def create_vpc_with_name(ec2_client, name): | |||
def create_vpc_with_name(ec2_client, region, name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this called outside of the module? We should make it private as well. I think it was my fault.
tests/integ/vpc_utils.py
Outdated
|
||
subnet_id = ec2_client.create_subnet(CidrBlock='10.0.0.0/24', VpcId=vpc_id)['Subnet']['SubnetId'] | ||
# sagemaker endpoints require subnets in at least 2 different AZs for vpc mode | ||
subnet_id_a = ec2_client.create_subnet(CidrBlock='10.0.0.0/24', VpcId=vpc_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional - I would create another method for creating subnets. YMMV
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll keep this under one create_vpc_with_name
method. Most of the create_subnet
parameters are coupled to the vpc's ID and CIDR range anyway.
tests/integ/test_vpc.py
Outdated
subnets=subnet_ids, | ||
security_group_ids=[security_group_id]) | ||
|
||
with timeout(minutes=TRAINING_DEFAULT_TIMEOUT_MINUTES): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code from this line to the end of the function is duplicated multiple times in this test module. I would have a function called something like this:
def run_training_job_and_verify(estimator) with timeout(...): estimator.fit()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, though one difficulty is that each Estimator/Predictor has some slightly different input shapes and serializers that would need to be set by the generic function. I'll give it a try.
src/sagemaker/session.py
Outdated
@@ -489,6 +496,11 @@ def create_model_from_job(self, training_job_name, name=None, role=None, primary | |||
model_data_url (str): S3 location of the model data (default: None). If None, defaults to | |||
the ``ModelS3Artifacts`` of ``training_job_name``. | |||
env (dict[string,string]): Model environment variables (default: {}). | |||
vpc_config (dict[str, list[str]]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete
src/sagemaker/session.py
Outdated
@@ -671,6 +684,10 @@ def endpoint_from_job(self, job_name, initial_instance_count, instance_type, | |||
wait (bool): Whether to wait for the endpoint deployment to complete before returning (default: True). | |||
model_environment_vars (dict[str, str]): Environment variables to set on the model container | |||
(default: None). | |||
model_vpc_config (dict[str, list[str]]): Overrides VpcConfig set on the model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tom mentioned he wants us to use a marker like USE_MODEL_VPC. Are we still doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tom clarified that he was asking for a global constant like this which would be used as the default value. It makes it more clear for a user to reuse training's vpc_config
, or else override to some different value, or even None
. I will add this.
@@ -721,7 +743,8 @@ def endpoint_from_model_data(self, model_s3_location, deployment_image, initial_ | |||
role=role, | |||
primary_container=container_def(image=deployment_image, | |||
model_data_url=model_s3_location, | |||
env=model_environment_vars)) | |||
env=model_environment_vars), | |||
vpc_config=model_vpc_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indentation
@@ -163,11 +164,19 @@ def __init__(self, role, train_instance_count, train_instance_type, | |||
self.factors_init_sigma = factors_init_sigma | |||
self.factors_init_value = factors_init_value | |||
|
|||
def create_model(self): | |||
def create_model(self, vpc_config=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: add a link to the documentation about vpc config in the docstring.l
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may add a doc link in the Model/Session methods, but I don't see the need to add it to every Estimator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might be fine. This project follows the Python Google Style standards https://github.com/google/styleguide/blob/gh-pages/pyguide.md#383-functions-and-methods. Please take them in consideration.
src/sagemaker/utils.py
Outdated
@@ -88,6 +88,13 @@ def wrapper(*args, **kwargs): | |||
return wrapper | |||
|
|||
|
|||
def vpc_config_dict(subnets, security_group_ids): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should not be in utils. Maybe in session.py where container_def is defined? https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/session.py#L917
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add docstrings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function is imported and used in all Estimators too. It's how they translate the subnets
and security_group_ids
attributes into a vpc_config
passed to the Model or Session method. For this reason I put it in utils so it can be imported from a central module.
Would you rather see
from sagemaker.session import Session, vpc_config_dict
or
from sagemaker.session import Session
from sagemaker.utils import vpc_config_dict
either way, yes I will add docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also considered moving to a totally new module vpc_utils
which would include the following:
VPC_CONFIG_DEFAULT = {}
def vpc_config_dict(subnets, security_group_ids):
...
def vpc_config_unpack(vpc_config):
...
Any guiding principles for when to split off a new utility module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on https://github.com/google/styleguide/blob/gh-pages/pyguide.md#224-decision. I would prefer
from sagemaker import session
session.Session ...
session.vpc_config_dict
I am ok if you decide to move the vpc to a separate file thought.
tests/integ/test_vpc.py
Outdated
DATA_PATH = os.path.join(DATA_DIR, 'iris', 'data') | ||
|
||
|
||
@pytest.mark.skipif(PYTHON_VERSION != 'py2', reason="TensorFlow image supports only python 2.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use not instead of !=
…idate to sanitize
Codecov Report
@@ Coverage Diff @@
## master #412 +/- ##
==========================================
+ Coverage 93.29% 93.36% +0.07%
==========================================
Files 52 53 +1
Lines 3655 3724 +69
==========================================
+ Hits 3410 3477 +67
- Misses 245 247 +2
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this.
* Create SageMaker Models including optional VpcConfig Models created from Estimator or Training Job use the same VpcConfig * Adjust naming, preparing, and passing vpc_config in base classes Estimator, Model, Session * Add optional vpc_config parameter for Amazon/Frameworks models * Improve integ test coverage of VPC features Configure subnets and security groups to support multi-node VPC jobs and endpoints Implement more VPC integ tests covering several Estimators and Predictors * Update changelog * Add vpc_utils with constants and utilities for handling VpcConfig * Rename/refactor parameter vpc_config_override for models created from estimators * Consolidate VPC integ tests to a single TensorFlow multi-instance case * Tweaks to vpc_utils: make VPC_CONFIG_DEFAULT immutable and rename validate to sanitize * Add integ test for transform with vpc config * Update README with SageMaker VPC documentation and examples * Remove tests/integ/test_vpc.py
Updated: notebook name from HPO to tune
Issue #, if available: N/A
Description of changes: Previous commits (#353 , #370 ) enabled users to set VpcConfig on Estimators/Training Jobs. The same VpcConfig can be set on Models in order to protect network traffic in inference use cases (see docs).
In this PR I've added the optional parameter
vpc_config
toModel
constructors andSession
methods, and modified mostcreate_model
anddeploy
implementations to pass forward the same VPC config used in training, or take an override from the optional parametervpc_config
.To ensure the functionality works for both training and inference, I've improved the integ tests to cover VPC features for a variety of models. Note that name and resource configurations belonging to the test VPC have changed. Before running the integ tests in your account, you should delete the existing test VPC.
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.