Skip to content

Add APIs to export transform and deploy config #497

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Nov 18, 2018

Conversation

yangaws
Copy link
Contributor

@yangaws yangaws commented Nov 16, 2018

Issue #, if available:

Description of changes:

  1. Add APIs to export transform config from a transformer or an estimator.
  2. Add APIs to export deploy config from a model or an estimator.
  3. Add related unit tests.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-io
Copy link

codecov-io commented Nov 16, 2018

Codecov Report

Merging #497 into master will decrease coverage by <.01%.
The diff coverage is 95.23%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #497      +/-   ##
==========================================
- Coverage   94.28%   94.28%   -0.01%     
==========================================
  Files          59       59              
  Lines        4551     4603      +52     
==========================================
+ Hits         4291     4340      +49     
- Misses        260      263       +3
Impacted Files Coverage Δ
src/sagemaker/transformer.py 100% <ø> (ø) ⬆️
src/sagemaker/estimator.py 90.47% <100%> (+0.16%) ⬆️
src/sagemaker/workflow/airflow.py 92.09% <93.61%> (+0.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e37ac12...d1a005e. Read the comment docs.

@yangaws yangaws requested a review from iquintero November 17, 2018 00:09
Args:
model (sagemaker.model.FrameworkModel): The framework model
instance_type (str): The EC2 instance type to deploy this Model to. For example, 'ml.p2.xlarge'.
s3_operations (dict): The dict to specify S3 operations (upload `source_dir`).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these newlines should stay

If None, server will use one worker per vCPU. Only effective when estimator is
SageMaker framework.
If None, server will use one worker per vCPU. Only effective when estimator is
SageMaker framework.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/is/is a

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated.

@@ -394,5 +392,223 @@ def model_config_from_estimator(instance_type, estimator, role=None, image=None,
elif isinstance(estimator, sagemaker.estimator.Framework):
model = estimator.create_model(model_server_workers=model_server_workers, role=role,
vpc_config_override=vpc_config_override)
else:
raise TypeError('Estimator must be one of BYO estimator, framework estimator or amazon algorithm'
'estimator.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be more helpful here to put the paths to the classes themselves, i.e. sagemaker.estimator.Estimator, etc.


if transformer.output_path is None:
transformer.output_path = 's3://{}/{}'.format(
transformer.sagemaker_session.default_bucket(), transformer._current_job_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this logic that's also in Transformer.transform? if so, I wonder if it'd be worth refactoring it into a private method that you can call on transformer here (like EstimatorBase._prepare_for_training)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep it's similar except the naming method is different. One is airflow_name_from_base and the other is name_from_base. So if I wrap them in a method, that method needs to take a method as input arg which I don't want to do for just these small amounts of codes for now. If we apply the different methods to get name first, then there's just one line left which I guess no need to wrap. I am still targeting at what I mentioned in other comments that we could reformat the codes in session/estimator/transformer/etc in a way that we could wrap a lot codes in methods and couple airflow with them.


return model_config(instance_type, model, role, image)


def transform_config(transformer, data, data_type='S3Prefix', content_type=None, compression_type=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these kinds of methods potentially reusable for Session? (this is sort of out of scope of this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I do want it to be reused in session. Now I cannot because I need to bypass all the validations/describe calls/etc to use Jinja templating.

If in session, we can construct the config first, and then apply all manipulations afterwards. That would be awesome and these two parts can be coupled. (which is really good since for now I am worried if session part got updated, for example, new entries in boto config introduced, airflow part cannot catch it)

Copy link
Contributor

@iquintero iquintero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor comments.

transform_env = model.env.copy()
if env is not None:
transform_env.update(env)
if self.latest_training_job is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just do

if self.latest_training_job:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep that looks nice. But the problem is we won't get into
if var
if var is things like [], or {}. Not just None. Hence I kind of want to explicitly say the only thing I don't want is None. Especially sometimes I do have some var = {} and need to do var.update() after. If there's an if in between, I do need to deal with the {} case.
Yep we probably should do
if var
most of times and
if var is not None
only when needed. But using the second one for now is kind of consistent to what we have in the existed classes (like estimator, model, etc). I prefer not to change for now. If later when we introduce style guide and can force it everywhere, we probably could do the change for all.

logging.warning('No finished training job found associated with this estimator. Please make sure'
'this estimator is only used for building workflow config')
model_name = self._current_job_name
transform_env = env if env is not None else {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar here,

transform_env = env or {}

vpc_config = model.vpc_config
self.sagemaker_session.create_model(model_name, role, container_def, vpc_config)
transform_env = model.env.copy()
if env is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if env:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered above.

transformer._current_job_name = job_name
else:
base_name = transformer.base_transform_job_name
transformer._current_job_name = utils.airflow_name_from_base(base_name) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transformer._current_job_name = utils.airflow_name_from_base(base_name) if base_name else transformer.model_name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered above.

'TransformResources': job_config['resource_config'],
}

if transformer.strategy is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to my other comments, do all these as

if transformer.strategy:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered above.

production_variant = sagemaker.production_variant(model.name, instance_type, initial_instance_count)
name = model.name
config_options = {'EndpointConfigName': name, 'ProductionVariants': [production_variant]}
if tags is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if tags:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered above.


# if there is s3 operations needed for model, move it to root level of config
s3_operations = model_base_config.pop('S3Operations', None)
if s3_operations is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answered above.

def test_transformer_config(sagemaker_session):
tf_transformer = transformer.Transformer(
model_name="tensorflow-model",
instance_count="{{ instance_count }}",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure I understand what this syntax is doing?

"{{ instance_count }}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Jinja templates. It will be evaluated during Airflow runtime. For example, I can put something in database (using xcom in ariflow) and do "{{ task_instance.xcom_pull(task_id='task', key='key') }}" to get the record in the table 'task' with key 'key'. Here in unit tests I just make some random Jinja templating strings for testing purpose.

assert config == expected_config


def test_transform_config_from_amazon_alg_estimator(sagemaker_session):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you include in the test name that this also tests the code paths where you don't supply the optional args?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked offline. No need to change.

if env is not None:
transform_env.update(env)
else:
logging.warning('No finished training job found associated with this estimator. Please make sure'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there unit tests for this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep the transformer from estimator tests will cover this.

@yangaws yangaws merged commit 53a43f6 into aws:master Nov 18, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants