TF Framework support for encrypting code upload #646

gnbk-aws · 2019-02-15T22:41:52Z

System Information

Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): TensorFlow Framework
Framework Version: 1.10,1.11
Python Version: 3.x
CPU or GPU: GPU
Python SDK Version: latest
Are you using a custom image: no

Describe the problem

A large data science org has a requirement for encrypted S3 buckets. BlazingText works effectively and pushes data to encrypted S3 buckets. With the same parameters as BlazingText (role, session, subnets, security groups, and output_path) plus a few others for the Framework specifically (code_location, output_kms_key), I am getting a putObject error (access denied).

Minimal repro / logs

The problem is specifically in the process _prepare_for_training -> _stage_user_code_in_s3 -> tar_and_upload_dir -> object_upload_file. It looks like the Boto3 command here is not using the KMS key, even when one is provided to the framework. Here's the specific line:

sagemaker-python-sdk/src/sagemaker/fw_utils.py

Line 180 in ddfc301

session.resource('s3').Object(bucket, key).upload_file(tar_file)

The upload_file call could pass ServerSideEncryption and SSEKMSKeyId as extra args to the S3 Transfer Manager.
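To make the suggestion concrete, here is a minimal sketch of what passing the key through could look like. boto3's upload_file accepts an ExtraArgs dict, and ServerSideEncryption and SSEKMSKeyId are among the allowed upload arguments. The helper names build_sse_extra_args and upload_tar_encrypted are hypothetical and are not part of the SDK; this is an illustration under those assumptions, not the SDK's implementation.

```python
def build_sse_extra_args(kms_key=None):
    """Return the ExtraArgs dict for an SSE-KMS upload, or None if no key is given.

    Hypothetical helper for illustration; not part of sagemaker-python-sdk.
    """
    if not kms_key:
        # No key supplied: leave the upload unchanged.
        return None
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key}


def upload_tar_encrypted(session, bucket, key, tar_file, kms_key=None):
    """Upload tar_file to s3://bucket/key, requesting SSE-KMS when kms_key is set.

    `session` is assumed to be a boto3.Session (or any object exposing the same
    resource('s3').Object(...).upload_file(...) interface).
    """
    extra_args = build_sse_extra_args(kms_key)
    session.resource("s3").Object(bucket, key).upload_file(tar_file, ExtraArgs=extra_args)
    return extra_args
```

With kms_key left as None the call behaves like the current line in fw_utils.py, so existing users would not see a change.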

Exact command to reproduce:

estimator = TensorFlow(
    entry_point='model_file.py',
    role=role,
    output_path='s3://encryptedbucket/key',
    code_location='s3://encryptedbucket/code_key',
    hyperparameters=hyperparameter_dict,
    training_steps=training_steps,
    evaluation_steps=evaluation_steps,
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    sagemaker_session=sagemaker_session,
    subnets=['subnet'],
    security_group_ids=['securitygroup'],
    output_kms_key='alias/aws/s3',
    train_volume_kms_key='alias/aws/s3',
)

estimator.fit(inputs = 's3://encryptedbucket/training_key')

Please let me know if you need any further information--thank you!

ChoiByungWook · 2019-02-16T02:04:26Z

Hello @gnbk-aws,

Thank you for the suggestion. I think this change would require us to modify at the very least the Estimator class to take in an additional parameter for the s3_kms_key: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/estimator.py#L51

I'll relay this to the team as a feature request, however I am unable to provide any ETA.

gnbk-aws · 2019-02-16T02:37:05Z

Hi @ChoiByungWook,

Thanks for the prompt response!

I noticed that the estimator has an "output_kms_key" parameter used for encrypting training results.

sagemaker-python-sdk/src/sagemaker/estimator.py

Line 80 in bf6805f

    
output_kms_key (str): Optional. KMS key ID for encrypting the training output (default: None).

I believe this is how estimators encrypt the training output they write to S3, and the same key could be used when staging the user code. If _stage_user_code_in_s3 passed the existing KMS key param on to tar_and_upload_dir, we might have a quick fix.

Edit: I forked the repo and committed these edits to be a bit more explicit. It's a draft, of course, and I haven't tested it yet but the commit is here: gnbk-aws/sagemaker-python-sdk@74d2451

mvsusp · 2019-02-26T19:47:29Z

Hi @gnbk-aws ,

Thanks for your code change. Unfortunately, for that change to work, it would be necessary to send the KMS key to the training container as well. Another possible issue is that the output bucket and the code bucket can be different, with potentially different KMS keys.
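One way to picture the bucket concern is a small check that reuses the output key only when the code bucket and the output bucket are the same. This is a sketch, not the SDK's code: kms_key_for_code_upload and the local parse_s3_url helper are hypothetical names, and it assumes the code is staged in the output bucket whenever code_location is not set.

```python
from urllib.parse import urlparse


def parse_s3_url(url):
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError("Expected an s3:// URL, got: {}".format(url))
    return parsed.netloc, parsed.path.lstrip("/")


def kms_key_for_code_upload(output_path, output_kms_key, code_location=None):
    """Pick the KMS key for the code upload (hypothetical helper).

    Reuses output_kms_key only when the code lands in the output bucket;
    otherwise returns None, because a different bucket may need a different key.
    """
    output_bucket, _ = parse_s3_url(output_path)
    if code_location is None:
        # Assumption: without code_location, code is staged in the output bucket.
        return output_kms_key
    code_bucket, _ = parse_s3_url(code_location)
    return output_kms_key if code_bucket == output_bucket else None
```

With the paths from the repro above, both URLs point at encryptedbucket, so the output key would be reused for the code upload.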

We have your feature request in the roadmap.

Thanks for using SageMaker!

gnbk-aws · 2019-03-04T19:19:28Z

Hi @mvsusp ,

That makes sense. Since the buckets are distinct, you'd need a new framework param for the kms key of the upload_code bucket. And I see how the training containers need that key in order to pull the model.py file from S3, so you'd need to send it to the training containers with the other framework params like here:

sagemaker-python-sdk/src/sagemaker/estimator.py

Line 852 in 8b33a30

self._hyperparameters[DIR_PARAM_NAME] = code_dir

I'm not sure what would need to be modified within the training container from there. Perhaps I'll look around for that later.

Thanks for following through here :)

mvsusp · 2019-03-14T17:50:18Z

Hi @gnbk-aws,

The code changes to support server-side encryption have already been pushed to the master branch. We expect to release a new version of the SDK today or tomorrow.

gnbk-aws · 2019-03-14T17:58:31Z

This is great news--thanks @mvsusp !

mvsusp · 2019-03-14T23:36:38Z

Hi @gnbk-aws,

We just released sagemaker 1.18.5 including Server Side Encryption support.

Thanks for using SageMaker.


ChoiByungWook added the feature request label Feb 16, 2019

mvsusp mentioned this issue Mar 11, 2019 in pull request #693 (merged): Pass kms id as parameter for uploading code with Server side encryption

mvsusp closed this as completed Mar 14, 2019
