Reinvent 2024 early #4946
Merged
* Base model trainer * flake8 * add testing notebook * add param validation & set defaults * Implement simple train method
* feature: support script mode with local train.sh * Stop tracking train.sh and add it to .gitignore * update message * make dir if not exist * fix docs * fix: docstyle * Address comments * fix hyperparams * Revert pydantic custom error * pylint
* Image Spec refactoring and updates * Unit tests and update function for Image Spec * Fix hugging face test * Fix Tests
* Add unit tests for ModelTrainer * Flake8 * format
* Add testing notebook * format * use smaller data * remove large dataset * update * pylint * flake8 * ignore docstyle in directories with test * format * format
* Add environment variables scripts * format * fix comment * add docstrings * fix comment
* local snapshot * Update pip list command * Remove function calls * Address comments * Address comments
* Support intelligent parameters * fix codestyle
* General image builder * General image builder * Fix codestyle * Fix codestyle * Move location * Add warnings * Add integ tests * Fix integ test * Fix integ test * Fix region error * Add region
* Latest Container Image * Test Fixes * Parameterized tests and some logic updates * Test fixes * Move to Image URI * Fixes for unit test * Fixes for unit test * Fix codestyle error checks
…1560) * add pre-processing and post-processing logic to inference_spec * fix format * make accept_type and content_type optional * remove accept_type and content_type from pre/post processing * correct typo
… and deployment configs (#1572)
* add in-process mode for DJL server * fix format * add inference_spec as a member of DJL * add the validations for model server * fix typo * fix test assertion * add unit-testing * have a common server for inprocess mode * fix failing tests * add support to torchserve * fix tests to include torchserve servers * use custom inference_spec code instead of HF pipelines * fix tests for app.py * fix unit test failure * fix format * use schema_builder for serialization and deserialization * remove task field * remove unused import
* Base model trainer (#1521) * Base model trainer * flake8 * add testing notebook * add param validation & set defaults * Implement simple train method * feature: support script mode with local train.sh (#1523) * feature: support script mode with local train.sh * Stop tracking train.sh and add it to .gitignore * update message * make dir if not exist * fix docs * fix: docstyle * Address comments * fix hyperparams * Revert pydantic custom error * pylint * Image Spec refactoring and updates (#1525) * Image Spec refactoring and updates * Unit tests and update function for Image Spec * Fix hugging face test * Fix Tests * Add unit tests for ModelTrainer (#1527) * Add unit tests for ModelTrainer * Flake8 * format * Add example notebook (#1528) * Add testing notebook * format * use smaller data * remove large dataset * update * pylint * flake8 * ignore docstyle in directories with test * format * format * Add enviornment variable bootstrapping script (#1530) * Add enviornment variables scripts * format * fix comment * add docstrings * fix comment * feature: add utility function to capture local snapshot (#1524) * local snapshot * Update pip list command * Remove function calls * Address comments * Address comments * Change to make Model Trainer return a Model Object * Fix * Cleanup * Support intelligent parameters (#1540) * Support intelligent parameters * fix codestyle * Revert Image Spec (#1541) * Cleanup ModelTrainer (#1542) * General image builder (#1546) * General image builder * General image builder * Fix codestyle * Fix codestyle * Move location * Add warnings * Add integ tests * Fix integ test * Fix integ test * Fix region error * Add region * Latest Container Image (#1545) * Latest Container Image * Test Fixes * Parameterized tests and some logic updates * Test fixes * Move to Image URI * Fixes for unit test * Fixes for unit test * Fix codestyle error checks * Cleanup ModelTrainer code (#1552) * Updates * feat: add pre-processing and post-processing logic to 
inference_spec (#1560) * add pre-processing and post-processing logic to inference_spec * fix format * make accept_type and content_type optional * remove accept_type and content_type from pre/post processing * correct typo * Add Distributed Training Support Model Trainer (#1536) * Add path to set Additional Settings in ModelTrainer (#1555) * Updates * Mask Sensitive Env Logs in Container (#1568) * Cleanup PR * Codestyle fixes * Update logic to use model parameter instead of model_path * Fixes * Fixes * Tests * Codestyle Fixes * Codestyle Fixes * Codestyle Fixes * Codestyle Fixes --------- Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: pintaoz-aws <[email protected]> Co-authored-by: Pravali Uppugunduri <[email protected]>
Co-authored-by: Gokul Anantha Narayanan <[email protected]>
* Base model trainer (#1521) * Base model trainer * flake8 * add testing notebook * add param validation & set defaults * Implement simple train method * feature: support script mode with local train.sh (#1523) * feature: support script mode with local train.sh * Stop tracking train.sh and add it to .gitignore * update message * make dir if not exist * fix docs * fix: docstyle * Address comments * fix hyperparams * Revert pydantic custom error * pylint * Image Spec refactoring and updates (#1525) * Image Spec refactoring and updates * Unit tests and update function for Image Spec * Fix hugging face test * Fix Tests * Add unit tests for ModelTrainer (#1527) * Add unit tests for ModelTrainer * Flake8 * format * Add example notebook (#1528) * Add testing notebook * format * use smaller data * remove large dataset * update * pylint * flake8 * ignore docstyle in directories with test * format * format * Add enviornment variable bootstrapping script (#1530) * Add enviornment variables scripts * format * fix comment * add docstrings * fix comment * feature: add utility function to capture local snapshot (#1524) * local snapshot * Update pip list command * Remove function calls * Address comments * Address comments * Support intelligent parameters (#1540) * Support intelligent parameters * fix codestyle * Revert Image Spec (#1541) * Cleanup ModelTrainer (#1542) * General image builder (#1546) * General image builder * General image builder * Fix codestyle * Fix codestyle * Move location * Add warnings * Add integ tests * Fix integ test * Fix integ test * Fix region error * Add region * Latest Container Image (#1545) * Latest Container Image * Test Fixes * Parameterized tests and some logic updates * Test fixes * Move to Image URI * Fixes for unit test * Fixes for unit test * Fix codestyle error checks * Cleanup ModelTrainer code (#1552) * feat: add pre-processing and post-processing logic to inference_spec (#1560) * add pre-processing and post-processing logic to 
inference_spec * fix format * make accept_type and content_type optional * remove accept_type and content_type from pre/post processing * correct typo * Add Distributed Training Support Model Trainer (#1536) * Add path to set Additional Settings in ModelTrainer (#1555) * Support building image from Dockerfile * Fix test * Fix test * Rename functions --------- Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: Gokul Anantha Narayanan <[email protected]> Co-authored-by: Pravali Uppugunduri <[email protected]>
* Base model trainer (#1521) * Base model trainer * flake8 * add testing notebook * add param validation & set defaults * Implement simple train method * feature: support script mode with local train.sh (#1523) * feature: support script mode with local train.sh * Stop tracking train.sh and add it to .gitignore * update message * make dir if not exist * fix docs * fix: docstyle * Address comments * fix hyperparams * Revert pydantic custom error * pylint * Image Spec refactoring and updates (#1525) * Image Spec refactoring and updates * Unit tests and update function for Image Spec * Fix hugging face test * Fix Tests * Add unit tests for ModelTrainer (#1527) * Add unit tests for ModelTrainer * Flake8 * format * Add example notebook (#1528) * Add testing notebook * format * use smaller data * remove large dataset * update * pylint * flake8 * ignore docstyle in directories with test * format * format * Add enviornment variable bootstrapping script (#1530) * Add enviornment variables scripts * format * fix comment * add docstrings * fix comment * feature: add utility function to capture local snapshot (#1524) * local snapshot * Update pip list command * Remove function calls * Address comments * Address comments * Support intelligent parameters (#1540) * Support intelligent parameters * fix codestyle * Revert Image Spec (#1541) * Cleanup ModelTrainer (#1542) * Initial Prototype * General image builder (#1546) * General image builder * General image builder * Fix codestyle * Fix codestyle * Move location * Add warnings * Add integ tests * Fix integ test * Fix integ test * Fix region error * Add region * Unified deploying in ModelBuilder * Latest Container Image (#1545) * Latest Container Image * Test Fixes * Parameterized tests and some logic updates * Test fixes * Move to Image URI * Fixes for unit test * Fixes for unit test * Fix codestyle error checks * Address PR comments * Address Codestyle errors * Cleanup ModelTrainer code (#1552) * Black format * Codestyle 
changes * Codestyle changes * from __future__ import absolute_import * DocString formatting * Black formatting * Address PR comments * Notebook changes and fixes * feat: add pre-processing and post-processing logic to inference_spec (#1560) * add pre-processing and post-processing logic to inference_spec * fix format * make accept_type and content_type optional * remove accept_type and content_type from pre/post processing * correct typo * Add Distributed Training Support Model Trainer (#1536) * Add path to set Additional Settings in ModelTrainer (#1555) * Checkstyle Fixes * Address PR comments * Fixes * Merge Fixes * Codestyle Fixes * Codestyle Fixes * Codestyle Fixes * Codestyle Fixes * Codestyle Fixes * Update Docstring --------- Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: pintaoz-aws <[email protected]> Co-authored-by: Pravali Uppugunduri <[email protected]>
* Parameterized intelligent defaults tests * Parameterized intelligent defaults tests * Parameterized intelligent defaults tests * Tests for all Model Builder deployment modes * Fix * CodeStyle Fixes * CodeStyle Fixes * Add Deepdiff dependency * Add Deepdiff dependency * Add Codestyle fix
Co-authored-by: Edward Sun <[email protected]>
* change: fix the file uploading signature verification error **Description** The URL contains a character (+) that was not escaped properly. Fixed by removing the conditional logic for escaping the character. **Testing** 1. Changed UT passed 2. Tested in sample notebook * **Description** Changed from x-mlapp-sm-app-server-arn to x-sagemaker-partner-app-server-arn. Also made some small formatting adjustments to the signing context information. **Testing Done** UT passed --------- Co-authored-by: Edward Sun <[email protected]>
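The commit message above attributes the signature verification error to a `+` in the URL that was not escaped. As a minimal sketch of why that matters (this is not the SDK's code; `escape_for_signing` is a hypothetical helper and the sample string is made up), form-style decoding reads a bare `+` as a space, so the signer and the verifier can end up hashing different strings unless the character is always percent-encoded:

```python
from urllib.parse import quote, unquote_plus


def escape_for_signing(value: str) -> str:
    """Hypothetical helper: percent-encode everything except unreserved characters."""
    # Always escape; no special-casing of '+' or any other character.
    return quote(value, safe="-_.~")


raw = "a+b/c d"  # made-up value containing '+', '/' and a space
escaped = escape_for_signing(raw)

# Decoding the unescaped value turns '+' into a space, changing the signed content.
decoded_raw = unquote_plus(raw)

# Decoding the escaped value round-trips to the original string.
decoded_escaped = unquote_plus(escaped)

print(escaped)
print(decoded_raw)
print(decoded_escaped)
```

Escaping unconditionally keeps the string the client signs identical to the string the server reconstructs, which is the property a signature check depends on.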
* v0 estimator for launching kandinsky training * code cleanup * option to override git repos for kandinsky for testing purposes * update dependencies * update comment * formatting fixes * style fixes * code cleanup * Add warning messages for ignored arguments * cleanup, address comments * fix * clone launcher repo only if necessary * add a cleanup method to call after fit * fix docstring * fix warning * cleanup update * fix * code style fix * rename cleanup method for clarity * missed change * move cleanup to when object is destroyed * add unit tests * formatting fix * removing tests which don't work as recipe repos are private * removing tests which don't work as recipe repos are private * resolve comments * resolve comments
* fix to work with launcher recipes * fix suffix for temp file * fix path and error message * fix for recipes from launcher * resolve recipes correctly * fix imports * reformat message to avoid code-doc test issue * code style fix * code style fix * code style fix * code style fix * code style fix * code style fix * code style fix * code style fix * code style fix * doc formatting * check if resolver exists before registering
* basic checks and unit test for recipes * More testing for recipes. Move recipe overrides to top before accessing any recipe fields. * check that we use customer provided image uri if it is set * reformat * test fixes * update git urls for recipes * revert to ssh git urls for recipes
Co-authored-by: Tian <[email protected]>
* Feature: Support GPU training recipes with Sagemaker Python SDK (#1516) * v0 estimator for launching kandinksy training * code cleanup * option to over-ride git repos for kandinsky for testing purposes * update dependencies * update comment * formatting fixes * style fixes * code cleanup * Add warning messages for ingored arguments * cleanup, address comments * fix * clone launcher repo only if necessary * add a cleanup method to call after fit * fix docstring * fix warning * cleanup update * fix * code style fix * rename cleanup method for clarity * missed change * move cleanup to when object is destroyed * add unit tests * formatting fix * removing tests which don't work as recipe repos are private * removing tests which don't work as recipe repos are private * resolve comments * resolve comments * Feature: Support Neuron training recipes. (#1526) * Feature: Resolve recipes correctly before launching (#1529) * fix to work with launcher recipes * fix suffix for temp file * fix path and error message * fix for recipes from launcher * resolve recipes correctly * fix imports * reformat message to avoid code-doc test issue * code style fix * code style fix * code style fix * code style fix * code style fix * code style fix * code style fix * code style fix * code style fix * doc formatting * check if resolver exists before registering * Feature: Add unit tests for recipes and minor bug fixes. (#1532) * basic checks and unit test for recipes * More testing for recipes. Move recipe overrides to top before accessing any recipe fields. * check that we use customer provided image uri if it is set * reformat * test fixes * update git urls for recipes * revert to ssh git urls for recipes * Feature: Move image uris and git repos for training recipes to json (#1547) * Update MANIFEST.in so that wheel builds correctly (#1563) * Remove default values for fields in recipe_overrides and fix recipe path. 
(#1566) * add optional source dir for recipes, copy training code and requirements to source dir * diff names for recipe file and local script option * format and add unit test * make entry point script and recipe file temp files that can be gced * formatting and fix * test fix * test fixes * format fix * break function up because it is too long * fixes * fix * fix * remove references to launcher and adapter dir as we copy out everything needed into source dir * reformat * copy all directory contents for trainium as there is more than one source file * fix * fix * remove debugging message * Change default source directory to current, add option to specify source dir (#1593) * update to public uris for hyperpod recipe repos and smp image * fixes * remove debug copies * change caps for env vars * skip some tests for now * format * neuron json for retrieving images * update training_recipes.json * add unit test * reformat * fix long line * add source dir check when using training recipe * adding more regions * reformat * doc update * doc update * doc update * doc update * fix capitalization issues * fix capitalization issues * doc check issue
…d recipe code unavailable" (#1642)
Issue #, if available:
Description of changes:
Testing done:
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
unique_name_from_base to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.