TF 2.2.0 image is not available in eu-west-1 #1527


Closed
yegortokmakov opened this issue May 26, 2020 · 8 comments

@yegortokmakov

Describe the bug
TF 2.2.0 image is not available.
Region: eu-west-1

CalledProcessError: Command 'docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.2.0-cpu-py3' returned non-zero exit status 1.

At the same time, docker pull 763104351884.dkr.ecr.eu-west-1.amazonaws.com/tensorflow-training:2.2.0-cpu-py37 works, so that image exists.

I haven't tested other regions.

To reproduce

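from sagemaker.tensorflow import TensorFlow

# source_dir, role, local_hyperparameters, and instance_type are defined elsewhere.
estimator = TensorFlow(base_job_name='local',
                       entry_point='script.py',
                       source_dir=source_dir,
                       role=role,
                       framework_version='2.2.0',
                       py_version='py3',  # this value leads to the failed pull above
                       hyperparameters=local_hyperparameters,
                       train_instance_count=1,
                       train_instance_type=instance_type)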

System information
  • SageMaker Python SDK version: 1.60.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): tensorflow
  • Framework version: 2.2.0
  • Python version: 3.6.6
  • CPU or GPU: both
  • Custom Docker image (Y/N): N
@laurenyu
Contributor

Apologies for the confusion: you need to specify "py37" for py_version.
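
To see why the py_version value matters here, the sketch below composes the image URI in the pattern that appears in the error message above. The helper name training_image_uri and its parameters are hypothetical and only mirror the strings quoted in this issue; the SDK's real lookup logic may differ.

```python
# Hypothetical helper for illustration; this is not a SageMaker SDK function.
ACCOUNT_ID = "763104351884"  # registry account copied from the error message in this issue


def training_image_uri(region, framework_version, py_version, processor="cpu"):
    """Compose an ECR URI in the pattern seen in this issue's error message."""
    tag = f"{framework_version}-{processor}-{py_version}"
    return f"{ACCOUNT_ID}.dkr.ecr.{region}.amazonaws.com/tensorflow-training:{tag}"


# py_version='py3' produces the tag that failed to pull in this issue.
missing = training_image_uri("eu-west-1", "2.2.0", "py3")
# py_version='py37' produces the tag that the reporter confirmed exists.
available = training_image_uri("eu-west-1", "2.2.0", "py37")

print(missing)
print(available)
```

The only difference between the two URIs is the trailing py3 versus py37, which matches the fix of passing py_version='py37' to the estimator.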

@yegortokmakov
Author

Works with py37, thanks!
I had tried py37 on the previous version of the SDK, where it didn't work :)

@stoyan-dixa

I have a similar problem with framework version 2.2.0: I get an error when I set py_version to 'py37':
ValueError: invalid py_version argument: py37

I created a new notebook instance a few days ago, and its SageMaker SDK version is 1.55.3.
Would updating to a newer version solve this issue, and what is the recommended way to update?

This is what my estimator definition looks like:

tf_estimator = TensorFlow(
    source_dir='src',
    entry_point='training.py',
    role=role,
    train_instance_count=1,
    train_instance_type='ml.p3.2xlarge',
    framework_version='2.2.0',
    py_version='py37',
    script_mode=True,
    hyperparameters=hyperparameters
)

@laurenyu
Contributor

laurenyu commented Jul 6, 2020

@stoyan-dixa You can update your SDK version by running !pip install -U sagemaker in a cell and then restarting the notebook kernel.
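
As an alternative to the ! shell escape, the upgrade can be launched from Python so that pip runs under the same interpreter as the notebook kernel. This is a minimal sketch; the function names pip_upgrade_command and upgrade are made up for illustration.

```python
import subprocess
import sys


def pip_upgrade_command(package):
    """Return the pip command that upgrades `package` for the running interpreter."""
    # sys.executable points at the interpreter behind the current kernel,
    # so the package is upgraded in the environment the notebook actually uses.
    return [sys.executable, "-m", "pip", "install", "--upgrade", package]


def upgrade(package):
    """Run the upgrade; raises subprocess.CalledProcessError if pip fails."""
    subprocess.check_call(pip_upgrade_command(package))


# Show the command without running it.
print(" ".join(pip_upgrade_command("sagemaker")))
```

Calling upgrade("sagemaker") in a cell performs the install; the kernel still needs a restart afterwards so that the new version is imported.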

@stoyan-dixa

Thank you for the response! I've updated sagemaker to version 1.67.1, and I can confirm that py_version now accepts py37 and works with framework_version=2.2.0.
Is it common for new notebook instances to have outdated versions of the Python packages, and is it therefore good practice to always update them when the notebook instance is created?
This article states that AWS updates the notebook software while the instance is not in service. What do these software updates include?

@laurenyu
Contributor

laurenyu commented Jul 7, 2020

@stoyan-dixa my understanding is that Notebook Instances are updated regularly to include updates from the SageMaker Python SDK (in addition to other libraries). Since we release the SDK every Monday-Thursday (if there are changes), this does mean that you may run into an outdated SDK even with a new notebook instance. You could also look into Notebook Instance Lifecycle Configs as a way to update libraries upon startup.
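
Because a fresh notebook instance may still carry an older SDK, one option is a small guard at the top of a notebook that reports the installed version. The sketch below is only an illustration: the threshold 1.60.0 is the SDK version listed in the original report, not a documented minimum, and the helper names are hypothetical.

```python
from importlib import metadata

# Assumption: 1.60.0 is the version from the original report in this thread,
# used here as a stand-in threshold rather than an official requirement.
MIN_VERSION = "1.60.0"


def version_tuple(version):
    """Turn '1.67.1' into (1, 67, 1); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_at_least(installed, minimum=MIN_VERSION):
    """Compare two dotted version strings numerically."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_sdk_version():
    """Return the installed sagemaker version, or None if it is not installed."""
    try:
        return metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        return None


current = installed_sdk_version()
if current is None:
    print("sagemaker is not installed")
elif is_at_least(current):
    print(f"sagemaker {current} is at or above {MIN_VERSION}")
else:
    print(f"sagemaker {current} is older than {MIN_VERSION}; consider upgrading")
```

With the versions mentioned in this thread, 1.55.3 falls below the threshold and 1.67.1 passes it.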

@kafka399

In my case I had to restart my notebook, as a kernel restart didn't fix the issue completely. Thanks!

@martinRenou
Collaborator

martinRenou commented Sep 27, 2023

Closing as answered, thanks!

pintaoz-aws pushed a commit that referenced this issue Dec 4, 2024
* Add unit tests for ModelTrainer

* Flake8

* format
nargokul self-assigned this Feb 4, 2025
nargokul closed this as completed Feb 4, 2025