Commit 3ad90d1

fix: CI (#234)
1 parent 88ca48a commit 3ad90d1

15 files changed, 56 additions and 124 deletions

CONTRIBUTING.md

Lines changed: 3 additions & 3 deletions
@@ -11,7 +11,7 @@ information to effectively respond to your bug report or contribution.
 
 We welcome you to use the GitHub issue tracker to report bugs or suggest features.
 
-When filing an issue, please check [existing open](https://github.com/aws-samples/sagemaker-pytorch-containers/issues), or [recently closed](https://github.com/aws-samples/sagemaker-pytorch-containers/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already
+When filing an issue, please check [existing open](https://github.com/aws/sagemaker-pytorch-training-toolkit/issues), or [recently closed](https://github.com/aws/sagemaker-pytorch-training-toolkit/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already
 reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
 
 * A reproducible test case or series of steps
@@ -41,7 +41,7 @@ GitHub provides additional document on [forking a repository](https://help.githu
 
 
 ## Finding contributions to work on
-Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels ((enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/aws-samples/sagemaker-pytorch-containers/labels/help%20wanted) issues is a great place to start.
+Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels ((enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/aws/sagemaker-pytorch-training-toolkit/labels/help%20wanted) issues is a great place to start.
 
 
 ## Code of Conduct
@@ -56,6 +56,6 @@ If you discover a potential security issue in this project we ask that you notif
 
 ## Licensing
 
-See the [LICENSE](https://github.com/aws-samples/sagemaker-pytorch-containers/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
+See the [LICENSE](https://github.com/aws/sagemaker-pytorch-training-toolkit/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
 
 We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.

buildspec-gputests.yml

Lines changed: 30 additions & 36 deletions
@@ -2,8 +2,8 @@ version: 0.2
 
 env:
 variables:
-FRAMEWORK_VERSION: '1.6.0'
-GPU_INSTANCE_TYPE: 'ml.p2.8xlarge'
+FRAMEWORK_VERSION: '1.11.0'
+GPU_INSTANCE_TYPE: 'ml.p3.16xlarge'
 ECR_REPO: 'sagemaker-test'
 GITHUB_REPO: 'sagemaker-pytorch-container'
 DLC_ACCOUNT: '763104351884'
@@ -26,46 +26,40 @@ phases:
 - pip3 install -U -e .[test]
 
 # define tags
-- GENERIC_TAG="$FRAMEWORK_VERSION-pytorch-$BUILD_ID"
 - DLC_GPU_TAG="$FRAMEWORK_VERSION-dlc-gpu-$BUILD_ID"
-
-# launch remote GPU instance
-- prefix='ml.'
-- instance_type=${GPU_INSTANCE_TYPE#"$prefix"}
-- create-key-pair
-- launch-ec2-instance --instance-type $instance_type --ami-name dlami-ubuntu-latest
+- echo 'Skipping DLC creation as it is taken care in DLC pipelines'
+# # launch remote GPU instance
+# - prefix='ml.'
+# - instance_type=${GPU_INSTANCE_TYPE#"$prefix"}
+# - create-key-pair
+# - launch-ec2-instance --instance-type $instance_type --ami-name dlami-ubuntu-latest
 
 # build DLC GPU image because the base DLC image is too big and takes too long to build as part of the test
-- python3 setup.py sdist
-- build_dir="test/container/$FRAMEWORK_VERSION"
-- $(aws ecr get-login --registry-ids $DLC_ACCOUNT --no-include-email --region $AWS_DEFAULT_REGION)
-- build_cmd="docker build -f "$build_dir/Dockerfile.dlc.gpu" -t $PREPROD_IMAGE:$DLC_GPU_TAG --build-arg region=$AWS_DEFAULT_REGION ."
-- execute-command-if-has-matching-changes "$build_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
-# push DLC GPU image to ECR
-- $(aws ecr get-login --registry-ids $ACCOUNT --no-include-email --region $AWS_DEFAULT_REGION)
-- push_cmd="docker push $PREPROD_IMAGE:$DLC_GPU_TAG"
-- execute-command-if-has-matching-changes "$push_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
+# - python3 setup.py sdist
+# - build_dir="test/container/$FRAMEWORK_VERSION"
+# - $(aws ecr get-login --registry-ids $DLC_ACCOUNT --no-include-email --region $AWS_DEFAULT_REGION)
+# - build_cmd="docker build -f "$build_dir/Dockerfile.dlc.gpu" -t $PREPROD_IMAGE:$DLC_GPU_TAG --build-arg region=$AWS_DEFAULT_REGION ."
+# - execute-command-if-has-matching-changes "$build_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
+# # push DLC GPU image to ECR
+# - $(aws ecr get-login --registry-ids $ACCOUNT --no-include-email --region $AWS_DEFAULT_REGION)
+# - push_cmd="docker push $PREPROD_IMAGE:$DLC_GPU_TAG"
+# - execute-command-if-has-matching-changes "$push_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
 
-# run GPU local integration tests
-- printf "$SETUP_CMDS" > $SETUP_FILE
-- generic_cmd="pytest test/integration/local --build-image --push-image --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type pytorch --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor gpu --tag $GENERIC_TAG"
-- test_cmd="remote-test --github-repo $GITHUB_REPO --test-cmd \"$generic_cmd\" --setup-file $SETUP_FILE --pr-number \"$PR_NUM\""
-- execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
-- dlc_cmd="pytest test/integration/local --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type dlc.gpu --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor gpu --tag $DLC_GPU_TAG"
-- test_cmd="remote-test --github-repo $GITHUB_REPO --test-cmd \"$dlc_cmd\" --setup-file $SETUP_FILE --pr-number \"$PR_NUM\" --skip-setup"
-- execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
+# # run GPU local integration tests
+# - printf "$SETUP_CMDS" > $SETUP_FILE
+# - dlc_cmd="pytest test/integration/local --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type dlc.gpu --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor gpu --tag $DLC_GPU_TAG"
+# - test_cmd="remote-test --github-repo $GITHUB_REPO --test-cmd \"$dlc_cmd\" --setup-file $SETUP_FILE --pr-number \"$PR_NUM\" --skip-setup"
+# - execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
 
-# run GPU sagemaker integration tests
-- test_cmd="pytest -n 10 test/integration/sagemaker --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type pytorch --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor gpu --instance-type $GPU_INSTANCE_TYPE --tag $GENERIC_TAG"
-- execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
-- test_cmd="pytest -n 10 test/integration/sagemaker --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type dlc.gpu --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor gpu --instance-type $GPU_INSTANCE_TYPE --tag $DLC_GPU_TAG"
-- execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
+# # run GPU sagemaker integration tests
+# - test_cmd="pytest -n 10 test/integration/sagemaker --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type dlc.gpu --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor gpu --instance-type $GPU_INSTANCE_TYPE --tag $DLC_GPU_TAG"
+# - execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
 
 finally:
+- echo 'Done'
 # shut down remote GPU instance
-- cleanup-gpu-instances
-- cleanup-key-pairs
+# - cleanup-gpu-instances
+# - cleanup-key-pairs
 
-# remove ECR image
-- aws ecr batch-delete-image --repository-name $ECR_REPO --region $AWS_DEFAULT_REGION --image-ids imageTag=$GENERIC_TAG
-- aws ecr batch-delete-image --repository-name $ECR_REPO --region $AWS_DEFAULT_REGION --image-ids imageTag=$DLC_GPU_TAG
+# # remove ECR image
+# - aws ecr batch-delete-image --repository-name $ECR_REPO --region $AWS_DEFAULT_REGION --image-ids imageTag=$DLC_GPU_TAG
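
For reviewers unfamiliar with the shell idioms in this buildspec, the following is a minimal Python sketch of two steps: how `DLC_GPU_TAG` is composed, and what the now commented-out `${GPU_INSTANCE_TYPE#"$prefix"}` expansion produced. The helper names `strip_prefix` and `dlc_gpu_tag` and the build ID `build-123` are hypothetical and exist only for illustration; they are not part of the repository.

```python
def strip_prefix(value: str, prefix: str) -> str:
    """Mimic the shell expansion ${value#"$prefix"}: drop a literal leading prefix if present."""
    return value[len(prefix):] if value.startswith(prefix) else value


def dlc_gpu_tag(framework_version: str, build_id: str) -> str:
    """Compose the tag the same way the buildspec defines DLC_GPU_TAG."""
    return f"{framework_version}-dlc-gpu-{build_id}"


# 'build-123' is a made-up placeholder; CodeBuild supplies the real BUILD_ID.
instance_type = strip_prefix("ml.p3.16xlarge", "ml.")
tag = dlc_gpu_tag("1.11.0", "build-123")

print(instance_type)  # p3.16xlarge
print(tag)            # 1.11.0-dlc-gpu-build-123
```

The prefix is removed only when it is actually present, so a value that is already a plain EC2 instance type passes through unchanged.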

buildspec-release.yml

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ phases:
 # run unit tests
 - AWS_ACCESS_KEY_ID= AWS_SECRET_ACCESS_KEY= AWS_SESSION_TOKEN=
 AWS_CONTAINER_CREDENTIALS_RELATIVE_URI= AWS_DEFAULT_REGION=
-tox -e py27,py36,py37 -- test/unit
+tox -e py38 -- test/unit
 
 # run local integ tests
 #- $(aws ecr get-login --no-include-email --region us-west-2)

buildspec-unittests.yml

Lines changed: 1 addition & 1 deletion
@@ -13,4 +13,4 @@ phases:
 - tox -e flake8,twine
 
 # run unit tests
-- tox -e py27,py36,py37 test/unit
+- tox -e py38 test/unit

buildspec.yml

Lines changed: 12 additions & 16 deletions
@@ -2,7 +2,7 @@ version: 0.2
 
 env:
 variables:
-FRAMEWORK_VERSION: '1.6.0'
+FRAMEWORK_VERSION: '1.11.0'
 CPU_INSTANCE_TYPE: 'ml.c4.xlarge'
 ECR_REPO: 'sagemaker-test'
 
@@ -21,22 +21,18 @@ phases:
 - pip3 install -U -e .[test]
 
 # define tags
-- GENERIC_TAG="$FRAMEWORK_VERSION-pytorch-$BUILD_ID"
 - DLC_CPU_TAG="$FRAMEWORK_VERSION-dlc-cpu-$BUILD_ID"
+- echo 'Skipping DLC creation as it is taken care in DLC pipelines'
+# # run local CPU integration tests (build and push the image to ECR repo)
+# - test_cmd="pytest test/integration/local --build-image --push-image --dockerfile-type dlc.cpu --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor cpu --tag $DLC_CPU_TAG"
+# # execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
+# - "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
 
-# run local CPU integration tests (build and push the image to ECR repo)
-- test_cmd="pytest test/integration/local --build-image --push-image --dockerfile-type pytorch --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor cpu --tag $GENERIC_TAG"
-- execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
-- test_cmd="pytest test/integration/local --build-image --push-image --dockerfile-type dlc.cpu --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor cpu --tag $DLC_CPU_TAG"
-- execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
-
-# run CPU sagemaker integration tests
-- test_cmd="pytest -n 10 test/integration/sagemaker --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type pytorch --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor cpu --instance-type $CPU_INSTANCE_TYPE --tag $GENERIC_TAG"
-- execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
-- test_cmd="pytest -n 10 test/integration/sagemaker --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type dlc.cpu --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor cpu --instance-type $CPU_INSTANCE_TYPE --tag $DLC_CPU_TAG"
-- execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
+# # run CPU sagemaker integration tests
+# - test_cmd="pytest -n 10 test/integration/sagemaker --region $AWS_DEFAULT_REGION --docker-base-name $ECR_REPO --dockerfile-type dlc.cpu --aws-id $ACCOUNT --framework-version $FRAMEWORK_VERSION --processor cpu --instance-type $CPU_INSTANCE_TYPE --tag $DLC_CPU_TAG"
+# - execute-command-if-has-matching-changes "$test_cmd" "test/" "src/*.py" "setup.py" "setup.cfg" "buildspec.yml" "lib/*"
 
 finally:
-# remove ECR image
-- aws ecr batch-delete-image --repository-name $ECR_REPO --region $AWS_DEFAULT_REGION --image-ids imageTag=$GENERIC_TAG
-- aws ecr batch-delete-image --repository-name $ECR_REPO --region $AWS_DEFAULT_REGION --image-ids imageTag=$DLC_CPU_TAG
+- echo 'Done'
+# # remove ECR image
+# - aws ecr batch-delete-image --repository-name $ECR_REPO --region $AWS_DEFAULT_REGION --image-ids imageTag=$DLC_CPU_TAG

setup.py

Lines changed: 3 additions & 3 deletions
@@ -48,12 +48,12 @@ def read(fname):
 "Natural Language :: English",
 "License :: OSI Approved :: Apache Software License",
 "Programming Language :: Python",
-'Programming Language :: Python :: 2.7',
-'Programming Language :: Python :: 3.6',
 'Programming Language :: Python :: 3.7',
+'Programming Language :: Python :: 3.8',
+'Programming Language :: Python :: 3.9',
 ],
 
-install_requires=['retrying', 'sagemaker-training>=3.7.0', 'six>=1.12.0'],
+install_requires=['retrying', 'sagemaker-training>=4.2.0', 'six>=1.12.0'],
 extras_require={
 'test': test_dependencies
 },

test/conftest.py

Lines changed: 2 additions & 2 deletions
@@ -46,13 +46,13 @@ def pytest_addoption(parser):
 parser.addoption('--build-image', '-B', action='store_true')
 parser.addoption('--push-image', '-P', action='store_true')
 parser.addoption('--dockerfile-type', '-T', choices=['dlc.cpu', 'dlc.gpu', 'pytorch'],
-default=None)
+default='pytorch')
 parser.addoption('--dockerfile', '-D', default=None)
 parser.addoption('--aws-id', default=None)
 parser.addoption('--instance-type')
 parser.addoption('--docker-base-name', default='sagemaker-pytorch-training')
 parser.addoption('--region', default='us-west-2')
-parser.addoption('--framework-version', default="1.4.0")
+parser.addoption('--framework-version', default="1.10.0")
 parser.addoption('--py-version', choices=['2', '3'], default=str(sys.version_info.major))
 parser.addoption('--processor', choices=['gpu', 'cpu'], default='cpu')
 # If not specified, will default to {framework-version}-{processor}-py{py-version}
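
To make the effect of the two changed defaults concrete, here is a minimal sketch that mirrors three of the options with the standard-library `argparse` module. This is an assumption-laden stand-in: the real code registers options through pytest's `parser.addoption`, and `build_parser` is a hypothetical helper that does not exist in the repository. Note that the new default framework version (`1.10.0`) differs from the `FRAMEWORK_VERSION` of `1.11.0` set in the buildspecs, so local runs that rely on the default would presumably need to pass `--framework-version` explicitly to match CI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Stand-in for pytest_addoption, using the post-commit defaults."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--dockerfile-type', '-T',
                        choices=['dlc.cpu', 'dlc.gpu', 'pytorch'],
                        default='pytorch')  # was None before this commit
    parser.add_argument('--framework-version', default='1.10.0')  # was "1.4.0"
    parser.add_argument('--processor', choices=['gpu', 'cpu'], default='cpu')
    return parser


# No flags given: the new defaults apply.
defaults = build_parser().parse_args([])
print(defaults.dockerfile_type, defaults.framework_version, defaults.processor)

# Explicit flags, as the buildspec test commands pass them.
explicit = build_parser().parse_args(
    ['--dockerfile-type', 'dlc.gpu', '--framework-version', '1.11.0', '--processor', 'gpu']
)
print(explicit.dockerfile_type, explicit.framework_version, explicit.processor)
```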

test/container/1.6.0/Dockerfile.dlc.gpu renamed to test/container/1.11.0/Dockerfile.dlc.gpu

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 ARG region
-from 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-training:1.6.0-gpu-py36-cu110-ubuntu18.04
+FROM 763104351884.dkr.ecr.$region.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
 
 COPY dist/sagemaker_pytorch_training-*.tar.gz /sagemaker_pytorch_training.tar.gz
 RUN pip install --upgrade --no-cache-dir /sagemaker_pytorch_training.tar.gz && \

test/container/1.6.0/Dockerfile.pytorch renamed to test/container/1.11.0/Dockerfile.pytorch

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from pytorch/pytorch:1.6.0-cuda10.1-cudnn7-runtime
+FROM pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime
 
 RUN apt-get update && apt-get install -y --no-install-recommends \
 jq \

test/container/1.4.0/Dockerfile.dlc.cpu

Lines changed: 0 additions & 10 deletions
This file was deleted.

test/container/1.4.0/Dockerfile.dlc.gpu

Lines changed: 0 additions & 28 deletions
This file was deleted.

test/container/1.4.0/Dockerfile.pytorch

Lines changed: 0 additions & 20 deletions
This file was deleted.

test/integration/sagemaker/test_horovod.py

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@
 @pytest.mark.skip_generic
 @pytest.mark.parametrize(
 "instances, processes, train_instance_type",
-[(1, 8, "ml.p2.8xlarge"), (2, 4, "ml.p3.8xlarge")],
+[(2, 4, "ml.p3.8xlarge")],
 )
 def test_horovod_simple(
 instances,

tox.ini

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 # and then run "tox" from this directory.
 
 [tox]
-envlist = flake8,twine,py27,py36,py37
+envlist = flake8,twine,py38
 skip_missing_interpreters = False
 
 [flake8]
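
As a quick check on what the new `envlist` selects, the sketch below parses the `[tox]` section shown in this hunk with the standard-library `configparser`. It assumes a plain comma-separated `envlist` like the one here and does not handle tox's multi-line or generative forms; the `TOX_INI` string is a copy of the hunk, not the full file.

```python
import configparser

# Copy of the [tox] section as it stands after this commit.
TOX_INI = """
[tox]
envlist = flake8,twine,py38
skip_missing_interpreters = False
"""

parser = configparser.ConfigParser()
parser.read_string(TOX_INI)

# Split the comma-separated list into individual environment names.
envs = [name.strip() for name in parser["tox"]["envlist"].split(",")]
python_envs = [name for name in envs if name.startswith("py")]
skip_missing = parser["tox"].getboolean("skip_missing_interpreters")

print(envs)          # ['flake8', 'twine', 'py38']
print(python_envs)   # ['py38']
print(skip_missing)  # False
```

Because `skip_missing_interpreters` stays `False`, a build host without Python 3.8 fails the run instead of skipping the `py38` environment.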
