Commit 5d4537a

SergTogul, Sergey Togulev, and tejaschumbalkar authored
[pytorch][build]Upgrade python version in PT 1.6 containers (#860)
* Upgrade python version in PT 1.6 containers
* Fix path for inference 1.6 cu 101 image
* Installing python from conda-forge
* Updated pyyaml version. Removed 39611 from ignored issues
* Installing ruamel-yaml to fix pip check's conda 4.9.2 requires ruamel-yaml, which is not installed.
* Reformated test_safety_check.py
* Enabling all tests
* Skipping smprofiler test for pt 1.6
* Using DLG test 0.4.x since 0.5 may work only for pt 1.7+
* Using DLG test 0.4.x in eks since 0.5 may work only for pt 1.7+
* Add conditions for dgl test based on Pytorch version
* Fixed copy-paste error
* Fixed couple typos
* A couple updates after review
* Disabling safety check in PRs
* Rolling back buildspec.yml changes
* Rolling back config changes

Co-authored-by: Sergey Togulev <[email protected]>
Co-authored-by: Tejas Chumbalkar <[email protected]>
1 parent 6449495 commit 5d4537a

11 files changed

+74
-43
lines changed

pytorch/inference/docker/1.6.0/py3/Dockerfile.cpu

Lines changed: 7 additions & 4 deletions
@@ -5,7 +5,7 @@ LABEL dlc_major_version="1"
 LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
 LABEL com.amazonaws.sagemaker.capabilities.multi-models=true

-ARG PYTHON_VERSION=3.6.10
+ARG PYTHON_VERSION=3.6.13
 ARG OPEN_MPI_VERSION=4.0.1
 ARG TS_VERSION="0.2.1=py36_0"
 ARG PT_INFERENCE_URL=https://aws-pytorch-binaries.s3-us-west-2.amazonaws.com/r1.6.0_inference/20200727-223446/b0251e7e070e57f34ee08ac59ab4710081b41918/cpu/torch-1.6.0-cp36-cp36m-manylinux1_x86_64.whl
@@ -52,8 +52,9 @@ RUN curl -L -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-lat
  && ~/miniconda.sh -b -p /opt/conda \
  && rm ~/miniconda.sh \
  && /opt/conda/bin/conda update conda \
- && /opt/conda/bin/conda install -y \
+ && /opt/conda/bin/conda install -c conda-forge \
     python=$PYTHON_VERSION \
+ && /opt/conda/bin/conda install -y \
     cython==0.29.12 \
     ipython==7.7.0 \
     mkl-include==2019.4 \
@@ -123,8 +124,10 @@ RUN pip install --no-cache-dir "sagemaker-pytorch-inference>=2"

 RUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.6.0/license.txt -o /license.txt

-RUN conda install -y -c conda-forge pyyaml==5.3.1
-RUN pip install pillow==7.2.0 "awscli<2"
+RUN conda install -y -c conda-forge "pyyaml>5.4,<5.5"
+RUN pip install pillow==7.2.0 \
+    "awscli<2" \
+    ruamel-yaml

 EXPOSE 8080 8081
 ENTRYPOINT ["python", "/usr/local/bin/dockerd-entrypoint.py"]

pytorch/inference/docker/1.6.0/py3/cu101/Dockerfile.gpu

Lines changed: 5 additions & 3 deletions
@@ -6,7 +6,7 @@ LABEL dlc_major_version="1"
 LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true

 # Add arguments to achieve the version, python and url
-ARG PYTHON_VERSION=3.6.10
+ARG PYTHON_VERSION=3.6.13
 ARG OPEN_MPI_VERSION=4.0.1
 ARG TS_VERSION="0.2.1=py36_0"
 ARG PT_INFERENCE_URL=https://aws-pytorch-binaries.s3-us-west-2.amazonaws.com/r1.6.0_inference/20200727-223446/b0251e7e070e57f34ee08ac59ab4710081b41918/gpu/torch-1.6.0-cp36-cp36m-manylinux1_x86_64.whl
@@ -79,8 +79,9 @@ RUN curl -L -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-lat
  && ~/miniconda.sh -b -p /opt/conda \
  && rm ~/miniconda.sh \
  && /opt/conda/bin/conda update conda \
- && /opt/conda/bin/conda install -y \
+ && /opt/conda/bin/conda install -c conda-forge \
     python=$PYTHON_VERSION \
+ && /opt/conda/bin/conda install -y \
     cython==0.29.12 \
     ipython==7.7.0 \
     mkl-include==2019.4 \
@@ -106,6 +107,7 @@ RUN conda install -c \
  && ln -s /opt/conda/bin/pip /usr/local/bin/pip3 \
  && pip install packaging==20.4 \
     enum-compat==0.0.3 \
+    ruamel-yaml \
  && conda install -y -c pytorch torchserve=$TS_VERSION \
  && conda install -y -c pytorch torch-model-archiver=$TS_VERSION

@@ -133,7 +135,7 @@ RUN pip install --no-cache-dir "sagemaker-pytorch-inference>=2"

 RUN curl https://aws-dlc-licenses.s3.amazonaws.com/pytorch-1.6.0/license.txt -o /license.txt

-RUN conda install -y -c conda-forge pyyaml==5.3.1
+RUN conda install -y -c conda-forge "pyyaml>5.4,<5.5"
 RUN pip install pillow==7.2.0 "awscli<2"

 EXPOSE 8080 8081

pytorch/training/docker/1.6.0/py3/Dockerfile.cpu

Lines changed: 4 additions & 2 deletions
@@ -4,7 +4,7 @@ LABEL maintainer="Amazon AI"
 LABEL dlc_major_version="1"

 # Add arguments to achieve the version, python and url
-ARG PYTHON_VERSION=3.6.10
+ARG PYTHON_VERSION=3.6.13
 ARG OPEN_MPI_VERSION=4.0.1

 # The smdebug pipeline relies for following format to perform string replace and trigger DLC pipeline for validating
@@ -73,8 +73,9 @@ RUN curl -L -o ~/miniconda.sh https://repo.continuum.io/miniconda/Miniconda3-lat
  && chmod +x ~/miniconda.sh \
  && ~/miniconda.sh -b -p /opt/conda \
  && rm ~/miniconda.sh \
- && /opt/conda/bin/conda install -y -c anaconda \
+ && /opt/conda/bin/conda install -c conda-forge \
     python=$PYTHON_VERSION \
+ && /opt/conda/bin/conda install -y -c anaconda \
     numpy \
     ipython \
     mkl \
@@ -102,6 +103,7 @@ RUN pip install --upgrade pip --trusted-host pypi.org --trusted-host files.pytho
     fastai==1.0.61 \
     scipy==1.2.2 \
     click \
+    ruamel-yaml \
     "cryptography>3.2" \
     smdebug==${SMDEBUG_VERSION} \
     smclarify \

pytorch/training/docker/1.6.0/py3/cu101/Dockerfile.gpu

Lines changed: 4 additions & 2 deletions
@@ -4,7 +4,7 @@ FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu16.04
 LABEL maintainer="Amazon AI"
 LABEL dlc_major_version="1"

-ARG PYTHON_VERSION=3.6.10
+ARG PYTHON_VERSION=3.6.13
 ARG OPEN_MPI_VERSION=4.0.1
 ARG CUBLAS_VERSION=10.2.1.243-1_amd64
 ARG OPEN_MPI_PATH=/home/.openmpi
@@ -102,8 +102,9 @@ RUN ompi_info --parsable --all | grep mpi_built_with_cuda_support:value \
  && chmod +x ~/miniconda.sh \
  && ~/miniconda.sh -b -p /opt/conda \
  && rm ~/miniconda.sh \
- && /opt/conda/bin/conda install -y -c anaconda \
+ && /opt/conda/bin/conda install -c conda-forge \
     python=$PYTHON_VERSION \
+ && /opt/conda/bin/conda install -y -c anaconda \
     numpy \
     ipython \
     mkl \
@@ -149,6 +150,7 @@ RUN pip install \
     Pillow \
     scipy \
     click \
+    ruamel-yaml \
     mpi4py==3.0.3 \
     cmake==3.18.2.post1 \
  && pip install --no-cache-dir -U ${PT_TRAINING_URL} \

pytorch/training/docker/1.6.0/py3/cu110/Dockerfile.gpu

Lines changed: 4 additions & 2 deletions
@@ -4,7 +4,7 @@ FROM nvidia/cuda:11.0-cudnn8-devel-ubuntu18.04
 LABEL maintainer="Amazon AI"
 LABEL dlc_major_version="3"

-ARG PYTHON_VERSION=3.6.10
+ARG PYTHON_VERSION=3.6.13
 ARG OPEN_MPI_VERSION=4.0.1
 ARG CUBLAS_VERSION=11.2.0.252-1_amd64
 ARG OPEN_MPI_PATH=/home/.openmpi
@@ -103,8 +103,9 @@ RUN ompi_info --parsable --all | grep mpi_built_with_cuda_support:value \
  && chmod +x ~/miniconda.sh \
  && ~/miniconda.sh -b -p ${CONDA_PREFIX} \
  && rm ~/miniconda.sh \
- && ${CONDA_PREFIX}/bin/conda install -y -c anaconda \
+ && ${CONDA_PREFIX}/bin/conda install -c conda-forge \
     python=$PYTHON_VERSION \
+ && ${CONDA_PREFIX}/bin/conda install -y -c anaconda \
     numpy \
     ipython \
     mkl \
@@ -161,6 +162,7 @@ RUN pip install --no-cache-dir -U \
     scipy \
     pybind11 \
     click \
+    ruamel-yaml \
     mpi4py==3.0.3 \
     cmake==3.18.2.post1 \
     torchnet \

test/dlc_tests/conftest.py

Lines changed: 12 additions & 3 deletions
@@ -17,10 +17,10 @@

 from test import test_utils
 from test.test_utils import (
-    is_benchmark_dev_context, get_framework_and_version_from_tag, get_job_type_from_image, is_tf_version,
-    is_below_tf_version, is_below_mxnet_version,
+    is_benchmark_dev_context, get_framework_and_version_from_tag, get_job_type_from_image, is_tf_version,
+    is_below_tf_version, is_below_mxnet_version, is_below_pytorch_version,
     DEFAULT_REGION, P3DN_REGION, UBUNTU_18_BASE_DLAMI_US_EAST_1, UBUNTU_18_BASE_DLAMI_US_WEST_2,
-    PT_GPU_PY3_BENCHMARK_IMAGENET_AMI_US_EAST_1, KEYS_TO_DESTROY_FILE
+    PT_GPU_PY3_BENCHMARK_IMAGENET_AMI_US_EAST_1, KEYS_TO_DESTROY_FILE
 )
 from test.test_utils.test_reporting import TestReportGenerator

@@ -327,6 +327,11 @@ def mx18_and_above_only():
     pass


+@pytest.fixture(scope="session")
+def pt17_and_above_only():
+    pass
+
+
 def framework_version_within_limit(metafunc_obj, image):
     """
     Test all pytest fixtures for TensorFlow version limits, and return True if all requirements are satisfied
@@ -347,6 +352,10 @@ def framework_version_within_limit(metafunc_obj, image):
         mx18_requirement_failed = "mx18_and_above_only" in metafunc_obj.fixturenames and is_below_mxnet_version("1.8", image)
         if mx18_requirement_failed :
             return False
+    if image_framework_name == "pytorch" :
+        pt17_requirement_failed = "pt17_and_above_only" in metafunc_obj.fixturenames and is_below_pytorch_version("1.7", image)
+        if pt17_requirement_failed :
+            return False
     return True


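The gating pattern in this file (an empty fixture acts as a tag, and a collection-time check inspects the requested fixture names) can be sketched in isolation as follows. This is a simplified illustration, not the repository's code: `should_run`, `MINIMUM_VERSIONS`, and `parse_release` are hypothetical names, and versions are compared as plain numeric tuples rather than through the helpers in `test.test_utils`.

```python
# Hypothetical table: marker fixture name -> (framework it applies to, minimum version).
MINIMUM_VERSIONS = {
    "pt17_and_above_only": ("pytorch", "1.7"),
    "mx18_and_above_only": ("mxnet", "1.8"),
}


def parse_release(version):
    """Convert '1.6' or '1.6.0' into a three-part integer tuple such as (1, 6, 0)."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '1.7' and '1.7.0' compare as equal
    return tuple(parts)


def should_run(fixturenames, framework, version):
    """Return False when a marker fixture applies to this framework and the version is too old."""
    for name in fixturenames:
        if name not in MINIMUM_VERSIONS:
            continue  # ordinary fixtures such as 'gpu_only' do not gate on version
        required_framework, minimum = MINIMUM_VERSIONS[name]
        if framework == required_framework and parse_release(version) < parse_release(minimum):
            return False
    return True
```

A marker only affects images of its own framework, so a test that requests `pt17_and_above_only` still runs against, say, a TensorFlow image.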

test/dlc_tests/container_tests/bin/dgl_tests/testPyTorchDGL

Lines changed: 8 additions & 3 deletions
@@ -4,11 +4,16 @@ HOME_DIR=/test
 BIN_DIR=${HOME_DIR}/bin
 LOG_DIR=${HOME_DIR}/logs

+OLD_DGL_VERSION=0.4.x
+NEW_DGL_VERSION=0.5.x
+
 DGL_RELEASE_TAG=$(python -c "import dgl; dgl_versions = dgl.__version__.split('.'); print(f'{dgl_versions[0]}.{dgl_versions[1]}.x')")

-# hard coded test files from 0.5.x branch. Change it back once 0.6.x is released
-# git clone -b ${DGL_RELEASE_TAG} https://github.com/dmlc/dgl.git ${HOME_DIR}/artifacts/dgl
-git clone -b 0.5.x https://github.com/dmlc/dgl.git ${HOME_DIR}/artifacts/dgl
+if [[ ${DGL_RELEASE_TAG} == ${OLD_DGL_VERSION} ]]; then
+    git clone -b ${OLD_DGL_VERSION} https://github.com/dmlc/dgl.git ${HOME_DIR}/artifacts/dgl
+else
+    git clone -b ${NEW_DGL_VERSION} https://github.com/dmlc/dgl.git ${HOME_DIR}/artifacts/dgl
+fi
 ${BIN_DIR}/dgl_tests/testDGLHelper python pytorch || exit 1

 exit 0
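The branch selection in this script can be restated in Python for clarity. This is only a sketch: `dgl_release_tag` and `select_dgl_branch` are hypothetical names, and the version string is passed in as an argument instead of being read from `dgl.__version__`, so the snippet runs without DGL installed.

```python
def dgl_release_tag(dgl_version):
    """Map a DGL version such as '0.4.3post2' to a branch-style tag such as '0.4.x'."""
    parts = dgl_version.split(".")
    return f"{parts[0]}.{parts[1]}.x"


def select_dgl_branch(dgl_version, old_branch="0.4.x", new_branch="0.5.x"):
    """Pick the old branch only on an exact tag match; every other version gets the new branch."""
    if dgl_release_tag(dgl_version) == old_branch:
        return old_branch
    return new_branch
```

Note that, as in the shell script, any installed version that does not map to `0.4.x` falls through to `0.5.x`, including releases newer than 0.5.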

test/dlc_tests/ec2/test_smdebug.py

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ def test_smdebug_gpu(training, ec2_connection, region, ec2_instance_type, gpu_on
 @pytest.mark.model("mnist")
 @pytest.mark.parametrize("ec2_instance_type", SMDEBUG_EC2_GPU_INSTANCE_TYPE, indirect=True)
 @pytest.mark.flaky(reruns=0)
-def test_smprofiler_gpu(training, ec2_connection, region, ec2_instance_type, gpu_only, py3_only, tf23_and_above_only):
+def test_smprofiler_gpu(training, ec2_connection, region, ec2_instance_type, gpu_only, py3_only, tf23_and_above_only, pt17_and_above_only):
     # Running the profiler tests for pytorch and tensorflow2 frameworks only.
     # This code needs to be modified past reInvent 2020
     framework = get_framework_from_image_uri(training)

test/dlc_tests/eks/pytorch/training/test_eks_pytorch_training.py

Lines changed: 5 additions & 2 deletions
@@ -10,7 +10,7 @@
 from retrying import retry

 import test.test_utils.eks as eks_utils
-from test.test_utils import is_pr_context, SKIP_PR_REASON
+from test.test_utils import is_pr_context, SKIP_PR_REASON, is_below_pytorch_version
 from test.test_utils import get_framework_and_version_from_tag, get_cuda_version_from_tag
 from packaging.version import Version

@@ -89,7 +89,10 @@ def test_eks_pytorch_dgl_single_node_training(pytorch_training, py3_only):
     yaml_path = os.path.join(os.sep, "tmp", f"pytorch_single_node_training_dgl_{rand_int}.yaml")
     pod_name = f"pytorch-single-node-training-dgl-{rand_int}"

-    dgl_branch = "0.5.x"
+    if is_below_pytorch_version("1.7", pytorch_training):
+        dgl_branch = "0.4.x"
+    else:
+        dgl_branch = "0.5.x"

     args = (
         f"git clone -b {dgl_branch} https://github.com/dmlc/dgl.git && "

test/dlc_tests/sanity/test_safety_check.py

Lines changed: 11 additions & 21 deletions
@@ -28,25 +28,20 @@
             # 38449, 38450, 38451, 38452: for shipping pillow<=6.2.2 - the last available version for py2
             # 35015: for shipping pycrypto<=2.6.1 - the last available version for py2
             "py2": ['38449', '38450', '38451', '38452', '35015'],
-            # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-            "py3": ['39611']
+            "py3": []
         },
         "inference": {
             # for shipping pillow<=6.2.2 - the last available version for py2
             "py2": ['38449', '38450', '38451', '38452'],
-            # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-            "py3": ['39611']
+            "py3": []
         },
         "inference-eia": {
             # for shipping pillow<=6.2.2 - the last available version for py2
             "py2": ['38449', '38450', '38451', '38452'],
-            # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-            "py3": ['39611']
+            "py3": []
         },
         "inference-neuron": {
             "py3": [
-                # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-                '39611',
                 # 39409, 39408, 39407, 39406: TF 1.15.5 is on par with TF 2.0.4, 2.1.3, 2.2.2, 2.3.2 in security patches
                 '39409', '39408', '39407', '39406',
             ],
@@ -58,24 +53,20 @@
             "py2": ['36810',
                     # for shipping pillow<=6.2.2 - the last available version for py2
                     '38449', '38450', '38451', '38452'],
-            # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-            "py3": ['39611']
+            "py3": []
         },
         "inference": {
             # for shipping pillow<=6.2.2 - the last available version for py2
             "py2": ['38449', '38450', '38451', '38452'],
-            # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-            "py3": ['39611']
+            "py3": []
         },
         "training": {
             # for shipping pillow<=6.2.2 - the last available version for py2
             "py2": ['38449', '38450', '38451', '38452'],
-            # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-            "py3": ['39611']
+            "py3": []
         },
-        "inference-neuron":{
-            # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-            "py3": ['39611']
+        "inference-neuron": {
+            "py3": []
         }
     },
     "pytorch": {
@@ -92,9 +83,8 @@
         "inference-eia": {
             "py3": []
         },
-        "inference-neuron":{
-            # for shipping pyyaml v5.3.1 - blocked on upgrading to v5.4.1 due to dependency on awscli
-            "py3": ['39611']
+        "inference-neuron": {
+            "py3": []
         }
     }
 }
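For readers unfamiliar with the nested ignore table edited above (framework, then job type, then Python version, then a list of vulnerability IDs), the following sketch shows one way such a table could be queried. The table contents, the `example_framework` key, and the `ignored_ids` helper are hypothetical and trimmed for illustration; the diff does not show how the real test consumes the dictionary.

```python
# Hypothetical, trimmed-down table in the same nested shape as the one in the diff.
IGNORE_SAFETY_IDS = {
    "example_framework": {
        "training": {"py2": ["38449", "38450"], "py3": []},
    },
    "pytorch": {
        "inference-neuron": {"py3": []},
    },
}


def ignored_ids(framework, job_type, python_version, table=IGNORE_SAFETY_IDS):
    """Return the IDs to ignore for one image flavor, or an empty list if none are registered."""
    return table.get(framework, {}).get(job_type, {}).get(python_version, [])
```

With `39611` removed from every `py3` list, a lookup for a py3 image returns an empty list, so a finding with that ID would no longer be suppressed.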
@@ -135,7 +125,7 @@ def _get_latest_package_version(package):
 @pytest.mark.model("N/A")
 @pytest.mark.skipif(not is_dlc_cicd_context(), reason="Skipping test because it is not running in dlc cicd infra")
 @pytest.mark.skipif(not is_mainline_context(),
-                    reason="Skipping the test to decrease the number of calls to the Safety Check DB. "
+                    reason="Skipping the test to decrease the number of calls to the Safety Check DB. "
                            "Test will be executed in the 'mainline' pipeline only")
 def test_safety(image):
     """

test/test_utils/__init__.py

Lines changed: 13 additions & 0 deletions
@@ -112,6 +112,19 @@ def is_below_mxnet_version(version_upper_bound, image_uri):
     return image_framework_name == "mxnet" and image_framework_version in required_version_specifier_set


+def is_below_pytorch_version(version_upper_bound, image_uri):
+    """
+    Validate that image_uri has framework version strictly less than version_upper_bound
+
+    :param version_upper_bound: str Framework version that image_uri is required to be below
+    :param image_uri: str ECR Image URI for the image to be validated
+    :return: bool True if image_uri has framework version less than version_upper_bound, else False
+    """
+    image_framework_name, image_framework_version = get_framework_and_version_from_tag(image_uri)
+    required_version_specifier_set = SpecifierSet(f"<{version_upper_bound}")
+    return image_framework_name == "pytorch" and image_framework_version in required_version_specifier_set
+
+
 def get_repository_local_path():
     git_repo_path = os.getcwd().split("/test/")[0]
     return git_repo_path
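The strict upper-bound check added in `is_below_pytorch_version` can be approximated without third-party packages as shown below. This is a sketch under stated assumptions: it swaps `SpecifierSet` for a plain numeric tuple comparison, so it ignores pre-release and post-release suffixes that the real helper would handle, and it takes the framework name and version directly instead of parsing them from an image URI. The names `parse_release` and `is_below_version` are hypothetical.

```python
def parse_release(version):
    """Turn a dotted version such as '1.6.0' into a tuple of ints, e.g. (1, 6, 0)."""
    return tuple(int(part) for part in version.split("."))


def is_below_version(framework, version, expected_framework, version_upper_bound):
    """Return True only when the framework matches and version is strictly below the bound."""
    if framework != expected_framework:
        return False
    current = parse_release(version)
    bound = parse_release(version_upper_bound)
    # Pad the shorter tuple with zeros so that '1.7' and '1.7.0' compare as equal.
    width = max(len(current), len(bound))
    current += (0,) * (width - len(current))
    bound += (0,) * (width - len(bound))
    return current < bound
```

Because the comparison is strict, an image tagged `1.7.0` is not considered below `1.7`, which is why the DGL and profiler conditions in this commit treat PyTorch 1.7 as the first supported release.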
