From 419e23ece675d3b11bfa7c2b27b39dbb84b15b97 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Thu, 15 Oct 2020 18:54:34 +0200 Subject: [PATCH 1/6] Design document for new Docker images structure - Explains a little what's the current situations - Mention problems we already had - Propose a new and different naming for images - Ideas to support custom Docker images --- docs/development/design/build-images.rst | 216 +++++++++++++++++++++++ 1 file changed, 216 insertions(+) create mode 100644 docs/development/design/build-images.rst diff --git a/docs/development/design/build-images.rst b/docs/development/design/build-images.rst new file mode 100644 index 00000000000..98bec67b888 --- /dev/null +++ b/docs/development/design/build-images.rst @@ -0,0 +1,216 @@ +Build Images +============ + +This document describes how Read the Docs uses the Docker Build Images and how they are named. +Besides, it proposes a new way to create and name them to allow +sharing as many image layers as possible to support more customization while keeping the stability. + + +Introduction +------------ + +We use Docker images to build user's documentation. +Each time a build is triggered, one of our VMs picks the task +and go through different steps: + +#. run some application code to spin up a Docker image into a container +#. execute git inside the container to clone the repository +#. analyze and parse files from the repository *outside* the container +#. create the environment and install docs' dependencies inside the container +#. execute build commnands inside the container +#. push the output generated by builds commands to the storage + + +*All* those steps depends on specific commands versions: ``git``, ``python``, ``virtualenv``, ``conda``, etc. +Currently, we are pinning only a few of them in our Docker images and that have caused issues +when re-deploying these images with bugfixes: **the images are not reproducible in time**. + +.. 
note:: + + The repoducibility of the images will be fixed once + https://github.com/readthedocs/readthedocs-docker-images/pull/145 and + https://github.com/readthedocs/readthedocs-docker-images/pull/146 + get merged. + +To allow users to pin the image we ended up exposing three images: ``stable``, ``latest`` and ``testing``. +With that naming, we were able to bugfix issues and add more features +on each image without asking the users to change the image selected in their config file. + +Then, when a completely different image appeared and after testing ``testing`` image enough, +we discarded ``stable``, old ``latest`` became the new ``stable`` and old ``testing`` became the new ``latest``. +This produced issues to people pinning their images to any of these names because after this change, +*we changed all the images for all the users* and many build issues arose! + + +Goals +----- + +* release a completely new Docker image without forcing users to change their pinned image +* allow users to stick with an image "forever" (~years) +* use a ``base`` image with the dependencies that don't change frequently (OS and base requirements) +* reduce size on builder VM disks by sharing Docker image layers +* deprecate ``stable``, ``latest`` and ``testing`` +* allow use custom images for particular users/customers by sharing most layers +* create a small ``nopdf`` image version without LaTeX dependencies for local development + + +New build image structure +------------------------- + +.. 
Taken from https://github.com/readthedocs/readthedocs-docker-images/blob/master/Dockerfile + +* ``ubuntu20-base`` + * labels + * environment variables + * system dependencies + * install requirements + * user requirements + * plantuml, imagemagick, rsgv-convert, swig + * sphinx-js dependencies + * rust + * UID and GID + +* ``ubuntu20-pdf`` (from ``ubuntu20-base``) + * PDF/LaTeX dependencies + +* ``ubuntu20`` (from ``ubuntu20-pdf``) + * all Python versions (2, 3.6, 3.7, 3.8, 3.9) + * conda + * future extra user requirements + * labels + +We will also build a ``nopdf`` version to allow quick testing in local development: + +* ``ubuntu20-nopdf`` (from ``ubuntu20-base``) + * same as ``ubuntu20`` but based on ``ubuntu20-base`` instead + +.. note:: + + I don't think it's useful to have ``ubuntu20-py37`` exposed to users, + since the Python version is selected by using the config file's ``python.version`` keyword, + we only update patch versions and we don't remove them (unless together with OS changes). + +.. Build all these images with Docker + docker build -t readthedocs/build:ubuntu20-base -f Dockerfile.base . + docker build -t readthedocs/build:ubuntu20-nopdf -f Dockerfile.nopdf . + docker build -t readthedocs/build:ubuntu20-pdf -f Dockerfile.pdf . + docker build -t readthedocs/build:ubuntu20 -f Dockerfile . + + Check the shared space between images + docker system df --verbose | grep -E 'SHARED SIZE|readthedocs' + + +Custom images +------------- + +There are some dependencies that are not easy to update and keep compatibility with all the users at the same time. +Upgrading ``nodejs`` may make lot of old projects expecting the older version to start failing all their builds. +On the other hand, sticking with an old version avoid users requiring a newer version to build their documentation. +To handle this case and others, we have been thinking on supporting custom Docker images. 
+ +It's not clear to me how it would be the implementation of this, but I see different paths to discuss and explore: + +#. Allow a ``build.dockerfile`` config pointing to a ``Dockerfile`` + * ``FROM readthedocs/build:ubuntu20`` is required to be a valid image (to share layers) + * the image is build each time a build is triggered consuming build time +#. Create a branch per custom image in ``readthedocs-docker-images`` repository + * use ``ubuntu20`` as base image and add the custom extra requirements + * build the image using our current process (Docker Hub) + * add the custom image to our ``-ops`` repository + * re-build builders to pull down the new custom image + * set the project to use this custom image, eg. ``readthedocs/build:`` + + +Updating versions over time +--------------------------- + +How do we add/upgrade a Python version? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Python patch versions can be upgraded and backported to all the images without problems. +There is only needed to rebuild ``ubuntu20`` and most of the layers will remain shared with ``-base`` and ``-pdf``. + +In case we need to *add* a new Python version, the situation is similar. +We can add the new version by using ``pyenv`` and rebuilding the ``ubuntu20`` image. + + +How do we upgrade system versions? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +We usually don't upgrade these dependencies unless we upgrade the Ubuntu version. +So, they will be only upgraded when we go from Ubuntu 18.04 LTS to Ubuntu 20.04 LTS for example. + +Examples of these versions are: + +* doxygen +* git +* subversion +* pandoc +* nodejs / npm +* swig +* rust + + +How do we add an extra requirement? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +If a user asks for a new requirement (eg. azure CLI, ``az`` command) it should go into the +"user requirements" section in the ``ubuntu20-base`` image. +However, that will force us to rebuild all the images. 
+ +We could use the section named as "future user extra requirements" for this, +and it will force us to only rebuild the ``ubuntu20`` image. + +Both approaches will require to rebuild all the custom docker images from our users/customers +that are based on the ``ubuntu20`` image. + + +How do we remove an old Python version? +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +At some point an old version of Python will be deprecated (eg. 3.4) and will be removed from our Docker images. +These versions should only be removed when the OS in the ``base`` is upgraded (eg. from ``ubuntu20`` to ``ubuntu22``). + + +Deprecation plan +---------------- + +It seems we have ~50Gb free on builders disks. +Considering that the new images will be sized approximately (built locally as test): + +* ``base``: ~2.5Gb +* ``nopdf``: ~5.5Gb +* ``pdf``: ~1.5Gb + +which is about ~10Gb in total, we will still have space to support multiple custom images. + +We could keep ``stable``, ``latest`` and ``testing`` for some time without worry too much. +New projects shouldn't be able to select these images and they will be forced to use ``ubuntu20`` +or any other custom image. + +We may want to keep the three latest Ubuntu LTS releases available in production. +At the moment of writing this they are: + +* Ubuntu 16.04 LTS (we are not using it anymore) +* Ubuntu 18.04 LTS (our ``stable``, ``latest`` and ``testing`` images) +* Ubuntu 20.04 LTS (our new ``ubuntu20``) + +Once Ubuntu 22.04 LTS is released, we should deprecate Ubuntu 16.04 LTS, +and give users 6 months to migrate to a newer image. +User with custom images based on Ubuntu 16.04 LTS will be forced to migrate as well. + + +Conclusion +---------- + +I don't think we need to differentiate the images by its state (stable, latest, testing) +but by its main base difference: OS. 
The version of the OS will change many library versions, +LaTeX dependencies, basic required commands like git and more, +that doesn't seem to be useful to have the same OS version with different states. + +Also, splitting images by Python version sounds complicated to maintain. +Each time we need to make a small change into one of the base layers, we will end up rebuilding many images. +Besides, the key ``python.version`` won't make sense anymore and bring confusions. + +Custom images is something that needs more exploration still, +but both proposals seem doable in weeks as an initial proof of concept. From a4b37bced75fc5e9a8912fe054c8ff74b9537926 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Tue, 16 Mar 2021 15:31:53 +0100 Subject: [PATCH 2/6] Updates after some discussion --- docs/development/design/build-images.rst | 206 ++++++++++++++--------- 1 file changed, 125 insertions(+), 81 deletions(-) diff --git a/docs/development/design/build-images.rst b/docs/development/design/build-images.rst index 98bec67b888..392437ab341 100644 --- a/docs/development/design/build-images.rst +++ b/docs/development/design/build-images.rst @@ -1,10 +1,12 @@ Build Images ============ -This document describes how Read the Docs uses the Docker Build Images and how they are named. +This document describes how Read the Docs uses the `Docker Images`_ and how they are named. Besides, it proposes a new way to create and name them to allow sharing as many image layers as possible to support more customization while keeping the stability. +.. _Docker Build Images: https://github.com/readthedocs/readthedocs-docker-images + Introduction ------------ @@ -14,12 +16,11 @@ Each time a build is triggered, one of our VMs picks the task and go through different steps: #. run some application code to spin up a Docker image into a container -#. execute git inside the container to clone the repository -#. analyze and parse files from the repository *outside* the container +#. 
execute ``git`` inside the container to clone the repository +#. analyze and parse files (``.readthedocs.yaml``) from the repository *outside* the container #. create the environment and install docs' dependencies inside the container -#. execute build commnands inside the container -#. push the output generated by builds commands to the storage - +#. execute build commands inside the container +#. push the output generated by build commands to the storage *All* those steps depends on specific commands versions: ``git``, ``python``, ``virtualenv``, ``conda``, etc. Currently, we are pinning only a few of them in our Docker images and that have caused issues @@ -27,10 +28,10 @@ when re-deploying these images with bugfixes: **the images are not reproducible .. note:: - The repoducibility of the images will be fixed once + The reproducibility of the images will be better once https://github.com/readthedocs/readthedocs-docker-images/pull/145 and https://github.com/readthedocs/readthedocs-docker-images/pull/146 - get merged. + get merged but OS packages still won't be 100% the exact same versions. To allow users to pin the image we ended up exposing three images: ``stable``, ``latest`` and ``testing``. With that naming, we were able to bugfix issues and add more features @@ -45,13 +46,16 @@ This produced issues to people pinning their images to any of these names becaus Goals ----- -* release a completely new Docker image without forcing users to change their pinned image +* release completely new Docker images without forcing users to change their pinned image * allow users to stick with an image "forever" (~years) * use a ``base`` image with the dependencies that don't change frequently (OS and base requirements) +* ``base`` image naming is tied to the OS version (e.g. 
Ubuntu LTS) +* allow us to update a Python version without affecting the ``base`` image * reduce size on builder VM disks by sharing Docker image layers +* allow users to specify extra dependencies (apt packages, node, rust, etc) +* allow use "limited" custom images for users by sharing most layers +* automatically build & push *all* images on commit * deprecate ``stable``, ``latest`` and ``testing`` -* allow use custom images for particular users/customers by sharing most layers -* create a small ``nopdf`` image version without LaTeX dependencies for local development New build image structure @@ -64,61 +68,98 @@ New build image structure * environment variables * system dependencies * install requirements - * user requirements - * plantuml, imagemagick, rsgv-convert, swig - * sphinx-js dependencies - * rust + * LaTeX dependencies (for PDF generation) + * other languages version managers (``pyenv``, ``nodenv``, etc) * UID and GID -* ``ubuntu20-pdf`` (from ``ubuntu20-base``) - * PDF/LaTeX dependencies - -* ``ubuntu20`` (from ``ubuntu20-pdf``) - * all Python versions (2, 3.6, 3.7, 3.8, 3.9) - * conda - * future extra user requirements - * labels +The following images all are based on ``ubuntu20-base``: -We will also build a ``nopdf`` version to allow quick testing in local development: - -* ``ubuntu20-nopdf`` (from ``ubuntu20-base``) - * same as ``ubuntu20`` but based on ``ubuntu20-base`` instead - -.. note:: +* ``ubuntu20-py27`` +* ``ubuntu20-py36`` +* ``ubuntu20-py37`` +* ``ubuntu20-py38`` +* ``ubuntu20-py39`` +* ``ubuntu20-conda47`` (contains ``mamba`` executable as well) - I don't think it's useful to have ``ubuntu20-py37`` exposed to users, - since the Python version is selected by using the config file's ``python.version`` keyword, - we only update patch versions and we don't remove them (unless together with OS changes). +Note that all these images only need to run ``pyenv install ${PYTHON_VERSION}`` +to install a specific Python/Conda version. .. 
Build all these images with Docker + docker build -t readthedocs/build:ubuntu20-base -f Dockerfile.base . - docker build -t readthedocs/build:ubuntu20-nopdf -f Dockerfile.nopdf . - docker build -t readthedocs/build:ubuntu20-pdf -f Dockerfile.pdf . - docker build -t readthedocs/build:ubuntu20 -f Dockerfile . + docker build -t readthedocs/build:ubuntu20-py39 -f Dockerfile.py39 . + docker build -t readthedocs/build:ubuntu20-conda47 -f Dockerfile.conda47 . Check the shared space between images docker system df --verbose | grep -E 'SHARED SIZE|readthedocs' -Custom images -------------- +Specifying extra users' dependencies +------------------------------------ + +Different users may have different requirements. We were already requested to install +``swig``, ``imagemagick``, ``libmysqlclient-dev``, ``lmod``, ``rust``, ``poppler-utils``, etc. + +People with specific dependencies will be able to install them as APT packages or as extras +using ``.readthedocs.yaml`` config file. Example: + +.. code:: yaml + + build: + image: ubuntu20-py39 + apt: + - swig + - imagemagick + extras: + - node==14.16 + - rust==1.46.0 + -There are some dependencies that are not easy to update and keep compatibility with all the users at the same time. -Upgrading ``nodejs`` may make lot of old projects expecting the older version to start failing all their builds. -On the other hand, sticking with an old version avoid users requiring a newer version to build their documentation. -To handle this case and others, we have been thinking on supporting custom Docker images. +.. note:: Idea for implementation -It's not clear to me how it would be the implementation of this, but I see different paths to discuss and explore: + Once this config file is parsed, the builder builds a Docker image on-demand with a command similar to: -#. 
Allow a ``build.dockerfile`` config pointing to a ``Dockerfile`` - * ``FROM readthedocs/build:ubuntu20`` is required to be a valid image (to share layers) - * the image is build each time a build is triggered consuming build time -#. Create a branch per custom image in ``readthedocs-docker-images`` repository - * use ``ubuntu20`` as base image and add the custom extra requirements - * build the image using our current process (Docker Hub) - * add the custom image to our ``-ops`` repository - * re-build builders to pull down the new custom image - * set the project to use this custom image, eg. ``readthedocs/build:`` + .. console:: + + docker build \ + --tag ${BUILD_ID} \ + --file Dockerfile.custom \ + --build-arg RTD_IMAGE=ubuntu20-py39 + --build-arg RTD_NODE_VERSION=14.16 \ + --build-arg RTD_RUST_VERSION=1.46.0 \ + --build-arg RTD_APT_PACKAGES="swig imagemagick" + + using ``Dockerfile.custom`` that has the following content: + + .. code:: Dockerfile + + ARG RTD_IMAGE + FROM readthedocs:${RTD_IMAGE} + + ARG RTD_NODE_VERSION + ARG RTD_RUST_VERSION + ARG RTD_APT_PACKAGES + + USER root + WORKDIR / + + # Install extras + RUN apt-get update + RUN apt-get install -y ${RTD_APT_PACKAGES} + + USER docs + WORKDIR /home/docs + + # Install ``nodejs`` + RUN nodenv install ${RTD_NODE_VERSION} + RUN nodenv global ${RTD_NODE_VERSION} + + # Install ``rust`` + RUN curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain ${RTD_RUST_VERSION} + ENV PATH="/home/docs/.cargo/bin:$PATH" + + Building this image should be pretty fast since all the requirements to install these extra packages + are already installed and all of them are pre-compiles binaries. It will take the time it takes to download them. Updating versions over time @@ -127,11 +168,12 @@ Updating versions over time How do we add/upgrade a Python version? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Python patch versions can be upgraded and backported to all the images without problems. 
-There is only needed to rebuild ``ubuntu20`` and most of the layers will remain shared with ``-base`` and ``-pdf``. +Python patch versions can be upgraded on the affected image. +As the ``base`` image won't change for this case, it will only modify the layers after it. +All the OS package versions will remain the same. -In case we need to *add* a new Python version, the situation is similar. -We can add the new version by using ``pyenv`` and rebuilding the ``ubuntu20`` image. +In case we need to *add* a new Python version, we just need to build a new image based on ``base``: +``ubuntu20-py310`` that will contain Python 3.10 and none of the other images are affected. How do we upgrade system versions? @@ -146,30 +188,29 @@ Examples of these versions are: * git * subversion * pandoc -* nodejs / npm * swig -* rust +* latex +This case will introduce a new ``base`` image. Example, ``ubuntu22-base`` in 2022. +Note that these images will be completely isolated from the rest and don't require them to rebuild. +This also allow us to test new Ubuntu versions without breaking people's builds. How do we add an extra requirement? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -If a user asks for a new requirement (eg. azure CLI, ``az`` command) it should go into the -"user requirements" section in the ``ubuntu20-base`` image. -However, that will force us to rebuild all the images. - -We could use the section named as "future user extra requirements" for this, -and it will force us to only rebuild the ``ubuntu20`` image. - -Both approaches will require to rebuild all the custom docker images from our users/customers -that are based on the ``ubuntu20`` image. +In case we need to add an extra requirement to the ``base`` image, +we will need to rebuild all of them. +The new image may have different package versions since there may be updates on the Ubuntu repositories. +This conveys some small risk here, but in general we shouldn't require to add packages to the base images. 
+Users with specific requirements could use ``build.apt`` and/or ``build.extras`` in the config file. How do we remove an old Python version? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -At some point an old version of Python will be deprecated (eg. 3.4) and will be removed from our Docker images. -These versions should only be removed when the OS in the ``base`` is upgraded (eg. from ``ubuntu20`` to ``ubuntu22``). +At some point an old version of Python will be deprecated (eg. 3.4) and will be removed. +To achieve this, we can just remove the Docker image affected: ``ubuntu20-py34``, +once there are no users depending on it anymore. Deprecation plan @@ -178,9 +219,10 @@ Deprecation plan It seems we have ~50Gb free on builders disks. Considering that the new images will be sized approximately (built locally as test): -* ``base``: ~2.5Gb -* ``nopdf``: ~5.5Gb -* ``pdf``: ~1.5Gb +* ``ubuntu20-base``: ~ +* ``ubuntu20-py27``: ~ +* ``ubuntu20-py39``: ~ +* ``ubuntu20-conda47``: ~ which is about ~10Gb in total, we will still have space to support multiple custom images. @@ -188,29 +230,31 @@ We could keep ``stable``, ``latest`` and ``testing`` for some time without worry New projects shouldn't be able to select these images and they will be forced to use ``ubuntu20`` or any other custom image. -We may want to keep the three latest Ubuntu LTS releases available in production. +We may want to keep the two latest Ubuntu LTS releases available in production. At the moment of writing this they are: -* Ubuntu 16.04 LTS (we are not using it anymore) * Ubuntu 18.04 LTS (our ``stable``, ``latest`` and ``testing`` images) * Ubuntu 20.04 LTS (our new ``ubuntu20``) -Once Ubuntu 22.04 LTS is released, we should deprecate Ubuntu 16.04 LTS, +Once Ubuntu 22.04 LTS is released, we should deprecate Ubuntu 18.04 LTS, and give users 6 months to migrate to a newer image. -User with custom images based on Ubuntu 16.04 LTS will be forced to migrate as well. 
Conclusion ---------- I don't think we need to differentiate the images by its state (stable, latest, testing) -but by its main base difference: OS. The version of the OS will change many library versions, +but by its main base differences: OS and Python version. +The version of the OS will change many library versions, LaTeX dependencies, basic required commands like git and more, that doesn't seem to be useful to have the same OS version with different states. -Also, splitting images by Python version sounds complicated to maintain. -Each time we need to make a small change into one of the base layers, we will end up rebuilding many images. -Besides, the key ``python.version`` won't make sense anymore and bring confusions. +"Limited" custom Docker images is something that will cover most of the support requests we have had in the past +and allow users to use our platform in a controlled way for us. +Showing users how we want them to use our platform will allow us to maintain it longer +than giving them total freedom on the Docker image. -Custom images is something that needs more exploration still, -but both proposals seem doable in weeks as an initial proof of concept. +"Non limited" custom Docker images is out of the scope of this document, +but could be done in a similar way as the "limited" on-demand Docker images. +However, there are other aspects like persistence of the image between builds +that need to be considered as well. 
From 2738818e00b7316872197e217cadfb0df7c6bed6 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Tue, 16 Mar 2021 16:11:08 +0100 Subject: [PATCH 3/6] Latest changes --- docs/development/design/build-images.rst | 49 +++++++++++++++--------- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/docs/development/design/build-images.rst b/docs/development/design/build-images.rst index 392437ab341..64306759a81 100644 --- a/docs/development/design/build-images.rst +++ b/docs/development/design/build-images.rst @@ -5,7 +5,7 @@ This document describes how Read the Docs uses the `Docker Images`_ and how they Besides, it proposes a new way to create and name them to allow sharing as many image layers as possible to support more customization while keeping the stability. -.. _Docker Build Images: https://github.com/readthedocs/readthedocs-docker-images +.. _Docker Images: https://github.com/readthedocs/readthedocs-docker-images Introduction @@ -28,10 +28,11 @@ when re-deploying these images with bugfixes: **the images are not reproducible .. note:: - The reproducibility of the images will be better once - https://github.com/readthedocs/readthedocs-docker-images/pull/145 and - https://github.com/readthedocs/readthedocs-docker-images/pull/146 - get merged but OS packages still won't be 100% the exact same versions. + The reproducibility of the images will be better once these PRs are merged, + but OS packages still won't be 100% the exact same versions. + + * https://github.com/readthedocs/readthedocs-docker-images/pull/145 + * https://github.com/readthedocs/readthedocs-docker-images/pull/146 To allow users to pin the image we ended up exposing three images: ``stable``, ``latest`` and ``testing``. With that naming, we were able to bugfix issues and add more features @@ -64,6 +65,7 @@ New build image structure .. 
Taken from https://github.com/readthedocs/readthedocs-docker-images/blob/master/Dockerfile * ``ubuntu20-base`` + * labels * environment variables * system dependencies @@ -74,12 +76,19 @@ New build image structure The following images all are based on ``ubuntu20-base``: -* ``ubuntu20-py27`` -* ``ubuntu20-py36`` -* ``ubuntu20-py37`` -* ``ubuntu20-py38`` -* ``ubuntu20-py39`` -* ``ubuntu20-conda47`` (contains ``mamba`` executable as well) +* ``ubuntu20-py*`` + + * Python version installed via ``pyenv`` + * default Python packages (pinned versions) + * pip + * setuptools + * virtualenv + * labels + +* ``ubuntu20-conda*`` + + * same as ``-py*`` versions + * ``mamba`` executable Note that all these images only need to run ``pyenv install ${PYTHON_VERSION}`` to install a specific Python/Conda version. @@ -93,6 +102,9 @@ to install a specific Python/Conda version. Check the shared space between images docker system df --verbose | grep -E 'SHARED SIZE|readthedocs' + Initial Dockerfile.* as example for this are pushed in this PR + https://github.com/readthedocs/readthedocs-docker-images/pull/166 + Specifying extra users' dependencies ------------------------------------ @@ -106,7 +118,7 @@ using ``.readthedocs.yaml`` config file. Example: .. code:: yaml build: - image: ubuntu20-py39 + image: ubuntu20-py39 apt: - swig - imagemagick @@ -119,7 +131,7 @@ using ``.readthedocs.yaml`` config file. Example: Once this config file is parsed, the builder builds a Docker image on-demand with a command similar to: - .. console:: + .. code:: bash docker build \ --tag ${BUILD_ID} \ @@ -133,6 +145,7 @@ using ``.readthedocs.yaml`` config file. Example: .. code:: Dockerfile + # Dockerfile.custom ARG RTD_IMAGE FROM readthedocs:${RTD_IMAGE} @@ -219,12 +232,12 @@ Deprecation plan It seems we have ~50Gb free on builders disks. 
Considering that the new images will be sized approximately (built locally as test): -* ``ubuntu20-base``: ~ -* ``ubuntu20-py27``: ~ -* ``ubuntu20-py39``: ~ -* ``ubuntu20-conda47``: ~ +* ``ubuntu20-base``: ~5Gb +* ``ubuntu20-py27``: ~150Mb +* ``ubuntu20-py39``: ~20Mb +* ``ubuntu20-conda47``: ~713Mb -which is about ~10Gb in total, we will still have space to support multiple custom images. +which is about ~6Gb in total, we will still have space to support multiple custom images. We could keep ``stable``, ``latest`` and ``testing`` for some time without worry too much. New projects shouldn't be able to select these images and they will be forced to use ``ubuntu20`` From 836e32b16273c5b2a4b8609364aa1b6307ef5296 Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Wed, 17 Mar 2021 10:09:34 +0100 Subject: [PATCH 4/6] Minor updates --- docs/development/design/build-images.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/development/design/build-images.rst b/docs/development/design/build-images.rst index 64306759a81..6179f3b6f3d 100644 --- a/docs/development/design/build-images.rst +++ b/docs/development/design/build-images.rst @@ -137,7 +137,7 @@ using ``.readthedocs.yaml`` config file. Example: --tag ${BUILD_ID} \ --file Dockerfile.custom \ --build-arg RTD_IMAGE=ubuntu20-py39 - --build-arg RTD_NODE_VERSION=14.16 \ + --build-arg RTD_NODE_VERSION=14.16.0 \ --build-arg RTD_RUST_VERSION=1.46.0 \ --build-arg RTD_APT_PACKAGES="swig imagemagick" @@ -147,7 +147,7 @@ using ``.readthedocs.yaml`` config file. Example: # Dockerfile.custom ARG RTD_IMAGE - FROM readthedocs:${RTD_IMAGE} + FROM readthedocs/build:${RTD_IMAGE} ARG RTD_NODE_VERSION ARG RTD_RUST_VERSION @@ -171,7 +171,7 @@ using ``.readthedocs.yaml`` config file. 
Example: RUN curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain ${RTD_RUST_VERSION} ENV PATH="/home/docs/.cargo/bin:$PATH" - Building this image should be pretty fast since all the requirements to install these extra packages + Building this image should be pretty fast (~2 minutes locally) since all the requirements to install these extra packages are already installed and all of them are pre-compiles binaries. It will take the time it takes to download them. @@ -234,6 +234,7 @@ Considering that the new images will be sized approximately (built locally as te * ``ubuntu20-base``: ~5Gb * ``ubuntu20-py27``: ~150Mb +* ``ubuntu20-py36``: ~210Mb * ``ubuntu20-py39``: ~20Mb * ``ubuntu20-conda47``: ~713Mb From 1ce35d37315019404662298749e9d4eb3798ab9c Mon Sep 17 00:00:00 2001 From: Manuel Kaufmann Date: Thu, 18 Mar 2021 12:44:48 +0100 Subject: [PATCH 5/6] Updates after our meeting --- docs/development/design/build-images.rst | 140 ++++++++++++----------- 1 file changed, 75 insertions(+), 65 deletions(-) diff --git a/docs/development/design/build-images.rst b/docs/development/design/build-images.rst index 6179f3b6f3d..201bc279222 100644 --- a/docs/development/design/build-images.rst +++ b/docs/development/design/build-images.rst @@ -2,8 +2,8 @@ Build Images ============ This document describes how Read the Docs uses the `Docker Images`_ and how they are named. -Besides, it proposes a new way to create and name them to allow -sharing as many image layers as possible to support more customization while keeping the stability. +Besides, it proposes a path forward about a new way to create and name our Docker build images to allow sharing as many image layers as possible +and support installation of OS level packages as well as extra requirements. .. _Docker Images: https://github.com/readthedocs/readthedocs-docker-images @@ -18,6 +18,7 @@ and go through different steps: #. run some application code to spin up a Docker image into a container #. 
execute ``git`` inside the container to clone the repository #. analyze and parse files (``.readthedocs.yaml``) from the repository *outside* the container +#. spin up a new Docker container based on the config file #. create the environment and install docs' dependencies inside the container #. execute build commands inside the container #. push the output generated by build commands to the storage @@ -51,12 +52,19 @@ Goals * allow users to stick with an image "forever" (~years) * use a ``base`` image with the dependencies that don't change frequently (OS and base requirements) * ``base`` image naming is tied to the OS version (e.g. Ubuntu LTS) -* allow us to update a Python version without affecting the ``base`` image +* allow us to add/update a Python version without affecting the ``base`` image * reduce size on builder VM disks by sharing Docker image layers * allow users to specify extra dependencies (apt packages, node, rust, etc) -* allow use "limited" custom images for users by sharing most layers * automatically build & push *all* images on commit * deprecate ``stable``, ``latest`` and ``testing`` +* new images won't contain old/deprecated OS (eg. Ubuntu 18) and Python versions (eg. 3.5, miniconda2) + + +Non goals +--------- + +* allow creation/usage of custom Docker images +* allow execution of arbitrary commands via hooks (eg. ``pre_build``) New build image structure @@ -88,7 +96,8 @@ The following images all are based on ``ubuntu20-base``: * ``ubuntu20-conda*`` * same as ``-py*`` versions - * ``mamba`` executable + * Conda version installed via ``pyenv`` + * ``mamba`` executable (installed via ``conda``) Note that all these images only need to run ``pyenv install ${PYTHON_VERSION}`` to install a specific Python/Conda version. @@ -106,7 +115,7 @@ to install a specific Python/Conda version. 
https://github.com/readthedocs/readthedocs-docker-images/pull/166 -Specifying extra users' dependencies +Specifying extra user's dependencies ------------------------------------ Different users may have different requirements. We were already requested to install @@ -118,61 +127,36 @@ using ``.readthedocs.yaml`` config file. Example: .. code:: yaml build: - image: ubuntu20-py39 - apt: + image: ubuntu20 + python: 3.9 + system_packages: - swig - imagemagick extras: - - node==14.16 - - rust==1.46.0 - - -.. note:: Idea for implementation - - Once this config file is parsed, the builder builds a Docker image on-demand with a command similar to: - - .. code:: bash - - docker build \ - --tag ${BUILD_ID} \ - --file Dockerfile.custom \ - --build-arg RTD_IMAGE=ubuntu20-py39 - --build-arg RTD_NODE_VERSION=14.16.0 \ - --build-arg RTD_RUST_VERSION=1.46.0 \ - --build-arg RTD_APT_PACKAGES="swig imagemagick" - - using ``Dockerfile.custom`` that has the following content: - - .. code:: Dockerfile - - # Dockerfile.custom - ARG RTD_IMAGE - FROM readthedocs/build:${RTD_IMAGE} - - ARG RTD_NODE_VERSION - ARG RTD_RUST_VERSION - ARG RTD_APT_PACKAGES + - node==14 + - rust==1.46 - USER root - WORKDIR / +Important highlights: - # Install extras - RUN apt-get update - RUN apt-get install -y ${RTD_APT_PACKAGES} +* users won't be able to use custom Ubuntu PPAs to install packages +* all APT packages installed will be from official Ubuntu repositories +* not specifying ``build.image`` will pick the latest OS image available +* not specifying ``build.python`` will pick the latest Python version available +* Ubuntu 18 will still be available via ``stable`` and ``latest`` images +* all ``node`` (major) pre-compiled versions on ``nodenv`` are available to select +* all ``rust`` (minor) pre-compiled versions on ``rustup`` are available to select +* knowing exactly what packages users are installing, + could allow us to prebuild extra images: ``ubuntu20-py37+node14`` - USER docs - WORKDIR /home/docs 
+.. admonition:: Implementation - # Install ``nodejs`` - RUN nodenv install ${RTD_NODE_VERSION} - RUN nodenv global ${RTD_NODE_VERSION} + We talked about using a ``Dockerfile.custom`` and build it on every build. + However, at this point it requires extra work to change our build pipeline. + We decided to install OS packages from the application itself for now using + Docker API to call ``docker exec`` as ``root`` user. - # Install ``rust`` - RUN curl https://sh.rustup.rs -sSf | sh -s -- -y --default-toolchain ${RTD_RUST_VERSION} - ENV PATH="/home/docs/.cargo/bin:$PATH" - - Building this image should be pretty fast (~2 minutes locally) since all the requirements to install these extra packages - are already installed and all of them are pre-compiles binaries. It will take the time it takes to download them. + This reduces the amount of work required but also allows us to add this feature + to our current existing images (they require a rebuild to add ``nodenv`` and ``rustup``) Updating versions over time @@ -187,6 +171,7 @@ All the OS package versions will remain the same. In case we need to *add* a new Python version, we just need to build a new image based on ``base``: ``ubuntu20-py310`` that will contain Python 3.10 and none of the other images are affected. +This also allow us to test new Python (eg. 3.11rc1) versions without breaking people's builds. How do we upgrade system versions? @@ -213,10 +198,10 @@ How do we add an extra requirement? In case we need to add an extra requirement to the ``base`` image, we will need to rebuild all of them. -The new image may have different package versions since there may be updates on the Ubuntu repositories. +The new image *may have different package versions* since there may be updates on the Ubuntu repositories. This conveys some small risk here, but in general we shouldn't require to add packages to the base images. -Users with specific requirements could use ``build.apt`` and/or ``build.extras`` in the config file. 
+Users with specific requirements could use ``build.system_packages`` and/or ``build.extras`` in the config file. How do we remove an old Python version? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -225,6 +210,8 @@ At some point an old version of Python will be deprecated (eg. 3.4) and will be To achieve this, we can just remove the Docker image affected: ``ubuntu20-py34``, once there are no users depending on it anymore. +We will know which projects are using these images because they are pinning it in the config file. +We could show a message in the build output page and also send them an email with the EOL date for this image. Deprecation plan ---------------- @@ -238,11 +225,11 @@ Considering that the new images will be sized approximately (built locally as te * ``ubuntu20-py39``: ~20Mb * ``ubuntu20-conda47``: ~713Mb -which is about ~6Gb in total, we will still have space to support multiple custom images. +which is about ~6Gb in total, we still have plenty of space. We could keep ``stable``, ``latest`` and ``testing`` for some time without worry too much. New projects shouldn't be able to select these images and they will be forced to use ``ubuntu20`` -or any other custom image. +if they don't specify one. We may want to keep the two latest Ubuntu LTS releases available in production. At the moment of writing this they are: @@ -254,6 +241,34 @@ Once Ubuntu 22.04 LTS is released, we should deprecate Ubuntu 18.04 LTS, and give users 6 months to migrate to a newer image. +Work required +------------- + +There are a lot of work to do here. +However, we want to prioritize it based on users' impact. + +#. allow users to install packages with APT + + * update config file to support ``build.system_packages`` config + * modify builder code to run ``apt-get install`` as ``root`` user + +#. 
allow users to install extras via config file + + * update config file to support ``build.extras`` config + * modify builder code to run ``nodenv install`` / ``rustup install`` + * re-build our current images with pre-installed nodenv and rustup + * make sure that all the versions are the same we have in production + * deploy builders with newer images + +#. pre-build commands (not covered in this document) + +#. new structure + + * update config file to support new image names for ``build.image`` + * automate Docker image building + * deploy builders with newer images + + Conclusion ---------- @@ -263,12 +278,7 @@ The version of the OS will change many library versions, LaTeX dependencies, basic required commands like git and more, that doesn't seem to be useful to have the same OS version with different states. -"Limited" custom Docker images is something that will cover most of the support requests we have had in the past -and allow users to use our platform in a controlled way for us. +Allowing users to install system dependencies and extras will cover most of the support requests we have had in the past +It also will allow us to know more about how our users are using the platform to make future decisions based on this data. Exposing users how we want them to use our platform will allow us to be able to maintain it longer, -than given them totally freedom on the Docker image. - -"Non limited" custom Docker images is out of the scope of this document, -but could be done in a similar way as the "limited" on-demand Docker images. -However, there are other aspects like persistence of the image between builds -that needs to be considered as well. +than giving them totally freedom on the Docker image. 
From af6bb57df806980d0d695da26abe751f6740aef7 Mon Sep 17 00:00:00 2001
From: Manuel Kaufmann
Date: Thu, 1 Apr 2021 11:11:57 +0200
Subject: [PATCH 6/6] Minor details

---
 docs/development/design/build-images.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/development/design/build-images.rst b/docs/development/design/build-images.rst
index 201bc279222..ed95d9efed4 100644
--- a/docs/development/design/build-images.rst
+++ b/docs/development/design/build-images.rst
@@ -272,13 +272,13 @@ However, we want to prioritize it based on users' impact.
 Conclusion
 ----------
 
-I don't think we need to differentiate the images by its state (stable, latest, testing)
+There is no need to differentiate the images by its state (stable, latest, testing)
 but by its main base differences: OS and Python version.
 The version of the OS will change many library versions, LaTeX dependencies,
 basic required commands like git and more,
 that doesn't seem to be useful to have the same OS version with different states.
 
-Allowing users to install system dependencies and extras will cover most of the support requests we have had in the past
+Allowing users to install system dependencies and extras will cover most of the support requests we have had in the past.
 It also will allow us to know more about how our users are using the platform to make future decisions based on this data.
 Exposing users how we want them to use our platform will allow us to be able to maintain it longer,
 than giving them totally freedom on the Docker image.
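
Reviewer sketch (not part of the patch series): the "Implementation" admonition added in PATCH 5/6 says OS packages will be installed from the application itself by calling ``docker exec`` as the ``root`` user. A minimal illustration of that idea follows, assuming the builder turns a validated ``build.system_packages`` list into command argument lists. The function name ``apt_install_commands``, the container identifier, and the use of the ``docker`` CLI as a stand-in for the Docker API are all hypothetical and do not come from the Read the Docs code base. The name check follows the Debian policy rules for package names.

```python
import re

# Debian policy: package names use lowercase letters, digits, '+', '-', '.',
# are at least two characters long and start with an alphanumeric character.
_PACKAGE_NAME = re.compile(r"^[a-z0-9][a-z0-9+.-]+$")


def apt_install_commands(container_id, system_packages):
    """Build ``docker exec`` argument lists that install packages as root.

    Hypothetical helper: returns an empty list when nothing is requested and
    raises ValueError for anything that is not a plain package name, so that
    command-line options or shell syntax cannot be passed through the config.
    """
    if not system_packages:
        return []
    for name in system_packages:
        if not _PACKAGE_NAME.match(name):
            raise ValueError(f"invalid package name: {name!r}")
    # Run inside the already-running build container, as the root user.
    prefix = ["docker", "exec", "--user", "root", container_id]
    return [
        prefix + ["apt-get", "update", "--assume-yes", "--quiet"],
        prefix + ["apt-get", "install", "--assume-yes", "--quiet"]
        + list(system_packages),
    ]


if __name__ == "__main__":
    # Packages taken from the YAML example in the design document.
    for command in apt_install_commands("build-123", ["swig", "imagemagick"]):
        print(" ".join(command))
```

Returning argument lists rather than a single shell string keeps user-supplied names out of any shell, which fits the document's restriction that only packages from the official Ubuntu repositories can be installed.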