Docker build images: update design doc #8447


Merged: 8 commits into master on Sep 9, 2021

Conversation

@humitos (Member) commented Aug 26, 2021

Update the design doc with the latest conversation we had about this topic.
Hopefully, we can agree on it and start making some tests in production.

Structure of the new Dockerfile(s) at readthedocs/readthedocs-docker-images#166

Rendered version: https://docs--8447.org.readthedocs.build/en/8447/development/design/build-images.html

Proof of Concept: #8453

@humitos force-pushed the humitos/new-docker-images branch 3 times, most recently from 2636528 to f567040 on August 26, 2021 12:25
@humitos force-pushed the humitos/new-docker-images branch 2 times, most recently from e9b0590 to cc6e739 on August 26, 2021 14:11
@agjohnson (Contributor) left a comment

Just a couple of notes; I'll continue thinking about this plan, though. As I was digesting this, I thought a bit more about why I'm more in favor of managing asdf or pyenv solely at build time:

While Docker is a great fit for our underlying system dependencies, and a reasonable fit for isolating processes, our plan so far for managing images feels like we're working hard to make Docker fit a hole that is not Docker-sized.

Overall, the benefits I see to managing pre-built binaries via asdf and friends solely at build time are:

  • We don't have to limit the number of binaries we're willing to cache. We have a hard limit with Docker because of the exponential nature of adding new supported versions.
  • We have one system to manage versions: asdf at build time.
  • We can adapt more quickly to changes in the versions we want to cache, because there isn't the same friction as releasing a Docker image.
  • The scope of our Docker images would also be greatly reduced, and there would be less friction when iterating on these images.
  • We probably want local caching of pre-built binaries eventually, so that we aren't putting extra load on external mirrors for these pre-built binaries.
  • There is a good amount of effort that would go into continually managing >50 Docker images: building, rebuilding on base layer changes, managing installation in Salt, and supporting this breadth of images.

If caching is possible (that is, if a proof of concept looks promising), caching is still just an optimization. We can chunk this work up into digestible pieces, starting with a minimal Docker image and management of build-time installation in the application.

@humitos force-pushed the humitos/new-docker-images branch from cc6e739 to feab4b3 on August 26, 2021 16:40
Update the design doc with the latest conversation we had about this topic.
Hopefully, we can agree on it and start making some tests in production.
@humitos (Member, Author) commented Aug 26, 2021

I liked this feedback. Eric also mentioned to me the idea of caching these somehow, but I didn't explore it too much just yet.

From what you said here, I don't think there is anything blocking us from moving forward with this plan. The main change here is basically a semantic one: how the requirements are specified. Instead of saying "I want to use this Docker image" (via build.image), the user will say "I want to have these binaries available" (via build.os and build.languages), and we can manage that behind the scenes without them knowing how it's implemented.
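
For illustration only, a minimal sketch of what such a config could look like; the build.os and build.languages keys come from this discussion, while the exact values and syntax here are assumptions, not a final spec:

```yaml
# Hypothetical sketch of the proposed config file keys (not a final spec).
version: 2

build:
  os: ubuntu20          # assumed value, matching the readthedocs/build:ubuntu20 image name
  languages:
    python: "3.9"       # version syntax is an assumption
    nodejs: "14"
```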

We probably want local caching of pre-built binaries eventually, so that we aren't putting extra load on external mirrors for these pre-built binaries.

This seems doable by creating a mirror on our side (e.g. S3) and then defining some environment variables per supported language:
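
For example, something along these lines; this is only a sketch, and the variable names depend on each asdf plugin's underlying build tool, so they should be verified against the plugins' documentation before relying on them. The mirror URL is hypothetical:

```bash
# Hypothetical per-language mirror configuration; double-check each variable
# name against the corresponding asdf plugin's docs.
export PYTHON_BUILD_MIRROR_URL="https://mirror.example.com/python"  # python-build (asdf python plugin)
export RUBY_BUILD_MIRROR_URL="https://mirror.example.com/ruby"      # ruby-build (asdf ruby plugin)
export NODEJS_ORG_MIRROR="https://mirror.example.com/nodejs"        # asdf nodejs plugin
```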

Even with a mirror for Python, we would still have to compile it. So, for Python, we may need a different approach to caching it as a binary.

With this optimization, this hybrid approach wouldn't seem necessary. Instead of managing asdf at Docker image build time and again at project build time, asdf could just be managed at project build time.

If we find a way to compile Python once for each version and cache those binaries, only the *-base would be required. All the languages would be downloaded/copied from the cache at build time in seconds.

I did a small test by building Python 3.9.6 on build-large and copying the output to build-default, and it worked. We could use the same process through S3 to store pre-built Python binaries there, and just download and untar the correct version at build time:

```bash
### build-large

# Install asdf
git clone https://github.com/asdf-vm/asdf.git ~/.asdf --branch v0.8.1
. $HOME/.asdf/asdf.sh
. $HOME/.asdf/completions/asdf.bash
asdf plugin-add python

# Install Python 3.9.6
asdf install python 3.9.6

# Compress the output
tar cfvz python396.tar.gz $HOME/.asdf/installs/python/3.9.6  # 64 MB

# Push the .tar.gz to S3
```

NOTE: I used util01 as an intermediate here, but S3 should be used.

```bash
### build-default

# Download the .tar.gz from S3 to here
tar xvfz /tmp/python396.tar.gz
mv home/docs/.asdf/installs/python/3.9.6 /home/docs/.asdf/installs/python/
```

Then, running `asdf list` shows the version:

```
### build-default
$ asdf list
python
  3.9.6
```

We can set the global version and use it:

```
### build-default
$ asdf global python 3.9.6
$ python
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
```

@agjohnson (Contributor) commented

That's great, seems caching should be mostly doable.

the user will have to say "I want to have these binaries available"

I forgot to comment that this is a fantastic part of this plan overall; it's far clearer UX. I was confused by CircleCI's image layout when I needed both a specific Python and Node version, and it wasn't clear how I was supposed to select an image that fit both constraints. This is directly asking the user "what do you need?" instead of "how does this fit?", and that is such a better experience. 💯

I don't think there is anything blocking us from moving forward with this plan

Yeah, exactly, it seems we can break this up into stages really easily. As long as we know it's possible for now, we can implement the caching optimizations later on in the plan, probably some time before widespread or default adoption of this approach, but we can gauge that as well.

* mention cache for official mirrors
* add support for pre-compiled Python versions from local cache
@humitos marked this pull request as ready for review on August 30, 2021 10:39
@ericholscher (Member) left a comment

This is a great improvement, and I think a great direction. Just the config changes alone feel like a solid win, and this feels like the "correct" solution to the docker image name conversation we've been having forever (solve a different problem and don't expose the name :D)

I had a few comments; the most important are on the project-level requirements. I think we should move away from feature-flagging a lot of things that can break on users like this, since it's totally hidden in the product and very confusing. We can find better ways to opt users in without creating a weird UX.

@humitos (Member, Author) commented Aug 31, 2021

Another approach for the cache could be to share a volume between the Docker container and the host. In this case, we could pre-compile Python versions and install other language versions when the AMI is created with Packer. Then, we just need to mount the folder `/home/docs/.asdf/installs/` into the container and only call `asdf global python 3.9.6`.

This could reduce the complexity of having a mirror on S3 for all the languages and cached pre-compiled Python versions. However, this may require that the Ubuntu version running in the AMI be the same as the one used to build the docs.
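
A minimal sketch of that idea, assuming the image already has asdf configured and the pre-compiled versions live under /home/docs/.asdf/installs/ on the host; the exact mount options here are an assumption:

```bash
# Hypothetical: mount the host's pre-compiled asdf installs read-only into the
# build container, then just select the version with asdf.
docker run --rm -it \
  -v /home/docs/.asdf/installs:/home/docs/.asdf/installs:ro \
  readthedocs/build:ubuntu20 \
  bash -c "asdf global python 3.9.6 && python --version"
```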

humitos added a commit that referenced this pull request Aug 31, 2021
Minimal implementation for POC of #8447

It uses a Feature flag for now as a way to select the new
`readthedocs/build:ubuntu20` image and install Python versions via `asdf`.

MinIO requires a new bucket called `languages` with a pre-compiled Python 3.9.6
version to work (*) (this version is hardcoded for now). However, if a different
version is selected it will be downloaded from official mirrors, installed and
used.

Build times on `latest` version for `test-build`:

* using the new image + cached Python version: 112s
* using the new image + non cached Python version: 288s
* using old image (current production): 87s

> Note that all the parsing of the Config File to support `build.os` and
> `build.languages` is not included in this PR on purpose. That work can be
> split as a separate work and done in parallel with the rest of work required
> here.

(*) to pre-compile a Python version:

```bash
docker run -it readthedocs/build:ubuntu20 /bin/bash

asdf install python 3.9.6
asdf global python 3.9.6
python -m pip install -U pip setuptools virtualenv
cd /home/docs/.asdf/installs/python
tar -czvf ubuntu20-python-3.9.6.tar.gz 3.9.6

docker cp <container id>:/home/docs/.asdf/installs/python/ubuntu20-python-3.9.6.tar.gz .
```

and upload the .tar.gz file to the MinIO `languages` bucket using the web interface.
humitos added a commit that referenced this pull request Sep 1, 2021
Minimal implementation for POC of #8447

It uses a Feature flag for now as a way to select the new
`readthedocs/build:ubuntu20` image and install Python versions via `asdf`
(readthedocs/readthedocs-docker-images#166)

MinIO requires a new bucket called `languages` with a pre-compiled Python 3.9.6
version to work (*) (this version is hardcoded for now). However, if a different
version is selected it will be downloaded from official mirrors, installed and
used.

Build times on `latest` version for `test-build`:

* using the new image + cached Python version: 112s
* using the new image + non cached Python version: 288s
* using old image (current production): 87s

> Note that all the parsing of the Config File to support `build.os` and
> `build.languages` is not included in this PR on purpose. That work can be
> split as a separate work and done in parallel with the rest of work required
> here.

(*) to pre-compile a Python version:

```bash
docker run -it readthedocs/build:ubuntu20 /bin/bash

asdf install python 3.9.6
asdf global python 3.9.6
python -m pip install -U pip setuptools virtualenv
cd /home/docs/.asdf/installs/python
tar -czvf ubuntu20-python-3.9.6.tar.gz 3.9.6

docker cp <container id>:/home/docs/.asdf/installs/python/ubuntu20-python-3.9.6.tar.gz .
```

and upload the .tar.gz file to the MinIO `languages` bucket using the web interface.
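
As a possible alternative to the web interface, the upload could also be scripted against MinIO's S3-compatible API; this is only a sketch, assuming the AWS CLI is configured with the MinIO credentials and a (hypothetical) $MINIO_URL endpoint:

```bash
# Hypothetical: upload the pre-compiled tarball to the MinIO "languages" bucket
# via the S3-compatible API instead of the web interface.
aws --endpoint-url "$MINIO_URL" s3 cp \
  ubuntu20-python-3.9.6.tar.gz \
  s3://languages/ubuntu20-python-3.9.6.tar.gz
```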
@humitos (Member, Author) commented Sep 1, 2021

@agjohnson @ericholscher I updated the document with all the feedback received. I think it's ready!

@humitos requested review from a team, ericholscher, and agjohnson on September 1, 2021 12:50
@stsewd (Member) left a comment

This is definitely better than the current structure we have. About supporting more languages and binaries, I think we should also look at allowing users to upload HTML files directly (and processing those files to add our custom stuff).
We have several problems that would be solved by allowing users to upload HTML files instead of building them on our platform:

  • Build time
  • Additional dependencies, custom builds, etc.
  • Repo permissions (permissions are inverted: users would have access to our platform instead of us having access to their code).

There are already great CI services that allow great customization and whatnot; it feels like we are replicating the same thing with fewer features and less flexibility.

Right now we are discussing re-structuring Docker images to support more things, custom builders, GitHub permissions, and allowing arbitrary commands to be executed in the build process. Allowing HTML uploads would solve all these problems.
This is just a suggestion to consider; we would need to invest maybe the same amount of time to support this new feature.

I know we have discussed this before, but I'm bringing it up since the problems we want to solve seem related.


At start only a small subset of Python version will be pre-compiled:

* 2.7.x
Member

Maybe we can start dropping support for 2.7 in the new images?

@humitos (Member, Author)

I don't think we can drop support for 2.7. Unfortunately, the world still depends on it.

I'm fine dropping other old 3.x versions, because people can use a newer one, but I don't think users depending on 2.7 are able to migrate to 3.x easily.

We could grab some data from our users and check how many of them are using 2.7, but still, I'm not sure it's a good idea to drop it.

As long as we can compile it on our Ubuntu images via asdf, it should be fine to support it 😄

@humitos (Member, Author) commented Sep 2, 2021

This is definitely better than the current structure we have.

Thanks for the feedback!

About supporting more languages and binaries, I think we should also look at allowing uploading html files directly (and processing those files to add our custom stuff).

I definitely want to support HTML upload. I agree with you that it will solve a good amount of known problems. In fact, I have planned its support in the "Future Builders" project (see https://github.com/orgs/readthedocs/projects/108). However, the implementation of HTML upload depends on the new builders, and that work is outside the scope of this "New Docker images for builds" project.

@humitos (Member, Author) commented Sep 2, 2021

@readthedocs/backend One thing that we haven't defined yet is how we are going to store the cached pre-built versions in the languages S3 bucket.

Originally, I thought of something like {os}/{language}/{version}.tar.gz. Example: ubuntu20/python/3.9.6.tar.gz. This is quite simple, but as a downside, we don't have a good way to QA a new build of the same version: pre-compiling 3.9.6 again with a new compilation flag, an updated pip package, or something else.

On the other hand, I was thinking of something like this: {os}/{language}/{date}/{version}.tar.gz. Example: ubuntu20/python/2021-09-02/3.9.6.tar.gz. Adding a {date} folder allows us to have a pre-compiled version multiple times and run some tests in production without breaking builds before releasing it publicly. However, it makes the builder logic for selecting the correct version more complex, since we will have some versions under 2021-09-02 and others under a different date.

We could probably start with the simplest solution and scale from there if it's needed.
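
For illustration, a minimal sketch of how the builder could resolve and fetch a cached version under the simpler layout; the key layout and the `languages` bucket name come from this discussion, while the AWS CLI usage and paths are assumptions:

```bash
# Hypothetical: build the cache key as {os}/{language}/{version}.tar.gz and
# download/extract it into asdf's installs directory at build time.
OS="ubuntu20"
LANGUAGE="python"
VERSION="3.9.6"
KEY="${OS}/${LANGUAGE}/${VERSION}.tar.gz"   # -> ubuntu20/python/3.9.6.tar.gz

aws s3 cp "s3://languages/${KEY}" "/tmp/${LANGUAGE}-${VERSION}.tar.gz"
mkdir -p "$HOME/.asdf/installs/${LANGUAGE}"
tar -xzf "/tmp/${LANGUAGE}-${VERSION}.tar.gz" -C "$HOME/.asdf/installs/${LANGUAGE}/"
```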

@stsewd (Member) commented Sep 2, 2021

However, the implementation of HTML upload depends on the new builders and that work is outside the scope of this "New Docker images for builds" project.

We are adding support for new languages along the way, not just updating to new Docker images; that's the main reason I brought up the idea of solving that by allowing users to upload HTML files instead of maintaining that code and those dependencies.

@agjohnson (Contributor) commented

One thing that we haven't defined yet is how we are going to store the cached pre-built versions in the languages S3 bucket.

We could always do something like building a 3.9.6-dev1 and selecting that in the config file for QA testing. I'd say avoiding dates is best; those would be completely unpredictable and would require tracking somewhere.

@agjohnson (Contributor) left a comment

I made a couple of notes, but it seems we're all in agreement about how to proceed. Some of the finer points of the caching storage and mechanism can be ironed out as we start testing. I think we had a few really good ideas about how to accomplish a cache, but S3 seems pretty easy to start with, at least.

@humitos (Member, Author) commented Sep 9, 2021

I'm going to merge these updates. There are some things that we will explore/update/refine while implementing the config file changes and the cache system, and probably in the QA phase as well. However, the general idea of the document is accurate, and we are happy to move forward. We can come back to this document and update it again if needed.

@humitos merged commit b68a921 into master on Sep 9, 2021
@humitos deleted the humitos/new-docker-images branch on September 9, 2021 09:25