Implement "repo-mode" to use the repo as source of truth for devcontainer/Dockerfile #218

mafredri · 2024-06-01T15:42:20Z

As part of #128, we want to implement "repo"-mode for envbuilder as an alternative to the current "filesystem"-mode.

Filesystem-mode works like this:

1. Check if repository is cloned
   - If no, clone the repository to the filesystem
   - If yes, skip clone
2. Read devcontainer/Dockerfile from filesystem

In that last step, the devcontainer or Dockerfile may be old or locally modified. It depends on what the user has done.
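A minimal sketch of the filesystem-mode flow above, to make the staleness concrete. It is not envbuilder's actual code: `cloneFunc`, `fakeClone`, `filesystemMode`, and `demo` are hypothetical names, the real clone (done with go-git) is replaced by a stub, and the `.git` check and the `.devcontainer/devcontainer.json` path are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// cloneFunc is a hypothetical stand-in for the real clone step.
type cloneFunc func(dest string) error

// fakeClone returns a cloneFunc that writes a minimal checkout whose
// devcontainer.json holds the given content.
func fakeClone(content string) cloneFunc {
	return func(dest string) error {
		if err := os.MkdirAll(filepath.Join(dest, ".git"), 0o755); err != nil {
			return err
		}
		dir := filepath.Join(dest, ".devcontainer")
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return err
		}
		return os.WriteFile(filepath.Join(dir, "devcontainer.json"), []byte(content), 0o644)
	}
}

// filesystemMode clones only when the workspace has no checkout yet, then
// reads devcontainer.json from the workspace, however old it may be.
func filesystemMode(workspace string, clone cloneFunc) (string, error) {
	_, err := os.Stat(filepath.Join(workspace, ".git"))
	switch {
	case errors.Is(err, os.ErrNotExist):
		if err := clone(workspace); err != nil {
			return "", fmt.Errorf("clone: %w", err)
		}
	case err != nil:
		return "", err
	}
	b, err := os.ReadFile(filepath.Join(workspace, ".devcontainer", "devcontainer.json"))
	return string(b), err
}

// demo starts the same workspace twice. The remote moves from "v1" to "v2"
// in between, but the second start skips the clone.
func demo() (first, second string, err error) {
	ws, err := os.MkdirTemp("", "workspace-")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(ws)
	if first, err = filesystemMode(ws, fakeClone("v1")); err != nil {
		return "", "", err
	}
	second, err = filesystemMode(ws, fakeClone("v2"))
	return first, second, err
}

func main() {
	first, second, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(first, second) // prints "v1 v1"
}
```

The second start still sees `v1` even though the remote has moved on, which is the stale-file situation described above.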

Repo-mode differs from filesystem-mode in that it will always clone the repo (to a temporary location) and use the devcontainer/Dockerfile from there. This way the resulting container is always in-sync with the repo.
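For contrast, a rough sketch of the repo-mode idea under the same assumptions: the clone is a stub, and `cloneFunc`, `fakeClone`, `repoMode`, and `demo` are hypothetical names rather than envbuilder's API. The only point it illustrates is that the config is always read from a fresh temporary clone.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// cloneFunc is a hypothetical stand-in for the real clone step.
type cloneFunc func(dest string) error

// fakeClone returns a cloneFunc that writes a devcontainer.json with the
// given content into dest.
func fakeClone(content string) cloneFunc {
	return func(dest string) error {
		dir := filepath.Join(dest, ".devcontainer")
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return err
		}
		return os.WriteFile(filepath.Join(dir, "devcontainer.json"), []byte(content), 0o644)
	}
}

// repoMode always clones into a fresh temporary directory and reads
// devcontainer.json from that clone, never from the user's workspace.
func repoMode(clone cloneFunc) (string, error) {
	tmp, err := os.MkdirTemp("", "repo-mode-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(tmp) // the clone is only needed while building

	if err := clone(tmp); err != nil {
		return "", fmt.Errorf("clone: %w", err)
	}
	b, err := os.ReadFile(filepath.Join(tmp, ".devcontainer", "devcontainer.json"))
	return string(b), err
}

// demo runs two starts while the remote moves from "v1" to "v2".
func demo() (first, second string, err error) {
	if first, err = repoMode(fakeClone("v1")); err != nil {
		return "", "", err
	}
	second, err = repoMode(fakeClone("v2"))
	return first, second, err
}

func main() {
	first, second, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(first, second) // prints "v1 v2"
}
```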

For now, we can keep this simple and always clone the repo. This feature should be implemented as a package that can be used by both envbuilder and a future envbuilder Terraform provider.

In the future, we can improve performance by only reading the relevant files from the repo.

mtojek · 2024-06-06T09:45:00Z

@mafredri We're having trouble researching the idea, so I will put this on hold until we have a good understanding.

Envbuilder already supports cloning the repo with go-git today, and tags/branches are supported as well. Could you please elaborate on the ideal solution you had in mind?

Also, what kind of problem will this issue address? monorepo & sparse checkout?

bpmct · 2024-06-06T14:15:56Z

Just read the RFC again. Will the filesystem mode have the same level of caching, just via a different architecture or is repo mode more performant in some scenarios?

johnstcn · 2024-06-12T09:56:15Z

> Will the filesystem mode have the same level of caching

As I understand, it entirely depends on whether the layers are present in the build cache.

> is repo mode more performant in some scenarios?

The way it's described here, repo mode will always clone from the remote. Therefore it may take slightly longer, depending on the size of the repo to be cloned. The future performance improvement (only reading the relevant files directly from the repo) should reduce that overhead.

bpmct · 2024-06-13T13:41:09Z

Understood 👍🏼

I feel like I'd personally use local mode so I can iterate on a devcontainer without having to commit and set a branch parameter

mafredri · 2024-06-17T09:25:51Z

> Envbuilder already supports cloning the repo with go-git today, and tags/branches are supported as well. Could you please elaborate on the ideal solution you had in mind?

@mtojek what you describe here is "filesystem"-mode. The repo is cloned onto a persistent volume, and on next start the local files are used to run envbuilder. The repo isn't cloned or updated. This is the current behavior.

What we want with "repo"-mode is (in simple terms, disregarding optimizations):

1. Always clone a fresh copy of the repo as defined by env variables
2. Do not modify anything in the user's persistent files (if their checked-out repo is old, it should remain old; if it doesn't exist, we can copy over the repo from 1., etc.)
3. (Break repo-mode logic out into a public func/library so that it can be imported by the future envbuilder Terraform provider)
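The first two points could look roughly like the sketch below. Everything in it is an illustrative assumption rather than the eventual envbuilder package: the clone is stubbed via the hypothetical `cloneFunc`, and `copyTree` is a simplified helper that ignores symlinks and file modes.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// cloneFunc is a hypothetical stand-in for the real clone step.
type cloneFunc func(dest string) error

func configFile(root string) string {
	return filepath.Join(root, ".devcontainer", "devcontainer.json")
}

// fakeClone writes a devcontainer.json with the given content into dest.
func fakeClone(content string) cloneFunc {
	return func(dest string) error {
		if err := os.MkdirAll(filepath.Dir(configFile(dest)), 0o755); err != nil {
			return err
		}
		return os.WriteFile(configFile(dest), []byte(content), 0o644)
	}
}

// copyTree copies regular files and directories from src to dst.
func copyTree(src, dst string) error {
	return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		target := filepath.Join(dst, rel)
		if d.IsDir() {
			return os.MkdirAll(target, 0o755)
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		return os.WriteFile(target, data, 0o644)
	})
}

// repoMode returns the config from a fresh clone. An existing workspace is
// left untouched; a missing one is seeded from the clone.
func repoMode(workspace string, clone cloneFunc) (string, error) {
	tmp, err := os.MkdirTemp("", "repo-mode-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(tmp)

	// Point 1: always clone a fresh copy.
	if err := clone(tmp); err != nil {
		return "", fmt.Errorf("clone: %w", err)
	}
	// Point 2: never modify an existing workspace, only seed a missing one.
	if _, err := os.Stat(workspace); errors.Is(err, os.ErrNotExist) {
		if err := copyTree(tmp, workspace); err != nil {
			return "", err
		}
	} else if err != nil {
		return "", err
	}
	b, err := os.ReadFile(configFile(tmp))
	return string(b), err
}

// demo runs two starts: the first seeds the workspace with "v1", the second
// builds from "v2" while the workspace stays at "v1".
func demo() (config, workspaceConfig string, err error) {
	parent, err := os.MkdirTemp("", "demo-")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(parent)
	ws := filepath.Join(parent, "workspace")
	if _, err = repoMode(ws, fakeClone("v1")); err != nil {
		return "", "", err
	}
	if config, err = repoMode(ws, fakeClone("v2")); err != nil {
		return "", "", err
	}
	b, err := os.ReadFile(configFile(ws))
	return config, string(b), err
}

func main() {
	config, workspaceConfig, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Println(config, workspaceConfig) // prints "v2 v1"
}
```

Taking the clone step as a parameter is one way to keep the logic importable from another program (point 3), though the real package boundary may well look different.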

> Also, what kind of problem will this issue address? monorepo & sparse checkout?

The issue we're addressing is startup performance (cache utilization) first and foremost. Essentially, one use-case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo). By using "repo"-mode, the user can take advantage of the existing image even when their own local files are old and outdated.

The second issue we're addressing is building the functionality for use in a Terraform provider so that startup performance can be further improved without having to run envbuilder --get-cached-image first (this logic will be performed by the Terraform provider).

For this issue, we simply want a full clone of the repo. In the future, we can improve performance further by only checking out the necessary files (like @johnstcn mentioned).

mtojek · 2024-06-17T10:02:11Z

> The repo is cloned onto a persistent volume, and on next start the local files are used to run envbuilder.

In case of "repo-mode" this would be /tmp or something similar?

> Essentially, one use-case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo).

Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits because in both cases we have to checkout the Git repo, and for large repos it will take some time unless we "snapshot" it in the image.

mafredri · 2024-06-17T10:38:37Z

> In case of "repo-mode" this would be /tmp or something similar?

Probably configurability is best, but /.envbuilder is a solid alternative for /tmp. OTOH /tmp is probably fine too.

> Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits because in both cases we have to checkout the Git repo, and for large repos it will take some time unless we "snapshot" it in the image.

No, the git repo is only for building, and for get cached image. We don't want it as part of the image.

mtojek · 2024-06-17T11:17:05Z

Ok, I think I got what you mean:

You want to pull the Git repo based on branch/commit/sha, get .devcontainer out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit sha? Should we tag the image built by kaniko specially?

> No, the git repo is only for building, and for get cached image. We don't want it as part of the image.

Right 👍 we don't want to keep redundant data in too many places.

mafredri · 2024-06-17T13:56:16Z

> Ok, I think I got what you mean:
>
> You want to pull the Git repo based on branch/commit/sha, get .devcontainer out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit sha? Should we tag the image built by kaniko specially?

We can just rely on the image sha produced by Kaniko. The get cached image step can deduce the correct sha by cloning the repo. If we tag with the git commit sha, there needs to be a new build/tag for every commit, whereas with clone + get cached image it works without tagging. And if the files haven't changed, no additional images need to be pushed.
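To illustrate why keying on content rather than on the commit avoids extra builds, here is a toy sketch. It is not how Kaniko computes its cache keys or image digests; `buildInputsDigest` and `sameImage` are hypothetical helpers that hash whatever files feed the build.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// buildInputsDigest hashes the paths and contents of the build inputs in a
// stable order. Hypothetical stand-in for a content-derived image key.
func buildInputsDigest(files map[string]string) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		// Length prefixes keep path/content boundaries unambiguous.
		fmt.Fprintf(h, "%d:%s%d:%s", len(p), p, len(files[p]), files[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

// sameImage reports whether two sets of build inputs map to the same key,
// i.e. whether an already pushed image could be reused.
func sameImage(a, b map[string]string) bool {
	return buildInputsDigest(a) == buildInputsDigest(b)
}

func main() {
	// Two different commits whose build inputs are identical.
	commitA := map[string]string{"Dockerfile": "FROM alpine"}
	commitB := map[string]string{"Dockerfile": "FROM alpine"}
	// A later commit that actually changes the Dockerfile.
	commitC := map[string]string{"Dockerfile": "FROM ubuntu"}

	fmt.Println(sameImage(commitA, commitB)) // prints "true"
	fmt.Println(sameImage(commitA, commitC)) // prints "false"
}
```

Tagging by commit sha would treat `commitA` and `commitB` as two separate images, while a content-derived key lets the second commit reuse the first image.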

coder-labeler bot added the enhancement label Jun 1, 2024

mafredri mentioned this issue Jun 1, 2024

Write RFC around performance/caching improvements #128

Closed

mtojek mentioned this issue Jun 3, 2024

Envbuilder v1.0 release #132

Closed

36 tasks

mafredri mentioned this issue Jul 15, 2024

Implement Terraform provider that uses/imports "repo mode" and "get cached image" functionality coder/internal#12

Closed

5 tasks

mtojek assigned mafredri Jul 19, 2024

mafredri added a commit that referenced this issue Jul 30, 2024

feat: implement repo-mode

022d9d4

Fixes #218

mafredri mentioned this issue Jul 30, 2024

feat: implement repo-mode #290

Merged

mafredri added a commit that referenced this issue Jul 30, 2024

feat: implement repo-mode

8f3606b

Fixes #218

mafredri added a commit that referenced this issue Jul 30, 2024

feat: implement repo-mode

0acb5f6

Fixes #218

mafredri added a commit that referenced this issue Jul 30, 2024

feat: implement repo-mode

5d4e5ca

Fixes #218

mafredri added a commit that referenced this issue Aug 1, 2024

feat: implement repo-mode

6a0c240

Fixes #218

mafredri closed this as completed in #290 Aug 1, 2024

matifali removed the enhancement label Oct 15, 2024
