Implement "repo-mode" to use the repo as source of truth for devcontainer/Dockerfile #218

Closed
Tracked by #132
mafredri opened this issue Jun 1, 2024 · 9 comments · Fixed by #290

@mafredri
Member

mafredri commented Jun 1, 2024

As part of #128, we want to implement "repo"-mode for envbuilder as an alternative to the current "filesystem"-mode.

Filesystem-mode works like this:

  • Check if repository is cloned
  • If no, clone the repository to the filesystem
  • If yes, skip clone
  • Read devcontainer/Dockerfile from filesystem

In that last step, the devcontainer or Dockerfile may be old or locally modified. It depends on what the user has done.
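The filesystem-mode steps above can be sketched roughly as follows. This is only an illustration, not envbuilder's actual code: `cloneFunc`, `ensureCloned`, and `demo` are hypothetical names, and treating "a `.git` directory exists" as "the repository is cloned" is an assumption made for the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// cloneFunc is a hypothetical stand-in for the real clone step.
type cloneFunc func(dest string) error

// ensureCloned clones into workspace only when no ".git" directory exists yet
// and reports whether a clone was performed. Using ".git" as the marker is an
// assumption of this sketch.
func ensureCloned(workspace string, clone cloneFunc) (bool, error) {
	_, err := os.Stat(filepath.Join(workspace, ".git"))
	if err == nil {
		return false, nil // already cloned: skip and keep whatever is on disk
	}
	if !errors.Is(err, os.ErrNotExist) {
		return false, err
	}
	if err := clone(workspace); err != nil {
		return false, fmt.Errorf("clone: %w", err)
	}
	return true, nil
}

// demo simulates two consecutive starts against the same persistent workspace,
// with a local edit in between, and returns what each start did.
func demo() (first, second bool, dockerfile string, err error) {
	workspace, err := os.MkdirTemp("", "workspace-")
	if err != nil {
		return false, false, "", err
	}
	defer os.RemoveAll(workspace)
	path := filepath.Join(workspace, "Dockerfile")

	// fakeClone writes a marker directory and a Dockerfile instead of using git.
	fakeClone := func(dest string) error {
		if err := os.MkdirAll(filepath.Join(dest, ".git"), 0o755); err != nil {
			return err
		}
		return os.WriteFile(filepath.Join(dest, "Dockerfile"), []byte("FROM upstream\n"), 0o644)
	}

	if first, err = ensureCloned(workspace, fakeClone); err != nil {
		return first, false, "", err
	}
	// The user modifies the Dockerfile locally between starts.
	if err = os.WriteFile(path, []byte("FROM local-edit\n"), 0o644); err != nil {
		return first, false, "", err
	}
	if second, err = ensureCloned(workspace, fakeClone); err != nil {
		return first, second, "", err
	}
	data, err := os.ReadFile(path)
	if err != nil {
		return first, second, "", err
	}
	return first, second, string(data), nil
}

func main() {
	first, second, dockerfile, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("first start cloned: %v\nsecond start cloned: %v\nDockerfile used: %q\n",
		first, second, dockerfile)
}
```

On the second start the clone is skipped, so the build reads the locally edited Dockerfile rather than the one in the remote repo.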

Repo-mode differs from filesystem-mode in that it will always clone the repo (to a temporary location) and use the devcontainer/Dockerfile from there. This way the resulting container is always in sync with the repo.

For now, we can keep this simple and always clone the repo. This feature should be implemented as a package that can be used by both envbuilder and a future envbuilder Terraform provider.

In the future, we can improve performance by only reading the relevant files from the repo.

@mtojek
Member

mtojek commented Jun 6, 2024

@mafredri We're having trouble researching the idea, so I will put this on hold until we have a good understanding.

Envbuilder already clones the repo with go-git today, and tags/branches are supported. Could you please elaborate on the ideal solution you had in mind?

Also, what kind of problem will this issue address? monorepo & sparse checkout?

@bpmct
Member

bpmct commented Jun 6, 2024

Just read the RFC again. Will the filesystem mode have the same level of caching, just via a different architecture or is repo mode more performant in some scenarios?

@johnstcn
Member

> Will the filesystem mode have the same level of caching

As I understand, it entirely depends on whether the layers are present in the build cache.

> is repo mode more performant in some scenarios?

The way it's described here, repo mode will always clone from the remote, so it may take slightly longer depending on the size of the repo to be cloned. The future performance improvement (only reading the relevant files directly from the repo) should reduce that overhead.

@bpmct
Member

bpmct commented Jun 13, 2024

Understood 👍🏼

I feel like I'd personally use local mode so I can iterate on a devcontainer without having to commit and set a branch parameter

@mafredri
Member Author

> Envbuilder already clones the repo with go-git today, and tags/branches are supported. Could you please elaborate on the ideal solution you had in mind?

@mtojek what you describe here is "filesystem"-mode. The repo is cloned onto a persistent volume, and on the next start the local files are used to run envbuilder. The repo isn't cloned or updated. This is the current behavior.

What we want with "repo"-mode is (in simple terms, disregarding optimizations):

  1. Always clone a fresh copy of the repo as defined by env variables
  2. Do not modify anything in the user's persistent files (if their checked-out repo is old, it should remain old; if it doesn't exist, we can copy over the repo from 1., etc.)
  3. (Break repo-mode logic out into public func/library so that it can be imported by future envbuilder Terraform provider)
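Steps 1 and 2 above could look roughly like the sketch below. It is illustrative only: `cloneFunc`, `copyTree`, `freshBuildContext`, and `demo` are hypothetical names rather than envbuilder APIs, the clone is faked with a file write, and the copy helper ignores symlinks and file modes.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// cloneFunc is a hypothetical stand-in for the real clone step.
type cloneFunc func(dest string) error

// copyTree copies regular files and directories from src to dst.
// Simplified for the sketch: symlinks and file modes are not preserved.
func copyTree(src, dst string) error {
	return filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(src, path)
		if err != nil {
			return err
		}
		target := filepath.Join(dst, rel)
		if d.IsDir() {
			return os.MkdirAll(target, 0o755)
		}
		data, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		return os.WriteFile(target, data, 0o644)
	})
}

// freshBuildContext always clones into a new directory under tmpRoot (step 1).
// An existing workspace is left untouched; a missing one is seeded from the
// fresh clone (step 2). It returns the directory to build from.
func freshBuildContext(tmpRoot, workspace string, clone cloneFunc) (string, error) {
	fresh, err := os.MkdirTemp(tmpRoot, "repo-")
	if err != nil {
		return "", err
	}
	if err := clone(fresh); err != nil {
		return "", fmt.Errorf("clone: %w", err)
	}
	_, err = os.Stat(workspace)
	if errors.Is(err, os.ErrNotExist) {
		err = copyTree(fresh, workspace)
	}
	if err != nil {
		return "", err
	}
	return fresh, nil
}

// demo exercises both cases: an outdated existing workspace and a missing one.
func demo() (existing, seeded, fresh string, err error) {
	root, err := os.MkdirTemp("", "repo-mode-")
	if err != nil {
		return
	}
	defer os.RemoveAll(root)

	fakeClone := func(dest string) error {
		return os.WriteFile(filepath.Join(dest, "Dockerfile"), []byte("FROM new\n"), 0o644)
	}
	read := func(dir string) string {
		data, _ := os.ReadFile(filepath.Join(dir, "Dockerfile"))
		return string(data)
	}

	old := filepath.Join(root, "old-workspace")
	if err = os.MkdirAll(old, 0o755); err != nil {
		return
	}
	if err = os.WriteFile(filepath.Join(old, "Dockerfile"), []byte("FROM old\n"), 0o644); err != nil {
		return
	}
	buildDir, err := freshBuildContext(root, old, fakeClone)
	if err != nil {
		return
	}
	existing, fresh = read(old), read(buildDir)

	missing := filepath.Join(root, "new-workspace")
	if _, err = freshBuildContext(root, missing, fakeClone); err != nil {
		return
	}
	seeded = read(missing)
	return
}

func main() {
	existing, seeded, fresh, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("existing workspace: %q\nseeded workspace: %q\nbuild context: %q\n",
		existing, seeded, fresh)
}
```

Passing the clone step in as a function keeps the logic independent of how the repo is fetched, which is one way such a package could be shared between envbuilder and a Terraform provider.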

> Also, what kind of problem will this issue address? monorepo & sparse checkout?

The issue we're addressing is startup performance (cache utilization) first and foremost. Essentially, one use case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo). By using "repo"-mode, the user can take advantage of the existing image even when their own local files are old and outdated.

The second issue we're addressing is building the functionality for use in a Terraform provider so that startup performance can be further improved without having to run envbuilder --get-cached-image first (this logic will be performed by the Terraform provider).

For this issue, we simply want a full clone of the repo. In the future, we can improve performance further by only checking out the necessary files (like @johnstcn mentioned).

@mtojek
Member

mtojek commented Jun 17, 2024

> The repo is cloned onto a persistent volume, and on the next start the local files are used to run envbuilder.

In case of "repo-mode" this would be /tmp or something similar?

> Essentially, one use case is that the repo owner can pre-generate an envbuilder image from the latest commit (of some git repo).

Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits because in both cases we have to check out the Git repo, and for large repos it will take some time unless we "snapshot" it in the image.

@mafredri
Member Author

> In case of "repo-mode" this would be /tmp or something similar?

Probably configurability is best, but /.envbuilder is a solid alternative to /tmp. OTOH /tmp is probably fine too.

> Does it mean we want to precache the repo snapshot in the image? I'm trying to see the benefits because in both cases we have to check out the Git repo, and for large repos it will take some time unless we "snapshot" it in the image.

No, the git repo is only for building, and for get cached image. We don't want it as part of the image.

@mtojek
Member

mtojek commented Jun 17, 2024

Ok, I think I got what you mean:

You want to pull the Git repo based on branch/commit/sha, get .devcontainer out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit sha? Should we tag the image built by kaniko specially?

> No, the git repo is only for building, and for get cached image. We don't want it as part of the image.

Right 👍 we don't want to keep redundant data in too many places.

@mafredri
Member Author

> Ok, I think I got what you mean:
>
> You want to pull the Git repo based on branch/commit/sha, get .devcontainer out of it, build the image, and push it to the container repository (CR). I understand that the CR hash will refer to the repo commit sha? Should we tag the image built by kaniko specially?

We can just rely on the image sha produced by Kaniko. The get cached image can deduce the correct sha by cloning the repo. If we tag the git commit sha there needs to be a new build/tag for every commit. Whereas with clone + get cached image it works without tagging. And if the files haven't changed, no additional images need to be pushed.
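To illustrate why an identifier derived from the build inputs avoids a new image per commit, here is a small sketch. It is not how Kaniko computes image digests; `buildInputsKey` is a hypothetical helper that just hashes file paths and contents, and the file names are examples.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// buildInputsKey hashes the given files (path -> content) in sorted path
// order. Hypothetical helper: it stands in for "an identifier that depends
// only on the build inputs", not for Kaniko's real digest computation.
func buildInputsKey(files map[string]string) string {
	paths := make([]string, 0, len(files))
	for p := range files {
		paths = append(paths, p)
	}
	sort.Strings(paths)

	h := sha256.New()
	for _, p := range paths {
		// Length prefixes keep path and content boundaries unambiguous.
		fmt.Fprintf(h, "%d:%s%d:%s", len(p), p, len(files[p]), files[p])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	// Two different commits whose build-relevant files are identical.
	commitA := map[string]string{
		".devcontainer/devcontainer.json": `{"build": {"dockerfile": "Dockerfile"}}`,
		".devcontainer/Dockerfile":        "FROM ubuntu\n",
	}
	commitB := map[string]string{
		".devcontainer/devcontainer.json": `{"build": {"dockerfile": "Dockerfile"}}`,
		".devcontainer/Dockerfile":        "FROM ubuntu\n",
	}
	// A later commit that actually changes the Dockerfile.
	commitC := map[string]string{
		".devcontainer/devcontainer.json": `{"build": {"dockerfile": "Dockerfile"}}`,
		".devcontainer/Dockerfile":        "FROM debian\n",
	}

	fmt.Println("A and B share a key:", buildInputsKey(commitA) == buildInputsKey(commitB))
	fmt.Println("A and C share a key:", buildInputsKey(commitA) == buildInputsKey(commitC))
}
```

With a commit-sha tag, commits A and B would each need their own build and tag even though nothing relevant changed; with an input-derived identifier, they map to the same image.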

5 participants