
RUN command is not supported in cache probe mode #68


Open
MartinLoeper opened this issue Nov 27, 2024 · 10 comments
Labels
help wanted Extra attention is needed question Further information is requested

Comments

@MartinLoeper

I am using the envbuilder terraform provider to cache my workspace image in coder.
Could somebody help me understand why I always see the following error which results in a cache miss?

Failed to find cached image in repository "myrepo.org/coder-ssd-local-storage-nvme-test/coder-494b6612-e6be-42e1-b653-969164b30b53". It will be rebuilt in the next apply. Error: get cached image: error probing build cache: uncached RUN command is not supported in cache probe mode

Are there devcontainer elements that cannot be cached such as devcontainer features?
Or what does "uncached RUN command is not supported in cache probe mode" mean?

@coder-labeler coder-labeler bot added help wanted Extra attention is needed question Further information is requested labels Nov 27, 2024
@johnstcn
Member

johnstcn commented Nov 28, 2024

Hi @MartinLoeper, can you provide the versions of both the envbuilder image and the terraform provider you're using?

@MartinLoeper
Author

Hi @johnstcn, I used envbuilder ghcr.io/coder/envbuilder-preview:1.0.5-dev-e64f857 and terraform provider coder/envbuilder v1.0.0. If I can provide anything else that would help you look into this issue, please tell me. Thanks =)

@johnstcn johnstcn self-assigned this Dec 9, 2024
@johnstcn
Member

johnstcn commented Dec 9, 2024

Hey @MartinLoeper, would you be able to provide a minimal Dockerfile or devcontainer.json that reproduces the issue for you? Our integration tests pass with that specific combination of envbuilder + provider, which suggests to me that there's an underlying issue that we haven't managed to capture in our tests yet.

@MartinLoeper
Author

Hey @johnstcn, Thanks for looking into it. Here is my sample devcontainer repo which I open in Coder via envbuilder: https://github.com/MartinLoeper/devcontainer-test/blob/main/.devcontainer/devcontainer.json

@johnstcn
Member

Hey Martin, just wanted to give you an update. I've narrowed it down to a couple of the devcontainer features busting the cache, but I need to spend some more time to figure out exactly which feature and/or which parts of the feature.

My notes so far: https://gist.github.com/johnstcn/385700a755b10028844609af28014a6e

@MartinLoeper
Author

Wow, that is a very detailed analysis! Thanks @johnstcn

I see that docker-outside-of-docker is included in my test repo. I think this feature is not really required when running in coder since I am using a remote builder.

It is definitely required when developing locally outside of coder since the developers are using their local docker socket in that case.
Is docker-outside-of-docker currently incompatible with the envbuilder cache? In that case I would remove it from my test repo. The repo would be "coder-only" in the meantime until that issue is fixed.

Appreciate your effort @johnstcn since I know how difficult it is to analyse things like that and I would definitely not have the skills to do that so well.

@johnstcn
Member

I don't think it's necessarily related to that particular feature @MartinLoeper -- I was also able to replicate it with the awscli feature.

We have a theory that it may be related to how the Terraform provider runs the cache probe. Normally we have to "lie" about the paths of some files in order to ensure that the cache probe works, and we may not be lying correctly about the paths of feature install scripts :-)

The next step would be to validate this theory with an integration test in the provider that builds an image with a "no-op" feature.

@MartinLoeper
Author

Ah I understand. If there is something I can help you with, e.g. testing something on my cluster, feel free to reach out.

@johnstcn
Member

TL;DR features are not cacheable at the moment.

When envbuilder builds devcontainer features, they are extracted to a temporary folder and executed.

Example:

USER root
WORKDIR /.envbuilder/features/test-feature-a019c204
FOO=bar PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin WORKDIR=/ RUN MESSAGE="hello world" _CONTAINER_USER="root" _REMOTE_USER="root" ./install.sh 

When performing a cache probe (ENVBUILDER_GET_CACHED_IMAGE=1, or using the provider), we do not have access to the same path, so we end up extracting the feature under /tmp instead, which changes the WORKDIR seen by the probe.

Example (not using provider, but using a test that replicates the behaviour):

integration_test.go:3011: [cache-probe] info: #2: WORKDIR /tmp/TestPushImage2464303247/001/.envbuilder/features/test-feature-a019c204
integration_test.go:3011: [cache-probe] info: #2: RUN MESSAGE="hello world" _CONTAINER_USER="root" _REMOTE_USER="root" ./install.sh
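To illustrate why the differing WORKDIR matters, here is a minimal Go sketch. It assumes, purely for illustration, that a layer's cache key is a hash over the instructions leading up to it. The `layerKey` function is hypothetical and is not the algorithm envbuilder or its underlying builder actually uses; real cache keys mix in more inputs. The point is only that any key derived from the instruction text will differ once the WORKDIR path differs, even though the RUN line is identical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// layerKey is a hypothetical stand-in for a layer cache key: it hashes the
// instructions up to and including the layer. This is an assumption made for
// illustration, not the real cache key computation.
func layerKey(instructions []string) string {
	sum := sha256.Sum256([]byte(strings.Join(instructions, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Same RUN instruction in both cases.
	run := `RUN MESSAGE="hello world" _CONTAINER_USER="root" _REMOTE_USER="root" ./install.sh`

	// Instructions as seen by the real build.
	build := []string{
		"USER root",
		"WORKDIR /.envbuilder/features/test-feature-a019c204",
		run,
	}
	// Instructions as seen by the cache probe (feature extracted under /tmp).
	probe := []string{
		"USER root",
		"WORKDIR /tmp/TestPushImage2464303247/001/.envbuilder/features/test-feature-a019c204",
		run,
	}

	fmt.Println("build key:", layerKey(build)[:12])
	fmt.Println("probe key:", layerKey(probe)[:12])
	fmt.Println("cache hit:", layerKey(build) == layerKey(probe)) // prints "cache hit: false"
}
```

Under this assumption the probe never finds the layer that the build pushed, which matches the "uncached RUN command" error reported at the top of the thread.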

Possible fix:

Use a chroot-like filesystem abstraction to run the cache probe, where all reads and writes are redirected to a temporary directory. For example, reading /.envbuilder/bin/envbuilder would translate to /tmp/TestPushImage2464303247/001/.envbuilder/bin/envbuilder.
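The path translation described above could be sketched as follows. This is a minimal illustration, not envbuilder code: `rootedPath` is a hypothetical helper, and a real filesystem abstraction would also have to wrap the open, stat, and write calls and deal with symlinks.

```go
package main

import (
	"fmt"
	"path"
)

// rootedPath maps an absolute path, as seen by the build, onto a location
// under root. Cleaning "/"+p first collapses any ".." segments, so the
// result cannot point outside root. Hypothetical helper for illustration.
func rootedPath(root, p string) string {
	return path.Join(root, path.Clean("/"+p))
}

func main() {
	root := "/tmp/TestPushImage2464303247/001"

	// The probe asks for the in-image path and is redirected into root.
	fmt.Println(rootedPath(root, "/.envbuilder/bin/envbuilder"))
	// prints "/tmp/TestPushImage2464303247/001/.envbuilder/bin/envbuilder"

	// Attempts to climb out of root are clamped to root.
	fmt.Println(rootedPath(root, "/../../etc/passwd"))
	// prints "/tmp/TestPushImage2464303247/001/etc/passwd"
}
```

With a translation layer like this, the probe could keep using WORKDIR /.envbuilder/features/... in its instructions while the files physically live under the temporary directory.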

We already use a somewhat 'hacky' workaround of this kind as part of the cache probe: before running the probe, we extract the envbuilder binary from the builder image and place it in the build context at the expected location (as we embed the envbuilder binary inside the built image).

@andrewreid

andrewreid commented Mar 24, 2025

Checking in to see what the status of this issue is, as I've run into it myself today, with one (but not all) of my features. My devcontainer.json is building (and caching) some, but not all the features – it's particularly getting stuck with the NVIDIA-CUDA feature.

You can see in the logs, it'll build and cache some features (or in this case, re-use the already-built layers from the cache), but with the NVIDIA-CUDA feature, it finishes the installation phase for the feature, announces "#3: Taking snapshot of files..." and then, it would seem, never gets as far as pushing the layer to the container registry like it has for the other features.

It then downloads the git repo again, pulls the other feature layers from the cache, and gets to the NVIDIA-CUDA feature, which isn't in the cache, so it sets about building it again... only to reach "#3: Taking snapshot of files...", not push the layer, and do the whole thing again. Over and over!

I'm struggling to find anything that indicates an error has occurred. Is this related to this existing issue, or do you think it's a separate problem that I'm inducing?
