
Proof of concept: Cache Conda env and add ccache #45698


Closed
wants to merge 1 commit into from

Conversation

jonashaag
Contributor

Proof of concept to speed up GH actions builds by ~10 minutes.

  • Cache Conda envs across builds (cache is invalidated when deps change, and once per day)
  • Add ccache (cache is invalidated daily)

I wonder what everyone thinks about this and whether you think this is worth spending more effort on (e.g. also porting it to Azure).

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

@jbrockmendel
Member

I wonder what everyone thinks about this and whether you think this is worth spending more effort on (e.g. also porting it to Azure).

If this works and is robust it'll be great.

@jonashaag
Contributor Author

It should be pretty robust. For the Conda caching part, the key is to come up with a proper cache key, i.e. something that changes whenever the deps are changed. (Strictly speaking, it should be invalidated every time Anaconda/Conda-forge is updated, but that would invalidate the cache every few minutes.) For ccache, I don't think we'll have to take care of anything, as ccache is an incredibly robust and widely used tool.
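
To make the cache-key idea concrete, here is a minimal sketch of a key that changes when the dependency file changes and also rolls over once per day. It is only an illustration under stated assumptions: the function name `conda_cache_key`, the `conda-` prefix, and the sample environment text are hypothetical and not taken from this PR.

```python
import hashlib
from datetime import date, datetime, timezone
from typing import Optional


def conda_cache_key(env_file_text: str, today: Optional[date] = None) -> str:
    """Return a key that changes when the env file changes or a new UTC day starts.

    Hypothetical helper for illustration; not the key used in this PR.
    """
    if today is None:
        today = datetime.now(timezone.utc).date()
    # Hash the declared dependencies so that any edit to the file yields a new key.
    digest = hashlib.sha256(env_file_text.encode("utf-8")).hexdigest()[:16]
    # Including the date forces a refresh at least once per day.
    return f"conda-{today.isoformat()}-{digest}"


if __name__ == "__main__":
    env_a = "dependencies:\n  - numpy\n  - cython\n"           # made-up content
    env_b = "dependencies:\n  - numpy\n  - cython\n  - pytest\n"
    day = date(2022, 1, 30)
    print(conda_cache_key(env_a, day))
    print(conda_cache_key(env_a, day) == conda_cache_key(env_b, day))
```

The daily component is the compromise described above: the key does not track upstream package releases, so the date bounds how stale a cached environment can get.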

@phofl
Member

phofl commented Jan 30, 2022

How are we gaining 10 minutes? The env setup takes around 4 minutes.

@jonashaag
Contributor Author

By caching the C compilation with ccache; the compilation takes around 6-8 minutes.
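
As a rough illustration of why a compiler cache can skip most of that time: the general technique is to hash everything that determines the compiler's output (compiler identity, flags, preprocessed source) and reuse the stored object file when the hash has been seen before. The toy class below sketches that idea only; `ToyCompileCache` and `fake_compile` are hypothetical names, and this is not ccache's actual implementation.

```python
import hashlib
from typing import Callable, Dict, Sequence


def fake_compile(preprocessed_source: str, flags: Sequence[str]) -> bytes:
    """Hypothetical stand-in for invoking a real C compiler."""
    return f"object({' '.join(flags)}):{preprocessed_source}".encode("utf-8")


class ToyCompileCache:
    """Toy content-addressed compile cache (illustration only, not ccache)."""

    def __init__(self, compile_fn: Callable[[str, Sequence[str]], bytes]) -> None:
        self._compile_fn = compile_fn
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(compiler_id: str, flags: Sequence[str], preprocessed_source: str) -> str:
        # Every input that can change the output must feed into the hash.
        h = hashlib.sha256()
        for part in (compiler_id, "\0".join(flags), preprocessed_source):
            h.update(part.encode("utf-8"))
            h.update(b"\x00\x01")  # separator so adjacent parts cannot run together
        return h.hexdigest()

    def compile(self, compiler_id: str, flags: Sequence[str], preprocessed_source: str) -> bytes:
        key = self._key(compiler_id, flags, preprocessed_source)
        if key in self._store:
            self.hits += 1  # same inputs seen before: reuse the stored result
            return self._store[key]
        self.misses += 1  # new inputs: run the (slow) compiler and remember the result
        obj = self._compile_fn(preprocessed_source, flags)
        self._store[key] = obj
        return obj


if __name__ == "__main__":
    cache = ToyCompileCache(fake_compile)
    src = "int add(int a, int b) { return a + b; }"
    cache.compile("gcc-11", ["-O2"], src)
    cache.compile("gcc-11", ["-O2"], src)  # identical inputs, served from the cache
    cache.compile("gcc-11", ["-O3"], src)  # different flags, compiled again
    print(cache.hits, cache.misses)
```

In a CI setting most C files are unchanged between runs, so most compilations would be expected to fall into the cache-hit branch.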

@phofl
Member

phofl commented Jan 30, 2022

Does the compilation recognize that it is already there when you try recompiling it?

@jonashaag
Contributor Author

Can you please clarify the question?

@fangchenli
Member

For the Conda caching part, the key is to come up with a proper cache key, i.e. something that changes whenever the deps are changed.

Maybe something like conda env export > key.yml, and then hash this key.yml instead of the environment file.

@jonashaag
Contributor Author

Maybe something like conda env export > key.yml, and then hash this key.yml instead of the environment file.

Can you elaborate please? How would you export an env before it's been installed?

@fangchenli
Member

Maybe something like conda env export > key.yml, and then hash this key.yml instead of the environment file.

Can you elaborate please? How would you export an env before it's been installed?

Sorry, I was wrong... It should be something like this: conda env create -f environment.yml --dry-run > key.txt. Then we can hash this key.txt file.

@jonashaag
Contributor Author

Sorry, I was wrong... It should be something like this: conda env create -f environment.yml --dry-run > key.txt. Then we can hash this key.txt file.

I see, that makes more sense. But it would also invalidate the cache whenever there is a minor version update of any of the (transitive) dependencies, so the cache would likely not survive more than a couple of hours.
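
The trade-off can be sketched by comparing two hashing strategies: hashing the declared environment file versus hashing the fully resolved package list. This is a minimal sketch under assumptions: `hash_declared` and `hash_resolved` are hypothetical helpers, and the package names and versions are made up for illustration rather than taken from a real solve.

```python
import hashlib
from typing import Dict


def hash_declared(env_file_text: str) -> str:
    """Hash only what the project declares (the environment file)."""
    return hashlib.sha256(env_file_text.encode("utf-8")).hexdigest()


def hash_resolved(resolved: Dict[str, str]) -> str:
    """Hash the full resolved package set, including transitive dependencies."""
    lines = [f"{name}={version}" for name, version in sorted(resolved.items())]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    env_file = "dependencies:\n  - numpy\n"  # unchanged between the two solves
    # Made-up solver results: only a transitive dependency got a patch release.
    solve_morning = {"numpy": "1.22.1", "libexample": "0.3.0"}
    solve_evening = {"numpy": "1.22.1", "libexample": "0.3.1"}

    # The declared hash is stable because the file did not change.
    print(hash_declared(env_file) == hash_declared(env_file))
    # The resolved hash changes, so a cache keyed on it would be invalidated.
    print(hash_resolved(solve_morning) == hash_resolved(solve_evening))
```

A key built from the resolved set is more precise but churns with every upstream release; a key built from the declared file is stable but needs some other expiry mechanism, such as a daily rollover.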

@mroeschke
Member

Could you clarify which part of the setup this is caching? Package download and/or C compilation?

Does this skip the dependency solve step as well?

(Off-topic: I'd like to eventually see the CI use conda-lock files as I've been trying to align/share dependency files. Not sure how this intersects with this change. https://github.com/conda-incubator/conda-lock)

@jonashaag
Contributor Author

Could you clarify which part of the setup this is caching? Package download and/or C compilation?

This implements env caching and C compilation caching. Env caching = caching the entire envs/ Conda directory. C caching = Using ccache. I plan to make separate PRs for these things.

@jonashaag
Contributor Author

Re conda-lock: that definitely makes sense and also makes env setup faster.

@fangchenli
Member

Does this skip the dependency solve step as well?

(Off-topic: I'd like to eventually see the CI use conda-lock files as I've been trying to align/share dependency files. Not sure how this intersects with this change. https://github.com/conda-incubator/conda-lock)

We could set up a daily CI job to generate a lock file for each build.

@jreback
Contributor

jreback commented Feb 2, 2022

@jonashaag @fangchenli

see https://github.com/ibis-project/ibis/blob/master/poetry.lock

@cpcloud has spent a bit of time setting up these lock files with GitHub Actions on ibis and may be able to offer a good roadmap here.

@cpcloud
Member

cpcloud commented Feb 2, 2022

The way this works for ibis is that whenever someone submits a PR that modifies files related to dependencies, we trigger a PR comment /condalock which kicks off another action to actually generate the lock files and push a new commit to the corresponding PR.

poetry.lock (the output of poetry's solver) doesn't come into play when generating conda lock files, because conda-lock uses pyproject.toml to generate dependency constraints and then lets conda solve the constraint problem. If conda-lock were to use poetry.lock, most of the solves would fail because many packages that are available through poetry (which generally pulls from PyPI) aren't available on conda-forge.

This setup was adapted from https://github.com/pangeo-data/pangeo-docker-images/blob/master/.github/workflows/ChatOpsDispatcher.yml.
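
The first step of the flow described above, deciding whether a PR touches dependency-related files, could look roughly like the sketch below. It is an assumption-laden illustration: the thread does not say which files count as dependency-related, so `DEPENDENCY_PATTERNS` and `touches_dependencies` are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Sequence

# Hypothetical patterns; the thread does not list which files count as dependency-related.
DEPENDENCY_PATTERNS: Sequence[str] = (
    "pyproject.toml",
    "environment.yml",
    "ci/deps/*.yaml",
)


def touches_dependencies(
    changed_paths: Iterable[str],
    patterns: Sequence[str] = DEPENDENCY_PATTERNS,
) -> bool:
    """Return True if any changed path matches a dependency-file pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in patterns
    )


if __name__ == "__main__":
    # A PR that edits only library code would not need new lock files.
    print(touches_dependencies(["pandas/core/frame.py"]))
    # A PR that edits a dependency file would trigger lock-file regeneration.
    print(touches_dependencies(["pandas/core/frame.py", "environment.yml"]))
```

When the check is true, a workflow would go on to regenerate the lock files and push them to the PR branch; that part depends on the CI system and is not sketched here.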

@jonashaag
Contributor Author

I've done some work on ccache (GHA only so far) in #45701. Also, in #45902 I've swapped Conda for Mamba.

@jonashaag
Contributor Author

Caching Conda environments should cut another ~4 minutes (and more on Windows).

@jonashaag
Contributor Author

Closing this (splitting work into multiple PRs)

@jonashaag jonashaag closed this Feb 11, 2022

7 participants