Build: cloning multibranch repository takes too long #9736
Comments
@humitos I'm guessing we're doing the latest branches because we're reusing this code to sync versions. Does this change impact that functionality at all?
@ericholscher it should not affect the sync versions code since we use readthedocs.org/readthedocs/projects/tasks/mixins.py Lines 64 to 68 in e600fd1
It's currently under a feature flag, but it's enabled for all the projects (and it's targeted to be deleted 😄 to make it the default behavior).
Yea, that seems like step 1 then, then I'm 👍 on this approach. Probably deploy it behind a feature flag as well tho, just in case it breaks stuff for a small subset of projects somehow.
I enabled the feature flag 7 days ago in .com and things keep looking great. So, I think we are good to move forward with this implementation when we have the time.
When working on this issue, consider the conversation we have in #9424 (comment) regarding how to get the
Here's a quick sketch of what I'm gonna try to do...
This is slightly more complicated, once we start looking at PR branches and tags.
PR branches
These branches are named by GitHub for instance So we skip these when cloning. There's a funny PR branches have a different history from the repository's default HEAD, so that's why AFAICT we always run (we're not talking about the source branch of a PR, which can exist in an entirely different repo)
Tags
If we want to reference a tag from the Git history, then we need a Git history that contains that tag. From my experiments, Current implementation seems to rely on tag references available after There are 3 good options here:
Here's an example of what needs to be compensated for if we remove
With
Simplifying things
We can use a more generic
Circle CI has a different approach:
This seems like a simpler way of doing things, so we don't have different
Proposal
@humitos had this comment that I think is important to align on:
We can have almost the same clone command for branches, tags and PRs. Branches and tags can rely on the
@benjaoming I think the approach that CircleCI uses seems promising. I don't fully understand it, but I think it makes sense to me to do something like:
It seems that will keep the builds the same for PR branches & merged branches, but tags being special seems like the main issue here, and I think I'm OK with special-casing that logic.
Note that After reading the comments here, I think we should:
The previous approach in Git commands would be:
A real example using
We could use
@humitos Awesome! That's very similar to what I've been wanting, too. Re: the The So if we go in this direction, we have to make sure that we are doing
Having a simple approach where branches, tags and PRs are all cloned, fetched and checked out with the same commands will make it a lot easier to tweak more things 💯 Changes proposed here are still very contained: AFAICT, everything runs through
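To make the "same commands for branches, tags and PRs" idea concrete, here is a minimal sketch, not the actual Read the Docs implementation. It assumes the remote is called origin, that pull requests are exposed under GitHub's refs/pull/<number>/head, and it uses hypothetical helper names (build_fetch_refspec, build_fetch_command) and a hypothetical local ref name (pr-<number>). It only builds the argument list, it does not run Git.

```python
from typing import List


def build_fetch_refspec(version_type: str, identifier: str) -> str:
    """Map a version to a Git refspec (hypothetical helper, for illustration)."""
    if version_type == "branch":
        return f"refs/heads/{identifier}:refs/remotes/origin/{identifier}"
    if version_type == "tag":
        return f"refs/tags/{identifier}:refs/tags/{identifier}"
    if version_type == "pull_request":
        # GitHub publishes the head of each PR at refs/pull/<number>/head.
        # The local name "pr-<number>" is an assumption made for this sketch.
        return f"refs/pull/{identifier}/head:refs/remotes/origin/pr-{identifier}"
    raise ValueError(f"unknown version type: {version_type!r}")


def build_fetch_command(version_type: str, identifier: str, depth: int = 50) -> List[str]:
    """Return the argv for one shallow fetch of exactly one ref."""
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return [
        "git", "fetch",
        "--force",               # allow non-fast-forward updates of the local ref
        "--depth", str(depth),   # limit the history that is downloaded
        "origin",
        build_fetch_refspec(version_type, identifier),
    ]
```

The only thing that differs between the three cases is the refspec, so the surrounding fetch and checkout steps could stay identical. The returned list can be handed to subprocess.run(cmd, check=True) inside an existing repository.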
cpython-previews before: 85 seconds
ccxt before: 506 seconds
I just noticed that there are some projects where it takes too long to clone the repository, ~500 seconds, but the build time is just a few seconds. I found this scenario in the `ccxt` project: https://readthedocs.org/projects/ccxt/builds/18648371/ [1]
We've done some improvements in the past by adding `--depth 50` so we don't bring the whole history. However, we are using `--no-single-branch`, which will bring the last 50 commits of all the branches in the repository. That's what I understand from the manpage:
Since we always build from one and only one branch, we could probably remove `--no-single-branch` and only fetch the latest 50 commits for the branch we are actually going to build.
Example
Cloning with depth 1 and single branch takes 5 seconds.
Cloning with depth 1 but using `--no-single-branch` takes 1 minute 37 seconds, which is almost 20x slower.
Proposal
--no-single-branch
--depth 50
--branch <branch>
git checkout --force <branch>
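As a rough illustration of the clone described above (shallow, one branch only, no `--no-single-branch`), the following sketch builds the argument list for such a clone. The function name build_clone_command and its parameters are hypothetical and not part of the Read the Docs codebase. `--depth` already implies `--single-branch` in Git, so the flag is spelled out only for readability.

```python
from typing import List


def build_clone_command(url: str, branch: str, target_dir: str, depth: int = 50) -> List[str]:
    """Return the argv for a shallow clone limited to a single branch.

    Hypothetical helper for illustration: it only builds the command,
    it does not execute Git.
    """
    if depth < 1:
        raise ValueError("depth must be a positive integer")
    return [
        "git", "clone",
        "--depth", str(depth),   # only the most recent commits
        "--single-branch",       # implied by --depth; explicit for clarity
        "--branch", branch,      # clone and check out this branch directly
        url,
        target_dir,
    ]
```

Because `--branch` makes the clone end up on the requested branch, no separate checkout step is needed afterwards. The list can be passed to subprocess.run(cmd, check=True).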
This way, we get the best outcome by downloading only the exact data needed to perform the build using one branch. Since this is the most general use case [2], we will be saving a lot of bandwidth here and also improving the UX for our users:
After executing this command, the repository will already be on the `parallel-build` branch, ready to build the documentation.
Footnotes
[1] I noticed this on Metabase because this project showed a high build time compared to others.
[2] Users wanting to do something different can always make use of `build.jobs` or `build.commands` in case they need more branches in their repository.