Skip to content

Git backend: make default_branch to point to VCS' default branch #9424

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jan 25, 2023

Conversation

humitos
Copy link
Member

@humitos humitos commented Jul 11, 2022

Currently, importing a project manually without setting the default_branch,
will make Git VCS to use master as the default branch. However, with the last
swap about master and main branches, repositories defaulting to main
branch fail when being imported on Read the Docs. Leaving the user in a blocked
state they can get out.

This commit allows Version.identifier to be NULL meaning that "it won't try to
guess what's the default branch of the VCS" and just leave the repository in the
immediate state after being cloned (which is already the default branch). To do
this, I'm removing the command git checkout --force <branch> from the steps
executed for this case.

Closes #9367

Currently, importing a project manually without setting the `default_branch`,
will make Git VCS to use `master` as the default branch. However, with the last
swap about `master` and `main` branches, repositories defaulting to `main`
branch fail when being imported on Read the Docs. Leaving the user in a blocked
state they can get out.

This commit allows `Version.identifier` to be NULL meaning that "it won't try to
guess what's the default branch of the VCS" and just leave the repository in the
immediate state after being cloned (which is already the default branch). To do
this, I'm removing the command `git checkout --force <branch>` from the steps
executed for this case.

Closes #9367
@humitos humitos requested a review from a team as a code owner July 11, 2022 09:17
@humitos humitos requested a review from stsewd July 11, 2022 09:17
@humitos
Copy link
Member Author

humitos commented Jul 11, 2022

This works fine when importing and building. However, when a webhook is pushed, we need to do some work to add the mapping between the branch/tag updated and the Version on Read the Docs. Since we are not saving Version.identifier anymore.

@humitos
Copy link
Member Author

humitos commented Jul 11, 2022

GH webhook sends repository.default_branch: "main" and repository.master_branch: "main". We could use that to populate our Version.identifier the first time a webhook is pushed.

Or even, checking if ref == repository.default_branch, in that case, trigger a build for LATEST. I think that makes sense.

@benjaoming
Copy link
Contributor

Or even, checking if ref == repository.default_branch, in that case, trigger a build for LATEST. I think that makes sense.

As long as that's not going to become an infinite loop of sorts?

Copy link
Contributor

@benjaoming benjaoming left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like the way to go! Do we have test cases where no git identifiers are specified or Version.identifier isn't set?

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a reasonable approach. I think just doing the clone and then using the branc that we pull without trying to checkout a specific branch is a reasonable approach for version syncing as well.

@benjaoming benjaoming changed the title Git backend: make defaul_branch to point to VCS' default branch Git backend: make default_branch to point to VCS' default branch Jul 19, 2022
@benjaoming benjaoming marked this pull request as draft July 27, 2022 11:01
@benjaoming
Copy link
Contributor

Still remaining:

However, when a webhook is pushed, we need to do some work to add the mapping between the branch/tag updated and the Version on Read the Docs. Since we are not saving Version.identifier anymore.

Or even, checking if ref == repository.default_branch, in that case, trigger a build for LATEST. I think that makes sense.

Use `default_branch` coming from the Webhook itself to decide if a build for
Latest Version with `identifier=None` has to be triggered or not.

Note that this feature only works for GitHub and GitLab since Bitbucket does not
send the `default_branch` of the repository.
@agjohnson
Copy link
Contributor

Doing a little house cleaning for sprints: what is the status of this PR? Blocked on anything in particular?

@humitos
Copy link
Member Author

humitos commented Oct 25, 2022

@agjohnson we need to keep thinking how to relate the incoming webhook for the default version with the "latest" version itself

@benjaoming
Copy link
Contributor

benjaoming commented Nov 23, 2022

Noted for later - tried to test-run this by importing a new project without setting the default branch.

Migration issue needs to be fixed:

CommandError: Conflicting migrations detected; multiple leaf nodes in the migration graph: (0090_default_branch_helptext, 0093_migrate_null_fields in projects; 0045_alter_build_status, 0045_identifier_null in builds).
To fix them run 'python manage.py makemigrations --merge'

and (not sure if related to the missing migration), I encounter this on project import:

web_1       | │ /usr/src/app/checkouts/readthedocs.org/readthedocs/core/utils/__init__.py:62 │
web_1       | │ in prepare_build                                                             │
web_1       | │                                                                              │
web_1       | │    59 │                                                                      │
web_1       | │    60 │   if not version:                                                    │
web_1       | │    61 │   │   default_version = project.get_default_version()                │
web_1       | │ ❱  62 │   │   version = project.versions.get(slug=default_version)           │
web_1       | │    63 │                                                                      │
web_1       | │    64 │   build = Build.objects.create(                                      │
web_1       | │    65 │   │   project=project,                                               

(cut)

DoesNotExist: Version matching query does not exist.

The VCS's `default_branch` comes in the incoming webhooks that GitHub and GitLab
sends to us when the user realizes an action in the repository.

Note that in this commit BitBucket is not supported and will keep having the
issues this PR solves. To solve this problem, we would need to update the
`Version.identifier` from the builder immediately after clonning the default
branch (e.g. `git clone <URL>`). I put some commented code/ideas in place to
come back to this once we feel it's the right time.
@humitos
Copy link
Member Author

humitos commented Jan 19, 2023

I updated this branch with the latest changes in main.

After that, I worked on solving the problem for GitHub and GitLab since they provide the default_branch in the incoming webhooks. BitBucket does not, so the issue is not fixed for it in this commit.

I tried to be clear by writing comments in the code explaining the changes in meaning and logic behind this. It's tricky to follow and it's easy to get lost, even after you think you understand the things. So, I'm happy to jump into a call and do some pair review for this if you think it's gonna be easier.

@humitos humitos requested a review from ericholscher January 19, 2023 18:06
@humitos humitos removed the PR: work in progress Pull request is not ready for full review label Jan 19, 2023
@humitos
Copy link
Member Author

humitos commented Jan 19, 2023

I'm removing the command git checkout --force <branch> from the steps
executed for this case.

This is related to #9736 in some degree. We can continue following this path and remove this command completely by adding an extra argument to the git command in a new PR, if we consider.

@humitos humitos marked this pull request as ready for review January 19, 2023 18:13
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a great approach, and pretty straightforward! It makes a lot of sense to me.

Comment on lines +206 to +209
# The idea is to hit the APIv2 here to update the `latest` version with
# the `default_branch` we just got from the repository itself,
# after clonning it.
# However, we don't know the PK for the version we want to update.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could just add a get_default_branch method to the class here, and hit the API from the caller where we have the version?

self.vcs_repository.update()
identifier = self.data.build_commit or self.data.version.identifier
log.info("Checking out.", identifier=identifier)
self.vcs_repository.checkout(identifier)

Copy link
Member Author

@humitos humitos Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm... We could, yes. I'm not sure if it's better than just initializing this class with a read Version object instead of the version_slug only.

  • the clone() method's caller is the same class; from outside, the method called is update() --but I understand it's the same for the purpose of your comment
  • I do see hitting the API from the clone() method a little more organic, since the caller doesn't need to think about updating the default_branch each time after calling clone() --However, I don't see it bad for now.

There are other potential solutions as well:

  1. initialize the class with version_pk too
  2. initialize the class with a complete Version object
  3. use APIv3 (instead of APIv2) from the builder that supports getting a Version object without the pk: https://docs.readthedocs.io/en/stable/api/v3.html#version-detail

It probably makes sense to use APIv3 since it's part of the organic transition to remove APIv2 internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, thinking a little more about this... These ides will interfere with this other improvement #9736 where we will be cloning the exact branch directly, instead of the default_branch and then switching to the one wanted.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be part of another issue to investigate a little more, if you think it's not mandatory for this PR for now. It only affects Bitbucket currently.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to not include, but I just wanted to note that we have options here to get the data we want by thinking about getting it another way. 🤷

humitos and others added 2 commits January 23, 2023 11:12
Meh, I made a mistake. This commit should have gone to a different branch.

This reverts commit fc90d36.
@humitos
Copy link
Member Author

humitos commented Jan 23, 2023

I replied to all the feedback that I receive. There are some good ideas that may require some extra time investigating and decided how to proceed. I suggested to open new issues for those instead of blocking this PR again. Other than that, this is ready for a re-review.

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is probably good to go ahead with. I do think we should probably not try to deploy it this week since we already have other big build changes going out. I'd be 👍 shipping it next deploy tho.

People is able to pass `default_branch=main` so Read the Docs can decide whether
or not it has to trigger a build for `LATEST` in case its `identifier=None` (the
default branch of the repository)
All these tests are expecting `master` to be the default branch.
We have to force this now since it was automatically "guess" previously,
but now we are not guessing anymore.
I don't understand why we were checking for this in this test,
but I think it's not necessary since it's an old feature.
@humitos humitos requested a review from a team as a code owner January 24, 2023 09:15
@humitos
Copy link
Member Author

humitos commented Jan 24, 2023

I made some latest changes to make the tests accurate to the changes made on this PR. I also added support for default_branch on generic webhooks. It's not a breaking change, since if it's present, it's used. Otherwise, nothing changes.

Note that 90% of the QA was successful and it's working fine. However, there is something that's not perfect still.

When the latest version is not pointing to any identifier (project just imported without defining the default_branch -this is manually importing a repository with default_branch as None), if we hit the webhook we receive a response saying there was not build triggered, but a build for latest version was triggered:

$ curl -X POST devthedocs.org/api/v2/webhook/test-builds/1/ -H "Content-Type: application/json" --data @webhook.json
{"build_triggered":false,"project":"test-builds","versions":[]}

Being body.json the following:

{
    "created": false,
    "deleted": false,
    "forced": false,
    "default_branch": "main",
    "ref": "refs/heads/main"
}

I don't think this case is terrible and should not cause troubles, but it's not 100% correct, tho. I suppose there may be other edge cases like this one that we could continue researching and working on to fix them, but I don't think they are a huge priority now, considering that "Importing a repository manually and do not specify a default_branch" is a discouraged action. Also, this only happens the first time the webhook is triggered. The second time, it works as expected:

curl -X POST devthedocs.org/api/v2/webhook/test-builds/1/ -H "Content-Type: application/json" --data @webhook.json
{"build_triggered":true,"project":"test-builds","versions":["latest"]}

@ericholscher please, review these changes again and let me know if you think we are moving in the right direction. These changes are getting more complex that we all thought for sure 😩

Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we care about the generic build API, tbh. I think it still looks fine.

I removed because executing tests with `--nomigrations` made it fail.
@humitos humitos merged commit e3cf808 into main Jan 25, 2023
@humitos humitos deleted the humitos/vcs-default-branch branch January 25, 2023 08:51
@benjaoming
Copy link
Contributor

Great work @humitos! I haven't tried it out yet, but I'll probably be doing some related stuff with my own development environment, so will send back feedback if anything comes up.

considering that "Importing a repository manually and do not specify a default_branch" is a discouraged action.

I agree and it was also a proposed solution: ask the user to fill in the default branch, then verify that it exists as part of validating form input.

We could open up an issue on adding this validation? I don't think there is any contradiction to the current implementation.. we can still allow the field to be left blank and try to resolve the default branch automatically.

@humitos
Copy link
Member Author

humitos commented Jan 25, 2023

@benjaoming

We could open up an issue on adding this validation? I don't think there is any contradiction to the current implementation..

I'm not sure how to do this. We cannot verify the branch exists while the user is filling the form. We would need to clone the repository in the background before being able to validate the form. It seems complicated to me, at the moment.

we can still allow the field to be left blank and try to resolve the default branch automatically.

In theory, this should work with the implementation done in this PR. It should use whatever is the branch cloned after executing git clone <URL> --that would be the default_branch of the project.

@benjaoming
Copy link
Contributor

I'm not sure how to do this. We cannot verify the branch exists while the user is filling the form. We would need to clone the repository in the background before being able to validate the form. It seems complicated to me, at the moment.

I think there are others, such as sending HEAD requests to known HTTP endpoints or using git ls-remote http://user@server/GIT/${REPO}.git (https://stackoverflow.com/questions/43225291/check-if-branch-exists-in-git-repo-without-making-a-clone)

@humitos
Copy link
Member Author

humitos commented Jan 25, 2023

Oh, yeah. In fact, we are already using that and I've implemented it in our code 😅 , ha ha. Manually importing repositories is a discouraged action as we were saying and we may remove it in the future.

However, feel free to open a issue with the proposed feature/form validation for this and we can come back to it when needed 👍🏼

@benjaoming
Copy link
Contributor

Sounds good! I'll open an issue about validation of the form field for the default branch 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Manual project import: Fails to detect Github "main" branch
4 participants