Skip to content

Sync versions: filter by type on update #9019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 6, 2022
Merged

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Mar 15, 2022

If two versions share the same verbose name
(a branch named 2.0 and a tag named 2.0),
then we will be updating both,
instead of just one and changing its type from branch to tag.
Later when running this same code,
it will see the branch as new, since isn't included in our queryset that
filters by type=branch.

This wasn't happening before because this check was true

if version_id == old_versions[version_name]:

Since a branch would never change its identifier
(origin/name), but once lsremote was enabled,
the identifier didn't included the 'origin/' part.

So, when doing a full build the branches would be changed
to (origin/name) and when doing a syn they would be changed
to (name), and so on.

Ref #8992 (comment)

If two versions share the same verbose name
(a branch named 2.0 and a tag named 2.0),
then we will be updating both,
instead of just one and changing its type from branch to tag.
Later when running this same code,
it will see the branch as new, since isn't included in our queryset that
filters by type=branch.

This wasn't happening before because this check was true
https://github.com/readthedocs/readthedocs.org/blob/2e3f208c4906de649a9b5b9e33afee6cc20b86bc/readthedocs/api/v2/utils.py#L74-L74

Since a branch would never change its identifier
(origin/name), but once lsremote was enabled,
the identifier didn't included the 'origin/' part.

So, when doing a full build the branches would be changed
to (origin/name) and when doing a syn they would be changed
to (name), and so on.

Ref #8992 (comment)
@stsewd
Copy link
Member Author

stsewd commented Mar 15, 2022

We still need to fix the problem of lsremote and gitpython returning different identifiers (origin/name vs name)

Comment on lines 79 to 85
Version.objects.filter(
project=project,
verbose_name=version_name,
type=type,
).update(
identifier=version_id,
type=type,
machine=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to explain with a comment in code the situation described in the description of the PR.

@humitos
Copy link
Member

humitos commented Mar 16, 2022

If two versions share the same verbose name (a branch named 2.0 and a tag named 2.0),

Is this really possible? I do see that we have a unique_together = [project, slug] which does not includes type, so I don't think this is possible currently. See

unique_together = [('project', 'slug')]

@humitos
Copy link
Member

humitos commented Mar 16, 2022

Since a branch would never change its identifier (origin/name), but once lsremote was enabled, the identifier didn't included the 'origin/' part.

So, when doing a full build the branches would be changed to (origin/name) and when doing a syn they would be changed to (name), and so on.

It would be good to make gitpython and lsremote to return the exact same response for the same repository. It seems that the real fix, isn't it?

@stsewd
Copy link
Member Author

stsewd commented Mar 16, 2022

I do see that we have a unique_together = [project, slug] which does not includes type, so I don't think this is possible currently.

that's slug, here the duplicates are in the verbose_name

@stsewd
Copy link
Member Author

stsewd commented Mar 16, 2022

It seems that the real fix, isn't it?

That's together with this change, otherwise will happen the same if a tag is updated

@stsewd
Copy link
Member Author

stsewd commented Mar 16, 2022

So, about lsremote and gitpython returning the same values for branches, we could match the output of lsremote to the one from gitpython or the other way around.

I'm not really sure why we include the remote part origin, since we will always have only one remote. I only found this commit #4984 and 58b3412#diff-2fc9f70fc71269000cbbc336a139bcb62fe63f5121d299eb5e207d8aa593bad1 related to that.

What I see: git branch -a returns both, local and remote branches, so it was changed to git branch -r to list remote branches only (could also just be changed to git branch to return local branches only?). Git python's repo.branches returns local branches only too. From what I see, local and remote branches are the same for recently cloned repo.

And other thing, whatever we choose, it will hit this code for a lot of versions

Version.objects.filter(
project=project,
verbose_name=version_name,
# Always filter by type, a tag and a branch
# can share the same verbose_name.
type=type,
).update(
identifier=version_id,
machine=False,
)

Since the content will be changed from origin/foo to foo or the other way around. The lsremote flag was enabled like 2 weeks ago (I think) for all projects (but I think it was set to default_true=True before that). BUT hitting that code should be safe now with the change from this PR, and it won't have any side effects.

@stsewd
Copy link
Member Author

stsewd commented Mar 16, 2022

Okay, looks like repo.branches doesn't include all branches when the repo was shallow cloned. So, next question, which name should we keep? prefix origin/ to all branches from lsremote or remove origin/ from all branches from gitpython? Also, is kind of redundant to have the same information for branches, could be more useful to have the commit there, but that's another thing...

@humitos
Copy link
Member

humitos commented Mar 17, 2022

prefix origin/ to all branches from lsremote or remove origin/ from all branches from gitpython?

Are we sure the origin/ is not needed? If that's the case, I would say we can strip it. Is it possible that the default "remote" defined in the git repository is not called origin/? In that case, we will need it, and we will ask gitpyhon for it so we can append the correct one.

Also, I found this repository that has different removes on some of their branches:

▶ git ls-remote https://github.com/cilium/cilium.git | grep remotes
ac8c49bb0401c99e761dab2530023c550dea8052	refs/remotes/bruno/hf/v1.10/v1.10.3-bpf-snat-and-masq-fixes
3143aaeb85f9c1bcabfc6d24eb4ca1fb72532866	refs/remotes/joe/submit/quarantine-etcd
dc3bd17ef91331e4313bd5360eec5283801615c2	refs/remotes/origin/1.2-backports-18-09-12
e3539bfe15c944bb6d542a62b2467110b35429f4	refs/remotes/origin/ipvlan3
24329966c770827c7d09403bfa0e5f86c5b988cd	refs/remotes/origin/pr/add-reserved-health
fc80e6e931a6962a40132bb5da7ef1f36c9bd067	refs/remotes/origin/pr/brb/nodeport-lb
e8abe3293aea7ae8deee411b89d0fb2ec572e086	refs/remotes/origin/pr/ianvernon/5859
173e47aa252b3d9a1a2464506162f62b13b2bdc3	refs/remotes/origin/pr/ianvernon/dynamic-ep-cfg
746ad7b59805d9c5a954402112eb09a9d8b23514	refs/remotes/origin/pr/tgraf/kube-dns-fixed-identity

@stsewd
Copy link
Member Author

stsewd commented Mar 17, 2022

Is it possible that the default "remote" defined in the git repository is not called origin/?

from the docs

This default configuration is achieved by creating references to the remote branch heads under refs/remotes/origin and by initializing remote.origin.url and remote.origin.fetch configuration variables.

@stsewd
Copy link
Member Author

stsewd commented Mar 17, 2022

Also, I found this repository that has different removes on some of their branches:

Tested with gitpython, it doesn't list any other remotes, only origin, and those aren't included there.

@stsewd
Copy link
Member Author

stsewd commented Mar 29, 2022

Looks like on checkout we always add the origin part anyway.

valid, submodules = self.validate_submodules(config)

# Check if ref is a branch of the origin remote
if self.ref_exists('remotes/origin/' + ref):
return 'origin/' + ref

It's redundant as we always have only one remote (origin),
and we need to match the same behavior of lsremote.
@stsewd stsewd marked this pull request as ready for review March 29, 2022 22:37
@stsewd stsewd requested a review from a team as a code owner March 29, 2022 22:37
@stsewd stsewd self-assigned this Mar 31, 2022
Copy link
Member

@humitos humitos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Let's re-enable lsremote feature after merging/deploying this.

@stsewd stsewd merged commit 9a56069 into main Apr 6, 2022
@stsewd stsewd deleted the filter-by-type-on-update branch April 6, 2022 17:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

2 participants