Re-connect a Project to a RemoteRepository #8229
We also need to define what the UI for this feature would look like. I took a look at the Admin tabs and I didn't find a great place to put a button for this. So, this may be a good point for the new templates as well.

Note that for corporate we will have more URL options here because they could use
This issue is related to #8229 as well.
Don't know if this is related to the issue, but the remote repo gets out of sync when the repo URL is changed. Happened on https://readthedocs.org/dashboard/dev/

That repo was first, but then we changed it to https://github.com/readthedocs/readthedocs.org/. But even weirder, the build status was being sent correctly a couple of weeks ago.
@ericholscher I think I wanted to link to #8715 , but I had a different issue in my clipboard 😓
I'd like to prioritize this if possible and find a proper solution for this case. I've been bitten by this multiple times, and I'm jumping into the Django shell and connecting the
@humitos added this to Q3, so it doesn't get lost 👍 I agree this is important, and we can probably add it to an upcoming sprint.
OK, I'm back to this issue, and some things have already changed or been fixed. I will write down a small proposal so we can move forward with minor UI/UX changes. We could create a better UI with the new templates. My proposal is as follows:
```python
import requests


def get_final_url(project):
    final_url = None
    url = project.repo
    if '@' in url:
        # SSH-style URLs can't be followed over HTTP; convert to HTTPS first.
        url = convert_to_https(url)
    if url.startswith('http'):
        try:
            # Follow redirects (e.g. renamed/moved repositories)
            # to find the canonical URL.
            response = requests.head(url, allow_redirects=True)
            final_url = response.url
        except requests.exceptions.TooManyRedirects:
            pass
    return final_url


def reconnect_remote_repository(project, user):
    final_url = get_final_url(project)
    if final_url:
        remote_repository = RemoteRepository.objects.filter(
            http_url=final_url,
            remote_repository_relations__user=user,
            remote_repository_relations__admin=True,
        ).first()
        if remote_repository:
            project.remote_repository = remote_repository
            project.save()
```
This makes sense to me. It seems like there should be a better way to store and query this information, so we don't need to translate the format in such weird ways, but that's a future-looking improvement.
I thought a little more about this (while working on the PR: #9381) and I realized that we may be able to perform this action automatically instead of asking the users to do it. I mean, if we are going to allow only one way to connect the project and it will always match the repository URL, we can do it ourselves without user interaction, right? What are the cases where doing this automatically does not make sense?
It seems we could iterate over the
We could update the
Not a terrible idea, but I do worry about overwriting the user's data. Perhaps having another field like
I like this idea. The more we can do to stop things getting out of sync automatically, the better in my mind.
I created a script to test the automated way and I think it could work fine: https://gist.github.com/humitos/414c8ff7a4c16776d23e108aef184ce8 These are some of the projects that it will connect (output from the script):
All of them look accurate to me. Since our

This solves the problem of re-connecting all the Projects to RemoteRepository automatically. However, this does not solve the problem of a "manually imported project that a user wants to connect later". This brings me to the question: do we want to expose this to users? Why? Why should they be aware of this connection? What's the UX we want to build around this? Note that most of our projects do not have their

```python
# Projects with remote repository connected
In [17]: (
    ...: Project.objects.annotate(spam_score=Sum("spam_rules__value"))
    ...: .filter(
    ...:     spam_score__lt=settings.RTD_SPAM_THRESHOLD_DENY_ON_ROBOTS,
    ...:     remote_repository__isnull=False,
    ...: )
    ...: .count()
    ...: )
Out[17]: 4834

# Projects without remote repository
In [18]: (
    ...: Project.objects.annotate(spam_score=Sum("spam_rules__value"))
    ...: .filter(
    ...:     spam_score__lt=settings.RTD_SPAM_THRESHOLD_DENY_ON_ROBOTS,
    ...:     remote_repository__isnull=True,
    ...: )
    ...: .count()
    ...: )
Out[18]: 117026
```
I think we likely want to resync all users & re-connect the RemoteRepo. If we can keep this data in mostly working order, it gives us a lot of interesting paths forward, and it is also a prerequisite for doing widescale SSO on .org
In general we should not expose this to the user. It's an implementation detail for sure. If we need to expose something like this to the user, it should almost certainly be "You need to connect your GitHub repository so that we can perform this operation", with the RemoteRepo creation happening automatically if it's already connected.
I understand you wanted to say "You need to connect your GitHub account so that we can perform this operation". If that's the case, I agree.
Cool! I will start by re-syncing all the users' accounts first. My idea is to spin up a new
Once we have all this data re-synced, I will come back to my script that shows what the connections are going to be, review it again, and run it after some QA.
Sounds good. Excited to be moving forward on this! 🚀
Now that we have all the user accounts synced, I updated the script and did another run as a test to show which "Project <-> RemoteRepository" links will be created automatically. Here are the results for 500 projects: https://onenr.io/0LwGL0yBvR6. Note that of the 500 repositories checked, we found matches for only 37. I may run a larger sample of projects, but so far it seems accurate and safe to me.
This script won't solve the problem of auto-connecting them in these situations:
Probably these are not all the situations and there are more, but at least these are a good start for talking and thinking about the approach we want to follow for auto-connect. I'm not sure we will be able to cover all the possible cases (unless we completely disable "Manual Import"), but I'd be happy to reduce the "unlinking" as much as we can.
Agreed. At least getting 80% of the projects in the easy cases is a huge win. The edge cases will always be weird/broken in some way. |
Trigger a daily task to compare the user's last login isoweekday with today's isoweekday for all active users. If they match, we resync the `RemoteRepository` for this user. This logic is the same as "resync `RemoteRepository` once a week per user". We consider active users those that have logged in at least once in the last 90 days. Related: #8229 Related: #9409
* OAuth: resync `RemoteRepository` weekly for active users. Trigger a daily task to compare the user's last login isoweekday with today's isoweekday for all active users. If they match, we resync the `RemoteRepository` for this user. This logic is the same as "resync `RemoteRepository` once a week per user". We consider active users those that have logged in at least once in the last 90 days. Related: #8229 Related: #9409
* Crontab schedule. Co-authored-by: Eric Holscher <[email protected]>
* OAuth: re-sync `RemoteRepository` in 1 Celery process only. Weekly resync will use only one Celery process to avoid backing up our queue. Co-authored-by: Eric Holscher <[email protected]>
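The weekly-resync scheduling described above can be sketched as follows. This is a simplified illustration, not the actual Read the Docs Celery task: the function name, the `users` argument, and the plain-object `last_login` attribute are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def users_to_resync_today(users, today=None):
    # Daily task logic: pick the active users whose last_login weekday
    # matches today's weekday, so each user gets resynced once a week.
    # "Active" means logged in within the last 90 days.
    today = today or datetime.utcnow()
    ninety_days_ago = today - timedelta(days=90)
    return [
        user for user in users
        if user.last_login >= ninety_days_ago
        and user.last_login.isoweekday() == today.isoweekday()
    ]
```

Running this once per day resyncs each active user exactly once per week, spreading the load evenly over seven days instead of resyncing everyone at once.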
The script has finished on community and we ended up with

On commercial, I triggered a dry run of the script and we have around ~250 projects that we can auto-reconnect. So, I think it's worth doing it there as well (see the progress in https://onenr.io/0oR809B9XRG). I think the only missing piece of this task is handling the situations mentioned in #8229 (comment), in an attempt to auto-connect them on manual import or account reconnect when finding an exact match.
Marking this issue as done. I created #9437 to track the missing parts of it. Let me know if you have something else in mind that I should check/do, though.
With the work done in Read the Docs for Business about VCS Auth (see https://docs.readthedocs.io/en/stable/commercial/single-sign-on.html#sso-with-vcs-provider-github-bitbucket-or-gitlab), where all the permissions are based on `Project`s being linked/related to `RemoteRepository`, and with the work that we are doing to migrate this feature + organizations to Read the Docs Community, it will be almost mandatory to support a way for users to "(re)connect a Project to a RemoteRepository" to be able to use the VCS Auth feature.

There are some things to consider here:

- `Project.repo` (e.g. `https://github.com/rtfd/sphinx-notfound-page.git`) as a source
- `RemoteRepository` has fields `clone_url`, `ssh_url` and `html_url` we can compare against
- Repositories may have moved (e.g. `rtfd/sphinx-notfound-page` was later moved to `readthedocs/sphinx-notfound-page`)

We will need to manipulate the URL a little to handle these cases:

- trailing `/`
- trailing `.git`

Then we can query our database to find out the correct `RemoteRepository`.

Also, note that we currently have a `OneToOneField` relation between `Project` and `RemoteRepository` (see `readthedocs.org/readthedocs/oauth/models.py`, lines 127 to 133 at 9ba4fb1). This has to be changed to a `ForeignKey` on the `Project` model so we can allow the same `RemoteRepository` to be connected to multiple `Project`s. Otherwise, a non-owner of a GitHub repository could manually import the official GitHub repository under a different slug, making the official one become disconnected from the `RemoteRepository`.
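The URL manipulation mentioned in the issue description (trailing `/` and trailing `.git`) can be sketched as a small normalization helper. The function name is illustrative; it is not part of the Read the Docs codebase:

```python
def normalize_repo_url(url):
    # Strip a trailing slash and a trailing ".git" so that Project.repo
    # can be compared against RemoteRepository.clone_url / ssh_url /
    # html_url regardless of which variant the user entered.
    url = url.rstrip("/")
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url
```

Normalizing both sides of the comparison this way means `https://github.com/rtfd/sphinx-notfound-page.git` and `https://github.com/rtfd/sphinx-notfound-page/` match the same repository.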