
GitHub OAuth: skip conda-forge organization when syncing #8979


Open
humitos opened this issue Mar 1, 2022 · 4 comments

Comments

@humitos
Member

humitos commented Mar 1, 2022

conda-forge on GitHub has a lot of repositories, and that causes various issues on our side. The most recent one was that our Celery task timed out before finishing, so the sync wasn't performed.

Since nobody (?) imports projects from conda-forge into Read the Docs, we thought it could be a good idea to skip it during synchronization. This would considerably reduce the time required and also the size of the database.

Related #8974

@humitos humitos added Improvement Minor improvement to code Accepted Accepted issue on our roadmap labels Mar 1, 2022
@astrojuanlu
Contributor

astrojuanlu commented Mar 1, 2022

Not even GitHub can render them:

[Image: IMG_20220301_122428]

However, instead of making a special case for conda-forge, perhaps it would make more sense to disable orgs with, say, > 1000 repositories?

@humitos
Member Author

humitos commented Mar 1, 2022

That makes sense to me 👍. I'll see if we have that data on the first request and make the decision there, before we start consuming the resulting pages.

@humitos humitos self-assigned this Mar 3, 2022
@humitos
Member Author

humitos commented Mar 16, 2022

I took a quick look at this today and I don't think it's possible to implement it without a considerable refactor.

  1. We are currently using this GitHub endpoint, https://docs.github.com/en/rest/reference/repos#list-repositories-for-the-authenticated-user, which lists all the repositories for the logged-in user. I tried requesting only ?affiliation=owner to avoid fetching all the repositories of every organization the user belongs to, but it also fetches repositories the user owns under those organizations, so I'm not sure that will solve the issue.
  2. We would need to check the Link HTTP response header after fetching the first page to find out how many pages are available and, from that, estimate how many repositories the organization has.
  3. We would need to raise an exception to stop the pagination at
    return response.links.get('next', {}).get('url')
    and skip the organization.
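
Steps 2 and 3 could look roughly like the sketch below. It is not the Read the Docs implementation: `TooManyRepositories`, `last_page_number`, `ensure_under_limit`, the 1000 limit, and the example URL are all hypothetical names and values chosen for illustration. It only assumes that the first response carries a `Link` header with a `rel="last"` entry whose URL has a `page` query parameter.

```python
import re
from urllib.parse import parse_qs, urlparse


class TooManyRepositories(Exception):
    """Hypothetical exception used to abort pagination for one organization."""


def last_page_number(link_header):
    """Return the page number of the rel="last" link, or 1 when it is absent."""
    if not link_header:
        return 1
    # Assumption: the URLs contain no commas, so splitting on "," is safe here.
    for part in link_header.split(","):
        match = re.match(r'\s*<([^>]+)>\s*;\s*rel="([^"]+)"', part)
        if match and match.group(2) == "last":
            query = parse_qs(urlparse(match.group(1)).query)
            return int(query["page"][0])
    return 1


def ensure_under_limit(link_header, per_page, max_repositories=1000):
    """Raise TooManyRepositories when the lower-bound estimate exceeds the limit."""
    # Every page before the last is full; the last is assumed to hold at least one item.
    minimum = (last_page_number(link_header) - 1) * per_page + 1
    if minimum > max_repositories:
        raise TooManyRepositories(f"at least {minimum} repositories")
    return minimum


# Example header in the shape GitHub returns on a first page (URL is illustrative).
header = (
    '<https://api.github.com/organizations/1/repos?per_page=100&page=2>; rel="next", '
    '<https://api.github.com/organizations/1/repos?per_page=100&page=11>; rel="last"'
)

try:
    ensure_under_limit(header, per_page=100)
    skipped = False
except TooManyRepositories:
    skipped = True  # the caller would move on to the next organization
```

The check uses a lower bound on purpose, so an organization is skipped only when it certainly has more repositories than the limit. This covers the per-organization listing only; it does not address the open question in step 1 about the authenticated-user endpoint.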

Even with those changes, I'm not 100% sure that this will work 😄. Ideally, we would like to improve step 1) as much as possible, but I haven't found a way to do that yet.

humitos added a commit that referenced this issue Mar 16, 2022
Currently, we are using the default pagination that GitHub offers, which is 30.
This commit increases the page size to 100 when fetching organizations for a user
and repositories for each organization.

Note we are already using a page size of 100 when fetching repositories for a user.

Related to #8979
@humitos
Member Author

humitos commented Mar 16, 2022

I opened #9020 to increase the page size to 100 when dealing with organizations and their repositories. This is not the best fix, but at least it will reduce the number of requests required.
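
To give a rough sense of the saving, here is a small worked calculation. The 20,000 repository count is a made-up figure for illustration, not a measurement of any real organization, and `requests_needed` is a hypothetical helper.

```python
import math


def requests_needed(total_items, per_page):
    """Number of paginated requests required to list total_items."""
    return math.ceil(total_items / per_page)


# Hypothetical organization size, used only to compare the two page sizes.
total = 20_000
with_default = requests_needed(total, per_page=30)    # GitHub's default page size
with_maximum = requests_needed(total, per_page=100)   # page size used in #9020
```

With these numbers, listing the repositories takes 667 requests at the default page size and 200 at a page size of 100, so the request count drops by about 70% while the amount of data fetched stays the same.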
