OAuth: resync RemoteRepository weekly for active users #9410

Merged
humitos merged 4 commits into main from humitos/resync-remoterepository-weekly
Jul 11, 2022

Conversation

humitos
Member

@humitos humitos commented Jul 7, 2022

Trigger a daily task that compares each active user's last-login isoweekday with
today's isoweekday. If they match, we resync the `RemoteRepository` for that user.

This logic is equivalent to "resync `RemoteRepository` once a week per user".

We consider a user active if they have logged in at least once in the last 90
days.
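
The rule above can be sketched in plain Python. This is only an illustration of the matching logic, not the code in this PR: `should_resync` and `ACTIVE_WINDOW` are hypothetical names, and the real task would presumably filter with the Django ORM rather than check users one by one.

```python
from datetime import datetime, timedelta

# Assumption: "active" means a login within the last 90 days, as described above.
ACTIVE_WINDOW = timedelta(days=90)


def should_resync(last_login, now):
    """Return True if a user should be resynced by today's daily run.

    `last_login` is a datetime or None; `now` is the time of the daily run.
    """
    if last_login is None:
        # Never logged in, so not an active user.
        return False
    if now - last_login > ACTIVE_WINDOW:
        # Last login is older than the activity window.
        return False
    # ISO weekday: Monday is 1, Sunday is 7.
    return last_login.isoweekday() == now.isoweekday()
```

Because each user matches exactly one weekday, running this check once a day touches every active user once per week.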

Related: #8229
Related: #9409

@humitos humitos requested a review from a team as a code owner July 7, 2022 10:29
@humitos humitos requested a review from ericholscher July 7, 2022 10:29
@humitos humitos force-pushed the humitos/resync-remoterepository-weekly branch from 1454e27 to d76f145 Compare July 7, 2022 10:44
@humitos
Member Author

humitos commented Jul 7, 2022

Note that this is about 2k users to resync per day, which I don't think is terrible, but it's definitely a lot more than what we are doing currently:

In [75]: Counter(
    ...:     User.objects.annotate(weekday=ExtractIsoWeekDay("last_login"))
    ...:     .filter(last_login__gt=three_months_ago, socialaccount__isnull=False)
    ...:     .order_by("weekday")
    ...:     .values_list("weekday", flat=True)
    ...: )
Out[75]: Counter({1: 1749, 2: 1873, 3: 1927, 4: 1749, 5: 1603, 6: 1025, 7: 942})

This may stress our web-celery ASG a little. We could split those 2k users into 10 batches and trigger ~200 tasks every hour instead of 2k at the same time. Is it worth trying this as-is and splitting it later if we find that's too many tasks at once?
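
A rough sketch of the batching idea, in case we go that way. `plan_hourly_batches` is a hypothetical helper, not something in the codebase; it only computes which users go in which batch and how many seconds to delay each one. The delay could then be passed as `countdown` to Celery's `apply_async`, but that wiring is an assumption and is left out here.

```python
def plan_hourly_batches(user_ids, batches=10, interval_seconds=3600):
    """Split user_ids into at most `batches` chunks, one per interval.

    Returns a list of (delay_in_seconds, chunk) tuples.
    """
    if not user_ids:
        return []
    # Ceiling division so that no more than `batches` chunks are produced.
    size = -(-len(user_ids) // batches)
    plan = []
    for index, start in enumerate(range(0, len(user_ids), size)):
        plan.append((index * interval_seconds, user_ids[start:start + size]))
    return plan
```

With the Tuesday count from the output above (1873 users), this gives 10 chunks of at most 188 users, spaced one hour apart.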

There are some improvements to do here too:

  • ignore GH organizations with a very large number of repositories (see GitHub OAuth: skip conda-forge organization when syncing #8979)
  • mark users with revoked permissions for our app to not resync until they reconnect
  • migrate to GH Application that allows us to receive webhooks instead (this migration is not trivial, as we found out)

Member

@ericholscher ericholscher left a comment

This looks like a good approach, but I'd much rather tie up 1 celery process and take a long time vs. totally block celery.
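
One way to read this suggestion, as a minimal sketch: a single task walks the users one at a time instead of queuing one task per user. `resync_sequentially` and the `sync_one` callable are hypothetical stand-ins for whatever actually syncs a user's `RemoteRepository` objects; this is not the code that was committed.

```python
def resync_sequentially(user_ids, sync_one):
    """Resync users one by one inside a single worker process.

    `sync_one` is any callable taking a user id. Failures are collected
    so that one broken account does not abort the whole run.
    """
    failed = []
    for user_id in user_ids:
        try:
            sync_one(user_id)
        except Exception:
            # Record the failure and continue with the next user.
            failed.append(user_id)
    return failed
```

The run takes longer overall, but it occupies one process and leaves the rest of the queue free for other tasks.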

humitos and others added 3 commits July 11, 2022 12:37
Co-authored-by: Eric Holscher <[email protected]>
Weekly resync will use only one Celery process to avoid backing up our queue.
@humitos humitos force-pushed the humitos/resync-remoterepository-weekly branch from 00dbbbd to d26eb93 Compare July 11, 2022 10:49
@humitos humitos enabled auto-merge (squash) July 11, 2022 10:50
@humitos
Member Author

humitos commented Jul 11, 2022

There is an unrelated problem with the docs build:

/home/circleci/project/docs/user/guides/jupyter.rst:306: WARNING: unknown document: poliastro:gallery

@humitos humitos disabled auto-merge July 11, 2022 13:40
@humitos humitos merged commit 807e29c into main Jul 11, 2022
@humitos humitos deleted the humitos/resync-remoterepository-weekly branch July 11, 2022 13:40

2 participants