Skip to content

Security logs: delete old user security logs #8620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 31, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 30 additions & 0 deletions readthedocs/audit/tasks.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
"""Celery tasks related to audit logs."""

import structlog
from django.conf import settings
from django.utils import timezone

from readthedocs.audit.models import AuditLog
from readthedocs.worker import app

log = structlog.get_logger(__name__)


@app.task(queue="web")
def delete_old_personal_audit_logs(days=None):
"""
Delete personal security logs older than `days`.

If `days` isn't given, default to ``RTD_AUDITLOGS_DEFAULT_RETENTION_DAYS``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we keep audit logs longer for users belonging to organizations with bigger plans?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logs we are deleting are related to actions of the user outside the organization, especially auth and auth failures to the dashboard. This type of information isn't visible to the organization owners currently.

We could show this type of activity to the org owners (just of members of the org), but those are outside the organization, so not sure if that makes sense to show to org owners since it isn't related to activity done on the organization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's important to keep this data in the database. It could be useful for organization's owners, in particular, if they think they are being attacked and trying to log in with different accounts from their employees to gain access to unauthorized documentation.

Also, if the organization wants to audit when they users logging for the last time or similar, it's data they should have access to.

So, I'd personally tie these retention days to the same day of the plan the organization is paying for.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I'd be 👍 on using the organization plan here, so users get the benefit of the organization's longer subscription time. I think this makes the most sense.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have overridden this task on .com, since there is code there that I'm using, but I'll move that code to .org soon. This PR still needs to be merged, it does the deletion for logs from .org where we don't have organizations, the other PR on .com overrides this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I think we should have all the code in just one project (here) and check for RTD_ALLOW_ORGANIZATION to decide what's the logic to run on each case.


We delete logs that aren't related to an organization,
there are tasks in .com to delete those according to their plan.
"""
days = days or settings.RTD_AUDITLOGS_DEFAULT_RETENTION_DAYS
days_ago = timezone.now() - timezone.timedelta(days=days)
audit_logs = AuditLog.objects.filter(
log_organization_id__isnull=True,
created__lt=days_ago,
)
log.info("Deleting old audit logs.", days=days, count=audit_logs.count())
audit_logs.delete()
115 changes: 115 additions & 0 deletions readthedocs/audit/tests/test_tasks.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
from unittest import mock

from django.contrib.auth.models import User
from django.test import TestCase
from django.utils import timezone
from django_dynamic_fixture import get

from readthedocs.audit.models import AuditLog
from readthedocs.audit.tasks import delete_old_personal_audit_logs
from readthedocs.organizations.models import Organization
from readthedocs.projects.models import Project


class AuditTasksTest(TestCase):
def setUp(self):
self.user = get(User)
self.project = get(
Project,
slug="project",
)
self.organization = get(
Organization,
owners=[self.user],
name="testorg",
)
self.organization.projects.add(self.project)

self.another_user = get(User)
self.another_project = get(
Project,
slug="another-project",
users=[self.user],
)

@mock.patch("django.utils.timezone.now")
def test_delete_old_personal_audit_logs(self, now_mock):
now_mock.return_value = timezone.datetime(
year=2021,
month=5,
day=5,
)
newer_date = timezone.datetime(
year=2021,
month=4,
day=30,
)
middle_date = timezone.datetime(
year=2021,
month=4,
day=5,
)
old_date = timezone.datetime(
year=2021,
month=3,
day=20,
)
for date in [newer_date, middle_date, old_date]:
for user in [self.user, self.another_user]:
# Log attached to a project and organization.
get(
AuditLog,
user=user,
project=self.project,
created=date,
action=AuditLog.PAGEVIEW,
)
# Log attached to a project only.
get(
AuditLog,
user=user,
project=self.another_project,
created=date,
action=AuditLog.PAGEVIEW,
)

# Log attached to the user only.
get(
AuditLog,
user=user,
created=date,
action=AuditLog.AUTHN,
)

# Log without a user.
get(
AuditLog,
created=date,
action=AuditLog.AUTHN_FAILURE,
)

# Log with a organization, and without a user.
get(
AuditLog,
project=self.project,
created=date,
action=AuditLog.AUTHN_FAILURE,
)

self.assertEqual(AuditLog.objects.all().count(), 24)

# We don't have logs older than 90 days.
delete_old_personal_audit_logs(days=90)
self.assertEqual(AuditLog.objects.all().count(), 24)

# Only 5 logs can be deteled.
delete_old_personal_audit_logs(days=30)
self.assertEqual(AuditLog.objects.all().count(), 19)

# Only 5 logs can be deteled.
delete_old_personal_audit_logs(days=10)
self.assertEqual(AuditLog.objects.all().count(), 14)

# Only 5 logs can be deteled.
delete_old_personal_audit_logs(days=1)
self.assertEqual(AuditLog.objects.all().count(), 9)
5 changes: 5 additions & 0 deletions readthedocs/settings/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -435,6 +435,11 @@ def TEMPLATES(self):
'schedule': crontab(minute=0, hour=2),
'options': {'queue': 'web'},
},
'weekly-delete-old-personal-audit-logs': {
'task': 'readthedocs.audit.tasks.delete_old_personal_audit_logs',
'schedule': crontab(day_of_week='wednesday', minute=0, hour=7),
'options': {'queue': 'web'},
},
'every-day-resync-sso-organization-users': {
'task': 'readthedocs.oauth.tasks.sync_remote_repositories_organizations',
'schedule': crontab(minute=0, hour=4),
Expand Down