
Commit cd4535e

DB: do not fetch data and others when deleting rows (#10446)
* DB: do not fetch `data` and others when deleting rows

  This task was cancelled again. The query shows a SELECT first that fetches the whole rows. I think we can reduce the time and memory used by fetching only the ids.

* DB: only fetch "id" when deleting rows

* DB: clean up old data using raw SQL from Django

  We are facing an issue with this query because it takes too long to execute (more than 30s), which makes our DB kill the query. This is because Django performs a `SELECT` first so that it can trigger the `pre_delete` and `post_delete` signals for each deleted object. We don't really need this here, so we are using raw SQL to bypass it and make the query execute faster. This is not ideal, but we didn't find a better approach.

* DB: there are no results to fetch

  The query is executed without requiring `.fetchall()`.
1 parent 6d2f858 commit cd4535e

2 files changed: +27 -9 lines changed

readthedocs/analytics/tasks.py

+14 -5
@@ -1,12 +1,12 @@
 """Tasks for Read the Docs' analytics."""
 
 from django.conf import settings
+from django.db import connection
 from django.utils import timezone
 
 import readthedocs
 from readthedocs.worker import app
 
-from .models import PageView
 from .utils import send_to_analytics
 
 DEFAULT_PARAMETERS = {
@@ -80,7 +80,16 @@ def delete_old_page_counts():
     """
     retention_days = settings.RTD_ANALYTICS_DEFAULT_RETENTION_DAYS
     days_ago = timezone.now().date() - timezone.timedelta(days=retention_days)
-    return PageView.objects.filter(
-        date__lt=days_ago,
-        date__gt=days_ago - timezone.timedelta(days=90),
-    ).delete()
+
+    # NOTE: we are using raw SQL here to avoid Django doing a SELECT first to
+    # send `pre_` and `post_` delete signals
+    # See https://docs.djangoproject.com/en/4.2/ref/models/querysets/#delete
+    with connection.cursor() as cursor:
+        cursor.execute(
+            # "SELECT COUNT(*) FROM analytics_pageview WHERE date BETWEEN %s AND %s",
+            "DELETE FROM analytics_pageview WHERE date BETWEEN %s AND %s",
+            [
+                days_ago - timezone.timedelta(days=90),
+                days_ago,
+            ],
+        )
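
One detail to check when reading this hunk: the removed ORM filter used strict comparisons (`date__lt`, `date__gt`), while SQL `BETWEEN` includes both endpoints. The sketch below illustrates the difference with the standard-library `sqlite3` module and a made-up `pageview` table. The table, its columns, and the dates are assumptions for illustration, not the project's schema, and `sqlite3` uses `?` placeholders where Django's cursor uses `%s`.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical stand-in table; not the real analytics_pageview schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageview (id INTEGER PRIMARY KEY, date TEXT)")

upper = date(2023, 6, 1)             # plays the role of `days_ago`
lower = upper - timedelta(days=90)   # plays the role of `days_ago - 90 days`

# One row on each boundary and one row strictly inside the window.
conn.executemany(
    "INSERT INTO pageview (date) VALUES (?)",
    [
        (lower.isoformat(),),
        ((lower + timedelta(days=1)).isoformat(),),
        (upper.isoformat(),),
    ],
)

params = [lower.isoformat(), upper.isoformat()]

# Strict comparisons, mirroring the removed filter (date__gt / date__lt).
strict_count = conn.execute(
    "SELECT COUNT(*) FROM pageview WHERE date > ? AND date < ?", params
).fetchone()[0]

# BETWEEN, mirroring the raw SQL in the diff: both endpoints match.
between_count = conn.execute(
    "SELECT COUNT(*) FROM pageview WHERE date BETWEEN ? AND ?", params
).fetchone()[0]

print(strict_count, between_count)
```

For a retention cleanup, deleting the two boundary days as well is probably harmless, but it is a small behaviour change relative to the ORM version.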

readthedocs/telemetry/tasks.py

+13 -4
@@ -1,6 +1,7 @@
 """Tasks related to telemetry."""
 
 from django.conf import settings
+from django.db import connections
 from django.utils import timezone
 
 from readthedocs.builds.models import Build
@@ -33,7 +34,15 @@ def delete_old_build_data():
     """
     retention_days = settings.RTD_TELEMETRY_DATA_RETENTION_DAYS
     days_ago = timezone.now().date() - timezone.timedelta(days=retention_days)
-    return BuildData.objects.filter(
-        created__lt=days_ago,
-        created__gt=days_ago - timezone.timedelta(days=90),
-    ).delete()
+    # NOTE: we are using raw SQL here to avoid Django doing a SELECT first to
+    # send `pre_` and `post_` delete signals
+    # See https://docs.djangoproject.com/en/4.2/ref/models/querysets/#delete
+    with connections["telemetry"].cursor() as cursor:
+        cursor.execute(
+            # "SELECT COUNT(*) FROM telemetry_builddata WHERE created BETWEEN %s AND %s",
+            "DELETE FROM telemetry_builddata WHERE created BETWEEN %s AND %s",
+            [
+                days_ago - timezone.timedelta(days=90),
+                days_ago,
+            ],
+        )
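
The commit message notes that the query runs without `.fetchall()`: a `DELETE` produces no result set. The removed code returned the result of `.delete()`, whereas the raw-SQL version returns nothing. If the number of deleted rows is still wanted (for logging, say), DB-API cursors expose it as `rowcount`. The sketch below shows this with the standard-library `sqlite3` module and a made-up `builddata` table. The table and dates are assumptions for illustration, and whether the task should report a count at all is a design choice the commit does not address.

```python
import sqlite3

# Hypothetical stand-in table; not the real telemetry_builddata schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE builddata (id INTEGER PRIMARY KEY, created TEXT)")
conn.executemany(
    "INSERT INTO builddata (created) VALUES (?)",
    [("2023-01-10",), ("2023-02-15",), ("2023-05-20",)],
)

cursor = conn.cursor()
cursor.execute(
    "DELETE FROM builddata WHERE created BETWEEN ? AND ?",
    ["2023-01-01", "2023-03-01"],
)

# DELETE returns no rows, so there is nothing to fetch.
has_result_set = cursor.description is not None

# The DB-API cursor still reports how many rows the statement affected.
deleted = cursor.rowcount

remaining = conn.execute("SELECT COUNT(*) FROM builddata").fetchone()[0]
print(has_result_set, deleted, remaining)
```

Django's cursor wrapper delegates to the underlying DB-API cursor, so reading `cursor.rowcount` inside the `with` block of the diff should work the same way, though that is not something this commit does.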
