Proxito: Don't hit storage for 404s #10617

stsewd · 2023-08-09T00:15:40Z

We have all the information we need in the DB already.

There is one call to storage left, made by the robots.txt view. I'm thinking of leaving that one out while we test this new way of handling files, but I can also include it here; it isn't a big change.

I also did a quick test in production, checking connection.queries and .explain(); the queries seem to be fast (around 0.001 seconds).
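As a rough, self-contained analogue of that kind of check, the sketch below times a similar `IN` lookup and prints its query plan. It rests on loud assumptions: it uses the standard-library `sqlite3` module with a made-up `importedfile` table and index, not Django, PostgreSQL, or the real table, so the numbers it prints say nothing about production.

```python
import sqlite3
import time

# Hypothetical stand-in table; the real schema and indexes may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE importedfile (version_id INTEGER, path TEXT)")
conn.execute("CREATE INDEX idx_version_path ON importedfile (version_id, path)")
conn.executemany(
    "INSERT INTO importedfile VALUES (?, ?)",
    [(v, f"page-{i}.html") for v in (1, 2) for i in range(1000)]
    + [(1, "404.html")],
)

sql = (
    "SELECT version_id, path FROM importedfile "
    "WHERE version_id IN (?, ?) AND path IN (?, ?)"
)
params = (1, 2, "404.html", "404/index.html")

# SQLite's counterpart to asking the database how it will run the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()

# Time a single execution of the lookup.
start = time.perf_counter()
rows = conn.execute(sql, params).fetchall()
elapsed = time.perf_counter() - start

print(plan)
print(rows, f"{elapsed:.6f}s")
```

In a Django shell, the equivalent information would come from `connection.queries` (populated when `DEBUG` is enabled) and from calling `.explain()` on the queryset.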

Closes #10512

stsewd · 2023-08-10T18:51:48Z

> There is one call to storage left, done by the robots.txt view, but I'm thinking of leaving that one out while we test this new way of handling files? I can also include it here, isn't a big change.

Oh, we can't use the DB for that yet: we aren't tracking .txt files, only .html files.

humitos

Looks good to me. However, I wasn't sure whether this PR was ready for review, since it seems there is still some work to do here.

> Oh, we can't use the DB for that yet, we aren't tracking .txt files, only .html files.

If you are not going to implement this in this PR, please create an issue referencing this one to track the work.

humitos · 2023-08-21T14:32:10Z

readthedocs/proxito/views/serve.py

+        available_404_files = list(
+            HTMLFile.objects.filter(
+                version__in=versions_404, path__in=tryfiles
+            ).values_list("version__slug", "path")
+        )


Do we know how performant this query currently is? I understand this table is pretty huge right now and we are thinking about shrinking it. After that, I suppose the query time will improve a lot.

Also, do we have an issue to track this work? If not, we should create one so we reduce the data we save in this table.

In [22]: p = Project.objects.get(slug='docs')
In [23]: stable = p.versions.get(slug='stable')
In [24]: latest = p.versions.get(slug='latest')
In [25]: tryfiles = ['404.html', '404/index.html', 'intro/getting-started-with-mkdocs.html', 'intro/getting-started-foo/index.html']
In [26]: HTMLFile.objects.filter(version__in=[stable, latest], path__in=tryfiles).values_list('version__slug', 'path')

{'sql': 'SELECT "builds_version"."slug", "projects_importedfile"."path" FROM "projects_importedfile" INNER JOIN "builds_version" ON ("projects_importedfile"."version_id" = "builds_version"."id") WHERE ("projects_importedfile"."name"::text LIKE \'%.html\' AND "projects_importedfile"."path" IN (\'404.html\', \'404/index.html\', \'intro/getting-started-with-mkdocs.html\', \'intro/getting-started-foo/index.html\') AND "projects_importedfile"."version_id" IN (2604018, 2603893)) LIMIT 21', 'time': '0.005'}

This is a query with four files and two versions; it takes between 0.001 and 0.005 seconds. My only worry is that we will have an increase in queries to the DB, but that probably isn't too bad, since these requests will be cached at the CDN level.
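To make the lookup concrete, here is a minimal sketch of how the (version slug, path) pairs returned by a query like the one above could be used to pick a 404 page without touching storage. The function name `pick_404_file` and the priority rule (versions in the given order, then tryfiles in the given order) are assumptions for illustration, not the code in this PR; the list of tuples stands in for the `values_list("version__slug", "path")` result.

```python
def pick_404_file(available, version_slugs, tryfiles):
    """Return the first (version_slug, path) pair that exists, or None.

    `available` stands in for the rows returned by
    values_list("version__slug", "path"); no database is used here.
    The priority order is an assumption made for this sketch.
    """
    existing = set(available)  # set gives constant-time membership checks
    for slug in version_slugs:
        for path in tryfiles:
            if (slug, path) in existing:
                return slug, path
    return None


# Hypothetical query result: each version has a different 404 file.
rows = [("latest", "404.html"), ("stable", "404/index.html")]
print(pick_404_file(rows, ["stable", "latest"], ["404.html", "404/index.html"]))
```

The point of the design is that one query answers every "does this file exist?" question up front, so the loop only does in-memory checks.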

> Also, do we have an issue to track this work? If not, we should create one so we reduce the data we save in this table.

#10623

> If you are not going to implement this on this PR, please create an issue referencing to this one to track this work.

#10659

Great, thanks!

This query may be faster thanks to the DB cache if it was run multiple times, so we'll have to check how fast it averages out in production, but it'll definitely be faster than hitting storage!

humitos · 2023-08-23T11:47:55Z

I'm happy to move forward with this 👍🏼

Proxito: Don't hit storage for 404s

9a1bd6c

We have all the information we need in the DB already. Closes #10512

stsewd force-pushed the use-db-instead-of-storage branch from c4e0d06 to 9a1bd6c August 9, 2023 18:37

stsewd added 2 commits August 10, 2023 10:30

Fix tests

7866282

Remove todo

bffa002

stsewd marked this pull request as ready for review August 10, 2023 16:58

stsewd requested a review from a team as a code owner August 10, 2023 16:58

stsewd requested a review from humitos August 10, 2023 16:58

auto-assign bot assigned stsewd Aug 10, 2023

humitos approved these changes Aug 21, 2023

stsewd and others added 3 commits August 22, 2023 17:45

Merge branch 'main' into use-db-instead-of-storage

957083e

Merge branch 'main' into use-db-instead-of-storage

e97d6fa

Merge branch 'main' into use-db-instead-of-storage

b2f7803

stsewd merged commit dcfc862 into main Aug 23, 2023

stsewd deleted the use-db-instead-of-storage branch August 23, 2023 15:22

stsewd commented Aug 9, 2023 (edited)

stsewd commented Aug 10, 2023

humitos left a comment

humitos Aug 21, 2023

humitos Aug 21, 2023

stsewd Aug 22, 2023

stsewd Aug 22, 2023

humitos Aug 23, 2023

ericholscher Aug 23, 2023

humitos commented Aug 23, 2023
