Proxito: Don't hit storage for 404s #10617
Conversation
We have all the information we need in the DB already. Closes #10512
c4e0d06 to 9a1bd6c
Oh, we can't use the DB for that yet, we aren't tracking .txt files, only .html files.
Looks good to me. However, I wasn't sure if this PR was ready to review or not, since it seems there is still some work to do here.
Oh, we can't use the DB for that yet, we aren't tracking .txt files, only .html files.
If you are not going to implement this in this PR, please create an issue referencing this one to track that work.
available_404_files = list(
    HTMLFile.objects.filter(
        version__in=versions_404, path__in=tryfiles
    ).values_list("version__slug", "path")
)
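(For context, a minimal sketch of how the result of this query could be used to pick the 404 page to serve without probing storage for each candidate. The helper name, view-side logic, and import path are assumptions for illustration, not the actual Proxito code.)

# Illustrative sketch only: names here are hypothetical, not the PR's code.
from readthedocs.projects.models import HTMLFile


def find_404_file(versions_404, tryfiles):
    """Return (version_slug, path) of the first 404 file that exists, or None."""
    # One DB query replaces one storage existence check per candidate file.
    available_404_files = set(
        HTMLFile.objects.filter(
            version__in=versions_404, path__in=tryfiles
        ).values_list("version__slug", "path")
    )
    # Preserve the original priority: versions first, then tryfiles order.
    for version in versions_404:
        for tryfile in tryfiles:
            if (version.slug, tryfile) in available_404_files:
                return version.slug, tryfile
    return None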
Do we know how performant this query is currently? I understand this table is pretty huge right now and we are thinking about shrinking it. After that, I suppose the query time will improve a lot.
Also, do we have an issue to track that work? If not, we should create one so we can reduce the data we save in this table.
In [22]: p = Project.objects.get(slug='docs')
In [23]: stable = p.versions.get(slug='stable')
In [24]: latest = p.versions.get(slug='latest')
In [25]: tryfiles = ['404.html', '404/index.html', 'intro/getting-started-with-mkdocs.html', 'intro/getting-started-foo/index.html']
In [26]: HTMLFile.objects.filter(version__in=[stable, latest], path__in=tryfiles).values_list('version__slug', 'path')
{'sql': 'SELECT "builds_version"."slug", "projects_importedfile"."path"
         FROM "projects_importedfile"
         INNER JOIN "builds_version" ON ("projects_importedfile"."version_id" = "builds_version"."id")
         WHERE ("projects_importedfile"."name"::text LIKE \'%.html\'
           AND "projects_importedfile"."path" IN (\'404.html\', \'404/index.html\', \'intro/getting-started-with-mkdocs.html\', \'intro/getting-started-foo/index.html\')
           AND "projects_importedfile"."version_id" IN (2604018, 2603893))
         LIMIT 21',
 'time': '0.005'}
This is a query with 4 files and two versions; it takes between 0.001 and 0.005 seconds. My only worry is that we will have an increase in queries to the DB, but it probably isn't that bad, since these requests will be cached at the CDN level.
Great, thanks!
This query can be faster thanks to the DB cache if it's run multiple times, so we'll have to check how fast it averages out in production, but it'll definitely be faster than hitting storage!
I'm happy to move forward with this 👍🏼
We have all the information we need in the DB already.
There is one call to storage left, done by the robots.txt view, but I'm thinking of leaving that one out while we test this new way of handling files? I can also include it here; it isn't a big change.
I also did a quick test on production checking connection.queries and .explain(); queries seem to be fast (~0.001s).
Closes #10512
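(For reference, a rough sketch of how a timing check like this can be done from a Django shell, assuming DEBUG is enabled so connection.queries is populated and assuming the usual readthedocs models; the project/version slugs and file list are just the ones from the example above.)

# Rough sketch of the performance check; requires DEBUG=True so that
# connection.queries records the executed SQL and its timing.
from django.db import connection, reset_queries

from readthedocs.projects.models import HTMLFile, Project

project = Project.objects.get(slug="docs")
versions_404 = project.versions.filter(slug__in=["stable", "latest"])
tryfiles = ["404.html", "404/index.html"]

queryset = HTMLFile.objects.filter(
    version__in=versions_404, path__in=tryfiles
).values_list("version__slug", "path")

# Ask the database for the query plan.
print(queryset.explain())

# Execute the query and read its wall-clock time from connection.queries.
reset_queries()
list(queryset)
print(connection.queries[-1]["time"])  # e.g. '0.001'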