Search: stop relying on the DB when indexing #10623
Comments
This is good to me 👍🏼 I'd like to see some data about:
@humitos times would be similar to the |
That is |
- Removed the "wipe" actions from the admin instead of porting them, since I'm not sure we need an admin action just to delete the search index of a project. Re-index seems useful.
- `fileify` was replaced by `index_build`, which only requires the build id to be passed; any other information can be retrieved from the build/version object.
- `fileify` isn't removed in this PR to avoid downtime during deploy; it's safe to keep it around until the next deploy.
- New code avoids any deep connection to the django-elasticsearch-dsl package, since it no longer makes sense to have it, and I'm planning on removing it.
- We are no longer tracking all files in the DB, only the ones of interest.
- Re-indexing a version will also re-evaluate the files from the DB, useful for old projects that are out of sync.
- The reindex command now generates tasks per version rather than per collection of files, since we no longer track all files in the DB.
- Closes #10623
- Closes #10690

We don't need to do anything special during deploy; this is zero downtime out of the box. We can trigger a re-index for all versions if we want to delete the HTML files that we don't need from the DB, but that operation will also re-index their contents in ES, so it's probably better to do that after we are settled on any changes to ES.
What's the problem this feature will solve?
Currently, we are keeping track of all HTML files (this is one of our largest tables). We do this mainly for re-indexing (we will be relying on these models for handling 404s, but we don't need to keep track of all files for that).
Describe the solution you'd like
Instead of relying on the DB, walk the storage. We can get the search ignore/ranking patterns from the config of the build object attached to the version.
readthedocs.org/readthedocs/builds/models.py
Lines 310 to 327 in 84f889a
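As a rough illustration of the idea (not the project's actual code), walking the storage and applying ignore patterns could look something like the sketch below. It assumes a local directory stands in for the storage backend, that the ignore patterns are a plain list of glob strings taken from the build config, and that `fnmatch`-style matching is acceptable; the function name `iter_indexable_html` is hypothetical.

```python
import fnmatch
import os


def iter_indexable_html(root, ignore_patterns=()):
    """Yield HTML file paths relative to ``root``, skipping ignored ones.

    Hypothetical helper: ``root`` is a local directory standing in for
    the storage backend, and ``ignore_patterns`` is assumed to be a list
    of glob patterns coming from the build's config.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Sort in place so the traversal order is deterministic.
        dirnames.sort()
        for name in sorted(filenames):
            if not name.endswith(".html"):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            # Normalize to forward slashes so patterns behave the same on any OS.
            rel = rel.replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel, pattern) for pattern in ignore_patterns):
                continue
            yield rel
```

Each yielded path could then be handed to the indexer directly, with no DB row needed for it.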
We would still need to create HTMLFile/ImportedFile models, but only the ones that are needed for our 404 handler (`**/index.html`, `404.html`, and `robots.txt`).
Alternative solutions
None
Additional context
ref #10512