
Make sure search indexing is ready for production #4333

Closed
ericholscher opened this issue Jul 6, 2018 · 1 comment
Labels: Accepted (Accepted issue on our roadmap)

Comments

@ericholscher
Member

Currently our search reindexing code is single-threaded. In the upgrade-es branch I worked on a while back, I made it multi-threaded so that we could index much faster.

You can see that code here: https://github.com/rtfd/readthedocs.org/blob/upgrade-es/readthedocs/core/management/commands/reindex_elasticsearch.py#L56
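A minimal sketch of the multi-threaded idea, not the code from the upgrade-es branch. It assumes a hypothetical index_document(doc) callable standing in for whatever Elasticsearch call indexes a single document, and uses the standard library's ThreadPoolExecutor. Documents are consumed in bounded batches because Executor.map collects everything it is given up front, which would otherwise pull the whole input into memory.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def reindex_parallel(
    docs: Iterable[T],
    index_document: Callable[[T], object],  # hypothetical per-document ES call
    max_workers: int = 8,
    batch_size: int = 500,
) -> int:
    """Index all docs with a thread pool; return how many were indexed."""
    indexed = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for batch in batched(docs, batch_size):
            # Executor.map submits the whole batch at once, so only one batch
            # of documents is held in memory at a time. Iterating the results
            # re-raises any exception thrown by index_document.
            for _ in pool.map(index_document, batch):
                indexed += 1
    return indexed
```

Threads help here mainly because indexing is dominated by network I/O to Elasticsearch; the batch size trades memory for how many requests are in flight.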

We need to make sure the approach that we're using to reindex ES will hold up when we run it in production. In particular, we should:

  • Make sure we're using iterator() on the querysets; otherwise we will likely run out of memory.
  • Find a way to "chunk" the ImportedFile indexing, perhaps by project, so that we can send "reindex Project X" tasks to each worker. That would be how we multi-thread it.
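A sketch of chunking by project, assuming rows of (project_id, file_id) that arrive already sorted by project. In Django such rows could come from a queryset using order_by("project_id"), values_list("project_id", "id") and iterator(), which streams results instead of caching the whole result set. The send_task callback is a hypothetical stand-in for queuing one Celery task per chunk (for example through a task's delay()). Very large projects are split so that no single task receives more than max_files files.

```python
from itertools import groupby, islice
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

# (project_id, file_id) pairs, assumed sorted by project_id.
FileRow = Tuple[int, int]


def chunk_by_project(
    rows: Iterable[FileRow], max_files: int = 1000
) -> Iterator[Tuple[int, List[int]]]:
    """Yield (project_id, file_ids) chunks with at most max_files ids each.

    Only one chunk is held in memory at a time, so `rows` can be a
    streaming iterator.
    """
    for project_id, group in groupby(rows, key=itemgetter(0)):
        file_ids = (file_id for _, file_id in group)
        while True:
            chunk = list(islice(file_ids, max_files))
            if not chunk:
                break
            yield project_id, chunk


def dispatch(
    rows: Iterable[FileRow],
    send_task: Callable[[int, List[int]], None],  # hypothetical task queue hook
    max_files: int = 1000,
) -> int:
    """Send one 'reindex project chunk' task per chunk; return the task count."""
    sent = 0
    for project_id, file_ids in chunk_by_project(rows, max_files):
        send_task(project_id, file_ids)
        sent += 1
    return sent
```

Because itertools.groupby only merges consecutive rows, ordering the input by project is what keeps each project's files together; unsorted input would produce several small chunks for the same project.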

We should test with roughly 10,000 documents and see how that will scale to our production workload of around 1.5 million.
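Going from about 10,000 to about 1.5 million documents is a factor of 150. A rough way to turn a test run into a production estimate, assuming indexing time grows linearly with document count (a real cluster may not), is to measure throughput and divide. Here index_batch is a hypothetical callable that indexes the documents it is given.

```python
import time
from typing import Callable, Iterable

PRODUCTION_DOCS = 1_500_000  # approximate production workload from the issue
SAMPLE_DOCS = 10_000         # suggested test size from the issue


def measure_throughput(index_batch: Callable[[Iterable[int]], object], n_docs: int) -> float:
    """Return documents indexed per second for a single run of n_docs documents."""
    start = time.perf_counter()
    index_batch(range(n_docs))
    # Guard against a zero elapsed time on very fast (fake) runs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return n_docs / elapsed


def estimate_seconds(docs_per_second: float, total_docs: int = PRODUCTION_DOCS) -> float:
    """Naive linear extrapolation of total indexing time."""
    return total_docs / docs_per_second
```

For example, a hypothetical test run measuring 200 documents per second gives estimate_seconds(200.0) == 7500.0, a little over two hours for the full workload.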

@safwanrahman safwanrahman self-assigned this Jul 6, 2018
@safwanrahman safwanrahman added this to the Search improvements milestone Jul 6, 2018
safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 13, 2018
@safwanrahman safwanrahman added the Accepted Accepted issue on our roadmap label Jul 13, 2018
safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018
ericholscher added a commit that referenced this issue Jul 31, 2018
[Fix #4333] Implement asynchronous search reindex functionality using celery
@safwanrahman
Member

Fixed by #4368
