Make sure search indexing is ready for production #4333

ericholscher · 2018-07-06T13:27:32Z

Currently our search re-indexing code is single-threaded. In the upgrade-es branch I did a while back, I made it multi-threaded so that we could index things much faster.

You can see that code here: https://github.com/rtfd/readthedocs.org/blob/upgrade-es/readthedocs/core/management/commands/reindex_elasticsearch.py#L56

We need to make sure the approach that we're using to reindex ES will hold up when we run it in production. In particular, we should:

- Make sure we're using `iterator()` on the queryset, otherwise we will likely run out of memory.
- Find a way to "chunk" the ImportedFile indexing, perhaps by project, so we can send "reindex Project x" tasks to each worker. That will be how we multi-thread it.

We should do some testing with around 10,000 documents and see how it will scale to our production workload of around 1.5 million.
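As a rough illustration of the two points above, here is a minimal plain-Python sketch. It is not the code from the upgrade-es branch or from any pull request. Everything in it is hypothetical: `FILES_BY_PROJECT` stands in for the ImportedFile table, `iter_project_files` for a queryset consumed lazily with `iterator()`, `reindex_project` for a "reindex Project x" task, and a thread pool for the workers. The test size of 10,000 documents matches the figure suggested above.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

# Hypothetical in-memory data standing in for the ImportedFile table:
# 40 projects with 250 files each, 10,000 documents in total.
FILES_BY_PROJECT = {
    project_id: [f"project-{project_id}/page-{n}.html" for n in range(250)]
    for project_id in range(40)
}


def iter_project_files(project_id):
    """Stand-in for a lazy queryset, e.g. one consumed with .iterator()."""
    yield from FILES_BY_PROJECT[project_id]


def batched(iterable, size):
    """Yield lists of at most `size` items without materializing the input."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def reindex_project(project_id, batch_size=100):
    """Stand-in for a per-project task; returns the number of files indexed."""
    indexed = 0
    for batch in batched(iter_project_files(project_id), batch_size):
        # A real task would send `batch` to the search backend in one bulk
        # request here; this sketch only counts the documents.
        indexed += len(batch)
    return indexed


def reindex_all(project_ids, workers=4):
    """Fan out one unit of work per project and sum the indexed counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(reindex_project, project_ids))


total = reindex_all(FILES_BY_PROJECT)
print(f"indexed {total} documents")
```

Only one batch per worker is held in memory at a time, and the unit of parallelism is the project, so adding workers should not change the per-task memory footprint. Whether that holds at production scale is exactly what the testing would need to confirm.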


safwanrahman · 2018-08-11T22:53:48Z

Fixed by #4368

safwanrahman self-assigned this Jul 6, 2018

safwanrahman added this to the Search improvements milestone Jul 6, 2018

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 13, 2018

[Fix readthedocs#4333] Implement asynchronous search reindex function…

9a07177

…ality using celery

safwanrahman mentioned this issue Jul 13, 2018

[Fix #4333] Implement asynchronous search reindex functionality using celery #4368

Merged

safwanrahman added the Accepted Accepted issue on our roadmap label Jul 13, 2018

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018

[Fix readthedocs#4333] Implement asynchronous search reindex function…

d61d0bf

…ality using celery

safwanrahman added a commit to safwanrahman/readthedocs.org that referenced this issue Jul 16, 2018

[Fix readthedocs#4333] Implement asynchronous search reindex function…

3cb3afb

…ality using celery

safwanrahman added a commit that referenced this issue Jul 16, 2018

[Fix #4333] Implement asynchronous search reindex functionality using…

8fc3b65

… celery

ericholscher added a commit that referenced this issue Jul 31, 2018

Merge pull request #4368 from safwanrahman/comman

463f9e2

[Fix #4333] Implement asynchronous search reindex functionality using celery

safwanrahman closed this as completed Aug 11, 2018
