Currently our search re-indexing code is single-threaded. In the upgrade-es branch I did a while back, I made it multi-threaded so that we could index things much faster. You can see that code here: https://github.com/rtfd/readthedocs.org/blob/upgrade-es/readthedocs/core/management/commands/reindex_elasticsearch.py#L56
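As a rough illustration of the multi-threaded approach (a minimal sketch, not the code from the branch — `index_document` and the document IDs here are hypothetical stand-ins for the real Elasticsearch bulk-indexing calls), the work can be fanned out across a thread pool instead of a single loop:

```python
from concurrent.futures import ThreadPoolExecutor

def index_document(doc_id):
    # Hypothetical stand-in: the real code would send this document
    # to Elasticsearch (ideally via the bulk API).
    return doc_id

doc_ids = list(range(100))

# Fan the indexing work out across a small worker pool instead of
# indexing every document sequentially on one thread.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(index_document, doc_ids))
```

`pool.map` preserves input order, so the results line up with `doc_ids` even though the calls run concurrently.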
We need to make sure the approach that we're using to reindex ES will hold up when we run it in production. In particular, we should:
- Make sure we're using `iterator()` on the queryset; otherwise we will likely run out of memory.
- Find a way to "chunk" the ImportedFile indexing, perhaps by project, so we can send "reindex Project x" tasks to each worker — that would be how we multi-thread it.
- Do some testing with ~10,000 documents or so, and see how it will scale to our production workload of around 1.5 million.
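To make the chunking idea concrete, here is a minimal sketch of a per-project "reindex Project x" task. It is plain Python, not Django: `iter_imported_files` is a hypothetical generator standing in for `ImportedFile.objects.filter(...).iterator()`, which streams rows rather than loading the whole queryset into memory, and the batching mimics feeding the ES bulk API in small chunks:

```python
from itertools import islice

def iter_imported_files(project_slug, total=10):
    # Hypothetical stand-in for a Django queryset's .iterator():
    # yields rows one at a time instead of caching them all in memory.
    for i in range(total):
        yield (project_slug, f"docs/page-{i}.html")

def reindex_project(project_slug, batch_size=4):
    """One "reindex Project x" task: index a single project's files in batches."""
    indexed = 0
    files = iter_imported_files(project_slug)
    while True:
        batch = list(islice(files, batch_size))
        if not batch:
            break
        # A real task would bulk-index `batch` into Elasticsearch here.
        indexed += len(batch)
    return indexed

# One task per project; a worker pool would pick these up in parallel.
counts = {slug: reindex_project(slug) for slug in ("pip", "celery")}
```

Chunking by project keeps each task's memory bounded and gives the workers natural units of parallelism, which is what we'd want to validate at the ~10,000-document scale before trusting it at 1.5 million.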