Indexing speedup #5939

dojutsu-user · 2019-07-16T17:55:53Z

No description provided.

stsewd · 2019-07-16T18:00:32Z

readthedocs/search/management/commands/reindex_elasticsearch.py

+
+            try:
+                for _ in range(chunk_size):
+                    objects_id.append(qs_iterator.__next__().pk)


You can just call next(qs_iterator).pk

davidfischer · 2019-07-16T18:00:49Z

readthedocs/search/management/commands/reindex_elasticsearch.py

+
+            try:
+                for _ in range(chunk_size):
+                    objects_id.append(qs_iterator.__next__().pk)


This section could probably be simplified by using next() with a default value.

What should be the default?

I think a default is not required; the current logic needs to catch the exception anyway.
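For reference, a minimal sketch of what the `next()`-with-a-default variant could look like. The helper name `take_pks` and the `_SENTINEL` marker are hypothetical and not part of this PR. The sketch assumes only that the iterator yields objects with a `pk` attribute.

```python
_SENTINEL = object()  # hypothetical marker meaning "iterator exhausted"


def take_pks(iterator, chunk_size):
    """Collect up to chunk_size pks and report whether the iterator ran dry."""
    pks = []
    for _ in range(chunk_size):
        # next() returns the default instead of raising StopIteration.
        obj = next(iterator, _SENTINEL)
        if obj is _SENTINEL:
            return pks, True
        pks.append(obj.pk)
    return pks, False
```

The sentinel check replaces the try/except, so the trade-off is mostly one of style. A unique `object()` is used rather than `None` so that it can never be confused with a real item.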

stsewd · 2019-07-16T18:06:32Z

readthedocs/search/management/commands/reindex_elasticsearch.py

+            'index_name': index_name,
+        }
+
+        while not is_iterator_empty:


Actually you can just use islice from itertools

from itertools import islice
objects_id = list(islice(qs_iterator, chunk_size))

And you can use https://docs.djangoproject.com/en/2.2/ref/models/querysets/#values-list to get an iterator over only the ids

With this approach we will have a list of all the ids at once.
I think it is better to iterate over a generator object?

It returns a QuerySet of values, which is lazy: https://stackoverflow.com/questions/37140426/does-django-queryset-values-list-return-a-list-object

Ohhh...
But we also want to log based on the pk.
Sorry, but I can't think of how to use this method and log as well.
Do you have something in mind?

You can get the last pk from the list: last_pk = objects_id[-1] if objects_id else 0

Not so important anyway, current logic works too

I prefer the current implementation. It feels more explicit to me.
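For comparison, a rough sketch of the `islice` approach discussed in this thread. It is not the code that was merged. The helper name `iter_pk_chunks` is hypothetical, and a plain `range` stands in for the queryset. In the real command the input would presumably be something like `queryset.values_list('pk', flat=True).iterator()`, which is an assumption here.

```python
from itertools import islice


def iter_pk_chunks(pk_iterable, chunk_size):
    """Yield lists of at most chunk_size pks until the input is exhausted."""
    # iter() makes sure islice consumes one shared iterator instead of
    # restarting from the beginning on every call.
    pk_iterator = iter(pk_iterable)
    while True:
        chunk = list(islice(pk_iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Stand-in for a lazy iterator of primary keys.
for objects_id in iter_pk_chunks(range(1, 8), 3):
    last_pk = objects_id[-1]  # usable for progress logging
    print(objects_id, last_pk)
```

Only one chunk of ids is held in memory at a time, and the last pk of each chunk is still available for logging.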

ericholscher · 2019-07-16T18:16:27Z

readthedocs/search/management/commands/reindex_elasticsearch.py

+
+            try:
+                for _ in range(chunk_size):
+                    objects_id.append(next(qs_iterator).pk)


I'd also like some logging here to be able to track progress. Something like if pk % 5000 == 0: log.info('Total: %s', pk)
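A small sketch of that progress check, assuming a module-level logger named `log`. The helper `log_progress` and the constant `LOG_EVERY` are hypothetical names, and this is not necessarily what the "add logging" commit does.

```python
import logging

log = logging.getLogger(__name__)

LOG_EVERY = 5000  # interval taken from the suggestion above


def log_progress(pk, every=LOG_EVERY):
    """Log when pk is an exact multiple of `every`; return True if logged."""
    if pk % every == 0:
        log.info('Total: %s', pk)
        return True
    return False
```

Note that a modulo check only fires when a row with that exact pk exists. If pks have gaps (deleted rows, for example), some multiples will be skipped silently.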

ericholscher

👍 I will get this wrapped up and deployed to web03 today to start the reindex.

dojutsu-user added 2 commits July 16, 2019 23:18

fix indexing speedup

b38423d

remove if

cbbbb42

dojutsu-user requested review from ericholscher and davidfischer July 16, 2019 17:56

stsewd reviewed Jul 16, 2019


davidfischer reviewed Jul 16, 2019


use next()

a6b8a1a

stsewd reviewed Jul 16, 2019


ericholscher reviewed Jul 16, 2019


add logging

985488d

ericholscher approved these changes Jul 16, 2019


ericholscher merged commit a457fc0 into readthedocs:gsoc-19-indoc-search Jul 16, 2019

dojutsu-user deleted the indexing-speedup branch July 16, 2019 19:05
