Commit 693c0d2

Search: use with_positions_offsets term vector for some fields
Finding and highlighting results in big documents takes time and memory, because the text has to be re-analyzed at search time. Storing a term vector with positions and offsets reduces that computation, at the cost of a larger index. We have a lot of free space, so the trade-off is fine. Highlighting big documents worked before because, prior to Elasticsearch 7.0, there was no default limit on the length of text analyzed during highlighting: https://www.elastic.co/guide/en/elasticsearch/reference/current/breaking-changes-7.0.html#_limiting_the_length_of_an_analyzed_text_during_highlighting
1 parent a05c1c3 commit 693c0d2
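
As a rough illustration of what this change amounts to on the Elasticsearch side, the sketch below builds a mapping fragment from plain Python dictionaries. The `text_field` helper and the flat layout are assumptions made for illustration (in `PageDocument` these fields sit inside nested fields); only the field names `content` and `docstrings` and the `with_positions_offsets` value come from this commit.

```python
import json


def text_field(term_vector=None):
    """Return a mapping fragment for a ``text`` field (hypothetical helper)."""
    field = {"type": "text"}
    if term_vector is not None:
        # Store term positions and character offsets at index time,
        # so highlighting does not need to re-analyze the text.
        field["term_vector"] = term_vector
    return field


# Simplified, flat mapping (assumption: the real fields are nested).
mapping = {
    "properties": {
        "title": text_field(),
        "content": text_field("with_positions_offsets"),
        "docstrings": text_field("with_positions_offsets"),
    }
}

print(json.dumps(mapping, indent=2))
```

The other way to restore highlighting on big documents would be to raise the `index.highlight.max_analyzed_offset` index setting, which keeps the index smaller but leaves the search-time analysis cost in place.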

1 file changed, +6 -3 lines changed

readthedocs/search/documents.py

Lines changed: 6 additions & 3 deletions
@@ -59,8 +59,11 @@ class PageDocument(RTDDocTypeMixin, Document):
     Simple analyzer will break the text in non-letter characters,
     so a text like ``python.submodule`` will be broken like [python, submodule]
     instead of [python.submodule].
+    See more at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html # noqa
 
-    https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-analyzers.html
+    Some text fields use the ``with_positions_offsets`` term vector,
+    this is to have faster highlighting on big documents.
+    See more at https://www.elastic.co/guide/en/elasticsearch/reference/7.9/term-vector.html
     """
 
     # Metadata
@@ -77,7 +80,7 @@ class PageDocument(RTDDocTypeMixin, Document):
         properties={
             'id': fields.KeywordField(),
             'title': fields.TextField(),
-            'content': fields.TextField(),
+            'content': fields.TextField(term_vector='with_positions_offsets'),
         }
     )
     domains = fields.NestedField(
@@ -89,7 +92,7 @@ class PageDocument(RTDDocTypeMixin, Document):
 
             # For showing in the search result
             'type_display': fields.TextField(),
-            'docstrings': fields.TextField(),
+            'docstrings': fields.TextField(term_vector='with_positions_offsets'),
 
             # Simple analyzer breaks on `.`,
             # otherwise search results are too strict for this use case
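
To make the motivation more concrete, here is a toy sketch of why stored offsets make highlighting cheaper. It is not Elasticsearch's implementation: `analyze`, `build_term_vector`, and the two highlight functions are hypothetical names, and the analyzer only approximates the simple analyzer by splitting on non-ASCII-letter characters and lowercasing. With a term vector, the highlighter looks up offsets computed once at index time; without one, it has to re-analyze the whole text on every search.

```python
import re


def analyze(text):
    """Toy 'simple' analyzer: lowercase letter runs with character offsets."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"[A-Za-z]+", text)]


def build_term_vector(text):
    """Index-time step: map each term to its list of (start, end) offsets."""
    vector = {}
    for term, start, end in analyze(text):
        vector.setdefault(term, []).append((start, end))
    return vector


def wrap(text, offsets, pre="<em>", post="</em>"):
    """Wrap each (start, end) span of ``text`` in highlight tags."""
    out, last = [], 0
    for start, end in sorted(offsets):
        out.append(text[last:start])
        out.append(pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)


def highlight_with_vector(text, vector, term):
    """Search-time step using stored offsets: a dictionary lookup."""
    return wrap(text, vector.get(term.lower(), []))


def highlight_without_vector(text, term):
    """Search-time step without offsets: re-analyze the whole text."""
    offsets = [(s, e) for t, s, e in analyze(text) if t == term.lower()]
    return wrap(text, offsets)


doc = "Import python.submodule, then call submodule.run()"
vector = build_term_vector(doc)  # done once, when the document is indexed
print(highlight_with_vector(doc, vector, "submodule"))
# Import python.<em>submodule</em>, then call <em>submodule</em>.run()
```

Both functions return the same string; the difference is that the second one tokenizes the full document on each call, which is the cost that grows with document size.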
