Commit 36bb8cd

ericholscher authored and safwanrahman committed
Merge pull request #4467 from safwanrahman/search_docs
[Fix #4268] Adding Documentation for upgraded Search
2 parents a508020 + 2a726db commit 36bb8cd

5 files changed

+112
-142
lines changed

docs/custom_installs/elasticsearch.rst

Lines changed: 0 additions & 108 deletions
This file was deleted.

docs/development/search.rst

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
Search
======

Read the Docs uses Elasticsearch_ instead of the built-in Sphinx search to provide better search
results. Documents are indexed in the Elasticsearch index, and searches are made through the API.
All the search code is open source and lives in the `GitHub Repository`_.
We currently use `Elasticsearch 6.3`_.

Local Development Configuration
-------------------------------

Installing and running Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You need to install and run Elasticsearch_ version 6.3 on your local development machine.
Installation instructions are available
`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
Alternatively, you can start an Elasticsearch Docker container by running the following command::

    docker run -p 9200:9200 -p 9300:9300 \
           -e "discovery.type=single-node" \
           docker.elastic.co/elasticsearch/elasticsearch:6.3.2

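Once Elasticsearch is running, one quick way to confirm that a node is listening is to read the JSON document it serves at its root URL, which includes a ``version.number`` field. The following is a minimal sketch and is not part of the Read the Docs codebase: the function names ``parse_version`` and ``elasticsearch_version`` are hypothetical, and the default URL assumes the ``9200`` port mapping from the Docker command above.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def parse_version(body):
    """Extract ``version.number`` from the JSON served at the node's root URL."""
    info = json.loads(body)
    return info["version"]["number"]


def elasticsearch_version(url="http://localhost:9200", timeout=5):
    """Return the version reported by the node at ``url``, or None if unreachable.

    The default URL is an assumption based on the port mapping shown above.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return parse_version(response.read())
    except (URLError, OSError):
        # Nothing is listening yet (or the node is still starting up).
        return None
```

Calling ``elasticsearch_version()`` shortly after starting the node may return ``None`` while it is still booting; retrying after a few seconds should report the version that was installed or started.
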
Indexing into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^
To use search, you need to index data into the Elasticsearch index. Run the
`reindex_elasticsearch` management command::

    ./manage.py reindex_elasticsearch

For performance reasons, we implemented our own management command rather than using
the built-in management command provided by the `django-elasticsearch-dsl`_ package.

Auto Indexing
^^^^^^^^^^^^^
By default, auto indexing is turned off in development mode. To turn it on, change the
`ELASTICSEARCH_DSL_AUTOSYNC` setting to `True` in the `readthedocs/settings/dev.py` file.
After that, whenever documentation builds successfully or a project is added,
the search index is updated automatically.


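As a sketch, the change amounts to a single line. How that line is placed (a module-level assignment or an attribute on a settings class) is an assumption here; follow whatever style `readthedocs/settings/dev.py` already uses.

```python
# Fragment for readthedocs/settings/dev.py.
# Assumption: shown as a plain assignment; if the file declares settings as
# class attributes, add the same line inside that class instead.
ELASTICSEARCH_DSL_AUTOSYNC = True  # re-index automatically after builds and project changes
```
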
Architecture
------------
The search architecture is divided into two parts.
One part is responsible for **indexing** the documents and projects, and
the other part is responsible for querying the index to show the proper results to users.
We mostly rely on the `django-elasticsearch-dsl`_ package to keep the search working.
`django-elasticsearch-dsl`_ is a wrapper around `elasticsearch-dsl`_ for easy configuration
with Django.

Indexing
^^^^^^^^
All Sphinx documents are indexed into Elasticsearch after a successful build.
Currently, we do not index MkDocs documents into Elasticsearch, but
`any kind of help is welcome <https://github.com/rtfd/readthedocs.org/issues/1088>`_.

How we index documentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~

After a build finishes successfully, `HTMLFile` objects are created for each of the
`HTML` files and the old version's `HTMLFile` objects are deleted. By default, the
`django-elasticsearch-dsl`_ package listens to the `post_save`/`post_delete` signals
to index/delete documents, but this has performance drawbacks because it sends an HTTP request
whenever an `HTMLFile` object is created or deleted. To optimize performance, the `bulk_post_create`
and `bulk_post_delete` Signals_ are dispatched with a list of `HTMLFile` objects so that it is
possible to bulk index documents in Elasticsearch (the `bulk_post_create` signal is dispatched for
created objects and `bulk_post_delete` is dispatched for deleted objects). Both signals are
dispatched with the list of `HTMLFile` instances in the `instance_list` parameter.

We listen to the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application
and index/delete the documentation content from the `HTMLFile` instances.


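The benefit of the bulk signals can be illustrated with a small, self-contained sketch. It does not use Django or Elasticsearch: the ``Signal`` class below is a simplified stand-in for a signal dispatcher, the receivers only record what a real handler might send, and the dictionaries stand in for `HTMLFile` instances. Only the signal names and the `instance_list` parameter come from the description above; every other name is hypothetical.

```python
class Signal:
    """Simplified stand-in for a signal dispatcher (not Django's implementation)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Call every connected receiver once per dispatch.
        return [(receiver, receiver(sender=sender, **kwargs)) for receiver in self._receivers]


bulk_post_create = Signal()
bulk_post_delete = Signal()

# One entry per simulated request to the search backend.
requests_sent = []


def index_html_files(sender, instance_list, **kwargs):
    """Index all created files with a single simulated bulk request."""
    requests_sent.append(("index", [html_file["path"] for html_file in instance_list]))


def remove_html_files(sender, instance_list, **kwargs):
    """Remove all deleted files with a single simulated bulk request."""
    requests_sent.append(("delete", [html_file["path"] for html_file in instance_list]))


bulk_post_create.connect(index_html_files)
bulk_post_delete.connect(remove_html_files)

# Hypothetical stand-ins for the HTMLFile objects of a finished build.
new_files = [{"path": "index.html"}, {"path": "install.html"}]
old_files = [{"path": "old.html"}]

# One dispatch per batch, regardless of how many files the build produced.
bulk_post_create.send(sender="HTMLFile", instance_list=new_files)
bulk_post_delete.send(sender="HTMLFile", instance_list=old_files)
```

In this sketch the build triggers two simulated requests in total, whereas a per-object signal would trigger one request for each of the three files.
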
How we index projects
~~~~~~~~~~~~~~~~~~~~~
We also index project information in our search index so that users can search for projects
from the main site. `django-elasticsearch-dsl`_ listens to the `post_save` and `post_delete`
signals of the `Project` model and indexes/deletes into Elasticsearch accordingly.


Elasticsearch Document
~~~~~~~~~~~~~~~~~~~~~~

`elasticsearch-dsl`_ provides a model-like wrapper for the `Elasticsearch document`_.
As required by `django-elasticsearch-dsl`_, our document classes are stored in the
`readthedocs/search/documents.py` file.

**ProjectDocument:** It is used for indexing projects. The signal listener of
`django-elasticsearch-dsl`_ listens to the `post_save` signal of the `Project` model and
then indexes/deletes into Elasticsearch.

**PageDocument:** It is used for indexing the documentation of projects. By default, auto
indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`.
`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production.
As mentioned above, our `Search` app listens to the `bulk_post_create` and `bulk_post_delete`
signals and indexes/deletes documentation in Elasticsearch. The signal listeners are in
the `readthedocs/search/signals.py` file. Both signals are dispatched
after a successful documentation build.

The fields and ES datatypes are specified in the `PageDocument`. The indexable data is taken
from the `processed_json` property of `HTMLFile`. This property provides a Python dictionary with
document data such as `title`, `headers`, and `content`.


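To make the shape of that data concrete, here is a minimal sketch of turning such a dictionary into the body of a search document. The keys `title`, `headers`, and `content` come from the description above; the value types (a string, a list of strings, a string), the sample values, and the helper name ``to_search_document`` are assumptions for illustration and do not reflect the actual `PageDocument` code.

```python
def to_search_document(processed_json):
    """Keep only the fields named above, with empty defaults for missing keys."""
    return {
        "title": processed_json.get("title", ""),
        "headers": list(processed_json.get("headers", [])),
        "content": processed_json.get("content", ""),
    }


# Hypothetical example of what processed_json might contain for one page.
sample = {
    "title": "Installation",
    "headers": ["Requirements", "Quick start"],
    "content": "Install the package and run the server.",
    "path": "install",  # extra keys are ignored by the helper
}

document = to_search_document(sample)
```
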
.. _Elasticsearch: https://www.elastic.co/products/elasticsearch
.. _Elasticsearch 6.3: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index.html
.. _Github Repository: https://github.com/rtfd/readthedocs.org/tree/master/readthedocs/search
.. _Elasticsearch document: https://www.elastic.co/guide/en/elasticsearch/guide/current/document.html
.. _django-elasticsearch-dsl: https://github.com/sabricot/django-elasticsearch-dsl
.. _elasticsearch-dsl: http://elasticsearch-dsl.readthedocs.io/en/latest/
.. _Signals: https://docs.djangoproject.com/en/2.1/topics/signals/

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -95,6 +95,7 @@ Information about development is also available:
 
 changelog
 install
+development/search
 architecture
 tests
 docs

docs/install.rst

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ need to install Python 2.7 with virtualenv in your system as well.
 If you want full support for searching inside your Read the Docs
 site you will need to install Elasticsearch_.
 
-Ubuntu users could install this package by following :doc:`/custom_installs/elasticsearch`.
+Follow the :doc:`/development/search` documentation for more instructions.
 
 .. note::
 

readthedocs/core/management/commands/provision_elasticsearch.py

Lines changed: 0 additions & 33 deletions
This file was deleted.
