Commit 97e3ad8

Revert "Revert "Merge pull request #4636 from rtfd/search_upgrade" (#4716)"
This reverts commit 183b176.
1 parent 183b176 commit 97e3ad8

59 files changed, with 1431 additions and 1436 deletions

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ python:
   - 3.6
 sudo: false
 env:
-  - ES_VERSION=1.3.9 ES_DOWNLOAD_URL=https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-${ES_VERSION}.tar.gz
+  - ES_VERSION=6.2.4 ES_DOWNLOAD_URL=https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${ES_VERSION}.tar.gz
 matrix:
   include:
     - python: 3.6

conftest.py

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,7 @@
 # -*- coding: utf-8 -*-
 import pytest
+from django.conf import settings
+from rest_framework.test import APIClient

 try:
     # TODO: this file is read/executed even when called from ``readthedocsinc``,
@@ -44,3 +46,7 @@ def pytest_configure(config):
 @pytest.fixture(autouse=True)
 def settings_modification(settings):
     settings.CELERY_ALWAYS_EAGER = True
+
+@pytest.fixture
+def api_client():
+    return APIClient()

docs/custom_installs/elasticsearch.rst

Lines changed: 0 additions & 108 deletions
This file was deleted.

docs/development/search.rst

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
+Search
+======
+
+Read the Docs uses Elasticsearch_ instead of the built-in Sphinx search to provide better search
+results. Documents are indexed in the Elasticsearch index and searches are made through the API.
+All the search code is open source and lives in the `GitHub Repository`_.
+We currently use `Elasticsearch 6.3`_.
+
+Local Development Configuration
+-------------------------------
+
+Installing and running Elasticsearch
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+You need to install and run Elasticsearch_ version 6.3 on your local development machine.
+You can find the installation instructions
+`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
+Alternatively, you can start an Elasticsearch Docker container by running the following command::
+
+    docker run -p 9200:9200 -p 9300:9300 \
+           -e "discovery.type=single-node" \
+           docker.elastic.co/elasticsearch/elasticsearch:6.3.2
+
+Indexing into Elasticsearch
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+To use search, you need to index data into the Elasticsearch index. Run the ``reindex_elasticsearch``
+management command::
+
+    ./manage.py reindex_elasticsearch
+
+For performance reasons, we implemented our own management command rather than using
+the built-in management command provided by the `django-elasticsearch-dsl`_ package.
+
+Auto Indexing
+^^^^^^^^^^^^^
+By default, auto indexing is turned off in development mode. To turn it on, change the
+``ELASTICSEARCH_DSL_AUTOSYNC`` setting to `True` in the `readthedocs/settings/dev.py` file.
+After that, whenever documentation builds successfully or a project is added,
+the search index will update automatically.
+
+
+Architecture
+------------
+The search architecture is divided into two parts.
+One part is responsible for **indexing** the documents and projects, and
+the other part is responsible for querying the index to show the proper results to users.
+We mostly rely on the `django-elasticsearch-dsl`_ package to keep the search working.
+`django-elasticsearch-dsl`_ is a wrapper around `elasticsearch-dsl`_ for easy configuration
+with Django.
+
+Indexing
+^^^^^^^^
+All Sphinx documents are indexed into Elasticsearch after a successful build.
+Currently, we do not index MkDocs documents into Elasticsearch, but
+`any kind of help is welcome <https://github.com/rtfd/readthedocs.org/issues/1088>`_.
+
+How we index documentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After a build finishes successfully, `HTMLFile` objects are created for each of the
+``HTML`` files and the old version's `HTMLFile` objects are deleted. By default, the
+`django-elasticsearch-dsl`_ package listens to the `post_save`/`post_delete` signals
+to index/delete documents, but this has a performance drawback: it sends an HTTP request whenever
+an `HTMLFile` object is created or deleted. To improve performance, the `bulk_post_create`
+and `bulk_post_delete` Signals_ are dispatched with a list of `HTMLFile` objects so that it is possible
+to bulk index documents in Elasticsearch (`bulk_post_create` is dispatched for created objects
+and `bulk_post_delete` is dispatched for deleted objects). Both signals are dispatched
+with the list of `HTMLFile` instances in the `instance_list` parameter.
+
+We listen to the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application
+and index/delete the documentation content from the `HTMLFile` instances.
+
+
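
The benefit of dispatching one signal per batch, rather than one per object, can be sketched in plain Python. This is only an illustration under stated assumptions, not the Read the Docs implementation: `BulkSignal`, `FakeIndex` and `index_html_files` are hypothetical stand-ins for Django's `Signal`, the Elasticsearch client and the `Search` app's receiver. Only the signal name and the `instance_list` keyword come from the text above.

```python
class BulkSignal:
    """Hypothetical, minimal stand-in for ``django.dispatch.Signal``."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Call every receiver once and collect (receiver, result) pairs.
        return [(r, r(sender=sender, **kwargs)) for r in self._receivers]


class FakeIndex:
    """Hypothetical in-memory index that counts simulated HTTP requests."""

    def __init__(self):
        self.requests = 0
        self.documents = {}

    def bulk_index(self, docs):
        self.requests += 1  # one request, however many documents it carries
        for doc in docs:
            self.documents[doc["path"]] = doc


index = FakeIndex()
bulk_post_create = BulkSignal()


def index_html_files(sender, instance_list, **kwargs):
    """Receiver: index the whole batch in a single request."""
    index.bulk_index(instance_list)


bulk_post_create.connect(index_html_files)

# Plain dicts stand in for the HTMLFile objects created by one build.
html_files = [{"path": "index.html"}, {"path": "api.html"}, {"path": "faq.html"}]
bulk_post_create.send(sender="HTMLFile", instance_list=html_files)
```

In this sketch, three files result in one simulated request. A per-object signal would call the receiver, and therefore the index, once per file.
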
+How we index projects
+~~~~~~~~~~~~~~~~~~~~~
+We also index project information in our search index so that users can search for projects
+from the main site. `django-elasticsearch-dsl`_ listens to the `post_save` and `post_delete` signals of
+the `Project` model and indexes into or deletes from Elasticsearch accordingly.
+
+
+Elasticsearch Document
+~~~~~~~~~~~~~~~~~~~~~~
+
+`elasticsearch-dsl`_ provides a model-like wrapper for the `Elasticsearch document`_.
+As required by `django-elasticsearch-dsl`_, our documents are stored in the
+`readthedocs/search/documents.py` file.
+
+**ProjectDocument:** It is used for indexing projects. The signal listener of
+`django-elasticsearch-dsl`_ listens to the `post_save` signal of the `Project` model and
+then indexes into or deletes from Elasticsearch.
+
+**PageDocument:** It is used for indexing the documentation of projects. By default, auto
+indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`.
+`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production.
+As mentioned above, our `Search` app listens to the `bulk_post_create` and `bulk_post_delete`
+signals and indexes/deletes documentation in Elasticsearch. The signal listeners are in
+the `readthedocs/search/signals.py` file. Both signals are dispatched
+after a successful documentation build.
+
+The fields and ES datatypes are specified in the `PageDocument`. The indexable data is taken
+from the `processed_json` property of `HTMLFile`. This property provides a Python dictionary with
+document data such as `title`, `headers`, and `content`.
+
+
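
As a rough illustration of the last paragraph, the sketch below picks the indexable fields out of a `processed_json`-style dictionary. It rests on assumptions: `build_page_body`, `INDEXED_FIELDS` and the sample data are hypothetical, and the real `PageDocument` may define more fields than the three named in the text.

```python
# Field names taken from the text above; the real PageDocument may define more.
INDEXED_FIELDS = ("title", "headers", "content")


def build_page_body(processed_json):
    """Return only the indexable fields present in ``processed_json``."""
    return {
        field: processed_json[field]
        for field in INDEXED_FIELDS
        if field in processed_json
    }


# Hypothetical sample of what ``HTMLFile.processed_json`` might contain.
sample = {
    "title": "Installation",
    "headers": ["Requirements", "Quick start"],
    "content": "Install the package with pip ...",
    "path": "install",  # extra key, ignored by this sketch
}
body = build_page_body(sample)
```
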
+.. _Elasticsearch: https://www.elastic.co/products/elasticsearch
+.. _Elasticsearch 6.3: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index.html
+.. _GitHub Repository: https://github.com/rtfd/readthedocs.org/tree/master/readthedocs/search
+.. _Elasticsearch document: https://www.elastic.co/guide/en/elasticsearch/guide/current/document.html
+.. _django-elasticsearch-dsl: https://github.com/sabricot/django-elasticsearch-dsl
+.. _elasticsearch-dsl: http://elasticsearch-dsl.readthedocs.io/en/latest/
+.. _Signals: https://docs.djangoproject.com/en/2.1/topics/signals/

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -96,6 +96,7 @@ Information about development is also available:

    changelog
    install
+   development/search
    architecture
    tests
    docs

docs/install.rst

Lines changed: 1 addition & 1 deletion
@@ -57,7 +57,7 @@ need to install Python 2.7 with virtualenv in your system as well.
 If you want full support for searching inside your Read the Docs
 site you will need to install Elasticsearch_.

-Ubuntu users could install this package by following :doc:`/custom_installs/elasticsearch`.
+Follow the :doc:`/development/search` documentation for more instructions.

 .. note::

docs/settings.rst

Lines changed: 78 additions & 0 deletions
@@ -100,3 +100,81 @@ ALLOW_ADMIN
 Default: :djangosetting:`ALLOW_ADMIN`

 Whether to include `django.contrib.admin` in the URLs.
+
+
+ELASTICSEARCH_DSL
+-----------------
+
+Default:
+
+.. code-block:: python
+
+    {
+        'default': {
+            'hosts': '127.0.0.1:9200'
+        },
+    }
+
+Settings for the Elasticsearch connection.
+These settings are then passed to `elasticsearch-dsl-py.connections.configure`_.
+
+
+ES_INDEXES
+----------
+
+Default:
+
+.. code-block:: python
+
+    {
+        'project': {
+            'name': 'project_index',
+            'settings': {'number_of_shards': 5,
+                         'number_of_replicas': 0
+                         }
+        },
+        'page': {
+            'name': 'page_index',
+            'settings': {
+                'number_of_shards': 5,
+                'number_of_replicas': 0,
+            }
+        },
+    }
+
+Defines the Elasticsearch index name and settings of each index separately.
+The key is the type of index, such as ``project`` or ``page``, and the value is another
+dictionary containing ``name`` and ``settings``. Here ``name`` is the index name
+and ``settings`` is used for configuring that particular index.
+
+
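
A minimal sketch of how code might read this structure, assuming the default value shown above. The helper `index_config` is hypothetical and is not part of Read the Docs or `django-elasticsearch-dsl`. It only shows that each index type maps to a `name` and a `settings` dictionary.

```python
# Copy of the documented default, so the sketch runs without Django settings.
ES_INDEXES = {
    'project': {
        'name': 'project_index',
        'settings': {'number_of_shards': 5, 'number_of_replicas': 0},
    },
    'page': {
        'name': 'page_index',
        'settings': {'number_of_shards': 5, 'number_of_replicas': 0},
    },
}


def index_config(index_type, indexes=ES_INDEXES):
    """Return ``(name, settings)`` for an index type such as 'project' or 'page'."""
    if index_type not in indexes:
        raise ValueError("unknown index type: %r" % (index_type,))
    conf = indexes[index_type]
    return conf['name'], conf['settings']


page_name, page_settings = index_config('page')
```
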
+ES_TASK_CHUNK_SIZE
+------------------
+
+Default: :djangosetting:`ES_TASK_CHUNK_SIZE`
+
+The maximum number of items sent to each Elasticsearch indexing Celery task.
+This is used when running the ``reindex_elasticsearch`` management command.
+
+
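
The chunking idea can be sketched as follows. This is an assumption about the general technique, not the code of the management command: the helper `chunk` is hypothetical, and the value `100` is illustrative rather than the project default.

```python
def chunk(items, chunk_size):
    """Yield successive lists of at most ``chunk_size`` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


ES_TASK_CHUNK_SIZE = 100  # illustrative value, not the project default
object_ids = list(range(250))

# Each batch would be handed to one indexing task.
batches = list(chunk(object_ids, ES_TASK_CHUNK_SIZE))
```

Here 250 object IDs become three batches, the last one partially filled, so no single task receives more than `ES_TASK_CHUNK_SIZE` items.
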
+ES_PAGE_IGNORE_SIGNALS
+----------------------
+
+Default: ``False``
+
+This setting determines whether each page is indexed separately into Elasticsearch.
+If the setting is ``True``, ``HTML`` pages will not be indexed one by one but will be
+indexed by bulk indexing.
+
+
+ELASTICSEARCH_DSL_AUTOSYNC
+--------------------------
+
+Default: ``True``
+
+This setting is used for automatically indexing objects into Elasticsearch.
+It is ``False`` by default in development so that it is possible to create
+projects and build documentation without running Elasticsearch.
+
+
+.. _elasticsearch-dsl-py.connections.configure: https://elasticsearch-dsl.readthedocs.io/en/stable/configuration.html#multiple-clusters

readthedocs/core/management/commands/provision_elasticsearch.py

Lines changed: 0 additions & 33 deletions
This file was deleted.
