Commit 9857e01

Squash commits
1 parent 5d4da21 commit 9857e01

60 files changed: +1516 -1423 lines changed

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ python:
33
- 2.7
44
- 3.6
55
env:
6-
- ES_VERSION=1.3.9 ES_DOWNLOAD_URL=https://download.elastic.co/elasticsearch/elasticsearch/elasticsearch-${ES_VERSION}.tar.gz
6+
- ES_VERSION=6.2.4 ES_DOWNLOAD_URL=https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-${ES_VERSION}.tar.gz
77
matrix:
88
include:
99
- python: 3.6

conftest.py

Lines changed: 6 additions & 0 deletions
@@ -1,5 +1,7 @@
11
# -*- coding: utf-8 -*-
22
import pytest
3+
from django.conf import settings
4+
from rest_framework.test import APIClient
35

46
try:
57
# TODO: this file is read/executed even when called from ``readthedocsinc``,
@@ -44,3 +46,7 @@ def pytest_configure(config):
4446
@pytest.fixture(autouse=True)
4547
def settings_modification(settings):
4648
settings.CELERY_ALWAYS_EAGER = True
49+
50+
@pytest.fixture
51+
def api_client():
52+
return APIClient()

docs/custom_installs/elasticsearch.rst

Lines changed: 0 additions & 108 deletions
This file was deleted.

docs/development/search.rst

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
1+
Search
2+
======
3+
4+
Read the Docs uses Elasticsearch_ instead of the built-in Sphinx search to provide better search
5+
results. Documents are indexed in the Elasticsearch index and the search is made through the API.
6+
All the search code is open source and lives in the `GitHub Repository`_.
7+
We currently use `Elasticsearch 6.3`_.
8+
9+
Local Development Configuration
10+
-------------------------------
11+
12+
Installing and running Elasticsearch
13+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14+
You need to install and run Elasticsearch_ version 6.3 on your local development machine.
15+
You can get the installation instructions
16+
`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
17+
Alternatively, you can start an Elasticsearch Docker container by running the following command::
18+
19+
docker run -p 9200:9200 -p 9300:9300 \
20+
-e "discovery.type=single-node" \
21+
docker.elastic.co/elasticsearch/elasticsearch:6.3.2
22+
23+
Indexing into Elasticsearch
24+
^^^^^^^^^^^^^^^^^^^^^^^^^^^
25+
To use search, you need to index data into Elasticsearch. Run the ``reindex_elasticsearch``
26+
management command::
27+
28+
./manage.py reindex_elasticsearch
29+
30+
For better performance, we implemented our own version of the management command rather than using
31+
the built-in management command provided by the `django-elasticsearch-dsl`_ package.
32+
33+
Auto Indexing
34+
^^^^^^^^^^^^^
35+
By default, auto indexing is turned off in development mode. To turn it on, change the
36+
``ELASTICSEARCH_DSL_AUTOSYNC`` setting to `True` in the `readthedocs/settings/dev.py` file.
37+
After that, whenever documentation builds successfully or a project is added,
38+
the search index will update automatically.
39+
40+
41+
Architecture
42+
------------
43+
The search architecture is divided into two parts.
44+
One part is responsible for **indexing** the documents and projects and
45+
the other part is responsible for querying the Index to show the proper results to users.
46+
We rely mostly on the `django-elasticsearch-dsl`_ package to keep search working.
47+
`django-elasticsearch-dsl`_ is a wrapper around `elasticsearch-dsl`_ for easy configuration
48+
with Django.
49+
50+
Indexing
51+
^^^^^^^^
52+
All the Sphinx documents are indexed into Elasticsearch after the build is successful.
53+
Currently, we do not index MkDocs documents into Elasticsearch, but
54+
`any kind of help is welcome <https://github.com/rtfd/readthedocs.org/issues/1088>`_.
55+
56+
How we index documentation
57+
~~~~~~~~~~~~~~~~~~~~~~~~~~~
58+
59+
After any build is successfully finished, `HTMLFile` objects are created for each of the
60+
``HTML`` files and the old version's `HTMLFile` objects are deleted. By default, the
61+
`django-elasticsearch-dsl`_ package listens to the `post_save`/`post_delete` signals
62+
to index/delete documents, but this has performance drawbacks because it sends an HTTP request whenever
63+
any `HTMLFile` object is created or deleted. To improve performance, `bulk_post_create`
64+
and `bulk_post_delete` Signals_ are dispatched with a list of `HTMLFile` objects so it's possible
65+
to bulk index documents in Elasticsearch (the `bulk_post_create` signal is dispatched for created
66+
and `bulk_post_delete` is dispatched for deleted objects). Both of the signals are dispatched
67+
with the list of `HTMLFile` instances in the `instance_list` parameter.
68+
69+
We listen to the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application
70+
and index/delete the documentation content from the `HTMLFile` instances.
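The bulk-signal flow described above can be sketched roughly as follows. This is only an illustration, not the Read the Docs implementation: ``BulkSignal`` is a hypothetical stand-in for Django's ``django.dispatch.Signal``, and the receiver names and the ``requests_made`` list are invented for the example. The point it tries to show is that one dispatch carrying an ``instance_list`` results in one indexing call, however many files a build produced.

```python
class BulkSignal:
    """Hypothetical stand-in for django.dispatch.Signal (illustration only)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Call every connected receiver once, passing keyword arguments through.
        return [(receiver, receiver(sender=sender, **kwargs))
                for receiver in self._receivers]


bulk_post_create = BulkSignal()
bulk_post_delete = BulkSignal()

# Each entry stands in for one bulk request that would go to Elasticsearch.
requests_made = []


def index_html_files(sender, instance_list, **kwargs):
    """Receiver (assumed name): index all created objects in one request."""
    requests_made.append(("index", list(instance_list)))


def remove_html_files(sender, instance_list, **kwargs):
    """Receiver (assumed name): delete all removed objects in one request."""
    requests_made.append(("delete", list(instance_list)))


bulk_post_create.connect(index_html_files)
bulk_post_delete.connect(remove_html_files)

# After a build: one dispatch per signal, regardless of how many files changed.
bulk_post_create.send(sender="HTMLFile",
                      instance_list=["index.html", "api.html", "faq.html"])
bulk_post_delete.send(sender="HTMLFile", instance_list=["old.html"])
```

With per-object signals the same build would have triggered four separate requests; here ``requests_made`` ends up with two entries.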
71+
72+
73+
How we index projects
74+
~~~~~~~~~~~~~~~~~~~~~
75+
We also index project information in our search index so that the user can search for projects
76+
from the main site. `django-elasticsearch-dsl`_ listens to the `post_save` and `post_delete` signals of
77+
the `Project` model and indexes into or deletes from Elasticsearch accordingly.
78+
79+
80+
Elasticsearch Document
81+
~~~~~~~~~~~~~~~~~~~~~~
82+
83+
`elasticsearch-dsl`_ provides a model-like wrapper for the `Elasticsearch document`_.
84+
As required by `django-elasticsearch-dsl`_, the documents are stored in the
85+
`readthedocs/search/documents.py` file.
86+
87+
**ProjectDocument**: It is used for indexing projects. The signal listener of
88+
`django-elasticsearch-dsl`_ listens to the `post_save` signal of the `Project` model and
89+
then indexes into or deletes from Elasticsearch.
90+
91+
**PageDocument**: It is used for indexing documentation of projects. By default, the auto
92+
indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`.
93+
`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production.
94+
As mentioned above, our `Search` app listens to the `bulk_post_create` and `bulk_post_delete`
95+
signals and indexes/deletes documentation in Elasticsearch. The signal listeners are in
96+
the `readthedocs/search/signals.py` file. Both of the signals are dispatched
97+
after a successful documentation build.
98+
99+
The fields and ES Datatypes are specified in the `PageDocument`. The indexable data is taken
100+
from the `processed_json` property of `HTMLFile`. This property provides a Python dictionary with
101+
document data such as `title`, `headers`, and `content`.
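As a rough sketch of how such a dictionary might be reduced to the fields that get indexed: the helper name ``to_search_document`` and the sample values are hypothetical, and the real ``processed_json`` may carry other keys. Only ``title``, ``headers`` and ``content`` are taken from the description above.

```python
def to_search_document(processed_json):
    """Pick the searchable fields out of a processed_json-style dictionary.

    Hypothetical helper: missing keys fall back to empty values so that a
    partially processed page still yields a well-formed document.
    """
    return {
        "title": processed_json.get("title", ""),
        "headers": list(processed_json.get("headers", [])),
        "content": processed_json.get("content", ""),
    }


# Sample input; the "path" key is an assumption and is deliberately dropped.
example = {
    "title": "Install",
    "headers": ["Requirements", "Setup"],
    "content": "Install the requirements, then run the server.",
    "path": "install",
}

doc = to_search_document(example)
```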
102+
103+
104+
.. _Elasticsearch: https://www.elastic.co/products/elasticsearch
105+
.. _Elasticsearch 6.3: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index.html
106+
.. _GitHub Repository: https://github.com/rtfd/readthedocs.org/tree/master/readthedocs/search
107+
.. _Elasticsearch document: https://www.elastic.co/guide/en/elasticsearch/guide/current/document.html
108+
.. _django-elasticsearch-dsl: https://github.com/sabricot/django-elasticsearch-dsl
109+
.. _elasticsearch-dsl: http://elasticsearch-dsl.readthedocs.io/en/latest/
110+
.. _Signals: https://docs.djangoproject.com/en/2.1/topics/signals/

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -114,6 +114,7 @@ to help you create fantastic documentation for your project.
114114

115115
changelog
116116
install
117+
development/search
117118
architecture
118119
tests
119120
docs

docs/install.rst

Lines changed: 2 additions & 4 deletions
@@ -19,8 +19,8 @@ Additionally Read the Docs depends on:
1919
* `Redis`_
2020
* `Elasticsearch`_ (only if you want full support for searching inside the site)
2121

22-
* Ubuntu users could install this package by following :doc:`/custom_installs/elasticsearch`.
23-
22+
* Follow the :doc:`/development/search` documentation for further instructions.
2424
.. note::
2525

2626
If you plan to import Python 2 projects to your RTD,
@@ -56,8 +56,6 @@ you need these libraries.
5656

5757
.. tab:: CentOS/RHEL 7
5858

59-
Install::
60-
6159
sudo yum install python-devel python-pip libxml2-devel libxslt-devel
6260

6361
.. tab:: Other OS

docs/settings.rst

Lines changed: 78 additions & 0 deletions
@@ -100,3 +100,81 @@ ALLOW_ADMIN
100100
Default: :djangosetting:`ALLOW_ADMIN`
101101

102102
Whether to include `django.contrib.admin` in the URLs.
103+
104+
105+
ELASTICSEARCH_DSL
106+
-----------------
107+
108+
Default:
109+
110+
.. code-block:: python
111+
112+
{
113+
'default': {
114+
'hosts': '127.0.0.1:9200'
115+
},
116+
}
117+
118+
Settings for the Elasticsearch connection.
119+
These settings are then passed to `elasticsearch-dsl-py.connections.configure`_.
120+
121+
122+
ES_INDEXES
123+
----------
124+
125+
Default:
126+
127+
.. code-block:: python
128+
129+
{
130+
'project': {
131+
'name': 'project_index',
132+
'settings': {'number_of_shards': 5,
133+
'number_of_replicas': 0
134+
}
135+
},
136+
'page': {
137+
'name': 'page_index',
138+
'settings': {
139+
'number_of_shards': 5,
140+
'number_of_replicas': 0,
141+
}
142+
},
143+
}
144+
145+
Defines the Elasticsearch name and settings of each index separately.
146+
The key is the type of index, like ``project`` or ``page`` and the value is another
147+
dictionary containing ``name`` and ``settings``. Here the ``name`` is the index name
148+
and the ``settings`` is used for configuring the particular index.
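A minimal sketch of how code might look up an index's name and settings from this dictionary. The ``index_config`` helper is hypothetical and not part of the codebase; the dictionary simply repeats the default shown above.

```python
ES_INDEXES = {
    'project': {
        'name': 'project_index',
        'settings': {'number_of_shards': 5, 'number_of_replicas': 0},
    },
    'page': {
        'name': 'page_index',
        'settings': {'number_of_shards': 5, 'number_of_replicas': 0},
    },
}


def index_config(index_type, indexes=ES_INDEXES):
    """Return (name, settings) for an index type such as 'project' or 'page'."""
    try:
        conf = indexes[index_type]
    except KeyError:
        # Fail loudly on a typo instead of silently creating a new index.
        raise ValueError("Unknown index type: {!r}".format(index_type))
    return conf['name'], conf['settings']


name, index_settings = index_config('page')
```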
149+
150+
151+
ES_TASK_CHUNK_SIZE
152+
------------------
153+
154+
Default: :djangosetting:`ES_TASK_CHUNK_SIZE`
155+
156+
The maximum number of objects sent to each Elasticsearch indexing Celery task.
157+
This is used when running the ``reindex_elasticsearch`` management command.
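The chunking idea can be illustrated with a small generator. This is a sketch under assumptions: ``chunk`` is a hypothetical helper, the chunk size of 4 is arbitrary (the actual default is not stated here), and in the real project each batch would presumably be handed to a Celery task rather than collected in a list.

```python
def chunk(items, chunk_size):
    """Yield successive lists of at most ``chunk_size`` items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


# Ten object ids split into batches of at most four.
object_ids = list(range(1, 11))
batches = list(chunk(object_ids, chunk_size=4))
```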
158+
159+
160+
ES_PAGE_IGNORE_SIGNALS
161+
----------------------
162+
163+
Default: ``False``
164+
165+
This setting determines whether each page is indexed separately into Elasticsearch.
166+
If the setting is ``True``, each ``HTML`` page will not be indexed separately but will be
167+
indexed by bulk indexing.
168+
169+
170+
ELASTICSEARCH_DSL_AUTOSYNC
171+
--------------------------
172+
173+
Default: ``True``
174+
175+
This setting controls whether objects are indexed into Elasticsearch automatically.
176+
It is ``False`` by default in development so that it is possible to create
177+
projects and build documentation without running Elasticsearch.
178+
179+
180+
.. _elasticsearch-dsl-py.connections.configure: https://elasticsearch-dsl.readthedocs.io/en/stable/configuration.html#multiple-clusters
