[GSoC 2018] All Search Improvements #4636
Changes from 77 commits
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
Search | ||
============ | ||
Review comment: nitpick: |
||
|
||
Read the Docs uses Elasticsearch_ instead of the built-in Sphinx search to provide better search | ||
results. Documents are indexed into an Elasticsearch index, and searches are made through the API. | ||
All of the search code is open source and lives in the `Github Repository`_. | ||
Review comment: np: |
||
We currently use `Elasticsearch 6.3`_. | ||
|
||
Local Development Configuration | ||
------------------------------- | ||
|
||
Installing and running Elasticsearch | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
You need to install and run Elasticsearch_ version 6.3 on your local development machine. | ||
The installation instructions are available | ||
`here <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_. | ||
Alternatively, you can start an Elasticsearch Docker container by running the following command:: | ||
|
||
docker run -p 9200:9200 -p 9300:9300 \ | ||
-e "discovery.type=single-node" \ | ||
docker.elastic.co/elasticsearch/elasticsearch:6.3.2 | ||
|
||
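Once the node is running, it can help to confirm that it answers on port 9200 and reports a 6.3.x version. The following is a minimal sketch, not part of the Read the Docs codebase: the helper names are hypothetical, and it assumes the node is reachable at ``http://localhost:9200`` as in the Docker command above.

```python
import json
from urllib.request import urlopen


def parse_es_version(info):
    """Return the version string from the JSON served at the node's root URL."""
    return info["version"]["number"]


def is_supported(version, required_prefix="6.3."):
    """Check that the reported version belongs to the 6.3 series."""
    return version.startswith(required_prefix)


def fetch_es_version(base_url="http://localhost:9200", timeout=5):
    """Ask a running Elasticsearch node for its version (needs a live node)."""
    with urlopen(base_url, timeout=timeout) as response:
        return parse_es_version(json.load(response))
```

With a node running, ``is_supported(fetch_es_version())`` should be true for the 6.3.2 image used above.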
Indexing into Elasticsearch | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
To use search, you need to index data into the Elasticsearch index. Run the `reindex_elasticsearch` | ||
Review comment: reindex_elasticsearch should be enclosed in double ` |
||
management command:: | ||
|
||
./manage.py reindex_elasticsearch | ||
|
||
For performance reasons, we implemented our own management command rather than using | ||
the built-in management command provided by the `django-elasticsearch-dsl`_ package. | ||
|
||
Auto Indexing | ||
^^^^^^^^^^^^^ | ||
By default, auto indexing is turned off in development mode. To turn it on, change the | ||
`ELASTICSEARCH_DSL_AUTOSYNC` setting to `True` in the `readthedocs/settings/dev.py` file. | ||
Review comment: Same here, double ` |
||
After that, whenever documentation builds successfully or a project is added, | ||
the search index will update automatically. | ||
|
||
|
||
Architecture | ||
------------ | ||
The search architecture is divided into two parts. | ||
One part is responsible for **indexing** the documents and projects, and | ||
the other part is responsible for querying the index to show the proper results to users. | ||
We mostly rely on the `django-elasticsearch-dsl`_ package to keep search working. | ||
`django-elasticsearch-dsl`_ is a wrapper around `elasticsearch-dsl`_ for easy configuration | ||
with Django. | ||
|
||
Indexing | ||
^^^^^^^^ | ||
All Sphinx documents are indexed into Elasticsearch after a successful build. | ||
Currently, we do not index MkDocs documents into Elasticsearch, but | ||
`any kind of help is welcome <https://github.com/rtfd/readthedocs.org/issues/1088>`_. | ||
|
||
How we index documentation | ||
~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
After a build finishes successfully, `HTMLFile` objects are created for each of the | ||
`HTML` files and the old version's `HTMLFile` objects are deleted. By default, | ||
Review comment: Same here, double ` |
||
the `django-elasticsearch-dsl`_ package listens to the `post_save`/`post_delete` signals | ||
to index/delete documents, but this has performance drawbacks because it sends an HTTP request whenever | ||
an `HTMLFile` object is created or deleted. To improve performance, the `bulk_post_create` | ||
and `bulk_post_delete` Signals_ are dispatched with a list of `HTMLFile` objects, so it is possible | ||
to bulk index documents in Elasticsearch (`bulk_post_create` is dispatched for created objects | ||
and `bulk_post_delete` is dispatched for deleted objects). Both signals are dispatched | ||
with the list of `HTMLFile` instances in the `instance_list` parameter. | ||
|
||
We listen to the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application | ||
and index/delete the documentation content from the `HTMLFile` instances. | ||
|
||
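To illustrate why a bulk signal helps, here is a small self-contained sketch. It is not the Read the Docs implementation: ``Signal`` below is a tiny stand-in for ``django.dispatch.Signal`` so the example runs without Django, and ``FakeSearchBackend`` is a hypothetical object that only counts simulated requests. The signal name and the ``instance_list`` parameter come from the text above.

```python
class Signal:
    """Tiny stand-in for django.dispatch.Signal (illustration only)."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, sender, **kwargs):
        # Call every connected receiver with the same keyword arguments.
        return [(r, r(sender=sender, **kwargs)) for r in self._receivers]


class FakeSearchBackend:
    """Hypothetical backend that counts simulated HTTP requests."""

    def __init__(self):
        self.requests = 0
        self.indexed = []

    def index_many(self, documents):
        # One simulated request, no matter how many documents are sent.
        self.requests += 1
        self.indexed.extend(documents)


bulk_post_create = Signal()
backend = FakeSearchBackend()


def index_html_files(sender, instance_list, **kwargs):
    """Receiver: index all files from one build in a single request."""
    backend.index_many(instance_list)


bulk_post_create.connect(index_html_files)

# Strings stand in for HTMLFile instances created by one build.
html_files = ["index.html", "install.html", "api.html"]
bulk_post_create.send(sender="HTMLFile", instance_list=html_files)
```

A per-object signal would have triggered three simulated requests here, while the bulk signal triggers one.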
|
||
How we index projects | ||
~~~~~~~~~~~~~~~~~~~~~ | ||
We also index project information in our search index so that users can search for projects | ||
from the main site. `django-elasticsearch-dsl`_ listens to the `post_save` and `post_delete` signals of | ||
the `Project` model and indexes/deletes projects in Elasticsearch accordingly. | ||
|
||
|
||
Elasticsearch Document | ||
~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
`elasticsearch-dsl`_ provides a model-like wrapper for the `Elasticsearch document`_. | ||
As required by `django-elasticsearch-dsl`_, our documents are defined in the | ||
`readthedocs/search/documents.py` file. | ||
|
||
**ProjectDocument:** It is used for indexing projects. The signal listener of | ||
`django-elasticsearch-dsl`_ listens to the `post_save` signal of the `Project` model and | ||
then indexes/deletes the project in Elasticsearch. | ||
|
||
**PageDocument:** It is used for indexing the documentation of projects. By default, auto | ||
indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`. | ||
`settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production. | ||
As mentioned above, our `Search` app listens to the `bulk_post_create` and `bulk_post_delete` | ||
signals and indexes/deletes documentation in Elasticsearch. The signal listeners are in | ||
the `readthedocs/search/signals.py` file. Both signals are dispatched | ||
after a successful documentation build. | ||
|
||
The fields and Elasticsearch datatypes are specified in the `PageDocument`. The indexable data is taken | ||
from the `processed_json` property of `HTMLFile`. This property provides a Python dictionary with | ||
document data such as `title`, `headers`, and `content`. | ||
|
||
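As a rough sketch of how dictionaries like the ones from `processed_json` could be turned into bulk index actions: the action shape below follows the dictionary convention accepted by the ``elasticsearch.helpers.bulk`` helper of the Python client. The function name, the ``id`` key, the index name ``page``, and the mapping type ``doc`` are assumptions made for illustration; only `title`, `headers`, and `content` come from the text above.

```python
def build_bulk_actions(pages, index_name="page", doc_type="doc"):
    """Yield one bulk "index" action per page dictionary.

    `pages` is an iterable of dicts shaped like the processed_json data,
    plus a hypothetical "id" key used as the document id.
    """
    for page in pages:
        yield {
            "_op_type": "index",
            "_index": index_name,  # assumed index name
            "_type": doc_type,     # Elasticsearch 6.x still expects a mapping type
            "_id": page["id"],
            "_source": {
                "title": page.get("title", ""),
                "headers": page.get("headers", []),
                "content": page.get("content", ""),
            },
        }


sample_pages = [
    {"id": 1, "title": "Install", "headers": ["Requirements"], "content": "pip install"},
]
actions = list(build_bulk_actions(sample_pages))
```

The resulting generator could be handed to a bulk helper so that all pages of a build travel in a single request.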
|
||
.. _Elasticsearch: https://www.elastic.co/products/elasticsearch | ||
.. _Elasticsearch 6.3: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index.html | ||
.. _Github Repository: https://github.com/rtfd/readthedocs.org/tree/master/readthedocs/search | ||
.. _Elasticsearch document: https://www.elastic.co/guide/en/elasticsearch/guide/current/document.html | ||
.. _django-elasticsearch-dsl: https://github.com/sabricot/django-elasticsearch-dsl | ||
.. _elasticsearch-dsl: http://elasticsearch-dsl.readthedocs.io/en/latest/ | ||
.. _Signals: https://docs.djangoproject.com/en/2.1/topics/signals/ |
This file was deleted.
This file was deleted.
Review comment: Can remove this.