| 1 | +Search |
| 2 | +====== |
| 3 | + |
| 4 | +Read the Docs uses Elasticsearch_ instead of the built-in Sphinx search to provide better search
| 5 | +results. Documents are indexed in the Elasticsearch index and searches are made through the API.
| 6 | +All of the search code is open source and lives in the `GitHub Repository`_.
| 7 | +We currently use `Elasticsearch 6.3`_.
| 8 | + |
| 9 | +Local Development Configuration |
| 10 | +------------------------------- |
| 11 | + |
| 12 | +Installing and running Elasticsearch |
| 13 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 14 | +You need to install and run Elasticsearch_ version 6.3 on your local development machine.
| 15 | +Follow the
| 16 | +`installation instructions <https://www.elastic.co/guide/en/elasticsearch/reference/6.3/install-elasticsearch.html>`_.
| 17 | +Alternatively, you can start an Elasticsearch Docker container by running the following command::
| 18 | + |
| 19 | + docker run -p 9200:9200 -p 9300:9300 \ |
| 20 | + -e "discovery.type=single-node" \ |
| 21 | + docker.elastic.co/elasticsearch/elasticsearch:6.3.2 |
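
Once the node is up, a quick way to confirm that it is reachable and reports the expected version is to query its root endpoint. The snippet below is a minimal sketch, assuming the node listens on ``http://localhost:9200`` (the port published by the Docker command above); the helper names ``parse_version`` and ``elasticsearch_version`` are made up for this example and are not part of the Read the Docs code base.

.. code-block:: python

    import json
    from urllib.request import urlopen


    def parse_version(body: bytes) -> str:
        """Extract the version string from the JSON body of the root endpoint."""
        return json.loads(body)["version"]["number"]


    def elasticsearch_version(base_url: str = "http://localhost:9200") -> str:
        """Ask a running node for its version.

        Raises urllib.error.URLError if nothing is listening on base_url.
        """
        with urlopen(base_url, timeout=5) as response:
            return parse_version(response.read())

Calling ``elasticsearch_version()`` against the container started above should return a string beginning with ``6.3``.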
| 22 | + |
| 23 | +Indexing into Elasticsearch |
| 24 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 25 | +To use search, you need to index data into the Elasticsearch index. Run the ``reindex_elasticsearch``
| 26 | +management command::
| 27 | + |
| 28 | + ./manage.py reindex_elasticsearch |
| 29 | + |
| 30 | +For better performance, we implemented our own management command rather than using
| 31 | +the built-in management command provided by the `django-elasticsearch-dsl`_ package.
| 32 | + |
| 33 | +Auto Indexing |
| 34 | +^^^^^^^^^^^^^ |
| 35 | +By default, auto indexing is turned off in development mode. To turn it on, change the
| 36 | +``ELASTICSEARCH_DSL_AUTOSYNC`` setting to `True` in the `readthedocs/settings/dev.py` file.
| 37 | +After that, whenever documentation builds successfully or a project is added,
| 38 | +the search index is updated automatically.
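
As a minimal sketch of the change described above, the override in `readthedocs/settings/dev.py` could be a single assignment. Where exactly the line belongs in that file (at module level or inside a settings class) is an assumption here and should be checked against the file itself.

.. code-block:: python

    # Assumed placement: put this wherever dev.py defines its other overrides.
    # Turns on automatic index updates after builds and project changes.
    ELASTICSEARCH_DSL_AUTOSYNC = True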
| 39 | + |
| 40 | + |
| 41 | +Architecture |
| 42 | +------------ |
| 43 | +The search architecture is divided into two parts.
| 44 | +One part is responsible for **indexing** the documents and projects, and
| 45 | +the other part is responsible for querying the index to show the proper results to users.
| 46 | +We rely mostly on the `django-elasticsearch-dsl`_ package to keep search working.
| 47 | +`django-elasticsearch-dsl`_ is a wrapper around `elasticsearch-dsl`_ for easy configuration
| 48 | +with Django.
| 49 | + |
| 50 | +Indexing |
| 51 | +^^^^^^^^ |
| 52 | +All Sphinx documents are indexed into Elasticsearch after a successful build.
| 53 | +Currently, we do not index MkDocs documents into Elasticsearch, but
| 54 | +`any kind of help is welcome <https://github.com/rtfd/readthedocs.org/issues/1088>`_. |
| 55 | + |
| 56 | +How we index documentation
| 57 | +~~~~~~~~~~~~~~~~~~~~~~~~~~
| 58 | + |
| 59 | +After a build finishes successfully, `HTMLFile` objects are created for each of the
| 60 | +``HTML`` files and the old version's `HTMLFile` objects are deleted. By default, the
| 61 | +`django-elasticsearch-dsl`_ package listens to the `post_save`/`post_delete` signals
| 62 | +to index/delete documents, but this has performance drawbacks because it sends an HTTP request whenever
| 63 | +an `HTMLFile` object is created or deleted. To optimize performance, the `bulk_post_create`
| 64 | +and `bulk_post_delete` Signals_ are dispatched with a list of `HTMLFile` objects, so it is possible
| 65 | +to bulk index documents in Elasticsearch (the `bulk_post_create` signal is dispatched for created objects
| 66 | +and `bulk_post_delete` is dispatched for deleted objects). Both signals are dispatched
| 67 | +with the list of `HTMLFile` instances in the `instance_list` parameter.
| 68 | + |
| 69 | +We listen to the `bulk_post_create` and `bulk_post_delete` signals in our `Search` application |
| 70 | +and index/delete the documentation content from the `HTMLFile` instances. |
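
The benefit of dispatching one signal per batch rather than one per object can be illustrated without Django or Elasticsearch. The sketch below is not the Read the Docs implementation: ``Signal`` is a tiny stand-in that mimics the ``connect``/``send`` interface of ``django.dispatch.Signal``, ``CountingIndex`` is a hypothetical client that only counts requests, and the HTML files are plain dictionaries. Only the names `bulk_post_create` and `instance_list` come from the text above.

.. code-block:: python

    class CountingIndex:
        """Hypothetical stand-in for a search client; counts requests made."""

        def __init__(self):
            self.requests = 0
            self.documents = {}

        def index_one(self, doc_id, body):
            self.requests += 1  # one request per document
            self.documents[doc_id] = body

        def index_bulk(self, docs):
            self.requests += 1  # one request for the whole batch
            self.documents.update(docs)


    class Signal:
        """Tiny stand-in mimicking django.dispatch.Signal's connect/send."""

        def __init__(self):
            self._receivers = []

        def connect(self, receiver):
            self._receivers.append(receiver)

        def send(self, sender, **named):
            return [(r, r(sender=sender, **named)) for r in self._receivers]


    bulk_post_create = Signal()
    bulk_index = CountingIndex()


    def index_html_files(sender, instance_list, **kwargs):
        """Receiver: index every object in instance_list with one request."""
        bulk_index.index_bulk({f["path"]: f["content"] for f in instance_list})


    bulk_post_create.connect(index_html_files)

    html_files = [
        {"path": f"page-{n}.html", "content": f"Page {n}"} for n in range(3)
    ]

    # Bulk path: one signal, one request for all three files.
    bulk_post_create.send(sender="HTMLFile", instance_list=html_files)

    # Per-object path, for comparison: one request per file.
    per_object_index = CountingIndex()
    for html_file in html_files:
        per_object_index.index_one(html_file["path"], html_file["content"])

Both indexes end up holding the same three documents, but the bulk path makes one request where the per-object path makes three.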
| 71 | + |
| 72 | + |
| 73 | +How we index projects |
| 74 | +~~~~~~~~~~~~~~~~~~~~~ |
| 75 | +We also index project information in our search index so that users can search for projects
| 76 | +from the main site. `django-elasticsearch-dsl`_ listens to the `post_save` and `post_delete` signals of the
| 77 | +`Project` model and indexes into or deletes from Elasticsearch accordingly.
| 78 | + |
| 79 | + |
| 80 | +Elasticsearch Document |
| 81 | +~~~~~~~~~~~~~~~~~~~~~~ |
| 82 | + |
| 83 | +`elasticsearch-dsl`_ provides a model-like wrapper for the `Elasticsearch document`_.
| 84 | +As required by `django-elasticsearch-dsl`_, our document classes are stored in the
| 85 | +`readthedocs/search/documents.py` file.
| 86 | + |
| 87 | + **ProjectDocument:** It is used for indexing projects. The signal listener of
| 88 | + `django-elasticsearch-dsl`_ listens to the `post_save` signal of the `Project` model and
| 89 | + then indexes/deletes into Elasticsearch.
| 90 | + |
| 91 | + **PageDocument**: It is used for indexing documentation of projects. By default, the auto |
| 92 | + indexing is turned off by `ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS`. |
| 93 | + `settings.ES_PAGE_IGNORE_SIGNALS` is `False` both in development and production. |
| 94 | + As mentioned above, our `Search` app listens to the `bulk_post_create` and `bulk_post_delete`
| 95 | + signals and indexes/deletes documentation in Elasticsearch. The signal listeners are in
| 96 | + the `readthedocs/search/signals.py` file. Both signals are dispatched
| 97 | + after a successful documentation build.
| 98 | + |
| 99 | + The fields and ES datatypes are specified in the `PageDocument`. The indexable data is taken
| 100 | + from the `processed_json` property of `HTMLFile`. This property provides a Python dictionary with
| 101 | + document data such as `title`, `headers`, and `content`.
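
To make the shape of that data concrete, the sketch below builds an indexable document from a dictionary resembling `processed_json`. It is illustrative only: the function name ``to_search_document`` and the sample values are invented for this example, and the real dictionary may carry keys beyond the `title`, `headers`, and `content` named above.

.. code-block:: python

    def to_search_document(processed_json: dict) -> dict:
        """Keep only the fields named in the text, with empty defaults."""
        return {
            "title": processed_json.get("title", ""),
            "headers": list(processed_json.get("headers", [])),
            "content": processed_json.get("content", ""),
        }


    # Hypothetical sample; "path" stands in for any extra key that is not indexed.
    sample = {
        "title": "Search",
        "headers": ["Architecture", "Indexing"],
        "content": "Documents are indexed in the Elasticsearch index.",
        "path": "search",
    }
    document = to_search_document(sample)

Selecting fields explicitly keeps the indexed document aligned with the fields declared on the document class, whatever else the source dictionary contains.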
| 102 | + |
| 103 | + |
| 104 | +.. _Elasticsearch: https://www.elastic.co/products/elasticsearch |
| 105 | +.. _Elasticsearch 6.3: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/index.html |
| 106 | +.. _GitHub Repository: https://github.com/rtfd/readthedocs.org/tree/master/readthedocs/search |
| 107 | +.. _Elasticsearch document: https://www.elastic.co/guide/en/elasticsearch/guide/current/document.html |
| 108 | +.. _django-elasticsearch-dsl: https://github.com/sabricot/django-elasticsearch-dsl |
| 109 | +.. _elasticsearch-dsl: http://elasticsearch-dsl.readthedocs.io/en/latest/ |
| 110 | +.. _Signals: https://docs.djangoproject.com/en/2.1/topics/signals/ |