diff --git a/docs/config-file/v2.rst b/docs/config-file/v2.rst index 88f5d3a73b4..3baf8615dff 100644 --- a/docs/config-file/v2.rst +++ b/docs/config-file/v2.rst @@ -25,7 +25,7 @@ Below is an example YAML file which shows the most common configuration options: # configuration: mkdocs.yml # Optionally build your docs in additional formats such as PDF - formats: + formats: - pdf # Optionally set the version of Python and requirements required to build your docs @@ -74,11 +74,15 @@ Example: .. code-block:: yaml + version: 2 + # Default formats: [] .. code-block:: yaml + version: 2 + # Build PDF & ePub formats: - epub @@ -90,6 +94,8 @@ Example: .. code-block:: yaml + version: 2 + # Build all formats formats: all @@ -104,6 +110,8 @@ Configuration of the Python environment to be used. .. code-block:: yaml + version: 2 + python: version: 3.7 install: @@ -153,6 +161,8 @@ Example: .. code-block:: yaml + version: 2 + python: version: 3.7 install: @@ -199,6 +209,8 @@ Example: .. code-block:: yaml + version: 2 + python: version: 3.7 install: @@ -242,6 +254,8 @@ Configuration for Conda support. .. code-block:: yaml + version: 2 + conda: environment: environment.yml @@ -260,6 +274,8 @@ Configuration for the documentation build process. .. code-block:: yaml + version: 2 + build: image: latest @@ -290,6 +306,8 @@ Configuration for Sphinx documentation .. code-block:: yaml + version: 2 + sphinx: builder: html configuration: conf.py @@ -335,6 +353,8 @@ Configuration for Mkdocs documentation. .. code-block:: yaml + version: 2 + mkdocs: configuration: mkdocs.yml fail_on_warning: false @@ -374,6 +394,8 @@ VCS submodules configuration. .. code-block:: yaml + version: 2 + submodules: include: - one @@ -394,6 +416,8 @@ List of submodules to be included. .. code-block:: yaml + version: 2 + submodules: include: all @@ -412,6 +436,8 @@ List of submodules to be excluded. .. code-block:: yaml + version: 2 + submodules: exclude: all @@ -427,6 +453,65 @@ Do a recursive clone of the submodules. This is ignored if there aren't submodules to clone. +search +~~~~~~ + +Settings for more control over :doc:`/server-side-search`. + +.. code-block:: yaml + + version: 2 + + search: + ranking: + api/v1/*: -1 + api/v2/*: 4 + +search.ranking +`````````````` + +Set a custom search rank over pages matching a pattern. + +:Type: ``map`` of patterns to ranks +:Default: ``{}`` + +Patterns are matched against the final html pages produced by the build +(you should try to match `index.html`, not `docs/index.rst`). +Patterns can include some special characters: + +- ``*`` matches everything +- ``?`` matches any single character +- ``[seq]`` matches any character in ``seq`` + +The rank can be an integer number between -10 and 10 (inclusive). +Pages with a rank closer to -10 will appear further down the list of results, +and pages with a rank closer to 10 will appear higher in the list of results. +Note that 0 means *normal rank*, not *no rank*. + +.. code-block:: yaml + + version: 2 + + search: + ranking: + # Match a single file + tutorial.hml: 2 + + # Match all files under the api/v1 directory + api/v1/*: -5 + + # Match all files that end with tutorial.html + */tutorial.html: 3 + +.. note:: + + The final rank will be the last pattern to match the page. + +.. tip:: + + Is better to decrease the rank of pages you want to deprecate, + rather than increasing the rank of the other pages. + Schema ------ @@ -468,6 +553,7 @@ New settings - :ref:`config-file/v2:mkdocs` - :ref:`config-file/v2:submodules` - :ref:`config-file/v2:python.install` +- :ref:`config-file/v2:search` Migrating from the web interface -------------------------------- diff --git a/docs/guides/searching-with-readthedocs.rst b/docs/guides/searching-with-readthedocs.rst index e7de8283114..f2b1fd8e091 100644 --- a/docs/guides/searching-with-readthedocs.rst +++ b/docs/guides/searching-with-readthedocs.rst @@ -2,7 +2,7 @@ Searching with Read the Docs ============================ Read the Docs uses :doc:`/server-side-search` to power our search. -This guide explains how to add a "search as you type" feature to your documentation, +This guide explains how to add a "search as you type" feature to your documentation, and how to use advanced query syntax to get more accurate results. You can find information on the search architecture and how we index documents in our @@ -20,7 +20,7 @@ Enable "search as you type" in your documentation documentation more closely with the search implementation of Read the Docs. It adds a clean and minimal full-page search UI that supports a **search as you type** feature. -To try this feature, +To try this feature, you can press :guilabel:`/` (forward slash) and start typing or just visit these URLs: - https://docs.readthedocs.io/?rtd_search=contributing diff --git a/docs/server-side-search.rst b/docs/server-side-search.rst index 08155d00cdb..fabd2f79814 100644 --- a/docs/server-side-search.rst +++ b/docs/server-side-search.rst @@ -25,7 +25,12 @@ Search results land on the exact content you were looking for We index every heading in the document, allowing you to get search results exactly to the content that you are searching for. Try this out by searching for `"full-text search"`_. - + +Full control over which results should be listed first + Set a custom rank per page, + allowing you to deprecate content, and always show relevant content to your users first. + See :ref:`config-file/v2:search.ranking`. + Search across projects you have access to (|com_brand|) This allows you to search across all the projects you access to in your Dashboard. **Don't remember where you found that document the other day? @@ -35,10 +40,10 @@ Special query syntax for more specific results. We support a full range of search queries. You can see some examples in our :ref:`guides/searching-with-readthedocs:search query syntax` guide. -.. +.. Code object searching With the user of :doc:`Sphinx Domains ` we are able to automatically provide direct search results to your Code objects. - You can try this out with our docs here by searching for + You can try this out with our docs here by searching for TODO: Find good examples in our docs, API maybe? .. _"full-text search": https://docs.readthedocs.io/en/latest/search.html?q=%22full-text+search%22 diff --git a/readthedocs/config/config.py b/readthedocs/config/config.py index 84dec15838f..244135277eb 100644 --- a/readthedocs/config/config.py +++ b/readthedocs/config/config.py @@ -10,7 +10,6 @@ from django.conf import settings from readthedocs.config.utils import list_to_dict, to_dict -from readthedocs.projects.constants import DOCUMENTATION_CHOICES from .find import find_one from .models import ( @@ -20,6 +19,7 @@ Python, PythonInstall, PythonInstallRequirements, + Search, Sphinx, Submodules, ) @@ -32,6 +32,7 @@ validate_dict, validate_list, validate_path, + validate_path_pattern, validate_string, ) @@ -155,6 +156,7 @@ class BuildConfigBase: 'sphinx', 'mkdocs', 'submodules', + 'search', ] default_build_image = settings.DOCKER_DEFAULT_VERSION @@ -652,6 +654,10 @@ def submodules(self): recursive=True, ) + @property + def search(self): + return Search(ranking={}) + class BuildConfigV2(BuildConfigBase): @@ -666,7 +672,6 @@ class BuildConfigV2(BuildConfigBase): 'dirhtml': 'sphinx_htmldir', 'singlehtml': 'sphinx_singlehtml', } - builders_display = dict(DOCUMENTATION_CHOICES) def validate(self): """ @@ -686,6 +691,7 @@ def validate(self): self._config['mkdocs'] = self.validate_mkdocs() self._config['sphinx'] = self.validate_sphinx() self._config['submodules'] = self.validate_submodules() + self._config['search'] = self.validate_search() self.validate_keys() def validate_formats(self): @@ -1013,6 +1019,35 @@ def validate_submodules(self): return submodules + def validate_search(self): + """ + Validates the search key. + + - Ranking is a map of path patterns to a rank. + - The path pattern supports basic globs (*, ?, [seq]). + - The rank can be a integer number between -10 and 10. + """ + raw_search = self._raw_config.get('search', {}) + with self.catch_validation_error('search'): + validate_dict(raw_search) + + search = {} + with self.catch_validation_error('search.ranking'): + ranking = self.pop_config('search.ranking', {}) + validate_dict(ranking) + + valid_rank_range = list(range(-10, 10 + 1)) + + final_ranking = {} + for pattern, rank in ranking.items(): + pattern = validate_path_pattern(pattern) + validate_choice(rank, valid_rank_range) + final_ranking[pattern] = rank + + search['ranking'] = final_ranking + + return search + def validate_keys(self): """ Checks that we don't have extra keys (invalid ones). @@ -1107,6 +1142,10 @@ def doctype(self): def submodules(self): return Submodules(**self._config['submodules']) + @property + def search(self): + return Search(**self._config['search']) + def load(path, env_config): """ diff --git a/readthedocs/config/models.py b/readthedocs/config/models.py index ba11bce29d4..55932e991d4 100644 --- a/readthedocs/config/models.py +++ b/readthedocs/config/models.py @@ -68,3 +68,8 @@ class Mkdocs(Base): class Submodules(Base): __slots__ = ('include', 'exclude', 'recursive') + + +class Search(Base): + + __slots__ = ('ranking',) diff --git a/readthedocs/config/tests/test_config.py b/readthedocs/config/tests/test_config.py index 3e6e5d665f6..3ee20d1653c 100644 --- a/readthedocs/config/tests/test_config.py +++ b/readthedocs/config/tests/test_config.py @@ -2,9 +2,9 @@ import re import textwrap from collections import OrderedDict +from unittest.mock import DEFAULT, patch import pytest -from unittest.mock import DEFAULT, patch from pytest import raises from readthedocs.config import ( @@ -44,7 +44,6 @@ from .utils import apply_fs - yaml_config_dir = { 'readthedocs.yml': textwrap.dedent( ''' @@ -766,6 +765,9 @@ def test_as_dict(tmpdir): 'exclude': [], 'recursive': True, }, + 'search': { + 'ranking': {}, + }, } assert build.as_dict() == expected_dict @@ -1834,6 +1836,78 @@ def test_submodules_recursive_explicit_default(self): assert build.submodules.exclude == [] assert build.submodules.recursive is False + @pytest.mark.parametrize('value', ['invalid', True, 0, []]) + def test_search_invalid_type(self, value): + build = self.get_build_config({ + 'search': value, + }) + with raises(InvalidConfig) as excinfo: + build.validate() + assert excinfo.value.key == 'search' + + @pytest.mark.parametrize( + 'value', + [ + 'invalid', + True, + 0, + [], + {'foo/bar': 11}, + {'foo/bar': -11}, + {'foo/bar': 2.5}, + {'foo/bar': 'bar'}, + {'/': 1}, + {'/foo/..': 1}, + {'..': 1}, + {'/foo/bar/../../../': 1}, + {10: 'bar'}, + {10: 0}, + ], + ) + def test_search_ranking_invalid_type(self, value): + build = self.get_build_config({ + 'search': {'ranking': value}, + }) + with raises(InvalidConfig) as excinfo: + build.validate() + assert excinfo.value.key == 'search.ranking' + + @pytest.mark.parametrize('value', list(range(-10, 10 + 1))) + def test_search_valid_ranking(self, value): + build = self.get_build_config({ + 'search': { + 'ranking': { + 'foo/bar': value, + 'bar/foo': value, + }, + }, + }) + build.validate() + assert build.search.ranking == {'foo/bar': value, 'bar/foo': value} + + @pytest.mark.parametrize('path, expected', [ + ('/foo/bar', 'foo/bar'), + ('///foo//bar', 'foo/bar'), + ('///foo//bar/', 'foo/bar'), + ('/foo/bar/../', 'foo'), + ('/foo*', 'foo*'), + ('/foo/bar/*', 'foo/bar/*'), + ('/foo/bar?/*', 'foo/bar?/*'), + ('foo/[bc]ar/*/', 'foo/[bc]ar/*'), + ('*', '*'), + ('index.html', 'index.html'), + ]) + def test_search_ranking_normilize_path(self, path, expected): + build = self.get_build_config({ + 'search': { + 'ranking': { + path: 1, + }, + }, + }) + build.validate() + assert build.search.ranking == {expected: 1} + @pytest.mark.parametrize('value,key', [ ({'typo': 'something'}, 'typo'), ( @@ -1972,5 +2046,8 @@ def test_as_dict(self, tmpdir): 'exclude': ALL, 'recursive': False, }, + 'search': { + 'ranking': {}, + }, } assert build.as_dict() == expected_dict diff --git a/readthedocs/config/validation.py b/readthedocs/config/validation.py index 095bf444630..21a10fd0bf4 100644 --- a/readthedocs/config/validation.py +++ b/readthedocs/config/validation.py @@ -2,12 +2,12 @@ import os - INVALID_BOOL = 'invalid-bool' INVALID_CHOICE = 'invalid-choice' INVALID_LIST = 'invalid-list' INVALID_DICT = 'invalid-dictionary' INVALID_PATH = 'invalid-path' +INVALID_PATH_PATTERN = 'invalid-path-pattern' INVALID_STRING = 'invalid-string' VALUE_NOT_FOUND = 'value-not-found' @@ -21,6 +21,7 @@ class ValidationError(Exception): INVALID_CHOICE: 'expected one of ({choices}), got {value}', INVALID_DICT: '{value} is not a dictionary', INVALID_PATH: 'path {value} does not exist', + INVALID_PATH_PATTERN: '{value} isn\'t a valid path pattern', INVALID_STRING: 'expected string', INVALID_LIST: 'expected list', VALUE_NOT_FOUND: '{value} not found', @@ -84,6 +85,27 @@ def validate_path(value, base_path): return rel_path +def validate_path_pattern(value): + """ + Normalize and validates a path pattern. + + - Normalizes the path stripping multiple ``/``. + - Expands relative paths. + - Checks the final path is relative to the root of the site ``/``. + """ + path = validate_string(value) + # Start the path with ``/`` to interprete the path as absolute to ``/``. + path = '/' + path.lstrip('/') + path = os.path.normpath(path) + if not os.path.isabs(path): + raise ValidationError(value, INVALID_PATH_PATTERN) + # Remove ``/`` from the path once is validated. + path = path.lstrip('/') + if not path: + raise ValidationError(value, INVALID_PATH_PATTERN) + return path + + def validate_string(value): """Check that ``value`` is a string type.""" if not isinstance(value, str): diff --git a/readthedocs/projects/migrations/0057_add_page_rank.py b/readthedocs/projects/migrations/0057_add_page_rank.py new file mode 100644 index 00000000000..a7017cc5c4f --- /dev/null +++ b/readthedocs/projects/migrations/0057_add_page_rank.py @@ -0,0 +1,19 @@ +# Generated by Django 2.2.12 on 2020-06-25 18:44 + +import django.core.validators +from django.db import migrations, models + + +class Migration(migrations.Migration): + + dependencies = [ + ('projects', '0056_add_disable_analytics'), + ] + + operations = [ + migrations.AddField( + model_name='importedfile', + name='rank', + field=models.IntegerField(default=0, null=True, validators=[django.core.validators.MinValueValidator(-10), django.core.validators.MaxValueValidator(10)], verbose_name='Page search rank'), + ), + ] diff --git a/readthedocs/projects/models.py b/readthedocs/projects/models.py index d47898805a6..b1c564413dd 100644 --- a/readthedocs/projects/models.py +++ b/readthedocs/projects/models.py @@ -12,6 +12,7 @@ from django.conf.urls import include from django.contrib.auth.models import User from django.core.files.storage import get_storage_class +from django.core.validators import MaxValueValidator, MinValueValidator from django.db import models from django.db.models import Prefetch from django.urls import re_path, reverse @@ -1352,6 +1353,13 @@ class ImportedFile(models.Model): commit = models.CharField(_('Commit'), max_length=255) build = models.IntegerField(_('Build id'), null=True) modified_date = models.DateTimeField(_('Modified date'), auto_now=True) + rank = models.IntegerField( + _('Page search rank'), + default=0, + # TODO: remove after migration + null=True, + validators=[MinValueValidator(-10), MaxValueValidator(10)], + ) def get_absolute_url(self): return resolve( diff --git a/readthedocs/projects/tasks.py b/readthedocs/projects/tasks.py index f2c27927872..f11abafcc42 100644 --- a/readthedocs/projects/tasks.py +++ b/readthedocs/projects/tasks.py @@ -16,6 +16,7 @@ import tarfile import tempfile from collections import Counter, defaultdict +from fnmatch import fnmatch import requests from celery.exceptions import SoftTimeLimitExceeded @@ -87,7 +88,6 @@ files_changed, ) - log = logging.getLogger(__name__) @@ -1135,6 +1135,7 @@ def update_app_instances( version_pk=self.version.pk, commit=self.build['commit'], build=self.build['id'], + search_ranking=self.config.search.ranking, ) def setup_python_environment(self): @@ -1277,7 +1278,7 @@ def is_type_sphinx(self): # Web tasks @app.task(queue='reindex') -def fileify(version_pk, commit, build): +def fileify(version_pk, commit, build, search_ranking): """ Create ImportedFile objects for all of a version's files. @@ -1311,7 +1312,12 @@ def fileify(version_pk, commit, build): }, ) try: - changed_files = _create_imported_files(version, commit, build) + changed_files = _create_imported_files( + version=version, + commit=commit, + build=build, + search_ranking=search_ranking, + ) except Exception: changed_files = set() log.exception('Failed during ImportedFile creation') @@ -1488,7 +1494,7 @@ def clean_build(version_pk): return True -def _create_imported_files(version, commit, build): +def _create_imported_files(*, version, commit, build, search_ranking): """ Create imported files for version. @@ -1547,6 +1553,17 @@ def _create_imported_files(version, commit, build): version_slug=version.slug, ), ) + + page_rank = 0 + # Last pattern to match takes precedence + # XXX: see if we can implement another type of precedence, + # like the longest pattern. + reverse_rankings = reversed(list(search_ranking.items())) + for pattern, rank in reverse_rankings: + if fnmatch(relpath, pattern): + page_rank = rank + break + # Create imported files from new build model_class.objects.create( project=version.project, @@ -1554,6 +1571,7 @@ def _create_imported_files(version, commit, build): path=relpath, name=filename, md5=md5, + rank=page_rank, commit=commit, build=build, ) diff --git a/readthedocs/rtd_tests/fixtures/spec/v2/schema.yml b/readthedocs/rtd_tests/fixtures/spec/v2/schema.yml index abe8685ff7f..2e34d08c7f0 100644 --- a/readthedocs/rtd_tests/fixtures/spec/v2/schema.yml +++ b/readthedocs/rtd_tests/fixtures/spec/v2/schema.yml @@ -30,6 +30,8 @@ mkdocs: include('mkdocs', required=False) # Submodules configuration submodules: include('submodules', required=False) +search: include('search', required=False) + # Redirects for the current version to be built # Key/value list, represent redirects of type `type` # from url -> to url @@ -127,3 +129,8 @@ submodules: # Do a recursive clone? # Default: false recursive: bool(required=False) + +search: + # Map of patterns to ranks + # Default: {} + ranking: map(str(), int(min=-10, max=10), required=False) diff --git a/readthedocs/rtd_tests/tests/test_celery.py b/readthedocs/rtd_tests/tests/test_celery.py index 21ea8b91a39..5ff0eaf1cd9 100644 --- a/readthedocs/rtd_tests/tests/test_celery.py +++ b/readthedocs/rtd_tests/tests/test_celery.py @@ -2,13 +2,13 @@ import shutil from os.path import exists from tempfile import mkdtemp +from unittest.mock import MagicMock, patch from allauth.socialaccount.models import SocialAccount -from django.test import TestCase from django.contrib.auth.models import User +from django.test import TestCase from django_dynamic_fixture import get from messages_extends.models import Message -from unittest.mock import MagicMock, patch from readthedocs.builds.constants import ( BUILD_STATE_TRIGGERED, @@ -327,7 +327,7 @@ def public_task_exception(): @patch('readthedocs.builds.managers.log') def test_fileify_logging_when_wrong_version_pk(self, mock_logger): self.assertFalse(Version.objects.filter(pk=345343).exists()) - tasks.fileify(version_pk=345343, commit=None, build=1) + tasks.fileify(version_pk=345343, commit=None, build=1, search_ranking={}) mock_logger.warning.assert_called_with("Version not found for given kwargs. {'pk': 345343}") @patch('readthedocs.oauth.services.github.GitHubService.send_build_status') diff --git a/readthedocs/rtd_tests/tests/test_imported_file.py b/readthedocs/rtd_tests/tests/test_imported_file.py index 26323b81842..1452eeaeaaf 100644 --- a/readthedocs/rtd_tests/tests/test_imported_file.py +++ b/readthedocs/rtd_tests/tests/test_imported_file.py @@ -1,5 +1,3 @@ -# -*- coding: utf-8 -*- - import os from unittest import mock @@ -7,15 +5,14 @@ from django.core.files.storage import get_storage_class from django.test import TestCase -from readthedocs.projects.models import ImportedFile, Project, HTMLFile +from readthedocs.projects.models import HTMLFile, ImportedFile, Project from readthedocs.projects.tasks import ( _create_imported_files, - _sync_imported_files, _create_intersphinx_data, + _sync_imported_files, ) from readthedocs.sphinx_domains.models import SphinxDomain - base_dir = os.path.dirname(os.path.dirname(__file__)) @@ -30,10 +27,16 @@ def setUp(self): self.test_dir = os.path.join(base_dir, 'files') self._copy_storage_dir() - - def _manage_imported_files(self, version, commit, build): + + def _manage_imported_files(self, version, commit, build, search_ranking=None): """Helper function for the tests to create and sync ImportedFiles.""" - _create_imported_files(version, commit, build) + search_ranking = search_ranking or {} + _create_imported_files( + version=version, + commit=commit, + build=build, + search_ranking=search_ranking, + ) _sync_imported_files(version, build, set()) def _copy_storage_dir(self): @@ -69,6 +72,65 @@ def test_update_commit(self): self._manage_imported_files(self.version, 'commit02', 2) self.assertEqual(ImportedFile.objects.first().commit, 'commit02') + def test_page_default_rank(self): + search_ranking = {} + self.assertEqual(HTMLFile.objects.count(), 0) + self._manage_imported_files(self.version, 'commit01', 1, search_ranking) + + self.assertEqual(HTMLFile.objects.count(), 2) + self.assertEqual(HTMLFile.objects.filter(rank=0).count(), 2) + + def test_page_custom_rank_glob(self): + search_ranking = { + '*index.html': 5, + } + self._manage_imported_files(self.version, 'commit01', 1, search_ranking) + + self.assertEqual(HTMLFile.objects.count(), 2) + file_api = HTMLFile.objects.get(path='api/index.html') + file_test = HTMLFile.objects.get(path='test.html') + self.assertEqual(file_api.rank, 5) + self.assertEqual(file_test.rank, 0) + + def test_page_custom_rank_several(self): + search_ranking = { + 'test.html': 5, + 'api/index.html': 2, + } + self._manage_imported_files(self.version, 'commit01', 1, search_ranking) + + self.assertEqual(HTMLFile.objects.count(), 2) + file_api = HTMLFile.objects.get(path='api/index.html') + file_test = HTMLFile.objects.get(path='test.html') + self.assertEqual(file_api.rank, 2) + self.assertEqual(file_test.rank, 5) + + def test_page_custom_rank_precedence(self): + search_ranking = { + '*.html': 5, + 'api/index.html': 2, + } + self._manage_imported_files(self.version, 'commit01', 1, search_ranking) + + self.assertEqual(HTMLFile.objects.count(), 2) + file_api = HTMLFile.objects.get(path='api/index.html') + file_test = HTMLFile.objects.get(path='test.html') + self.assertEqual(file_api.rank, 2) + self.assertEqual(file_test.rank, 5) + + def test_page_custom_rank_precedence_inverted(self): + search_ranking = { + 'api/index.html': 2, + '*.html': 5, + } + self._manage_imported_files(self.version, 'commit01', 1, search_ranking) + + self.assertEqual(HTMLFile.objects.count(), 2) + file_api = HTMLFile.objects.get(path='api/index.html') + file_test = HTMLFile.objects.get(path='test.html') + self.assertEqual(file_api.rank, 5) + self.assertEqual(file_test.rank, 5) + def test_update_content(self): test_dir = os.path.join(base_dir, 'files') self.assertEqual(ImportedFile.objects.count(), 0) @@ -128,7 +190,12 @@ def test_create_intersphinx_data(self, mock_exists): return_value=test_objects_inv ) as mock_fetch_inventory: - _create_imported_files(self.version, 'commit01', 1) + _create_imported_files( + version=self.version, + commit='commit01', + build=1, + search_ranking={}, + ) _create_intersphinx_data(self.version, 'commit01', 1) # there will be two html files, diff --git a/readthedocs/search/documents.py b/readthedocs/search/documents.py index 55efe0b4ca5..776e16338d5 100644 --- a/readthedocs/search/documents.py +++ b/readthedocs/search/documents.py @@ -57,6 +57,7 @@ class PageDocument(RTDDocTypeMixin, DocType): version = fields.KeywordField(attr='version.slug') path = fields.KeywordField(attr='processed_json.path') full_path = fields.KeywordField(attr='path') + rank = fields.IntegerField() # Searchable content title = fields.TextField(attr='processed_json.title') @@ -92,6 +93,12 @@ class Meta: fields = ('commit', 'build') ignore_signals = True + def prepare_rank(self, html_file): + # TODO: remove when the migration is done + if html_file.rank is None or not (-10 <= html_file.rank <= 10): + return 0 + return html_file.rank + def prepare_domains(self, html_file): """Prepares and returns the values for domains field.""" if not html_file.version.is_sphinx_type: diff --git a/readthedocs/search/faceted_search.py b/readthedocs/search/faceted_search.py index cf218b3344e..62a21ab417c 100644 --- a/readthedocs/search/faceted_search.py +++ b/readthedocs/search/faceted_search.py @@ -4,7 +4,13 @@ from elasticsearch import Elasticsearch from elasticsearch_dsl import FacetedSearch, TermsFacet from elasticsearch_dsl.faceted_search import NestedFacet -from elasticsearch_dsl.query import Bool, MultiMatch, Nested, SimpleQueryString +from elasticsearch_dsl.query import ( + Bool, + FunctionScore, + MultiMatch, + Nested, + SimpleQueryString, +) from readthedocs.core.utils.extend import SettingsOverrideObject from readthedocs.search.documents import PageDocument, ProjectDocument @@ -159,10 +165,12 @@ class PageSearchBase(RTDFacetedSearch): doc_types = [PageDocument] index = PageDocument._doc_type.index - _outer_fields = ['title^2'] - _section_fields = ['sections.title^3', 'sections.content'] + # boosting for these fields need to be close enough + # to be re-boosted by the page rank. + _outer_fields = ['title^1.5'] + _section_fields = ['sections.title^2', 'sections.content'] _domain_fields = [ - 'domains.name^2', + 'domains.name^1.5', 'domains.docstrings', ] fields = _outer_fields @@ -182,7 +190,7 @@ def total_count(self): return s.hits.total def query(self, search, query): - """Manipulates query to support nested query.""" + """Manipulates the query to support nested queries and a custom rank for pages.""" search = search.highlight_options(**self._highlight_options) all_queries = [] @@ -229,11 +237,74 @@ def query(self, search, query): ) all_queries.extend([sections_nested_query, domains_nested_query]) - final_query = Bool(should=all_queries) - search = search.query(final_query) + final_query = FunctionScore( + query=Bool(should=all_queries), + script_score=self._get_script_score(), + ) + search = search.query(final_query) return search + def _get_script_score(self): + """ + Gets an ES script to map the page rank to a valid score weight. + + ES expects the rank to be a number greater than 0, + but users can set this between [-10, +10]. + We map that range to [0.01, 2] (21 possible values). + + The first lower rank (0.8) needs to bring the score from the highest boost (sections.title^2) + close to the lowest boost (title^1.5), that way exact results take priority: + + - 2.0 * 0.8 = 1.6 (score close to 1.5, but not lower than it) + - 1.5 * 0.8 = 1.2 (score lower than 1.5) + + The first higher rank (1.2) needs to bring the score from the lowest boost (title^1.5) + close to the highest boost (sections.title^2), that way exact results take priority: + + - 2.0 * 1.3 = 2.6 (score higher thank 2.0) + - 1.5 * 1.3 = 1.95 (score close to 2.0, but not higher than it) + + The next lower and higher ranks need to decrease/increase both scores. + + See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-score-query.html#field-value-factor # noqa + """ + ranking = [ + 0.01, + 0.05, + 0.1, + 0.2, + 0.3, + 0.4, + 0.5, + 0.6, + 0.7, + 0.8, + 1, + 1.3, + 1.4, + 1.5, + 1.6, + 1.7, + 1.8, + 1.9, + 1.93, + 1.96, + 2, + ] + # Each rank maps to a element in the ranking list. + # -10 will map to the first element (-10 + 10 = 0) and so on. + source = """ + int rank = doc['rank'].size() == 0 ? 0 : (int) doc['rank'].value; + return params.ranking[rank + 10] * _score; + """ + return { + "script": { + "source": source, + "params": {"ranking": ranking}, + }, + } + def generate_nested_query(self, query, path, fields, inner_hits): """Generate a nested query with passed parameters.""" queries = [] diff --git a/readthedocs/search/tests/test_api.py b/readthedocs/search/tests/test_api.py index 1331a5e0aa0..c7bf9728fcd 100644 --- a/readthedocs/search/tests/test_api.py +++ b/readthedocs/search/tests/test_api.py @@ -508,6 +508,111 @@ def test_search_advanced_query_detection(self, api_client): assert len(results) > 0 assert 'Index' in results[0]['title'] + def test_search_custom_ranking(self, api_client): + project = Project.objects.get(slug='docs') + version = project.versions.all().first() + + page_index = HTMLFile.objects.get(path='index.html') + page_guides = HTMLFile.objects.get(path='guides/index.html') + + # Query with the default ranking + assert page_index.rank == 0 + assert page_guides.rank == 0 + + search_params = { + 'project': project.slug, + 'version': version.slug, + 'q': '"content from"', + } + resp = self.get_search(api_client, search_params) + assert resp.status_code == 200 + + results = resp.data['results'] + assert len(results) == 2 + assert results[0]['full_path'] == 'index.html' + assert results[1]['full_path'] == 'guides/index.html' + + # Query with a higher rank over guides/index.html + page_guides.rank = 5 + page_guides.save() + PageDocument().update(page_guides) + + search_params = { + 'project': project.slug, + 'version': version.slug, + 'q': '"content from"', + } + resp = self.get_search(api_client, search_params) + assert resp.status_code == 200 + + results = resp.data['results'] + assert len(results) == 2 + assert results[0]['full_path'] == 'guides/index.html' + assert results[1]['full_path'] == 'index.html' + + # Query with a lower rank over index.html + page_index.rank = -2 + page_index.save() + page_guides.rank = 4 + page_guides.save() + PageDocument().update(page_index) + PageDocument().update(page_guides) + + search_params = { + 'project': project.slug, + 'version': version.slug, + 'q': '"content from"', + } + resp = self.get_search(api_client, search_params) + assert resp.status_code == 200 + + results = resp.data['results'] + assert len(results) == 2 + assert results[0]['full_path'] == 'guides/index.html' + assert results[1]['full_path'] == 'index.html' + + # Query with a lower rank over index.html + page_index.rank = 3 + page_index.save() + page_guides.rank = 6 + page_guides.save() + PageDocument().update(page_index) + PageDocument().update(page_guides) + + search_params = { + 'project': project.slug, + 'version': version.slug, + 'q': '"content from"', + } + resp = self.get_search(api_client, search_params) + assert resp.status_code == 200 + + results = resp.data['results'] + assert len(results) == 2 + assert results[0]['full_path'] == 'guides/index.html' + assert results[1]['full_path'] == 'index.html' + + # Query with a same rank over guides/index.html and index.html + page_index.rank = -10 + page_index.save() + page_guides.rank = -10 + page_guides.save() + PageDocument().update(page_index) + PageDocument().update(page_guides) + + search_params = { + 'project': project.slug, + 'version': version.slug, + 'q': '"content from"', + } + resp = self.get_search(api_client, search_params) + assert resp.status_code == 200 + + results = resp.data['results'] + assert len(results) == 2 + assert results[0]['full_path'] == 'index.html' + assert results[1]['full_path'] == 'guides/index.html' + class TestDocumentSearch(BaseTestDocumentSearch):