diff --git a/docs/user/analytics.rst b/docs/user/analytics.rst index 0762dddb206..bc20b83962e 100644 --- a/docs/user/analytics.rst +++ b/docs/user/analytics.rst @@ -16,7 +16,7 @@ and then click on :guilabel:`Traffic Analytics`. Traffic analytics demo -You can also access to analytics data from :ref:`search results `. +You can also access to analytics data from :ref:`search results `. .. note:: diff --git a/docs/user/api/v3.rst b/docs/user/api/v3.rst index 9772927f387..39f82a07e00 100644 --- a/docs/user/api/v3.rst +++ b/docs/user/api/v3.rst @@ -1882,4 +1882,4 @@ Embed Additional APIs --------------- -- :ref:`Server side search API `. +- :doc:`Server side search API `. diff --git a/docs/user/build-customization.rst b/docs/user/build-customization.rst index 18d56fa8a72..6f0655c2a16 100644 --- a/docs/user/build-customization.rst +++ b/docs/user/build-customization.rst @@ -409,7 +409,7 @@ Read the Docs will automatically index the content of all your HTML files, respecting the :ref:`search ` options from your config file. You can access the search results from the :guilabel:`Search` tab of your project, -or by using the :ref:`search API `. +or by using the :doc:`/server-side-search/api`. .. note:: diff --git a/docs/user/config-file/v2.rst b/docs/user/config-file/v2.rst index c29509efc85..9a83b1ed580 100644 --- a/docs/user/config-file/v2.rst +++ b/docs/user/config-file/v2.rst @@ -684,7 +684,7 @@ Do a recursive clone of the submodules. search ~~~~~~ -Settings for more control over :doc:`/server-side-search`. +Settings for more control over :doc:`/server-side-search/index`. .. 
code-block:: yaml diff --git a/docs/user/features.rst b/docs/user/features.rst index 51f03aaed6a..8af36093d5e 100644 --- a/docs/user/features.rst +++ b/docs/user/features.rst @@ -70,10 +70,10 @@ We offer a number of search features: * Search across :doc:`subprojects ` * Search results land on the exact content you were looking for -* Search across projects you have access to (available on |com_brand|) -* A full range of :doc:`search operators ` including exact matching and excluding phrases. +* Search across projects you have access to +* A full range of :doc:`search operators ` including exact matching and excluding phrases. -Learn more about :doc:`/server-side-search`. +Learn more about :doc:`/server-side-search/index`. Open Source and Customer Focused -------------------------------- diff --git a/docs/user/flyout-menu.rst b/docs/user/flyout-menu.rst index 2107c591d8f..c88bae32f54 100644 --- a/docs/user/flyout-menu.rst +++ b/docs/user/flyout-menu.rst @@ -15,7 +15,7 @@ The flyout menu provides access to the following bits of Read the Docs functiona * :doc:`Downloadable formats ` for the current version, including HTML & PDF downloads that are enabled by the project. * Links to the Read the Docs dashboard for the project. * Links to your :doc:`VCS provider ` that allow the user to quickly find the exact file that the documentation was rendered from. -* A search bar that gives users access to our :doc:`/server-side-search` of the current version. +* A search bar that gives users access to our :doc:`/server-side-search/index` of the current version. Closed ~~~~~~ diff --git a/docs/user/guides/administrators.rst b/docs/user/guides/administrators.rst index 6cdad8d03d2..eca54090f06 100644 --- a/docs/user/guides/administrators.rst +++ b/docs/user/guides/administrators.rst @@ -14,7 +14,6 @@ have a look at our :doc:`/tutorial/index`. 
technical-docs-seo-guide manage-translations-sphinx - advanced-search hiding-a-version deprecating-content pdf-non-ascii-languages diff --git a/docs/user/guides/advanced-search.rst b/docs/user/guides/advanced-search.rst deleted file mode 100644 index f00a847eada..00000000000 --- a/docs/user/guides/advanced-search.rst +++ /dev/null @@ -1,97 +0,0 @@ -Using advanced search features -============================== - -Read the Docs uses :doc:`/server-side-search` to power our search. -This guide explains how to add a "search as you type" feature to your documentation, -and how to use advanced query syntax to get more accurate results. - -.. contents:: Table of contents - :local: - :backlinks: none - :depth: 3 - -Enable "search as you type" in your documentation -------------------------------------------------- - -`readthedocs-sphinx-search`_ is a Sphinx extension that integrates your -documentation more closely with the search implementation of Read the Docs. -It adds a clean and minimal full-page search UI that supports a **search as you type** feature. - -To try this feature, -you can press :guilabel:`/` (forward slash) and start typing or just visit these URLs: - -- https://docs.readthedocs.io/?rtd_search=contributing -- https://docs.readthedocs.io/?rtd_search=api/v3/projects/ - -Search query syntax -------------------- - -Read the Docs uses the `Simple Query String`_ feature from `Elasticsearch`_. -This means that as the search query becomes more complex, -the results yielded become more specific. - -Exact phrase search -~~~~~~~~~~~~~~~~~~~ - -If a query is wrapped in ``"`` (double quotes), -then only those results where the phrase is exactly matched will be returned. 
- -Example queries: - -- https://docs.readthedocs.io/?rtd_search=%22custom%20css%22 -- https://docs.readthedocs.io/?rtd_search=%22adding%20a%20subproject%22 -- https://docs.readthedocs.io/?rtd_search=%22when%20a%20404%20is%20returned%22 - -Exact phrase search with slop value -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -``~N`` (tilde N) after a phrase signifies slop amount. -It can be used to match words that are near one another. - -Example queries: - -- https://docs.readthedocs.io/?rtd_search=%22dashboard%20admin%22~2 -- https://docs.readthedocs.io/?rtd_search=%22single%20documentation%22~1 -- https://docs.readthedocs.io/?rtd_search=%22read%20the%20docs%20story%22~5 - -Prefix query -~~~~~~~~~~~~ - -``*`` (asterisk) at the end of any term signifies a prefix query. -It returns the results containing the words with specific prefix. - -Example queries: - -- https://docs.readthedocs.io/?rtd_search=API%20v* -- https://docs.readthedocs.io/?rtd_search=single%20v*%20doc* -- https://docs.readthedocs.io/?rtd_search=build*%20and%20c*%20to%20doc* - -Fuzzy query -~~~~~~~~~~~ - -``~N`` after a word signifies edit distance (fuzziness). -This type of query is helpful when the exact spelling of the keyword is unknown. -It returns results that contain terms similar to the search term as measured by a `Levenshtein edit distance`_. - -Example queries: - -- https://docs.readthedocs.io/?rtd_search=reedthedcs~2 -- https://docs.readthedocs.io/?rtd_search=authentation~3 -- https://docs.readthedocs.io/?rtd_search=configurtion~1 - - -Build complex queries -~~~~~~~~~~~~~~~~~~~~~ - -The search query syntaxes described in the previous sections can be used with one another to build complex queries. - -For example: - -- https://docs.readthedocs.io/?rtd_search=auto*%20redirect* -- https://docs.readthedocs.io/?rtd_search=abandon*%20proj* -- https://docs.readthedocs.io/?rtd_search=localisation~3%20of%20doc* - -.. _Elasticsearch: https://www.elastic.co/products/elasticsearch -.. 
_readthedocs-sphinx-search: https://readthedocs-sphinx-search.readthedocs.io/ -.. _Simple Query String: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html# -.. _Levenshtein edit distance: https://en.wikipedia.org/wiki/Levenshtein_distance diff --git a/docs/user/index.rst b/docs/user/index.rst index c47a6bf427e..b7abf868b75 100644 --- a/docs/user/index.rst +++ b/docs/user/index.rst @@ -78,7 +78,7 @@ and some of the core features of Read the Docs. :doc:`/versions` | :doc:`/downloadable-documentation` | :doc:`/hosting` | - :doc:`/server-side-search` | + :doc:`/server-side-search/index` | :doc:`/analytics` | :doc:`/pull-requests` | :doc:`/build-notifications` | @@ -111,7 +111,7 @@ and some of the core features of Read the Docs. /versions /downloadable-documentation /hosting - /server-side-search + /server-side-search/index /analytics /pull-requests /build-notifications @@ -145,7 +145,6 @@ and how to write successful documentation. * **For project administrators**: :doc:`/guides/technical-docs-seo-guide` | :doc:`/guides/manage-translations-sphinx` | - :doc:`/guides/advanced-search` | :doc:`/guides/private-submodules` | Setup Build Notifications | :doc:`More guides for administrators ` diff --git a/docs/user/server-side-search.rst b/docs/user/server-side-search.rst deleted file mode 100644 index 87bbc285568..00000000000 --- a/docs/user/server-side-search.rst +++ /dev/null @@ -1,202 +0,0 @@ -Server Side Search -================== - -Read the Docs provides full-text search across all of the pages of all projects, -this is powered by Elasticsearch_. -You can search all projects at https://readthedocs.org/search/, -or search only on your project from the :guilabel:`Search` tab of your project. - -We override the default search engine of your project with our search engine -to provide you better results within your project. 
-We fallback to the built-in search engine from your project -if our search engine doesn't return any results, -just in case we missed something |:smile:| - -Search features ---------------- - -We offer a number of benefits compared to other documentation hosts: - -Search across :doc:`subprojects ` - Subprojects allow you to host multiple discrete projects on a single domain. - Every subproject hosted on that same domain is included in the search results of the main project. - -Search results land on the exact content you were looking for - We index every heading in the document, - allowing you to get search results exactly to the content that you are searching for. - Try this out by searching for `"full-text search"`_. - -Full control over which results should be listed first - Set a custom rank per page, - allowing you to deprecate content, and always show relevant content to your users first. - See :ref:`config-file/v2:search.ranking`. - -Search across projects you have access to (|com_brand|) - This allows you to search across all the projects you access to in your Dashboard. - **Don't remember where you found that document the other day? - No problem, you can search across them all.** - -Special query syntax for more specific results. - We support a full range of search queries. - You can see some examples in our :ref:`guides/advanced-search:search query syntax` guide. - -Configurable. - Tweak search results according to your needs using a - :ref:`configuration file `. - -.. - Code object searching - With the user of :doc:`Sphinx Domains ` we are able to automatically provide direct search results to your Code objects. - You can try this out with our docs here by searching for - TODO: Find good examples in our docs, API maybe? - -.. _"full-text search": https://docs.readthedocs.io/en/latest/search.html?q=%22full-text+search%22 - -Search Analytics ----------------- - -Know what your users are looking for in your docs. 
-To see a list of the top queries and an overview from the last month, -go to the :guilabel:`Admin` tab of your project, -and then click on :guilabel:`Search Analytics`. - -.. figure:: /_static/images/search-analytics-demo.png - :width: 50% - :align: center - :alt: Search analytics demo - - Search analytics demo - -.. _Elasticsearch: https://www.elastic.co/products/elasticsearch - -API ---- - -If you are using :doc:`/commercial/index` you will need to replace -https://readthedocs.org/ with https://readthedocs.com/ in all the URLs used in the following examples. -Check :ref:`server-side-search:authentication and authorization` if you are using private versions. - -.. warning:: - - This API isn't stable yet, some small things may change in the future. - -.. http:get:: /api/v2/search/ - - Return a list of search results for a project, - including results from its :doc:`/subprojects`. - Results are divided into sections with highlights of the matching term. - - .. Request - - :query q: Search query - :query project: Project slug - :query version: Version slug - :query page: Jump to a specific page - :query page_size: Limits the results per page, default is 50 - - .. Response - - :>json string type: The type of the result, currently page is the only type. - :>json string project: The project slug - :>json string project_alias: Alias of the project if it's a subproject. - :>json string version: The version slug - :>json string title: The title of the page - :>json string domain: Canonical domain of the resulting page - :>json string path: Path to the resulting page - :>json object highlights: An object containing a list of substrings with matching terms. - Note that the text is HTML escaped with the matching terms inside a tag. - :>json object blocks: - - A list of block objects containing search results from the page. - Currently, there are two types of blocks: - - - section: A page section with a linkable anchor (``id`` attribute). 
- - domain: A Sphinx :doc:`domain ` - with a linkable anchor (``id`` attribute). - - - **Example request**: - - .. tabs:: - - .. code-tab:: bash - - $ curl "https://readthedocs.org/api/v2/search/?project=docs&version=latest&q=server%20side%20search" - - .. code-tab:: python - - import requests - URL = 'https://readthedocs.org/api/v2/search/' - params = { - 'q': 'server side search', - 'project': 'docs', - 'version': 'latest', - } - response = requests.get(URL, params=params) - print(response.json()) - - **Example response**: - - .. sourcecode:: json - - { - "count": 41, - "next": "https://readthedocs.org/api/v2/search/?page=2&project=read-the-docs&q=server+side+search&version=latest", - "previous": null, - "results": [ - { - "type": "page", - "project": "docs", - "project_alias": null, - "version": "latest", - "title": "Server Side Search", - "domain": "https://docs.readthedocs.io", - "path": "/en/latest/server-side-search.html", - "highlights": { - "title": [ - "Server Side Search" - ] - }, - "blocks": [ - { - "type": "section", - "id": "server-side-search", - "title": "Server Side Search", - "content": "Read the Docs provides full-text search across all of the pages of all projects, this is powered by Elasticsearch.", - "highlights": { - "title": [ - "Server Side Search" - ], - "content": [ - "You can search all projects at https://readthedocs.org/search/" - ] - } - }, - { - "type": "domain", - "role": "http:get", - "name": "/_/api/v2/search/", - "id": "get--_-api-v2-search-", - "content": "Retrieve search results for docs", - "highlights": { - "name": [""], - "content": ["Retrieve search results for docs"] - } - } - ] - }, - ] - } - -Authentication and authorization -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -If you are using :ref:`private versions `, -users will only be allowed to search projects they have permissions over. -Authentication and authorization is done using the current session, -or any of the valid :doc:`sharing methods `. 
- -To be able to use the user's current session you need to use the API from the domain where your docs are being served -(``/_/api/v2/search/``). -This is ``https://docs.readthedocs-hosted.com/_/api/v2/search/`` -for the ``https://docs.readthedocs-hosted.com/`` project, for example. diff --git a/docs/user/server-side-search/api.rst b/docs/user/server-side-search/api.rst new file mode 100644 index 00000000000..f954c0c781b --- /dev/null +++ b/docs/user/server-side-search/api.rst @@ -0,0 +1,288 @@ +Server Side Search API +====================== + +You can integrate our :doc:`server side search ` in your documentation by using our API. + +If you are using :doc:`/commercial/index` you will need to replace +https://readthedocs.org/ with https://readthedocs.com/ in all the URLs used in the following examples. +Check :ref:`server-side-search/api:authentication and authorization` if you are using private versions. + +.. contents:: Table of contents + :local: + :backlinks: none + :depth: 3 + +API V3 +------ + +.. http:get:: /api/v3/search/ + + Return a list of search results for a project or subset of projects. + Results are divided into sections with highlights of the matching term. + + .. Request + + :query q: Search query (see :doc:`/server-side-search/syntax`) + :query page: Jump to a specific page + :query page_size: Limits the results per page, default is 50 + + .. Response + + :>json string type: The type of the result, currently page is the only type. + :>json string project: The project object + :>json string version: The version object + :>json string title: The title of the page + :>json string domain: Canonical domain of the resulting page + :>json string path: Path to the resulting page + :>json object highlights: An object containing a list of substrings with matching terms. + Note that the text is HTML escaped with the matching terms inside a tag. + :>json object blocks: + + A list of block objects containing search results from the page. 
+ Currently, there are two types of blocks: + + - section: A page section with a linkable anchor (``id`` attribute). + - domain: A Sphinx :doc:`domain ` + with a linkable anchor (``id`` attribute). + + + **Example request**: + + .. tabs:: + + .. code-tab:: bash + + $ curl "https://readthedocs.org/api/v3/search/?q=project:docs%20server%20side%20search" + + .. code-tab:: python + + import requests + URL = 'https://readthedocs.org/api/v3/search/' + params = { + 'q': 'project:docs server side search', + } + response = requests.get(URL, params=params) + print(response.json()) + + **Example response**: + + .. sourcecode:: json + + { + "count": 41, + "next": "https://readthedocs.org/api/v3/search/?page=2&q=project:docs%20server+side+search", + "previous": null, + "projects": [ + { + "slug": "docs", + "versions": [ + {"slug": "latest"} + ] + } + ], + "query": "server side search", + "results": [ + { + "type": "page", + "project": { + "slug": "docs", + "alias": null + }, + "version": { + "slug": "latest" + }, + "title": "Server Side Search", + "domain": "https://docs.readthedocs.io", + "path": "/en/latest/server-side-search.html", + "highlights": { + "title": [ + "Server Side Search" + ] + }, + "blocks": [ + { + "type": "section", + "id": "server-side-search", + "title": "Server Side Search", + "content": "Read the Docs provides full-text search across all of the pages of all projects, this is powered by Elasticsearch.", + "highlights": { + "title": [ + "Server Side Search" + ], + "content": [ + "You can search all projects at https://readthedocs.org/search/" + ] + } + }, + { + "type": "domain", + "role": "http:get", + "name": "/_/api/v2/search/", + "id": "get--_-api-v2-search-", + "content": "Retrieve search results for docs", + "highlights": { + "name": [""], + "content": ["Retrieve search results for docs"] + } + } + ] + }, + ] + } + + +Migrating from API V2 +~~~~~~~~~~~~~~~~~~~~~ + +Instead of using query arguments to specify the project +and version to search, you need 
to do it from the query itself, +that is, if you had the following parameters: + +- project: docs +- version: latest +- q: test + +Now you need to use: + +- q: project:docs/latest test + +The response of the API is very similar to V2, +with the following changes: + +- ``project`` is an object, not a string. +- ``version`` is an object, not a string. +- ``project_alias`` isn't present, + it is contained in the ``project`` object. + +When searching on a parent project, +results from their subprojects won't be included automatically, +to include results from subprojects use the ``subprojects`` parameter. + +Authentication and authorization +-------------------------------- + +If you are using :ref:`private versions `, +users will only be allowed to search projects they have permissions over. +Authentication and authorization is done using the current session, +or any of the valid :doc:`sharing methods `. + +To be able to use the user's current session you need to use the API from the domain where your docs are being served +(``/_/api/v3/search/``). +This is ``https://docs.readthedocs-hosted.com/_/api/v3/search/`` +for the ``https://docs.readthedocs-hosted.com/`` project, for example. + +API V2 (deprecated) +------------------- + +.. note:: + + Please use our :ref:`server-side-search/api:api v3` instead, + see :ref:`server-side-search/api:migrating from api v2`. + +.. http:get:: /api/v2/search/ + + Return a list of search results for a project, + including results from its :doc:`/subprojects`. + Results are divided into sections with highlights of the matching term. + + .. Request + + :query q: Search query + :query project: Project slug + :query version: Version slug + :query page: Jump to a specific page + :query page_size: Limits the results per page, default is 50 + + .. Response + + :>json string type: The type of the result, currently page is the only type. 
+ :>json string project: The project slug + :>json string project_alias: Alias of the project if it's a subproject. + :>json string version: The version slug + :>json string title: The title of the page + :>json string domain: Canonical domain of the resulting page + :>json string path: Path to the resulting page + :>json object highlights: An object containing a list of substrings with matching terms. + Note that the text is HTML escaped with the matching terms inside a tag. + :>json object blocks: + + A list of block objects containing search results from the page. + Currently, there are two types of blocks: + + - section: A page section with a linkable anchor (``id`` attribute). + - domain: A Sphinx :doc:`domain ` + with a linkable anchor (``id`` attribute). + + + **Example request**: + + .. tabs:: + + .. code-tab:: bash + + $ curl "https://readthedocs.org/api/v2/search/?project=docs&version=latest&q=server%20side%20search" + + .. code-tab:: python + + import requests + URL = 'https://readthedocs.org/api/v2/search/' + params = { + 'q': 'server side search', + 'project': 'docs', + 'version': 'latest', + } + response = requests.get(URL, params=params) + print(response.json()) + + **Example response**: + + .. 
sourcecode:: json + + { + "count": 41, + "next": "https://readthedocs.org/api/v2/search/?page=2&project=read-the-docs&q=server+side+search&version=latest", + "previous": null, + "results": [ + { + "type": "page", + "project": "docs", + "project_alias": null, + "version": "latest", + "title": "Server Side Search", + "domain": "https://docs.readthedocs.io", + "path": "/en/latest/server-side-search.html", + "highlights": { + "title": [ + "Server Side Search" + ] + }, + "blocks": [ + { + "type": "section", + "id": "server-side-search", + "title": "Server Side Search", + "content": "Read the Docs provides full-text search across all of the pages of all projects, this is powered by Elasticsearch.", + "highlights": { + "title": [ + "Server Side Search" + ], + "content": [ + "You can search all projects at https://readthedocs.org/search/" + ] + } + }, + { + "type": "domain", + "role": "http:get", + "name": "/_/api/v2/search/", + "id": "get--_-api-v2-search-", + "content": "Retrieve search results for docs", + "highlights": { + "name": [""], + "content": ["Retrieve search results for docs"] + } + } + ] + }, + ] + } diff --git a/docs/user/server-side-search/index.rst b/docs/user/server-side-search/index.rst new file mode 100644 index 00000000000..cd19098e682 --- /dev/null +++ b/docs/user/server-side-search/index.rst @@ -0,0 +1,102 @@ +Server Side Search +================== + +Read the Docs provides full-text search across all of the pages of all projects, +this is powered by Elasticsearch_. +You can search all projects at https://readthedocs.org/search/, +or search only on your project from the :guilabel:`Search` tab of your project. + +.. contents:: Table of contents + :local: + :backlinks: none + :depth: 3 + +.. 
toctree:: + :maxdepth: 2 + :glob: + :hidden: + + * + +Search features +--------------- + +We offer a number of benefits compared to other documentation hosts: + +Search across :doc:`subprojects ` + Subprojects allow you to host multiple discrete projects on a single domain. + Every subproject hosted on that same domain is included in the search results of the main project. + +Search results land on the exact content you were looking for + We index every heading in the document, + allowing you to get search results exactly to the content that you are searching for. + Try this out by searching for `"full-text search"`_. + +Full control over which results should be listed first + Set a custom rank per page, + allowing you to deprecate content, and always show relevant content to your users first. + See :ref:`config-file/v2:search.ranking`. + +Search across projects you have access to + Search across all the projects you have access to in your Dashboard. + **Don't remember where you found that document the other day? + No problem, you can search across them all.** + + You can also specify what projects you want to search + using the ``project:{name}`` syntax, for example: + `"project:docs project:dev search"`_. + See :doc:`/server-side-search/syntax`. + +Special query syntax for more specific results + We support a full range of search queries. + You can see some examples at :ref:`server-side-search/syntax:special queries`. + +Configurable + Tweak search results according to your needs using a + :ref:`configuration file `. + +Ready to use + We override the default search engine of your Sphinx project with ours + to provide you with all these benefits within your project. + We fall back to the built-in search engine from your project if ours doesn't return any results, + just in case we missed something |:smile:|. + +API + Integrate our search as you like. + See :doc:`/server-side-search/api`. + +.. 
_"full-text search": https://docs.readthedocs.io/en/latest/search.html?q=%22full-text+search%22 +.. _"project:docs project:dev search": https://docs.readthedocs.io/en/latest/search.html?q=project:docs+project:dev+search + +Search analytics +---------------- + +Know what your users are looking for in your docs. +To see a list of the top queries and an overview from the last month, +go to the :guilabel:`Admin` tab of your project, +and then click on :guilabel:`Search Analytics`. + +.. figure:: /_static/images/search-analytics-demo.png + :width: 50% + :align: center + :alt: Search analytics demo + + Search analytics demo + +.. _Elasticsearch: https://www.elastic.co/products/elasticsearch + + +Search as you type +------------------ + +`readthedocs-sphinx-search`_ is a Sphinx extension that integrates your +documentation more closely with the search implementation of Read the Docs. +It adds a clean and minimal full-page search UI that supports a **search as you type** feature. + +To try this feature, +you can press :guilabel:`/` (forward slash) and start typing or just visit these URLs: + +- https://docs.readthedocs.io/?rtd_search=contributing +- https://docs.readthedocs.io/?rtd_search=api/v3/projects/ + +.. _readthedocs-sphinx-search: https://readthedocs-sphinx-search.readthedocs.io/ diff --git a/docs/user/server-side-search/syntax.rst b/docs/user/server-side-search/syntax.rst new file mode 100644 index 00000000000..8db8c31a9d2 --- /dev/null +++ b/docs/user/server-side-search/syntax.rst @@ -0,0 +1,145 @@ +Search Query Syntax +=================== + +When searching on Read the Docs, you can use some parameters in your +query in order to search on given projects, versions, +or to get more accurate results. + +.. contents:: Table of contents + :local: + :backlinks: none + :depth: 3 + +Parameters +---------- + +Parameters are in the form of ``name:value``, +they can appear anywhere in the query, +and depending on the parameter, you can use one or more of them. 
+ +Any other text that isn't a parameter will be part of the search query. +If you don't want your search term to be interpreted as a parameter, +you can escape it like ``project\:docs``. + +.. note:: + + Unknown parameters like ``foo:bar`` don't require escaping. + +The available parameters are: + +project + Indicates the project and version to include results from + (this doesn’t include subprojects or translations). + If the version isn’t provided, the default version will be used. + More than one parameter can be included. + + Examples: + + - ``project:docs test`` + - ``project:docs/latest test`` + - ``project:docs/stable project:dev test`` + +subprojects + Include results from the given project and its subprojects. + If the version isn't provided, the default version of all projects will be used, + if a version is provided, all subprojects matching that version + will be included, and if they don't have a version with that name, + we use their default version. + More than one parameter can be included. + + Examples: + + - ``subprojects:docs test`` + - ``subprojects:docs/latest test`` + - ``subprojects:docs/stable subprojects:dev test`` + +user + Include results from projects the given user has access to. + The only supported value is ``@me``, + which is an alias for the current user. + Only one parameter can be included, + if duplicated, the last one will override the previous one. + + Examples: + + - ``user:@me test`` + +Permissions +~~~~~~~~~~~ + +If the user doesn’t have permission over one version, +or if the version doesn’t exist, we don’t include results from that version. + +The API will return all the projects that were used in the final search, +with that information you can check which projects were used in the search. + +Limitations +~~~~~~~~~~~ + +In order to keep our search usable for everyone, +you can search up to 100 projects at a time. +If the resulting query includes more than 100 projects, +the projects beyond that limit will be omitted from the final search. 
+ +This syntax is only available when using our search API V3 +or when using the global search (https://readthedocs.org/search/). + +Searching multiple versions of the same project isn't supported, +the last version will override the previous one. + +Special queries +--------------- + +Read the Docs uses the `Simple Query String`_ feature from `Elasticsearch`_. +This means that as the search query becomes more complex, +the results yielded become more specific. + +.. _Simple Query String: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-simple-query-string-query.html# +.. _Elasticsearch: https://www.elastic.co/products/elasticsearch + +Exact phrase search +~~~~~~~~~~~~~~~~~~~ + +If a query is wrapped in ``"`` (double quotes), +then only those results where the phrase is exactly matched will be returned. + +Examples: + +- ``"custom css"`` +- ``"adding a subproject"`` +- ``"when a 404 is returned"`` + +Prefix query +~~~~~~~~~~~~ + +``*`` (asterisk) at the end of any term signifies a prefix query. +It returns the results containing the words with specific prefix. + +Examples: + +- ``test*`` +- ``build*`` + +Fuzziness +~~~~~~~~~ + +``~N`` (tilde followed by a number) after a word indicates edit distance (fuzziness). +This type of query is helpful when the exact spelling of the keyword is unknown. +It returns results that contain terms similar to the search term. + +Examples: + +- ``doks~1`` +- ``test~2`` +- ``getter~2`` + +Words close to each other +~~~~~~~~~~~~~~~~~~~~~~~~~ + +``~N`` (tilde followed by a number) after a phrase can be used to match words that are near to each other. 
+ +Examples: + +- ``"dashboard admin"~2`` +- ``"single documentation"~1`` +- ``"read the docs policy"~5`` diff --git a/docs/user/tutorial/index.rst b/docs/user/tutorial/index.rst index fba1f577a65..21d7cefa97c 100644 --- a/docs/user/tutorial/index.rst +++ b/docs/user/tutorial/index.rst @@ -626,7 +626,7 @@ Browsing Search Analytics ~~~~~~~~~~~~~~~~~~~~~~~~~ Apart from traffic analytics, Read the Docs also offers the possibility -to inspect :ref:`what search terms your readers use ` +to inspect :ref:`what search terms your readers use ` on your documentation. This can inform decisions on what areas to reinforce, or what parts of your project are less understood or more difficult to find. diff --git a/readthedocs/api/v3/proxied_urls.py b/readthedocs/api/v3/proxied_urls.py index 51dd7b53704..d446203d37a 100644 --- a/readthedocs/api/v3/proxied_urls.py +++ b/readthedocs/api/v3/proxied_urls.py @@ -5,12 +5,14 @@ so they can make use of features that require to have access to their cookies. """ -from django.conf.urls import re_path +from django.urls import path, re_path from readthedocs.api.v3.proxied_views import ProxiedEmbedAPI +from readthedocs.search.api.v3.views import ProxiedSearchAPI api_proxied_urls = [ - re_path(r'embed/', ProxiedEmbedAPI.as_view(), name='embed_api_v3'), + re_path("embed/", ProxiedEmbedAPI.as_view(), name="embed_api_v3"), + path("search/", ProxiedSearchAPI.as_view(), name="search_api_v3"), ] urlpatterns = api_proxied_urls diff --git a/readthedocs/builds/querysets.py b/readthedocs/builds/querysets.py index b0e23d644e0..7dde39fe718 100644 --- a/readthedocs/builds/querysets.py +++ b/readthedocs/builds/querysets.py @@ -103,7 +103,9 @@ def public( if user.is_superuser: queryset = self.all() else: - queryset = self._add_from_user_projects(queryset, user) + queryset = self._add_from_user_projects( + queryset, user, admin=True, member=True + ) if project: queryset = queryset.filter(project=project) if only_active: diff --git 
a/readthedocs/core/templatetags/core_tags.py b/readthedocs/core/templatetags/core_tags.py index eeb3727a306..ad77a11b31d 100644 --- a/readthedocs/core/templatetags/core_tags.py +++ b/readthedocs/core/templatetags/core_tags.py @@ -5,16 +5,13 @@ from urllib.parse import urlencode from django import template -from django.conf import settings from django.core.serializers.json import DjangoJSONEncoder -from django.utils.encoding import force_bytes, force_str from django.utils.safestring import mark_safe from readthedocs import __version__ from readthedocs.core.resolver import resolve from readthedocs.projects.models import Project - register = template.Library() @@ -70,9 +67,9 @@ def get_version(slug): @register.simple_tag -def url_replace(request, field, value): +def url_replace(request, field, *values): dict_ = request.GET.copy() - dict_[field] = value + dict_[field] = "".join(values) return dict_.urlencode() diff --git a/readthedocs/rtd_tests/tests/test_privacy_urls.py b/readthedocs/rtd_tests/tests/test_privacy_urls.py index 646535edc2e..0282a2a659f 100644 --- a/readthedocs/rtd_tests/tests/test_privacy_urls.py +++ b/readthedocs/rtd_tests/tests/test_privacy_urls.py @@ -220,10 +220,11 @@ class PublicProjectMixin(ProjectMixin): } response_data = { # Public - '/projects/': {'status_code': 301}, - '/projects/pip/downloads/pdf/latest/': {'status_code': 200}, - '/projects/pip/badge/': {'status_code': 200}, - '/projects/invalid_slug/': {'status_code': 302}, + "/projects/": {"status_code": 301}, + "/projects/pip/downloads/pdf/latest/": {"status_code": 200}, + "/projects/pip/badge/": {"status_code": 200}, + "/projects/invalid_slug/": {"status_code": 302}, + "/projects/pip/search/": {"status_code": 302}, } def test_public_urls(self): diff --git a/readthedocs/search/api/__init__.py b/readthedocs/search/api/v3/__init__.py similarity index 100% rename from readthedocs/search/api/__init__.py rename to readthedocs/search/api/v3/__init__.py diff --git 
a/readthedocs/search/api/v3/executor.py b/readthedocs/search/api/v3/executor.py new file mode 100644 index 00000000000..f5a7ff67123 --- /dev/null +++ b/readthedocs/search/api/v3/executor.py @@ -0,0 +1,232 @@ +from functools import cached_property +from itertools import islice + +from readthedocs.builds.models import Version +from readthedocs.projects.models import Project +from readthedocs.search.api.v3.queryparser import SearchQueryParser +from readthedocs.search.faceted_search import PageSearch + + +class SearchExecutor: + + """ + Parse the query, search, and return the projects used in the search. + + :param arguments_required: If `True` and the user didn't provide + any arguments in the query, we don't perform the search. + :param default_all: If `True` and `arguments_required` is `False` + we search all projects by default, otherwise we search all projects + the user has access to. + :param max_projects: The maximum number of projects used in the search. + This limit is only applied for projects given explicitly, + not when we default to search all projects. + """ + + def __init__( + self, + *, + request, + query, + arguments_required=True, + default_all=False, + max_projects=100 + ): + self.request = request + self.query = query + self.arguments_required = arguments_required + self.default_all = default_all + self.max_projects = max_projects + + @cached_property + def projects(self): + """ + Return all projects used in this search. + + If empty, it will search all projects. + + :returns: A list of tuples (project, version). + """ + projects = islice(self._get_projects_to_search(), self.max_projects) + # Make sure we are using just one version per-project, + # searching multiple versions of the same projects isn't supported yet. + projects_dict = dict(projects) + return list(projects_dict.items()) + + def search(self, **kwargs): + """ + Perform the search. + + :param kwargs: All kwargs are passed to the `PageSearch` constructor. 
+ """ + if not self._has_arguments and self.arguments_required: + return None + + projects = {project.slug: version.slug for project, version in self.projects} + # If the search is done without projects, ES will search on all projects. + # If we don't have projects and the user provided arguments, + # it means we don't have anything to search on (no results). + # Or if we don't have projects and we don't allow searching all, + # we also just return. + if not projects and (self._has_arguments or not self.default_all): + return None + + search = PageSearch( + query=self.parser.query, + projects=projects, + **kwargs, + ) + return search + + def _get_projects_to_search(self): + """ + Return an iterator of (project, version) used in this search. + + An iterator (yield syntax) is used so we can stop at + ``self.max_projects``, this way we avoid fetching projects + that we won't use. + """ + if not self._has_arguments: + if self.arguments_required: + return None + yield from self._get_default_projects() + return None + + for value in self.parser.arguments["project"]: + project, version = self._get_project_and_version(value) + if version and self._has_permission(self.request.user, version): + yield project, version + + for value in self.parser.arguments["subprojects"]: + project, version = self._get_project_and_version(value) + + # Add the project itself. + if version and self._has_permission(self.request.user, version): + yield project, version + + if project: + # If the user didn't provide a version, version_slug will be `None`, + # and we add all subprojects with their default version, + # otherwise we will add all projects that match the given version. + _, version_slug = self._split_project_and_version(value) + yield from self._get_subprojects( + project=project, + version_slug=version_slug, + ) + + # Add all projects the user has access to. 
+ if self.parser.arguments["user"] == "@me": + yield from self._get_projects_from_user() + + def _get_projects_from_user(self): + for project in Project.objects.for_user(user=self.request.user): + version = self._get_project_version( + project=project, + version_slug=project.default_version, + include_hidden=False, + ) + if version and self._has_permission(self.request.user, version): + yield project, version + + def _get_subprojects(self, project, version_slug=None): + """ + Get a tuple (project, version) of all subprojects of `project`. + + If `version_slug` doesn't match a version of the subproject, + the default version will be used. + If `version_slug` is None, we will always use the default version. + """ + subprojects = Project.objects.filter(superprojects__parent=project) + for subproject in subprojects: + version = None + if version_slug: + version = self._get_project_version( + project=subproject, + version_slug=version_slug, + include_hidden=False, + ) + + # Fallback to the default version of the subproject. + if not version and subproject.default_version: + version = self._get_project_version( + project=subproject, + version_slug=subproject.default_version, + include_hidden=False, + ) + + if version and self._has_permission(self.request.user, version): + yield subproject, version + + def _has_permission(self, user, version): + """ + Check if `user` is authorized to access `version`. + + The queryset from `_get_project_version` already filters public + projects. This is mainly to be overridden in .com to make use of + the auth backends in the proxied API. + """ + return True + + def _get_project_version(self, project, version_slug, include_hidden=True): + """ + Get a version from a given project. + + :param project: A `Project` object. + :param version_slug: The version slug. + :param include_hidden: If hidden versions should be considered. 
+ """ + return ( + Version.internal.public( + user=self.request.user, + project=project, + only_built=True, + include_hidden=include_hidden, + ) + .filter(slug=version_slug) + .first() + ) + + @cached_property + def _has_arguments(self): + return any(self.parser.arguments.values()) + + def _get_default_projects(self): + if self.default_all: + # Default to search all. + return [] + return self._get_projects_from_user() + + @cached_property + def parser(self): + parser = SearchQueryParser(self.query) + parser.parse() + return parser + + def _split_project_and_version(self, term): + """ + Split a term of the form ``{project}/{version}``. + + :returns: A tuple of project and version. + If the version part isn't found, `None` will be returned in its place. + """ + parts = term.split("/", maxsplit=1) + if len(parts) > 1: + return parts + return parts[0], None + + def _get_project_and_version(self, value): + project_slug, version_slug = self._split_project_and_version(value) + project = Project.objects.filter(slug=project_slug).first() + if not project: + return None, None + + if not version_slug: + version_slug = project.default_version + + if version_slug: + version = self._get_project_version( + project=project, + version_slug=version_slug, + ) + return project, version + + return None, None diff --git a/readthedocs/search/api/v3/queryparser.py b/readthedocs/search/api/v3/queryparser.py new file mode 100644 index 00000000000..9d99576f75b --- /dev/null +++ b/readthedocs/search/api/v3/queryparser.py @@ -0,0 +1,77 @@ +class TextToken: + def __init__(self, text): + self.text = text + + +class ArgumentToken: + def __init__(self, *, name, value, type): + self.name = name + self.value = value + self.type = type + + +class SearchQueryParser: + + """Simplified and minimal parser for ``name:value`` expressions.""" + + allowed_arguments = { + "project": list, + "subprojects": list, + "user": str, + } + + def __init__(self, query): + self._query = query + self.query = "" + # Set 
all arguments to their default values. + self.arguments = {name: type() for name, type in self.allowed_arguments.items()} + + def parse(self): + r""" + Parse the expression into a query and arguments. + + The parser steps are: + + - Split the string using white spaces. + - Tokenize each string into a ``text`` or ``argument`` token. + A valid argument has the ``name:value`` form, + and it's declared in `allowed_arguments`, + anything else is considered a text token. + - All text tokens are concatenated to form the final query. + + To interpret an argument as text, it can be escaped as ``name\:value``. + """ + tokens = (self._get_token(text) for text in self._query.split()) + query = [] + for token in tokens: + if isinstance(token, TextToken): + query.append(token.text) + elif isinstance(token, ArgumentToken): + if token.type == str: + self.arguments[token.name] = token.value + elif token.type == list: + self.arguments[token.name].append(token.value) + else: + raise ValueError(f"Invalid argument type {token.type}") + else: + raise ValueError("Invalid node") + + self.query = self._unescape(" ".join(query)) + + def _get_token(self, text): + result = text.split(":", maxsplit=1) + if len(result) < 2: + return TextToken(text) + + name, value = result + if name in self.allowed_arguments: + return ArgumentToken( + name=name, + value=value, + type=self.allowed_arguments[name], + ) + + return TextToken(text) + + def _unescape(self, text): + return text.replace("\\:", ":") diff --git a/readthedocs/search/api/v3/serializers.py b/readthedocs/search/api/v3/serializers.py new file mode 100644 index 00000000000..ac6c9177f3e --- /dev/null +++ b/readthedocs/search/api/v3/serializers.py @@ -0,0 +1,38 @@ +from rest_framework import serializers + +from readthedocs.search.api.v2.serializers import ( + PageSearchSerializer as PageSearchSerializerBase, +) + + +class PageSearchSerializer(PageSearchSerializerBase): + + """ + Serializer for API V3. 
+ + This is very similar to the serializer from V2, + with the following changes: + + - ``project`` is an object, not a string. + - ``version`` is an object, not a string. + - ``project_alias`` isn't present, + it is contained in the ``project`` object. + """ + + project = serializers.SerializerMethodField() + version = serializers.SerializerMethodField() + + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.fields.pop("project_alias") + + def get_project(self, obj): + return { + "slug": obj.project, + "alias": self.get_project_alias(obj), + } + + def get_version(self, obj): + return { + "slug": obj.version, + } diff --git a/readthedocs/search/api/v3/tests/test_api.py b/readthedocs/search/api/v3/tests/test_api.py new file mode 100644 index 00000000000..780c9c1eafc --- /dev/null +++ b/readthedocs/search/api/v3/tests/test_api.py @@ -0,0 +1,732 @@ +import itertools +from unittest import mock + +import pytest +from django.contrib.auth.models import User +from django.core.management import call_command +from django.test import TestCase, override_settings +from django.urls import reverse +from django_dynamic_fixture import get + +from readthedocs.builds.models import Version +from readthedocs.organizations.models import Organization, Team +from readthedocs.projects.constants import PRIVATE, PUBLIC +from readthedocs.projects.models import HTMLFile, Project +from readthedocs.search.documents import PageDocument + + +@pytest.mark.search +class SearchTestBase(TestCase): + def setUp(self): + call_command("search_index", "--delete", "-f") + call_command("search_index", "--create") + + def tearDown(self): + super().tearDown() + call_command("search_index", "--delete", "-f") + + def get_dummy_processed_json(self, extra=None): + """ + Return a dict to be used as data indexed by ES. + + :param extra: By default it returns some default values, + you can override this passing a dict to extra. 
+ """ + extra = extra or {} + default = { + "path": "index.html", + "title": "Title", + "sections": [ + { + "id": "first", + "title": "First Paragraph", + "content": "First paragraph, content of interest: test.", + } + ], + "domain_data": [], + } + default.update(extra) + return default + + def create_index(self, version, files=None): + """ + Create a search index for `version` with files as content. + + :param version: Version object + :param files: A dictionary with the filename as key and a dict as value + to be passed to `get_dummy_processed_json`. + """ + files = files or {"index.html": {}} + for file, extra in files.items(): + html_file = HTMLFile.objects.filter( + project=version.project, version=version, name=file + ).first() + if not html_file: + html_file = get( + HTMLFile, + project=version.project, + version=version, + name=file, + ) + html_file.get_processed_json = mock.MagicMock( + name="get_processed_json", + return_value=self.get_dummy_processed_json(extra), + ) + PageDocument().update(html_file) + + +@override_settings(ALLOW_PRIVATE_REPOS=False) +@override_settings(RTD_ALLOW_ORGANIZATIONS=False) +class SearchAPITest(SearchTestBase): + def setUp(self): + super().setUp() + self.user = get(User) + self.another_user = get(User) + self.project = get( + Project, slug="project", users=[self.user], privacy_level=PUBLIC + ) + self.another_project = get( + Project, + slug="another-project", + users=[self.another_user], + privacy_level=PUBLIC, + ) + + self.project.versions.update(privacy_level=PUBLIC, active=True, built=True) + self.version = self.project.versions.first() + + self.another_project.versions.update( + privacy_level=PUBLIC, active=True, built=True + ) + self.another_version = self.another_project.versions.first() + + self.url = reverse("search_api_v3") + self.client.force_login(self.user) + + for version in Version.objects.all(): + self.create_index(version) + + def get(self, *args, **kwargs): + return self.client.get(*args, **kwargs) + + def 
test_search_no_projects(self): + resp = self.get(self.url, data={"q": "test"}) + + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(results, []) + self.assertEqual(projects, []) + self.assertEqual(resp.data["query"], "test") + + def test_search_project(self): + resp = self.get(self.url, data={"q": "project:project test"}) + + self.assertEqual(resp.status_code, 200) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, [{"slug": "project", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + def test_search_project_explicit_version(self): + resp = self.get(self.url, data={"q": "project:project/latest test"}) + + self.assertEqual(resp.status_code, 200) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, [{"slug": "project", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + new_version = get( + Version, + project=self.project, + slug="v2", + privacy_level=PUBLIC, + built=True, + active=True, + ) + self.create_index(new_version) + resp = self.get(self.url, data={"q": "project:project/v2 test"}) + + self.assertEqual(resp.status_code, 200) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual(projects, [{"slug": "project", "versions": [{"slug": "v2"}]}]) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + def test_search_project_explicit_version_unexisting(self): + resp = self.get(self.url, data={"q": "project:project/v3 test"}) + self.assertEqual(resp.status_code, 200) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual(projects, []) + self.assertEqual(results, []) + self.assertEqual(resp.data["query"], "test") + + def 
test_search_project_unexisting(self): + resp = self.get(self.url, data={"q": "project:foobar/latest test"}) + self.assertEqual(resp.status_code, 200) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual(projects, []) + self.assertEqual(results, []) + self.assertEqual(resp.data["query"], "test") + + def test_search_project_valid_and_invalid(self): + resp = self.get( + self.url, data={"q": "project:foobar/latest project:project/latest test"} + ) + self.assertEqual(resp.status_code, 200) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, [{"slug": "project", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + def test_search_multiple_projects(self): + resp = self.get( + self.url, data={"q": "project:project project:another-project test"} + ) + + self.assertEqual(resp.status_code, 200) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "latest"}]}, + {"slug": "another-project", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + def test_search_user_me_anonymous_user(self): + self.client.logout() + resp = self.get(self.url, data={"q": "user:@me test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual(projects, []) + self.assertEqual(results, []) + self.assertEqual(resp.data["query"], "test") + + def test_search_user_me_logged_in_user(self): + self.client.force_login(self.user) + resp = self.get(self.url, data={"q": "user:@me test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, [{"slug": "project", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + 
self.client.force_login(self.another_user) + resp = self.get(self.url, data={"q": "user:@me test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, [{"slug": "another-project", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + def test_search_user_invalid_value(self): + self.client.force_login(self.user) + resp = self.get(self.url, data={"q": "user:test test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual(projects, []) + self.assertEqual(results, []) + self.assertEqual(resp.data["query"], "test") + + def test_search_user_and_project(self): + self.client.force_login(self.user) + resp = self.get( + self.url, data={"q": "user:@me project:another-project/latest test"} + ) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "another-project", "versions": [{"slug": "latest"}]}, + {"slug": "project", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + def test_search_subprojects(self): + subproject = get( + Project, slug="subproject", users=[self.user], privacy_level=PUBLIC + ) + self.project.add_subproject(subproject) + get(Version, slug="v2", project=self.project, active=True, built=True) + get(Version, slug="v3", project=self.project, active=True, built=True) + get(Version, slug="v2", project=subproject, active=True, built=True) + get(Version, slug="v4", project=subproject, active=True, built=True) + subproject.versions.update(built=True, active=True, privacy_level=PUBLIC) + self.project.versions.update(built=True, active=True, privacy_level=PUBLIC) + + for version in itertools.chain( + subproject.versions.all(), self.project.versions.all() + ): + self.create_index(version) + + # Search default version of the project and its subprojects. 
+ resp = self.get(self.url, data={"q": "subprojects:project test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "latest"}]}, + {"slug": "subproject", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the latest version. + resp = self.get(self.url, data={"q": "subprojects:project/latest test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "latest"}]}, + {"slug": "subproject", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the v2 version. + resp = self.get(self.url, data={"q": "subprojects:project/v2 test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "v2"}]}, + {"slug": "subproject", "versions": [{"slug": "v2"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the v3 version. + # Only the main project has this version, + # we will default to the default version of its subproject. + resp = self.get(self.url, data={"q": "subprojects:project/v3 test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "v3"}]}, + {"slug": "subproject", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the v4 version. + # The main project doesn't have this version, + # we include results from its subprojects only. 
+ resp = self.get(self.url, data={"q": "subprojects:project/v4 test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, [{"slug": "subproject", "versions": [{"slug": "v4"}]}] + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + +@pytest.mark.proxito +@override_settings(PUBLIC_DOMAIN="readthedocs.io", USE_SUBDOMAIN=True) +class ProxiedSearchAPITest(SearchAPITest): + + host = "docs.readthedocs.io" + + def get(self, *args, **kwargs): + return self.client.get(*args, HTTP_HOST=self.host, **kwargs) + + +@override_settings(ALLOW_PRIVATE_REPOS=True) +@override_settings(RTD_ALLOW_ORGANIZATIONS=True) +class SearchAPIWithOrganizationsTest(SearchTestBase): + def setUp(self): + super().setUp() + + self.user = get(User) + self.member = get(User) + self.project = get(Project, slug="project") + self.project.versions.update(built=True, active=True, privacy_level=PRIVATE) + self.version = self.project.versions.first() + self.version_public = get( + Version, + slug="public", + project=self.project, + privacy_level=PUBLIC, + active=True, + built=True, + ) + + self.project_b = get(Project, slug="project-b") + self.project_b.versions.update(built=True, active=True, privacy_level=PRIVATE) + self.version_b = self.project_b.versions.first() + self.version_b_public = get( + Version, + slug="public", + project=self.project_b, + privacy_level=PUBLIC, + built=True, + active=True, + ) + + self.organization = get( + Organization, owners=[self.user], projects=[self.project, self.project_b] + ) + + self.team = get( + Team, + members=[self.member], + organization=self.organization, + projects=[self.project_b], + access="readonly", + ) + + self.another_user = get(User) + self.another_project = get(Project, slug="another-project") + self.another_project.versions.update( + built=True, active=True, privacy_level=PRIVATE + ) + self.another_version = self.another_project.versions.first() + self.another_version_public 
= get( + Version, + slug="public", + project=self.another_project, + privacy_level=PUBLIC, + built=True, + active=True, + ) + + self.another_organization = get( + Organization, owners=[self.another_user], projects=[self.another_project] + ) + + self.url = reverse("search_api_v3") + self.client.force_login(self.user) + + for version in Version.objects.all(): + self.create_index(version) + + def test_search_no_projects(self): + resp = self.client.get(self.url, data={"q": "test"}) + + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(results, []) + self.assertEqual(projects, []) + self.assertEqual(resp.data["query"], "test") + + def test_search_project(self): + resp = self.client.get(self.url, data={"q": "project:project test"}) + + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(len(results), 1) + self.assertEqual( + projects, [{"slug": "project", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(resp.data["query"], "test") + + def test_search_project_explicit_version(self): + resp = self.client.get(self.url, data={"q": "project:project/public test"}) + + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(len(results), 1) + self.assertEqual( + projects, [{"slug": "project", "versions": [{"slug": "public"}]}] + ) + self.assertEqual(resp.data["query"], "test") + + def test_search_project_no_permissions(self): + resp = self.client.get(self.url, data={"q": "project:another-project test"}) + + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(results, []) + self.assertEqual(projects, []) + self.assertEqual(resp.data["query"], "test") + + def test_search_project_private_version_anonymous_user(self): + self.client.logout() + + resp = self.client.get(self.url, 
data={"q": "project:project test"}) + + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(results, []) + self.assertEqual(projects, []) + self.assertEqual(resp.data["query"], "test") + + def test_search_project_public_version_anonymous_user(self): + self.client.logout() + + resp = self.client.get(self.url, data={"q": "project:project/public test"}) + + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(len(results), 1) + self.assertEqual( + projects, [{"slug": "project", "versions": [{"slug": "public"}]}] + ) + self.assertEqual(resp.data["query"], "test") + + def test_search_multiple_projects(self): + resp = self.client.get( + self.url, + data={ + "q": "project:project project:another-project/latest project:project-b/latest test" + }, + ) + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(len(results), 2) + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "latest"}]}, + {"slug": "project-b", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(resp.data["query"], "test") + + def test_search_multiple_projects_team_member(self): + self.client.force_login(self.member) + + resp = self.client.get( + self.url, + data={ + "q": "project:project project:another-project/latest project:project-b/latest test" + }, + ) + self.assertEqual(resp.status_code, 200) + results = resp.data["results"] + projects = resp.data["projects"] + self.assertEqual(len(results), 1) + self.assertEqual( + projects, [{"slug": "project-b", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(resp.data["query"], "test") + + def test_search_user_me_anonymous_user(self): + self.client.logout() + resp = self.client.get(self.url, data={"q": "user:@me test"}) + projects = resp.data["projects"] + results = resp.data["results"] + 
self.assertEqual(projects, []) + self.assertEqual(results, []) + self.assertEqual(resp.data["query"], "test") + + def test_search_user_me_logged_in_user(self): + self.client.force_login(self.user) + resp = self.client.get(self.url, data={"q": "user:@me test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "latest"}]}, + {"slug": "project-b", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + self.client.force_login(self.member) + resp = self.client.get(self.url, data={"q": "user:@me test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, [{"slug": "project-b", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + self.client.force_login(self.another_user) + resp = self.client.get(self.url, data={"q": "user:@me test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, [{"slug": "another-project", "versions": [{"slug": "latest"}]}] + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + def test_search_user_and_project(self): + self.client.force_login(self.member) + resp = self.client.get( + self.url, data={"q": "user:@me project:another-project/public test"} + ) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "another-project", "versions": [{"slug": "public"}]}, + {"slug": "project-b", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + def test_search_subprojects(self): + self.project.add_subproject(self.project_b) + + # Search default version of the project and its subprojects. 
+ resp = self.client.get(self.url, data={"q": "subprojects:project test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "latest"}]}, + {"slug": "project-b", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the latest version. + resp = self.client.get(self.url, data={"q": "subprojects:project/latest test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "latest"}]}, + {"slug": "project-b", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the public version. + resp = self.client.get(self.url, data={"q": "subprojects:project/public test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "public"}]}, + {"slug": "project-b", "versions": [{"slug": "public"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + def test_search_subprojects_with_team_member(self): + self.client.force_login(self.member) + self.project.add_subproject(self.project_b) + + # Search default version of the project and its subprojects. + resp = self.client.get(self.url, data={"q": "subprojects:project test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project-b", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the latest version. 
+ resp = self.client.get(self.url, data={"q": "subprojects:project/latest test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project-b", "versions": [{"slug": "latest"}]}, + ], + ) + self.assertEqual(len(results), 1) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the public version. + resp = self.client.get(self.url, data={"q": "subprojects:project/public test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "public"}]}, + {"slug": "project-b", "versions": [{"slug": "public"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") + + def test_search_subprojects_with_anonymous_user(self): + self.client.logout() + self.project.add_subproject(self.project_b) + + # Search default version of the project and its subprojects. + resp = self.client.get(self.url, data={"q": "subprojects:project test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [], + ) + self.assertEqual(len(results), 0) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the latest version. + resp = self.client.get(self.url, data={"q": "subprojects:project/latest test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [], + ) + self.assertEqual(len(results), 0) + self.assertEqual(resp.data["query"], "test") + + # Explicitly search on the public version. 
+ resp = self.client.get(self.url, data={"q": "subprojects:project/public test"}) + projects = resp.data["projects"] + results = resp.data["results"] + self.assertEqual( + projects, + [ + {"slug": "project", "versions": [{"slug": "public"}]}, + {"slug": "project-b", "versions": [{"slug": "public"}]}, + ], + ) + self.assertEqual(len(results), 2) + self.assertEqual(resp.data["query"], "test") diff --git a/readthedocs/search/api/v3/tests/test_queryparser.py b/readthedocs/search/api/v3/tests/test_queryparser.py new file mode 100644 index 00000000000..f7d043affae --- /dev/null +++ b/readthedocs/search/api/v3/tests/test_queryparser.py @@ -0,0 +1,86 @@ +from django.test import TestCase + +from readthedocs.search.api.v3.queryparser import SearchQueryParser + + +class TestQueryParser(TestCase): + def test_no_arguments(self): + parser = SearchQueryParser("search query") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], []) + self.assertEqual(arguments["subprojects"], []) + self.assertEqual(arguments["user"], "") + self.assertEqual(parser.query, "search query") + + def test_project_arguments(self): + parser = SearchQueryParser("project:foo query") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], ["foo"]) + self.assertEqual(arguments["subprojects"], []) + self.assertEqual(arguments["user"], "") + self.assertEqual(parser.query, "query") + + def test_multiple_project_arguments(self): + parser = SearchQueryParser("project:foo query project:bar") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], ["foo", "bar"]) + self.assertEqual(arguments["subprojects"], []) + self.assertEqual(arguments["user"], "") + self.assertEqual(parser.query, "query") + + def test_user_argument(self): + parser = SearchQueryParser("query user:foo") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], []) + self.assertEqual(arguments["subprojects"], []) + 
self.assertEqual(arguments["user"], "foo") + self.assertEqual(parser.query, "query") + + def test_multiple_user_arguments(self): + parser = SearchQueryParser("search user:foo query user:bar") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], []) + self.assertEqual(arguments["subprojects"], []) + self.assertEqual(arguments["user"], "bar") + self.assertEqual(parser.query, "search query") + + def test_subprojects_argument(self): + parser = SearchQueryParser("search subprojects:foo query ") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], []) + self.assertEqual(arguments["subprojects"], ["foo"]) + self.assertEqual(arguments["user"], "") + self.assertEqual(parser.query, "search query") + + def test_multiple_subprojects_arguments(self): + parser = SearchQueryParser("search subprojects:foo query subprojects:bar") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], []) + self.assertEqual(arguments["subprojects"], ["foo", "bar"]) + self.assertEqual(arguments["user"], "") + self.assertEqual(parser.query, "search query") + + def test_escaped_argument(self): + parser = SearchQueryParser(r"project\:foo project:bar query") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], ["bar"]) + self.assertEqual(arguments["subprojects"], []) + self.assertEqual(arguments["user"], "") + self.assertEqual(parser.query, "project:foo query") + + def test_only_arguments(self): + parser = SearchQueryParser(r"project:foo user:bar") + parser.parse() + arguments = parser.arguments + self.assertEqual(arguments["project"], ["foo"]) + self.assertEqual(arguments["subprojects"], []) + self.assertEqual(arguments["user"], "bar") + self.assertEqual(parser.query, "") diff --git a/readthedocs/search/api/v3/urls.py b/readthedocs/search/api/v3/urls.py new file mode 100644 index 00000000000..2172ab98098 --- /dev/null +++ b/readthedocs/search/api/v3/urls.py @@ 
-0,0 +1,7 @@ +from django.urls import path + +from readthedocs.search.api.v3.views import SearchAPI + +urlpatterns = [ + path("", SearchAPI.as_view(), name="search_api_v3"), +] diff --git a/readthedocs/search/api/v3/utils.py b/readthedocs/search/api/v3/utils.py new file mode 100644 index 00000000000..7e3c3b82f9b --- /dev/null +++ b/readthedocs/search/api/v3/utils.py @@ -0,0 +1,10 @@ +from readthedocs.projects.models import Feature + + +def should_use_advanced_query(projects): + # TODO: we should make this a parameter in the API, + # we are checking if the first project has this feature for now. + if projects: + project = projects[0][0] + return not project.has_feature(Feature.DEFAULT_TO_FUZZY_SEARCH) + return True diff --git a/readthedocs/search/api/v3/views.py b/readthedocs/search/api/v3/views.py new file mode 100644 index 00000000000..a7dafa11649 --- /dev/null +++ b/readthedocs/search/api/v3/views.py @@ -0,0 +1,172 @@ +from functools import cached_property + +import structlog +from django.utils import timezone +from django.utils.translation import gettext as _ +from rest_framework.exceptions import ValidationError +from rest_framework.generics import GenericAPIView +from rest_framework.permissions import AllowAny +from rest_framework.throttling import AnonRateThrottle, UserRateThrottle + +from readthedocs.api.v3.views import APIv3Settings +from readthedocs.core.utils.extend import SettingsOverrideObject +from readthedocs.search import tasks +from readthedocs.search.api.pagination import SearchPagination +from readthedocs.search.api.v3.executor import SearchExecutor +from readthedocs.search.api.v3.serializers import PageSearchSerializer +from readthedocs.search.api.v3.utils import should_use_advanced_query + +log = structlog.get_logger(__name__) + + +RATE_LIMIT = "100/minute" + + +class SearchAnonRateThrottle(AnonRateThrottle): + + """Rate limit for the search API for anonymous users.""" + + rate = RATE_LIMIT + + +class SearchUserRateThrottle(UserRateThrottle): + 
+ """Rate limit for the search API for authenticated users.""" + + rate = RATE_LIMIT + + +class SearchAPI(APIv3Settings, GenericAPIView): + + """ + Server side search API V3. + + Required query parameters: + + - **q**: [Search term](https://docs.readthedocs.io/page/server-side-search/syntax.html). + + Check our [docs](https://docs.readthedocs.io/page/server-side-search/api.html) for more information. + """ # noqa + + http_method_names = ["get"] + pagination_class = SearchPagination + serializer_class = PageSearchSerializer + search_executor_class = SearchExecutor + permission_classes = [AllowAny] + # The search API would be used by anonymous users, + # and with our search-as-you-type extension. + # So we need to increase the rate limit. + throttle_classes = (SearchUserRateThrottle, SearchAnonRateThrottle) + + def get_view_name(self): + return "Search API V3" + + def _validate_query_params(self): + query = self.request.GET.get("q") + errors = {} + if not query: + errors["q"] = [_("This query parameter is required")] + if errors: + raise ValidationError(errors) + + @cached_property + def _search_executor(self): + search_executor = self.search_executor_class( + request=self.request, + query=self.request.GET["q"], + ) + return search_executor + + def _get_search_query(self): + return self._search_executor.parser.query + + def _get_projects_to_search(self): + return self._search_executor.projects + + def get_queryset(self): + """ + Returns an Elasticsearch DSL search object or an iterator. + + .. note:: + + Calling ``list(search)`` over an DSL search object is the same as + calling ``search.execute().hits``. This is why an DSL search object + is compatible with DRF's paginator. 
+ """ + use_advanced_query = should_use_advanced_query(self._get_projects_to_search()) + search = self._search_executor.search( + use_advanced_query=use_advanced_query, + aggregate_results=False, + ) + if not search: + return [] + + return search + + def get(self, request, *args, **kwargs): + self._validate_query_params() + result = self.list() + self._record_query(result) + return result + + def _record_query(self, response): + total_results = response.data.get("count", 0) + time = timezone.now() + query = self._get_search_query().lower().strip() + # NOTE: I think this may be confusing, + # since the number of results is the total + # of searching on all projects, this specific project + # could have had 0 results. + for project, version in self._get_projects_to_search(): + tasks.record_search_query.delay( + project.slug, + version.slug, + query, + total_results, + time.isoformat(), + ) + + def list(self): + queryset = self.get_queryset() + page = self.paginator.paginate_queryset( + queryset, + self.request, + view=self, + ) + serializer = self.get_serializer( + page, many=True, projects=self._get_projects_to_search() + ) + response = self.paginator.get_paginated_response(serializer.data) + self._add_extra_fields(response) + return response + + def _add_extra_fields(self, response): + """ + Add additional fields to the top level response. + + These are fields that aren't part of the serializers, + and are related to the whole list, rather than each element. + """ + # Add all projects that were used in the final search. + response.data["projects"] = [ + {"slug": project.slug, "versions": [{"slug": version.slug}]} + for project, version in self._get_projects_to_search() + ] + # Add the query used in the final search, + # this doesn't include arguments. + response.data["query"] = self._get_search_query() + + +class BaseProxiedSearchAPI(SearchAPI): + + """ + Use a separate class for the proxied version of this view. 
+ + This is so we can override it in .com, + where we need to make use of our auth backends. + """ + + +class ProxiedSearchAPI(SettingsOverrideObject): + + _default_class = BaseProxiedSearchAPI diff --git a/readthedocs/search/faceted_search.py b/readthedocs/search/faceted_search.py index 4dfb8b0622f..e5fe90e3abb 100644 --- a/readthedocs/search/faceted_search.py +++ b/readthedocs/search/faceted_search.py @@ -1,10 +1,9 @@ -import structlog import re +import structlog from django.conf import settings from elasticsearch import Elasticsearch from elasticsearch_dsl import FacetedSearch, TermsFacet -from elasticsearch_dsl.faceted_search import NestedFacet from elasticsearch_dsl.query import ( Bool, FunctionScore, @@ -20,8 +19,6 @@ log = structlog.get_logger(__name__) -ALL_FACETS = ['project', 'version', 'role_name', 'language'] - class RTDFacetedSearch(FacetedSearch): @@ -268,11 +265,6 @@ def query(self, search, query): class PageSearch(RTDFacetedSearch): facets = { 'project': TermsFacet(field='project'), - 'version': TermsFacet(field='version'), - 'role_name': NestedFacet( - 'domains', - TermsFacet(field='domains.role_name') - ), } doc_types = [PageDocument] index = PageDocument._index._name @@ -365,18 +357,6 @@ def _get_nested_query(self, *, query, path, fields): for field in fields ] - # The ``post_filter`` filter will only filter documents - # at the parent level (domains is a nested document), - # resulting in results with domains that don't match the current - # role_name being filtered, so we need to force filtering by role_name - # on the ``domains`` document here. See #8268. - # TODO: We should use a flattened document instead - # to avoid this kind of problems and have faster queries. 
- role_name = self.filter_values.get('role_name') - if path == 'domains' and role_name: - role_name_query = Bool(must=Terms(**{'domains.role_name': role_name})) - bool_query = Bool(must=[role_name_query, bool_query]) - highlight = dict( self._highlight_options, fields={ diff --git a/readthedocs/search/tests/test_views.py b/readthedocs/search/tests/test_views.py index 563108f8faf..57fc559b545 100644 --- a/readthedocs/search/tests/test_views.py +++ b/readthedocs/search/tests/test_views.py @@ -36,7 +36,7 @@ def test_search_by_project_name(self, client, project, all_projects): results, _ = self._get_search_result( url=self.url, client=client, - search_params={ 'q': project.name }, + search_params={"q": project.name, "type": "project"}, ) assert len(results) == 1 @@ -52,7 +52,7 @@ def test_search_project_have_correct_language_facets(self, client, project): results, facets = self._get_search_result( url=self.url, client=client, - search_params={ 'q': project.name }, + search_params={"q": project.name, "type": "project"}, ) lang_facets = facets['language'] @@ -66,8 +66,8 @@ def test_search_project_have_correct_language_facets(self, client, project): def test_search_project_filter_language(self, client, project): """Test that searching project filtered according to language.""" # Create a project in bn and add it as a translation - translate = get(Project, language='bn', name=project.name) - search_params = { 'q': project.name, 'language': 'bn' } + translate = get(Project, language="bn", name=project.name) + search_params = {"q": project.name, "language": "bn", "type": "project"} results, facets = self._get_search_result( url=self.url, @@ -206,65 +206,6 @@ def test_file_search(self, client, project, data_type, page_num): # Make it lower because our search is case insensitive assert word.lower() in query.lower() - def test_file_search_have_correct_role_name_facets(self, client): - """Test that searching files should result all role_names.""" - - # searching for 'celery' to 
test that - # correct role_names are displayed - results, facets = self._get_search_result( - url=self.url, - client=client, - search_params={ 'q': 'celery', 'type': 'file' } - ) - assert len(results) >= 1 - role_name_facets = facets['role_name'] - role_name_facets_str = [facet[0] for facet in role_name_facets] - expected_role_names = ['py:class', 'py:function', 'py:method'] - assert sorted(expected_role_names) == sorted(role_name_facets_str) - for facet in role_name_facets: - assert facet[2] == False # because none of the facets are applied - - def test_file_search_filter_role_name(self, client): - """Test that searching files filtered according to role_names.""" - - search_params = { 'q': 'celery', 'type': 'file' } - # searching without the filter - results, facets = self._get_search_result( - url=self.url, - client=client, - search_params=search_params - ) - assert len(results) >= 2 # there are > 1 results without the filter - role_name_facets = facets['role_name'] - for facet in role_name_facets: - assert facet[2] == False # because none of the facets are applied - - confval_facet = 'py:class' - # checking if 'py:class' facet is present in results - assert confval_facet in [facet[0] for facet in role_name_facets] - - # filtering with role_name=py:class - search_params['role_name'] = confval_facet - new_results, new_facets = self._get_search_result( - url=self.url, - client=client, - search_params=search_params - ) - new_role_names_facets = new_facets['role_name'] - # All results from domains should have role_name='py:class'. 
- assert len(new_results) == 1 - first_result = new_results[0] - blocks = first_result['blocks'] - for block in blocks: - assert block['type'] == 'domain' - assert block['role'] == confval_facet - - for facet in new_role_names_facets: - if facet[0] == confval_facet: - assert facet[2] == True # because 'py:class' filter is active - else: - assert facet[2] == False - @pytest.mark.parametrize('data_type', DATA_TYPES_VALUES) @pytest.mark.parametrize('case', ['upper', 'lower', 'title']) def test_file_search_case_insensitive(self, client, project, case, data_type): @@ -319,13 +260,15 @@ def test_file_search_exact_match(self, client, project): client=client, search_params={ 'q': query, 'type': 'file' }) - # there must be only 1 result - # because the phrase is present in - # only one project - assert len(results) == 1 - assert results[0]['project'] == 'kuma' - assert results[0]['domain'] == 'http://readthedocs.org' - assert results[0]['path'] == '/docs/kuma/en/latest/documentation.html' + # There are two results, + # one from each version of the "kuma" project. 
+ assert len(results) == 2 + assert results[0]["version"] == {"slug": "stable"} + assert results[1]["version"] == {"slug": "latest"} + for result in results: + assert result["project"] == {"alias": None, "slug": "kuma"} + assert result["domain"] == "http://readthedocs.org" + assert result["path"].endswith("/documentation.html") blocks = results[0]['blocks'] assert len(blocks) == 1 @@ -337,35 +280,14 @@ def test_file_search_exact_match(self, client, project): for word in highlighted_words: assert word.lower() in query.lower() - def test_file_search_have_correct_project_facets(self, client, all_projects): - """Test that file search have correct project facets in results""" - - # `environment` word is present both in `kuma` and `docs` files - # so search with this phrase - query = 'environment' - results, facets = self._get_search_result( - url=self.url, - client=client, - search_params={ 'q': query, 'type': 'file' }, - ) - # There should be 2 search result - assert len(results) == 2 - project_facets = facets['project'] - project_facets_str = [facet[0] for facet in project_facets] - assert len(project_facets_str) == 2 - - # kuma and pipeline should be there - assert sorted(project_facets_str) == sorted(['kuma', 'docs']) - def test_file_search_filter_by_project(self, client): """Test that search result are filtered according to project.""" # `environment` word is present both in `kuma` and `docs` files # so search with this phrase but filter through `kuma` project search_params = { - 'q': 'environment', - 'type': 'file', - 'project': 'kuma' + "q": "project:kuma environment", + "type": "file", } results, facets = self._get_search_result( url=self.url, @@ -378,11 +300,10 @@ def test_file_search_filter_by_project(self, client): # There should be 1 search result as we have filtered assert len(results) == 1 # kuma should should be there only - assert 'kuma' == results[0]['project'] + assert {"alias": None, "slug": "kuma"} == results[0]["project"] - # But there should be 2 
projects in the project facets - # as the query is present in both projects - assert sorted(resulted_project_facets) == sorted(['kuma', 'docs']) + # The projects we search is the only one included in the final results. + assert resulted_project_facets == ["kuma"] @pytest.mark.xfail(reason='Versions are not showing correctly! Fixme while rewrite!') def test_file_search_show_versions(self, client, all_projects, es_index, settings): @@ -412,30 +333,24 @@ def test_file_search_show_versions(self, client, all_projects, es_index, setting assert sorted(project_versions) == sorted(version_facets_str) def test_file_search_subprojects(self, client, all_projects, es_index): - """ - TODO: File search should return results from subprojects also. - - This is currently disabled because the UX around it is weird. - You filter by a project, and get results for multiple. - """ project = all_projects[0] subproject = all_projects[1] # Add another project as subproject of the project - project.add_subproject(subproject) + project.add_subproject(subproject, alias="subproject") # Now search with subproject content but explicitly filter by the parent project query = get_search_query_from_project_file(project_slug=subproject.slug) search_params = { - 'q': query, - 'type': 'file', - 'project': project.slug, + "q": f"subprojects:{project.slug} {query}", + "type": "file", } results, _ = self._get_search_result( url=self.url, client=client, search_params=search_params, ) - assert len(results) == 0 + assert len(results) == 1 + assert results[0]["project"] == {"alias": "subproject", "slug": subproject.slug} @override_settings(ALLOW_PRIVATE_REPOS=True) def test_search_only_projects_owned_by_the_user(self, client, all_projects): @@ -451,15 +366,8 @@ def test_search_only_projects_owned_by_the_user(self, client, all_projects): ) assert len(results) > 0 - other_projects = [ - project.slug - for project in all_projects - if project.slug != 'docs' - ] - for result in results: - assert result['project'] 
== 'docs' - assert result['project'] not in other_projects + assert result["project"] == {"alias": None, "slug": "docs"} @override_settings(ALLOW_PRIVATE_REPOS=True) def test_search_no_owned_projects(self, client, all_projects): diff --git a/readthedocs/search/views.py b/readthedocs/search/views.py index baa18478b4a..aba3201d54a 100644 --- a/readthedocs/search/views.py +++ b/readthedocs/search/views.py @@ -1,18 +1,20 @@ """Search views.""" import collections +from urllib.parse import urlencode import structlog from django.conf import settings -from django.shortcuts import get_object_or_404, render +from django.http.response import HttpResponseRedirect +from django.urls import reverse from django.views import View +from django.views.generic import TemplateView -from readthedocs.builds.constants import LATEST -from readthedocs.projects.models import Feature, Project -from readthedocs.search.api.v2.serializers import ( - PageSearchSerializer, - ProjectSearchSerializer, -) -from readthedocs.search.faceted_search import ALL_FACETS, PageSearch, ProjectSearch +from readthedocs.projects.models import Project +from readthedocs.search.api.v2.serializers import ProjectSearchSerializer +from readthedocs.search.api.v3.executor import SearchExecutor +from readthedocs.search.api.v3.serializers import PageSearchSerializer +from readthedocs.search.api.v3.utils import should_use_advanced_query +from readthedocs.search.faceted_search import ProjectSearch log = structlog.get_logger(__name__) @@ -21,112 +23,36 @@ ( 'query', 'type', - 'project', - 'version', 'language', - 'role_name', ), ) -class SearchViewBase(View): - - http_method_names = ['get'] - max_search_results = 50 - - def _search(self, *, user_input, projects, use_advanced_query): - """Return search results and facets given a `user_input` and `projects` to filter by.""" - if not user_input.query: - return [], {} - - filters = {} - for avail_facet in ALL_FACETS: - value = getattr(user_input, avail_facet, None) - if value: - 
filters[avail_facet] = value - - search_facets = { - 'project': ProjectSearch, - 'file': PageSearch, - } - faceted_search_class = search_facets.get( - user_input.type, - ProjectSearch, - ) - search = faceted_search_class( - query=user_input.query, - filters=filters, - projects=projects, - use_advanced_query=use_advanced_query, - ) - results = search[:self.max_search_results].execute() - facets = results.facets - - # Make sure the selected facets are displayed, - # even when they return 0 results. - for facet in facets: - value = getattr(user_input, facet, None) - if value and value not in (name for name, *_ in facets[facet]): - facets[facet].insert(0, (value, 0, True)) - - return results, facets - - -class ProjectSearchView(SearchViewBase): +class ProjectSearchView(View): """ Search view of the ``search`` tab. + This redirects to the main search now. + Query params: - q: search term - - version: version to filter by - - role_name: sphinx role to filter by """ - def _get_project(self, project_slug): - queryset = Project.objects.public(self.request.user) - project = get_object_or_404(queryset, slug=project_slug) - return project + http_method_names = ["get"] def get(self, request, project_slug): - project_obj = self._get_project(project_slug) - use_advanced_query = not project_obj.has_feature( - Feature.DEFAULT_TO_FUZZY_SEARCH, + query = request.GET.get("q", "") + url = ( + reverse("search") + + "?" 
+ + urlencode({"q": f"project:{project_slug} {query}"}) ) + return HttpResponseRedirect(url) - user_input = UserInput( - query=request.GET.get('q'), - type='file', - project=project_slug, - version=request.GET.get('version', LATEST), - role_name=request.GET.get('role_name'), - language=None, - ) - results, facets = self._search( - user_input=user_input, - projects=[user_input.project], - use_advanced_query=use_advanced_query, - ) - - results = PageSearchSerializer(results, many=True).data - - template_context = user_input._asdict() - template_context.update({ - 'results': results, - 'facets': facets, - 'project_obj': project_obj, - }) - - return render( - request, - 'search/elastic_search.html', - template_context, - ) - - -class GlobalSearchView(SearchViewBase): +class GlobalSearchView(TemplateView): """ Global search enabled for logged out users and anyone using the dashboard. @@ -135,22 +61,63 @@ class GlobalSearchView(SearchViewBase): - q: search term - type: type of document to search (project or file) - - project: project to filter by - - language: project language to filter by - - version: version to filter by - - role_name: sphinx role to filter by + - language: project language to filter by (only valid if type is project) """ - def get(self, request): + http_method_names = ["get"] + max_search_results = 50 + available_facets = ["language"] + template_name = "search/elastic_search.html" + + def get_context_data(self, **kwargs): + context = super().get_context_data(**kwargs) user_input = UserInput( - query=request.GET.get('q'), - type=request.GET.get('type', 'project'), - project=request.GET.get('project'), - version=request.GET.get('version', LATEST), - language=request.GET.get('language'), - role_name=request.GET.get('role_name'), + query=self.request.GET.get("q"), + type=self.request.GET.get("type", "file"), + language=self.request.GET.get("language"), ) + if user_input.type == "file": + context.update(self._searh_files()) + else: + 
context.update(self._search_projects(user_input, self.request)) + return context + + def _searh_files(self): + results, facets = [], {} + search_query = "" + total_count = 0 + query = self.request.GET.get("q") + if query: + search_executor = SearchExecutor( + request=self.request, + query=query, + arguments_required=False, + default_all=not settings.ALLOW_PRIVATE_REPOS, + ) + search_query = search_executor.parser.query + use_advanced_query = should_use_advanced_query(search_executor.projects) + search = search_executor.search(use_advanced_query=use_advanced_query) + if search: + results = search[: self.max_search_results].execute() + facets = results.facets + total_count = results.hits.total["value"] + results = PageSearchSerializer( + results, + projects=search_executor.projects, + many=True, + ).data + + return { + "query": query, + "search_query": search_query, + "results": results, + "facets": facets, + "total_count": total_count, + "type": "file", + } + def _search_projects(self, user_input, request): + total_count = 0 projects = [] # If we allow private projects, # we only search on projects the user belongs or have access to. 
@@ -169,22 +136,45 @@ def get(self, request): projects=projects, use_advanced_query=True, ) + total_count = results.hits.total["value"] + results = ProjectSearchSerializer(results, many=True).data + context = user_input._asdict() + context.update( + { + "search_query": user_input.query, + "results": results, + "total_count": total_count, + "facets": facets, + } + ) + return context - serializers = { - 'project': ProjectSearchSerializer, - 'file': PageSearchSerializer, - } - serializer = serializers.get(user_input.type, ProjectSearchSerializer) - results = serializer(results, many=True).data - - template_context = user_input._asdict() - template_context.update({ - 'results': results, - 'facets': facets, - }) - - return render( - request, - 'search/elastic_search.html', - template_context, + def _search(self, *, user_input, projects, use_advanced_query): + """Return search results and facets given a `user_input` and `projects` to filter by.""" + if not user_input.query: + return [], {} + + filters = {} + for avail_facet in self.available_facets: + value = getattr(user_input, avail_facet, None) + if value: + filters[avail_facet] = value + + search = ProjectSearch( + query=user_input.query, + filters=filters, + projects=projects, + use_advanced_query=use_advanced_query, ) + # pep8 and black don't agree on having a space before :. + results = search[: self.max_search_results].execute() # noqa + facets = results.facets + + # Make sure the selected facets are displayed, + # even when they return 0 results. 
+ for facet in facets: + value = getattr(user_input, facet, None) + if value and value not in (name for name, *_ in facets[facet]): + facets[facet].insert(0, (value, 0, True)) + + return results, facets diff --git a/readthedocs/templates/core/project_bar_base.html b/readthedocs/templates/core/project_bar_base.html index 5e6252991e9..1b25a857a3f 100644 --- a/readthedocs/templates/core/project_bar_base.html +++ b/readthedocs/templates/core/project_bar_base.html @@ -41,7 +41,7 @@

  • {% trans "Downloads" %}
  • -
  • {% trans "Search" %}
  • +
  • {% trans "Search" %}
  • {% trans "Builds" %}
  • diff --git a/readthedocs/templates/search/elastic_search.html b/readthedocs/templates/search/elastic_search.html index 8773b353a3b..921e9f9eae4 100644 --- a/readthedocs/templates/search/elastic_search.html +++ b/readthedocs/templates/search/elastic_search.html @@ -36,30 +36,17 @@

    {% trans 'Object Type' %}
    {% if facets.project and not project_obj %}
    {% trans 'Projects' %}
    + +
  • + + {% trans 'Search all' %} + +
  • + {% for name, count, selected in facets.project %}
  • - {% if project == name %} - {{ name }} - {% else %} - {{ name }} - {% endif %} - ({{ count }}) - -
  • - {% endfor %} -
    - {% endif %} - - {% if facets.version %} -
    {% trans 'Version' %}
    - {% for name, count, selected in facets.version %} -
  • - {% if version == name %} - {{ name }} - {% else %} - {{ name }} - {% endif %} - ({{ count }}) + + {{ name }} ({{ count }})
  • {% endfor %} @@ -82,23 +69,6 @@
    {% trans 'Language' %}

    {% endif %} - - {% if facets.role_name %} -
    {% trans 'Code API Type' %}
    - {% for name, count, selected in facets.role_name %} -
  • - {% if role_name == name %} - {{ name }} - {% else %} - {{ name }} - {% endif %} - ({{ count }}) - -
  • - {% endfor %} -
    - {% endif %} - {% block sponsor %}
    Search is sponsored by Elastic, and hosted on Elastic Cloud. @@ -115,10 +85,7 @@

    {% trans 'Search' %}

    {% if type %} {% endif %} - {% if project %} {% endif %} - {% if version %} {% endif %} {% if language %} {% endif %} - {% if role_name %} {% endif %}
    {% comment %}Translators: This is about starting a search{% endcomment %} @@ -135,8 +102,8 @@

    {% trans 'Search' %}

    - {% blocktrans with count=results.hits.total query=query|default:"" %} - {{ count }} results for `{{ query }}` + {% blocktrans with count=total_count|default:"0" query=search_query|default:"" %} + {{ count }} results for {{ query }} {% endblocktrans %}

    @@ -167,14 +134,14 @@

    {% endfor %} {% elif result.type == 'page' %} - - {{ result.project }} - {% if result.highlights.title %} {{ result.highlights.title.0|safe }} {% else %} {{ result.title }} {% endif %} + + {{ result.project.slug }} - {% if result.highlights.title %} {{ result.highlights.title.0|safe }} {% else %} {{ result.title }} {% endif %} {% for block in result.blocks %} {% if block.type == 'domain' %}

    - + {% if block.highlights.name %} {% with domain_name=block.highlights.name %} [{{ block.role }}]: {{ domain_name.0|safe }} @@ -198,7 +165,7 @@

    {% elif block.type == 'section' %}

    - + {% if block.highlights.title %} {% with section_title=block.highlights.title %} {{ section_title.0|safe }} diff --git a/readthedocs/urls.py b/readthedocs/urls.py index 9fce157e7a8..d637418ca3f 100644 --- a/readthedocs/urls.py +++ b/readthedocs/urls.py @@ -91,6 +91,7 @@ re_path(r'^api/v2/', include('readthedocs.api.v2.urls')), # Keep `search_api` at root level, so the test does not fail for other API path("api/v2/search/", include("readthedocs.search.api.v2.urls")), + path("api/v3/search/", include("readthedocs.search.api.v3.urls")), # Deprecated re_path(r'^api/v1/embed/', include('readthedocs.embed.urls')), re_path(r'^api/v2/embed/', include('readthedocs.embed.urls')),
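The query syntax exercised in `test_queryparser.py` above (`project:`, `subprojects:`, `user:`, and the `\:` escape) can be illustrated with a small standalone sketch. This is not the `SearchQueryParser` implementation from this change, which is not shown in this part of the diff; the function name `parse_search_query` and its handling of edge cases (for example, treating an argument with an empty value as a plain word) are assumptions made for illustration only.

```python
def parse_search_query(text):
    """Split a search string into arguments and the remaining query.

    Hypothetical helper mirroring the behaviour asserted in the tests:
    ``project`` and ``subprojects`` accumulate values, ``user`` keeps
    the last value given, and ``\\:`` escapes a colon so the token is
    kept as part of the query text.
    """
    arguments = {"project": [], "subprojects": [], "user": ""}
    words = []
    for token in text.split():
        name, sep, value = token.partition(":")
        if sep and value and name in arguments:
            if isinstance(arguments[name], list):
                # List arguments can be repeated.
                arguments[name].append(value)
            else:
                # Single-value arguments: the last one wins.
                arguments[name] = value
        else:
            # Not an argument: keep the word, unescaping ``\:`` (assumed rule).
            words.append(token.replace("\\:", ":"))
    return arguments, " ".join(words)


arguments, query = parse_search_query(r"project\:foo project:bar query")
print(arguments["project"], repr(query))
```

Splitting on whitespace first keeps the sketch short; a real parser may need to preserve the original spacing or quoted phrases, which this version does not attempt.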