From e56e39c741253449c08adf1c280f4cbba16bf9e0 Mon Sep 17 00:00:00 2001 From: Santos Gallegos Date: Mon, 22 Aug 2022 16:16:54 -0500 Subject: [PATCH 1/5] Design doc: new search API Ref: - https://github.com/readthedocs/readthedocs.org/pull/9376 - https://github.com/readthedocs/readthedocs.org/issues/8678 --- docs/dev/design/new-search-api.rst | 238 +++++++++++++++++++++++++++++ 1 file changed, 238 insertions(+) create mode 100644 docs/dev/design/new-search-api.rst diff --git a/docs/dev/design/new-search-api.rst b/docs/dev/design/new-search-api.rst new file mode 100644 index 00000000000..ad5db6eb952 --- /dev/null +++ b/docs/dev/design/new-search-api.rst @@ -0,0 +1,238 @@ +New search API +============== + +Goals +----- + +- Allow to configure search at the API level, + instead of having the options in the database. +- Allow to search a group of projects/versions at the same time. + +Syntax +------ + +The parameters will be given in the query using the ``key:value`` syntax. +Inspired by `GitHub `__ and other services. + +Currently the values from all parameters don't include spaces, +so surrounding the value with quotes won't be supported (``key:"value"``). + +To avoid interpreting a query as a parameter, +an escape character can be put in place, +for example ``project\:docs`` won't be interpreted as +a parameter, but as the search term ``project:docs``. +This is only necessary if the query includes a valid parameter, +unknown parameters (``foo:bar``) don't require escaping. + +All other tokens that don't match a valid parameter, +will be joined to form the final search term. + +Parameters +---------- + +project: + Indicates the project and version + to includes results from (this doesn't include subprojects). + If the version isn't provided, + the default version is used. + + Examples: + + - ``project:docs/latest`` + - ``project:docs`` + + It can be one or more project parameters. + At least one is required. 
+ + If the user doesn't have permission over one version or if the version doesn't exist, + we don't include results from that version. + We don't fail the search, this is so users can use one endpoint for all their users, + without worrying about what permissions each user has or updating it after a version or project + has been deleted. + + The ``/`` is used as separator, + but it could be any other character that isn't present in the slug of a version or project. + ``:`` was considered (``project:docs:latest``), but it could be hard to read + since ``:`` is already used to separate the key from the value. + +Including subprojects +````````````````````` +Now that we are returning results only +from the given projects, we need an easy way to +include results from subprojects. +Some ideas for implementing this feature are: + +``include-subprojects:true`` + This doesn't make it clear from what + projects we are going to include subprojects from. + We could make it so it returns subprojects for all projects. + Users will probably use this with one project only. + +``subprojects:project/version`` (inclusive) + This allows to specify from what project exactly + we are going to return subprojects from, + and also include the version we are going to try to match. + This includes the parent project in the results. + + As the ``project`` parameter, the version can be optional, + and defaults to the default version of the parent project. + +``subprojects:project/version`` (exclusive) + This is the same as the above, + but it doesn't include the parent project in the results. + If we want to include the results from the project, then + the query will be ``project:project/latest subprojects:project/latest``. + Is this useful? + +Cache +----- + +Since the request could be attached to more than one project. +We will return all the list of projects for the cache tags, +this is ``project1, project1:version, project2, project2:version``. 
+ +CORS +---- + +Since the request could be attached to more than one project. +we can't make the decision if we should enable CORS or not on a given request from the middleware easily, +so we won't allow cross site requests when using the new API for now +(we need to refactor our CORS code, so every view can decide if CORS should be allowed or not). + +Analytics +--------- + +We will record the same query for each project that was used in the final search. + +Response +-------- + +The response will be similar to the old one, +but will include extra information about the search, +like the projects, versions, and the query that were used in the final search. + +And the ``version``, ``project``, and ``project_alias`` attributes will +now be objects. + +We could just re-use the old response too, +since the only breaking changes would be the attributes now being objects, +and we aren't adding any new information to those objects (yet). +But also, re-using the current serializers shouldn't be a problem either. + +.. code-block:: json + + { + "count": 1, + "next": null, + "previous": null, + "projects": [ + { + "slug": "docs", + "versions": [ + { + "slug": "latest" + } + ] + } + ], + "query": "The final query used in the search", + "results": [ + { + "type": "page", + "project": { + "slug": "docs", + "alias": null + }, + "version": { + "slug": "latest" + }, + "title": "Main Features", + "path": "/en/latest/features.html", + "domain": "https://docs.readthedocs.io", + "highlights": { + "title": [] + }, + "blocks": [ + { + "type": "section", + "id": "full-text-search", + "title": "Full-Text Search", + "content": "We provide search across all the projects that we host. This actually comes in two different search experiences: dashboard search on the Read the Docs dashboard and in-doc search on documentation sites, using your own theme and our search results. 
We offer a number of search features: Search across subprojects Search results land on the exact content you were looking for Search across projects you have access to (available on Read the Docs for Business) A full range of search operators including exact matching and excluding phrases. Learn more about Server Side Search.", + "highlights": { + "title": [ + "Full-Text Search" + ], + "content": [] + } + }, + { + "type": "domain", + "role": "http:post", + "name": "/api/v3/projects/", + "id": "post--api-v3-projects-", + "content": "Import a project under authenticated user. Example request: BashPython$ curl \\ -X POST \\ -H \"Authorization: Token \" https://readthedocs.org/api/v3/projects/ \\ -H \"Content-Type: application/json\" \\ -d @body.json import requests import json URL = 'https://readthedocs.org/api/v3/projects/' TOKEN = '' HEADERS = {'Authorization': f'token {TOKEN}'} data = json.load(open('body.json', 'rb')) response = requests.post( URL, json=data, headers=HEADERS, ) print(response.json()) The content of body.json is like, { \"name\": \"Test Project\", \"repository\": { \"url\": \"https://github.com/readthedocs/template\", \"type\": \"git\" }, \"homepage\": \"http://template.readthedocs.io/\", \"programming_language\": \"py\", \"language\": \"es\" } Example response: See Project details Note Read the Docs for Business, also accepts", + "highlights": { + "name": [], + "content": [ + ", json=data, headers=HEADERS, ) print(response.json()) The content of body.json is like, \"name\": \"Test\"" + ] + } + } + ] + } + ] + } + +Examples +-------- + +- ``project:docs project:dev/latest test``: search for ``test`` in the default + version of the ``docs`` project, and in the latest version of the ``dev`` project. +- ``a project:docs/stable search term``: search for ``a search term`` in the + stable version of the ``docs`` project. + +- ``project:docs project\:project/version``: search for ``project:project/version`` in the + default version of the ``docs`` project. 
+ +- ``search``: invalid, at least one project is required. + + +Future features +--------------- + +- Allow searching on several versions of the same project + (the API response is prepared to support this). +- Allow specify the type of search: + + - Multi match (query as is) + - Simple query string (allows using the ES query syntax) + - Fuzzy search (same as multi match, but with with fuzziness) + +Questions / pending decisions +----------------------------- + +Integration with the dashboard search. +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The search API and the search from the dashboard +use the same backend, but they are used in a different way. + +The dashboard search by default searches on all projects (.org), +or all project the current user has access to (.com). +And the API search searches on explicitly given projects. + +The dashboard search allows filtering by version and role, +the API search allows filtering only by version (and it's required). + +The dashboard search makes use of filters in order to return +the number of results from other versions/roles. +Is this feature useful? It could slow down the response. +Searching several versions at the same time could be a better replace? + +The dashboard search can be used to search for projects +by their name and description. +The API search doesn't support this. +Is this feature useful? Should we implement +this as a way to filter projects from https://readthedocs.org/dashboard/ instead? +This will be using just https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/search/ +or ``contains``. 
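Reviewer note on the Syntax and Examples sections of this patch: the tokenizing rules can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation from the linked PR. The names ``VALID_PARAMETERS``, ``ParsedQuery`` and ``parse_query`` are hypothetical, and the sketch assumes tokens are separated by whitespace (the design says values never contain spaces). It does not enforce the "at least one project is required" rule, which would be left to the caller.

```python
import re
from dataclasses import dataclass, field

# Hypothetical set of recognized parameters; this patch only defines ``project``.
VALID_PARAMETERS = {"project"}

# A token is a candidate parameter when it looks like ``key:value``.
# An escaped colon (``key\:value``) does not match, because the character
# that follows the key is a backslash rather than a colon.
PARAM_RE = re.compile(r"^(?P<key>[a-z-]+):(?P<value>\S+)$")


@dataclass
class ParsedQuery:
    # (project_slug, version_slug) pairs; a version of None means
    # "use the default version of the project".
    projects: list = field(default_factory=list)
    # All tokens that are not valid parameters, joined with spaces.
    term: str = ""


def parse_query(query: str) -> ParsedQuery:
    parsed = ParsedQuery()
    terms = []
    for token in query.split():
        match = PARAM_RE.match(token)
        if match and match.group("key") in VALID_PARAMETERS:
            # ``/`` separates the project slug from the optional version slug.
            project, _, version = match.group("value").partition("/")
            parsed.projects.append((project, version or None))
        else:
            # Unknown parameters (``foo:bar``) are kept as plain text, and
            # escaped colons are unescaped so they are searched literally.
            terms.append(token.replace("\\:", ":"))
    parsed.term = " ".join(terms)
    return parsed
```

With this sketch, ``parse_query("a project:docs/stable search term")`` yields the project ``docs`` at version ``stable`` and the term ``a search term``, which matches the second example in the document.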
From 05f883430cdb0acaf44e0ccff97a988e86b053ca Mon Sep 17 00:00:00 2001 From: Santos Gallegos Date: Thu, 25 Aug 2022 17:57:37 -0500 Subject: [PATCH 2/5] Mention possible solutions for CORS --- docs/dev/design/new-search-api.rst | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/docs/dev/design/new-search-api.rst b/docs/dev/design/new-search-api.rst index ad5db6eb952..70dc5edbcb7 100644 --- a/docs/dev/design/new-search-api.rst +++ b/docs/dev/design/new-search-api.rst @@ -96,8 +96,12 @@ CORS Since the request could be attached to more than one project. we can't make the decision if we should enable CORS or not on a given request from the middleware easily, -so we won't allow cross site requests when using the new API for now -(we need to refactor our CORS code, so every view can decide if CORS should be allowed or not). +so we won't allow cross site requests when using the new API for now. +We would need to refactor our CORS code, +so every view can decide if CORS should be allowed or not, +for this case, cross site requests will be allowed only if all versions of the final search are public, +another alternative could be to always allow cross site requests, +but when a request is cross site, we only return results from public versions. Analytics --------- From d5221644fcb3f81cf867cca1c7c998a77dc97134 Mon Sep 17 00:00:00 2001 From: Santos Gallegos Date: Wed, 31 Aug 2022 18:57:26 -0500 Subject: [PATCH 3/5] Updates --- docs/dev/design/new-search-api.rst | 113 +++++++++++++++++++++-------- 1 file changed, 84 insertions(+), 29 deletions(-) diff --git a/docs/dev/design/new-search-api.rst b/docs/dev/design/new-search-api.rst index 70dc5edbcb7..1b441f54f46 100644 --- a/docs/dev/design/new-search-api.rst +++ b/docs/dev/design/new-search-api.rst @@ -7,6 +7,7 @@ Goals - Allow to configure search at the API level, instead of having the options in the database. - Allow to search a group of projects/versions at the same time. 
+- Bring the same syntax to the dashboard search. Syntax ------ @@ -56,7 +57,8 @@ project: since ``:`` is already used to separate the key from the value. Including subprojects -````````````````````` +~~~~~~~~~~~~~~~~~~~~~ + Now that we are returning results only from the given projects, we need an easy way to include results from subprojects. @@ -200,43 +202,96 @@ Examples - ``search``: invalid, at least one project is required. +Dashboard search +---------------- + +This is the search feature that you can access from +the readthedocs.org/readthedocs.com domains. + +We have two types: + +Project scoped search: + Search files and versions of the current project only. + +Global search: + Search files and versions of all projects in .org, + and only the projects the user has access to in .com. + + Global search also allows searching projects by name/description. + +This search also allows you to see the number of results +from other projects/versions/sphinx domains (facets). + +Project scoped search +~~~~~~~~~~~~~~~~~~~~~ + +Here the new syntax won't have any effect, +since we are searching for the files of one project only! + +Another approach could be linking to the global search +with ``project:{project.slug}`` filled in the query. + +Global search (projects) +~~~~~~~~~~~~~~~~~~~~~~~~ + +We can keep the project search as is, +without using the new syntax (since it doesn't make sense there). + +Global search (files) +~~~~~~~~~~~~~~~~~~~~~ + +Using the same syntax from the API will be allowed, +by default it will search all projects in .org, +and all projects the user has access to in .com. + +Another approach could be to allow +filtering by user on .org, this is ``user:stsewd`` or ``user:@me`` +so a user can search all their projects easily. +We could allow just ``@me`` to start. 
+ +Facets +~~~~~~ + +We can keep the facets, but they would be a little different, +since with the new syntax we need to specify a project in order to search for +a version, i.e, we can't search all ``latest`` versions of all projects. + +By default we will use/show the ``project`` facet, +and after the user has filtered by a project, +we will use/show the ``version`` facet. + +If the user searches more than one project, +things get complicated, should we keep showing the ``version`` facet? +If clicked, should we change the version on all the projects? + +If that is too complicated to explain/implement, +we should be fine by just supporting the ``project`` +facet for now. + +Backwards compatibility +~~~~~~~~~~~~~~~~~~~~~~~ + +We should be able to keep the old URLs working in the global search, +but we could also just ignore the old syntax, or transform +the old syntax to the new one and redirect the user to it, +for example ``?q=test&project=docs&version=latest`` +would be transformed to ``?q=test project:docs/latest``. Future features --------------- - Allow searching on several versions of the same project (the API response is prepared to support this). +- Allow searching on all versions of a project easily, + with a syntax like ``project:docs/*`` or ``project:docs/@all``. - Allow specify the type of search: - Multi match (query as is) - Simple query string (allows using the ES query syntax) - Fuzzy search (same as multi match, but with with fuzziness) -Questions / pending decisions ------------------------------ - -Integration with the dashboard search. -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -The search API and the search from the dashboard -use the same backend, but they are used in a different way. - -The dashboard search by default searches on all projects (.org), -or all project the current user has access to (.com). -And the API search searches on explicitly given projects. 
- -The dashboard search allows filtering by version and role, -the API search allows filtering only by version (and it's required). - -The dashboard search makes use of filters in order to return -the number of results from other versions/roles. -Is this feature useful? It could slow down the response. -Searching several versions at the same time could be a better replace? - -The dashboard search can be used to search for projects -by their name and description. -The API search doesn't support this. -Is this feature useful? Should we implement -this as a way to filter projects from https://readthedocs.org/dashboard/ instead? -This will be using just https://docs.djangoproject.com/en/4.0/ref/contrib/postgres/search/ -or ``contains``. +- Add the ``organization`` filter, + so users can search by all projects that belong + to an organization. + Would we show results of all versions + or just the default version? From d31be80ada60d342022635ffa1bde2ec91243d97 Mon Sep 17 00:00:00 2001 From: Santos Gallegos Date: Thu, 1 Sep 2022 12:58:14 -0500 Subject: [PATCH 4/5] Updates --- docs/dev/design/new-search-api.rst | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/dev/design/new-search-api.rst b/docs/dev/design/new-search-api.rst index 1b441f54f46..c375cac6505 100644 --- a/docs/dev/design/new-search-api.rst +++ b/docs/dev/design/new-search-api.rst @@ -252,6 +252,8 @@ We could allow just ``@me`` to start. Facets ~~~~~~ +We will support only the ``projects`` facet to start. + We can keep the facets, but they would be a little different, since with the new syntax we need to specify a project in order to search for a version, i.e, we can't search all ``latest`` versions of all projects. 
@@ -290,8 +292,7 @@ Future features - Simple query string (allows using the ES query syntax) - Fuzzy search (same as multi match, but with with fuzziness) -- Add the ``organization`` filter, +- Add the ``org`` filter, so users can search by all projects that belong to an organization. - Would we show results of all versions - or just the default version? + We would show results of the default version of each project. From 08517656370ae2932c8994d2bf9541841f9f7198 Mon Sep 17 00:00:00 2001 From: Santos Gallegos Date: Mon, 14 Nov 2022 10:18:59 -0500 Subject: [PATCH 5/5] Update doc --- docs/dev/design/new-search-api.rst | 55 ++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 18 deletions(-) diff --git a/docs/dev/design/new-search-api.rst b/docs/dev/design/new-search-api.rst index c375cac6505..4d829679185 100644 --- a/docs/dev/design/new-search-api.rst +++ b/docs/dev/design/new-search-api.rst @@ -32,29 +32,43 @@ Parameters ---------- project: - Indicates the project and version - to includes results from (this doesn't include subprojects). - If the version isn't provided, - the default version is used. + Indicates the project and version + to include results from (this doesn't include subprojects). + If the version isn't provided, + the default version is used. - Examples: + Examples: - - ``project:docs/latest`` - - ``project:docs`` + - ``project:docs/latest`` + - ``project:docs`` - It can be one or more project parameters. - At least one is required. + There can be one or more project parameters. + At least one is required. - If the user doesn't have permission over one version or if the version doesn't exist, - we don't include results from that version. - We don't fail the search, this is so users can use one endpoint for all their users, - without worrying about what permissions each user has or updating it after a version or project - has been deleted. 
+ If the user doesn't have permission over one version or if the version doesn't exist, + we don't include results from that version. + We don't fail the search, this is so users can use one endpoint for all their users, + without worrying about what permissions each user has or updating it after a version or project + has been deleted. - The ``/`` is used as separator, - but it could be any other character that isn't present in the slug of a version or project. - ``:`` was considered (``project:docs:latest``), but it could be hard to read - since ``:`` is already used to separate the key from the value. + The ``/`` is used as separator, + but it could be any other character that isn't present in the slug of a version or project. + ``:`` was considered (``project:docs:latest``), but it could be hard to read + since ``:`` is already used to separate the key from the value. + +subprojects: + Specifies exactly which project + we are going to return subprojects from, + and the version we are going to try to match. + This includes the parent project in the results. + + As with the ``project`` parameter, the version is optional, + and defaults to the default version of the parent project. + +user: + Include results from projects the given user has access to. + The only supported value is ``@me``, + which is an alias for the current user. Including subprojects ~~~~~~~~~~~~~~~~~~~~~ @@ -86,6 +100,11 @@ Some ideas for implementing this feature are: the query will be ``project:project/latest subprojects:project/latest``. Is this useful? +The second option was chosen, since that's the current behavior +of our search when searching on a project with subprojects, +and it avoids having to repeat the project if the user wants to +include it in the search too. + Cache -----
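Reviewer note on the ``project``/``subprojects`` parameters and the Cache section: the way parsed parameters could expand into the final list of project/version pairs, and then into cache tags of the form ``project1, project1:version``, can be sketched as follows. Everything here is hypothetical: ``VERSIONS`` and ``SUBPROJECTS`` are in-memory stand-ins for database lookups, and ``resolve_targets`` and ``cache_tags`` are illustrative names. The sketch also assumes two behaviors the document does not spell out: the first listed version is the default one, and a subproject that lacks the requested version falls back to its own default version ("try to match"). Permission checks are left out.

```python
# Hypothetical data: versions per project (first entry is treated as the
# default version) and the subprojects of each parent project.
VERSIONS = {
    "docs": ["stable", "latest"],
    "dev": ["latest"],
    "plugin": ["latest"],
}
SUBPROJECTS = {"docs": ["plugin"]}


def _pick_version(project, version, fallback_to_default=False):
    """Return the version to search, or None if nothing should be searched."""
    versions = VERSIONS.get(project)
    if not versions:
        return None  # Unknown project: skipped, the search doesn't fail.
    if version is None:
        return versions[0]
    if version in versions:
        return version
    # Assumption: only subprojects fall back to their default version.
    return versions[0] if fallback_to_default else None


def resolve_targets(projects=(), subprojects=()):
    """Expand parsed parameters into ordered, unique (project, version) pairs."""
    targets = []

    def add(project, version, fallback_to_default=False):
        picked = _pick_version(project, version, fallback_to_default)
        if picked is not None and (project, picked) not in targets:
            targets.append((project, picked))
        return picked

    for project, version in projects:
        add(project, version)
    for parent, version in subprojects:
        # ``subprojects:`` is inclusive: the parent project is searched too.
        parent_version = add(parent, version)
        if parent_version is None:
            continue
        for child in SUBPROJECTS.get(parent, []):
            add(child, parent_version, fallback_to_default=True)
    return targets


def cache_tags(targets):
    """Build ``project`` and ``project:version`` tags, without duplicates."""
    tags = []
    for project, version in targets:
        tags.extend([project, f"{project}:{version}"])
    return list(dict.fromkeys(tags))
```

For example, a query with ``project:dev subprojects:docs/stable`` would resolve to ``dev/latest``, ``docs/stable`` and ``plugin/latest`` with the sample data above, and each of those pairs contributes two cache tags.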