Commit fc7f68b

Design doc: new search API (#9533)
Ref: - #9376 - #8678
1 parent e600fd1 commit fc7f68b

1 file changed

+317
-0
lines changed

docs/dev/design/new-search-api.rst

@@ -0,0 +1,317 @@
1+
New search API
2+
==============
3+
4+
Goals
5+
-----
6+
7+
- Allow configuring search at the API level,
8+
instead of having the options in the database.
9+
- Allow searching a group of projects/versions at the same time.
10+
- Bring the same syntax to the dashboard search.
11+
12+
Syntax
13+
------
14+
15+
The parameters will be given in the query using the ``key:value`` syntax.
16+
Inspired by `GitHub <https://docs.github.com/en/rest/search>`__ and other services.
17+
18+
Currently, none of the parameter values include spaces,
19+
so surrounding the value with quotes won't be supported (``key:"value"``).
20+
21+
To avoid interpreting a query as a parameter,
22+
the ``:`` can be escaped with a backslash,
23+
for example ``project\:docs`` won't be interpreted as
24+
a parameter, but as the search term ``project:docs``.
25+
This is only necessary if the query includes a valid parameter;
26+
unknown parameters (``foo:bar``) don't require escaping.
27+
28+
All other tokens that don't match a valid parameter
will be joined to form the final search term.
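
The parsing rules above could be sketched roughly as follows. This is only an illustration, not the actual implementation: the ``parse_query`` function and the ``VALID_PARAMETERS`` set are hypothetical names, and treating a parameter with an empty value (``project:``) as a plain search term is an assumption the design doesn't cover.

```python
# Hypothetical names; a sketch of the ``key:value`` syntax, not the real parser.
VALID_PARAMETERS = {"project", "subprojects", "user"}


def parse_query(query):
    """Split a raw query into (parameters, search_term)."""
    parameters = {name: [] for name in VALID_PARAMETERS}
    terms = []
    for token in query.split():
        key, sep, value = token.partition(":")
        # Assumption: a parameter with an empty value is treated as plain text.
        if sep and value and key in VALID_PARAMETERS:
            parameters[key].append(value)
            continue
        # Drop the escape character from an escaped valid parameter,
        # so an escaped ``project:docs`` becomes a plain search term.
        for name in VALID_PARAMETERS:
            prefix = name + "\\:"
            if token.startswith(prefix):
                token = name + ":" + token[len(prefix):]
                break
        # Unknown parameters (``foo:bar``) end up here unchanged.
        terms.append(token)
    return parameters, " ".join(terms)
```

Calling ``parse_query("a project:docs/stable search term")`` under these assumptions yields ``docs/stable`` as the only ``project`` value and ``a search term`` as the search term.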
30+
31+
Parameters
32+
----------
33+
34+
project:
35+
Indicates the project and version
36+
to include results from (this doesn't include subprojects).
37+
If the version isn't provided,
38+
the default version is used.
39+
40+
Examples:
41+
42+
- ``project:docs/latest``
43+
- ``project:docs``
44+
45+
One or more ``project`` parameters can be given.
46+
At least one is required.
47+
48+
If the user doesn't have permission to access a version or if the version doesn't exist,
49+
we don't include results from that version.
50+
We don't fail the search; this way, users can use one endpoint for all their users,
51+
without worrying about what permissions each user has or updating it after a version or project
52+
has been deleted.
53+
54+
The ``/`` is used as the separator,
55+
but it could be any other character that isn't present in the slug of a version or project.
56+
``:`` was considered (``project:docs:latest``), but it could be hard to read
57+
since ``:`` is already used to separate the key from the value.
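
As an illustration of how a ``project`` value might be resolved, here is a minimal sketch. All names are hypothetical: ``default_versions`` stands in for a lookup of each project's default version, and ``can_access`` for whatever permission check is actually used.

```python
def split_project_value(value):
    """Return (project_slug, version_slug); the version is None when omitted."""
    project, _, version = value.partition("/")
    return project, version or None


def select_versions(values, default_versions, can_access):
    """Resolve ``project`` values to (project, version) pairs.

    ``default_versions`` maps a project slug to its default version slug and
    ``can_access`` is a callable ``(project, version) -> bool``; both are
    assumptions made for this sketch.
    """
    selected = []
    for value in values:
        project, version = split_project_value(value)
        version = version or default_versions.get(project)
        if version is None or not can_access(project, version):
            # Skipped silently: the search itself doesn't fail.
            continue
        selected.append((project, version))
    return selected
```

Versions the user can't access, and projects that don't exist, are dropped from the result instead of raising an error, which matches the behavior described above.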
58+
59+
subprojects:
60+
This allows specifying exactly which project
we are going to return subprojects from,
62+
and also include the version we are going to try to match.
63+
This includes the parent project in the results.
64+
65+
As with the ``project`` parameter, the version is optional,
66+
and defaults to the default version of the parent project.
67+
68+
user:
69+
Include results from projects the given user has access to.
70+
The only supported value is ``@me``,
71+
which is an alias for the current user.
72+
73+
Including subprojects
74+
~~~~~~~~~~~~~~~~~~~~~
75+
76+
Now that we are returning results only
77+
from the given projects, we need an easy way to
78+
include results from subprojects.
79+
Some ideas for implementing this feature are:
80+
81+
``include-subprojects:true``
82+
This doesn't make it clear which
projects we are going to include subprojects from.
84+
We could make it so it returns subprojects for all projects.
85+
Users will probably use this with one project only.
86+
87+
``subprojects:project/version`` (inclusive)
88+
This allows specifying exactly which project
we are going to return subprojects from,
90+
and also include the version we are going to try to match.
91+
This includes the parent project in the results.
92+
93+
As with the ``project`` parameter, the version is optional,
94+
and defaults to the default version of the parent project.
95+
96+
``subprojects:project/version`` (exclusive)
97+
This is the same as the above,
98+
but it doesn't include the parent project in the results.
99+
If we want to include the results from the project, then
100+
the query will be ``project:project/latest subprojects:project/latest``.
101+
Is this useful?
102+
103+
The second option was chosen, since that's the current behavior
104+
of our search when searching on a project with subprojects,
105+
and avoids having to repeat the project if the user wants to
106+
include it in the search too.
107+
108+
Cache
109+
-----
110+
111+
Since the request could be attached to more than one project,
we will return the full list of projects and versions as cache tags,
that is, ``project1, project1:version, project2, project2:version``.
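
A minimal sketch of how those cache tags could be built from the versions used in the final search. The ``cache_tags`` helper is a hypothetical name, and de-duplicating repeated tags is an assumption.

```python
def cache_tags(versions):
    """Build the list of cache tags from (project, version) pairs."""
    tags = []
    for project, version in versions:
        for tag in (project, f"{project}:{version}"):
            # Assumption: each tag is listed once, in first-seen order.
            if tag not in tags:
                tags.append(tag)
    return tags
```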
114+
115+
CORS
116+
----
117+
118+
Since the request could be attached to more than one project,
we can't easily decide from the middleware whether CORS should be enabled for a given request,
so we won't allow cross-site requests when using the new API for now.
We would need to refactor our CORS code
so that every view can decide whether CORS should be allowed.
In this case, cross-site requests would be allowed only if all versions in the final search are public.
An alternative could be to always allow cross-site requests,
but return results only from public versions when the request is cross-site.
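
The two alternatives discussed here could look roughly like this. It is only a sketch: the function names are hypothetical, each version is represented as a ``(project, version, is_public)`` tuple for illustration, and rejecting a search with no versions at all is an assumption.

```python
def allow_cross_site(versions):
    """First option: allow CORS only if every version searched is public."""
    versions = list(versions)
    # Assumption: an empty search is not allowed cross-site either.
    return bool(versions) and all(is_public for _, _, is_public in versions)


def versions_for_request(versions, is_cross_site):
    """Second option: always allow, but search only public versions
    when the request is cross-site."""
    if not is_cross_site:
        return list(versions)
    return [v for v in versions if v[2]]
```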
126+
127+
Analytics
128+
---------
129+
130+
We will record the same query for each project that was used in the final search.
131+
132+
Response
133+
--------
134+
135+
The response will be similar to the old one,
136+
but will include extra information about the search,
137+
like the projects, versions, and the query that were used in the final search.
138+
139+
The ``version``, ``project``, and ``project_alias`` attributes will
140+
now be objects.
141+
142+
We could just re-use the old response too,
143+
since the only breaking changes would be the attributes now being objects,
144+
and we aren't adding any new information to those objects (yet).
145+
But also, re-using the current serializers shouldn't be a problem either.
146+
147+
.. code-block:: json

   {
     "count": 1,
     "next": null,
     "previous": null,
     "projects": [
       {
         "slug": "docs",
         "versions": [
           {
             "slug": "latest"
           }
         ]
       }
     ],
     "query": "The final query used in the search",
     "results": [
       {
         "type": "page",
         "project": {
           "slug": "docs",
           "alias": null
         },
         "version": {
           "slug": "latest"
         },
         "title": "Main Features",
         "path": "/en/latest/features.html",
         "domain": "https://docs.readthedocs.io",
         "highlights": {
           "title": []
         },
         "blocks": [
           {
             "type": "section",
             "id": "full-text-search",
             "title": "Full-Text Search",
             "content": "We provide search across all the projects that we host. This actually comes in two different search experiences: dashboard search on the Read the Docs dashboard and in-doc search on documentation sites, using your own theme and our search results. We offer a number of search features: Search across subprojects Search results land on the exact content you were looking for Search across projects you have access to (available on Read the Docs for Business) A full range of search operators including exact matching and excluding phrases. Learn more about Server Side Search.",
             "highlights": {
               "title": [
                 "Full-<span>Text</span> Search"
               ],
               "content": []
             }
           },
           {
             "type": "domain",
             "role": "http:post",
             "name": "/api/v3/projects/",
             "id": "post--api-v3-projects-",
             "content": "Import a project under authenticated user. Example request: BashPython$ curl \\ -X POST \\ -H \"Authorization: Token <token>\" https://readthedocs.org/api/v3/projects/ \\ -H \"Content-Type: application/json\" \\ -d @body.json import requests import json URL = 'https://readthedocs.org/api/v3/projects/' TOKEN = '<token>' HEADERS = {'Authorization': f'token {TOKEN}'} data = json.load(open('body.json', 'rb')) response = requests.post( URL, json=data, headers=HEADERS, ) print(response.json()) The content of body.json is like, { \"name\": \"Test Project\", \"repository\": { \"url\": \"https://github.com/readthedocs/template\", \"type\": \"git\" }, \"homepage\": \"http://template.readthedocs.io/\", \"programming_language\": \"py\", \"language\": \"es\" } Example response: See Project details Note Read the Docs for Business, also accepts",
             "highlights": {
               "name": [],
               "content": [
                 ", json=data, headers=HEADERS, ) print(response.json()) The content of body.json is like, &quot;name&quot;: &quot;<span>Test</span>"
               ]
             }
           }
         ]
       }
     ]
   }
210+
211+
Examples
212+
--------
213+
214+
- ``project:docs project:dev/latest test``: search for ``test`` in the default
215+
version of the ``docs`` project, and in the latest version of the ``dev`` project.
216+
- ``a project:docs/stable search term``: search for ``a search term`` in the
217+
stable version of the ``docs`` project.
218+
219+
- ``project:docs project\:project/version``: search for ``project:project/version`` in the
220+
default version of the ``docs`` project.
221+
222+
- ``search``: invalid, at least one project is required.
223+
224+
Dashboard search
225+
----------------
226+
227+
This is the search feature that you can access from
228+
the readthedocs.org/readthedocs.com domains.
229+
230+
We have two types:
231+
232+
Project scoped search:
233+
Search files and versions of the current project only.
234+
235+
Global search:
236+
Search files and versions of all projects in .org,
237+
and only the projects the user has access to in .com.
238+
239+
Global search also allows searching projects by name/description.
240+
241+
This search also allows you to see the number of results
242+
from other projects/versions/sphinx domains (facets).
243+
244+
Project scoped search
245+
~~~~~~~~~~~~~~~~~~~~~
246+
247+
Here the new syntax won't have any effect,
248+
since we are searching for the files of one project only!
249+
250+
Another approach could be linking to the global search
251+
with ``project:{project.slug}`` filled in the query.
252+
253+
Global search (projects)
254+
~~~~~~~~~~~~~~~~~~~~~~~~
255+
256+
We can keep the project search as is,
257+
without using the new syntax (since it doesn't make sense there).
258+
259+
Global search (files)
260+
~~~~~~~~~~~~~~~~~~~~~
261+
262+
The same syntax as the API will be allowed;
263+
by default it will search all projects in .org,
264+
and all projects the user has access to in .com.
265+
266+
Another approach could be to allow
267+
filtering by user on .org, that is, ``user:stsewd`` or ``user:@me``,
268+
so a user can search all their projects easily.
269+
We could allow just ``@me`` to start.
270+
271+
Facets
272+
~~~~~~
273+
274+
We will support only the ``project`` facet to start.
275+
276+
We can keep the facets, but they would be a little different,
277+
since with the new syntax we need to specify a project in order to search for
278+
a version, i.e., we can't search all ``latest`` versions of all projects.
279+
280+
By default we will use/show the ``project`` facet,
281+
and after the user has filtered by a project,
282+
we will use/show the ``version`` facet.
283+
284+
If the user searches more than one project,
285+
things get complicated: should we keep showing the ``version`` facet?
286+
If clicked, should we change the version on all the projects?
287+
288+
If that is too complicated to explain/implement,
289+
we should be fine just supporting the ``project``
290+
facet for now.
291+
292+
Backwards compatibility
293+
~~~~~~~~~~~~~~~~~~~~~~~
294+
295+
We should be able to keep the old URLs working in the global search,
296+
but we could also just ignore the old syntax, or transform
297+
the old syntax to the new one and redirect the user to it,
298+
for example ``?q=test&project=docs&version=latest``
299+
would be transformed to ``?q=test project:docs/latest``.
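
The transformation from the old query parameters to the new syntax could be sketched like this. The ``to_new_syntax`` helper is a hypothetical name, and it assumes the old URLs only use the ``q``, ``project``, and ``version`` parameters shown in the example above.

```python
from urllib.parse import parse_qs


def to_new_syntax(query_string):
    """Turn an old-style query string into a query using the new syntax."""
    params = parse_qs(query_string.lstrip("?"))
    query = params.get("q", [""])[0]
    project = params.get("project", [""])[0]
    version = params.get("version", [""])[0]
    if project:
        # The version is optional, as with the ``project`` parameter.
        value = f"{project}/{version}" if version else project
        query = f"{query} project:{value}".strip()
    return query
```

With the example above, ``to_new_syntax("?q=test&project=docs&version=latest")`` produces ``test project:docs/latest``, which would then be used as the new ``q`` value in the redirect.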
300+
301+
Future features
302+
---------------
303+
304+
- Allow searching on several versions of the same project
305+
(the API response is prepared to support this).
306+
- Allow searching on all versions of a project easily,
307+
with a syntax like ``project:docs/*`` or ``project:docs/@all``.
308+
- Allow specifying the type of search:
309+
310+
- Multi match (query as is)
311+
- Simple query string (allows using the ES query syntax)
312+
- Fuzzy search (same as multi match, but with fuzziness)
313+
314+
- Add the ``org`` filter,
315+
so users can search across all projects that belong
316+
to an organization.
317+
We would show results of the default versions of each project.
