Search: Only use generic parsers #10272

benjaoming · 2023-04-26T16:48:09Z

Since #10128 removed specific Sphinx domain logic in the SphinxParser, we aren't doing much more search indexing that's specific for Sphinx or MkDocs.

Make SphinxParser redundant (move anything essential into the GenericParser)
Make MkDocsParser redundant (move anything essential into the GenericParser)
Remove entire readthedocs.sphinx_domains application?
Backup and delete database table (ops)

CC: @stsewd does this list look good? Perhaps we can break into separate issues/PRs if any change becomes too complex.

The text was updated successfully, but these errors were encountered:

stsewd · 2023-04-26T16:57:28Z

Looks good. Not sure if we need to back up the tables, since we aren't using them anymore, but not strong opinion about not doing it.

We should also check if we can remove the generation of the readthedocs-sphinx-domain-names.json file from our sphinx extension and the fjson files too.

readthedocs.org/readthedocs/projects/tasks/search.py

Line 145 in 40d21dc

    
           type_file = build_media_storage.join(json_storage_path, 'readthedocs-sphinx-domain-names.json')

https://github.com/readthedocs/readthedocs-sphinx-ext/blob/a46f5ca05c7c8ec7ec30b44cc724ac334ce91a38/readthedocs_ext/readthedocs.py#L198

https://github.com/readthedocs/readthedocs-sphinx-ext/blob/f1145b1819458643f442b21083b3397aeebc8e72/readthedocs_ext/readthedocs.py#L280-L280

humitos · 2023-04-27T17:16:39Z

We should also check if we can remove the generation of the readthedocs-sphinx-domain-names.json file from our sphinx extension and the fjson files too.

💯 , this will speed up the build time a little bit. I'm 👍🏼 on removing it if possible.

We are indexing sphinx domains as sections now, we no longer need to create sphinx domains or index them. We should see some small improvement in build time, and maybe in search? This change will require a re-index, since old projects that haven't been re-build still have sphinx domain objects indexed separately from sections (this step can be done after or before the deploy). This also fixes a small bug, we are still indexing sphinx domains titles, without their content ![Screenshot 2023-06-20 at 14-36-48 Read the Docs documentation simplified](https://github.com/readthedocs/readthedocs.org/assets/4975310/eeb1865b-a859-4010-9cff-95a95d877882) ref #10272

humitos · 2023-06-22T11:14:43Z

It would be awesome to return to this work and finish this project (cc @ericholscher). The work on this issue will allow us to use always the same parser without matter the doctool the user is using (which will give them the exact same search experience). We are pretty close now! I arrived here while doing the cleanup of the feature flags at #9779

Remember to remove the INDEX_FROM_HTML_FILES as well when doing this work. It won't be required anymore, since everybody will be indexing from HTML files.

ericholscher · 2023-06-22T17:46:32Z

Agreed, this is a useful next step on this work. I thought we'd already finished making everything use the HTML-only parsing.

humitos · 2023-06-26T10:57:19Z

While reviewing #10042 I found the work from this issue is, in part, required to removed the Version.documentation_type attribute.

ericholscher · 2023-08-02T16:14:55Z

We're slowly rolling this out via feature flag 👍

We were using these for search, but we no longer need them. We index the HTML directly now. Ref readthedocs/readthedocs.org#10272

Closes #10272. We can't remove the code that generates the fjson files, since they are being used by the embed API (v2), that API is mentioned as deprecated, we should do the actual deprecation by contacting projects using it.

Closes #10272. We can't remove the code that generates the fjson files, since they are being used by the embed API (v2), that API is mentioned as deprecated, we should do the actual deprecation by contacting projects using it. --------- Co-authored-by: Manuel Kaufmann <[email protected]>

benjaoming added Improvement Minor improvement to code Operations Operations or server issue Accepted Accepted issue on our roadmap labels Apr 26, 2023

benjaoming added this to 📍Roadmap Apr 26, 2023

github-project-automation bot moved this to Planned in 📍Roadmap Apr 26, 2023

humitos mentioned this issue Jun 12, 2023

Deprecate old feature flags #9779

Closed

stsewd mentioned this issue Jun 20, 2023

Search: stop creating SphinxDomain objects #10451

Merged

humitos mentioned this issue Jun 26, 2023

Refactor: remove Version.documentation_type #10042

Draft

humitos assigned stsewd Jul 4, 2023

stsewd mentioned this issue Jul 6, 2023

Search: use generic parser for MkDocs projects #10516

Merged

stsewd moved this from Planned to In progress in 📍Roadmap Jul 25, 2023

stsewd added a commit to readthedocs/readthedocs-sphinx-ext that referenced this issue Aug 23, 2023

Stop generating JSON data

a26c424

We were using these for search, but we no longer need them. We index the HTML directly now. Ref readthedocs/readthedocs.org#10272

stsewd mentioned this issue Aug 24, 2023

Stop generating JSON data readthedocs/readthedocs-sphinx-ext#126

Closed

stsewd mentioned this issue Aug 28, 2023

Search: remove non-generic parser code #10676

Merged

stsewd moved this from In progress to Needs review in 📍Roadmap Aug 30, 2023

stsewd closed this as completed in #10676 Aug 31, 2023

github-project-automation bot moved this from Needs review to Done in 📍Roadmap Aug 31, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Search: Only use generic parsers #10272

Search: Only use generic parsers #10272

benjaoming commented Apr 26, 2023 •

edited by stsewd

Loading

stsewd commented Apr 26, 2023

humitos commented Apr 27, 2023

humitos commented Jun 22, 2023

ericholscher commented Jun 22, 2023

humitos commented Jun 26, 2023

ericholscher commented Aug 2, 2023

Search: Only use generic parsers #10272

Search: Only use generic parsers #10272

Comments

benjaoming commented Apr 26, 2023 • edited by stsewd Loading

stsewd commented Apr 26, 2023

humitos commented Apr 27, 2023

humitos commented Jun 22, 2023

ericholscher commented Jun 22, 2023

humitos commented Jun 26, 2023

ericholscher commented Aug 2, 2023

benjaoming commented Apr 26, 2023 •

edited by stsewd

Loading