Skip to content

Search: index definition lists as sections #9571

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stsewd opened this issue Aug 31, 2022 · 5 comments · Fixed by #10128
Closed

Search: index definition lists as sections #9571

stsewd opened this issue Aug 31, 2022 · 5 comments · Fixed by #10128
Assignees
Labels
Improvement Minor improvement to code Needed: design decision A core team decision is required

Comments

@stsewd
Copy link
Member

stsewd commented Aug 31, 2022

Currently, we index h tags as sections for all doc types, but for sphinx we additionally index domains in SphinxDomain objects (aka roles).

Domains/roles are just like sections, they have a title, content, and they allow you to link to them with an anchor.
And sphinx/docutils renders them as a definition list.

So, I think it makes sense to index definition lists as sections for all doctools (or at least the definitions that can be linked),
for sphinx projects we will keep indexing domains/roles on their own, so they will keep the current behavior (but kind of like the idea of not having domains/roles at all, since they are specific to sphinx).

@stsewd stsewd added Improvement Minor improvement to code Needed: design decision A core team decision is required labels Aug 31, 2022
@github-project-automation github-project-automation bot moved this to Planned in 📍Roadmap Feb 15, 2023
@ericholscher
Copy link
Member

Adding this to the roadmap, the big goal here is removing Fjson parsing, and standardizing Sphinx search indexing

@humitos
Copy link
Member

humitos commented Feb 20, 2023

The code that decides which parser to use is related to the Project.documentation_type (which is something we want to deprecate also 👍🏼 -- see #10042) and it's at

def get_processed_json(self):
if (
self.version.documentation_type == constants.GENERIC
or self.project.has_feature(Feature.INDEX_FROM_HTML_FILES)
):
parser_class = GenericParser
elif self.version.is_sphinx_type:
parser_class = SphinxParser
elif self.version.is_mkdocs_type:
parser_class = MkDocsParser
else:
log.warning(
"Invalid documentation type",
documentation_type=self.version.documentation_type,
version_slug=self.version.slug,
project_slug=self.project.slug,
)
return {}
parser = parser_class(self.version)
return parser.parse(self.path)

By enabling INDEX_FROM_HTML_FILES we can start testing the GenericParser in all our projects and improve/adjust the features as we go.

On the other hand, relying on the GenericParser will allow us to stop creating the .fjson files from our Sphinx extension, making our build more standard and fast. This code is at: https://github.com/rtfd/readthedocs-sphinx-ext/blob/f1145b1819458643f442b21083b3397aeebc8e72/readthedocs_ext/readthedocs.py#L201-L236

Besides, with the idea proposed by @stsewd in the description, we can stop creating SphinxDomain objects simplifying our code and making it more efficient as well. So, all of this is 💯 , in my opinion, and it's pretty aligned with our plans and objectives.

@ericholscher
Copy link
Member

Besides, with the idea proposed by @stsewd in the description, we can stop creating SphinxDomain objects simplifying our code and making it more efficient as well. So, all of this is 💯 , in my opinion, and it's pretty aligned with our plans and objectives.

This would be a pretty huge win, if we can do it easily. They take up a ton of room in the DB, for minimal value, I think.

@ericholscher
Copy link
Member

Adding this to @benjaoming for next sprint, since you've shown some interest in search stuff 👍

@benjaoming
Copy link
Contributor

benjaoming commented Mar 8, 2023

The test input page.html has an interesting case of dl that makes me wonder what kind of output we'd want the generic parser to produce:

<!-- This is a sphinx domain, won't be parsed. -->
<dl class="py class">
<dt class="sig sig-object py" id="test_py_module.test.Foo">
<em class="property"><span class="pre">class</span><span class="w"> </span></em><span class="sig-prename descclassname"><span class="pre">test_py_module.test.</span></span><span class="sig-name descname"><span class="pre">Foo</span></span><span class="sig-paren">(</span><em class="sig-param"><span class="n"><span class="pre">qux</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">spam</span></span><span class="o"><span class="pre">=</span></span><span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span><a class="reference internal" href="../_modules/test_py_module/test.html#Foo"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#test_py_module.test.Foo" title="Permalink to this definition"></a>
</dt>
<dd>
<p>Docstring for class Foo.</p>
<p>This text tests for the formatting of docstrings generated from output
<code class="docutils literal notranslate"><span class="pre">sphinx.ext.autodoc</span></code>.
</p>
<dl class="py method">
<dt class="sig sig-object py" id="test_py_module.test.Foo.__init__">
<span class="sig-name descname"><span class="pre">__init__</span></span><span class="sig-paren">(</span>
<em class="sig-param"><span class="n"><span class="pre">qux</span></span></em>, <em class="sig-param"><span class="n"><span class="pre">spam</span></span><span class="o"><span class="pre">=</span></span>
<span class="default_value"><span class="pre">False</span></span></em><span class="sig-paren">)</span>
<a class="reference internal" href="../_modules/test_py_module/test.html#Foo.__init__">
<span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#test_py_module.test.Foo.__init__" title="Permalink to this definition"></a>
</dt>
<dd>
<p>Start the Foo.</p>
<dl class="field-list simple">
<dt class="field-odd">Parameters<span class="colon">:</span></dt>
<dd class="field-odd">
<ul class="simple">
<li>
<p><strong>qux</strong> (<em>string</em>) – The first argument to initialize class.</p>
</li>
<li>
<p><strong>spam</strong> (<em>bool</em>) – Spam me yes or no…</p>
</li>
</ul>
</dd>
</dl>
</dd>
</dl>
</dd>
</dl>
<!-- End of sphinx domain -->

@benjaoming benjaoming moved this from In progress to Needs review in 📍Roadmap Apr 5, 2023
@github-project-automation github-project-automation bot moved this from Needs review to Done in 📍Roadmap Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Improvement Minor improvement to code Needed: design decision A core team decision is required
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants