Search: support section titles inside header tags #9339

Merged
merged 11 commits into from
Jun 16, 2022
Changes from 9 commits
75 changes: 62 additions & 13 deletions docs/dev/search-integration.rst
@@ -30,8 +30,9 @@ Read the Docs makes use of ARIA_ roles and other heuristics in order to process
Main content node
~~~~~~~~~~~~~~~~~

The main content node should have a main role (or a ``main`` tag), and there should only be one per page.
This node is the one that contains all the page content. Example:
The main content should be inside a ``main`` tag or an element with the role ``main``,
and there should only be one per page.
This node is the one that contains all the page content to be indexed. Example:

.. code-block:: html
:emphasize-lines: 10-12
@@ -55,6 +56,51 @@ This node is the one that contains all the page content. Example:
</body>
</html>

If a main node isn't found,
we try to infer the main node from the parent of the first section with an ``h1`` tag.
Example:

.. code-block:: html
:emphasize-lines: 10-20

<html>
<head>
...
</head>
<body>
<div>
This content isn't processed
</div>

<div id="parent">
<h1>First title</h1>
<p>
The parent of the h1 title will
be taken as the main node,
this is the div tag.
</p>

<h2>Second title</h2>
<p>More content</p>
</div>
</body>
</html>

If a section title isn't found, we default to the ``body`` tag.
Example:

.. code-block:: html
:emphasize-lines: 5-7

<html>
<head>
...
</head>
<body>
<p>Content</p>
</body>
</html>
Review comment (Member):
💯 examples.
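
The fallback order documented in this hunk (``main`` tag or ``role="main"``, then the parent of the first ``h1``, then ``body``) can be sketched as follows. This is only an illustration using the standard library on well-formed markup; ``find_main_node`` is a hypothetical helper, not the parser code touched by this PR, and real-world HTML would need a forgiving HTML parser.

```python
import xml.etree.ElementTree as ET


def find_main_node(root):
    """Return the node whose content would be indexed (illustrative only)."""
    # 1. Prefer a <main> tag or any element with role="main".
    for node in root.iter():
        if node.tag == "main" or node.get("role") == "main":
            return node

    # 2. Otherwise use the parent of the first <h1>.
    #    ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    first_h1 = next(root.iter("h1"), None)
    if first_h1 is not None:
        return parents.get(first_h1, root)

    # 3. Fall back to <body> (or the root if there is no body at all).
    body = root.find("body")
    return body if body is not None else root


page = ET.fromstring(
    "<html><head/><body>"
    "<div>This content isn't processed</div>"
    '<div id="parent"><h1>First title</h1><p>Content</p></div>'
    "</body></html>"
)
print(find_main_node(page).get("id"))
```

With the sample page above, the ``div`` with ``id="parent"`` is selected because no main node exists and it is the parent of the first ``h1``.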


Irrelevant content
~~~~~~~~~~~~~~~~~~

@@ -87,12 +133,15 @@ Example:
Sections
~~~~~~~~

Sections are ``h`` tags, and sections of the same level should be neighbors.
Additionally, sections should have an unique ``id`` attribute per page (this is used to link to the section).
All content below the section, till the new section will be indexed as part of the section. Example:
Sections are composed of a title and its content.
A section title can be an ``h`` tag, or a ``header`` tag containing an ``h`` tag.
The ``h`` tag or its parent can contain an ``id`` attribute, which will be used to link to the section.

All content below the title, until a new section is found, will be indexed as part of the section content.
Example:

.. code-block:: html
:emphasize-lines: 2-10
:emphasize-lines: 2-10, 12-17, 21-26

<div role="main">
<h1 id="section-title">
@@ -114,17 +163,17 @@ All content below the section, till the new section will be indexed as part of t

...

<h1 id="neigbor-section">
This section is neighbor of "section-title"
</h1>
<header>
<h1 id="3">This is also a valid section title</h1>
</header>
<p>
...
This is the content of the third section.
</p>
</div>
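
The section rules described above (a title is an ``h`` tag or a ``header`` wrapping one, the ``id`` comes from the ``h`` tag or its parent, and content runs until the next title) can be sketched like this. It is a simplified illustration, not the parser from this PR: ``heading_of`` and ``split_sections`` are hypothetical names, only direct children of the main node are considered, and nested sections are ignored.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def heading_of(node):
    """Return the heading element if node is a section title, else None."""
    if node.tag in HEADINGS:
        return node
    if node.tag == "header":
        for child in node:
            if child.tag in HEADINGS:
                return child
    return None


def clean_text(node):
    """Join all text inside node and collapse whitespace."""
    return " ".join("".join(node.itertext()).split())


def split_sections(main):
    """Group the direct children of the main node into sections."""
    sections = []
    current = None
    for node in main:
        heading = heading_of(node)
        if heading is not None:
            # A new title closes the previous section and opens a new one.
            current = {
                "id": heading.get("id") or node.get("id"),
                "title": clean_text(heading),
                "content": [],
            }
            sections.append(current)
        elif current is not None:
            text = clean_text(node)
            if text:
                current["content"].append(text)
    return sections


main = ET.fromstring(
    '<div role="main">'
    '<h1 id="section-title">First section</h1><p>Some content.</p>'
    '<header><h1 id="3">This is also a valid section title</h1></header>'
    "<p>This is the content of the third section.</p>"
    "</div>"
)
for section in split_sections(main):
    print(section["id"], section["title"], section["content"])
```

Content that appears before the first title is dropped in this sketch; how the real indexer treats it is not stated in the text above.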

Sections can be inside till two nested tags (and have nested sections),
and its immediate parent can contain the ``id`` attribute.
Note that the section content still needs to be below the ``h`` tag. Example:
Sections can be contained in up to two nested tags, and can contain other sections (nested sections).
Note that the section content still needs to be below the section title.
Example:

.. code-block:: html
:emphasize-lines: 3-11,14-21
19 changes: 18 additions & 1 deletion docs/user/build-customization.rst
@@ -12,6 +12,8 @@ and also how to override the build process completely:
`Override the build process`_
If you want full control over your build. This option supports any tool that generates HTML as part of the build.

.. contents:: Table of contents
:local:

Extend the build process
------------------------
@@ -245,7 +247,7 @@ Override the build process
.. warning::

This feature is in a *beta phase* and could suffer incompatible changes or even be removed completely in the near future.
It does not yet support some of the Read the Docs' features like the :term:`flyout menu`, search and ads.
It does not yet support some of the Read the Docs' features like the :term:`flyout menu`, and ads.
However, we do plan to support these features in the future.
Use this feature at your own risk.

@@ -273,3 +275,18 @@ your project could use the following configuration file:
As Read the Docs does not have control over the build process,
you are responsible for running all the commands required to install requirements and build the documentation properly.
Once the build process finishes, the ``_readthedocs/html/`` folder will be hosted.

Search support
++++++++++++++

Read the Docs will automatically index the content of all your HTML files,
respecting the :ref:`search <config-file/v2:search>` options from your config file.

You can access the search results from the :guilabel:`Search` tab of your project,
or by using the :ref:`search API <server-side-search:api>`.
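
As a rough illustration of using the search API from a script, the sketch below only builds a query URL. The endpoint path and the ``project``, ``version`` and ``q`` parameter names are assumptions based on the public API v2 search endpoint and should be checked against the search API reference linked above; ``build_search_url`` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the search API documentation.
SEARCH_ENDPOINT = "https://readthedocs.org/api/v2/search/"


def build_search_url(project, version, query):
    """Build a search URL for one project version (parameter names assumed)."""
    params = urlencode({"project": project, "version": version, "q": query})
    return f"{SEARCH_ENDPOINT}?{params}"


print(build_search_url("docs", "latest", "build customization"))
```

The resulting URL can be fetched with any HTTP client; the response format is described in the search API reference.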

.. note::

In order for Read the Docs to index your HTML files correctly,
they should follow some of the conventions described
at :doc:`rtd-dev:search-integration`.
6 changes: 6 additions & 0 deletions readthedocs/builds/models.py
@@ -74,6 +74,8 @@
     GITLAB_MERGE_REQUEST_COMMIT_URL,
     GITLAB_URL,
     MEDIA_TYPES,
+    MKDOCS,
+    MKDOCS_HTML,
     PRIVACY_CHOICES,
     PRIVATE,
     SPHINX,
@@ -379,6 +381,10 @@ def supports_wipe(self):
     def is_sphinx_type(self):
         return self.documentation_type in {SPHINX, SPHINX_HTMLDIR, SPHINX_SINGLEHTML}

+    @property
+    def is_mkdocs_type(self):
+        return self.documentation_type in {MKDOCS, MKDOCS_HTML}
+
     def get_subdomain_url(self):
         external = self.type == EXTERNAL
         return self.project.get_docs_url(
22 changes: 18 additions & 4 deletions readthedocs/projects/models.py
@@ -46,7 +46,7 @@
     validate_repository_url,
 )
 from readthedocs.projects.version_handling import determine_stable_version
-from readthedocs.search.parsers import MkDocsParser, SphinxParser
+from readthedocs.search.parsers import GenericParser, MkDocsParser, SphinxParser
 from readthedocs.storage import build_media_storage
 from readthedocs.vcs_support.backends import backend_cls

@@ -1430,9 +1430,23 @@ class Meta:
     objects = HTMLFileManager()

     def get_processed_json(self):
-        parser_class = (
-            SphinxParser if self.version.is_sphinx_type else MkDocsParser
-        )
+        if (
+            self.version.documentation_type == constants.GENERIC
+            or self.project.has_feature(Feature.INDEX_FROM_HTML_FILES)
+        ):
+            parser_class = GenericParser
+        elif self.version.is_sphinx_type:
+            parser_class = SphinxParser
+        elif self.version.is_mkdocs_type:
+            parser_class = MkDocsParser
+        else:
+            log.warning(
+                "Invalid documentation type",
+                documentation_type=self.version.documentation_type,
+                version_slug=self.version.slug,
+                project_slug=self.project.slug,
+            )
+            return {}
         parser = parser_class(self.version)
         return parser.parse(self.path)

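The parser selection in ``get_processed_json`` can be read as a small decision table. The standalone sketch below mirrors that order outside Django; the string values of the documentation types and the ``select_parser`` helper are assumptions made for illustration, and parser names are returned as plain strings rather than classes.

```python
# Placeholder values standing in for the project's constants (assumed).
GENERIC = "generic"
SPHINX_TYPES = {"sphinx", "sphinx_htmldir", "sphinx_singlehtml"}
MKDOCS_TYPES = {"mkdocs", "mkdocs_html"}


def select_parser(documentation_type, index_from_html_files=False):
    """Return the parser name for a version, or None for unknown types."""
    # The generic parser wins when the type is generic or the feature flag is on.
    if documentation_type == GENERIC or index_from_html_files:
        return "GenericParser"
    if documentation_type in SPHINX_TYPES:
        return "SphinxParser"
    if documentation_type in MKDOCS_TYPES:
        return "MkDocsParser"
    # Unknown types are not indexed (the method above logs and returns {}).
    return None


for doc_type in ("generic", "sphinx_htmldir", "mkdocs", "unknown"):
    print(doc_type, "->", select_parser(doc_type))
```

Note that the feature flag takes precedence over the documentation type, so a Sphinx project with the flag enabled would be indexed with the generic parser.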