readthedocs · stsewd · Jun 16, 2022 · Jun 10, 2022 · Jun 13, 2022 · Jun 13, 2022
@@ -30,8 +30,9 @@ Read the Docs makes use of ARIA_ roles and other heuristics in order to process
 Main content node
 ~~~~~~~~~~~~~~~~~
 
-The main content node should have a main role (or a ``main`` tag), and there should only be one per page.
-This node is the one that contains all the page content. Example:
+The main content should be inside a ``main`` tag or an element with the role ``main``,
+and there should only be one per page.
+This node is the one that contains all the page content to be indexed. Example:
 
 .. code-block:: html
    :emphasize-lines: 10-12
@@ -55,6 +56,51 @@ This node is the one that contains all the page content. Example:
       </body>
    </html>
 
+If a main node isn't found,
+we try to infer the main node from the parent of the first section with an ``h1`` tag.
+Example:
+
+.. code-block:: html
+   :emphasize-lines: 10-20
+
+   <html>
+      <head>
+         ...
+      </head>
+      <body>
+         <div>
+            This content isn't processed
+         </div>
+
+         <div id="parent">
+            <h1>First title</h1>
+            <p>
+               The parent of the h1 title will
+               be taken as the main node,
+               this is the div tag.
+            </p>
+
+            <h2>Second title</h2>
+            <p>More content</p>
+         </div>
+      </body>
+   </html>
+
+If a section title isn't found, we default to the ``body`` tag.
+Example:
+
+.. code-block:: html
+   :emphasize-lines: 5-7
+
+   <html>
+      <head>
+         ...
+      </head>
+      <body>
+         <p>Content</p>
+      </body>
+   </html>
+
 Irrelevant content
 ~~~~~~~~~~~~~~~~~~
 
@@ -87,12 +133,15 @@ Example:
 Sections
 ~~~~~~~~
 
-Sections are ``h`` tags, and sections of the same level should be neighbors.
-Additionally, sections should have an unique ``id`` attribute per page (this is used to link to the section).
-All content below the section, till the new section will be indexed as part of the section. Example:
+Sections are composed of a title and content.
+A section title can be an ``h`` tag, or a ``header`` tag containing an ``h`` tag.
+The ``h`` tag or its parent can contain an ``id`` attribute, which will be used to link to the section.
+
+All content below the title, until a new section is found, will be indexed as part of the section content.
+Example:
 
 .. code-block:: html
-   :emphasize-lines: 2-10
+   :emphasize-lines: 2-10, 12-17, 21-26
 
    <div role="main">
       <h1 id="section-title">
@@ -114,17 +163,17 @@ All content below the section, till the new section will be indexed as part of t
 
       ...
 
-      <h1 id="neigbor-section">
-         This section is neighbor of "section-title"
-      </h1>
+      <header>
+         <h1 id="3">This is also a valid section title</h1>
+      </header>
       <p>
-         ...
+         This is the content of the third section.
       </p>
    </div>
 
-Sections can be inside till two nested tags (and have nested sections),
-and its immediate parent can contain the ``id`` attribute.
-Note that the section content still needs to be below the ``h`` tag. Example:
+Sections can be contained in up to two nested tags, and can contain other sections (nested sections).
+Note that the section content still needs to be below the section title.
+Example:
 
 .. code-block:: html
    :emphasize-lines: 3-11,14-21

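The main-node fallback order described in the documentation changes above (explicit ``main`` node, then the parent of the first ``h1``, then ``body``) can be sketched as follows. This is a minimal illustration, not the parser shipped in this PR: `find_main_node` is a hypothetical helper, and it assumes the page is well-formed enough to be parsed with the standard library's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET


def find_main_node(root):
    """Return the node to index, following the documented fallback order.

    Hypothetical helper for illustration only; the real parser may differ.
    """
    # 1. Prefer an explicit main node: a <main> tag or role="main".
    for element in root.iter():
        if element.tag == "main" or element.get("role") == "main":
            return element

    # 2. Otherwise, use the parent of the first <h1> in document order.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for element in root.iter():
        if element.tag == "h1":
            # An <h1> at the root has no parent; fall back to the root itself.
            return parent_of.get(element, root)

    # 3. Otherwise, default to <body> (or the root if there is no body).
    body = root.find("body")
    return body if body is not None else root


page = ET.fromstring(
    "<html><head/><body>"
    "<div>This content isn't processed</div>"
    '<div id="parent"><h1>First title</h1><p>Content</p></div>'
    "</body></html>"
)
main_node = find_main_node(page)
```

In this example `main_node` is the ``div`` with ``id="parent"``, matching the second HTML example in the docs.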
diff --git a/docs/user/build-customization.rst b/docs/user/build-customization.rst
@@ -12,6 +12,8 @@ and also how to override the build process completely:
 `Override the build process`_
     If you want full control over your build. This option supports any tool that generates HTML as part of the build.
 
+.. contents:: Table of contents
+   :local:
 
 Extend the build process
 ------------------------
@@ -245,7 +247,7 @@ Override the build process
 .. warning::
 
    This feature is in a *beta phase* and could suffer incompatible changes or even be removed completely in the near future.
-   It does not yet support some of the Read the Docs' features like the :term:`flyout menu`, search and ads.
+   It does not yet support some of the Read the Docs' features like the :term:`flyout menu` and ads.
    However, we do plan to support these features in the future.
    Use this feature at your own risk.
 
@@ -273,3 +275,18 @@ your project could use the following configuration file:
 As Read the Docs does not have control over the build process,
 you are responsible for running all the commands required to install requirements and build the documentation properly.
 Once the build process finishes, the ``_readthedocs/html/`` folder will be hosted.
+
+Search support
+++++++++++++++
+
+Read the Docs will automatically index the content of all your HTML files,
+respecting the :ref:`search <config-file/v2:search>` options from your config file.
+
+You can access the search results from the :guilabel:`Search` tab of your project,
+or by using the :ref:`search API <server-side-search:api>`.
+
+.. note::
+
+   In order for Read the Docs to index your HTML files correctly,
+   they should follow some of the conventions described
+   at :doc:`rtd-dev:search-integration`.
diff --git a/readthedocs/builds/models.py b/readthedocs/builds/models.py
@@ -74,6 +74,8 @@
     GITLAB_MERGE_REQUEST_COMMIT_URL,
     GITLAB_URL,
     MEDIA_TYPES,
+    MKDOCS,
+    MKDOCS_HTML,
     PRIVACY_CHOICES,
     PRIVATE,
     SPHINX,
@@ -379,6 +381,10 @@ def supports_wipe(self):
     def is_sphinx_type(self):
         return self.documentation_type in {SPHINX, SPHINX_HTMLDIR, SPHINX_SINGLEHTML}
 
+    @property
+    def is_mkdocs_type(self):
+        return self.documentation_type in {MKDOCS, MKDOCS_HTML}
+
     def get_subdomain_url(self):
         external = self.type == EXTERNAL
         return self.project.get_docs_url(

diff --git a/readthedocs/projects/models.py b/readthedocs/projects/models.py
@@ -46,7 +46,7 @@
     validate_repository_url,
 )
 from readthedocs.projects.version_handling import determine_stable_version
-from readthedocs.search.parsers import MkDocsParser, SphinxParser
+from readthedocs.search.parsers import GenericParser, MkDocsParser, SphinxParser
 from readthedocs.storage import build_media_storage
 from readthedocs.vcs_support.backends import backend_cls
 
@@ -1430,9 +1430,23 @@ class Meta:
     objects = HTMLFileManager()
 
     def get_processed_json(self):
-        parser_class = (
-            SphinxParser if self.version.is_sphinx_type else MkDocsParser
-        )
+        if (
+            self.version.documentation_type == constants.GENERIC
+            or self.project.has_feature(Feature.INDEX_FROM_HTML_FILES)
+        ):
+            parser_class = GenericParser
+        elif self.version.is_sphinx_type:
+            parser_class = SphinxParser
+        elif self.version.is_mkdocs_type:
+            parser_class = MkDocsParser
+        else:
+            log.warning(
+                "Invalid documentation type",
+                documentation_type=self.version.documentation_type,
+                version_slug=self.version.slug,
+                project_slug=self.project.slug,
+            )
+            return {}
         parser = parser_class(self.version)
         return parser.parse(self.path)
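
The dispatch order introduced in `get_processed_json` can be exercised in isolation with a small stand-alone sketch. The string values of the documentation types below are assumptions made for illustration, and `select_parser_name` is a hypothetical helper that returns the parser class name rather than the class itself; `None` stands in for the "log a warning and return `{}`" branch.

```python
from typing import Optional

# Assumed string values for the documentation type constants (illustrative only).
SPHINX_TYPES = {"sphinx", "sphinx_htmldir", "sphinx_singlehtml"}
MKDOCS_TYPES = {"mkdocs", "mkdocs_html"}
GENERIC = "generic"


def select_parser_name(
    documentation_type: str,
    index_from_html_files: bool = False,
) -> Optional[str]:
    """Mirror the branch order of the diff above.

    The generic parser wins whenever the version is generic or the
    feature flag is set, even for Sphinx or MkDocs projects.
    """
    if documentation_type == GENERIC or index_from_html_files:
        return "GenericParser"
    if documentation_type in SPHINX_TYPES:
        return "SphinxParser"
    if documentation_type in MKDOCS_TYPES:
        return "MkDocsParser"
    # Unknown type: the real method logs a warning and returns {}.
    return None
```

Note that the feature flag is checked before the documentation type, so a Sphinx project with the flag enabled is indexed by the generic parser.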