Search: SSS integration guide #7232

Merged
merged 13 commits into from
Sep 10, 2020

Conversation

Member

@stsewd stsewd commented Jun 24, 2020

I'm documenting the process of our SSS indexing, and how we override the default search.

Note that search for MkDocs is still in beta, and we now have two ways of indexing content: from the search_index.json file (which loses some content, such as code blocks) and from the HTML page itself (this works well so far).
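
As a rough illustration of the first indexing path, here is a minimal sketch of reading entries from a `search_index.json` payload. It assumes the usual MkDocs layout of a top-level `docs` list whose items carry `location`, `title` and `text` keys; the helper name `iter_index_entries` and the sample data are hypothetical and are not taken from the Read the Docs code.

```python
import json


def iter_index_entries(raw_index):
    """Yield (location, title, text) for each entry of a search_index.json payload.

    Hypothetical helper: assumes a top-level "docs" list, as MkDocs usually writes it.
    """
    data = json.loads(raw_index)
    for doc in data.get("docs", []):
        # Missing keys fall back to empty strings instead of raising.
        yield doc.get("location", ""), doc.get("title", ""), doc.get("text", "")


# Made-up sample payload, only for illustration.
sample = json.dumps(
    {"docs": [{"location": "install/#pip", "title": "pip", "text": "Run the installer."}]}
)
entries = list(iter_index_entries(sample))
```

Because the `text` field is already flattened by the time it reaches this file, anything dropped upstream (such as code blocks) cannot be recovered at this stage, which is one reason parsing the HTML page can give better results.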

I'm also editing the parser to support a more general structure as mentioned in this document based on https://webaim.org/techniques/semanticstructure/.

Closes #4588

@stsewd stsewd marked this pull request as ready for review June 25, 2020 02:42
stsewd added a commit that referenced this pull request Jun 25, 2020
This will make the parser more general and match
#7232
(also, one bug fix).

- Try the main tag before trying the first h1
- Always inspect all headers till 2 levels (this removes the need for
  the special case from Sphinx, where the h tag is inside a div)
- `_parse_content` now not only removes all new line chars, but it also
  reduces multiple spaces into one.
- Remove elements with the search role in addition to the navigation
  role.
- The headerlink class doesn't need to be inside an `a` tag.
- Fix bug where calling .text() over a text node will return empty.
  (I was able to catch this one now that we are checking till 2 levels)
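
The whitespace handling described for `_parse_content` can be sketched roughly as follows. This is not the code from the commit, only an illustration of the idea; the function name `normalize_whitespace` is hypothetical.

```python
def normalize_whitespace(content):
    """Replace newlines and runs of whitespace with single spaces.

    Hypothetical stand-in for the behavior described above, not the real parser.
    str.split() with no arguments splits on any whitespace run (spaces, tabs,
    newlines) and drops leading and trailing whitespace.
    """
    return " ".join(content.split())


cleaned = normalize_whitespace("Hello\n   world  \n")
```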
@stsewd stsewd mentioned this pull request Jun 25, 2020
@stsewd stsewd requested review from a team and ericholscher June 25, 2020 02:46
Member

@humitos humitos left a comment

This document is super good.

I'm not sure if it's in the scope of this document, but it may be worth mentioning the difference between main, section and subsection regarding indexing. Is it about weight in results?

Also, we mention semantic structure where section, article and main are used. However, we do not provide examples of those. Does it work the same way, like <article role="main"> ...?

How does our parser behave with the HTML4 and HTML5 writers in Sphinx?

</p>
</div>

Sections can also be wrapped till two levels, and it's parent can contain the id attribute.
Member

How does this affect search indexing/results when the sections have subsections? Are subsections treated differently from regular sections? Maybe this example should clarify that.

Member Author

So, we don't have something like subsections; all sections are just treated as sections, even if you have nested sections (like the example above). I have improved the examples and wording.

Member Author

stsewd commented Jun 25, 2020

> I'm not sure if it's in the scope of this document, but it may be worth mentioning the difference between main, section and subsection regarding indexing. Is it about weight in results?

We only have sections; there isn't a main section or subsections. The only special thing is that the title of the first section is taken as the title of the page (this is mentioned in the doc).

> Also, we mention semantic structure where section, article and main are used. However, we do not provide examples of those. Does it work the same way, like <article role="main"> ...?

We don't mention semantic structure, we mention ARIA roles, any tag can contain a main role.

> How does our parser behave with the HTML4 and HTML5 writers in Sphinx?

It behaves the same with both, since we parse based on ARIA roles, not on HTML tags (well, we fall back to the main tag if there isn't a main role).
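
To make the lookup order concrete, here is a minimal sketch using only the standard library. It assumes the order discussed in this PR: an element with `role="main"` first, then a `<main>` tag, then the first `h1`. The names `MainNodeFinder` and `pick_main_node` are hypothetical, and the real parser works on a parsed tree rather than on a stream of start tags.

```python
from html.parser import HTMLParser


class MainNodeFinder(HTMLParser):
    """Remember the first role="main" element, the first <main> and the first <h1>."""

    def __init__(self):
        super().__init__()
        self.role_tag = None  # tag name of the first element with role="main"
        self.has_main_tag = False
        self.has_h1 = False

    def handle_starttag(self, tag, attrs):
        if self.role_tag is None and dict(attrs).get("role") == "main":
            self.role_tag = tag
        if tag == "main":
            self.has_main_tag = True
        if tag == "h1":
            self.has_h1 = True


def pick_main_node(html):
    """Return (strategy, tag) for the candidate main node, or None if nothing matches."""
    finder = MainNodeFinder()
    finder.feed(html)
    finder.close()
    if finder.role_tag is not None:
        return ("role", finder.role_tag)  # any tag can carry the main role
    if finder.has_main_tag:
        return ("tag", "main")  # fallback when there is no main role
    if finder.has_h1:
        return ("h1", "h1")  # last resort: locate content via the first h1
    return None


by_role = pick_main_node('<div role="main"><h1>Title</h1></div>')
by_tag = pick_main_node("<main><h1>Title</h1></main>")
by_h1 = pick_main_node("<body><h1>Title</h1></body>")
```

Since the first check looks only at the `role` attribute, the tag names emitted by a particular Sphinx writer do not matter as long as the theme sets the role.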


stale bot commented Aug 16, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: stale Issue will be considered inactive soon label Aug 16, 2020
@stsewd stsewd removed the Status: stale Issue will be considered inactive soon label Aug 16, 2020
Member

@ericholscher ericholscher left a comment

This is a great document 💯

</p>
</div>

Sections can also be wrapped till two levels (and have nested sections),
Member

I don't understand this line "wrapped till two levels"

Member Author

What about

Sections can be inside till two nested tags (and have nested sections),

and puts the results into the ``#mkdocs-search-results`` element.
A simplified example looks like this:

.. code-block:: js
Member

This is a really complex code example in our docs, and seems likely to get out of date. It's super useful though, so I think it's fine, I just worry about it being super specific vs. something a bit more generic.

@stsewd stsewd force-pushed the sss-integration-guide branch from 0f3b449 to 0681a16 Compare August 17, 2020 18:38
stsewd added a commit that referenced this pull request Aug 17, 2020
* Search: improve parser

This will make the parser more general and match
#7232
(also, one bug fix).

- Try the main tag before trying the first h1
- Always inspect all headers till 2 levels (this removes the need for
  the special case from Sphinx, where the h tag is inside a div)
- `_parse_content` now not only removes all new line chars, but it also
  reduces multiple spaces into one.
- Remove elements with the search role in addition to the navigation
  role.
- The headerlink class doesn't need to be inside an `a` tag.
- Fix bug where calling .text() over a text node will return empty.
  (I was able to catch this one now that we are checking till 2 levels)

* Increase depth

Now that we prioritize the main tag as the main node,
the main node from the MkDocs Material theme is wider.

* Strip spaces
@stsewd stsewd merged commit 64edef1 into master Sep 10, 2020
@stsewd stsewd deleted the sss-integration-guide branch September 10, 2020 18:19
Development

Successfully merging this pull request may close these issues.

Vision for MkDocs on Read the Docs
4 participants