Search: SSS integration guide #7232

stsewd · 2020-06-24T23:12:51Z

I'm documenting the process of our SSS indexing, and how we override the default search.

Note that search for MkDocs is still in beta, and we now have two ways of indexing content: from the search_index.json file (where we lose some content, such as code blocks) and from the HTML page itself (this works well so far).
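As a rough illustration of the first approach, here is a minimal sketch that groups the entries of a `search_index.json` file by page. It assumes the layout MkDocs typically writes (a top-level `docs` list whose items have `location`, `title`, and `text` keys); the function name `sections_by_page` and the output shape are hypothetical and are not the Read the Docs implementation.

```python
import json
from collections import defaultdict


def sections_by_page(search_index: str) -> dict:
    """Group search_index.json entries by page (hypothetical helper)."""
    data = json.loads(search_index)
    pages = defaultdict(list)
    for doc in data.get("docs", []):
        # A location such as "guide/#install" splits into page and anchor.
        page, _, anchor = doc["location"].partition("#")
        pages[page].append(
            {"id": anchor, "title": doc["title"], "content": doc["text"]}
        )
    return dict(pages)


# Sample data invented for this sketch.
sample = json.dumps(
    {
        "docs": [
            {"location": "guide/", "title": "Guide", "text": "Intro text"},
            {"location": "guide/#install", "title": "Install", "text": "Run pip"},
        ]
    }
)
pages = sections_by_page(sample)
```

Because the `text` field is already plain text, anything the search plugin dropped while building it cannot be recovered at this stage, which may be why parsing the HTML page directly gives better results.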

I'm also editing the parser to support a more general structure as mentioned in this document based on https://webaim.org/techniques/semanticstructure/.

Closes #4588

This will make the parser more general and match #7232 (also, one bug fix).

- Try the main tag before trying the first h1
- Always inspect all headers till 2 levels (this removes the need for the special case from Sphinx, where the h tag is inside a div)
- `_parse_content` now not only removes all new line chars, but it also reduces multiple spaces into one.
- Remove elements with the search role in addition to the navigation role.
- The headerlink class doesn't need to be inside an `a` tag.
- Fix bug where calling .text() over a text node will return empty. (I was able to catch this one now that we are checking till 2 levels)
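To make the `_parse_content` change concrete, here is a minimal sketch of that kind of whitespace clean-up. It is not the actual parser code: the function name `normalize_whitespace` is hypothetical, and replacing newlines with a space (rather than deleting them) and stripping the ends are assumptions of this sketch.

```python
import re


def normalize_whitespace(text: str) -> str:
    """Collapse newlines and runs of spaces into single spaces."""
    # \s+ matches newlines, tabs, and repeated spaces alike.
    collapsed = re.sub(r"\s+", " ", text)
    # Assumption: leading and trailing spaces are not wanted in the index.
    return collapsed.strip()


cleaned = normalize_whitespace("Install   the\n   package  \n")
```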

humitos

This document is super good.

I'm not sure if it's in the scope of this document, but it may be worth mentioning the difference between main, section, and subsection regarding indexing. Is it about weight in results?

Also, we mention semantic structure where section, article and main are used. However, we do not provide examples of those. Does it work the same way, like `<article role="main"> ...`?

How does our parser behave with the HTML4 and HTML5 writers in Sphinx?

humitos · 2020-06-25T08:43:30Z

docs/guides/search-indexing.rst

+      </p>
+   </div>
+
+Sections can also be wrapped till two levels, and it's parent can contain the id attribute.


How does this affect search indexing/results when sections have subsections? Are subsections treated differently from regular sections? Maybe this example should clarify that.

We don't have anything like subsections; all sections are treated the same, including nested sections (like the example above). I have improved the examples and wording.

stsewd · 2020-06-25T15:55:02Z

> I'm not sure if it's in the scope of this document, but it may be worth mentioning the difference between main, section, and subsection regarding indexing. Is it about weight in results?

We only have sections; there isn't a main section or subsections. The only special thing is that the title of the first section is taken as the title of the page (this is mentioned in the doc).
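A minimal sketch of that behaviour, using only the standard library: every heading becomes one flat section regardless of nesting, and the first section's title doubles as the page title. The names `SectionCollector` and `index_page` are hypothetical, and the sketch only reads the `id` from the heading itself, which is a simplification and not how the real parser is necessarily written.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionCollector(HTMLParser):
    """Collect every heading as one flat section, ignoring nesting depth."""

    def __init__(self):
        super().__init__()
        self.sections = []  # flat list of {"id": ..., "title": ...}
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self._in_heading = True
            section_id = dict(attrs).get("id") or ""
            self.sections.append({"id": section_id, "title": ""})

    def handle_endtag(self, tag):
        if tag in HEADINGS:
            self._in_heading = False

    def handle_data(self, data):
        # Only text inside a heading contributes to the section title.
        if self._in_heading:
            self.sections[-1]["title"] += data


def index_page(html: str) -> dict:
    collector = SectionCollector()
    collector.feed(html)
    collector.close()
    sections = collector.sections
    # The first section's title is reused as the page title.
    title = sections[0]["title"] if sections else ""
    return {"title": title, "sections": sections}


page = index_page(
    '<div role="main"><h1 id="guide">Guide</h1><p>Intro</p>'
    '<div><h2 id="install">Install</h2><p>Run it</p></div></div>'
)
```

Here the nested `h2` ends up in the same flat list as the `h1`; nothing marks it as a subsection.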

> Also, we mention semantic structure where section, article and main are used. However, we do not provide examples of those. Does it work the same way, like `<article role="main"> ...`?

We don't mention semantic structure; we mention ARIA roles, and any tag can have a main role.

> How does our parser behave with the HTML4 and HTML5 writers in Sphinx?

It works with both, since we parse based on ARIA roles, not on HTML tags (well, we fall back to the `main` tag if there isn't a main role).
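The lookup order discussed in this thread (main role first, then the `main` tag, then the first h1) can be sketched as follows. This is an illustration only: `find_main_node` is a hypothetical name, the sketch uses `xml.etree.ElementTree` and therefore needs well-formed markup, and it returns the h1 element itself in the last case, which may differ from what the real parser selects.

```python
import xml.etree.ElementTree as ET


def find_main_node(root):
    """Return (strategy, element) for the node holding the page content."""
    # 1. Any element carrying the ARIA main role wins, whatever its tag.
    for el in root.iter():
        if el.get("role") == "main":
            return "role", el
    # 2. Otherwise fall back to the <main> tag.
    el = next(root.iter("main"), None)
    if el is not None:
        return "tag", el
    # 3. Last resort: the first <h1> in document order.
    el = next(root.iter("h1"), None)
    if el is not None:
        return "h1", el
    return None, None


doc = ET.fromstring(
    '<html><body><nav role="navigation">Menu</nav>'
    '<article role="main"><h1>Title</h1></article></body></html>'
)
strategy, node = find_main_node(doc)
```

Since the role is checked before any tag name, the same lookup applies whether the theme emits `<div role="main">` or `<article role="main">`.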

stale · 2020-08-16T17:58:57Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ericholscher

This is a great document 💯

ericholscher · 2020-08-17T18:10:18Z

docs/guides/search-indexing.rst

+      </p>
+   </div>
+
+Sections can also be wrapped till two levels (and have nested sections),


I don't understand this line "wrapped till two levels"

What about

Sections can be inside till two nested tags (and have nested sections),

ericholscher · 2020-08-17T18:13:11Z

docs/guides/search-indexing.rst

+and puts the results into the ``#mkdocs-search-results`` element.
+A simplified example looks like this:
+
+.. code-block:: js


This is a really complex code example in our docs, and it seems likely to get out of date. It's super useful though, so I think it's fine; I just worry about it being super specific vs. something a bit more generic.

* Search: improve parser

  This will make the parser more general and match #7232 (also, one bug fix).

  - Try the main tag before trying the first h1
  - Always inspect all headers till 2 levels (this removes the need for the special case from Sphinx, where the h tag is inside a div)
  - `_parse_content` now not only removes all new line chars, but it also reduces multiple spaces into one.
  - Remove elements with the search role in addition to the navigation role.
  - The headerlink class doesn't need to be inside an `a` tag.
  - Fix bug where calling .text() over a text node will return empty. (I was able to catch this one now that we are checking till 2 levels)

* Increase depth

  Now that we prioritize the main tag as main node, the main node from the mkdocs material theme is more wide.

* Strip spaces

stsewd marked this pull request as ready for review June 25, 2020 02:42

stsewd mentioned this pull request Jun 25, 2020

Search: improve parser #7233

Merged

stsewd requested review from a team and ericholscher June 25, 2020 02:46

humitos reviewed Jun 25, 2020

saadmk11 reviewed Jun 29, 2020

stale bot added the `Status: stale` label (Issue will be considered inactive soon) Aug 16, 2020

stsewd removed the `Status: stale` label (Issue will be considered inactive soon) Aug 16, 2020

ericholscher approved these changes Aug 17, 2020

stsewd and others added 10 commits August 17, 2020 13:29

SSS integration first draft

7b3190d

Document search override for Sphinx

14523b8

Document process for MkDocs

edb5712

Additional context

d26c312

Apply suggestions from code review

e188346

Co-authored-by: Manuel Kaufmann

Improve examples and wording

67afe8c

Reduce highlight

318569c

Apply suggestions from code review

9471c20

Co-authored-by: Maksudul Haque

Ignore the nav tag too

ffee1c5

Improve wording

0681a16

stsewd force-pushed the sss-integration-guide branch from 0f3b449 to 0681a16 Compare August 17, 2020 18:38

Re-apply suggestions

0f057d0

stsewd added 2 commits August 17, 2020 14:44

Move to development/

56bdb60

Merge branch 'master' into sss-integration-guide

8d96e51

stsewd merged commit 64edef1 into master Sep 10, 2020

stsewd deleted the sss-integration-guide branch September 10, 2020 18:19
