Search: SSS integration guide #7232
This will make the parser more general and match #7232 (also, one bug fix).

- Try the main tag before trying the first h1.
- Always inspect all headers up to 2 levels deep (this removes the need for the special case from Sphinx, where the h tag is inside a div).
- `_parse_content` now not only removes all newline characters, but also collapses multiple spaces into one.
- Remove elements with the search role in addition to the navigation role.
- The headerlink class doesn't need to be inside an `a` tag.
- Fix a bug where calling .text() on a text node returns an empty string. (I was able to catch this one now that we check up to 2 levels.)
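To illustrate the `_parse_content` change, here is a minimal sketch of that kind of whitespace normalization. It is not the code from the PR: the name `normalize_content` is hypothetical, and it assumes the content is already a plain Python string. Newlines are turned into spaces here so that words on adjacent lines are not glued together; the actual implementation may handle that differently.

```python
import re


def normalize_content(content: str) -> str:
    """Collapse newlines and runs of whitespace into single spaces.

    Hypothetical helper for illustration; not the project's `_parse_content`.
    """
    # Every run of whitespace (newlines, tabs, repeated spaces) becomes
    # one space, then leading and trailing spaces are stripped.
    return re.sub(r"\s+", " ", content).strip()


print(normalize_content("Install\n   the   package\n\nfirst."))
```

Normalizing once at indexing time keeps the stored text compact and makes highlighted snippets in search results easier to read.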
This document is super good.
I'm not sure if it's in the scope of this document, but it may be worth mentioning the difference between main, section and subsection regarding indexing. Is it about weight in results?
Also, we mention semantic structure where section, article and main are used. However, we do not provide examples of those. Does it work the same way, like <article role="main"> ...?
How does our parser behave with the HTML4 and HTML5 writers in Sphinx?
docs/guides/search-indexing.rst (outdated)

    </p>
    </div>

    Sections can also be wrapped till two levels, and it's parent can contain the id attribute.
How does this affect search indexing/results when the sections have subsections? Are subsections treated differently from regular sections? Maybe this example should clarify that.
So, we don't have anything like subsections; all sections are treated as plain sections, even when they are nested (like in the example above). I have improved the examples and wording.
We only have sections; there isn't a main section or subsections. The only special thing is that the title of the first section is taken as the title of the page (this is mentioned in the doc).
We don't mention semantic structure, we mention ARIA roles, any tag can contain a
Yes, since we parse based on ARIA roles, not on HTML tags (well, we fall back to the main tag if there isn't a main role).
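As a rough illustration of "roles first, tags as a fallback", the lookup order could be sketched like this. This is not the project's parser: `find_main_node` is a hypothetical name, the standard library's `xml.etree.ElementTree` is used only to keep the example self-contained (so the markup must be well-formed), and using the parent of the first h1 as the last resort is an assumption.

```python
import xml.etree.ElementTree as ET


def find_main_node(root):
    """Pick the node to index: role="main", then <main>, then the first h1's parent.

    Hypothetical sketch; the real parser may differ in details.
    """
    # 1. Any tag carrying role="main" wins, whatever the tag name is.
    for node in root.iter():
        if node.get("role") == "main":
            return node
    # 2. Fall back to the <main> tag.
    main = next(root.iter("main"), None)
    if main is not None:
        return main
    # 3. Last resort (assumption): the parent of the first <h1> in document order.
    parents = {child: parent for parent in root.iter() for child in parent}
    first_h1 = next(root.iter("h1"), None)
    return parents.get(first_h1)


page = ET.fromstring(
    "<html><body>"
    "<nav role='navigation'>Menu</nav>"
    "<article role='main'><h1>Title</h1><p>Body</p></article>"
    "</body></html>"
)
print(find_main_node(page).tag)  # article
```

Because the role is checked before the tag name, `<article role="main">` and `<div role="main">` are handled the same way in this sketch.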
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is a great document 💯
docs/guides/search-indexing.rst (outdated)

    </p>
    </div>

    Sections can also be wrapped till two levels (and have nested sections),
I don't understand this line "wrapped till two levels"
What about
Sections can be inside till two nested tags (and have nested sections),
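To make the "two nested tags" wording concrete, here is a small sketch of one possible reading: a heading still starts a section when it is wrapped in at most two tags below the main node, and all sections found this way end up in one flat list. The helper `find_headings` and the `max_wrappers` parameter are hypothetical, and `xml.etree.ElementTree` stands in for a real HTML parser, so the markup must be well-formed.

```python
import xml.etree.ElementTree as ET

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def find_headings(node, max_wrappers=2):
    """Yield headings that sit inside at most `max_wrappers` wrapper tags.

    Hypothetical helper; the depth budget is counted from `node`.
    """
    for child in node:
        if child.tag in HEADINGS:
            yield child
        elif max_wrappers > 0:
            # Descend into a wrapper tag, spending one level of the budget.
            yield from find_headings(child, max_wrappers - 1)


main = ET.fromstring(
    "<main>"
    "<h1>Guide</h1>"
    "<div><h2>Install</h2></div>"
    "<div><div><h2>Usage</h2></div></div>"
    "<div><div><div><h2>Too deep</h2></div></div></div>"
    "</main>"
)

# Sections are flat: nesting does not create a hierarchy of subsections.
titles = ["".join(h.itertext()).strip() for h in find_headings(main)]
# The title of the first section doubles as the page title.
page_title = titles[0]
print(page_title, titles)
```

In this sketch the heading wrapped in three tags is skipped, which is the kind of limit the sentence under review is trying to describe.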
docs/guides/search-indexing.rst (outdated)

    and puts the results into the ``#mkdocs-search-results`` element.
    A simplified example looks like this:

    .. code-block:: js
This is a really complex code example in our docs, and seems likely to get out of date. It's super useful though, so I think it's fine, I just worry about it being super specific vs. something a bit more generic.
Co-authored-by: Manuel Kaufmann <[email protected]>
Co-authored-by: Maksudul Haque <[email protected]>
0f3b449 to 0681a16
* Search: improve parser

  This will make the parser more general and match #7232 (also, one bug fix).

  - Try the main tag before trying the first h1.
  - Always inspect all headers up to 2 levels deep (this removes the need for the special case from Sphinx, where the h tag is inside a div).
  - `_parse_content` now not only removes all newline characters, but also collapses multiple spaces into one.
  - Remove elements with the search role in addition to the navigation role.
  - The headerlink class doesn't need to be inside an `a` tag.
  - Fix a bug where calling .text() on a text node returns an empty string. (I was able to catch this one now that we check up to 2 levels.)

* Increase depth

  Now that we prioritize the main tag as the main node, the main node from the mkdocs material theme is wider.

* Strip spaces
I'm documenting the process of our SSS indexing, and how we override the default search.
Note that search for mkdocs is still in beta, and we currently have two ways of indexing content: from the `search_index.json` file (we lose some content here, like code blocks) and from the HTML page itself (this works well so far).
I'm also editing the parser to support a more general structure, as mentioned in this document, based on https://webaim.org/techniques/semanticstructure/.
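For context on the first indexing path, this is a sketch of how entries from a `search_index.json` file could be grouped into per-page sections. It assumes the layout MkDocs' search plugin typically writes (a top-level `docs` list whose entries have `location`, `title` and `text`); the sample data, the function name `sections_by_page` and the output keys are made up for illustration and are not the project's indexing code.

```python
import json
from collections import defaultdict

# Made-up sample that mimics the assumed shape of search_index.json.
SAMPLE_INDEX = """
{
  "docs": [
    {"location": "install/", "title": "Install", "text": "How to install."},
    {"location": "install/#requirements", "title": "Requirements", "text": "Python 3."}
  ]
}
"""


def sections_by_page(index):
    """Group index entries by page path, keeping the anchor as the section id."""
    pages = defaultdict(list)
    for doc in index["docs"]:
        # "install/#requirements" -> path "install/", anchor "requirements"
        path, _, anchor = doc["location"].partition("#")
        pages[path].append(
            {"id": anchor, "title": doc["title"], "content": doc["text"]}
        )
    return dict(pages)


pages = sections_by_page(json.loads(SAMPLE_INDEX))
print(sorted(pages))  # ['install/']
```

Since the `text` field is already plain text, anything the search plugin dropped while building the file cannot be recovered at this stage, which is one reason to also index from the HTML page itself.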
Closes #4588