Search: improve parser #7233
This will make the parser more general and match #7232 (also, one bug fix).

- Try the main tag before trying the first h1
- Always inspect all headers till 2 levels (this removes the need for the special case from Sphinx, where the h tag is inside a div)
- `_parse_content` now not only removes all new line chars, but it also reduces multiple spaces into one.
- Remove elements with the search role in addition to the navigation role.
- The headerlink class doesn't need to be inside an `a` tag.
- Fix bug where calling .text() over a text node will return empty. (I was able to catch this one now that we are checking till 2 levels)
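As a rough illustration of two of the points above (dropping elements with the navigation or search role, and collapsing whitespace), here is a minimal standard-library sketch. It is not the code from this PR: `extract_text`, `collapse_whitespace`, and `RoleFilteringParser` are hypothetical names, and the sketch assumes reasonably well-formed HTML where non-void tags are explicitly closed.

```python
from html.parser import HTMLParser

# Assumption: subtrees carrying these ARIA roles are excluded from the index.
SKIPPED_ROLES = {"navigation", "search"}

# Void elements never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


def collapse_whitespace(text):
    """Replace every run of whitespace (newlines, tabs, non-breaking
    spaces, repeated spaces) with a single ASCII space."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # which includes "\n" and "\u00a0".
    return " ".join(text.split())


class RoleFilteringParser(HTMLParser):
    """Collect text nodes, skipping subtrees whose root has a skipped role."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
        elif dict(attrs).get("role") in SKIPPED_ROLES:
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def extract_text(html):
    """Return the indexable text of an HTML fragment (hypothetical helper)."""
    parser = RoleFilteringParser()
    parser.feed(html)
    parser.close()
    return collapse_whitespace(" ".join(parser.chunks))


sample = (
    '<main><div role="navigation">Menu</div>'
    '<div role="search"><input type="text">Find</div>'
    "<h1>MkDocs</h1>\n<p>Project documentation with\u00a0Markdown.</p></main>"
)
print(extract_text(sample))
```

With the sample above, the navigation and search blocks are dropped and the non-breaking space becomes a regular space, so the printed text is `MkDocs Project documentation with Markdown.`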
@@ -6,7 +6,7 @@
 {
 "id": "mkdocs",
 "title": "MkDocs",
-"content": "Project documentation with\u00a0Markdown."
+"content": "Project documentation with Markdown."
No more weird chars now that we are stripping all white spaces :D
Now that we prioritize the main tag as the main node, the main node from the MkDocs Material theme is wider.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This seems useful, sorry it sat for so long 👍
This doesn't change the current indexing, though we may index more content in cases where there was a top-level text node (calling .text() on it would have returned empty).