-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Search: index from html files for mkdocs projects #7208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should make life a lot easier for us to support going forward, but we likely need some kind of user-facing error when search indexing fails. That's likely a larger project, but perhaps we need some kind of build output interface for search?
|
||
return None | ||
|
||
def _get_main_node(self, html): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is definitely part of the complexity that an "only HTML" parser runs into. We're definitely going to end up parsing random navigation and other content on pages, just because we're relying on heuristics instead of the tool telling us what the exact main content is.
I don't think we can avoid that, but definitely a downside of generic HTML parsing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's why I want just to write a guide for authors of theme/static site generators. That would also make their site to be better indexed by search engines and work better with screen readers.
I like the idea of having a "indexing search" in the output, but indexing happens async, so not sure how that will work. Maybe having an "extra logs" tab? I can open an issue to discuss that. |
It's under a feature flag currently, since I would like to test this with sphinx later. Tested with all themes available in https://github.com/mkdocs/mkdocs/wiki/MkDocs-Themes I did a search in all of them to check that sections are correctly indexed, and that irrelevant content isn't indexed. I wasn't able to build these, but I checked the html, and it's very similar to the default mkdocs theme, so it should work. - https://github.com/michaeltlombardi/mkdocs-psinder - https://github.com/byrnereese/mkdocs-bootstrap4 - https://github.com/daizutabi/mkdocs-ivory These themes don't expose search, but the content is indexed and we show results in the dashboard. - https://gitlab.com/lramage/mkdocs-bootstrap386 - https://gitlab.com/lramage/mkdocs-gitbook-theme Things to do later: Write a guide with recommendations for static site generators/themes authors, to integrate nice with Read the Docs. Other problems I found: We don't override the search for https://github.com/squidfunk/mkdocs-material (they don't use the identifier we use to override search). That isn't related to this, but just wanted to mention it. We can send a patch to fix that. You can check the results by either building mkdocs projects and searching (you need to enable the mkdocs search feature flag and the index from html feature flag), or just by opening the html files and checking the content of the json files from tests and see if the indexed content makes sense (note that I remove some content from the original html file to make it easier to read the json files, since json doesn't support multiple lines...)
a0d645c
to
47ffe35
Compare
This is on top of #7207
It's under a feature flag currently,
since I would like to test this with sphinx later.
Tested with all themes available in
https://github.com/mkdocs/mkdocs/wiki/MkDocs-Themes
I did a search in all of them to check that sections are correctly indexed,
and that irrelevant content isn't indexed.
I wasn't able to build these,
but I checked the html, and it's very similar to the default mkdocs theme,
so it should work.
These themes don't expose search,
but the content is indexed, and we show results in the dashboard.
Other problems I found
We don't override the search for https://github.com/squidfunk/mkdocs-material
(they don't use the same identifier we use for overriding search).
That isn't related to this, but just wanted to mention it.
We can send a patch to fix that.
You can check the results by either building mkdocs projects and
searching (you need to enable the mkdocs search feature flag and the
index from html feature flag), or just by opening the html files and
checking the content of the json files from tests and see if the indexed
content makes sense (note that I remove some content from the original
html file to make it easier to read the json files, since json doesn't
support multiple lines...)
Things to do later
authors, to integrate nice with Read the Docs.
we could make this configurable by the user in the config file, like:
Those would be the default values.