Search: support section titles inside header tags #9339
Besides standalone `h` headings, another convention is to put headings inside a `header` tag. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/header#usage_notes
81e26e6 to 3b7d3df
{
  "id": "",
  "title": "Pelican 4.7 released",
  "content": "Fri 01 October 2021 By Pelican Contributors In news. Pelican 4.7 is now available. This new release includes the following enhancements, fixes, and tweaks: Improve default theme rendering on mobile and other small screen devices (#2914) For more info, please refer to the release page."
The date here shouldn't be indexed: it's inside a `footer` tag. Maybe we could also omit the `footer` tag when indexing.
A `<footer>` typically contains information about the author of the section, copyright data or links to related documents.
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/footer
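To illustrate the suggestion, here is a minimal sketch of leaving `footer` content out of the indexed text. It is not the project's parser: it uses the standard library's `xml.etree.ElementTree` on well-formed sample markup, and `ARTICLE`, `indexable_text`, and the `skip` parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on a Pelican article; not taken from the fixtures.
ARTICLE = """
<article>
  <header><h2>Pelican 4.7 released</h2></header>
  <footer><time>Fri 01 October 2021</time> By Pelican Contributors</footer>
  <p>Pelican 4.7 is now available.</p>
</article>
"""


def indexable_text(node, skip=("footer",)):
    """Collect the text under node, ignoring subtrees whose tag is in skip."""
    parts = [node.text or ""]
    for child in node:
        if child.tag not in skip:
            parts.append(indexable_text(child, skip))
        # Tail text belongs to the parent, so it is kept either way.
        parts.append(child.tail or "")
    # Collapse runs of whitespace into single spaces.
    return " ".join(" ".join(parts).split())


root = ET.fromstring(ARTICLE)
text = indexable_text(root)
```

With the default `skip`, the date and author line no longer appear in `text`; passing `skip=()` brings them back.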
Not sure what the best approach is here.
I really wish we could comment JSON :(
  "id": "",
  "title": "links",
  "content": "Pelican Docs Support Pelican Justin Mayer"
},
{
  "id": "",
  "title": "follow",
  "content": "atom feed @getpelican @jmayer github"
}
These two are valid sections, but they shouldn't be indexed. They are outside Pelican's main node (`<section id="content" class="body">`), so my idea from #9322 about making the main node an option could help here.
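As a rough sketch of what a configurable main node could look like: the `main_id` option, `headings_in_main`, and the sample markup below are all hypothetical, and the standard library's `xml.etree.ElementTree` stands in for the real HTML parser. Nothing here is taken from #9322.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: one section inside the main node, one outside it.
PAGE = """
<body>
  <section id="content" class="body">
    <h2>Pelican 4.7 released</h2>
    <p>Pelican 4.7 is now available.</p>
  </section>
  <section id="extras" class="body">
    <h2>links</h2>
    <p>Pelican Docs</p>
  </section>
</body>
"""


def headings_in_main(root, main_id=None):
    """List h2 titles, optionally limited to the element with id=main_id."""
    scope = root
    if main_id is not None:
        found = root.find(f".//*[@id='{main_id}']")
        # Fall back to the whole document if the main node is missing.
        if found is not None:
            scope = found
    return [" ".join("".join(h.itertext()).split()) for h in scope.iter("h2")]


root = ET.fromstring(PAGE)
everything = headings_in_main(root)
main_only = headings_in_main(root, main_id="content")
```

Without the option both titles are returned; with `main_id="content"` the `links` section is left out.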
This looks like a good improvement to the parser 👍
<body>
<p>Content</p>
</body>
</html>
💯 examples.
"""
nodes_to_be_removed = tag.css('.headerlink')
for node in nodes_to_be_removed:
    node.decompose()
Do we not need this anymore?
This element is already being deleted in the `_clean_body` step.
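A small sketch of why the extra removal becomes redundant: once permalink anchors are stripped in a single up-front cleaning pass, title extraction sees clean headings. The `clean_body` function below is a hypothetical stand-in written with the standard library's `xml.etree.ElementTree`, not the project's `_clean_body`, and the sample markup is made up.

```python
import xml.etree.ElementTree as ET

# Hypothetical heading with a Sphinx-style permalink anchor.
SAMPLE = """
<div>
  <h2>Installation<a class="headerlink" href="#installation">¶</a></h2>
  <p>Run the installer.</p>
</div>
"""


def clean_body(root):
    """Remove every element carrying the headerlink class, once, up front."""
    # Iterate over snapshots so removing children does not disturb the loops.
    for parent in list(root.iter()):
        for child in list(parent):
            if "headerlink" in child.get("class", "").split():
                # ElementTree drops the removed element's tail text as well;
                # the sample has none, so nothing else is lost here.
                parent.remove(child)
    return root


body = clean_body(ET.fromstring(SAMPLE))
# No per-title cleanup is needed any more.
title = "".join(body.find("h2").itertext())
```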
Co-authored-by: Eric Holscher <[email protected]>
Another convention to single `h` headers is to put them inside a `header` tag. https://developer.mozilla.org/en-US/docs/Web/HTML/Element/header#usage_notes
This is mainly to support pages generated by pelican, since they follow that convention.
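To make the convention concrete, here is a minimal sketch of treating a heading wrapped in `header` the same as a bare heading. It is not the code from this PR: it runs the standard library's `xml.etree.ElementTree` on well-formed sample markup instead of the project's HTML parser, and `PAGE`, `HEADINGS`, `clean`, and `sections` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one title inside <header> (Pelican style), one bare heading.
PAGE = """
<body>
  <article>
    <header><h2>Pelican 4.7 released</h2></header>
    <p>Pelican 4.7 is now available.</p>
  </article>
  <h2>Plain heading</h2>
  <p>Plain content.</p>
</body>
"""

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def clean(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


def sections(root):
    """Return title/content pairs, treating <header><hN> like a bare <hN>."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parents = {child: parent for parent in root.iter() for child in parent}
    found = []
    for heading in root.iter():
        if heading.tag not in HEADINGS:
            continue
        # If the heading sits inside <header>, the section content follows
        # the <header> wrapper rather than the heading itself.
        start = heading
        parent = parents.get(heading)
        if parent is not None and parent.tag == "header":
            start = parent
        following = []
        container = parents.get(start)
        if container is not None:
            siblings = list(container)
            following = siblings[siblings.index(start) + 1:]
        content = []
        for node in following:
            # Stop at the next section title.
            if node.tag in HEADINGS or node.tag == "header":
                break
            content.append("".join(node.itertext()))
        found.append({
            "title": clean("".join(heading.itertext())),
            "content": clean(" ".join(content)),
        })
    return found


result = sections(ET.fromstring(PAGE))
```

Both headings end up as sections with the expected content, whether or not the title is wrapped in `header`.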
Other changes are:
all irrelevant content from all the other parsing logic, not just the content of sections.
This is on top of #9322