Search: Allow authors to set a "search score" per pages #7082
Comments
We could also just use the robots metatag (https://support.google.com/webmasters/answer/79812?hl=en). We would still show results for those pages, but with lower priority. |
I think this is definitely a good feature. The more we can allow users to customize search the better. Another one I've wanted is the ability to add "tags" or similar, so that we can boost pages for results on a search term. I don't think |
Just to mention that also in our case the API docs autogenerated with |
@ltalirz Thanks for the example! What would the best implementation be for you? Sounds like it would probably need to be path specific? eg. |
Hi @ericholscher, thanks for following up on this and sorry for the late reply!
Are you referring to the ability to cover directory subtrees instead of just individual pages?
I think that would already work for us :-) I'm not familiar with how elasticsearch handles relevance - what I understand from here is that pages are ranked by scores that are floating point numbers, and you can specify boosts of different kinds. Anyhow, for us it would not really matter - I suggest you decide by what makes the integration with elasticsearch as simple as possible. |
I was able to implement this. The final number that ES requires needs to be greater than 0. A number less than 1 will push the search results for that page further down, and a number greater than 1 will boost the results for that page. We could expose that directly to users:

    version: 2
    search:
      boosting:
        - api/v1/*: 0.5
        - api/v2/*: 2

Or have an internal range and map that to ES:

    version: 2
    search:
      boosting:
        - api/v1/*: -1
        - api/v2/*: 2

Or, e.g., with an alternative name:

    version: 2
    search:
      rank:
        - api/v1/*: -1
        - api/v2/*: 2

|
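A rough sketch of how path patterns like the ones in the first config could be applied to search hits. This is only an illustration, not the Read the Docs implementation: `BOOSTS`, `boost_for` and `rerank` are hypothetical names, and the first-match-wins rule and the default of 1.0 are assumptions.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns mirroring the first config proposal (ES-style factors).
BOOSTS = [
    ("api/v1/*", 0.5),
    ("api/v2/*", 2.0),
]


def boost_for(path, boosts=BOOSTS, default=1.0):
    """Return the factor of the first pattern matching ``path`` (assumed rule)."""
    for pattern, factor in boosts:
        if fnmatchcase(path, pattern):
            return factor
    return default  # pages without a matching pattern keep their score


def rerank(hits):
    """Sort (path, score) pairs by score multiplied by the page factor."""
    return sorted(hits, key=lambda hit: hit[1] * boost_for(hit[0]), reverse=True)
```

With these factors, a v1 page needs a raw score four times higher than a v2 page to rank above it.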
This looks great! |
I think passing along the actual values we plan to use in ES is probably not the best design. We likely want to be able to tune how these boost numbers interact with our other methods of optimization and boosting, so I think an abstract range that we translate is best. Something like a range from -10 to +10, where:
Or something like this. |
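One possible way to translate an abstract range into a positive ES factor, purely as a sketch. The exponential mapping, the `MAX_FACTOR` constant and the function name `to_es_boost` are assumptions made for illustration; the thread does not say which mapping was chosen.

```python
MIN_RANK, MAX_RANK = -10, 10
MAX_FACTOR = 10.0  # assumed: rank +10 multiplies the score by 10, rank -10 by 0.1


def to_es_boost(rank):
    """Map a user-facing rank in [-10, 10] to a factor that is always > 0."""
    if not MIN_RANK <= rank <= MAX_RANK:
        raise ValueError(f"rank must be between {MIN_RANK} and {MAX_RANK}")
    # rank 0 maps to 1.0 (no change); the mapping is symmetric around it.
    return MAX_FACTOR ** (rank / MAX_RANK)
```

Keeping the user-facing range abstract like this leaves room to retune `MAX_FACTOR` later without touching anyone's config file.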
This should be out by next week, in the meantime you can check the docs for the new option at https://docs.readthedocs.io/en/latest/config-file/v2.html#search |
This is live now, you can set the custom ranking! Let us know if it works as expected, we still can tweak it a little more. |
@stsewd Thanks, I've just given it a try, but it seems to have the opposite effect compared to the one I expected: I've set
with the goal of moving hits from the APIdoc further down in the results. This is the original search ranking: https://aiida.readthedocs.io/projects/aiida-core/en/latest/search.html?q=workflows The new search ranking seems to consistently rank pages from the APIdoc higher than the old one (also for other search terms). |
Wait, this may be a different issue - when I click on the second link I.e. not only is the result order different from the one you see, but it somehow finds more than twice the number of pages! You created your second screenshot by clicking on the link above, correct? P.S. In case it helps, this is the branch from which the docs are generated: https://github.com/aiidateam/aiida-core/tree/fix-search-rank, with the search rank being added in the last commit aiidateam/aiida-core@3e7223e |
@ltalirz that looks like it's rendering the default results from sphinx. See if you have an ad blocker installed, it may be blocking our override from https://assets.readthedocs.org/static/javascript/readthedocs-doc-embed.js
yeah, I'm on firefox |
I did have an adblocker on, but I'm getting the same results as before after disabling it + I get the same on firefox and safari where I don't have adblockers installed. Can it be a geographical region thing? I'm in Switzerland. |
ha, that looks related to the CDN. Can you try with https://aiida.readthedocs.io/projects/aiida-core/en/fix-search-rank/search.html?q=workflows&foo=bar ? Also, you can try re-building the version in case the CDN failed or something in the previous build |
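The `&foo=bar` trick above works because an unused query parameter changes the URL, so a cache keyed on the full URL most likely has no stored copy of it. A small sketch of building such a URL with the standard library; `add_cache_buster` is a hypothetical helper name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, name="foo", value="bar"):
    """Append a throwaway query parameter, keeping the existing ones."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(add_cache_buster(
    "https://aiida.readthedocs.io/projects/aiida-core/en/fix-search-rank/search.html?q=workflows"
))
```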
Thanks, with the link you provided, the results match your screenshot. I'll rebuild the version now to see whether it fixes the link also without &foo |
Yeah, maybe the cache is taking longer to purge in Switzerland, or something broke in the previous build. |
Do you suspect something failed? |
Ok, after the rebuild also the original link displays as in your screenshot. |
Maybe another layer of cache (ISP?)? You can also check the |
@stsewd quick related question regarding search indexing/ranking: does https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#html-metadata feed into the RTD search rankings, and if so, how? E.g. if you added this example and searched for its keyword:

    .. meta::
       :keywords: my-keyword

    A heading
    =========

    .. meta::
       :keywords: my-keyword

@ltalirz tried adding a keyword (above the title) in aiidateam/aiida-core#4217 (comment), but it didn't seem to make any difference |
Currently, we don't process any metatags, but I can see that as a future improvement. Also, I think metatags are always rendered at the top of the html document. |
thanks for the quick reply @stsewd
yep that looks to be the case, @ltalirz FYI I see now that you can only use meta to apply to the whole page, and since this is already used previously on the page for groupath (that is rendered at the top of the document), the latter querybuilder one actually appears to be ignored 😞 |
The use case is basically when users want to deprecate something but still don't want to delete that content (like api v1 vs v2).
Or in our case where we have the design docs, and we rank those results first rather than the actual content.
This is related to #5968, but I think making this explicit is better, since docs from v1 could probably have more views, but we want users to start using v2 (new pages!).
I'm not sure the best way to do this.
We could allow users to add this in a meta tag, `search-ranking=x` or `search-score=x`, where x could be an integer >= 0. This only allows users to rank content per page, not per section, but I think that's enough.
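To make the meta tag idea concrete, here is a minimal sketch of reading such a value from a rendered page with the standard library. Everything here is an assumption for illustration: the tag is taken to be `<meta name="search-ranking" content="x">`, and `SearchRankParser` and `page_rank` are hypothetical names, not part of Read the Docs.

```python
from html.parser import HTMLParser


class SearchRankParser(HTMLParser):
    """Collect the value of a hypothetical <meta name="search-ranking"> tag."""

    def __init__(self, meta_name="search-ranking"):
        super().__init__()
        self.meta_name = meta_name
        self.rank = None

    def handle_starttag(self, tag, attrs):
        # Only the first matching tag is used (an assumption of this sketch).
        if tag != "meta" or self.rank is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") != self.meta_name:
            return
        try:
            value = int(attributes.get("content") or "")
        except ValueError:
            return  # not an integer: ignore the tag
        if value >= 0:  # the proposal only allows integers >= 0
            self.rank = value


def page_rank(html, default=None):
    """Return the page-level rank found in ``html``, or ``default``."""
    parser = SearchRankParser()
    parser.feed(html)
    return parser.rank if parser.rank is not None else default
```

Since the value is read once per document, this matches the per-page (not per-section) granularity described above.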