-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Search: custom search page ranking #7237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm still figuring out the correct range to map to ES and make it work as we want. But I'm not changing anything else in the code, so it shouldn't change much if someone wants to review this. |
# boosting for these fields need to be close enough | ||
# to be re-boosted by the page rank. | ||
_outer_fields = ['title^1.5'] | ||
_section_fields = ['sections.title^2', 'sections.content'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while one is greater than the other results remain the same, and we don't have any other factor that alters the boosting, so this doesn't change our current score.
0.5, | ||
0.6, | ||
0.7, | ||
0.8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can move this 0.8 up if we want to give more priority to the boost (and move the 1.3 down).
api/v2/*: 4 | ||
|
||
search.ranking | ||
`````````````` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can mark this option as experimental if we want to try it first, but we can make changes without re-indexing, so we are safe to do some tuning after if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is super cool and a great addition to our search. I have a few ideas for improvements, but they can be a "v2" after we ship this and test it:
- The ability to set the rank at the page level with metadata in RST
:search_rank: 7
at the top of the file. We would then parse this in our extension and pull it out for indexing. Not a huge priority, but I think would be a nice authoring option - Adding ranking by pageview data. We have the data now, and this approach seems quite easy to adapt to additional data points. I think that should probably be the next small improvement, and shouldn't be difficult.
The rank can be an integer number between -10 and 10 (inclusive). | ||
Pages with a rank closer to -10 will appear further down the list of results, | ||
and pages with a rank closer to 10 will appear higher in the list of results. | ||
Note that 0 means *normal rank*, not *no rank*. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point 👍
@@ -1547,13 +1553,23 @@ def _create_imported_files(version, commit, build): | |||
version_slug=version.slug, | |||
), | |||
) | |||
|
|||
page_rank = 0 | |||
# Last pattern to match takes precedence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would generally expect the first match to take precedent -- is there a reason we're doing last?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in case you have something like this:
- api/*.html: -2
- api/important.html: 2
The first matches can be too greedy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hrm, I understand that, but don't quite understand why that ordering would make the most sense. I guess building some kind of algorithm for "closeness" match is too complex, and we need some kind of default, so this probably makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean something like matching the longest pattern? Actually I just thought the last pattern makes sense in general when matching this kind of patterns, gitignore for example
within one level of precedence, the last matching pattern decides the outcome
https://git-scm.com/docs/gitignore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I don't think first or last really makes more sense, so we just need to pick one. Longest could be interesting, maybe add a comment about looking into it?
Need to test this more locally, but it should be ready for review.
Closes #7082
Ref #7217