-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Search: weight page views into search results #7297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Page name is a sphinx only concept, so we don't have correct data for page views from mkdocs projects. Using the origin url + the unresolver gives us the correct path. Also, we are always storing the full path (index.html instead of "/"), so we don't have duplicates (let me know if we do want to have duplicates).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have too much experience with ES, but this PR looks fine to me.
The only thing that I'm seeing is that even it's behind a feature flag, we are changing the ES query completely from what we currently have. So, we could break existing ES search for projects even if we don't enable the feature flag on them. We could keep the old _get_script_score
code (without weight) for those projects that don't have the feature flag enabled; and execute the new ES code on those projects with the feature flag.
Example,
def _get_script_score():
if has_feature(SEARCH_WEIGHT):
return _get_script_score_with_weight()
else:
return _get_script_score_without_weight() # old code
On the other hand, as a user, I think it's hard to tests how it behaves with and without the feature flag enabled if the user has to ask us to enable/disable it. If we want to have better feedback, we probably need to give the user the ability to enable/disable it by themselves. That way, you can perform a query, enable the feature, and repeat it. Repeating the process multiple times will give us a better understanding how the search perform on those projects.
- absolute: this is equal to ``log10(views + 1)`` | ||
(we add one since logarithms start at 1). | ||
A logarithmic function is a good fit due to its growth rate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This absolute value is across all the versions for the same page?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, here views is the number of views from that page in that version, this value isn't compared to anything (that's why it's absolute).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if we have a page /api/v2.html
that has a lot of views in version 1.0
, then a new version of the docs is published, 1.1
; the new page /1.1/api/v2.html
won't be shown first in the results because it will have less views?
If that is correct, won't we be pointing the user always to old pages because they will have more views than the up-to-date version of the exact same page?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar example searching for the term API
in 1.1
version of the docs. It may refers to another page (i.e. configuration.html
) instead of api/v2.html
because the newer version of configuration.html
has more views than the newer version of api/v2.html
--but historically (old versions) api/v2.hml
has a lot more views. Makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I avoided merging versions since the content can be different, I think we should keep versions isolated. Another example is if a new api/v3.html is published, we don't want to list the results from the old api first.
We are adding the extra bits to get the page views, the other logic is the same (first two lines). Projects that don't have the flag will default to 0 for the value of page views, we add the values from page views and ranking so 0 doesn't change the final result. |
That's what I'm saying. There is a completely different new |
We are not changing or deleting the old logic, tests are passing too. I don't think is worth having two scripts for this change. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@stsewd what's the status of this PR? @ericholscher is this PR aligned with our vision of the search in the future? In that case, should we prioritize it, or close it? |
I think this is useful, but if it's super complex, I don't think it's worth adding a bunch of complexity to our search to support it, especially when we don't really know if users want this or not. Also curious to hear @stsewd's thoughts on the value/complexity trade off here. |
I'm fine closing this, I think our current search ranking works well https://docs.readthedocs.io/en/stable/config-file/v2.html#search, sorting by popularity has other problems #5968 (comment). |
Should we also close the issue, then? |
I'm fine with that too |
This is on top of #7293 since is needed to save the correct path into the page views.
Still needs to test this more locally, and write tests :DI'm calculating the final weight based on two values:
This allows us to prevent pages with a few views take over the results, but at the same time
popular pages from that particular project can still add some weight.
Things I want to test more:
This is under a feature flag to see if the db doesn't implode 💣
Design decisions after merging this PR:
Closes #5968