Search: custom search page ranking #7237

stsewd · 2020-06-26T00:03:05Z

Need to test this more locally, but it should be ready for review.

stsewd · 2020-06-30T04:09:07Z

I'm still figuring out the right range of values to map to ES so that it works the way we want. I'm not changing anything else in the code, though, so it shouldn't change much if someone wants to review this now.

stsewd · 2020-06-30T21:07:55Z

readthedocs/search/faceted_search.py

+    # boosting for these fields need to be close enough
+    # to be re-boosted by the page rank.
+    _outer_fields = ['title^1.5']
+    _section_fields = ['sections.title^2', 'sections.content']


As long as one boost stays greater than the other, the results remain the same. We don't have any other factor that alters the boosting, so this doesn't change our current scoring.
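To make the role of the `^` boosts concrete, here is a minimal sketch of how field lists like these could be placed in an Elasticsearch query body. The helper name `build_query`, the `bool`/`should` wrapper, and the `nested` query on a `sections` path are assumptions for illustration, not the query this PR actually builds.

```python
def build_query(text, outer_fields, section_fields):
    """Build a hypothetical Elasticsearch query body as a plain dict.

    A suffix such as ``^1.5`` multiplies the score contribution of
    matches in that field.
    """
    outer = {"multi_match": {"query": text, "fields": outer_fields}}
    # Assumption: section data is indexed as a nested field named "sections".
    sections = {
        "nested": {
            "path": "sections",
            "query": {"multi_match": {"query": text, "fields": section_fields}},
        }
    }
    return {"query": {"bool": {"should": [outer, sections]}}}


query = build_query(
    "install",
    outer_fields=["title^1.5"],
    section_fields=["sections.title^2", "sections.content"],
)
```

Keeping the two boosts close together leaves room for a later per-page multiplier to change the order of results that score similarly.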

stsewd · 2020-06-30T21:15:49Z

readthedocs/search/faceted_search.py

+            0.5,
+            0.6,
+            0.7,
+            0.8,


we can move this 0.8 up if we want to give more priority to the boost (and move the 1.3 down).
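As an illustration of the trade-off described here, the sketch below maps a rank in the range -10..10 to a score multiplier and re-sorts a few hits. The linear mapping and the `low`/`high` bounds are hypothetical stand-ins; the PR uses its own list of values, which is only partially visible in the diff above.

```python
def rank_to_weight(rank, low=0.5, high=1.5):
    """Map a rank in [-10, 10] linearly onto [low, high].

    The bounds are illustrative assumptions; rank 0 maps to the midpoint.
    """
    if not -10 <= rank <= 10:
        raise ValueError(f"rank out of range: {rank}")
    return low + (rank + 10) * (high - low) / 20


def rerank(hits):
    """Sort (path, base_score, rank) tuples by weighted score, best first."""
    return sorted(hits, key=lambda h: h[1] * rank_to_weight(h[2]), reverse=True)


hits = [
    ("guide.html", 10.0, 0),
    ("api/ref.html", 9.0, 4),
    ("changelog.html", 11.0, -6),
]
order = [path for path, _, _ in rerank(hits)]
```

With these made-up numbers the positively ranked page overtakes a slightly better text match, and the negatively ranked page drops to the bottom. Narrowing the gap between `low` and `high` would make the rank matter less relative to the text score.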

stsewd · 2020-06-30T21:53:41Z

docs/config-file/v2.rst

+        api/v2/*: 4
+
+search.ranking
+``````````````


We can mark this option as experimental if we want to try it out first. Since we can make changes without re-indexing, it is safe to do some tuning afterwards if needed.

ericholscher

This is super cool and a great addition to our search. I have a few ideas for improvements, but they can be a "v2" after we ship this and test it:

- The ability to set the rank at the page level with metadata in RST, e.g. `:search_rank: 7` at the top of the file. We would then parse this in our extension and pull it out for indexing. Not a huge priority, but I think it would be a nice authoring option.
- Adding ranking by pageview data. We have the data now, and this approach seems quite easy to adapt to additional data points. I think that should probably be the next small improvement, and it shouldn't be difficult.

ericholscher · 2020-06-30T22:56:52Z

docs/config-file/v2.rst

+The rank can be an integer number between -10 and 10 (inclusive).
+Pages with a rank closer to -10 will appear further down the list of results,
+and pages with a rank closer to 10 will appear higher in the list of results.
+Note that 0 means *normal rank*, not *no rank*.


Good point 👍
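The documented constraint (an integer between -10 and 10, inclusive, with 0 meaning normal rank) could be checked along these lines. This is only a sketch: `validate_ranking` is a hypothetical helper, not the validation code used by the config module.

```python
def validate_ranking(ranking):
    """Check a mapping of path pattern -> rank and return a plain dict copy.

    Raises ValueError when a rank is not an integer in [-10, 10].
    """
    for pattern, rank in ranking.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(rank, bool) or not isinstance(rank, int):
            raise ValueError(f"{pattern}: rank must be an integer, got {rank!r}")
        if not -10 <= rank <= 10:
            raise ValueError(f"{pattern}: rank must be between -10 and 10, got {rank}")
    return dict(ranking)


valid = validate_ranking({"api/v2/*": 4, "changelog.html": 0})
```

A rank of 0 passes validation and simply leaves the page at its normal position.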

ericholscher · 2020-06-30T23:02:30Z

readthedocs/projects/tasks.py

@@ -1547,13 +1553,23 @@ def _create_imported_files(version, commit, build):
                        version_slug=version.slug,
                    ),
                )
+
+            page_rank = 0
+            # Last pattern to match takes precedence


I would generally expect the first match to take precedence. Is there a reason we're doing last?

stsewd · 2020-07-01

This is in case you have something like this:

    api/*.html: -2
    api/important.html: 2

The first match can be too greedy.

ericholscher · 2020-07-01

Hrm, I understand that, but I don't quite understand why that ordering would make the most sense. I guess building some kind of algorithm for a "closeness" match is too complex, and we need some kind of default, so this probably makes sense?

stsewd · 2020-07-01

Do you mean something like matching the longest pattern? Actually, I just thought the last pattern makes sense in general when matching this kind of pattern; gitignore, for example:

"within one level of precedence, the last matching pattern decides the outcome"
https://git-scm.com/docs/gitignore

ericholscher · 2020-07-01

Yea, I don't think first or last really makes more sense, so we just need to pick one. Longest could be interesting, maybe add a comment about looking into it?
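The two precedence rules discussed in this thread can be sketched as follows. Both functions are illustrative: the names are hypothetical, and the use of `fnmatch`-style matching is an assumption about how the patterns are interpreted.

```python
from fnmatch import fnmatchcase


def page_rank_last_match(path, ranking):
    """Return the rank of the last pattern that matches, or 0 if none does.

    Relies on dicts preserving insertion order (Python 3.7+).
    """
    rank = 0
    for pattern, value in ranking.items():
        if fnmatchcase(path, pattern):
            rank = value  # later matches overwrite earlier ones
    return rank


def page_rank_longest_match(path, ranking):
    """Alternative floated in the review: the longest matching pattern wins."""
    matches = [p for p in ranking if fnmatchcase(path, p)]
    return ranking[max(matches, key=len)] if matches else 0


ranking = {"api/*.html": -2, "api/important.html": 2}
```

With this configuration both rules give `api/important.html` a rank of 2 and every other page under `api/` a rank of -2. They only diverge when a shorter, broader pattern is listed after a longer, more specific one.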

stsewd added 4 commits June 25, 2020 18:50

Add search page rankings

985e3bd

Update docs

c836a40

Fix tests

c38df09

Fix linter

fa0d102

stsewd added 3 commits June 30, 2020 16:00

Update docs

54b4f54

Pass dict, config object isn't serializable

83157fb

Do ranking at query time

8c3434b

stsewd commented Jun 30, 2020

stsewd added 2 commits June 30, 2020 16:10

Update docs

d0d5785

Linter

3ad8d18

stsewd commented Jun 30, 2020

stsewd requested review from ericholscher and a team June 30, 2020 21:22

stsewd commented Jun 30, 2020

Add tip

4384242

ericholscher approved these changes Jun 30, 2020

stsewd added 2 commits June 30, 2020 21:08

Add comment

596ff77

TODO about alternative precedence

8e3bdef

stsewd merged commit 505be8c into master Jul 2, 2020

stsewd deleted the search-ranking branch July 2, 2020 16:53

stsewd mentioned this pull request Jul 8, 2020

Search related settings in the configuration file #7217

Closed

chrisjsewell mentioned this pull request Jul 27, 2020

Improve search jupyter-book/jupyter-book#815

Open
