Skip to content

Search: improve results for simple queries #7194

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 16, 2020

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Jun 15, 2020

SimpleQueryString don't allow us to set an implicit value for fuzziness,
but is still useful for advanced queries.
This allows us to support both.

Results look good in local testing and comparing them with the results
from production.

SimpleQueryString don't allow us to set an implicit value for fuzziness,
but is still useful for advanced queries.
This allows us to support both.

Results look good in local testing and comparing them with the results
from production.
@stsewd stsewd requested review from a team and ericholscher June 15, 2020 22:19
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will be a huge benefit to our less advanced users. We should be careful rolling this out though, because I do think users will be somewhat confused by this change.

I wonder if we could allow projects to configure this a bit. As a future feature, but it would be neat to let projects decide on the "fuzziness" of their search, for example.

"""
tokens = {'+', '|', '-', '"' , '*', '(', ')', '~'}
query_tokens = set(query)
return not tokens.isdisjoint(query_tokens)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fancy.

@ericholscher
Copy link
Member

I wonder if it's worth putting this behind a feature flag for testing -- I do think it might cause some confusion, especially on the .com

@stsewd
Copy link
Member Author

stsewd commented Jun 15, 2020

I do think it might cause some confusion, especially on the .com

Not sure what would be confusing. Actually I'm getting results more quickly in the search as you type, since I don't need to type the full word or remember the exact word in english. But I'm fine putting this behind a feature flag, let me know if you want to go that way.

@ericholscher
Copy link
Member

I think it's more that the results will change, so if people are used to a certain result, it might not show up as high in the results.

Do you find it lists all exact matches prior to partial? It seems like the XSS test suggests that isn't the case?

@stsewd
Copy link
Member Author

stsewd commented Jun 15, 2020

I think it's more that the results will change, so if people are used to a certain result, it might not show up as high in the results.

Ok, got it. A feature flag makes sense them, we could see how the results changed in our docs (but in local testing with my frequent queries was all good)

Do you find it lists all exact matches prior to partial? It seems like the XSS test suggests that isn't the case?

The XSS query was listing more results without "", not fewer. If that is what you mean.

@ericholscher
Copy link
Member

ericholscher commented Jun 16, 2020

The XSS query was listing more results without "", not fewer. If that is what you mean.

Right, but presumably the additional results were partial matches? My question is: do all full matches return higher in results than any partial matches -- because that would be an issue if not.

@stsewd
Copy link
Member Author

stsewd commented Jun 16, 2020

Ok, I just checked, the first results is what we expect

(Pdb) pp hits[0]
{'_id': '5',
 '_index': 'test_page_index',
 '_score': 4.838786,
 '_source': {'build': None,
             'commit': '5',
             'domains': [{'role_name': 'py:function', 'anchor': 'celery.utils.depreca...},
                         {'role_name': 'py:function', 'anchor': 'celery.utils.depreca...}],
             'full_path': 'support.html',
             'path': 'support',
             'project': 'docs',
             'sections': [{'id': 'usage-questions', 'title': 'Usage Questions', 'conte...},
                          {'id': 'community-support', 'title': 'Community Support', 'c...},
                          {'id': 'commercial-support', 'title': 'Commercial Support', ...}],
             'title': 'Support',
             'version': 'latest'},
 '_type': 'doc',
 'inner_hits': {'domains': <Response: {}>,
                'sections': <Response: [{'_index': 'test_page_index', '_type': 'doc', '_id': '5', '_...}]>}}

That's from the page we expect https://github.com/readthedocs/readthedocs.org/blob/master/readthedocs/search/tests/data/docs/support.json#L18

I have added the feature flag on the indoc-search, we can't add the feature flag to the dashboard search bc isn't related to a project but to a user (it defaults to the old behavior, so we are fine)

@stsewd stsewd merged commit 3fc1fc7 into master Jun 16, 2020
@stsewd stsewd deleted the improve-search-simple-queries branch June 16, 2020 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants