-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Search: Use serializers to parse search results #7157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
for result in results: | ||
inner_hits = result.meta.inner_hits | ||
sections = inner_hits.sections or [] | ||
domains = inner_hits.domains or [] | ||
all_results = itertools.chain(sections, domains) | ||
|
||
sorted_results = utils._get_sorted_results( | ||
results=all_results, | ||
source_key='source', | ||
) | ||
|
||
result.meta.inner_hits = sorted_results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic was moved to the serializer itself.
inner_hits = serializers.SerializerMethodField() | ||
|
||
def get_link(self, obj): | ||
# TODO: optimize this to not query the db for each result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are already doing one query per result, so this isn't something I'm introducing in this PR. We can fix this when merging the serializer from the api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's doing more than 1 query if we're calling resolve
. We probably want to add this to the index, instead of computing it on render -- is that the plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably want to add this to the index
My plan was to so something similar to subprojects, fetch the project once and resolve the main domain for that project and then pass it down here.
OLD_TYPES = { | ||
'domain': 'domains', | ||
'section': 'sections', | ||
} | ||
OLD_FIELDS = { | ||
'docstring': 'docstrings', | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be removed once we start using the same serializer in the api
def to_representation(self, instance): | ||
return { | ||
'name': getattr(instance, 'domains.name', []), | ||
'docstring': getattr(instance, 'domains.docstrings', []), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is named docstrings
, but we only store one docstring per domain. (note here it defaults to a list bc this is the result of the highlight)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great start.
A good way to enable this on the API might be adding a new API endpoint that returns these results, and we can keep serving the old endpoint for a release after the deploy, to allow the JS to switch over.
readthedocs/search/serializers.py
Outdated
path = serializers.CharField(source='full_path') | ||
link = serializers.SerializerMethodField() | ||
highlight = PageHighlightSerializer(source='meta.highlight', default=dict) | ||
inner_hits = serializers.SerializerMethodField() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should just be called hits
? Or something even more explicit? search_results
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about sections or blocks? This should be something that includes the results from domains and sections (and maybe images/codeblocks in the future?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but they are still results, no? I guess I don't see what value inner
has here, and seems confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they are results (and the object itself is a result, so that was kind of my idea of dropping the results
suffix). And, yeah inner hits is more an ES thing than something users should care about.
inner_hits = serializers.SerializerMethodField() | ||
|
||
def get_link(self, obj): | ||
# TODO: optimize this to not query the db for each result. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's doing more than 1 query if we're calling resolve
. We probably want to add this to the index, instead of computing it on render -- is that the plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense to me. 👍
I do think we likely want some more documentation on the fields and what they mean, but that can probably come with the public API.
This is towards #5821 and #6341 and #5966 and having a stable/understandable API for search.
I'm not using these serializers in the API yet, this is because I want to handle the update in the JS and in the server in another PR, making sure it doesn't break for users with a cached js file.
Had to update some tests on the api bc those depend on some data used for the tests for views, but after the serializers are used in the api that change can be reverted.