Skip to content

Search: Use serializers to parse search results #7157

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jun 23, 2020
Merged

Search: Use serializers to parse search results #7157

merged 12 commits into from
Jun 23, 2020

Conversation

stsewd
Copy link
Member

@stsewd stsewd commented Jun 4, 2020

This is towards #5821 and #6341 and #5966 and having a stable/understandable API for search.

I'm not using these serializers in the API yet, this is because I want to handle the update in the JS and in the server in another PR, making sure it doesn't break for users with a cached js file.

Had to update some tests on the api bc those depend on some data used for the tests for views, but after the serializers are used in the api that change can be reverted.

@stsewd stsewd added the PR: work in progress Pull request is not ready for full review label Jun 4, 2020
Comment on lines -108 to -119
for result in results:
inner_hits = result.meta.inner_hits
sections = inner_hits.sections or []
domains = inner_hits.domains or []
all_results = itertools.chain(sections, domains)

sorted_results = utils._get_sorted_results(
results=all_results,
source_key='source',
)

result.meta.inner_hits = sorted_results
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic was moved to the serializer itself.

inner_hits = serializers.SerializerMethodField()

def get_link(self, obj):
# TODO: optimize this to not query the db for each result.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are already doing one query per result, so this isn't something I'm introducing in this PR. We can fix this when merging the serializer from the api.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's doing more than 1 query if we're calling resolve. We probably want to add this to the index, instead of computing it on render -- is that the plan?

Copy link
Member Author

@stsewd stsewd Jun 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably want to add this to the index

My plan was to so something similar to subprojects, fetch the project once and resolve the main domain for that project and then pass it down here.

Comment on lines +26 to +33
OLD_TYPES = {
'domain': 'domains',
'section': 'sections',
}
OLD_FIELDS = {
'docstring': 'docstrings',
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be removed once we start using the same serializer in the api

def to_representation(self, instance):
return {
'name': getattr(instance, 'domains.name', []),
'docstring': getattr(instance, 'domains.docstrings', []),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is named docstrings, but we only store one docstring per domain. (note here it defaults to a list bc this is the result of the highlight)

@stsewd stsewd removed the PR: work in progress Pull request is not ready for full review label Jun 4, 2020
@stsewd stsewd requested a review from a team June 4, 2020 17:38
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a great start.

A good way to enable this on the API might be adding a new API endpoint that returns these results, and we can keep serving the old endpoint for a release after the deploy, to allow the JS to switch over.

path = serializers.CharField(source='full_path')
link = serializers.SerializerMethodField()
highlight = PageHighlightSerializer(source='meta.highlight', default=dict)
inner_hits = serializers.SerializerMethodField()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this should just be called hits? Or something even more explicit? search_results or something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about sections or blocks? This should be something that includes the results from domains and sections (and maybe images/codeblocks in the future?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but they are still results, no? I guess I don't see what value inner has here, and seems confusing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, they are results (and the object itself is a result, so that was kind of my idea of dropping the results suffix). And, yeah inner hits is more an ES thing than something users should care about.

inner_hits = serializers.SerializerMethodField()

def get_link(self, obj):
# TODO: optimize this to not query the db for each result.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's doing more than 1 query if we're calling resolve. We probably want to add this to the index, instead of computing it on render -- is that the plan?

@stsewd stsewd requested a review from ericholscher June 22, 2020 19:19
Copy link
Member

@ericholscher ericholscher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense to me. 👍

I do think we likely want some more documentation on the fields and what they mean, but that can probably come with the public API.

@stsewd stsewd merged commit 9846324 into master Jun 23, 2020
@stsewd stsewd deleted the api-for-search branch June 23, 2020 19:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants