Search: return relative URLs #7376
This looks good except for the hacky logic to get the path & domain :)
highlights = PageHighlightSerializer(source='meta.highlight', default=dict)
blocks = serializers.SerializerMethodField()

def get_link(self, obj):
def get_domain(self, obj):
This definitely seems like it's adding a bunch of queries in both of these functions to do the full resolve and then ignore parts of it (e.g. we don't care about subprojects for the domain). Is there a reason not to just call the resolver's resolve_path and resolve_domain directly here?
The result from _get_full_path is cached, and we already pass projects_data into the context, so this won't generate any extra queries.
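To illustrate the caching argument, here is a minimal sketch in plain Python, not the project's actual code. It assumes the full URL is built once per (project, path) pair and that both the domain and the path are derived from that cached value. The class name, `_get_full_url`, and the `resolve_calls` counter are hypothetical names used only for this example.

```python
from functools import lru_cache
from urllib.parse import urlparse


class PageSerializerSketch:
    """Hypothetical stand-in for a serializer that derives two fields from one URL."""

    def __init__(self, projects_data):
        # Maps a project slug to its base URL (assumed shape, for illustration only).
        self.projects_data = projects_data
        self.resolve_calls = 0  # counts how often the expensive step actually runs

    @lru_cache(maxsize=None)
    def _get_full_url(self, project, path):
        # Stands in for the expensive "full resolve"; runs once per (project, path).
        self.resolve_calls += 1
        base = self.projects_data[project]
        return base.rstrip("/") + "/" + path.lstrip("/")

    def get_domain(self, project, path):
        parsed = urlparse(self._get_full_url(project, path))
        return f"{parsed.scheme}://{parsed.netloc}"

    def get_path(self, project, path):
        return urlparse(self._get_full_url(project, path)).path
```

Calling `get_domain` and then `get_path` for the same result triggers the resolve step only once, which is the behavior the comment above describes.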
readthedocs.org/readthedocs/search/api.py
Lines 379 to 382 in 7274123
def get_serializer_context(self):
    context = super().get_serializer_context()
    context['projects_data'] = self._get_all_projects_data()
    return context
Calling resolve_path and resolve_domain here will generate extra queries.
It seems like projects_data is only used in this one place, so I don't understand why we're caching it prior to calling this code? Seems like we could just remove all the pre-setting and only set it here when we actually use it?
The serializer only knows about one object, not about all of them. But the caller of this class has the list of all objects that the serializer is going to use, so it can retrieve all the data in one query.
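A rough sketch of that pattern, written without Django or DRF so it runs on its own. `FakeDB`, `ResultSerializer`, and `serialize_results` are hypothetical names, and the shape of `projects_data` (slug mapped to a domain and a path prefix) is an assumption made for illustration, not the structure used in the PR.

```python
class FakeDB:
    """Hypothetical data store that counts how many queries it receives."""

    def __init__(self, rows):
        self.rows = rows
        self.query_count = 0

    def fetch_projects(self, slugs):
        # One query returns the data for every requested project slug.
        self.query_count += 1
        return {slug: self.rows[slug] for slug in slugs}


class ResultSerializer:
    """Serializes a single result and reads shared data from its context."""

    def __init__(self, context):
        self.context = context

    def serialize(self, result):
        domain, prefix = self.context["projects_data"][result["project"]]
        return {"domain": domain, "path": prefix + result["path"]}


def serialize_results(db, results):
    # The caller sees the whole result list, so it can batch the lookup.
    slugs = {r["project"] for r in results}
    context = {"projects_data": db.fetch_projects(slugs)}
    serializer = ResultSerializer(context)
    return [serializer.serialize(r) for r in results]
```

With this layout the number of queries stays at one regardless of how many results are serialized, at the cost of the extra context plumbing discussed in this thread.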
Sure, but you're doing queries for every Domain for every subproject with this approach, instead of querying the doctype for every Version, which will lead to the same number of queries?
Ok, I see what you mean, yeah, that can be optimized to query the domain only once.
I don't think we need to worry too much about making this super efficient -- I'm actually saying we should make the code simpler rather than try and make it super fast. We don't do that many searches, so having simple code is probably better. I guess it might matter for projects with a lot of subprojects.
I also think we need to think a bit more deeply about how to make this stuff faster at the resolver level, rather than trying to optimize specific areas. We've done this a few times, and really the solution should be "calling the resolver is always fast"
This seems fine for now, though a little complicated :)
Closes #7311