Design architecture of search #4094
I'll preface my part of the discussion by saying that I'm not extremely familiar with our current search setup. I believe our search solution:
I want to echo all @davidfischer has said (including the part of not being very familiar with how our current search works :) ) Besides that, some specific comments:
I want to mention a little our experience with Instead of a new instance, isn't it better to have a new django app inside the current project?
Won't this kill the servers? Update: I just found this on the readthedocs.org site. https://readthedocs.org/search/?q=memory&type=file . Not sure if it's exactly what you are talking about or not though.
Totally agree with this. I wasn't able to set up the search in my local instance either. I don't know if you both have access to the Would be good to have also this extra points:
Follow up
No. I think David was just suggesting a UI element that would let you expand the search, similar to what GitHub does. This is a separate issue though, and should be raised elsewhere to continue discussion.
This is hard, because of Elasticsearch. We can do some work around docker compose or something that could make this easier, but it will always require having an additional server process running as currently designed.

We could consider switching from ES to Postgres full-text search, but I think that's a larger issue, and I'm not fully convinced that Postgres will give us everything we need. Django does have extensions for Postgres' full-text search though (https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/search/), and we could use a separate Postgres instance in prod. If we are still depending on ES, the local setup is always going to be decently complicated.

The other option would be to implement it again using something like haystack (http://django-haystack.readthedocs.io/en/master/). We moved away from this in the past because it wasn't being actively developed, and it didn't support new versions of ES that I can tell. It also means that we get a lot of support issues asking us to support other search backends, which isn't something we really want to do. It would give us a way to support local dev easier though.

Feedback
Feedback on a few things

In production, running search as a separate process
This is possible to do without having a separate repo for dev. For instance, we run We also run the

In development, depending on the RTD codebase
I think we should build the search functionality so that it could be run outside of RTD. This means not importing or depending on the actual RTD models as they exist in this codebase. It could live inside the RTD repo, but not depend on the Django models. If we built this in another repo, that certainly makes that split very obvious. We could have the RTD codebase depend on a version (similar to readthedocs-build), and just use it like another reusable Django app that would be developed by a third party.

Ease of development
The main value that I see in having it split out is allowing people to do development on it without depending on RTD itself. I think that is a good long-term goal, but it doesn't really require having it in a separate repo if we design it properly.

Final thoughts
I'm still a bit torn on this. I think having it be its own repo would be a nice thing. I do worry about the deployment and development overhead though. I think I'm leaning towards having it included in the RTD repo for now, but designing it so that if we wanted to move it out in the future, we could do it with minimal effort.
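To make the "lives in the RTD repo but does not depend on the Django models" idea concrete, here is a minimal sketch. Every name in it (`PageDocument`, `SearchBackend`, `InMemoryBackend`) is hypothetical and not taken from the RTD codebase; the in-memory backend only does substring matching and merely stands in for a real engine such as Elasticsearch.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class PageDocument:
    """Plain data handed to the search code; no Django model is involved."""
    project: str
    version: str
    path: str
    content: str


class SearchBackend(Protocol):
    """Hypothetical interface the main application would program against."""

    def index(self, doc: PageDocument) -> None: ...

    def search(self, query: str, project: Optional[str] = None) -> List[PageDocument]: ...


class InMemoryBackend:
    """Toy backend for local development: case-insensitive substring match."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, str, str], PageDocument] = {}

    def index(self, doc: PageDocument) -> None:
        # Re-indexing the same page replaces the previous entry.
        self._docs[(doc.project, doc.version, doc.path)] = doc

    def search(self, query: str, project: Optional[str] = None) -> List[PageDocument]:
        needle = query.lower()
        return [
            doc
            for doc in self._docs.values()
            if needle in doc.content.lower()
            and (project is None or doc.project == project)
        ]
```

Because callers only see `PageDocument` and the backend interface, the package could later move to its own repo without touching the models, and local development would not need an extra server process.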
Yes. I hope to work to index other documentation formats after making some progress with the current Sphinx one.
I will eventually try to dockerize the application so the whole stack can run seamlessly in separate containers. In the future we may need to invest heavily in the search feature, because it's much needed for large documentation projects!
Currently, it's possible to search project names in the core application and to search within a project on the project page. I think in the future we can also add search across all projects to our core application.
If we keep it inside the RTD repo, we can use many of the built-in model signals, such as post_save, to index into Elasticsearch. That's a good point also!
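The signal-driven indexing mentioned above could look roughly like the sketch below. It is a stdlib-only stand-in, not Django's API: `connect`, `send`, `save_page`, `reindex_page`, and the `"Page"` sender name are all hypothetical. In a real Django project the receiver would instead be attached to `django.db.models.signals.post_save`.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

# Simplified stand-in for a signal registry (assumption: mimics the idea of
# post_save, not its actual signature).
_receivers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

# Stand-in for the search index: page path -> page content.
search_index: Dict[str, str] = {}


def connect(sender: str, receiver: Callable[[dict], None]) -> None:
    """Register a receiver to be called whenever `sender` is saved."""
    _receivers[sender].append(receiver)


def send(sender: str, instance: dict) -> None:
    """Notify every receiver registered for `sender`."""
    for receiver in _receivers[sender]:
        receiver(instance)


def reindex_page(instance: dict) -> None:
    """Receiver: push the saved page into the search index."""
    search_index[instance["path"]] = instance["content"]


def save_page(instance: dict) -> None:
    """Pretend to persist the page, then fire the save notification."""
    # Database write omitted in this sketch.
    send("Page", instance)


connect("Page", reindex_page)
```

With this wiring, calling `save_page({"path": "index.html", "content": "..."})` keeps the index in step with every save, which is the convenience being described for keeping the code in the same repo as the models.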
I have referenced this issue in our downstream docs project (see above) -- because it seems related: It has been flagged that search results need some kind of scoring or we need a way to configure exclusion of directories or sets of files from the search results (because they for one reason or the other aren't nice to have in search results). |
Having more project control over search is definitely a good addition that we should think about. We could likely support metadata on the page, or some kind of setting which could support this. Thanks for the idea @benjaoming. |
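As one possible shape for such a setting, the sketch below filters result paths against per-project exclusion patterns. The function names and the idea of glob-style patterns are assumptions for illustration only; nothing here reflects an existing RTD option.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def is_searchable(path: str, exclude_patterns: Iterable[str]) -> bool:
    """Return False when the path matches any exclusion glob."""
    # Note: in fnmatch-style globs, "*" also matches "/" characters.
    return not any(fnmatchcase(path, pattern) for pattern in exclude_patterns)


def filter_results(paths: Iterable[str], exclude_patterns: Iterable[str]) -> List[str]:
    """Drop excluded paths while keeping the original result order."""
    patterns = list(exclude_patterns)
    return [path for path in paths if is_searchable(path, patterns)]
```

For example, a project could list `"_static/*"` and `"changelog/*"` to keep those directories out of its results. The same check could run at index time instead of query time, so that excluded files never reach the index.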
I think the disadvantages outweigh the advantages with regard to splitting out this code. Namely:
I agree with the points about maintaining this in our main code base -- we can make this code isolated and not dependent on core code, but the code still lives alongside our other code. I'm not sure we have a reason yet to run a separate instance, but if we do, this is absolutely an option. I vote to plan on keeping this code in our main code base. |
Without any additional feedback here, I think we've determined that splitting this out to a microservice, or out to a separate repository, doesn't have many benefits for us. |
I know this issue is closed, but I want to add information about the full-text search with Django and PostgreSQL
The website djangoproject.com has full-text search powered only by Django itself and PostgreSQL. We removed ES from the stack 2 years ago, and now we have a simplified stack, faster indexing, and great search performance. I'm working with RTD, and my first thought when I started was: "Is ES really necessary? There's already PostgreSQL on the stack with all the data, so why can't we use it for full-text search?"
The search functionality will hopefully be rewritten in the upcoming months. Before we start, we need to make some architectural decisions about it.
Currently the search functionality lives inside the main readthedocs application. There are two kinds of search: one resides on the project page and the other is on the readthedocs.org site. Both are hosted in the core application.
I would like to propose separating the search functionality into a different Django application, so that it will be easier to maintain the code and to scale the application as needed. Indexing and querying would both go through that application's API, and results would be returned through the API as well.
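A rough sketch of what "index and query through an API" could mean is shown below. The JSON message shapes (`action`, `path`, `content`, `q`) and the `handle` function are invented for illustration. A real service would sit behind HTTP and a proper search engine rather than a dict.

```python
import json
from typing import Dict

# Stand-in for the search engine: page path -> page content.
_index: Dict[str, str] = {}


def handle(raw_request: str) -> str:
    """Dispatch one JSON request and return a JSON response string."""
    request = json.loads(raw_request)
    action = request.get("action")

    if action == "index":
        # Store (or overwrite) the content for this page.
        _index[request["path"]] = request["content"]
        response = {"status": "ok"}
    elif action == "query":
        # Case-insensitive substring match, sorted for stable output.
        term = request["q"].lower()
        hits = sorted(path for path, text in _index.items() if term in text.lower())
        response = {"status": "ok", "results": hits}
    else:
        response = {"status": "error", "detail": "unknown action"}

    return json.dumps(response)
```

The main application would only ever build and parse these messages, which is what would allow the search application to be deployed and scaled on its own.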
Pros
Cons
What do you think, @ericholscher @agjohnson @humitos @davidfischer @stsewd?