Search engine setup #1673

Closed
kottenator opened this issue Sep 21, 2015 · 10 comments

@kottenator

I'm setting up a self-hosted RTD instance and I'm having problems setting up the search engine.

I'm using 939c7f8 commit (master branch).

Q1: what engine should I set up?

I see that you use Haystack with different search engines. As I understand, in production you use Solr.

Q2: meanwhile, do I need to run ElasticSearch anyway? There is settings.ES_HOSTS - what's the purpose?

Q3: may I use ElasticSearch for Haystack? I tried, and it failed to build the index:

elasticsearch.exceptions.SerializationError: ({u'django_id': u'1', 'description': u'', 'title': u'Pip', 'text': u'Pip\n\n', 'author': <User: docs>, 'repo_type': u'git', u'django_ct': u'projects.project', 'absolute_url': u'/projects/pip/', u'id': u'projects.project.1'}, TypeError("Unable to serialize <User: docs> (type: <class 'django.contrib.auth.models.User'>)",))
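
The error in Q3 suggests that the document sent to ElasticSearch still holds a Django User instance in its author field, which a JSON serializer cannot encode. Below is a minimal sketch of that failure and one possible workaround: reducing the object to its username before serializing. `FakeUser` and `to_serializable` are hypothetical names used only for illustration; they are not RTD or Haystack code.

```python
import json


class FakeUser:
    """Hypothetical stand-in for django.contrib.auth.models.User."""

    def __init__(self, username):
        self.username = username


def to_serializable(doc):
    """Return a copy of doc with any FakeUser replaced by its username."""
    return {
        key: value.username if isinstance(value, FakeUser) else value
        for key, value in doc.items()
    }


doc = {"title": "Pip", "author": FakeUser("docs"), "repo_type": "git"}

try:
    json.dumps(doc)  # the raw object is not JSON-encodable
    raw_ok = True
except TypeError:
    raw_ok = False

# After the conversion the document contains only plain JSON types.
clean = to_serializable(doc)
payload = json.dumps(clean, sort_keys=True)
```

In a Haystack SearchIndex, the analogous hook would probably be a `prepare_author` method that returns a string instead of the model instance.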

Q4: ./manage.py reindex_elasticsearch raises an error:

File "/var/www/rtd/readthedocs/core/management/commands/reindex_elasticsearch.py", line 48, in handle
    project_scale=0, page_scale=0, section=False, delete=False)
  File "/var/www/rtd/readthedocs/restapi/utils.py", line 144, in index_search_request
    page_obj.bulk_index(index_list, parent=project.slug)
  File "/var/www/rtd/readthedocs/search/indexes.py", line 140, in bulk_index
    bulk_index(self.es, docs, chunk_size=chunk_size)
  File "/var/www/rtd/venv/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 182, in bulk
    for ok, item in streaming_bulk(client, actions, **kwargs):
  File "/var/www/rtd/venv/lib/python2.7/site-packages/elasticsearch/helpers/__init__.py", line 124, in streaming_bulk
    raise e
RequestError: TransportError(400, u"ElasticsearchIllegalArgumentException[Can't specify parent if no parent field has been configured]")
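
The message in Q4 is what ElasticSearch 1.x reports when a document is indexed with a parent while the mapping of its type declares no `_parent` field, for example because the index was auto-created instead of being created with explicit mappings. Below is a rough sketch of what such a mapping looks like and how to check for it. The type names `page` and `project` are assumptions inferred from the traceback, and both helper functions are hypothetical.

```python
def child_mapping(child_type, parent_type):
    """Build an ElasticSearch 1.x-style mapping that declares a parent type."""
    return {child_type: {"_parent": {"type": parent_type}}}


def has_parent_configured(mapping, child_type):
    """Check whether the mapping for child_type declares a _parent field."""
    return "_parent" in mapping.get(child_type, {})


# Type names are assumptions inferred from the traceback.
page_mapping = child_mapping("page", "project")

# A mapping as dynamic index creation might produce it: no _parent entry.
auto_created = {"page": {"properties": {"title": {"type": "string"}}}}
```

A mapping like `page_mapping` generally has to be in place before the first document of that type is indexed, so the index would likely need to be recreated with explicit mappings before running the bulk reindex.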

Q5: it seems that you use an outdated django-haystack==2.1.0. I use Django 1.8, and when I run ./manage.py rebuild_index it gives an error:

...
File "/var/www/rtd/venv/lib/python2.7/site-packages/haystack/indexes.py", line 204, in full_prepare
    self.prepared_data = self.prepare(obj)
File "/var/www/rtd/venv/lib/python2.7/site-packages/haystack/indexes.py", line 187, in prepare
    ID: get_identifier(obj),
File "/var/www/rtd/venv/lib/python2.7/site-packages/haystack/utils/__init__.py", line 33, in default_get_identifier
    obj_or_string._meta.module_name,
AttributeError: 'Options' object has no attribute 'module_name'

I've updated it to django-haystack==2.4.0 and now I can rebuild the index.
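
The AttributeError in Q5 comes from Django 1.8 removing `_meta.module_name` in favor of `_meta.model_name`. Upgrading Haystack, as described above, is the fix used in this thread. Purely as an illustration of the incompatibility, here is a sketch of a lookup that works with both attribute names. `OldOptions`, `NewOptions`, and `get_model_name` are hypothetical stand-ins, not Django or Haystack code.

```python
class OldOptions:
    """Hypothetical stand-in for Model._meta before Django 1.8."""

    module_name = "project"


class NewOptions:
    """Hypothetical stand-in for Model._meta on Django 1.8 and later."""

    model_name = "project"


def get_model_name(meta):
    """Prefer model_name and fall back to the removed module_name."""
    name = getattr(meta, "model_name", None)
    if name is None:
        name = getattr(meta, "module_name")
    return name
```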

@frodopwns

I am pretty sure the main search engine being used is ES.

@kottenator
Author

As I understand it, ES is used for searching in files, and it's used directly, without Haystack. But how is Haystack used then?

And I still can't build ES index...

@frodopwns

I used this: https://github.com/moul/docker-readthedocs

Then spun up an ES container: https://hub.docker.com/_/elasticsearch/

I was able to search project titles and within the docs, but I wasn't able to search the docs from the project's overview page.

@frodopwns

That image no longer builds so I checked out a copy of the source from about when it did build and it still doesn't build! For some reason the manage.py isn't where he expected it to be.

@ranman

ranman commented Dec 4, 2015

What's the status of this ticket? When I run python manage.py update_index, I am not able to serialize the users-guide project because of references to users:
elasticsearch.exceptions.SerializationError: ({u'django_id': u'1', 'description': u'', 'title': u'User Guide', 'text': u'User Guide\n\n', 'author': <User: something>, 'repo_type': u'git', u'django_ct': u'projects.project', 'absolute_url': u'/projects/user-guide/', u'id': u'projects.project.1'}, TypeError("Unable to serialize <User: something> (type: <class 'django.contrib.auth.models.User'>)",))

@berkerpeksag
Member

But how is Haystack used then?

According to https://github.com/rtfd/readthedocs.org/blob/master/readthedocs/projects/search_indexes.py#L6 Haystack has been deprecated.

@stsewd
Member

stsewd commented Apr 16, 2018

Does #3881 solve the issue? If so, we can close this one in favor of #3803.

@berkerpeksag
Member

django-haystack (there is also celery-haystack, whose use case I don't know) is still listed in requirements/pip.txt, so I think its usage should still be clarified in the installation guide. If it can be clarified in PR #3881, then we can close this one.

@stsewd
Member

stsewd commented Apr 16, 2018

As far as I can see, Haystack is still used, but only in API v1.

@stsewd
Member

stsewd commented Jun 14, 2018

Haystack was removed in #4039, and there is a guide for setting up ES in #3881.

@stsewd closed this as completed on Jun 14, 2018