Add section linking for the search result #5829

Merged on Jul 12, 2019 (70 commits). The file diff further down shows the changes from 7 of these commits.

Commits (all by dojutsu-user):
ee1ba1a  Jun 19, 2019  add sections field
4b05f8a  Jun 21, 2019  index each section as separate document in ES
79d2459  Jun 21, 2019  Merge branch 'master' into search-section-linking
54ceb5c  Jun 21, 2019  few refactoring
b11e357  Jun 24, 2019  revert all
5b81471  Jun 24, 2019  Merge branch 'master' into search-section-linking
762a79d  Jun 24, 2019  update document mapping (nested fields)
7a61dbd  Jun 24, 2019  format text
644565b  Jun 25, 2019  get results from inner_hits
fa51a1c  Jun 25, 2019  Merge branch 'master' into search-section-linking
0bc6be5  Jun 25, 2019  correct the none
0139993  Jun 26, 2019  add field for PageDocument.
53a02e8  Jun 26, 2019  remove domain_index settings
11ba9e7  Jun 26, 2019  Merge branch 'htmlfile-sphinx-domain-integration' into search-section…
6207f4e  Jun 26, 2019  remove SphinxDomainDocument and DomainSearch
a251a98  Jun 26, 2019  generate correct query
b6847b9  Jun 26, 2019  remove boosting and allsearch
af2d69f  Jun 26, 2019  remove allsearch import
32d0bed  Jun 26, 2019  recursively remove newline characters from highlight dict
d472f29  Jun 27, 2019  Merge branch 'master' into search-section-linking
878343d  Jun 27, 2019  lint fix
f98d91c  Jun 28, 2019  Merge branch 'master' into search-section-linking
7c1c641  Jun 28, 2019  set number_of_fragments to 1
fd8e8f7  Jun 28, 2019  use nested facet
f6221ec  Jul 2, 2019   get sorted results
8840606  Jul 2, 2019   Merge branch 'master' into search-section-linking
60e229c  Jul 2, 2019   fix search.html
3835e2e  Jul 2, 2019   remove unused imports and add logging
ae5033c  Jul 3, 2019   show more data on domain objects
28e7cbf  Jul 3, 2019   fix main site search
1e2a40b  Jul 3, 2019   mark as safe and change log to debug
7b7a3c9  Jul 3, 2019   add transpiled files -- js
3931bc0  Jul 3, 2019   remove log
84a2494  Jul 3, 2019   small improvements in template
5cae508  Jul 3, 2019   change variable name
adb74ed  Jul 4, 2019   fix template
d500d98  Jul 4, 2019   fix lint
9461d4f  Jul 4, 2019   use python datatypes
75dcc2f  Jul 4, 2019   remove highlight url param from sections and domains
ea36138  Jul 4, 2019   fix clashing css classes
451c0f4  Jul 8, 2019   Merge branch 'master' into search-section-linking
0817d43  Jul 8, 2019   use underscore.js template
5305458  Jul 8, 2019   add _ with variables
68cb7af  Jul 8, 2019   add comment in template
d62bf3e  Jul 8, 2019   use .iterator()
ed16e56  Jul 9, 2019   show multiple results per section, if present
0ed64f7  Jul 9, 2019   fix sphinx indexing
f988302  Jul 9, 2019   don't index '-' value of domain.display_name
429b3e9  Jul 9, 2019   fix eslint
897e09f  Jul 9, 2019   Merge branch 'master' into search-section-linking
aeaba6f  Jul 10, 2019  reduce complexity in search.js
6f9b2bc  Jul 10, 2019  refactor tasks.py file
6135cde  Jul 10, 2019  fix logic in search.views
d3566ac  Jul 10, 2019  make 100 a constant
992c72e  Jul 10, 2019  Add checkbox for searching in current section
f0babf1  Jul 10, 2019  remove checkbox code for now
4527839  Jul 10, 2019  Merge branch 'master' into search-section-linking
7e75d7e  Jul 10, 2019  fix test_imported_file
1e6721d  Jul 10, 2019  fix test_search_json_parsing
2a4c070  Jul 10, 2019  fix test_search_json_parsing
4beec39  Jul 10, 2019  update test_search_json_parsing
01346a0  Jul 11, 2019  Merge branch 'master' into search-section-linking
91282de  Jul 11, 2019  refactor parse_json and its test
cfe8f5b  Jul 11, 2019  write initial tests
7e99f6a  Jul 11, 2019  make 100 as constant
b7ce777  Jul 12, 2019  fix lint
6701a4e  Jul 12, 2019  add test for domains and filter by version and project
cee24ed  Jul 12, 2019  revert changes to python_environments.py
685f6db  Jul 12, 2019  remove tests from this pr
d7edeee  Jul 12, 2019  update template to make 100 as constant
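
Commit 32d0bed describes recursively removing newline characters from the highlight dict. That code is not shown on this page, so what follows is only a minimal sketch of the idea, assuming the highlight is a dict mapping field names to lists of fragment strings (the shape Elasticsearch returns for `highlight`). The name `remove_newlines` is hypothetical, and replacing each newline with a space rather than deleting it is a choice made here so that words on either side of a line break do not run together.

```python
def remove_newlines(obj):
    """Return a copy of ``obj`` with newlines replaced in every string.

    Walks dicts and lists recursively; any other value is returned unchanged.
    """
    if isinstance(obj, dict):
        return {key: remove_newlines(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [remove_newlines(item) for item in obj]
    if isinstance(obj, str):
        # A space keeps "line\nbreak" from collapsing into "linebreak".
        return obj.replace('\n', ' ')
    return obj


highlight = {
    'sections.content': ['first line\nof a <em>match</em>'],
    'sections.title': ['Install\n'],
}
print(remove_newlines(highlight))
```

Building a new structure instead of mutating in place keeps the raw Elasticsearch response available for debugging.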
readthedocs/projects/models.py: 2 changes (0 additions, 2 deletions)
@@ -1252,8 +1252,6 @@ def get_processed_json(self):
                 file_path,
             )
         return {
-            'headers': [],
-            'content': '',
             'path': file_path,
             'title': '',
             'sections': [],
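
After this change the fallback dict carries only `path`, `title` and `sections`. One way to pin that shape down is a `TypedDict`; the sketch below is hypothetical (the names `Section`, `ProcessedJSON` and `empty_processed_json` do not appear in the PR), assumes `file_path` is a string, and needs Python 3.8+ for `typing.TypedDict`.

```python
from typing import List, TypedDict


class Section(TypedDict):
    # Mirrors the nested 'sections' properties: id, title, content.
    id: str
    title: str
    content: str


class ProcessedJSON(TypedDict):
    # processed_json once the 'headers' and 'content' keys are gone.
    path: str
    title: str
    sections: List[Section]


def empty_processed_json(file_path: str) -> ProcessedJSON:
    # Same keys as the fallback returned by get_processed_json() above.
    return {'path': file_path, 'title': '', 'sections': []}


page = empty_processed_json('guide/index.fjson')
print(sorted(page))
```

A type checker would then flag any remaining code that still reads `headers` or `content` from the processed JSON.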
readthedocs/search/documents.py: 10 changes (8 additions, 2 deletions)
@@ -120,8 +120,14 @@ class PageDocument(RTDDocTypeMixin, DocType):
 
     # Searchable content
     title = fields.TextField(attr='processed_json.title')
-    headers = fields.TextField(attr='processed_json.headers')
-    content = fields.TextField(attr='processed_json.content')
+    sections = fields.NestedField(
+        attr='processed_json.sections',
+        properties={
+            'id': fields.KeywordField(),
+            'title': fields.TextField(),
+            'content': fields.TextField(),
+        }
+    )
 
     modified_model_field = 'modified_date'
 
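
With `sections` mapped as a `NestedField`, Elasticsearch stores each section as its own hidden nested document, so a `nested` query with `inner_hits` can report which sections of a page matched, not just the page. The commit list mentions getting results from `inner_hits` and setting `number_of_fragments` to 1, but the PR's actual query is not shown here. The following is only a sketch that builds a raw Elasticsearch query body as a plain dict; the function name and the `project`/`version` term filters are assumptions (the removed `sort.field` setting suggests both fields exist on the index).

```python
import json


def build_section_query(text, project=None, version=None):
    """Build a raw Elasticsearch query body that searches page sections.

    Assumes ``sections`` is a nested field with ``title`` and ``content``.
    """
    nested = {
        'nested': {
            'path': 'sections',
            'query': {
                'multi_match': {
                    'query': text,
                    'fields': ['sections.title', 'sections.content'],
                },
            },
            # inner_hits returns the matching sections themselves, so each
            # result can carry the section id needed for a link.
            'inner_hits': {
                'highlight': {
                    'fields': {
                        'sections.title': {'number_of_fragments': 1},
                        'sections.content': {'number_of_fragments': 1},
                    },
                },
            },
        },
    }
    bool_query = {'must': [nested]}
    filters = []
    if project:
        filters.append({'term': {'project': project}})
    if version:
        filters.append({'term': {'version': version}})
    if filters:
        # Filters narrow the page set without affecting relevance scores.
        bool_query['filter'] = filters
    return {'query': {'bool': bool_query}}


print(json.dumps(build_section_query('install', project='docs'), indent=2))
```

The matched sections then appear under each hit's `inner_hits` entry, alongside their highlighted fragments.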
readthedocs/search/parse_json.py: 34 changes (2 additions, 32 deletions)
@@ -10,17 +10,6 @@
 log = logging.getLogger(__name__)
 
 
-def process_headers(data, filename):
-    """Read headers from toc data."""
-    headers = []
-    if data.get('toc', False):
-        for element in PyQuery(data['toc'])('a'):
-            headers.append(recurse_while_none(element))
-        if None in headers:
-            log.info('Unable to index file headers for: %s', filename)
-    return headers
-
-
 def generate_sections_from_pyquery(body):
     """Given a pyquery object, generate section dicts for each section."""
     # Capture text inside h1 before the first h2
@@ -35,7 +24,7 @@ def generate_sections_from_pyquery(body):
             if next_p[0].tag == 'div' and 'class' in next_p[0].attrib:
                 if 'section' in next_p[0].attrib['class']:
                     break
-            h1_content += '\n%s\n' % next_p.html()
+            h1_content += '\n%s\n' % next_p.text()
             next_p = next_p.next()
         if h1_content:
             yield {
@@ -51,7 +40,7 @@ def generate_sections_from_pyquery(body):
         header = section_list.eq(num)
         title = header.text().replace('¶', '').strip()
         section_id = div.attr('id')
-        content = div.html()
+        content = div.text()
         yield {
             'id': section_id,
             'title': title,
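
The `.html()` to `.text()` changes in `generate_sections_from_pyquery` mean section content is indexed as readable text rather than raw markup, so tag names and attribute values no longer take part in matching or show up in highlighted snippets. The PR calls PyQuery's `.text()`; the sketch below is a dependency-free stand-in built on the standard library's `html.parser`, and it does not reproduce PyQuery's exact whitespace rules. The `BLOCK_TAGS` set and the `html_to_text` name are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Block-level tags treated as word boundaries (an assumed subset).
BLOCK_TAGS = {'p', 'div', 'li', 'pre', 'br', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6'}


class _TextExtractor(HTMLParser):
    """Collect text nodes and drop every tag and attribute."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(' ')

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append(' ')

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Drop Sphinx permalink markers, as the title handling above does,
    # then collapse runs of whitespace into single spaces.
    text = ''.join(parser.parts).replace('¶', '')
    return ' '.join(text.split())


section_html = (
    '<h2>Install<a class="headerlink" href="#install">¶</a></h2>'
    '<p>Run <code>pip install</code>.</p>'
)
print(html_to_text(section_html))  # Install Run pip install.
```

With `.html()`, the same fragment would have indexed tokens such as `headerlink` and `href` alongside the prose.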
@@ -71,7 +60,6 @@ def process_file(fjson_filename):
     sections = []
     path = ''
     title = ''
-    body_content = ''
 
     if 'current_page_name' in data:
         path = data['current_page_name']
@@ -80,7 +68,6 @@ def process_file(fjson_filename):
 
     if data.get('body'):
         body = PyQuery(data['body'])
-        body_content = body.text().replace('¶', '')
         sections.extend(generate_sections_from_pyquery(body))
     else:
         log.info('Unable to index content for: %s', fjson_filename)
@@ -93,24 +80,7 @@ def process_file(fjson_filename):
         log.info('Unable to index title for: %s', fjson_filename)
 
     return {
-        'headers': process_headers(data, fjson_filename),
-        'content': body_content,
         'path': path,
         'title': title,
         'sections': sections,
     }
-
-
-def recurse_while_none(element):
-    """
-    Traverse the ``element`` until a non-None text is found.
-
-    :param element: element to traverse until get a non-None text.
-    :type element: pyquery.PyQuery
-
-    :returns: the first non-None value found
-    :rtype: str
-    """
-    if element.text is None:
-        return recurse_while_none(element.getchildren()[0])
-    return element.text
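
With `headers` and `content` gone, a page's searchable text lives in `sections`, and each section carries the `id` attribute read from its section `div`. That `id` is what makes section linking possible: appending it as a URL fragment takes the reader straight to the matching section. A minimal sketch, assuming `page_url` is already the resolved URL of the page (how Read the Docs resolves it is not shown here); `section_links` is a hypothetical helper, not part of the PR.

```python
from urllib.parse import quote


def section_links(page_url, sections):
    """Return (title, url) pairs, anchoring each URL at its section id."""
    links = []
    for section in sections:
        section_id = section.get('id')
        if section_id:
            # The section element's id attribute is a valid fragment target.
            links.append((section['title'], f'{page_url}#{quote(section_id)}'))
        else:
            # No id to anchor to: fall back to the page itself.
            links.append((section['title'], page_url))
    return links


sections = [
    {'id': 'installation', 'title': 'Installation', 'content': '...'},
    {'id': None, 'title': 'Untitled', 'content': '...'},
]
for title, url in section_links('https://docs.example.com/en/latest/guide.html', sections):
    print(title, url)
```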
readthedocs/settings/base.py: 3 changes (0 additions, 3 deletions)
@@ -446,9 +446,6 @@ def USE_PROMOS(self): # noqa
             'settings': {
                 'number_of_shards': 2,
                 'number_of_replicas': 0,
-                "index": {
-                    "sort.field": ["project", "version"]
-                }
             }
         },
     }
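
The settings hunk removes index sorting on `project` and `version`. The page does not say why; one plausible reason is that Elasticsearch 6.x rejects index sorting on an index whose mapping contains nested fields, and `sections` is now nested. If that is the cause, a pre-flight check like the hypothetical `check_index_sort` below would surface the conflict before index creation. The settings and mapping dicts here are simplified stand-ins for the real ones.

```python
def check_index_sort(index_settings, mapping_properties):
    """Raise ValueError when index sorting is combined with nested fields.

    ``index_settings`` mirrors the 'settings' dict in base.py and
    ``mapping_properties`` the 'properties' of the index mapping.
    """
    sort_fields = index_settings.get('index', {}).get('sort.field')
    nested_fields = sorted(
        name for name, spec in mapping_properties.items()
        if spec.get('type') == 'nested'
    )
    if sort_fields and nested_fields:
        raise ValueError(
            f'index sorting on {sort_fields} conflicts with nested fields {nested_fields}'
        )


properties = {
    'project': {'type': 'keyword'},
    'version': {'type': 'keyword'},
    'sections': {'type': 'nested'},
}
before = {'number_of_shards': 2, 'number_of_replicas': 0,
          'index': {'sort.field': ['project', 'version']}}
after = {'number_of_shards': 2, 'number_of_replicas': 0}

check_index_sort(after, properties)  # no error: sorting was removed
try:
    check_index_sort(before, properties)
except ValueError as exc:
    print(exc)
```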