
Add section linking for the search result #5829


Merged on Jul 12, 2019 (70 commits)
Changes from 6 commits
Commits
ee1ba1a
add sections field
dojutsu-user Jun 19, 2019
4b05f8a
index each section as separate document in ES
dojutsu-user Jun 21, 2019
79d2459
Merge branch 'master' into search-section-linking
dojutsu-user Jun 21, 2019
54ceb5c
few refactoring
dojutsu-user Jun 21, 2019
b11e357
revert all
dojutsu-user Jun 24, 2019
5b81471
Merge branch 'master' into search-section-linking
dojutsu-user Jun 24, 2019
762a79d
update document mapping (nested fields)
dojutsu-user Jun 24, 2019
7a61dbd
format text
dojutsu-user Jun 24, 2019
644565b
get results from inner_hits
dojutsu-user Jun 25, 2019
fa51a1c
Merge branch 'master' into search-section-linking
dojutsu-user Jun 25, 2019
0bc6be5
correct the none
dojutsu-user Jun 25, 2019
0139993
add field for PageDocument.
dojutsu-user Jun 26, 2019
53a02e8
remove domain_index settings
dojutsu-user Jun 26, 2019
11ba9e7
Merge branch 'htmlfile-sphinx-domain-integration' into search-section…
dojutsu-user Jun 26, 2019
6207f4e
remove SphinxDomainDocument and DomainSearch
dojutsu-user Jun 26, 2019
a251a98
generate correct query
dojutsu-user Jun 26, 2019
b6847b9
remove boosting and allsearch
dojutsu-user Jun 26, 2019
af2d69f
remove allsearch import
dojutsu-user Jun 26, 2019
32d0bed
recursively remove newline characters from highlight dict
dojutsu-user Jun 26, 2019
d472f29
Merge branch 'master' into search-section-linking
dojutsu-user Jun 27, 2019
878343d
lint fix
dojutsu-user Jun 27, 2019
f98d91c
Merge branch 'master' into search-section-linking
dojutsu-user Jun 28, 2019
7c1c641
set number_of_fragments to 1
dojutsu-user Jun 28, 2019
fd8e8f7
use nested facet
dojutsu-user Jun 28, 2019
f6221ec
get sorted results
dojutsu-user Jul 2, 2019
8840606
Merge branch 'master' into search-section-linking
dojutsu-user Jul 2, 2019
60e229c
fix search.html
dojutsu-user Jul 2, 2019
3835e2e
remove unused imports and add logging
dojutsu-user Jul 2, 2019
ae5033c
show more data on domain objects
dojutsu-user Jul 3, 2019
28e7cbf
fix main site search
dojutsu-user Jul 3, 2019
1e2a40b
mark as safe and change log to debug
dojutsu-user Jul 3, 2019
7b7a3c9
add transpiled files -- js
dojutsu-user Jul 3, 2019
3931bc0
remove log
dojutsu-user Jul 3, 2019
84a2494
small improvements in template
dojutsu-user Jul 3, 2019
5cae508
change variable name
dojutsu-user Jul 3, 2019
adb74ed
fix template
dojutsu-user Jul 4, 2019
d500d98
fix lint
dojutsu-user Jul 4, 2019
9461d4f
use python datatypes
dojutsu-user Jul 4, 2019
75dcc2f
remove highlight url param from sections and domains
dojutsu-user Jul 4, 2019
ea36138
fix clashing css classes
dojutsu-user Jul 4, 2019
451c0f4
Merge branch 'master' into search-section-linking
dojutsu-user Jul 8, 2019
0817d43
use underscore.js template
dojutsu-user Jul 8, 2019
5305458
add _ with variables
dojutsu-user Jul 8, 2019
68cb7af
add comment in template
dojutsu-user Jul 8, 2019
d62bf3e
use .iterator()
dojutsu-user Jul 8, 2019
ed16e56
show multiple results per section, if present
dojutsu-user Jul 9, 2019
0ed64f7
fix sphinx indexing
dojutsu-user Jul 9, 2019
f988302
don't index '-' value of domain.display_name
dojutsu-user Jul 9, 2019
429b3e9
fix eslint
dojutsu-user Jul 9, 2019
897e09f
Merge branch 'master' into search-section-linking
dojutsu-user Jul 9, 2019
aeaba6f
reduce complexity in search.js
dojutsu-user Jul 10, 2019
6f9b2bc
refactor tasks.py file
dojutsu-user Jul 10, 2019
6135cde
fix logic in search.views
dojutsu-user Jul 10, 2019
d3566ac
make 100 a constant
dojutsu-user Jul 10, 2019
992c72e
Add checkbox for searching in current section
dojutsu-user Jul 10, 2019
f0babf1
remove checkbox code for now
dojutsu-user Jul 10, 2019
4527839
Merge branch 'master' into search-section-linking
dojutsu-user Jul 10, 2019
7e75d7e
fix test_imported_file
dojutsu-user Jul 10, 2019
1e6721d
fix test_search_json_parsing
dojutsu-user Jul 10, 2019
2a4c070
fix test_search_json_parsing
dojutsu-user Jul 10, 2019
4beec39
update test_search_json_parsing
dojutsu-user Jul 10, 2019
01346a0
Merge branch 'master' into search-section-linking
dojutsu-user Jul 11, 2019
91282de
refactor parse_json and its test
dojutsu-user Jul 11, 2019
cfe8f5b
write initial tests
dojutsu-user Jul 11, 2019
7e99f6a
make 100 as constant
dojutsu-user Jul 11, 2019
b7ce777
fix lint
dojutsu-user Jul 12, 2019
6701a4e
add test for domains and filter by version and project
dojutsu-user Jul 12, 2019
cee24ed
revert changes to python_environments.py
dojutsu-user Jul 12, 2019
685f6db
remove tests from this pr
dojutsu-user Jul 12, 2019
d7edeee
update template to make 100 as constant
dojutsu-user Jul 12, 2019
3 changes: 3 additions & 0 deletions readthedocs/rtd_tests/tests/test_search_json_parsing.py
@@ -23,3 +23,6 @@ def test_h2_parsing(self):
'You can use Slumber'
))
self.assertEqual(data['title'], 'Read the Docs Public API')

for section in data['sections']:
self.assertFalse('\n' in section['content'])
Member

This could probably use a comment. Likely it should also test for a length before doing this, otherwise this check could be running on 0 sections.
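The reviewer's suggestion could look roughly like the sketch below: assert that parsing produced at least one section before looping, so the newline check cannot pass vacuously on an empty list. `assert_sections_have_no_newlines` is a hypothetical helper, and the `data` dicts are stand-ins for the output of `process_file()`, not the real test fixtures.

```python
import unittest


def assert_sections_have_no_newlines(testcase, data):
    """Hypothetical helper: every parsed section is newline-free, and there is at least one."""
    sections = data['sections']
    # Guard against a vacuous pass when parsing yields zero sections.
    testcase.assertGreater(len(sections), 0, 'expected at least one section')
    for section in sections:
        # Section content is expected to have its newlines replaced by '. '.
        testcase.assertNotIn('\n', section['content'])


class SectionContentTest(unittest.TestCase):
    def test_sections_have_no_newlines(self):
        # Stand-in data; a real test would use the output of process_file().
        data = {'sections': [{'content': 'First line. Second line'}]}
        assert_sections_have_no_newlines(self, data)

    def test_empty_sections_are_rejected(self):
        with self.assertRaises(AssertionError):
            assert_sections_have_no_newlines(self, {'sections': []})
```

Putting the length check in the helper means every caller gets the guard, rather than each test having to remember it.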

4 changes: 2 additions & 2 deletions readthedocs/search/api.py
@@ -48,14 +48,14 @@ def get_inner_hits(self, obj):
domains = inner_hits.domains or []
all_results = itertools.chain(sections, domains)

sorted_results = (
sorted_results = [
Member Author

dojutsu-user Jul 11, 2019

@stsewd
If I use a generator expression here, test_search_works_with_title_query and test_search_works_with_sections_query fail. I can't find the reason, so for now I have changed it to a list comprehension.

Member

So, I wasn't able to run the test because there is an import error. My guess is that by the time the generator gets evaluated, the inner_hits object has changed. You can confirm this by making a copy of inner_hits.sections and inner_hits.domains before assigning them.

Also, I'd just leave the list comprehension, since we don't know when Django REST Framework evaluates the generator.

Member Author

I will add comments there to avoid any confusion in the future.

{
'type': hit._nested.field,
'_source': hit._source.to_dict(),
'highlight': self._get_inner_hits_highlights(hit),
}
for hit in sorted(all_results, key=utils._get_hit_score, reverse=True)
)
]

return sorted_results
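The reviewer's hypothesis can be illustrated with plain Python. In a generator expression only the outermost iterable (in the PR, the `sorted(...)` call) is evaluated immediately; the body that builds each result dict runs later, when the consumer iterates, and the generator can be iterated only once. The sketch uses plain dicts as hypothetical stand-ins for the Elasticsearch hit objects and a hypothetical `build_results` helper; it does not reproduce the Read the Docs serializer.

```python
import itertools


def build_results(sections, domains, lazy):
    """Merge section and domain hits, highest score first (hypothetical helper)."""
    all_results = itertools.chain(sections, domains)
    # sorted() runs immediately in both branches, consuming the chain.
    ordered = sorted(all_results, key=lambda hit: hit['score'], reverse=True)
    if lazy:
        # Generator: each result dict is built only when the consumer iterates.
        return ({'type': hit['type'], 'title': hit['title']} for hit in ordered)
    # List comprehension: every result dict is built right now.
    return [{'type': hit['type'], 'title': hit['title']} for hit in ordered]


sections = [{'type': 'sections', 'title': 'Usage Questions', 'score': 2.0}]
domains = [{'type': 'domains', 'title': '/api/v3/users/', 'score': 5.0}]

list_results = build_results(sections, domains, lazy=False)
gen_results = build_results(sections, domains, lazy=True)

# Mutate a hit after both results were returned, but before the generator is consumed.
sections[0]['title'] = 'changed'

first_pass = list(gen_results)   # built now, so it sees 'changed'
second_pass = list(gen_results)  # empty, because a generator can only be iterated once
```

Both effects are consistent with the reviewer's advice: a list materializes the results at the moment `get_inner_hits` returns, regardless of when, or how many times, the results are iterated afterwards.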

6 changes: 3 additions & 3 deletions readthedocs/search/faceted_search.py
@@ -101,11 +101,11 @@ class PageSearchBase(RTDFacetedSearch):
doc_types = [PageDocument]
index = PageDocument._doc_type.index

_outer_fields = ['title']
_section_fields = ['sections.title', 'sections.content']
_outer_fields = ['title^4']
_section_fields = ['sections.title^3', 'sections.content']
Member Author

Added the boosts.
They are working fine.

_domain_fields = [
'domains.type_display',
'domains.name',
'domains.name^2',
'domains.display_name',
Member Author

Not sure which fields are to be included.

]
fields = _outer_fields
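For context, the `^N` suffix is Elasticsearch's per-field boost syntax in `multi_match` queries, and nested fields such as `sections.title` have to be queried through a `nested` query on their path, whose `inner_hits` return the matching sections themselves. The sketch builds that kind of query as a plain dict. `build_page_query` and `nested_clause` are hypothetical, the domain field list assumes `domains.name^2` is the post-change entry, and the structure the PR actually generates through `elasticsearch-dsl` may differ.

```python
import json

OUTER_FIELDS = ['title^4']
SECTION_FIELDS = ['sections.title^3', 'sections.content']
DOMAIN_FIELDS = ['domains.type_display', 'domains.name^2', 'domains.display_name']


def nested_clause(path, query, fields):
    """One nested query per nested path; inner_hits returns the matching child objects."""
    return {
        'nested': {
            'path': path,
            'query': {'multi_match': {'query': query, 'fields': fields}},
            'inner_hits': {'name': path},
        }
    }


def build_page_query(query):
    """Hypothetical: match the page title, or any of its sections or domains."""
    return {
        'query': {
            'bool': {
                'should': [
                    {'multi_match': {'query': query, 'fields': OUTER_FIELDS}},
                    nested_clause('sections', query, SECTION_FIELDS),
                    nested_clause('domains', query, DOMAIN_FIELDS),
                ],
            }
        }
    }


print(json.dumps(build_page_query('wipe build'), indent=2))
```

With these boosts, a match in a page title counts more than one in a section title, which in turn counts more than one in section content.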
32 changes: 24 additions & 8 deletions readthedocs/search/parse_json.py
@@ -25,16 +25,13 @@ def generate_sections_from_pyquery(body):
if 'section' in next_p[0].attrib['class']:
break

h1_content += '\n%s\n' % next_p.text().replace('¶', '').strip()
h1_content = h1_content.split('\n')[1:] # to remove the redundant text
h1_content = '\n'.join(h1_content)

h1_content += parse_content(next_p.text())
next_p = next_p.next()
if h1_content:
yield {
'id': h1_id,
'title': h1_title,
'content': h1_content,
'content': h1_content.replace('\n', '. '),
}

# Capture text inside h2's
@@ -45,9 +42,8 @@ def generate_sections_from_pyquery(body):
title = header.text().replace('¶', '').strip()
section_id = div.attr('id')

content = div.text().replace('¶', '').strip()
content = content.split('\n')[1:] # to remove the redundant text
content = '\n'.join(content)
content = div.text()
content = parse_content(content)

yield {
'id': section_id,
@@ -92,3 +88,23 @@ def process_file(fjson_filename):
'title': title,
'sections': sections,
}


def parse_content(content):
    """
    Removes the starting text and ¶.

    It removes the starting text from the content
    because it contains the title of that content,
    which is redundant here.
    """
    content = content.replace('¶', '').strip()

    # remove the first line, which repeats the section title
    content = content.split('\n')
    if len(content) > 1:  # there were \n
        content = content[1:]

    # converting newlines to ". "
    content = '. '.join([text.strip() for text in content])
    return content
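A condensed, runnable restatement of `parse_content` with a few sample inputs makes its behavior concrete: the first line (the heading text) is dropped only when there is more than one line, and the remaining lines are joined with `". "`. The sample strings are made up for illustration, and the last lines mimic the h1 branch, which appends each paragraph with `+=`.

```python
def parse_content(content):
    """Same logic as parse_content in the diff, condensed."""
    lines = content.replace('¶', '').strip().split('\n')
    if len(lines) > 1:
        # The first line repeats the section title, so drop it.
        lines = lines[1:]
    return '. '.join(line.strip() for line in lines)


section_text = 'Commercial Support¶\nWe offer commercial support.\nContact us to learn more.'
section_result = parse_content(section_text)
single_line_result = parse_content('Only one line¶')

# The h1 branch concatenates parsed paragraphs with no separator.
h1_content = ''
for paragraph in ['A package.', 'In any case.']:
    h1_content += parse_content(paragraph)
```

Here `section_result` is `'We offer commercial support.. Contact us to learn more.'` and `h1_content` is `'A package.In any case.'`. These appear to account for the doubled periods (`easily.. Good`) in `support.json` and the missing space (`package.In`) in `wiping.json`.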
35 changes: 32 additions & 3 deletions readthedocs/search/tests/conftest.py
@@ -8,6 +8,8 @@

from readthedocs.projects.models import Project, HTMLFile
from readthedocs.search.documents import PageDocument
from readthedocs.sphinx_domains.models import SphinxDomain

from .dummy_data import ALL_PROJECTS, PROJECT_DATA_FILES


@@ -32,6 +34,28 @@ def all_projects(es_index, mock_processed_json, db, settings):
file_name = file_basename + '.html'
version = project.versions.all()[0]
html_file = G(HTMLFile, project=project, version=version, name=file_name)

# creating sphinx domain test objects
file_path = get_json_file_path(project.slug, file_basename)
if os.path.exists(file_path):
    with open(file_path) as f:
        data = json.load(f)
        domains = data['domains']

        for domain_data in domains:
            domain_role_name = domain_data.pop('role_name')
            domain, type_ = domain_role_name.split(':')

            G(
                SphinxDomain,
                project=project,
                version=version,
                html_file=html_file,
                domain=domain,
                type=type_,
                **domain_data
            )

PageDocument().update(html_file)

projects_list.append(project)
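The fixture turns each entry of a JSON file's `domains` list into `SphinxDomain` keyword arguments by splitting `role_name` into a domain and a type. The sketch isolates that step as a hypothetical pure function, `domain_kwargs`. Unlike the fixture, it copies the dict instead of calling `pop()` on the caller's data, and it splits on the first colon only. Both are choices made for the sketch, not what the PR does.

```python
def domain_kwargs(domain_data):
    """Hypothetical helper: map one JSON domain entry to SphinxDomain keyword arguments."""
    data = dict(domain_data)  # shallow copy, so the caller's dict is left intact
    role_name = data.pop('role_name')
    # 'http:post' -> domain 'http', type 'post'
    domain, type_ = role_name.split(':', 1)
    return {'domain': domain, 'type': type_, **data}


entry = {
    'role_name': 'http:post',
    'doc_name': 'api/v3.html',
    'anchor': 'post--api-v3-projects-(string-project_slug)-versions-(string-version_slug)-builds-',
    'type_display': 'post',
    'doc_display': 'API v3',
    'name': '/api/v3/projects/(string:project_slug)/versions/(string:version_slug)/builds/',
    'display_name': '',
}
kwargs = domain_kwargs(entry)
```

The fixture would still supply `project`, `version` and `html_file` alongside these keyword arguments.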
@@ -46,12 +70,17 @@ def project(all_projects):
return all_projects[0]


def get_json_file_path(project_slug, basename):
    current_path = os.path.abspath(os.path.dirname(__file__))
    file_name = f'{basename}.json'
    file_path = os.path.join(current_path, 'data', project_slug, file_name)
    return file_path


def get_dummy_processed_json(instance):
project_slug = instance.project.slug
basename = os.path.splitext(instance.name)[0]
file_name = basename + '.json'
current_path = os.path.abspath(os.path.dirname(__file__))
file_path = os.path.join(current_path, "data", project_slug, file_name)
file_path = get_json_file_path(project_slug, basename)

if os.path.exists(file_path):
with open(file_path) as f:
31 changes: 0 additions & 31 deletions readthedocs/search/tests/data/docs/story.json

This file was deleted.

41 changes: 41 additions & 0 deletions readthedocs/search/tests/data/docs/support.json
@@ -0,0 +1,41 @@
{
"path": "support",
"title": "Support",
"sections": [
{
"id": "usage-questions",
"title": "Usage Questions",
"content": "If you have questions about how to use Read the Docs, or have an issue that isn’t related to a bug, Stack Overflow is the best place to ask. Tag questions with read-the-docs so other folks can find them easily.. Good questions for Stack Overflow would be:. “What is the best way to structure the table of contents across a project?”. “How do I structure translations inside of my project for easiest contribution from users?”. “How do I use Sphinx to use SVG images in HTML output but PNG in PDF output?”"
},
{
"id": "community-support",
"title": "Community Support",
"content": "Read the Docs is supported by community contributions and advertising. We hope to bring in enough money with our Gold and Ethical Ads programs to keep Read the Docs sustainable.. All people answering your questions are doing it with their own time, so please be kind and provide as much information as possible.. Bugs & Support Issues. You can file bug reports on our GitHub issue tracker, and they will be addressed as soon as possible. Support is a volunteer effort, and there is no guaranteed response time. If you need answers quickly, you can buy commercial support below.. Reporting Issues. When reporting a bug, please include as much information as possible that will help us solve this issue. This includes:. Project name. URL. Action taken. Expected result. Actual result. Specific Requests. If you need a specific request for your project or account, like more resources, change of the project’s slug or username. Send an email to [email protected]."
},
{
"id": "commercial-support",
"title": "Commercial Support",
"content": "We offer commercial support for Read the Docs, commercial hosting, as well as consulting around all documentation systems. You can contact us at [email protected] to learn more, or read more at https://readthedocs.com/services/#open-source-support."
}
],
"domains": [
{
"role_name": "http:post",
"doc_name": "api/v3.html",
"anchor": "post--api-v3-projects-(string-project_slug)-versions-(string-version_slug)-builds-",
"type_display": "post",
"doc_display": "API v3",
"name": "/api/v3/projects/(string:project_slug)/versions/(string:version_slug)/builds/",
"display_name": ""
},
{
"role_name": "http:patch",
"doc_name": "api/v3.html",
"anchor": "patch--api-v3-projects-(string-project_slug)-version-(string-version_slug)-",
"type_display": "patch",
"doc_display": "API v3",
"name": "/api/v3/projects/(string:project_slug)/version/(string:version_slug)/",
"display_name": ""
}
]
}
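These fixtures carry what a search result needs to link directly to a match. A page has a `path` and each section an `id`; each domain has a `doc_name` and an `anchor`. The sketch shows one way to combine them into URLs. It assumes that links are built by appending the fragment to the page URL and that pages are served as `<path>.html`; the PR's template code is not shown here. `BASE_URL`, `section_link` and `domain_link` are all hypothetical.

```python
BASE_URL = 'https://docs.example.com/en/latest/'  # hypothetical project URL


def section_link(page, section):
    """Hypothetical: link to a section, assuming pages are served as '<path>.html'."""
    return f"{BASE_URL}{page['path']}.html#{section['id']}"


def domain_link(domain):
    """Hypothetical: doc_name already ends in '.html' in these fixtures."""
    return f"{BASE_URL}{domain['doc_name']}#{domain['anchor']}"


page = {
    'path': 'support',
    'sections': [{'id': 'commercial-support', 'title': 'Commercial Support'}],
    'domains': [{
        'doc_name': 'api/v3.html',
        'anchor': 'patch--api-v3-projects-(string-project_slug)-version-(string-version_slug)-',
    }],
}

for section in page['sections']:
    print(section['title'], section_link(page, section))
for domain in page['domains']:
    print(domain_link(domain))
```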
60 changes: 47 additions & 13 deletions readthedocs/search/tests/data/docs/wiping.json
@@ -1,15 +1,49 @@
{
"content": "ReadtheDocsWiping a Build Environment\nSometimes it happen that your Builds start failing because the build environment where the is created is stale or broken. This could happen for a couple of different reasons like pip not upgrading a package properly or a corrupted cached Python package.\nIn any of these cases (and many others), the solution could be just wiping out the existing build environment files and allow Read the Docs to create a new fresh one.\nFollow these steps to wipe the build environment:\nGo to Versions\nClick on the Edit button of the version you want to wipe on the right side of the page\nGo to the bottom of the page and click the wipe link, next to the \u201cSave\u201d button\nNote\nBy wiping the build environment, all the rst, md, and code files associated with it will be removed but not the already built (HTML and PDF files). Your will still online after wiping the build environment.\nNow you can re-build the version with a fresh build environment!",
"headers": [
"Wiping a Build Environment"
],
"title": "Wiping a Build Environment",
"sections": [
{
"content": "\nSometimes it happen that your Builds start failing because the build\nenvironment where the is created is stale or\nbroken. This could happen for a couple of different reasons like <code class=\"xref py py-obj docutils literal notranslate\"><span class=\"pre\">pip</span></code>\nnot upgrading a package properly or a corrupted cached Python package.\n\nIn any of these cases (and many others), the solution could be just\nwiping out the existing build environment files and allow Read the\nDocs to create a new fresh one.\n\nFollow these steps to wipe the build environment:\n\n\n<li>Go to <strong>Versions</strong></li>\n<li>Click on the <strong>Edit</strong> button of the version you want to wipe on the\nright side of the page</li>\n<li>Go to the bottom of the page and click the <strong>wipe</strong> link, next to\nthe \u201cSave\u201d button</li>\n\n\n\n<p class=\"first admonition-title\">Note</p>\n<p class=\"last\">By wiping the build environment, all the <code class=\"xref py py-obj docutils literal notranslate\"><span class=\"pre\">rst</span></code>, <code class=\"xref py py-obj docutils literal notranslate\"><span class=\"pre\">md</span></code>,\nand code files associated with it will be removed but not the\n already built (<code class=\"xref py py-obj docutils literal notranslate\"><span class=\"pre\">HTML</span></code> and <code class=\"xref py py-obj docutils literal notranslate\"><span class=\"pre\">PDF</span></code> files). Your\n will still online after wiping the build environment.</p>\n\n\nNow you can re-build the version with a fresh build environment!\n",
"id": "wiping-a-build-environment",
"title": "Wiping a Build Environment"
}
],
"path": "guides/wipe-environment"
"path": "guides/wipe-environment",
"title": "Wiping a Build Environment",
"sections": [
{
"id": "wiping-a-build-environment",
"title": "Wiping a Build Environment",
"content": "Sometimes it happen that your Builds start failing because the build environment where the documentation is created is stale or broken. This could happen for a couple of different reasons like pip not upgrading a package properly or a corrupted cached Python package.In any of these cases (and many others), the solution could be just wiping out the existing build environment files and allow Read the Docs to create a new fresh one.Follow these steps to wipe the build environment:Click on the Edit button of the version you want to wipe on the right side of the page. Go to the bottom of the page and click the wipe link, next to the “Save” buttonBy wiping the documentation build environment, all the rst, md, and code files associated with it will be removed but not the documentation already built (HTML and PDF files). Your documentation will still online after wiping the build environment.Now you can re-build the version with a fresh build environment!"
}
],
"domains": [
{
"role_name": "http:get",
"doc_name": "api/v3.html",
"anchor": "get--api-v3-users-(str-username)",
"type_display": "get",
"doc_display": "API v3",
"name": "/api/v3/users/(str:username)",
"display_name": ""
},
{
"role_name": "http:get",
"doc_name": "api/v3.html",
"anchor": "get--api-v3-projects-(string-project_slug)-versions-(string-version_slug)-",
"type_display": "get",
"doc_display": "API v3",
"name": "/api/v3/projects/(string:project_slug)/versions/(string:version_slug)/",
"display_name": ""
},
{
"role_name": "http:get",
"doc_name": "api/v3.html",
"anchor": "get--api-v3-projects-(string-project_slug)-versions-",
"type_display": "get",
"doc_display": "API v3",
"name": "/api/v3/projects/(string:project_slug)/versions/",
"display_name": ""
},
{
"role_name": "http:get",
"doc_name": "api/v3.html",
"anchor": "get--api-v3-projects-(string-project_slug)-",
"type_display": "get",
"doc_display": "API v3",
"name": "/api/v3/projects/(string:project_slug)/",
"display_name": ""
}
]
}