-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Search: allow ignoring files from indexing #7308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -468,6 +468,8 @@ Settings for more control over :doc:`/server-side-search`. | |
ranking: | ||
api/v1/*: -1 | ||
api/v2/*: 4 | ||
ignore: | ||
- 404.html | ||
|
||
search.ranking | ||
`````````````` | ||
|
@@ -490,14 +492,17 @@ Pages with a rank closer to -10 will appear further down the list of results, | |
and pages with a rank closer to 10 will appear higher in the list of results. | ||
Note that 0 means *normal rank*, not *no rank*. | ||
|
||
If you are looking to completely ignore a page, | ||
check :ref:`config-file/v2:search.ignore`. | ||
|
||
.. code-block:: yaml | ||
|
||
version: 2 | ||
|
||
search: | ||
ranking: | ||
# Match a single file | ||
tutorial.hml: 2 | ||
tutorial.html: 2 | ||
|
||
# Match all files under the api/v1 directory | ||
api/v1/*: -5 | ||
|
@@ -514,6 +519,43 @@ Note that 0 means *normal rank*, not *no rank*. | |
Is better to decrease the rank of pages you want to deprecate, | ||
rather than increasing the rank of the other pages. | ||
|
||
search.ignore | ||
````````````` | ||
|
||
Don't index files matching a pattern. | ||
This is, you won't see search results from these files. | ||
|
||
:Type: ``list`` of patterns | ||
:Default: ``['search.html', 'search/index.html', '404.html', '404/index.html']`` | ||
|
||
Patterns are matched against the final html pages produced by the build | ||
(you should try to match `index.html`, not `docs/index.rst`). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How is the pattern to use if the project is built with pretty URLs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From the defaults, I would guess that if I want to ignore There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, having the URL disassociated from the path you need to put here is not good UX, IMO. However, I'm not sure how we can improve that. Is is possible to just use URLs here instead of path files? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We index content from paths, so makes sense to use paths, as we do with rankings. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I agree that makes sense from our side. I'm thinking if it's the same from the user's perspective. They see a URL like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't thing is worth it to try to guess the correct path, that introduces a lot of problems rather than helping, we already ask the users to match the final path rather than the source file. |
||
Patterns can include some special characters: | ||
|
||
- ``*`` matches everything | ||
- ``?`` matches any single character | ||
- ``[seq]`` matches any character in ``seq`` | ||
|
||
.. code-block:: yaml | ||
|
||
version: 2 | ||
|
||
search: | ||
ignore: | ||
# Ignore a single file | ||
- 404.html | ||
|
||
# Ignore all files under the search/ directory | ||
- search/* | ||
Comment on lines
+548
to
+549
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What happen with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. all patterns are matched from the beginning There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can start a path with So, not sure how to make this example more clear. |
||
|
||
# Ignore all files that end with ref.html | ||
- '*/ref.html' | ||
stsewd marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
.. note:: | ||
|
||
Since Read the Docs fallbacks to the original search engine when no results are found, | ||
you may still see search results from ignored pages. | ||
|
||
Schema | ||
------ | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -656,7 +656,7 @@ def submodules(self): | |
|
||
@property | ||
def search(self): | ||
return Search(ranking={}) | ||
return Search(ranking={}, ignore=[]) | ||
|
||
|
||
class BuildConfigV2(BuildConfigBase): | ||
|
@@ -1023,7 +1023,8 @@ def validate_search(self): | |
""" | ||
Validates the search key. | ||
|
||
- Ranking is a map of path patterns to a rank. | ||
- ``ranking`` is a map of path patterns to a rank. | ||
- ``ignore`` is a list of patterns. | ||
- The path pattern supports basic globs (*, ?, [seq]). | ||
- The rank can be a integer number between -10 and 10. | ||
""" | ||
|
@@ -1046,6 +1047,22 @@ def validate_search(self): | |
|
||
search['ranking'] = final_ranking | ||
|
||
with self.catch_validation_error('search.ignore'): | ||
ignore_default = [ | ||
'search.html', | ||
'search/index.html', | ||
'404.html', | ||
'404/index.html', | ||
] | ||
search_ignore = self.pop_config('search.ignore', ignore_default) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the user adds a page, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think is a good idea to combine them, some users may want to override this in order to get rid of the defaults. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think users will want results from 404 or search pages. With the current behavior, adding just one page to be ignored means adding 5 entries in the config file: your page and the defaults. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can still have valid content in |
||
validate_list(search_ignore) | ||
|
||
final_ignore = [ | ||
validate_path_pattern(pattern) | ||
for pattern in search_ignore | ||
] | ||
search['ignore'] = final_ignore | ||
|
||
return search | ||
|
||
def validate_keys(self): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -72,4 +72,4 @@ class Submodules(Base): | |
|
||
class Search(Base): | ||
|
||
__slots__ = ('ranking',) | ||
__slots__ = ('ranking', 'ignore') |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -767,6 +767,7 @@ def test_as_dict(tmpdir): | |
}, | ||
'search': { | ||
'ranking': {}, | ||
'ignore': [], | ||
}, | ||
} | ||
assert build.as_dict() == expected_dict | ||
|
@@ -1908,6 +1909,45 @@ def test_search_ranking_normilize_path(self, path, expected): | |
build.validate() | ||
assert build.search.ranking == {expected: 1} | ||
|
||
@pytest.mark.parametrize( | ||
'value', | ||
[ | ||
'invalid', | ||
True, | ||
0, | ||
[2, 3], | ||
{'foo/bar': 11}, | ||
], | ||
) | ||
def test_search_ignore_invalid_type(self, value): | ||
build = self.get_build_config({ | ||
'search': {'ignore': value}, | ||
}) | ||
with raises(InvalidConfig) as excinfo: | ||
build.validate() | ||
assert excinfo.value.key == 'search.ignore' | ||
|
||
@pytest.mark.parametrize('path, expected', [ | ||
('/foo/bar', 'foo/bar'), | ||
('///foo//bar', 'foo/bar'), | ||
('///foo//bar/', 'foo/bar'), | ||
('/foo/bar/../', 'foo'), | ||
('/foo*', 'foo*'), | ||
('/foo/bar/*', 'foo/bar/*'), | ||
('/foo/bar?/*', 'foo/bar?/*'), | ||
Comment on lines
+1931
to
+1937
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would make a decision here and support only one case: with or without starting I think it makes more sense to use relative paths here, considering that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't see that confusing, it would be annoying to fail the build just because you included or not |
||
('foo/[bc]ar/*/', 'foo/[bc]ar/*'), | ||
('*', '*'), | ||
('index.html', 'index.html'), | ||
]) | ||
def test_search_ignore_valid_type(self, path, expected): | ||
build = self.get_build_config({ | ||
'search': { | ||
'ignore': [path], | ||
}, | ||
}) | ||
build.validate() | ||
assert build.search.ignore == [expected] | ||
|
||
@pytest.mark.parametrize('value,key', [ | ||
({'typo': 'something'}, 'typo'), | ||
( | ||
|
@@ -2048,6 +2088,12 @@ def test_as_dict(self, tmpdir): | |
}, | ||
'search': { | ||
'ranking': {}, | ||
'ignore': [ | ||
'search.html', | ||
'search/index.html', | ||
'404.html', | ||
'404/index.html', | ||
], | ||
}, | ||
} | ||
assert build.as_dict() == expected_dict |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
# Generated by Django 2.2.12 on 2020-07-21 18:21 | ||
|
||
from django.db import migrations, models | ||
|
||
|
||
class Migration(migrations.Migration): | ||
|
||
dependencies = [ | ||
('projects', '0060_make_rank_not_null'), | ||
] | ||
|
||
operations = [ | ||
migrations.AddField( | ||
model_name='importedfile', | ||
name='ignore', | ||
field=models.BooleanField(null=True, verbose_name='Ignore this file from operations like indexing'), | ||
), | ||
] |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -140,23 +140,16 @@ def prepare_domains(self, html_file): | |
return all_domains | ||
|
||
def get_queryset(self): | ||
"""Overwrite default queryset to filter certain files to index.""" | ||
queryset = super().get_queryset() | ||
|
||
# Do not index files from external versions | ||
queryset = queryset.internal().all() | ||
|
||
# TODO: Make this smarter | ||
# This was causing issues excluding some valid user documentation pages | ||
# excluded_files = [ | ||
# 'search.html', | ||
# 'genindex.html', | ||
# 'py-modindex.html', | ||
# 'search/index.html', | ||
# 'genindex/index.html', | ||
# 'py-modindex/index.html', | ||
Comment on lines
-153
to
-157
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure if we should add these to the defaults too. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I mean, they are sphinx only |
||
# ] | ||
# for ending in excluded_files: | ||
# queryset = queryset.exclude(path=ending) | ||
""" | ||
Ignore certain files from indexing. | ||
|
||
- Files from external versions | ||
- Ignored files | ||
""" | ||
queryset = super().get_queryset() | ||
queryset = ( | ||
queryset | ||
.internal() | ||
.exclude(ignore=True) | ||
) | ||
return queryset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code, I guess that the ignored page will affect only the version that has this config file, right? If that's correct, we should probably communicate this in the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All options from the configuration file are per-version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we mentioning that somewhere? That's my point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do in https://docs.readthedocs.io/en/stable/config-file/index.html, but I can see helpful mentioning it again at the top of this file.