-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Search: allow ignoring files from indexing #7308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
# 'genindex.html', | ||
# 'py-modindex.html', | ||
# 'search/index.html', | ||
# 'genindex/index.html', | ||
# 'py-modindex/index.html', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we should add these to the defaults too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, they are sphinx only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature is super cool! 💯
'404.html', | ||
'404/index.html', | ||
] | ||
search_ignore = self.pop_config('search.ignore', ignore_default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the user adds a page, ignoreme.html
, are we avoiding the ones from the list ignore_default
? Shouldn't we combine the user list with the default list?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think is a good idea to combine them, some users may want to override this in order to get rid of the defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think users will want results from 404 or search pages.
With the current behavior, adding just one page to be ignored means adding 5 entries in the config file: your page and the defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can still have valid content in 404/index.html
and in search/index.html
. 404.html and search.html are pages that kind of always don't have valid content. We shouldn't lock users from not ignoring defaults.
# Ignore all files under the search/ directory | ||
- search/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happen with some/path/search/index.html
? is it ignored or not? Using relative paths here may bring this confusion to users. We could make this clear here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all patterns are matched from the beginning
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can start a path with /
in the config, but it's the same as one without it. Using an absolute path like saying it will ignore all from /search/
isn't quite right either, since the file probably is at /en/latest/search/..
So, not sure how to make this example more clear.
:Default: ``['search.html', 'search/index.html', '404.html', '404/index.html']`` | ||
|
||
Patterns are matched against the final html pages produced by the build | ||
(you should try to match `index.html`, not `docs/index.rst`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is the pattern to use if the project is built with pretty URLs? /
for index? /path/to/page/
for another page? This may be explained in the docs as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the defaults, I would guess that if I want to ignore /404/
, I will need to put /404/index.html
. If that's correct, we definitely need to explain this because it will bring lot of confusions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, having the URL disassociated from the path you need to put here is not good UX, IMO. However, I'm not sure how we can improve that. Is is possible to just use URLs here instead of path files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We index content from paths, so makes sense to use paths, as we do with rankings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We index content from paths, so makes sense to use paths
I agree that makes sense from our side. I'm thinking if it's the same from the user's perspective. They see a URL like /en/latest/404/
and they need to know that 404/index.html
is the correct value, which looks completely different than the URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't thing is worth it to try to guess the correct path, that introduces a lot of problems rather than helping, we already ask the users to match the final path rather than the source file.
('/foo/bar', 'foo/bar'), | ||
('///foo//bar', 'foo/bar'), | ||
('///foo//bar/', 'foo/bar'), | ||
('/foo/bar/../', 'foo'), | ||
('/foo*', 'foo*'), | ||
('/foo/bar/*', 'foo/bar/*'), | ||
('/foo/bar?/*', 'foo/bar?/*'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make a decision here and support only one case: with or without starting /
, but not both. Having to values for the same config may only confuse users: /foo/bar
will work and foo/bar
too. Both will have the same effect, but I would be expecting different things as a user.
I think it makes more sense to use relative paths here, considering that /en/latest/
is not taken into account here. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that confusing, it would be annoying to fail the build just because you included or not /
'404.html', | ||
'404/index.html', | ||
] | ||
search_ignore = self.pop_config('search.ignore', ignore_default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think users will want results from 404 or search pages.
With the current behavior, adding just one page to be ignored means adding 5 entries in the config file: your page and the defaults.
Don't index files matching a pattern. | ||
This is, you won't see search results from these files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading the code, I guess that the ignored page will affect only the version that has this config file, right? If that's correct, we should probably communicate this in the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All options from the configuration file are per-version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we mentioning that somewhere? That's my point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do in https://docs.readthedocs.io/en/stable/config-file/index.html, but I can see helpful mentioning it again at the top of this file.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
We should try and get this merged, it looks useful. |
@ericholscher you may want to give your opinion at #7308 (comment) and #7308 (comment). Other than that, I think this is ready to be merged, IMO |
We already use the same for search rankings, so I think we should use the same for ignoring files. |
OK. That makes sense, there is no need to have two different ways to express similar things. |
We can't stop creating this models since we need them for the CDN, so I had to add a
ignore
field.Closes #5247
Ref #7217