Custom robots.txt support? #3161

Closed
agjohnson opened this issue Oct 12, 2017 · 13 comments

Labels
Accepted (Accepted issue on our roadmap) · Feature (New feature) · Needed: design decision (A core team decision is required)

Comments

@agjohnson
Contributor

We've talked about blowing away the protected designation, so I'm not sure it makes sense to special-case the Protected privacy level, but maybe we want a separate option for docs that shouldn't be crawled?

agjohnson added the Needed: design decision label Oct 12, 2017
@dend

dend commented Mar 29, 2018

@agjohnson any momentum on this particular item? What is the current recommendation to NOINDEX/NOFOLLOW a site?

@agjohnson
Contributor Author

At the very least, we could kill our global robots.txt redirect in nginx and allow projects to contribute their own robots.txt via a static page in Sphinx.
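For context, a minimal sketch of that "static page in Sphinx" route, assuming a hand-written robots.txt saved next to conf.py (Sphinx copies html_extra_path entries verbatim into the HTML output root):

```python
# conf.py — minimal sketch: ship a project-supplied robots.txt with the HTML build.
# Assumes a robots.txt file sits next to conf.py; Sphinx copies html_extra_path
# entries as-is into the output root (e.g. _build/html/robots.txt).
html_extra_path = ["robots.txt"]
```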

agjohnson added this to the Admin UX milestone Sep 19, 2018
agjohnson added the Accepted label Sep 19, 2018
@humitos
Member

humitos commented Oct 11, 2018

@agjohnson what's the status of this issue?

I'm not sure I clearly understand what action is needed here.

  1. If it's about the Protected privacy level, I think we can close this as won't fix, since we are removing privacy levels from the Community site.
  2. If it's about giving our users a way to upload a robots.txt themselves, I think the solution I proposed at Avoid having old versions of the docs indexed by search engines #2430 (comment) should work (there is also an example repository in that conversation), and we can close this issue.

If neither of those is what you have in mind, please elaborate a little more on what you are considering here.

@dasdachs

dasdachs commented Oct 11, 2018

@humitos the solution provided in #2430 (comment) is not optimal:

  1. Your site can have only one robots.txt file.
  2. The robots.txt file must be located at the root of the website host that it applies to. For instance, to control crawling on all URLs below http://www.example.com/, the robots.txt file must be located at http://www.example.com/robots.txt. It cannot be placed in a subdirectory (for example, at http://example.com/pages/robots.txt). If you're unsure about how to access your website root, or need permissions to do so, contact your web hosting service provider. If you can't access your website root, use an alternative blocking method such as meta tags.

Google support

I think the only viable option is using the "meta tags" method [1][2]. I am working on a workaround for Astropy's docs (refer to issue #7794 and pull request #7874).

I'll be done by the end of the day and will let you know. If it's a good workaround, I'd be happy to document the process.

@humitos
Member

humitos commented Oct 11, 2018

@dasdachs I see. You are right.

I'll be done by the end of the day and will let you know. If it's a good workaround, I'd be happy to document the process.

If the workaround using meta tags is a good one, maybe it would be a good solution to implement as a Sphinx extension. It's still a hack, but at least "an automatic one" 😬
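A rough, illustrative sketch of what such an extension hook could look like (the html-page-context event is real Sphinx API; everything else here is an assumption, not an official solution):

```python
# conf.py — illustrative sketch of the "Sphinx extension" idea above:
# append a robots meta tag to every generated HTML page.
def _add_robots_meta(app, pagename, templatename, context, doctree):
    # "metatags" is the string the built-in HTML templates render inside <head>
    context["metatags"] = context.get("metatags", "") + (
        '<meta name="robots" content="noindex, nofollow">\n'
    )

def setup(app):
    app.connect("html-page-context", _add_robots_meta)
```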

After reading the docs you linked, I don't see a solution coming from Sphinx or without a hack, so I think we should implement this from Read the Docs itself by adding a robotstxt_file: option in our YAML (or similar) and copying it to the root of the subdomain. I'm not sure that's possible, though.

@humitos
Member

humitos commented Oct 11, 2018

I think we should implement this from Read the Docs itself by adding a robotstxt_file: option in our YAML

This is not trivial.

With that file, we would need to:

  1. append our own set of rules to the custom robots.txt
  2. sync the result to all our web servers
    • since this file will live outside the Sphinx output, we need to adapt that code
  3. modify the nginx rule to try serving the custom robots.txt from the project/version first, and fall back to serving ours

This raises another problem: we have one subdomain with multiple versions but only one root location to serve the robots.txt file from. Which one should we serve?

Since this is a "global setting", I wonder whether it wouldn't be better to add a text box in the admin where the user can paste the contents of that file, or something simpler along those lines.

@stsewd
Member

stsewd commented Oct 11, 2018

I think we should implement this from Read the Docs itself by adding a robotstxt_file: option in our YAML

I doubt this will end up in the YAML, as this is a per-project configuration rather than a per-version one.

@dasdachs

The hack I found could be quite simple (this): add meta tags to the files you don't want indexed.
But because of the global robots.txt, it would have no effect (referring to this answer from Google). A solution using the YAML or a text box seems like the way to go.

@astrofrog

Unfortunately, the idea of adding meta tags isn't really an ideal solution, because we can't add them to all the old versions we host. In the case of astropy, for example, we host a lot of old versions based on GitHub tags, e.g.:

http://docs.astropy.org/en/v1.0/

We can't change all the tags in our GitHub repo for all the old versions, so any solution that involves changes to the repository is a no-go. The only real solution would be the ability to customize robots.txt from the RTD settings interface.

@humitos
Member

humitos commented Jan 16, 2019

@dasdachs @astrofrog we just merged a PR that allows using a custom robots.txt. It will be deployed soon. Here are the docs: https://docs.readthedocs.io/en/latest/faq.html#how-can-i-avoid-search-results-having-a-deprecated-version-of-my-docs

After the deploy, please follow the docs and let us know if it works as you expected.
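For anyone following along, a small example of what such a project-supplied robots.txt could contain to keep an old version (like the /en/v1.0/ docs mentioned above) out of search indexes; the paths are illustrative and depend on your own versions:

```
# illustrative robots.txt — adjust the version paths to your own project
User-agent: *
Disallow: /en/v1.0/
```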

@dasdachs

@humitos This is amazing. Thanks for the great work!

@AmmaraAnis

What is the best way to add a custom robots.txt file and sitemap.xml file to a readthedocs.com external domain?

@humitos
Member

humitos commented Apr 22, 2020

@AmmaraAnis Hi! For robots.txt, you can read this FAQ: https://docs.readthedocs.io/en/latest/faq.html#how-can-i-avoid-search-results-having-a-deprecated-version-of-my-docs

Regarding sitemap.xml, there is no way to modify the default one served at the root yet (see #6938), although you can change the Sitemap: entry in your robots.txt to point to a custom one, and that may work.
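As a sketch of that last suggestion (the sitemap URL is purely illustrative, not a real Read the Docs path):

```
User-agent: *
Allow: /

Sitemap: https://docs.example.com/my-custom-sitemap.xml
```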
