Skip to content

Read the Docs ignores custom 404 at /404/index.html with dirhtml #215

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hugovk opened this issue Feb 2, 2023 · 6 comments
Closed

Read the Docs ignores custom 404 at /404/index.html with dirhtml #215

hugovk opened this issue Feb 2, 2023 · 6 comments
Assignees

Comments

@hugovk
Copy link
Contributor

hugovk commented Feb 2, 2023

Steps to reproduce

  1. Build docs using dirhtml with notfound_urls_prefix = "/" for a single version site that is deployed at the root (i.e. no /en/latest/)
  2. Visit some page that doesn't exist: https://cpython-devguide--1042.org.readthedocs.build/notfound

Actual result

Get the default maze:

image

Expected result

Get the custom 404 page.

More info

It seems that Read the Docs only supports a custom 404 page at 404.html?

https://docs.readthedocs.io/en/stable/hosting.html#custom-not-found-404-pages

And it looks like, with dirhtml, this extension builds the custom 404 page at 404/index.html, which Read the Docs ignores?

For example, I can see custom 404 page (returning a 200) at:

https://cpython-devguide--1042.org.readthedocs.build/404/index.html

image

See also

python/devguide#1042

@humitos
Copy link
Member

humitos commented Feb 3, 2023

Thanks for reporting this issue.

It seems it's a bug in Read the Docs itself. We do have a check in place for 404/index.html but for some reason it doesn't find it in your project.

https://github.com/readthedocs/readthedocs.org/blob/77781b122cbdc5c70f56d44722ef83ed1abd9ce6/readthedocs/proxito/views/serve.py#L380-L381

I guess it's because it's not detected as HTMLDIR. I'll research a little and come back with the solution... Hopefully 😅

@humitos humitos self-assigned this Feb 3, 2023
@github-project-automation github-project-automation bot moved this to Planned in 📍Roadmap Feb 3, 2023
@humitos
Copy link
Member

humitos commented Feb 3, 2023

My guess was correct. Since you are using build.commands (in beta), we don't really know what's the "documentation type" of that version of your documentation. Because of that, we set it as generic. Then, that line that I linked is skipped and 404/index.html file is not checked.

In [1]: p = Project.objects.get(slug='cpython-devguide')

In [4]: v = p.versions.get(slug='1042')

In [5]: v.documentation_type
Out[5]: 'generic'

I are not checking for that file always to avoid hitting S3 API and extra time for all the requests. However, I suppose we may need to re-consider this since we have support for more doctools now, and I don't think we can keep making the same assumption. @ericholscher what do you think?

humitos added a commit to readthedocs/readthedocs.org that referenced this issue Feb 3, 2023
With the introduction of `build.commands` we cannot use
`version.documentation_type` anymore since those versions will be `generic` and
we can't skip checking for this file location.

Note this commit may add and extra call to S3 API for all the 404 pages where
our regular Maze will be shown. However, it removes 2 databsae calls from all
the 404 requests.

We could only add this extra check on S3 for
`version.documentation_type='generic'`, but that would make the code a little
more complex and we won't be removing these 2 db queries.

Reference: readthedocs/sphinx-notfound-page#215 (comment)
@ericholscher
Copy link
Member

@humitos Yea, agreed. We likely need to be smarter here, but I really dislike adding more checks for this.

humitos added a commit to readthedocs/readthedocs.org that referenced this issue Feb 7, 2023
* Proxito: always check `404/index.hmtml`

With the introduction of `build.commands` we cannot use
`version.documentation_type` anymore since those versions will be `generic` and
we can't skip checking for this file location.

Note this commit may add and extra call to S3 API for all the 404 pages where
our regular Maze will be shown. However, it removes 2 databsae calls from all
the 404 requests.

We could only add this extra check on S3 for
`version.documentation_type='generic'`, but that would make the code a little
more complex and we won't be removing these 2 db queries.

Reference: readthedocs/sphinx-notfound-page#215 (comment)

* Docs: mention `404/index.html` in hosting docs

* Proxito: get the `Version` from inside `register_page_view`

* Proxit: get the version after checking for bot score

* Proxit: call `register_page_view` just once

* Log: index file (`index.html` or `README.html`)

I'm interested in `README.html` since we should probably deprecate that "feature"

* Tests: add extra checks to make tests pass

* Lint
@humitos
Copy link
Member

humitos commented Feb 7, 2023

This is already solved in readthedocs/readthedocs.org#9983 and it's going to be deploy today. Please, confirm this is working as you expected later today, or tomorrow 😄

@humitos humitos closed this as completed Feb 7, 2023
@github-project-automation github-project-automation bot moved this from Planned to Done in 📍Roadmap Feb 7, 2023
@humitos
Copy link
Member

humitos commented Feb 7, 2023

It works now! 🥳

Screenshot_2023-02-07_17-47-27

@hugovk
Copy link
Contributor Author

hugovk commented Feb 7, 2023

Brilliant, thank you for the quick fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

3 participants