Skip to content

Design doc: Better handling of docs URLs #9425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 31, 2022
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
113 changes: 109 additions & 4 deletions docs/dev/design/better-doc-urls-handling.rst
Original file line number Diff line number Diff line change
Expand Up @@ -7,12 +7,28 @@ to users and handling any other URLs from the user documentation domain.
The current implementation has some problems that are discussed in this document,
and an alternative implementation is proposed to solve those problems.

Goals
-----

* Removing reserved paths and ambiguities from URLs
* Simplifying our parsing logic for URLs

Non-goals
---------

* Allowing fully arbitrary URL generation for projects.

.. note::

This is taken into consideration in the current implementation,
but

Current implementation
----------------------

The current implementation is based on Django URLs
trying to match a pattern that looks like a single project, a versioned project,
or a subproject, this means that a couple or URLs are *reserved*,
or a subproject, this means that a couple of URLs are *reserved*,
and won't resolve to the correct file if it exists
(https://github.com/readthedocs/readthedocs.org/issues/8399, https://github.com/readthedocs/readthedocs.org/issues/2292),
this usually happens with single version projects.
Expand Down Expand Up @@ -287,7 +303,7 @@ Projects:
- project-es (latest, 1.0)
- subproject (latest)

.. list-table::
.. list-table::
:header-rows: 1

* - Request
Expand Down Expand Up @@ -331,7 +347,7 @@ Project with custom prefix

``project`` has the ``prefix`` prefix, and ``sub`` subproject prefix.

.. list-table::
.. list-table::
:header-rows: 1

* - Request
Expand Down Expand Up @@ -368,7 +384,7 @@ Project with custom subproject prefix (empty)
``project`` has the ``/`` subproject prefix,
this allow us to serve subprojects without using a prefix.

.. list-table::
.. list-table::
:header-rows: 1

* - Request
Expand All @@ -392,6 +408,95 @@ this allow us to serve subprojects without using a prefix.
- project
- The subproject/file doesn't exist

Code example
------------

This is a simplified version of the implementation.

.. code-block:: python

from readthedocs.projects.models import Project

LANGUAGES = {"es", "en"}


def resolve(canonical_project: Project, path_parts: list[str], check_subprojects=True):
# Multiversion project.
language = path_parts[0]
version_slug = path_parts[1]
if not canonical_project.single_version and language in LANGUAGES:
if canonical_project.language == language:
project = canonical_project
else:
project = canonical_project.translations.filter(language=language).first()
if project:
version = project.versions.filter(slug=version_slug).first()
if version:
return project, version, "/".join(path_parts[2:])
return project, None, None

# Subprojects.
prefix = path_parts[0]
project_slug = path_parts[1]
if check_subprojects and prefix == "projects":
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused with this line and the goal just added: "Removing reserved paths and ambiguities from URLs". I understand that we are not removing the reserved path /projects/. Am I misunderstanding the meaning of reserved path? Can you expand on the specifics of that goal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

projects isn't reserved with this implementation, single version projects can use that path to serve files, see the last two examples from https://dev--9425.org.readthedocs.build/en/9425/design/better-doc-urls-handling.html#single-version-project-with-subprojects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering the "Single version with subproject" example.

  • project is single version
  • it's build with HTMLDir URLs
  • it has a subproject called something
  • the something subproject is also single version

Now, let's say that we receive the URL /projects/something/:

  • file /projects/something/index.html exists in the parent project
  • subproject called something also has an index.html file at its root

What file will we serve? Whatever the answer be, these are the cases why I'm saying that we are not removing the "reserved path" and the design has these restrictions. It does not seem we can remove this ambiguity while keep using a "reserved path" to give URLs a different meaning. No matter the implementation.

Copy link
Member

@ericholscher ericholscher Jul 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I am a little confused here too. I thought the goal of this project was to improve our handling to allow us to be smarter about these cases. It seemed in the design doc that custom prefixes were supported, is that not shown in this code example?

@humitos to your point, if we're serving subprojects there will always be some kind of overlap. You can't serve 2 different directories from the same path and not have an order that they are evaluated. I think that's fine, as long as we let users decide that the prefix is, instead of hard-coding projects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@humitos to your point, if we're serving subprojects there will always be some kind of overlap. You can't serve 2 different directories from the same path and not have an order that they are evaluated. I think that's fine, as long as we let users decide that the prefix is, instead of hard-coding projects.

Exactly. That's what I'm saying.

The proposed implementation in this document is hard-coding /projects/ and that's why I raised this here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seemed in the design doc that custom prefixes were supported, is that not shown in this code example?

I thought that was part of the custom URLs part that we wanted to not support yet, I can update the code example to integrate the suffix.

project = canonical_project.subprojects.filter(alias=project_slug).first()
if project:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this does only serve subprojects from matching directory if there is a matching subproject. That seems 💯

return resolve(
canonical_project=project,
path_parts=path_parts[2:],
check_subprojects=False,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this calling itself in a recursive way or it's calling the current resolve function we have now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is recursive, yes, guess this mimics more what we currently call unresolve, and what we call resolve is more like a reverse (from django.urls.reverse).


# Single project.
if canonical_project.single_version:
version = canonical_project.versions.filter(
slug=canonical_project.default_version
).first()
if version:
return canonical_project, version, "/".join(path_parts)
return canonical_project, None, None

return None, None, None


def view(canonical_project, path):
current_project, version, file = resolve(
canonical_project=canonical_project, path_parts=path.split("/")
)
if current_project and version:
return serve(current_project, version, file)

if current_project:
return serve_404(current_project)

return serve_404(canonical_project)


def serve_404(project, version=None):
pass


def serve(project, version, file):
pass


Number of db lookups, best case and worst case:

- A single version project:

- '/index.html': 1, the version!
- '/projects/index.html': 2, the version and one additional lookup for a path that looks like a subproject.

- A multi version project:

- '/en/latest/index.html': 1, the version!
- '/es/latest/index.html': 2, the translation and the version!
- '/br/latest/index.html': 1, the translation (it doesn't exist)

- A project with subprojects:

- '/projects/subproject/index.html': 2, the subproject and its version.

Questions
---------

Expand Down