Design doc: forward path to a future builder #8190

Merged
merged 5 commits into from
May 24, 2022
230 changes: 230 additions & 0 deletions docs/development/design/future-builder.rst
@@ -0,0 +1,230 @@
Future Builder
==============

.. contents::
   :local:
   :depth: 2

This document continues Santos' work on "`Explicit Builders`_".
It adds some extra features on top of that document and makes some decisions about the final goal,
proposing a clear direction forward with intermediate steps that keep backward and forward compatibility.

.. _Explicit Builders: https://github.com/readthedocs/readthedocs.org/pull/8103/


Goals
-----

* Keep the current builder working as-is
* Keep backward and forward (with intermediate steps) compatibility
* Define clear levels of support for newbie, intermediate and advanced users
* Allow users to override a command, run pre/post hook commands or define all commands by themselves

Review comment by @humitos (Member Author), Sep 15, 2021:
Note there are people already doing this in hacky ways: #6662 (comment)

* Remove the Read the Docs requirement of having access to the build process
* Translate our current magic at build time to a defined contract with the user
* Provide a way to add a command argument without implementing it as a config file (e.g. ``fail_on_warning``)
* Define a path forward towards supporting other tools
* Re-write all ``readthedocs-sphinx-ext`` features as HTML post-processing features
* Reduce complexity maintained by Read the Docs' core team
* Make Read the Docs responsible for Sphinx support and delegate other tools to the community
* Eventually support uploading pre-built docs
* Allow us to add a feature with a defined contract without worrying about breaking old builds
* Introduce ``build.builder: 2`` config (does not install pre-defined packages) for these new features
* Encourage users to migrate to ``v2``, educating them along the way, so we can finally deprecate this magic


Steps ran by the builder
------------------------

Read the Docs currently controls the entire build process.
Users are only allowed to modify very limited behavior by using a ``.readthedocs.yaml`` file.
This drove us to implement features like ``sphinx.fail_on_warning`` and ``submodules``, among others,
at a high implementation and maintenance cost to the core team.
Besides, this hasn't been enough for more advanced users who require more control over these commands.

This document proposes to clearly define the steps the builder runs and allow users to override them
depending on their needs:

- Newbie user / simple platform usage: Read the Docs controls all the commands (current builder)
- Intermediate user: ability to override one or more commands plus running pre/post hooks
- Advanced user: controls *all the commands* executed by the builder

The steps identified so far are:

#. Checkout
#. Expose project data via environment variables (\*)
#. Create environment (virtualenv / conda)
#. Install dependencies
#. Build documentation
#. Generate defined contract (``metadata.yaml``)
#. Post-process HTML (\*)
#. Upload to storage (\*)

Steps marked with *(\*)* are managed by Read the Docs and can't be overridden.
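
The list above can be sketched as a small data structure. This is only an illustration, not the builder's implementation: the ``Step`` class, the ``BUILD_STEPS`` tuple and the step identifiers are hypothetical names chosen to match the job names used later in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    overridable: bool  # False for the steps marked (*) in the list above


# Order and flags mirror the numbered list; the identifiers are illustrative.
BUILD_STEPS = (
    Step("checkout", True),
    Step("expose_environment_variables", False),
    Step("create_environment", True),
    Step("install", True),
    Step("build", True),
    Step("metadata", True),
    Step("post_process_html", False),
    Step("upload", False),
)


def overridable_steps():
    """Return the names of the steps a user may replace or hook into."""
    return [step.name for step in BUILD_STEPS if step.overridable]


def check_override(step_name):
    """Raise ValueError if a user tries to override a managed or unknown step."""
    for step in BUILD_STEPS:
        if step.name == step_name:
            if not step.overridable:
                raise ValueError(f"step {step_name!r} is managed by Read the Docs")
            return
    raise ValueError(f"unknown step {step_name!r}")
```

Keeping the managed steps in the same ordered structure as the overridable ones makes it easy to reject an override of, say, ``upload`` before any command runs.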


Defined contract
----------------

Projects building on Read the Docs must provide a ``metadata.yaml`` file after running their last command.
This file contains all the data Read the Docs requires to add its integrations.
If this file is missing or malformed, Read the Docs will fail the build, stop the process,
and tell the user that there is a problem with ``metadata.yaml`` that they need to fix.

.. note::

   There is no restriction on how this file is generated
   (e.g. generated with Python or Bash, or committed statically to the repository).
   Read the Docs does not control it and is only responsible for generating it when building with Sphinx.


The following is an example of a ``metadata.yaml`` that is generated by Read the Docs when building Sphinx documentation:

.. code:: yaml

   # metadata.yaml

Review comment (Member):
Feels like all this information can be in the rtd config file, I don't see why you would want to make this dynamically generated to change at runtime.

Review comment by @humitos (Member Author), May 20, 2021:
I put some static data here just as an example. However, I'm reserving this file to act as a place to specify dynamic metadata that's only known at build time by the doctool: something that it's known after the tool has built the documentation or while it's building it.

I'm proposing to create the pipeline in a generic way to be prepared for these cases. It may not be needed to generate this file dynamically for some tools/users but it must be required for other tools (*).

Besides, I want to separate it from readthedocs.yaml since I imagine this could eventually be useful for uploading pre-built documentation where you upload a project-html.zip + metadata.yaml and Read the Docs gets everything needed from the metadata.yaml without the needing of a readthedocs.yaml (which is currently only useful to configure the build process and we won't have to build the docs in this case).

I also envision readthedocs.yaml as a way to "configure the build process in the platform" (in the future we may be able to configure other platform features than just the build, tho). On the other hand, I see metadata.yaml as an "integration point between the build's output and Read the Docs' features/integrations (flyout, search, warning banners, hosting, etc)"

(*) For example, the PDF filename in Sphinx is generated based on the latex_documents config (and -jobname argument in latexmk command) and it would be handy to modify it in one place and keep everything working without manually updating the readthedocs.pdf_output field in the metadata.yaml as well.

   version: 1
   tool:
     name: sphinx
     version: 3.5.1
     builder: html
   readthedocs:
     html_output: ./_build/html/
     pdf_output: ./_build/pdf/myproject.pdf
     epub_output: ./_build/pdf/myproject.epub

Review comment (Member Author):
We could simplify this structure to be like the following to support other formats like man (see #4458):

readthedocs:
  output:
    html:
    pdf:
    epub:
    man:

Besides, each of these keys could also be a list so we can support multiple PDF output (see #2045):

readthedocs:
  output:
    pdf:
      - ./_build/pdf/myproject.pdf
      - ./_build/pdf/tutorial.pdf
      - ...

Review comment (Contributor):
+1 on both counts. I like the nested structure more and the list of files is a feature I've wanted for a while.

     search:
       enabled: true
       # Quoted: an unquoted value starting with "#" would be parsed as a YAML comment
       css_identifier: '#search-form > input[name="q"]'
     analytics: false
     flyout: false
     canonical: docs.myproject.com
     language: en

.. warning::

   The ``metadata.yaml`` contract is not defined yet.
   This is just an example of what we could expect from it to be able to add our integrations.
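
To illustrate the "fail the build if the file is missing or malformed" rule, here is a minimal validation sketch. Since the contract is not defined yet, the required keys (``version``, ``tool.name``, ``readthedocs.html_output``) are assumptions taken from the example above, and ``validate_metadata`` is a hypothetical helper that receives the already-parsed YAML as a dictionary.

```python
def validate_metadata(data):
    """Return a list of human-readable problems; an empty list means valid.

    ``data`` is the already-parsed content of ``metadata.yaml``.
    The required keys are an assumption based on the example in this document.
    """
    if not isinstance(data, dict):
        return ["metadata.yaml must contain a mapping at the top level"]

    problems = []
    if data.get("version") != 1:
        problems.append("'version' must be 1")

    tool = data.get("tool")
    if not isinstance(tool, dict) or not tool.get("name"):
        problems.append("'tool.name' is required")

    rtd = data.get("readthedocs")
    if not isinstance(rtd, dict) or not rtd.get("html_output"):
        problems.append("'readthedocs.html_output' is required")

    return problems
```

Returning every problem at once, rather than raising on the first one, would let the build error tell the user everything they need to fix in a single run.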


Config file
-----------

As mentioned above, we want all users to use the same config file and have a clear way to override commands as they need.
This will be done by adding two new keys to the existing ``.readthedocs.yaml`` file:
``build.jobs`` and ``build.commands``.

If neither ``build.jobs`` nor ``build.commands`` is present in the config file,
Read the Docs will execute the builder we currently support without modification,
keeping compatibility with all projects already building successfully.

When users use the ``jobs:`` or ``commands:`` keys, we are not responsible if their commands fail.
In these cases, we only check for a ``metadata.yaml`` file and run our code to add the integrations.
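
The selection rule described above can be sketched as follows. ``select_build_mode`` is a hypothetical helper operating on the parsed config; rejecting a config that sets both keys is an assumption, since this document does not say how that case is handled.

```python
def select_build_mode(config):
    """Decide which builder path applies to a parsed ``.readthedocs.yaml``."""
    build = config.get("build") or {}
    has_jobs = "jobs" in build
    has_commands = "commands" in build

    if has_jobs and has_commands:
        # Assumption: the document does not define what happens when both are set.
        raise ValueError("use either build.jobs or build.commands, not both")
    if has_commands:
        return "commands"  # the user controls every command
    if has_jobs:
        return "jobs"  # default commands plus user overrides and hooks
    return "default"  # current builder, unchanged
```

With this rule, a project that never adds either key keeps building exactly as it does today.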


``build.jobs``
~~~~~~~~~~~~~~

It allows users to run one or more pre/post hooks and/or override one or more commands.
These are some examples where this is useful:

- User wants to pass an extra argument to ``sphinx-build``
- Project needs to run a command *before* building
- User has a personal/private PyPI URL
- etc

.. code:: yaml

   # .readthedocs.yaml
   build:
     builder: 2
     jobs:
       pre_checkout:
       checkout: git clone --branch main https://github.com/readthedocs/readthedocs.org
       post_checkout:
       pre_create_environment:
       create_environment: python -m virtualenv venv
       post_create_environment:
       pre_install:
       install: pip install -r requirements.txt
       post_install:
       pre_build:
       build:
         html: sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html
         pdf: latexmk -r latexmkrc -pdf -f -dvi- -ps- -jobname=test-builds -interaction=nonstopmode
         epub: sphinx-build -T -j auto -b epub -d _build/doctrees -D language=en . _build/epub
       post_build:
       pre_metadata:
       metadata: ./metadata_sphinx.py
       post_metadata:


.. note::

   *All these commands* are executed with all the exposed environment variables.

If the user provides only a subset of these jobs, we run our default commands for the jobs they do not define
(see `Steps ran by the builder`_).
For example, the following YAML is enough when the project requires running Doxygen as a pre-build step:

.. code:: yaml

   # .readthedocs.yaml
   build:
     builder: 2
     jobs:
       # https://breathe.readthedocs.io/en/latest/readthedocs.html#generating-doxygen-xml-files
       pre_build: cd ../doxygen; doxygen
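
The fallback to default commands can be sketched like this. It is an illustration only: ``DEFAULT_JOBS`` holds placeholder strings rather than the real default commands, and ``as_commands`` and ``resolve_jobs`` are hypothetical helpers.

```python
# Placeholders: the real defaults are whatever the current builder runs.
DEFAULT_JOBS = {
    "checkout": ["<default checkout>"],
    "create_environment": ["<default create_environment>"],
    "install": ["<default install>"],
    "build": ["<default build>"],
    "metadata": ["<default metadata>"],
}


def as_commands(value):
    """Normalize a job value (None, string, list, or mapping of formats) to a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):  # e.g. build: {html: ..., pdf: ...}
        return [cmd for cmds in value.values() for cmd in as_commands(cmds)]
    return list(value)


def resolve_jobs(user_jobs):
    """Return the ordered command list: pre hook, step (user or default), post hook."""
    plan = []
    for step, default in DEFAULT_JOBS.items():
        plan += as_commands(user_jobs.get(f"pre_{step}"))
        plan += as_commands(user_jobs[step]) if user_jobs.get(step) else default
        plan += as_commands(user_jobs.get(f"post_{step}"))
    return plan
```

For the Doxygen example above, ``resolve_jobs({"pre_build": "cd ../doxygen; doxygen"})`` places the Doxygen command right before the default build command and leaves every other step at its default.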


``build.commands``
~~~~~~~~~~~~~~~~~~

It allows users to have full control over the commands executed in the build process.
These are some examples where this is useful:

- project with a custom build process that does not map to ours
- specific requirements that we can't or don't want to cover as a general rule
- build documentation with a tool other than Sphinx


.. code:: yaml

   # .readthedocs.yaml
   build:
     builder: 2
     commands:
       - git clone --branch main https://github.com/readthedocs/readthedocs.org
       - pip install -r requirements.txt
       - sphinx-build -T -j auto -E -b html -d _build/doctrees -D language=en . _build/html
       - ./metadata.py
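
A minimal sketch of how a ``commands`` list could be executed, assuming each entry is run through the shell in order and the build stops at the first failure. ``run_commands`` and the ``exposed_env`` argument are hypothetical; the names of the environment variables Read the Docs would expose are not defined in this document.

```python
import os
import subprocess


def run_commands(commands, exposed_env, cwd=None):
    """Run each command in order through the shell; stop at the first failure.

    ``exposed_env`` stands in for the project data that Read the Docs exposes
    as environment variables (the variable names are not defined yet).
    Returns the list of commands that completed successfully.
    """
    env = {**os.environ, **exposed_env}
    done = []
    for command in commands:
        result = subprocess.run(command, shell=True, env=env, cwd=cwd)
        if result.returncode != 0:
            raise RuntimeError(f"command failed ({result.returncode}): {command}")
        done.append(command)
    return done
```

After the last command succeeds, the builder would still check for ``metadata.yaml`` before adding any integration, as described in the "Config file" section.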


Intermediate steps for rollout
------------------------------

#. Remove all the exposed data in the ``conf.py.tmpl`` file and move it to ``metadata.yaml``
#. Define structure required for ``metadata.yaml`` as contract
#. Define the environment variables required (e.g. some from ``html_context``) and execute all commands with them
#. Build documentation using this contract
#. Leave ``readthedocs-sphinx-ext`` as the only package installed and the only extension added in ``conf.py.tmpl``
#. Add ``build.builder: 2`` config without any *magic*

Review comment (Member):
I'm not really sold on the idea of versioning this, as it would be confusing having two versions (one for the config file and other for how the docs are build?). Also, I'd call it build.version as builder: 2 looks like you are requiring more builders.

Review comment (Member Author):
Yeah, I understand that it may be confusing having the global version and build.version. However, we need a way to differentiate the "builder that does some magic" (current behavior) from the "builder that does nothing". What do you think could be better for this?

Review comment (Contributor):
Just to resolve this conversation, I believe we settled on not versioning this. If the user does nothing, they get our normal build commands. If the user wants to alter the pre/post hooks, they use build.jobs, and if the user needs to override a build command, they are explicitly opting into our new builder. There doesn't need to be a version in this case.

#. Build everything needed to support ``build.jobs`` and ``build.commands`` keys
#. Write guides about how to use the new keys
#. Re-write ``readthedocs-sphinx-ext`` features to post-process HTML features


Final notes
-----------

- The migration path from ``v1`` to ``v2`` will require users to explicitly specify their requirements
  (we don't install pre-defined packages anymore).
- We probably do not want to support ``build.jobs`` on ``v1``, to reduce the time the core team spends maintaining that code
  without the ability to update it due to projects randomly breaking.
- We would be able to start building documentation using new tools without having to *integrate them*.
- Building on Read the Docs with a new tool will require:

  - the user to execute a different set of commands by overriding the defaults.
  - the project/build/user to expose a ``metadata.yaml`` with the contract that Read the Docs expects.
  - none, some or all of the integrations will be added to the HTML output (these have to be implemented in Read the Docs core)

- We are not responsible for extra formats (e.g. PDF, ePub, etc) on other tools.
- Focus on supporting Sphinx with nice integrations made in a tool-agnostic way that can be re-used.
- Removing the manipulation of ``conf.py.tmpl`` does not require us to implement the same manipulation
  for projects using the new potential feature ``sphinx.yaml`` file.