-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
Build: standardize output directory for artifacts #9888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Always expose the final artifacts under `_readthedocs/<format>` directory, where `<format>` can be one of the following: - html - htmlzip - pdf - epub - json This allows all users of the platform to have the same behavior. No matter if they are using the default Sphinx/MkDocs builders or a custom one by using `build.commands`. Besides, this commit removes the "intermediary" directory used _before_ uploading the files to S3. This allows users to modify the output files directly from `_readthedocs/<format>` using `build.jobs.post_build`, for example. - `BaseBuilder.type` is removed since it was not required anymore. - There is no `old_artifact_path` anymore since there is no intermediary folder. - The `sphinx-build` command is executed from path where the repository is cloned, instead from the the directory where the `conf.py` lives. This is to simplify the code and be able to use relative path for the build output directory (e.g. `sphinx-build -b html ... docs/ _readthedocs/html`). - A new `BaseSphinx._post_build` method was added to cleanup the output directory (leave only one .epub, .html and .pdf file before moving forward). - These changes should allow us to support _all formats_ for `build.commands`. We would need to check if these standard directories exist, set these formats in the version and upload their content. - Upload step now validate there is one and only one PDF, ePUB and ZIP file when they are built. Currently, we only support one file per version. In the future we could remove this restriction. It fails silently for now, which is not great. We need to refactor this code a little to fail the build properly and communicate this to the user. I think it's _the correct way_ of doing it. - This could also allow to support other ways to build PDF (SimplePDF or `tectonic`, see #9598) and also ZIP (e.g. zundler, see readthedocs/meta#77 (comment)). Closes #9179
db78cdf
to
faec2a2
Compare
This one scares me a little bit, as I mentioned in the call. Users are doing lots of weird things with the working directory in
This is a nice improvement. Silent failure is definitely not a great UX in general, and I'm glad we're working to remove these as we go. I think the next step here would probably be supporting a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm very excited about this change, and think it's a great one. Other than my 2 comments above, I just had a few small nitpicks as I read through things. Feel free to disregard some of them, I am just a bit hesitent to change too much all at once, given the side of this already underlying change.
Really think this is a great step forward overall though. We should coordinate this change w/ the rclone changes @stsewd is making. I don't think they'll conflict too badly, but definitely changing a lot of similar code.
safe_rmtree(self.old_artifact_path) | ||
log.info('Removing old artifact path.', path=self.old_artifact_path) | ||
def _post_build(self): | ||
"""Execute extra steps (e.g. create ZIP, rename PDF, etc) after building if required.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does Python support empty function bodies?!
I generally still think it's cleaner to have a pass
or raise NotImplementedError
here to be more explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
He, yes. It does. I usually add a pass
as well. However, it seems that Black removed it. We cannot have a raise NotImplementedError
because it's always called by the builder and some of the subclass don't implement it.
I'm not sure if there is a way to tell Black that I want to keep the pass
in this line. That's probably what we need here.
log.info( | ||
"Store build artifacts finished.", | ||
time=(timezone.now() - time_before_store_build_artifacts).seconds, | ||
) | ||
|
||
# NOTE: we are updating the db version instance *only* when | ||
# HTML are built successfully. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems we deleted half this comment. Is it not true anymore?
Ah. I see it got missed in a comment refactor. This line needs to be moved down below :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got confused by this comment. I'm not sure if there is something I have to do here or not 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the comment was 2 lines, and only 1 line was moved a few lines down. The comment is basically saying we only hit the API when html
passes, which is medium-weird, but we don't want to change with this refactor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
which is medium-weird, but we don't want to change with this refactor
Yes, this is wrong, in my opinion. I noticed this when I did the Celery refactor. This is something I want to change soon, if possible.
This doesn't conflict with the rclone PR. |
This is definitely a good point. I can revert this change, but it will produce different relative output paths depending on the location of the
Note we will be keeping the In theory, people should be modifying the |
I prefer supporting uploading and serving multiple In summary, I want both of your suggestions 😄 |
I proposed to changed this CWD to be the one where the repository was cloned, but that could lead to some unexpected behavior. So, it's better to be conservative here and keep the CWD as it was. There are some small downsides for this decision, but it's better than breaking people's builds :)
@ericholscher I addressed most of the feedback (thanks!). I reverted the CWD change and I added an exception to raise when there is no PDF found. I repeated some QA locally and it seems that things keep working fine. Note that I tried to make incremental commits to allow easier reviews. Please, let me know if there are more changes to be made on this PR. If there are things we want to address later, we can create issues otherwise. |
I really like this idea of
Then, the Sphinx command executed would be:
I think it's pretty clear, short and easy to understand 👍🏼 . It also doesn't change the contract we have with Footnotes
|
Co-authored-by: Eric Holscher <[email protected]>
Co-authored-by: Eric Holscher <[email protected]>
I opened #9913 to continue the work making the output path an environment variable because I found it will require some more extra work, and I'd like to not block this particular because of that. I'm happy to merge this PR exposing the path as relative now (we are not changing the behavior here since it was the way it currently works) and make the improvement on being explicit in the other PR with fresh eyes and deeper research about why it's not working as expected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks pretty much ready. It's definitely something we should be testing heavily during QA steps, and ahead of deploy if we have the energy. Really excited to see it go out!
# IMO, if there are multiple config files, | ||
# the build should fail immediately communicating this to the user. | ||
# This can be achived by unhandle the exception here | ||
# and leaving `on_failure` Celery handle to deal with it. | ||
# | ||
# In case there is no config file, we should continue the build | ||
# because Read the Docs will automatically create one for it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems reasonable. Is there a good way to implement the suggested logic? I guess raise a different exception for multiple configs that we raise
(or just don't catch)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I documented this here in this way because I want to change the internal behavior and also communicate these errors better to the user. I'm opening a new issue with all these findings to continue working after this, since it will require some refactor and more design decisions.
@ericholscher thanks for all these reviews 👍🏼 . I opened #9918 with all the notes we took from the conversations on this PR to not forget them and decide which of them we want to prioritize. I tried to link as many thing there as I could. I'm going to merge this PR now and ping the rest of the team so they can do some QA locally before deploying it next week. |
Always expose the final artifacts under
_readthedocs/<format>
directory, where<format>
can be one of the following:This allows all users of the platform to have the same behavior. No matter if they are using the default Sphinx/MkDocs builders or a custom one by using
build.commands
.Besides, this commit removes the "intermediary" directory used before uploading the files to S3. This allows users to modify the output files directly from
_readthedocs/<format>
usingbuild.jobs.post_build
, for example.Some important notes
BaseBuilder.type
is removed since it was not required anymore.old_artifact_path
anymore since there is no intermediary folder.sphinx-build
command is executed from path where the repository is cloned, instead from the the directory where theconf.py
lives. This is to simplify the code and be able to use relative path for the build output directory (e.g.sphinx-build -b html ... docs/ _readthedocs/html
).BaseSphinx._post_build
method was added to cleanup the output directory (leave only one .epub, .html and .pdf file before moving forward).build.commands
. We would need to check if these standard directories exist, set these formats in the version and upload their content.tectonic
, see PDF builder:tectonic
#9598) and also ZIP (e.g. zundler, see https://github.com/readthedocs/meta/issues/77#issuecomment-1301927685).Use cases manually tested from
test-builds
mkdocs-python-tags
build-commands
all-formats
latest
build-jobs-post-build
singlehtml
htmldir
Closes #9179