Restructure of Docker command execution and large build refactor #1524

agjohnson · 2015-07-31T19:16:14Z

Note: this is ongoing, review for design decisions for now

I was not happy with the pattern used to execute commands in containers. The original flow for executing tasks in Docker was:

Get build/project/version state from API
Serialize JSON
Execute django admin command in docker, pass JSON into forked docker command via STDIN, docker command pipes to django admin command
Django admin command deserialized JSON
Execute build task from admin command, passing in project/version data
Build task executes
Serialize response, output STDOUT from django admin
docker command returns STDOUT to original task

This was deficient is several ways:

It required our full stack in a container - this was a very large container that needed to be re-rolled every deploy
It is very fragile, requiring several layers of piping STDIN/STDOUT through commands.
Breakage was difficult to track down when it did happen

So, this aimed to tackle that and turned into a larger refactor, as we could be more intelligent about how we were executing commands. My first inclination was to use proper IPC to make this less fragile, but @ericholscher brought up the point that we would treat the container as a dumb container as well. The goals of this PR will be:

Wrap all command execution in stateful Environment objects, so all forked commands are tracked.
A LocalEnvironment provides local command execution
Remove the need for shell=True command execution
Remove passing around results object in task -- Environment object tracks whether a command failed and has state on output/etc
Make docker image a lightweight shell, with only python dependencies
Docker commands execute through exec mechanism instead of a single command
Handle all docker commands through docker-py
Docker commands have no stdin/stdout, so output will be combined. This seems like a good time to drop the weird pattern of splitting stderr/stdout. I have thoughts on how to finally get combined stdout/stderr with formatting on stderr, but it's a surprisingly hard problem to solve.

Side effects of this refactor so far include:

Reduce duplicated code and build complexity greatly by removing magical PDF builds. Refactoring the commands executed, we were duplicating efforts to create a secondary PDF build. This build is ofter confusing for users, and shouldn't exist.
Put all commands for PDF/ePub builds in main build, fail task but don't halt on these errors. This will take some handling of exceptions during the build process and rewrap them if they are critical
Build output page is now a list of commands and output, not broken up stdout/stderr per build stage
No more build "steps" -- I'm still contemplating whether this would be useful, or if it would be useful in a differently applied method than it is now.
Task update_docs is now a subclassed Task object, allowing us to track state for the task. This could also be a separate class, but seemed more applicable as a full Task object with build/version/project state.

Finally, some of the design decisions here:

Environments (LocalEnvironment/DockerEnvironment) are used to track build state, instead of passing around results and other states through method calls on project build
Commands no longer require running in a with context
Environments create these command objects, tracking and executing them
Environment wrap all calls with exception handling. If an unhandled exception bubbles up to the Environment it is reported.
A special exception is used to report errors to the user via builds page, BuildEnvironmentError. This will add error state to the command on the build page

Thoughts on where to take this next, and I think this work sets us up for an easy transition:

Per-command output tracking -- this might make sense to include here, as the output is changing significantly already
Additional environments for execution -- these will just be wrappers around executing commands, but could be anything (spin up ec2 instance, build, destroy/etc)

ericholscher · 2015-07-31T21:06:39Z

readthedocs/core/management/commands/update_docker.py

@@ -20,4 +20,6 @@ def handle(self, *args, **options):
                project_data = api.project(slug).get()
                p = tasks.make_api_project(project_data)
                log.info("Building %s" % p)
-                tasks.update_docs(pk=p.pk, docker=True)
+                # TODO just move this to update_api.py


Do we still need this file?

We should decide what management commands are worth sticking around. Either way, build vs docker build should just be a parameter to the build comand.

Agreed. We should figure out what makes sense to keep around, and document the useful ones.

Also, maybe we consider implementing them as admin functions instead. Cases like build docs for a version from the admin are more useful for rtd.com, but allowing non ops folks to trigger some of the management functionality from the admin might be a win

Yea, they should all be super light wrappers around util methods, so we could expose them both places.

Made #1526 for this.

agjohnson · 2015-08-06T18:42:05Z

Outstanding issues that I am finishing up on:

On docker command timeout, the container goes away as expected, but we should tell the user their build timed out on build output page
Ensure container is cleaned up on exceptions, or be smarter about creating containers. We don't have access to the --rm option for run through the docker api for some reason, so the container sticks around on faulty builds
Build output page is still awful, will address separately so this PR can continue
Where to put Dockerfile -- this is maybe a good deploy step, though it overlaps more with server provisioning. Dockerfile should be public

ericholscher · 2015-08-06T20:27:01Z

readthedocs/core/management/commands/update_api.py

        if len(args):
            for slug in args:
                project_data = api.project(slug).get()
                p = tasks.make_api_project(project_data)
                log.info("Building %s" % p)
-                tasks.update_docs(pk=p.pk)
+                update_docs = tasks.UpdateDocsTask()


Is there a reason we don't just put this in the tasks.py so it's importable like previously?

These objects will allow for multiple command execution patterns without changing the actual commands we're executing. This allows us to use Docker as a thin wrapper around container execution, instead of treating the Docker containers as a black box. This works towards per-command tracking of commands as well, giving us a far better UI around build state.

Fixes and adds more testing on build tests, including tests for pdf build failure states.

This cleans up some testing of environments and adds tests to handle extended character return gracefully.

This allows for delving into the passing behavior of a build, but the meat and potatoes is the error output.

This catches errors that did not emanate from a command

This addresses exiting due to timeout and memory usage

This will remove stale containers and throw an exception that would back out of the buildenv/task if there is a collision

Missing build data, and subsequent mocked out update_build method on envs caused a done state to not be detected, and thus success (with done state implied) was false.

agjohnson · 2015-08-08T00:17:16Z

Okay, to wrap this up, I've added some documentation and cleaned up the last remaining issues that I had with the code. I've push up a new repo with the docker file required to start with this build env: https://github.com/rtfd/readthedocs-docker-images

It would be worth throwing some more projects at this system for integration testing. I've been through many iterations of testing on this all, but would be good to revisit initial setup issues/etc.

I will follow this up with another PR that makes the UX on output not suck. It currently is a first pass and is quite noisey.

gregmuellegger · 2015-08-10T07:48:55Z

docs/development/buildenvironments.rst

+==================
+
+Read the Docs uses container virtualization to encapsulate documentation build
+processes. Each build spins up a new virtual machinge using our base image,


type: "machinge" -> "machine"

ericholscher · 2015-08-12T22:06:03Z

Restructure of Docker command execution and large build refactor

agjohnson added Needed: design decision A core team decision is required Needed: documentation Documentation is required Needed: tests Tests are required PR: work in progress Pull request is not ready for full review labels Jul 31, 2015

agjohnson self-assigned this Jul 31, 2015

agjohnson force-pushed the build-refactor-and-docker branch from 8a4751a to 75dd2df Compare July 31, 2015 20:15

ericholscher reviewed Jul 31, 2015
View reviewed changes

ericholscher mentioned this pull request Jul 31, 2015

Audit management commands #1526

Closed

ericholscher reviewed Aug 6, 2015
View reviewed changes

agjohnson added 18 commits August 7, 2015 14:57

Add test

74d4391

Testing environments and addressing some design issues

875624a

Cleanup and fix tests

176742b

Drop unused docker command, move functionality to update_api command

8300741

Correct call on mkdocs json and some cleanup on mkdocs backend

2f545a0

Handle PDF builds better, clean up build tests

11866e0

Fixes and adds more testing on build tests, including tests for pdf build failure states.

Lint fixes

6543a96

Handle non-ascii characters from build commands

e40018d

This cleans up some testing of environments and adds tests to handle extended character return gracefully.

Make error output show up first on build output page, hide stdout

d2df335

This allows for delving into the passing behavior of a build, but the meat and potatoes is the error output.

Add some error output for top-level failure reporting

e6e928f

This catches errors that did not emanate from a command

Handle build env container exit states with error messages

d4d63dc

This addresses exiting due to timeout and memory usage

More aggressive linting

0b40d63

Add existing container detection and handling to docker build env

1722890

This will remove stale containers and throw an exception that would back out of the buildenv/task if there is a collision

Clean up mkdocs backend a bit

e6a67d1

Alter pattern for update_docs tasks to include task instance

33c0078

Provide build test envs with build object to determine done-ness

cc8f83d

Missing build data, and subsequent mocked out update_build method on envs caused a done state to not be detected, and thus success (with done state implied) was false.

Add tests for timeout and memory failures in containers

d0fb52f

Add docs, move standards due to naming

caea61e

agjohnson added PR: ready for review and removed Needed: design decision A core team decision is required Needed: documentation Documentation is required Needed: tests Tests are required PR: work in progress Pull request is not ready for full review labels Aug 8, 2015

agjohnson mentioned this pull request Aug 8, 2015

build hangs when conf.py blocks for TTY input; cannot kill hung build #1539

Closed

gregmuellegger reviewed Aug 10, 2015
View reviewed changes

agjohnson added 6 commits August 12, 2015 10:37

Fix test undone in rebase

709faf6

Protect against py3 __iter__ check

8db0d31

Documentation fixes

39ee22c

Make build backend docs_dir resolution absolute

a592aa0

Default autodetect type to Sphinx type on project build

bd64667

Make sure PDF/ePub types aren't built if type is not sphinx

1476bbb

agjohnson mentioned this pull request Aug 12, 2015

Call python scripts as argument to virtualenv python command #1559

Merged

agjohnson added 2 commits August 12, 2015 14:57

Lint cleanup

467d240

Update refactor misses

2356621

agjohnson added a commit that referenced this pull request Aug 12, 2015

Merge pull request #1524 from rtfd/build-refactor-and-docker

b977191

Restructure of Docker command execution and large build refactor

agjohnson merged commit b977191 into master Aug 12, 2015

agjohnson deleted the build-refactor-and-docker branch August 12, 2015 22:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restructure of Docker command execution and large build refactor #1524

Restructure of Docker command execution and large build refactor #1524

agjohnson commented Jul 31, 2015

ericholscher Jul 31, 2015

agjohnson Jul 31, 2015

ericholscher Jul 31, 2015

agjohnson Jul 31, 2015

ericholscher Jul 31, 2015

ericholscher Jul 31, 2015

agjohnson commented Aug 6, 2015

ericholscher Aug 6, 2015

agjohnson commented Aug 8, 2015

gregmuellegger Aug 10, 2015

ericholscher commented Aug 12, 2015

Restructure of Docker command execution and large build refactor #1524

Restructure of Docker command execution and large build refactor #1524

Conversation

agjohnson commented Jul 31, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

agjohnson commented Aug 6, 2015

Choose a reason for hiding this comment

agjohnson commented Aug 8, 2015

Choose a reason for hiding this comment

ericholscher commented Aug 12, 2015