
Execute checkout step inside docker #6253


Closed

Conversation

stsewd
Member

@stsewd stsewd commented Oct 8, 2019

This is a POC from #5673 (comment)

I'm making a PR so it's easier to discuss some problems. Or so I hope.

Design decisions

How to install/copy the script?

Currently, I'm just mounting a volume inside the container, but we could make it an installable package.
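
For illustration, a rough sketch of the volume approach with docker-py (host path, image name, and script name are assumptions; /rtd_scripts matches the RTD_SCRIPTS_PATH setting added in this PR):

```python
import docker

client = docker.from_env()

# Mount the scripts directory read-only into the build container and run
# the checkout script from there. Paths and image name are hypothetical.
client.containers.run(
    "readthedocs/build:latest",
    "python /rtd_scripts/checkout.py",
    volumes={"/srv/rtd_scripts": {"bind": "/rtd_scripts", "mode": "ro"}},
)
```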

How to install the deps?

Currently, I'm installing gitpython in the global python environment of the container, but we can create a virtualenv just for this package (we need to refactor our class that creates virtualenvs for this).
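
If we did go the virtualenv route, a minimal sketch with docker-py could look like this (container name and paths are hypothetical):

```python
import docker

client = docker.from_env()
container = client.containers.get("build-12345")  # hypothetical container name

# Create an isolated virtualenv for the checkout tooling and install
# gitpython into it, leaving the image's global environment untouched.
container.exec_run("python3 -m venv /opt/checkout-env")
container.exec_run("/opt/checkout-env/bin/pip install gitpython")
```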

Other problems

We have a chicken-and-egg problem: we don't know which build image to use before the checkout step, so we'll be using two build environments, one for the checkout process (we can use any docker image we want) and another for the normal build process. We can keep it this way and initialize the ssh agent on each image, or use the providers' API to query the config file before cloning. The second option introduces more changes and removes support for repos that are not hosted on those providers, so I think we are just fine with the first option for now (this may be a problem when we move to lambda functions, I think?)
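
To illustrate the two-environment flow (image names, volume name, build command, and the config helper are all hypothetical):

```python
import docker

client = docker.from_env()

# A shared named volume carries the checked-out sources between both containers.
volumes = {"build-12345": {"bind": "/project", "mode": "rw"}}

# Step 1: checkout in a minimal image -- any image with git would do.
client.containers.run(
    "alpine/git",
    "clone https://github.com/org/repo.git /project",
    volumes=volumes,
    remove=True,
)

# Step 2: only now can we read the config file and pick the real build image.
# build_image = parse_config("/project/.readthedocs.yml")  # hypothetical helper
client.containers.run(
    "readthedocs/build:latest",
    "make -C /project html",  # placeholder build command
    volumes=volumes,
    remove=True,
)
```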

@stsewd stsewd changed the title Stsewd/get git info from container Execute checkout step inside docker Oct 8, 2019
@stsewd stsewd added the Needed: design decision label Oct 8, 2019
Member

@humitos humitos left a comment


The code looks good. Although, I'm not sold on the idea behind this. I'm still in doubt about what's a good way to implement this --I'm not seeing clearly what path to follow :(

I'm just mounting a volume inside the container, but we can make it an installable package.

I think mounting is easier. It will allow us to make changes to the script without adding the overhead of having another package in the release/deploy process.

I'm installing gitpython in the global python environment of the container, but we can create a virtualenv just for this package (we need to refactor our class that creates virtualenvs for this).

Using the global site-packages seems enough to me at this point.

We have a chicken-and-egg problem: we don't know which build image to use before the checkout step, so we'll be using two build environments, one for the checkout process (we can use any docker image we want) and another for the normal build process

This is my main concern. It feels too complicated to spin up two images. How are we going to share the cloned code between these two images?

It's probably slow as well, since we are going to spin up an 8 GB image just to run git. We may want a small image that contains only the packages needed for VCS...

Another thing to consider here is that with this PR we are only solving the git case, but we need to do the same for the others as well --where we are using some custom code that we wrote for the parsing.

Finally, I think that I lost the main goal of all of this. What's the problem we are solving by running git inside the container? I'd like to have it 100% isolated, but it seems that we are adding some issues (and I'm not sure what we are solving)

There are some changes that look unrelated to this PR (i.e. docker= argument)

@@ -218,6 +218,7 @@ def USE_PROMOS(self): # noqa
LOGS_ROOT = os.path.join(SITE_ROOT, 'logs')
PRODUCTION_ROOT = os.path.join(SITE_ROOT, 'prod_artifacts')
PRODUCTION_MEDIA_ARTIFACTS = os.path.join(PRODUCTION_ROOT, 'media')
RTD_SCRIPTS_PATH = os.path.join(SITE_ROOT, 'rtd_scripts')

I would put this inside the doc_environments/ directory or similar, but not at the top of the repository.

@@ -54,6 +60,39 @@ def _get_clone_url(self):
# clone_url = 'git://%s' % (hacked_url)
return self.repo_url

def _get_repo_info(self):

This can probably be a @property called _repo.

@@ -0,0 +1,145 @@
#!/usr/bin/env python

This file should mention git in its name.

@stsewd
Member Author

stsewd commented Nov 4, 2019

How are we going to share the cloned code between these two images?

We have a shared volume; it's the same as how the checkout step is executed now.

Another thing to consider here is that with this PR we are only solving the git case, but we need to do the same for the others as well --where we are using some custom code that we wrote for the parsing.

Those commands are executed inside the build environment (docker or local); we only need to do this for git because we are using a library (gitpython)
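
To make the distinction concrete, here's roughly what the in-process usage looks like (path is illustrative):

```python
from git import Repo

# Other VCS steps are plain shell commands the build environment can proxy
# into docker, but gitpython runs inside the builder's own Python process:
repo = Repo("/path/to/checkout")
default_branch = repo.active_branch.name
branches = [ref.name for ref in repo.branches]
```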

Finally, I think that I lost the main goal of all of this. What's the problem we are solving by running git inside the container? I'd like to have it 100% isolated, but it seems that we are adding some issues (and I'm not sure what we are solving)

We want to be able to run the ssh agent inside docker for .com.

@humitos
Member

humitos commented Nov 4, 2019

Those commands are executed inside the build environment (docker or local); we only need to do this for git because we are using a library (gitpython)

So, considering that we already hit some gitpython bugs and that all this code/PR is needed only for git, isn't it better to go back to our own implementation that parses the git command output and forget about this?

@stsewd
Member Author

stsewd commented Nov 4, 2019

So, considering that we already hit some gitpython bugs and that all this code/PR is needed only for git, isn't it better to go back to our own implementation that parses the git command output and forget about this?

That was an option: #5673.
But I remember that Anthony was in favor of the second option too.

@agjohnson
Contributor

I think we need a call on this one to discuss options and this implementation.

On my side: regarding an option that uses GitPython vs parsing the output, I suppose I'd rather rely on GitPython to do this. Mostly for the reason that we'll have to maintain a connection to git/hg no matter what -- GitPython is at least a stronger choice for this job. I'm partially unsure about the connection mechanism between the application and the container. I do feel like there is maybe something more we could do here. The biggest hurdle is how code is organized, it does feel a little lost in a one-off script now. We could explore making this a separate package if it would be cleaner.

@agjohnson
Contributor

We didn't come to a conclusion here, I think because we all have reservations about the solutions we've noted so far -- and because we don't really have any better opinions.

This change does address a security issue around exploiting git, which would be good to address eventually.

The other part of this work is to unblock sharing an ssh-agent between git checkout and docker operations. This is perhaps the more immediate change we can introduce, and directly makes the user's experience better.

Two options here are (see the sketch after this list):

  • Keep the ssh-agent out of docker and pass in the socket so steps can have access to the agent. The ssh-agent on the host system dies after a timeout.
  • Put the ssh-agent in docker and expose the socket to the host system for git checkout steps. The ssh-agent's lifecycle is directly tied to the docker container -- it lives the entire time the container is in operation and dies as soon as the container is killed.
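
A minimal sketch of the first option with docker-py, assuming the host agent's socket is available in SSH_AUTH_SOCK (image name and clone URL are placeholders):

```python
import os

import docker

client = docker.from_env()
sock = os.environ["SSH_AUTH_SOCK"]  # socket of the host's ssh-agent

# Bind-mount the agent socket into the container and point SSH_AUTH_SOCK at
# it, so git inside the container can authenticate through the host agent.
client.containers.run(
    "readthedocs/build:latest",
    "git clone git@github.com:org/repo.git /project",
    volumes={sock: {"bind": "/tmp/ssh-agent.sock", "mode": "rw"}},
    environment={"SSH_AUTH_SOCK": "/tmp/ssh-agent.sock"},
)
```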

@ericholscher
Member

ericholscher commented Dec 4, 2019

I guess I want to start from first principles a bit -- is there a reason we can't run the code we currently have, just inside Docker? That seems to me to be the most obvious first step, but I don't know why that approach doesn't work.

Thinking on this more -- it's because we'd need the entire RTD codebase running inside docker, because gitpython is running in process. I think going back to the previous approach where we just run the git commands directly and pass them into docker like all the other commands feels like a better outcome now that I think about it.

If I remember correctly, we needed gitpython because of unicode branch names. I'm wondering if upgrading to Python 3 has solved this for us, and we can just go back to calling git directly.
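
For what it's worth, on Python 3 a direct call can decode unicode branch names cleanly, e.g.:

```python
import subprocess

# `git for-each-ref` emits stable, machine-readable output, so there's no
# fragile parsing of `git branch`'s human-oriented formatting, and decoding
# as UTF-8 handles unicode branch names.
out = subprocess.run(
    ["git", "for-each-ref", "--format=%(refname:short)", "refs/remotes/origin"],
    capture_output=True,
    check=True,
)
branches = out.stdout.decode("utf-8").splitlines()
```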

@agjohnson
Contributor

It wasn't just unicode branches, it was the actual parsing where we weren't consistently parsing the output correctly, and gitpython does this all for us. It does seem like a lot of duplication of effort for a jankier solution.

We can also look into gitpython/hg internals and see if it's possible to replace the execution method, or to parse a different i/o stream than the system i/o (i/o from docker exec in our case).
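
As an untested sketch of that idea, GitPython exposes a wrapper-class hook on Repo that could route every git invocation through docker exec (the container name is hypothetical):

```python
import git


class DockerGit(git.Git):
    """Run each git command through `docker exec` instead of on the host."""

    def execute(self, command, **kwargs):
        if isinstance(command, (list, tuple)):
            # 'build-container' is a hypothetical container name.
            command = ["docker", "exec", "build-container", *command]
        return super().execute(command, **kwargs)


# GitPython instantiates its command wrapper from this class attribute.
git.Repo.GitCommandWrapperType = DockerGit
```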

@stsewd
Member Author

stsewd commented Dec 4, 2019

Closing in favor of #6436

There is still one decision left, related to using a docker image with just git versus using our build images.

@stsewd stsewd closed this Dec 4, 2019
@stsewd stsewd deleted the stsewd/get-git-info-from-container branch December 4, 2019 23:36