Execute checkout step inside docker #6253
Conversation
The code looks good, although I'm not sold on the idea behind this. I'm still unsure what a good way to implement this would be; I don't see clearly what path to follow :(
I'm just mounting a volume inside the container, but we can make it an installable package.
I think mounting is easier. It will allow us to make changes to the script without adding the overhead of having another package in the release/deploy process.
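For illustration, a rough sketch of what the bind mount could look like with docker-py's high-level client (the image name, script name, and paths here are placeholders, not the actual RTD setup):

```python
# Hypothetical sketch: bind-mount the rtd_scripts directory read-only
# into the build container and run the checkout script from it.
# Image name and paths are illustrative only.
import docker

client = docker.from_env()
output = client.containers.run(
    'readthedocs/build:latest',
    command='python /rtd_scripts/checkout_git.py',
    volumes={
        '/srv/rtd/rtd_scripts': {'bind': '/rtd_scripts', 'mode': 'ro'},
    },
    remove=True,
)
print(output.decode('utf-8'))
```

Since the directory is mounted rather than baked into the image, editing the script on the host is immediately visible to new containers, which is the deploy-overhead argument above.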
I'm installing gitpython in the global Python environment of the container, but we can create a virtualenv just for this package (we would need to refactor our class that creates virtualenvs for this).
Using the global site-packages seems enough to me at this point.
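As a sketch of the global-environment option, using docker-py's low-level exec API (the container name is a placeholder):

```python
# Hypothetical sketch: install GitPython into the container's global
# site-packages via docker exec. The container name is a placeholder.
import docker

api = docker.APIClient()
exec_id = api.exec_create('build-container', 'python -m pip install gitpython')
print(api.exec_start(exec_id).decode('utf-8'))
```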
We have a chicken/egg problem: we don't know which build image to use before the checkout step, so we'll be using two build environments, one for the checkout process (we can use any docker image we want) and another for the normal build process.
This is my main concern. It feels too complicated to spin up two images. How are we going to share the cloned code between these two images?
It's probably slow as well, since we are going to spin up an 8 GB image just to run git. We may want a small image that contains only the packages needed for VCS...
Another thing to consider here is that with this PR we are only solving the git case; we need to do the same for the other VCSs as well, where we are using some custom code that we wrote for the parsing.
Finally, I think I lost sight of the main goal of all of this. What problem are we solving by running git inside the container? I'd like to have it 100% isolated, but it seems that we are adding some issues (and I'm not sure what we are solving).
There are some changes that look unrelated to this PR (i.e. the `docker=` argument).
@@ -218,6 +218,7 @@ def USE_PROMOS(self):  # noqa
 LOGS_ROOT = os.path.join(SITE_ROOT, 'logs')
 PRODUCTION_ROOT = os.path.join(SITE_ROOT, 'prod_artifacts')
 PRODUCTION_MEDIA_ARTIFACTS = os.path.join(PRODUCTION_ROOT, 'media')
+RTD_SCRIPTS_PATH = os.path.join(SITE_ROOT, 'rtd_scripts')
I would put this inside the `doc_environments/` directory or similar, but not at the top of the repository.
@@ -54,6 +60,39 @@ def _get_clone_url(self):
         # clone_url = 'git://%s' % (hacked_url)
         return self.repo_url

+    def _get_repo_info(self):
This can probably be a `@property` called `_repo`.
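For illustration, a sketch of that suggestion; GitPython's `git.Repo` is real, but the class and `working_dir` attribute are assumptions based on the diff context:

```python
# Hypothetical sketch of the reviewer's suggestion: expose the
# GitPython repository object as a cached @property instead of a
# _get_repo_info() method. The surrounding class is assumed.
import git


class Backend:
    def __init__(self, working_dir):
        self.working_dir = working_dir
        self._repo_instance = None

    @property
    def _repo(self):
        # lazily open the repo once and reuse it on later accesses
        if self._repo_instance is None:
            self._repo_instance = git.Repo(self.working_dir)
        return self._repo_instance
```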
@@ -0,0 +1,145 @@
+#!/usr/bin/env python
This file's name should mention `git`.
We have a shared volume; it's the same as executing the checkout step as we do now.
Those commands are executed inside the build environment (docker or local); we only need to do this for git because we are using a library (gitpython).
We want to be able to run the ssh-agent inside docker for .com.
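One common pattern for that, sketched here with docker-py (image, repo, and socket paths are placeholders, and it assumes the agent socket is reachable by the docker daemon):

```python
# Hypothetical sketch: share the host's ssh-agent with the build
# container by bind-mounting the agent socket and pointing
# SSH_AUTH_SOCK at it. Only works when the docker daemon runs on the
# same host as the agent; all names here are illustrative.
import os

import docker

client = docker.from_env()
agent_socket = os.environ['SSH_AUTH_SOCK']
client.containers.run(
    'readthedocs/build:latest',
    command='git clone git@github.com:user/private-repo.git /project',
    environment={'SSH_AUTH_SOCK': '/tmp/ssh-agent.sock'},
    volumes={agent_socket: {'bind': '/tmp/ssh-agent.sock', 'mode': 'rw'}},
    remove=True,
)
```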
So, considering that we already hit some gitpython bugs and that all this code/PR is needed only for git, isn't it better to go back to our own implementation that parses the git command output and forget about this?
That was an option: #5673
I think we need a call on this one to discuss options and this implementation. On my side: regarding an option that uses GitPython vs parsing the output, I suppose I'd rather rely on GitPython to do this, mostly because we'll have to maintain a connection to git/hg no matter what, and GitPython is at least a stronger choice for this job. I'm partially unsure about the connection mechanism between the application and the container; I do feel like there is maybe something more we could do here. The biggest hurdle is how the code is organized; it does feel a little lost in a one-off script now. We could explore making this a separate package if it would be cleaner.
We didn't come to a conclusion here, I think because we all have reservations about the solutions we've noted so far, and because we don't really have any better options. This change does address a security issue around exploiting git, which would be good to address eventually. The other part of this work is to unblock sharing an ssh-agent between git checkout and docker operations. This is perhaps the more immediate change we can introduce, and it directly improves the user's experience. Two options here are:
I guess I want to start from first principles a bit: is there a reason we can't run the code we currently have, just inside Docker? That seems to me to be the most obvious first step, but I don't know why that approach doesn't work.

Thinking on this more: it's because we'd need the entire RTD codebase running inside docker, since gitpython runs in-process. I think going back to the previous approach, where we just run the git commands directly and pass them into docker like all the other commands, feels like a better outcome now that I think about it. If I remember correctly, we needed gitpython because of unicode branch names. I'm wondering if upgrading to Python 3 has solved this for us, and we can just go back to calling git directly.
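If we did go back to calling git directly, using machine-readable output formats would make the parsing less fragile than scraping human-oriented commands; a sketch:

```python
# Hypothetical sketch of the "call git directly and parse" approach.
# for-each-ref with an explicit format string gives stable,
# machine-readable output, unlike the human-oriented `git branch`.
import subprocess


def list_branches(checkout_path):
    output = subprocess.check_output(
        ['git', 'for-each-ref', '--format=%(refname:short)', 'refs/heads/'],
        cwd=checkout_path,
    )
    return output.decode('utf-8').splitlines()
```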
It wasn't just unicode branches; it was the actual parsing, where we weren't consistently parsing the output correctly, and gitpython does all of this for us. It does seem like a lot of duplicated effort for a jankier solution. We could also look into gitpython/hg internals and see if it's possible to replace the execution method, or to parse a different i/o stream than the system i/o (i/o from docker exec in our case).
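On the "replace the execution method" idea, GitPython does expose hooks that make it at least thinkable; a rough, untested sketch (the `docker exec` prefix and container name are purely illustrative):

```python
# Hypothetical sketch: swap GitPython's execution method so that every
# git invocation is prefixed to run inside a container.
# Repo.GitCommandWrapperType and Git.execute are real GitPython hooks;
# the docker routing shown here is illustrative only.
import git


class DockerGit(git.Git):
    def execute(self, command, **kwargs):
        # prefix the git command so it runs inside the build container
        command = ['docker', 'exec', 'build-container'] + list(command)
        return super().execute(command, **kwargs)


class DockerRepo(git.Repo):
    GitCommandWrapperType = DockerGit
```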
Closing in favor of #6436. There is still one open decision: whether to use a docker image with just git, or to use our build images.
This is a POC from #5673 (comment)
I'm making a PR so it's easy to discuss some problems. Or so I hope.
Design decisions
How to install/copy the script?
Currently, I'm just mounting a volume inside the container, but we can make it an installable package.
How to install the deps?
Currently, I'm installing gitpython in the global python environment of the container, but we can create a virtualenv just for this package (we need to refactor our class that creates virtualenvs for this).
Other problems
We have a chicken/egg problem: we don't know which build image to use before the checkout step, so we'll be using two build environments, one for the checkout process (we can use any docker image we want) and another for the normal build process. We can keep it this way and initialize the ssh agent on each image, or use the providers' API to query the config file before cloning. The second option introduces more changes and removes support for repos that are not hosted on those providers, so I think we are fine with the first option for now (this may be a problem when we move to lambda functions, I think?). A sketch of the first option follows below.
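To make the two-environment flow concrete, a sketch using docker-py with a shared volume (image names, script name, and paths are all placeholders):

```python
# Hypothetical sketch of the two-environment flow: a small checkout
# container clones into a shared volume, then the real build image
# works on the same checkout. All names and paths are placeholders.
import docker

client = docker.from_env()
shared = {'/srv/rtd/checkouts/project': {'bind': '/project', 'mode': 'rw'}}

# 1. Checkout step: any small image with git is enough here.
client.containers.run(
    'rtd/checkout:latest',
    command='python /rtd_scripts/checkout_git.py https://github.com/user/repo /project',
    volumes=shared,
    remove=True,
)

# 2. Build step: now the config file is readable, so the right build
#    image can be chosen and run against the same volume.
client.containers.run(
    'readthedocs/build:latest',
    command='make html',
    working_dir='/project',
    volumes=shared,
    remove=True,
)
```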