Skip to content

WIP: docker in docker fix #198

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

sukovanej
Copy link
Contributor

@sukovanej sukovanej commented Jun 11, 2018

Please note that - I'm not a JS dev, I actually haven't studied the whole project and the fix is something I really needed to make 'it work'. I believe it's not the best solution and I'm totally open to a discussion.

So the problem is that in CI I need to run sls deploy and the CI stage itself is running in a container. After some research, I found that the bindPath is assigned to its host $(pwd)$ value. The problem is, the 'docker container inside the CI docker container' thinks the host is the main CI container so the path is actually wrong. So I simply shared $(pwd) from the original host to the CI container as a host_pwd env variable and let the bindPath to take that value in case the dockerInDocker option is true.

In .gitlab-ci.yml

    - export host_pwd=$CI_PROJECT_DIR

and of course, the whole stage must be run in a container capable of running another container and in config.toml there must be docker socket specified.

and in serverless.yml

custom:
    pythonRequirements:
        dockerInDocker: true
        dockerizePip: true

What do you think?

@sukovanej sukovanej force-pushed the sukovanej/docker-in-docker-fix branch from 5d9df7b to 4c1b2b5 Compare June 11, 2018 13:39
@dschep
Copy link
Contributor

dschep commented Jun 19, 2018

I think a bind path option would be preferable. And you'd use it like:

custom:
    pythonRequirements:
        dockerizePip: true
        dockerBindPath: ${env:CI_PROJECT_DIR}

@kichik @heri16, this could also be a solution to weird windows bind paths (tho definitely still less user-friendly than the current solution)

@kichik
Copy link
Contributor

kichik commented Jun 19, 2018

Could be. In that case it would be cool if we can also have this as a command line argument. That's easier to do with different development environments.

@dschep
Copy link
Contributor

dschep commented Jun 19, 2018

@kichik any option can be a CLI option using serverless' variable system 😉

custom:
    pythonRequirements:
        dockerizePip: true
        dockerBindPath: ${opt:dockerBindPath}

then use sls --dockerBindPath /foobar package

@heri16
Copy link
Contributor

heri16 commented Jun 22, 2018

@sukovanej could you elaborate on what is meant by the bindPath is assigned to its host pwd value? 'docker container inside the CI docker container' thinks the host is the main CI container? You meant the docker container spawned by this plugin? Are you able to check the servicePath returned by the serverless framework inside your CI?

I have a feeling this is a problem caused by an incorrect setup/wiring of Docker, and not likely a problem with the plugin.

@sukovanej
Copy link
Contributor Author

@dschep Yep, that seems like a better solution to me. 👍

dschep pushed a commit that referenced this pull request Jul 17, 2018
Updates the requirements on [fs-extra](https://github.com/jprichardson/node-fs-extra) to permit the latest version.
<details>
<summary>Changelog</summary>

*Sourced from [fs-extra's changelog](https://github.com/jprichardson/node-fs-extra/blob/master/CHANGELOG.md).*

> 7.0.0 / 2018-07-16
> ------------------
> 
> - **BREAKING:** Refine `copy*()` handling of symlinks to properly detect symlinks that point to the same file. ([#582](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/582))
> - Fix bug with copying write-protected directories ([#600](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/600))
> - Universalify `fs.lchmod()` ([#596](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/596))
> - Add `engines` field to `package.json` ([#580](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/580))
> 
> 6.0.1 / 2018-05-09
> ------------------
> 
> - Fix `fs.promises` `ExperimentalWarning` on Node v10.1.0 ([#578](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/578))
> 
> 6.0.0 / 2018-05-01
> ------------------
> 
> - Drop support for Node.js versions 4, 5, & 7 ([#564](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/564))
> - Rewrite `move` to use `fs.rename` where possible ([#549](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/549))
> - Don't convert relative paths to absolute paths for `filter` ([#554](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/554))
> - `copy*`'s behavior when `preserveTimestamps` is `false` has been OS-dependent since 5.0.0, but that's now explicitly noted in the docs ([#563](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/563))
> - Fix subdirectory detection for `copy*` & `move*` ([#541](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/541))
> - Handle case-insensitive paths correctly in `copy*` ([#568](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/568))
> 
> 5.0.0 / 2017-12-11
> ------------------
> 
> Significant refactor of `copy()` & `copySync()`, including breaking changes. No changes to other functions in this release.
> 
> Huge thanks to **[[**manidlou**](https://github.com/manidlou)](https://github.com/manidlou)** for doing most of the work on this release.
> 
> - The `filter` option can no longer be a RegExp (must be a function). This was deprecated since fs-extra v1.0.0. [#512](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/512)
> - `copy()`'s `filter` option can now be a function that returns a Promise. [#518](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/518)
> - `copy()` & `copySync()` now use `fs.copyFile()`/`fs.copyFileSync()` in environments that support it (currently Node 8.5.0+). Older Node versions still get the old implementation. [#505](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/505)
> - Don't allow copying a directory into itself. [#83](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/issues/83)
> - Handle copying between identical files. [#198](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/issues/198)
> - Error out when copying an empty folder to a path that already exists. [#464](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/issues/464)
> - Don't create `dest`'s parent if the `filter` function aborts the `copy()` operation. [#517](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/517)
> - Fix `writeStream` not being closed if there was an error in `copy()`. [#516](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/516)
> 
> 4.0.3 / 2017-12-05
> ------------------
> 
> - Fix wrong `chmod` values in `fs.remove()` [#501](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/501)
> - Fix `TypeError` on systems that don't have some `fs` operations like `lchown` [#520](https://github-redirect.dependabot.com/jprichardson/node-fs-extra/pull/520)
> 
> 4.0.2 / 2017-09-12
> ------------------
> 
> - Added `EOL` option to `writeJson*` & `outputJson*` (via upgrade to jsonfile v4)
> - Added promise support to [`fs.copyFile()`](https://nodejs.org/api/fs.html#fs_fs_copyfile_src_dest_flags_callback) in Node 8.5+
></table> ... (truncated)
</details>
<details>
<summary>Commits</summary>

- See full diff in [compare view](https://github.com/jprichardson/node-fs-extra/commits/7.0.0)
</details>
<br />

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot ignore this [patch|minor|major] version` will close this PR and stop Dependabot creating any more for this minor/major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `@dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `@dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `@dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `@dependabot badge me` will comment on this PR with code to add a "Dependabot enabled" badge to your readme

Additionally, you can set the following in your Dependabot [dashboard](https://app.dependabot.com):
- Update frequency (including time of day and day of week)
- Automerge options (never/patch/minor, and dev/runtime dependencies)
- Pull request limits (per update run and/or open at any time)
- Out-of-range updates (receive only lockfile updates, if desired)
- Security updates (receive only security updates, if desired)

Finally, you can contact us by mentioning @dependabot.

</details>
@brettdh
Copy link

brettdh commented Oct 10, 2018

@sukovanej I have the same issue - using dockerizePip in Gitlab CI, which itself runs docker-in-docker. Oddly, I only noticed a problem starting recently, when I upgraded the plugin from 3.0.7 to 4.2.4. Thanks for tracking this down and figuring it out.

Do you still plan to adjust this PR to implement a general bindPath option, or should I perhaps create a new one to do the same?

@heri16 it's possible that this is just a band-aid. I don't think I fully understand the problem yet, but I have observed the symptom:

Could not open requirements file: [Errno 2] No such file or directory: '/var/task/requirements.txt'

but I also recall other volume-related issues in this CI setup, so I'd need to dig some more to be sure that it's the same issue.

@marlonchalegre
Copy link

Is this issue solved? I'm facing the same problem to build in a docker in docker environment

@campellcl
Copy link

Yup, same issue here when using Jenkins CI with docker in docker.

@AndrewFarley
Copy link
Contributor

I’m going to help pull this over the finish line. I will try to test this out more completely and add a test for it and recommend it’s merging when complete. I think DID is quite critical for some folks ci systems. I myself use kubernetes quite heavily. I will test and fiddle with this today. Might have a few more commits to this to add some tests and docs. Cheers

@campellcl
Copy link

@AndrewFarley I know DinD is not necessarily a recommended approach these days, but I agree; it still is critical in CI/CD systems such as Jenkins. In the case of Jenkins, the official tutorials still support it's usage. So it would be very helpful if serverless-python-requirements functioned out-of-the-box in CI environments that still leverage the legacy DinD approach. Thanks a ton for looking into this!

@AndrewFarley
Copy link
Contributor

So, I hacked on this a bit this weekend and have some preliminary results.

First, the concept from this MR does work, the challenge is how to make it work reliably. Let me discuss/debate/share some thoughts...

So, first assuming we have docker in docker support, means that I should be able to run the docker command from within docker so I can look-up information about the docker container I am in right now. In order to do that I need to know what container ID I am running in, for that I found a neat cgroup trick I didn't know to lookup what container ID you have from within' the container..

With that above, I can query where the docker volume overlay is on the host and then can use this as a prefix for any bind url(s). This will prevent what this MR's author had to do to make this MR work, which is to know the actual storage path of the host and set that as an environment variable. In many cases, you won't know because it won't be hardcoded (eg: Kubernetes), and doing so is an undue burden on users.

So, I worked out a PoC doing exactly this and it seems to work reliably. However... this is one of a multitude of use-cases and configurations of how docker would be setup. Off the top of my head, an edge case that would fail is if any volumes are mounted which would contain the actual code that you plan to serverless deploy from. Then your root overlay would not contain the serverless data, but the volume mount would. I would need to add code to sift through the volumes mounted in the data from docker inspect and detect if the code you were trying to deploy is within' those volumes, and if so, use THAT as a file base instead of the root volume overlay from the trick above. I'm thinking that I may need to add "two" new features, one to try to auto-detect the overlay/volume from the host and make it work magically, the second is to just allow you to specify what the volume will be if you know where the data volume is on the host. The more I look at and think about this the more daunting of a task this becomes because I see all the edge cases and differing setups.

My goal, for any MR/PR I would create or approve for this would be such that...

• Docker in Docker works from a plain ubuntu 18.x instance just running Docker CE
• Docker in Docker works from AWS EKS Kubernetes (default config)
• Docker in Docker works from Google Cloud GKE (default config)

I probably won't have time to fiddle more until next weekend. But will post here if/when I do. Feel free to respond/comment if you have any input/ideas/support to anything I mentioned.

@campellcl
Copy link

@AndrewFarley Thanks again for looking into this! I haven't dealt with Kubernetes, but In my opinion I don't think it is too unreasonable for the user to have to know the which volumes they have mounted on the host os. If it makes your life easier, it may be worthwhile bringing the discussion back to CI/CD, and consider the standard Docker configurations for the major CI/CD players.

From the Jenkins standpoint, I believe the user is utilizing bind mounts instead of volumes. On the host OS, there is both a DinD container running, and the Jenkins container running. The Jenkins container deploys one-or-many additional "Agent" containers which reference the mounted volumes containing the actual Serverless code to be deployed. The Serverless deployment is happening inside the Jenkins "Agent" container, and not inside the Jenkins container itself.

This is indeed difficult, because theoretically speaking you could be deploying Serverless code inside an arbitrary number of levels of nested Docker containers. I would just try and support the configurations adopted by the major CI/CD players instead of worrying about all potential DinD use cases. As DinD is slowly being replaced anyway by bind mounting the the Docker socket.

@brettdh
Copy link

brettdh commented Mar 2, 2020

@ccampell I would disagree that DinD is being replaced by bind-mounting the docker socket; each approach has its advantages and disadvantages. The article you linked is over four years old, and DinD remains a "recommended configuration" for building docker images with Gitlab CI.

@campellcl
Copy link

@brettdh Fair enough, that is a valid point! Here is a more recent article discussing the pros and cons of each approach. I'm inclined to agree with you, and amend my previous statement. I meant that for the particular use case of being able to run Docker commands from the confines of a containerized CI/CD system, from what I have seen, there is a general trend toward the transition of bind mounting the Docker socket and utilizing the sidecar pattern. But as always, nothing is strictly black and white, and your right there are advantages and disadvantages to each approach.

@AndrewFarley
Copy link
Contributor

There's no reason to debate what the best/current/more secure practice of accomplishing Docker in Docker because this is out of the scope of this plugin. This plugin will/does not care "how" you made your Docker CLI command work inside of Docker, that's on you and/or your admin to accomplish.

@ccampell To your point, I checked in a few systems that I manage and all of them you don't specify where your volumes/binds are mounted they are generated, and while I understand you do, my experience tells me this is unusual. Additionally, most CI workflows (that I use) just check out the code whilst you are already inside your first Docker container, thus you don't know the mount ahead of time. Having a workflow differently seems... odd to me, but I'm not debating that you can know the mount ahead of time. Anyways, as I explained, I overcame that limitation/issue so it's a moot point.

I personally mostly run Kubernetes clusters for clients, and I heavily lean on GitlabCI personally, with the odd Jenkins setup or two and nowadays some Gitlab Actions as well, but to accomplish CI/CD for the serverless framework with this plugin for my clients I typically use an LambCI image directly as the image I start (typically inside Kubernetes) and then from in there install the Serverless framework, check out the code, and run this plugin which in that environment means I don't need Docker in Docker, I already have a comparable environment to AWS that I'm building in. I've avoided needing to use DinD entirely which is why I never really thought this feature was needed. In a sense, I still don't, but I don't want to generalise and assume.

In my research for solving all of this, I did stumble onto a DinD alternative which I think is very interesting and might handle this whole thing a LOT better, but it requires a lot more time investment than I'm willing to give. I thought this might be interesting to you folks who do Docker in Docker and CI stuff: https://github.com/GoogleContainerTools/kaniko . If that does what it says it does, I personally would probably shift most of my CI automation over from DinD to Kaniko simply for security purposes in any future CI setups of mine. But, alas, imho that may be out of the scope of solving this direct issue, if only because I have no experience with that tool at this time. Again keep y'all posted will have a branch you guys can try soon to get some feedback.

@AndrewFarley
Copy link
Contributor

@ccampell / @brettdh (or anyone) Want to test for me? I tested via Docker via the mounted sock and via the DinD sidecar, both seem completely functional with or without caching. There are no tests or anything, there is verbose logging as well. I created a fork, you can use it with...

npm remove --save serverless-python-requirements
npm install --save github:andrewfarley/serverless-python-requirements#dockerindocker

From here, you either run git add/commit/push and let your CI do its thing depending how you have it setup, or just run serverless package to test out the packaging. If you are in docker, it should detect docker-in-docker and detect the volume path and run properly.

I confirm this functional on...
• OS-X via /var/docker sock volume mount (wow, cool huh?)
• Linux Ubuntu 18.04 Test Server with Docker via .sock volume mount
• Linux Ubuntu 18.04 Test Server with Docker via dind sidecar
• Briefly worked on Kubernetes, but need to test more

Here's a snapshot of it working nicely on my Gitlab AWS autoscaling runners. NOTE: This of course did not work before, it now works beautifully.

Screen Shot 2020-03-03 at 4 13 53 PM

Feedback welcome!!!

@AndrewFarley AndrewFarley mentioned this pull request Mar 3, 2020
6 tasks
@AndrewFarley
Copy link
Contributor

I'm going to close this MR I've taken over the work and will finish it in #484. Please continue the conversation and add comments over there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants