Skip to content

Add windows driver installation support #510

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions deploy/kubernetes/base/controller.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,8 @@ spec:
# since it replaces GCE Metadata Server with GKE Metadata Server. Remove
# this requirement when issue is resolved and before any exposure of
# metrics ports
nodeSelector:
kubernetes.io/os: linux
hostNetwork: true
serviceAccountName: csi-gce-pd-controller-sa
priorityClassName: csi-gce-pd-controller
Expand Down Expand Up @@ -82,5 +84,10 @@ spec:
- name: cloud-sa-volume
secret:
secretName: cloud-sa
# https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
# See "special case". This will tolerate everything. Node component should
# be scheduled on all nodes.
tolerations:
- operator: Exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means that the controller may not get rescheduled if the Node becomes unhealthy. Is that what we want?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will remove it. I added it because I see node has it. And my test setup has linux tainted.

# This is needed due to https://github.com/kubernetes-sigs/kustomize/issues/504
volumeClaimTemplates: []
1 change: 0 additions & 1 deletion deploy/kubernetes/base/kustomization.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,5 @@ commonLabels:
namespace:
gce-pd-csi-driver
resources:
- node.yaml
- controller.yaml
- setup-cluster.yaml
22 changes: 13 additions & 9 deletions deploy/kubernetes/base/setup-cluster.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -164,15 +164,6 @@ spec:
volumes:
- '*'
hostNetwork: true
allowedHostPaths:
- pathPrefix: "/var/lib/kubelet/plugins_registry/"
- pathPrefix: "/var/lib/kubelet"
- pathPrefix: "/var/lib/kubelet/plugins/pd.csi.storage.gke.io/"
- pathPrefix: "/dev"
- pathPrefix: "/etc/udev"
- pathPrefix: "/lib/udev"
- pathPrefix: "/run/udev"
- pathPrefix: "/sys"
---

kind: ClusterRole
Expand All @@ -199,6 +190,19 @@ subjects:
- kind: ServiceAccount
name: csi-gce-pd-node-sa

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: csi-gce-pd-controller
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: csi-gce-pd-node-deploy
subjects:
- kind: ServiceAccount
name: csi-gce-pd-controller-sa

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this binding, because without it, when pod security policy admission controller is enabled, service account csi-gce-pd-controller-sa cannot create pod at all.

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
Expand Down
7 changes: 6 additions & 1 deletion deploy/kubernetes/delete-driver.sh
Original file line number Diff line number Diff line change
Expand Up @@ -13,11 +13,16 @@ set -o errexit
readonly NAMESPACE="${GCE_PD_DRIVER_NAMESPACE:-gce-pd-csi-driver}"
readonly DEPLOY_VERSION="${GCE_PD_DRIVER_VERSION:-stable}"
readonly PKGDIR="${GOPATH}/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver"
readonly DEPLOY_OS_VERSIONS=${DEPLOY_OS_VERSIONS:-"linux stable"}
source "${PKGDIR}/deploy/common.sh"

ensure_kustomize

${KUSTOMIZE_PATH} build ${PKGDIR}/deploy/kubernetes/overlays/${DEPLOY_VERSION} | ${KUBECTL} delete -v="${VERBOSITY}" --ignore-not-found -f -
echo ${DEPLOY_OS_VERSIONS} | tr ';' '\n' | while read -r NODE_OS VERSION; do \
VERSION="${VERSION:-${DEPLOY_VERSION}}"
${KUSTOMIZE_PATH} build ${PKGDIR}/deploy/kubernetes/overlays/${NODE_OS}/${VERSION} | ${KUBECTL} delete -v="${VERBOSITY}" --ignore-not-found -f -; \
done

${KUBECTL} delete secret cloud-sa -v="${VERBOSITY}" --ignore-not-found

if [[ ${NAMESPACE} != "" && ${NAMESPACE} != "default" ]] && \
Expand Down
10 changes: 7 additions & 3 deletions deploy/kubernetes/deploy-driver.sh
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@ set -x
readonly NAMESPACE="${GCE_PD_DRIVER_NAMESPACE:-gce-pd-csi-driver}"
readonly DEPLOY_VERSION="${GCE_PD_DRIVER_VERSION:-stable}"
readonly PKGDIR="${GOPATH}/src/sigs.k8s.io/gcp-compute-persistent-disk-csi-driver"
readonly DEPLOY_OS_VERSIONS=${DEPLOY_OS_VERSIONS:-"linux stable"}
source "${PKGDIR}/deploy/common.sh"

print_usage()
Expand Down Expand Up @@ -94,7 +95,10 @@ fi
# Debug log: print ${KUBECTL} version
${KUBECTL} version

readonly tmp_spec=/tmp/gcp-compute-persistent-disk-csi-driver-specs-generated.yaml
${KUSTOMIZE_PATH} build ${PKGDIR}/deploy/kubernetes/overlays/${DEPLOY_VERSION} | tee $tmp_spec
${KUBECTL} apply -v="${VERBOSITY}" -f $tmp_spec
echo ${DEPLOY_OS_VERSIONS} | tr ';' '\n' | while read -r NODE_OS VERSION; do \
VERSION="${VERSION:-${DEPLOY_VERSION}}"; \
tmp_spec=/tmp/gcp-compute-persistent-disk-csi-driver-specs-generated-${NODE_OS}.yaml; \
${KUSTOMIZE_PATH} build ${PKGDIR}/deploy/kubernetes/overlays/${NODE_OS}/${VERSION} | tee $tmp_spec; \
${KUBECTL} apply -v="${VERBOSITY}" -f $tmp_spec; \
done

14 changes: 14 additions & 0 deletions deploy/kubernetes/overlays/linux/base/allowedHostPaths.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: csi-gce-pd-node-psp
spec:
allowedHostPaths:
- pathPrefix: "/var/lib/kubelet/plugins_registry/"
- pathPrefix: "/var/lib/kubelet"
- pathPrefix: "/var/lib/kubelet/plugins/pd.csi.storage.gke.io/"
- pathPrefix: "/dev"
- pathPrefix: "/etc/udev"
- pathPrefix: "/lib/udev"
- pathPrefix: "/run/udev"
- pathPrefix: "/sys"
8 changes: 8 additions & 0 deletions deploy/kubernetes/overlays/linux/base/enableHostNetwork.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
kind: DaemonSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file can be removed now right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, fix it

apiVersion: apps/v1
metadata:
name: csi-gce-pd-node
spec:
template:
spec:
hostNetwork: true
31 changes: 31 additions & 0 deletions deploy/kubernetes/overlays/linux/base/kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../../base
namespace:
gce-pd-csi-driver
resources:
- node.yaml
patchesStrategicMerge:
- allowedHostPaths.yaml
images:
- name: gke.gcr.io/gcp-compute-persistent-disk-csi-driver
# Don't change stable image without changing pdImagePlaceholder in
# test/k8s-integration/main.go
newName: gke.gcr.io/gcp-compute-persistent-disk-csi-driver
newTag: "v0.7.0-gke.0"
- name: gke.gcr.io/csi-provisioner
newName: gke.gcr.io/csi-provisioner
newTag: "v1.5.0-gke.0"
- name: gke.gcr.io/csi-attacher
newName: gke.gcr.io/csi-attacher
newTag: "v2.1.1-gke.0"
- name: gke.gcr.io/csi-node-driver-registrar
newName: gke.gcr.io/csi-node-driver-registrar
newTag: "v1.2.0-gke.0"
- name: gke.gcr.io/csi-resizer
newName: gke.gcr.io/csi-resizer
newTag: "v0.4.0-gke.0"
- name: gke.gcr.io/csi-snapshotter
newName: gke.gcr.io/csi-snapshotter
newTag: "v2.1.1-gke.0"
Original file line number Diff line number Diff line change
Expand Up @@ -15,21 +15,20 @@ spec:
# Host network must be used for interaction with Workload Identity in GKE
# since it replaces GCE Metadata Server with GKE Metadata Server. Remove
# this requirement when issue is resolved and before any exposure of
# metrics ports.
# metrics ports. But hostNetwork is not working for Windows, might be an issue
# when deploying on GKE Windows node.
hostNetwork: true
priorityClassName: csi-gce-pd-node
serviceAccountName: csi-gce-pd-node-sa
nodeSelector:
kubernetes.io/os: linux
containers:
- name: csi-driver-registrar
image: gke.gcr.io/csi-node-driver-registrar
args:
- "--v=5"
- "--csi-address=/csi/csi.sock"
- "--kubelet-registration-path=/var/lib/kubelet/plugins/pd.csi.storage.gke.io/csi.sock"
lifecycle:
preStop:
exec:
command: ["/bin/sh", "-c", "rm -rf /registration/pd.csi.storage.gke.io /registration/pd.csi.storage.gke.io-reg.sock"]
env:
- name: KUBE_NODE_NAME
valueFrom:
Expand All @@ -41,14 +40,14 @@ spec:
- name: registration-dir
mountPath: /registration
- name: gce-pd-driver
securityContext:
privileged: true
# Don't change base image without changing pdImagePlaceholder in
# test/k8s-integration/main.go
image: gke.gcr.io/gcp-compute-persistent-disk-csi-driver
args:
- "--v=5"
- "--endpoint=unix:/csi/csi.sock"
securityContext:
privileged: true
volumeMounts:
- name: kubelet-dir
mountPath: /var/lib/kubelet
Expand All @@ -67,8 +66,6 @@ spec:
mountPath: /run/udev
- name: sys
mountPath: /sys
nodeSelector:
kubernetes.io/os: linux
volumes:
- name: registration-dir
hostPath:
Expand Down Expand Up @@ -108,4 +105,4 @@ spec:
# See "special case". This will tolerate everything. Node component should
# be scheduled on all nodes.
tolerations:
- operator: Exists
- operator: Exists
13 changes: 13 additions & 0 deletions deploy/kubernetes/overlays/linux/base/noderegistrar.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
kind: DaemonSet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this file can be removed now right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix it.

apiVersion: apps/v1
metadata:
name: csi-gce-pd-node
spec:
template:
spec:
containers:
- name: csi-driver-registrar
args:
- "--v=5"
- "--csi-address=/csi/csi.sock"
- "--kubelet-registration-path=/var/lib/kubelet/plugins/pd.csi.storage.gke.io/csi.sock"
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,4 @@ BROKEN AT ANY TIME
This is the absolute cutting edge development Driver, it is intended for testing
and development only and can have vast differences in
functionality/behavior/configuration. Use only to try the newest features that
are not guaranteed to work yet.

APPROXIMATE CHANGELOG in latest:
* Topology
* RePD
* Volume ID Format Changed
* Node ID Format Changed
* Parameter "zone" Removed
are not guaranteed to work yet.
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
- ../base
images:
- name: gke.gcr.io/gcp-compute-persistent-disk-csi-driver
newName: gcr.io/gke-release-staging/gcp-compute-persistent-disk-csi-driver
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
- ../base
images:
- name: gke.gcr.io/gcp-compute-persistent-disk-csi-driver
newName: gcr.io/gke-release-staging/gcp-compute-persistent-disk-csi-driver
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
bases:
- ../../base
- ../base
images:
- name: gke.gcr.io/gcp-compute-persistent-disk-csi-driver
# Don't change stable image without changing pdImagePlaceholder in
Expand Down
12 changes: 12 additions & 0 deletions deploy/kubernetes/overlays/windows/base/allowedHostPaths.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: csi-gce-pd-node-psp
spec:
allowedHostPaths:
- pathPrefix: \var\lib\kubelet
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is going to replace the current PSP with these rules, instead of add, which is not desirable if we're going to be running both linux + windows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, will try to make a merge here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I think it is better to have separate policy for linux and windows instead of merging

- pathPrefix: \var\lib\kubelet\plugins_registry
- pathPrefix: \var\lib\kubelet\plugins\pd.csi.storage.gke.io
- pathPrefix: \\.\pipe\csi-proxy-disk-v1alpha1
- pathPrefix: \\.\pipe\csi-proxy-volume-v1alpha1
- pathPrefix: \\.\pipe\csi-proxy-filesystem-v1alpha1
36 changes: 36 additions & 0 deletions deploy/kubernetes/overlays/windows/base/kustomization.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
apiVersion: kustomize.config.k8s.io/v1beta1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you going to need multiple overlays for windows too (ie staging, stable, dev)?

The challenge I see with this structure is that we now have 2 places that we need to keep in sync when we update overlays. For example, if we make a update to staging, then we need to update both windows and linux staging overlays.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with one overlay, we have both windows and linux together. What if user need to deploy with mixed OS and versions, e.g, linux stable, and windows dev etc? Or just because some versions are only available in linux, not windows?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll still need to have different overlays for dev/alpha/staging/stable and we can control the versions used in each. For example, the dev overlay could use stable versions for linux, and unstable versions for windows. And in terms of development flow, we can add Windows first to the dev overlay, then once we're ready, add it to staging, and then when we're ready to release, add it to stable.

Copy link
Contributor Author

@jingxu97 jingxu97 Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I miss something here, so you think user will not need a mixed version for windows and linux. If they want to deploy stable version, it will be stable version for both linux and windows? But what if windows don't have stable version yet, user must deploy dev version for both linux and windows, instead of one stable version and one dev version?

Copy link
Contributor Author

@jingxu97 jingxu97 Jun 4, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also in the case user already deploy a Linux driver, and want to add a windows driver (might be a different version from Linux), with one overlay, it will try to deploy both linux and windows. Some setup might be messed up with different versions of deployment?

In the current layout, we have common things for both linux and windows in the base, and different things in overlay. So new changes, either common to linux and windows, which will go to base, or different from linux and windows, and we need to update them wherever applicable. So I don't see there is more work needed compared to one overlay which contains both linux and windows. Also there might be cases that some setup is only applicable for linux, and we might accidentally break windows setup if we don't pay attention to the windows part?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have been thinking the structure again last night :), and realized this is more like a multi-stage process. So instead of base/overlay structure, I propose to use the following as multiple stages and use them as bases. I am working on this structure now and will update. Please let me know what you think about this structure.

  • cluster_controller_setup : includes cluster setup and controller yaml file

  • node_setup:
    * Linux: node and other related yaml files for Linux
    * Windows: node and other related yaml files for Windows

  • image_update
    * alpha:
    *. stable:
    *. staging:
    * dev:
    ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure how this would work out but am curious to see. One thing to note is that alpha/staging/stable is not necessarily just different versions. Different features may be added depending on their maturity.

what if windows don't have stable version yet, user must deploy dev version for both linux and windows, instead of one stable version and one dev version?

In terms of looking at the driver itself, I think we should consider both Linux + Windows parts of the driver to be part of one complete driver instead of 2 independent drivers. So considering that Windows is still alpha right now, I think it is fine if it uses the Linux alpha versions as well (which currently don't have any difference from stable). I don't think a user will want to have a mixed Linux + Windows cluster where the Linux part is stable and the Windows part is alpha. I would expect a user to have 2 separate clusters: a Linux-only cluster used for production, and an experimental Windows-only cluster.

Also in the case user already deploy a Linux driver, and want to add a windows driver (might be a different version from Linux), with one overlay, it will try to deploy both linux and windows. Some setup might be messed up with different versions of deployment?

I agree with one deployment and one place to define all the versions, it's easier to make sure the versions are in-sync with each other.

kind: Kustomization
bases:
- ../../../base
namespace:
gce-pd-csi-driver
resources:
- node.yaml
patchesStrategicMerge:
- allowedHostPaths.yaml
images:
- name: gke.gcr.io/gcp-compute-persistent-disk-csi-driver
# Don't change stable image without changing pdImagePlaceholder in
# test/k8s-integration/main.go
newName: gke.gcr.io/gcp-compute-persistent-disk-csi-driver
newTag: "v0.7.0-gke.0"
- name: gke.gcr.io/gcp-compute-persistent-disk-csi-driver-win
# Temporarly set to the private repo. Will swtich to public one
# once it is available.
newName: gcr.io/jing-k8s-dev/gce-pd-windows-2019
newTag: "0.2.0"
- name: gke.gcr.io/csi-provisioner
newName: gke.gcr.io/csi-provisioner
newTag: "v1.5.0-gke.0"
- name: gke.gcr.io/csi-attacher
newName: gke.gcr.io/csi-attacher
newTag: "v2.1.1-gke.0"
- name: gke.gcr.io/csi-node-driver-registrar
newName: gcr.io/k8s-staging-csi/csi-node-driver-registrar
newTag: "amd64-windows-v20200428-v1.3.0-26-g510710d5"
- name: gke.gcr.io/csi-resizer
newName: gke.gcr.io/csi-resizer
newTag: "v0.4.0-gke.0"
- name: gke.gcr.io/csi-snapshotter
newName: gke.gcr.io/csi-snapshotter
newTag: "v2.1.1-gke.0"
Loading