Skip to content

Allow cross project snapshots and volumes #782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

christian-roggia
Copy link
Contributor

@christian-roggia christian-roggia commented May 19, 2021

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR allows using CSI snapshots across projects.

This feature enables GKE users to create VolumeSnapshots in a project FOO and import them in a project BAR.
This enhancement simplifies the migration process from between projects and improves the user experience with tools like Velero.

Which issue(s) this PR fixes:

Fixes #781

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

- It is now possible to access snapshots and volumes across different projects.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 19, 2021
@k8s-ci-robot
Copy link
Contributor

Welcome @christian-roggia!

It looks like this is your first PR to kubernetes-sigs/gcp-compute-persistent-disk-csi-driver 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gcp-compute-persistent-disk-csi-driver has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Copy link
Contributor

Hi @christian-roggia. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 19, 2021
@msau42
Copy link
Contributor

msau42 commented May 21, 2021

/ok-to-test
/assign @mattcary
This is an interesting idea but I'm not sure we'll be able to support it in GKE. The service account that we use the run the driver in GKE is limited to a single project. I'm not even sure a GCP service account can have cross project permissions?

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 21, 2021
@mattcary
Copy link
Contributor

At any rate it is confusing that as written we will silently change the project given in a snapshot id.

@jingxu97 it looks like you implemented this, do you have any background?

@christian-roggia
Copy link
Contributor Author

Service accounts can normally have cross-project permissions. This is actually a very common scenario.

I believe that the CSI driver has a dedicated "Service Agent" service account, but it should behave like any other service account.

Additionally, the CSI driver in GKE is mounting a Kubernetes service account. That Kubernetes-native SA can be used as a Workload Identity binding it to a GCP IAM service account managed by the project owner instead of GCP (i.e. this allows to avoid granting permissions directly to the "Service Agent" service account).

In both cases allowing the right IAM permissions cross-project should be possible.

@christian-roggia
Copy link
Contributor Author

@mattcary it is worth mentioning that the proposed PR solves the issue only for the lookup operation (i.e. GetSnapshot), but actually all operations currently performed by the driver have an hardcoded project.

I believe that in all cases where a fully qualified path is provided to the driver it should attempt to perform the operation with the project given in the path instead of silently replacing it.

I can update the PR with changes to all operations that should be provided with the project name.

@mattcary
Copy link
Contributor

I can update the PR with changes to all operations that should be provided with the project name.

+1 let's do that. Even if for some reason the cross-project roles don't work out (I think you're right but I want to double-check), we should not be changing the project name. eg a user may have different resources in two different projects sharing the same name. It would be much better to fail in that situation than silently switch projects....

@mattcary
Copy link
Contributor

/ok-to-test
/assign @mattcary
This is an interesting idea but I'm not sure we'll be able to support it in GKE. The service account that we use the run the driver in GKE is limited to a single project. I'm not even sure a GCP service account can have cross project permissions?

Cross-project snapshots are a thing :-) https://cloud.google.com/compute/docs/disks/create-snapshots#sharing_snapshots

It works without any additional roles being necessary when run as a user that is owner of two different projects. I don't see why a service account couldn't be given the same roles on two different projects either. However, in GKE, as Michelle said, which service account is used by master pods to authenticate to GCP is a little obscure. I'm still digging into that.

However, since cross-project snapshots work so smoothly I don't see any reason for the pd csi driver not to support them:

/clank/go/src/k8s.io/kubernetes$ gcloud compute instances list
NAME                                 ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP      STATUS
e2e-test-mattcary-master             us-central1-b  n1-standard-1               10.128.0.2   104.198.179.125  RUNNING
e2e-test-mattcary-minion-group-017w  us-central1-b  n1-standard-2               10.128.0.5   35.239.144.136   RUNNING
e2e-test-mattcary-minion-group-g4rb  us-central1-b  n1-standard-2               10.128.0.4   34.70.139.106    RUNNING
e2e-test-mattcary-minion-group-qtm1  us-central1-b  n1-standard-2               10.128.0.3   35.193.112.53    RUNNING
/clank/go/src/k8s.io/kubernetes$ gcloud compute disks create --zone us-central1-b --size 10G cross-test --project mattcary-e2e
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/mattcary-e2e/zones/us-central1-b/disks/cross-test].
NAME        ZONE           SIZE_GB  TYPE         STATUS
cross-test  us-central1-b  10       pd-standard  READY

New disks are unformatted. You must format and mount a disk before it
can be used. You can find instructions on how to do this at:

https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting

/clank/go/src/k8s.io/kubernetes$ gcloud compute instances attach-disk e2e-test-mattcary-minion-group-qtm1 --disk cross-test
Updated [https://www.googleapis.com/compute/v1/projects/mattcary-e2e/zones/us-central1-b/instances/e2e-test-mattcary-minion-group-qtm1].
/clank/go/src/k8s.io/kubernetes$ gcloud compute ssh e2e-test-mattcary-minion-group-qtm1
/clank/go/src/k8s.io/kubernetes$ gcloud compute instances describe e2e-test-mattcary-minion-group-qtm1
...
- autoDelete: false
  boot: false
  deviceName: persistent-disk-2
  diskSizeGb: '10'
  index: 2
  interface: SCSI
  kind: compute#attachedDisk
  mode: READ_WRITE
  source: https://www.googleapis.com/compute/v1/projects/mattcary-e2e/zones/us-central1-b/disks/cross-test
  type: PERSISTENT
...
/clank/go/src/k8s.io/kubernetes$ gcloud compute ssh e2e-test-mattcary-minion-group-qtm1
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ ls -l /dev/disk/by-id/google-persistent-disk-2
lrwxrwxrwx 1 root root 9 May 21 16:15 /dev/disk/by-id/google-persistent-disk-2 -> ../../sdc
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ ls /dev/sdc*
/dev/sdc
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ sudo mkfs.ext4 -m 0 /dev/sdc 
...
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ sudo mkdir /mnt/disks/cross-test
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ sudo mount /dev/sdc /mnt/disks/cross-test
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ ls /mnt/disks/cross-test
lost+found
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ echo cross-test | sudo tee -a /mnt/disks/cross-test/hello
cross-test
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ ls /mnt/disks/cross-test
hello  lost+found
mattcary@e2e-test-mattcary-minion-group-qtm1 ~ $ exit
logout
Connection to 35.193.112.53 closed.
/clank/go/src/k8s.io/kubernetes$ gcloud compute disks snapshot cross-test
Creating snapshot(s) ystif0358wgz...done.
/clank/go/src/k8s.io/kubernetes$ gcloud compute instances create snapshot-test --project mattcary-snapshot --machine-type e2-small
/clank/go/src/k8s.io/kubernetes$ gcloud compute disks create duplicate-cross-test --project mattcary-snapshot --source-snapshot projects/mattcary-e2e/global/snapshots/ystif0358wgz
Created [https://www.googleapis.com/compute/v1/projects/mattcary-snapshot/zones/us-central1-b/disks/duplicate-cross-test].
NAME                  ZONE           SIZE_GB  TYPE         STATUS
duplicate-cross-test  us-central1-b  10       pd-standard  READY
/clank/go/src/k8s.io/kubernetes$ gcloud compute instances attach-disk --project mattcary-snapshot snapshot-test --disk duplicate-cross-test
Updated [https://www.googleapis.com/compute/v1/projects/mattcary-snapshot/zones/us-central1-b/instances/snapshot-test].
/clank/go/src/k8s.io/kubernetes$ gcloud compute instances describe snapshot-test --project mattcary-snapshot
...
- autoDelete: false
  boot: false
  deviceName: persistent-disk-1
  diskSizeGb: '10'
  index: 1
  interface: SCSI
  kind: compute#attachedDisk
  mode: READ_WRITE
  source: https://www.googleapis.com/compute/v1/projects/mattcary-snapshot/zones/us-central1-b/disks/duplicate-cross-test
  type: PERSISTENT
...
/clank/go/src/k8s.io/kubernetes$ gcloud compute ssh --project mattcary-snapshot snapshot-test
mattcary@snapshot-test:~$ ls -l /dev/disk/by-id | grep disk-1
lrwxrwxrwx 1 root root  9 May 21 16:53 google-persistent-disk-1 -> ../../sdb
lrwxrwxrwx 1 root root  9 May 21 16:53 scsi-0Google_PersistentDisk_persistent-disk-1 -> ../../sdb
mattcary@snapshot-test:~$ sudo mount /dev/sdb /mnt
mattcary@snapshot-test:~$ ls /mnt
hello  lost+found
mattcary@snapshot-test:~$ cat /mnt/hello
cross-test

@christian-roggia
Copy link
Contributor Author

I should be able to find some time to work on it this week, do you have any clue why integration tests failed?

@mattcary
Copy link
Contributor

mattcary commented May 24, 2021 via email

@mattcary
Copy link
Contributor

/test pull-gcp-compute-persistent-disk-csi-driver-kubernetes-integration

I think (hope) that #783 fixed this.

@christian-roggia christian-roggia force-pushed the feature/cross-project-snapshot branch from 81f218a to 1364a18 Compare May 28, 2021 02:57
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels May 28, 2021
@christian-roggia christian-roggia changed the title Allow cross project snapshots Allow cross project snapshots and volumes May 28, 2021
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 28, 2021
@christian-roggia
Copy link
Contributor Author

Even though the tests are all green, I feel like new tests should be added (especially e2e) to verify that the CSI driver can import snapshots and volumes from another project.

It is also important to verify which service account requires additional permission when performing cross-project operations, and which IAM permissions should be granted.

As described in my original issue, the candidate service accounts are the following:

    "serviceAccount:[email protected]",
    "serviceAccount:[email protected]",
    "serviceAccount:service-PROJECT_NUMBER@container-engine-robot.iam.gserviceaccount.com",
    "serviceAccount:pdcsi-workload@PROJECT_ID.iam.gserviceaccount.com",

and maybe workload identity might be required for the CSI SA:

pdcsi-workload@PROJECT_ID.iam.gserviceaccount.com
PROJECT_ID.svc.id.goog[kube-system/pdcsi-node-sa]

finally, I would expect the SA to require at least the following IAM permissions for snapshots:

compute.snapshots.list
compute.snapshots.get
compute.snapshots.useReadOnly

with additional permissions on disks that might be necessary:

compute.disks.get
compute.disks.list
compute.disks.use
compute.disks.useReadOnly

@mattcary
Copy link
Contributor

Even though the tests are all green, I feel like new tests should be added (especially e2e) to verify that the CSI driver can import snapshots and volumes from another project.

Thanks! I think that's a great point. However, the logistics of that get complicated. I think it should be possible to do it by renting another project, but then that would have to be plumbed through to the e2e test which is breaking new ground and is therefore complicated?

So I'd be okay both with holding this change until those e2e tests are done, or pushing this out and relying on the VolumeIdToKey unit tests for now.

It's my understanding that on GKE, it will be enough to give the container-engine-robot service accounts roles in the cross-project---that's the account plumbed to the pd csi cloud provider. That of course requires running on GKE, which we do in our internal tests, and so that would also require more plumbing in the k8s-integration tests.

For GCE (ie, kube-up) we'd have to check in the /etc/gke.conf that's installed on the master node. We're in the middle of making the pd csi driver on by default for kube-up which might clarify this issue a bit.

WDYT? In any case thanks again for noticing & clearing up this cross-project problem.

@christian-roggia
Copy link
Contributor Author

However, the logistics of that get complicated. I think it should be possible to do it by renting another project, but then that would have to be plumbed through to the e2e test which is breaking new ground and is therefore complicated?

I was expecting that this setup wouldn't be simple, I would suggest pushing into master then and manually verify that cross-project VolumeSnapshots work without major issues, the Velero community might help to test this feature out too.

WDYT? In any case thanks again for noticing & clearing up this cross-project problem.

You are more than welcome! The CSI driver is a very powerful tool and I am happy to be able to help out.

@@ -306,7 +303,7 @@ func ValidateDiskParameters(disk *CloudDisk, params common.DiskParameters) error
return nil
}

func (cloud *CloudProvider) InsertDisk(ctx context.Context, volKey *meta.Key, params common.DiskParameters, capBytes int64, capacityRange *csi.CapacityRange, replicaZones []string, snapshotID string, multiWriter bool) error {
func (cloud *CloudProvider) InsertDisk(ctx context.Context, project string, volKey *meta.Key, params common.DiskParameters, capBytes int64, capacityRange *csi.CapacityRange, replicaZones []string, snapshotID string, multiWriter bool) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since disks cannot be used cross-project, why are we plumbing the project around everywhere? Wouldn't it be easier & less toil to use cloud.project for everything except the snapshot api?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding was that we didn't want to override the original project provided in the resource's names.

I can revert these changes if they don't make sense to you as they provide no additional functionalities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Thank you, I've spent too long between reviewing sessions.

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 1, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: christian-roggia, mattcary

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 1, 2021
@k8s-ci-robot k8s-ci-robot merged commit 7f5fa35 into kubernetes-sigs:master Jun 1, 2021
@husunal
Copy link

husunal commented Aug 18, 2021

Thanks for the PR. In order to consume snapshots between GCP projects, should we use VolumeSnapshotContent and specify the cross-project snapshot resource in the snapshotHandle?

@christian-roggia
Copy link
Contributor Author

@husunal yes, that would be the expected use of cross-projects snapshots.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Snapshots cannot be imported across projects
5 participants