Install a storage vendor which supports CSI snapshot in preview env #10201
From my side, I have no objections against Longhorn. It's good that you know it well @jenting and that on the Platform side we could collect a bit of experience with it, too.
With that being said, the Harvester cluster currently has a total storage capacity of slightly above 5 TB, and we intend it to run up to 65 preview envs in parallel. Based on these numbers, a preview env may use up to 76 GB on average. Because disks are currently so much larger than the space actually used on them, we've set Longhorn's "Storage Over Provisioning Percentage" to 1000.
I assume the proposal is to install Longhorn inside every preview env VM, and I assume Gitpod (running inside a preview env) will not interact with the Longhorn instance that's part of Harvester.
Proposed code changes: Lines 323 to 326 in bd8f2c7.

Disable the code at Lines 289 to 303 in bd8f2c7.
After thinking about it: Longhorn v1.2.4's current CSI snapshot/backup behavior backs up/restores the PVC content to remote S3, so it depends on an S3 bucket today. Since we want PVC backup/restore to stay on local disk only, we need to wait for the Longhorn v1.3.0 release (approximately June 09, 2022). Therefore, I'd either check other storage vendors such as Rook Ceph, OpenEBS, etc., or wait for the Longhorn v1.3.0 release.
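For context, Longhorn distinguishes the two behaviors via the `type` parameter on the VolumeSnapshotClass. A hedged sketch of the S3-backed variant (names are illustrative; this follows Longhorn's documented CSI snapshot support, and assumes a backup target is already configured in Longhorn):

```yaml
# "type: bak" uploads the snapshot to Longhorn's configured backup target
# (e.g. S3) — the v1.2.x behavior this comment describes.
kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1beta1
metadata:
  name: longhorn-backup-vsc   # illustrative name
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: bak
```

By contrast, `type: snap` (available from v1.3.0) keeps the snapshot on the cluster's local disks, which is the behavior we want here.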
Install the Longhorn v1.3.0-rc2 pre-release, and apply this VolumeSnapshotClass to the cluster in the workspace preview env:

```yaml
kind: VolumeSnapshotClass
apiVersion: snapshot.storage.k8s.io/v1beta1
metadata:
  name: longhorn-snapshot-vsc
driver: driver.longhorn.io
deletionPolicy: Delete
parameters:
  type: snap
```

Using the latest main branch, backup through the volume snapshot controller works as expected.
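A snapshot is then taken by creating a VolumeSnapshot that references this class. A minimal sketch (the PVC name `pvc-1` and snapshot name `vs-1` are illustrative, matching the test scenario in the issue description):

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: vs-1                         # illustrative snapshot name
spec:
  volumeSnapshotClassName: longhorn-snapshot-vsc
  source:
    persistentVolumeClaimName: pvc-1 # illustrative source PVC
```

Once the snapshot is ready, the CSI snapshotter binds it to a VolumeSnapshotContent object, which is what a later restore reads from.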
The problem is that if the original PV/PVC is gone, restoring the PVC from the VolumeSnapshot fails even though the VolumeSnapshot/VolumeSnapshotContent still exist, because Longhorn is unable to recreate the PV. The PVC's events show:

```
Warning ProvisioningFailed 72s driver.longhorn.io_csi-provisioner-869bdc4b79-rnp2r_52143314-b554-4361-818f-7267231ecee3 failed to provision volume with StorageClass "longhorn": rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [message=unable to create volume: unable to create volume pvc-c718f270-3672-488b-948d-7253611f4fad: failed to verify data source: cannot get client for volume pvc-750f59f7-81d4-4716-a55c-6eacc4bec5d6: engine is not running, code=Server Error, detail=] from [http://longhorn-backend:9500/v1/volumes]
Warning ProvisioningFailed 56s (x2 over 85s) driver.longhorn.io_csi-provisioner-869bdc4b79-rnp2r_52143314-b554-4361-818f-7267231ecee3 failed to provision volume with StorageClass "longhorn": rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [detail=, message=unable to create volume: unable to create volume pvc-c718f270-3672-488b-948d-7253611f4fad: failed to verify data source: cannot get client for volume pvc-750f59f7-81d4-4716-a55c-6eacc4bec5d6: engine is not running, code=Server Error] from [http://longhorn-backend:9500/v1/volumes]
Normal Provisioning 24s (x7 over 85s) driver.longhorn.io_csi-provisioner-869bdc4b79-rnp2r_52143314-b554-4361-818f-7267231ecee3 External provisioner is provisioning volume for claim "longhorn-system/test-restore-pvc"
Warning ProvisioningFailed 24s (x4 over 85s) driver.longhorn.io_csi-provisioner-869bdc4b79-rnp2r_52143314-b554-4361-818f-7267231ecee3 failed to provision volume with StorageClass "longhorn": rpc error: code = Internal desc = Bad response statusCode [500]. Status [500 Internal Server Error]. Body: [code=Server Error, detail=, message=unable to create volume: unable to create volume pvc-c718f270-3672-488b-948d-7253611f4fad: failed to verify data source: cannot get client for volume pvc-750f59f7-81d4-4716-a55c-6eacc4bec5d6: engine is not running] from [http://longhorn-backend:9500/v1/volumes]
Normal ExternalProvisioning 9s (x8 over 85s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "driver.longhorn.io" or manually created by system administrator
```
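For reference, the restore that fails here is a PVC whose `dataSource` points at a VolumeSnapshot. A minimal sketch (the claim name, namespace, and StorageClass are taken from the events above; the snapshot name and size are assumptions for illustration):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-restore-pvc      # claim name from the events above
  namespace: longhorn-system
spec:
  storageClassName: longhorn
  dataSource:
    name: vs-1                # assumed snapshot name, for illustration
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi            # illustrative size
```

It is this provisioning path that errors out once the source volume's engine is no longer running.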
Thank you for filing the bug with Longhorn, @jenting. 🙏
Moving this back to the schedule; working on other, more important issues that would benefit our customers.
@meysholdt does Platform have bandwidth to help own this issue? 🤔 🙏 As you can see, we're having trouble with Longhorn, and are considering other options. |
@jenting, can you share a detailed plan with @meysholdt for how you were doing the Gitpod setup and testing of PVC in preview environments (after having installed the CSI driver)? For the Gitpod setup, after installing the CSI driver (not trivial, as maturity varies) and preparing a storage class, what else is needed, aside from enabling the PVC feature flag in Gitpod, for a user to test? I assume you must also configure the storage class that we would like Gitpod to use. For testing, I assume you were trying workspace start and workspace stop, and when it does not work, looking at the related PVC and snapshotter objects/events/logs.
I've updated the criteria we require in this issue's description. Personally, I'd prefer a storage vendor that supports using an existing file system directory as storage, because then we don't have to create another partition or block device.
Thank you, @jenting for the detailed description 🙏 ! I agree, using the existing file system as storage would be ideal 💡 . |
Should be all set, @meysholdt , @jenting updated this issue's description with supporting detail. |
Installed Rook/Ceph; it works well, and the CSI behavior is what we want. Here are the Rook/Ceph installation steps and test results.
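A Rook/Ceph RBD setup needs its own VolumeSnapshotClass for the CSI snapshots tested here. A hedged sketch, modeled on the example shipped with Rook (it assumes the default `rook-ceph` operator namespace and cluster ID; adjust the secret names if your deployment differs):

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbdplugin-snapclass          # illustrative name
driver: rook-ceph.rbd.csi.ceph.com       # Rook's RBD CSI driver
deletionPolicy: Delete
parameters:
  clusterID: rook-ceph                   # assumes the default Rook namespace
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: rook-ceph
```

With this class in place, the same VolumeSnapshot/restore flow used for Longhorn above applies unchanged, since both go through the standard CSI snapshot API.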
Thank you, @jenting ! 🙏 |
Is your feature request related to a problem? Please describe
Install a storage vendor which supports CSI snapshot in preview env.
Describe the behavior you'd like
The workspace team is working on moving backup/restore of user workspace files from S3 to PVC volume snapshot/restore; this addresses epic #7901.
We're making CSI snapshot/restore work in the GCP environment. To make daily development easier, CSI snapshot/restore should work in the preview environment as well.
However, we tested and found that local-path-provisioner doesn't support CSI snapshot/restore. Therefore, we need to consider another storage vendor that supports it, and this storage vendor needs to be installed in the preview environment as well (it could be deployed optionally via a werft annotation).
Successful storage vendor criteria we consider:

1. Create Pod `pod-1` with PVC `pvc-1`, and write some data to the PVC `pvc-1`.
2. Create a VolumeSnapshot `vs-1` for the PVC `pvc-1`; it can back up the snapshot successfully, and the VolumeSnapshotContent is created.
3. Create Pod `restore-pod-2` and PVC `restore-pvc-2` with data source as VolumeSnapshot `vs-1`; check that the PVC `restore-pvc-2` has the correct data content.
4. Delete Pod `pod-1` and PVC `pvc-1`, and delete Pod `restore-pod-2` and PVC `restore-pvc-2`.
5. Create Pod `pod-3` and PVC `restore-pvc-3` with data source as VolumeSnapshot `vs-1`; check that the PVC `restore-pvc-3` has the correct data content.

Describe alternatives you've considered
N/A
Additional context
#7901
We could consider Longhorn as the storage vendor since it's the default storage of Harvester and it's easy to deploy within a Kubernetes cluster.
What we need to consider is that once we enable the preview environment with CSI snapshot/backup support