Pods can't mount PVC as the disk doesn't seem to be recreated from the restore, despite the PVC being up. #739


Closed
nsteinmetz opened this issue Apr 8, 2021 · 5 comments

nsteinmetz commented Apr 8, 2021

On a GKE cluster (v1.18.16-gke.2100) with the new CSI driver enabled, I'm evaluating Velero (v1.5, with the latest CSI plugin for Velero) and volume snapshots.

Snapshot creation works well:

  • I can see the VolumeSnapshot objects on the Kubernetes side
  • I can see the snapshots in the GCP console (checks sketched below)
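
For reference, roughly the checks behind the two points above (commands are approximate; the snapshot and project names are the ones from my setup):

# VolumeSnapshot objects created by the Velero CSI plugin
kubectl get volumesnapshot -n default
kubectl describe volumesnapshot velero-pvc-mysql-k8s-nst3-datataskio-vm66z -n default

# Corresponding snapshots on the GCP side
gcloud compute snapshots list --project datataskio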

When trying to restore the content:

  • I first delete the Deployments, PVs and PVCs from Kubernetes
  • then I delete the disks from GCP
  • then I launch the velero restore command (roughly as sketched below)
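
Roughly the commands involved (the Velero flags are from memory and may not be exact):

# Backup taken earlier with the CSI features enabled on the Velero client/server
velero backup create naive-backup9 --include-namespaces default

# Clean up before restoring
kubectl delete deployment mysql -n default
kubectl delete pvc pvc-mysql-k8s-nst3-datataskio -n default
gcloud compute disks delete <old-disk-name> --zone europe-west1-b

# Restore everything from the backup
velero restore create --from-backup naive-backup9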

On the PV/PVC side, it seems to work as expected: everything is Bound, BUT I don't see any newly created disks on the GCP side for the impacted PVs/PVCs.

For a PVC, I can see that it's restored from a snapshot:

Name:          pvc-mysql-k8s-nst3-datataskio
Namespace:     default
StorageClass:  datatask-sc
Status:        Bound
Volume:        pvc-f3bf0015-1c15-430e-bfe8-1b222d161a5d
Labels:        app=git
               dt-volume=mysql
               velero.io/backup-name=naive-backup9
               velero.io/restore-name=naive-backup9-20210408120114
               velero.io/volume-snapshot-name=velero-pvc-mysql-k8s-nst3-datataskio-vm66z
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               velero.io/backup-name: naive-backup9
               velero.io/volume-snapshot-name: velero-pvc-mysql-k8s-nst3-datataskio-vm66z
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      50Gi
Access Modes:  RWO
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      velero-pvc-mysql-k8s-nst3-datataskio-vm66z
Used By:     mysql-6f996b96b6-9pmvg
Events:      <none>
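
In manifest form, the restored claim is roughly the following (reconstructed from the describe output above):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-mysql-k8s-nst3-datataskio
  namespace: default
spec:
  storageClassName: datatask-sc
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: velero-pvc-mysql-k8s-nst3-datataskio-vm66z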

but the pod cannot mount the PVC:

Name:           mysql-6f996b96b6-9pmvg
Namespace:      default
Priority:       0
Node:           gke-k8s-nst3-datataskio-default-pool-114e9218-12g8/10.1.0.3
Start Time:     Thu, 08 Apr 2021 12:13:34 +0200
Labels:         app=mysql
                pod-template-hash=6f996b96b6
Annotations:    <none>
Status:         Pending
IP:             
IPs:            <none>
Controlled By:  ReplicaSet/mysql-6f996b96b6
Containers:
  mysql:
    Container ID:  
    Image:         mysql:5.7
    Image ID:      
    Port:          3306/TCP
    Host Port:     0/TCP
    Args:
      --ignore-db-dir=lost+found
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:  50m
    Requests:
      cpu:  50m
    Environment:
      MYSQL_ROOT_PASSWORD:  <set to the key 'MYSQL_ROOT_PASSWORD' in secret 'mysql-secret'>  Optional: false
      MYSQL_DATABASE:       <set to the key 'MYSQL_DATABASE' in secret 'mysql-secret'>       Optional: false
      MYSQL_USER:           <set to the key 'MYSQL_USER' in secret 'mysql-secret'>           Optional: false
      MYSQL_PASSWORD:       <set to the key 'MYSQL_PASSWORD' in secret 'mysql-secret'>       Optional: false
    Mounts:
      /var/lib/mysql from gce-pd-mysql (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-pkdbr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  gce-pd-mysql:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pvc-mysql-k8s-nst3-datataskio
    ReadOnly:   false
  default-token-pkdbr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-pkdbr
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason              Age                From                     Message
  ----     ------              ----               ----                     -------
  Normal   Scheduled           13m                default-scheduler        Successfully assigned default/mysql-6f996b96b6-9pmvg to gke-k8s-nst3-datataskio-default-pool-114e9218-12g8
  Warning  FailedAttachVolume  54s (x7 over 13m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-f3bf0015-1c15-430e-bfe8-1b222d161a5d" : rpc error: code = NotFound desc = Could not find disk Key{"pvc-f3bf0015-1c15-430e-bfe8-1b222d161a5d", zone: "europe-west1-b"}: googleapi: Error 404: The resource 'projects/datataskio/zones/europe-west1-b/disks/pvc-f3bf0015-1c15-430e-bfe8-1b222d161a5d' was not found, notFound
  Warning  FailedMount         20s (x6 over 11m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[gce-pd-mysql], unattached volumes=[gce-pd-mysql default-token-pkdbr]: timed out waiting for the condition

What should I do?
Is this use case supported yet?
Or should I just recreate the disks from the snapshots on the GCP console side instead?

From the docs, it should be supported as far as I understand it.
https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/volume-snapshots

The GKE cluster uses this version of the driver:

gke.gcr.io/gcp-compute-persistent-disk-csi-driver:v1.0.1-gke.0
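
For completeness, the classes involved look roughly like this (my names; the StorageClass parameters are assumptions, and the label on the VolumeSnapshotClass is what I believe the Velero CSI plugin uses to select it):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: datatask-sc
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshotClass
metadata:
  name: pd-snapshot-class
  labels:
    velero.io/csi-volumesnapshot-class: "true"
driver: pd.csi.storage.gke.io
deletionPolicy: Retain
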
@mattcary
Contributor

I think this may be because Velero is trying to restore the disk via cloning, which is not supported yet: #161

@nsteinmetz
Author

Thanks, I'll recheck this later; this was a quick POC, and I'll do the final implementation and a more complete set of tests in the coming days.

I also noticed that some disks named restore-* appeared, but I don't know whether that was with this driver or the in-tree one. I think the latter.

@mattcary
Contributor

mattcary commented Apr 13, 2021 via email

@nsteinmetz
Author

Just tried with CSI again:

  • no disk is restored, as mentioned previously; this should be due to the missing cloning support
  • if I manually recreate the disk from the snapshot and restore the PV/PVC via Velero (rough sketch below), it seems I missed something, as some content was not restored. But at least the Deployments are up and running.
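
Roughly what I mean by recreating the disk manually (names and zone are from my setup; the volumeHandle format is what I understand the PD CSI driver expects):

# Create a new PD from the GCP snapshot backing the VolumeSnapshot
gcloud compute disks create restored-mysql-disk \
  --source-snapshot <gcp-snapshot-name> \
  --zone europe-west1-b

# Pre-provision a PV pointing at that disk so the restored PVC can bind to it
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: restored-mysql-pv
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: datatask-sc
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/datataskio/zones/europe-west1-b/disks/restored-mysql-disk
    fsType: ext4
EOF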

I think the restore-* disks were from Velero with gcePersistentDisk (the in-tree driver); I'll confirm tomorrow ;-)

=> closing in favor of #161

@nsteinmetz
Author

Just for your info: the restore-* disks are created by Velero when the PV/PVC uses the in-tree driver.
