Improve csi-snapshotter VolumeSnapshotContent requeue fairness #1282

Open
pwschuurman opened this issue Mar 21, 2025 · 0 comments · May be fixed by #1284

Is your feature request related to a problem?/Why is this needed

This enhancement proposes improving the requeue behavior for syncing VolumeSnapshotContent resources.

VolumeSnapshotContent resources are reconciled via the contentQueue, whose rate limiter applies exponential backoff. For long-running snapshots this backoff is useful, since it reduces the amount of polling required to determine whether a snapshot is readyToUse=true. However, the exponential nature of this backoff means the contentQueue rate limiter can quickly reach its maximum delay. The current default base delay is 1 second and the current maximum is 300 seconds, so only 9 requeue events are needed to reach the maximum. This limit can be reached quickly today if a VolumeSnapshotContent is updated: updates (especially re-entrant updates) trigger a resync and requeue, which quickly bumps up the rate limiter's retry count, resulting in long requeue wait times.
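
For illustration, here is a minimal sketch against client-go's workqueue package (using the pre-generics API; newer client-go releases expose a typed equivalent) showing how quickly the per-item delay saturates under these defaults:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Same shape as the contentQueue defaults: 1s base delay, 300s cap.
	rl := workqueue.NewItemExponentialFailureRateLimiter(1*time.Second, 300*time.Second)
	for i := 1; i <= 10; i++ {
		// The delay doubles on each call for the same item:
		// 1s, 2s, 4s, 8s, ... until it is capped at 300s.
		fmt.Printf("requeue %d -> wait %v\n", i, rl.When("snapcontent-example"))
	}
}
```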

Describe the solution you'd like in detail

There are two things that should be fixed here:

  1. Prevent updates from bumping the requeue rate limiter limit: Ideally, an additional call to contentQueue.AddRateLimited() should not increase the rate limiter exponent if an item is already scheduled to be requeued. It should either maintain the same requeue schedule, or be adjusted to requeue further into the future, but with the same backoff exponent (see the sketch after this list).
  2. Reduce the number of re-entrant updates. This can reduce the number of requeues (which can lead to the problem above). Some updates are necessary for tracking the lifecycle of a VolumeSnapshotContent. However, it appears that the snapshot.storage.kubernetes.io/volumesnapshot-being-created annotation can be removed earlier, prior to the snapshot actually being marked as readyToUse.
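
One possible shape for (1) is a wrapper rate limiter that remembers when each item is next due and reuses that schedule instead of consulting the inner limiter again. This is only a sketch against the pre-generics workqueue.RateLimiter interface; dedupRateLimiter and its constructor are hypothetical names, not existing csi-snapshotter code:

```go
package ratelimiter

import (
	"sync"
	"time"

	"k8s.io/client-go/util/workqueue"
)

// dedupRateLimiter (hypothetical) wraps another rate limiter and keeps the
// existing requeue schedule for items that are already waiting, so that
// extra AddRateLimited calls do not grow the backoff exponent.
type dedupRateLimiter struct {
	inner   workqueue.RateLimiter
	mu      sync.Mutex
	pending map[interface{}]time.Time // item -> time it is due to be requeued
}

func newDedupRateLimiter(inner workqueue.RateLimiter) workqueue.RateLimiter {
	return &dedupRateLimiter{inner: inner, pending: map[interface{}]time.Time{}}
}

func (d *dedupRateLimiter) When(item interface{}) time.Duration {
	d.mu.Lock()
	defer d.mu.Unlock()
	now := time.Now()
	if due, ok := d.pending[item]; ok && due.After(now) {
		// Already scheduled: keep the existing schedule rather than asking
		// the inner limiter, which would bump the exponent.
		return due.Sub(now)
	}
	delay := d.inner.When(item)
	d.pending[item] = now.Add(delay)
	return delay
}

func (d *dedupRateLimiter) Forget(item interface{}) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.pending, item)
	d.inner.Forget(item)
}

func (d *dedupRateLimiter) NumRequeues(item interface{}) int {
	return d.inner.NumRequeues(item)
}
```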

Describe alternatives you've considered

A quick-fix alternative is to decrease the maximum exponential backoff of the contentQueue to a lower default (e.g., 30 or 60 seconds). This can be used by a CO to reduce the likelihood of higher-latency VolumeSnapshotContent reconciliation.
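
Roughly, assuming a 60-second cap (the queue name below is illustrative, not the exact one used in csi-snapshotter):

```go
// Hypothetical construction: cap the contentQueue backoff at 60s instead of 300s.
contentQueue := workqueue.NewNamedRateLimitingQueue(
	workqueue.NewItemExponentialFailureRateLimiter(1*time.Second, 60*time.Second),
	"csi-snapshotter-content",
)
```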

Additional context
