Skip to content
This repository was archived by the owner on Dec 3, 2024. It is now read-only.

Major refactor and bugfixes #49

Merged
merged 1 commit into from
Apr 6, 2021
Merged

Conversation

@wlan0 wlan0 self-assigned this Mar 26, 2021
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 26, 2021
@wlan0
Copy link
Contributor Author

wlan0 commented Mar 26, 2021

/hold

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: wlan0

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 26, 2021
@wlan0 wlan0 force-pushed the master branch 7 times, most recently from 1e86a65 to 8fe5dd5 Compare March 31, 2021 06:13
- "--listen-address=$(LISTEN_ADDRESS)"
- "--s3-endpoint=$(S3_ENDPOINT)"
- "--driver-addr=$(DRIVER_ADDRESS)"
- "--endpoint=$(ENDPOINT)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variables endpoint, access-key, secret-key are not used in the driver and the pod fails to start since it doesn't expect that flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah driver update is still WIP

valueFrom:
secretKeyRef:
name: objectstorage-provisioner
key: LISTEN_ADDRESS
key: DRIVER_ADDRESS
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fails to create the deployment because the correct format for this is:

        env:
        - name: DRIVER_ADDRESS
          valueFrom:
            secretKeyRef:
              name: objectstorage-provisioner
              key: DRIVER_ADDRESS

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driver changes are WIP

env:
- name: CONNECT_ADDRESS
- "--driver-addr=$(DRIVER_ADDRESS)"
envFrom:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix to env:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

driver changes are WIP


func (s *ProvisionerServer) ProvisionerCreateBucket(ctx context.Context,
req *cosi.ProvisionerCreateBucketRequest) (*cosi.ProvisionerCreateBucketResponse, error) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the intention this to be empty? Since the current sample minio provisioner has the functionality of creating buckets, keys etc, maybe as part of this PR we can keep the same functionality
I do like the idea though of having a "bare" sample provisioner so that providers can use as base

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there are other changes coming into this part. Its a WIP still

}, nil
}

func NewDefaultCOSIProvisionerServer(address string,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to pass a dict of values here instead? Because in the case of COSI we need the endpoint, hmac keys etc and other providers might need other options as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the server - this is the core framework. We shouldn't send the driver specific config (like keys) here. This is purely for starting the grpc serer listener.

Copy link
Contributor Author

@wlan0 wlan0 Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to use driver specific input, pass it in via CLI flags or other mechanisms (like env/metadata server/etc..) and wire it into Provisioner/Identity calls directly. (See how I've passed in provisioner name into sampledriver/identityServer)

sig := <-sigs
klog.InfoS("Signal received", "type", sig)
cancel()
<-time.After(30 * time.Second)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this long of a pause needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a long pause. This is needed in case graceful shutdown doesn;t complete in 30 seconds.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main thread is expected to exit in these 30 seconds

sig := <-sigs
klog.InfoS("Signal received", "type", sig)
cancel()
<-time.After(30 * time.Second)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto on 30s wait

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same answer as above

"bucketclass", bucket.Spec.BucketClassName,
)

if !strings.EqualFold(bucket.Spec.Provisioner, b.provisionerName) {
Copy link

@jeffvance jeffvance Apr 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this check could be moved before the klog call above?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot be done because provisionerName can only be transitively obtained by getting the corresponding bucket.

return nil
}

if bucket.Status.BucketAvailable {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about also checking for bucket.deletionTimestamp != nil ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone through all the usecases yet. It's better to do a full sweep across both controllers for such state transitions.


"github.com/pkg/errors"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
Copy link

@jeffvance jeffvance Apr 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no pkg alias may be typical, but it could be less confusing to alias this pkg to something like "rpcstatus". Thinking code below if status.Code ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I agree. :)

}
}

bucket.Status.BucketAvailable = false

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the trigger to the central controller to complete the delete operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

"bucketclass", bucket.Spec.BucketClassName,
)

if !strings.EqualFold(bucket.Spec.Provisioner, b.provisionerName) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a common place in the code path where this test can be done before any CRUD methods are called?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not that simple - underlying framework code cannot do that because in case of things like BAs, we have to trasitively find the bucket before inferring the provisioner

return errors.Wrap(err, "Failed to fetch bucket")
}

if !strings.EqualFold(bucket.Spec.Provisioner, bal.provisionerName) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this check be moved to the beginning of the func?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot. Because provisioner is inferred from the bucket

"bucket", bucketName,
)

if bucketAccess.Spec.MintedSecretName != "" {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should BA.deletionTimestamp be checked to make sure there's not a pending delete on the BA?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above - I want to handle all cases that are not basic straightforward ones as a second sweep over the entire codebase.

}, nil
}

// Add attempts to create a bucket for a given bucket. This function must be idempotent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix comment: this func grants access to an existing bucket

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
ns := bal.namespace
mintedSecretName := "ba-" + string(bucketAccess.UID)
if _, err := bal.Secrets(ns).Get(ctx, mintedSecretName, metav1.GetOptions{}); err != nil {
Copy link

@jeffvance jeffvance Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do Get then Create? I think it would be cleaner to call only Create and test for already-exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would work too

if _, err := bal.Secrets(ns).Get(ctx, mintedSecretName, metav1.GetOptions{}); err != nil {
if !kubeerrors.IsNotFound(err) {
klog.ErrorS(err,
"Failed to create secrets",
Copy link

@jeffvance jeffvance Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err msg could be improved by mentioning the secret already exists.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition is true if the error is anything but "Secret is not found". i.e. we are ignoring AlreadyExists.

Namespace: ns,
},
StringData: map[string]string{
"CredentialsFilePath": credentialsFilePath,
Copy link

@jeffvance jeffvance Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be a benefit to these key names defined as constants somewhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have thought about it - but left it as is for now. It doesn't make a big difference right now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok done

// Return values
// nil - Bucket successfully provisioned
// non-nil err - Internal error [requeue'd with exponential backoff]
func (b *BucketListener) Add(ctx context.Context, bucket *v1alpha1.Bucket) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should create a copy of bucket, since you modify below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah already done

@wlan0 wlan0 force-pushed the master branch 2 times, most recently from 166514d to 1b92925 Compare April 5, 2021 22:14
provisioner string
}

func (s *ProvisionerServer) ProvisionerCreateBucket(ctx context.Context,
Copy link

@haslersn haslersn Apr 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the provisioner creates the backend bucket but for some reason (provisioner crash, sidecar crash, etc.) the ProvisionerCreateBucketResponse is not handled, then the backend bucket is leaked. In order to solve this, I propose a two-step creation process:

  1. ProvisionerCreateBucketID (gets called if bucket.Spec.BucketID == ""): The provisioner chooses a bucket ID and returns it in the response, but it doesn't create the backend bucket, yet. The sidecar persists the bucket ID to the bucket.Spec.BucketID.
  2. ProvisionerCreateBucket (gets called if bucket.Spec.BucketID != "" && !bucket.Status.BucketAvailable): The provisioner now creates the backend bucket. The sidecar persists bucket.Status.BucketAvailable = true.

This way ProvisionerCreateBucket is retried with the same BucketID in case of abovementioned exceptional situation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the provisioner creates the backend bucket but for some reason (provisioner crash, sidecar crash, etc.) the ProvisionerCreateBucketResponse is not handled, then the backend bucket is leaked. In order to solve this, I propose a two-step creation process

This is not true. ProvisionerCreateBucket is idempotent. Bucket name + protocol is enough to ensure that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is only the case if the bucket ID is deterministic. Are you saying that, for all provisioners, the bucket ID must be deterministically determined by the bucket name and protocol?

 - Requires changes to API from this PR github.com/kubernetes-retired/container-object-storage-interface-api/pull/35
 - Requires changes to API from this PR github.com/kubernetes-retired/container-object-storage-interface-spec/pull/25
@wlan0 wlan0 removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 6, 2021
Copy link

@krishchow krishchow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one minor comment, but otherwise it LGTM

provisionerName = DefaultProvisionerName
}

ctrl, err := controller.NewDefaultObjectStorageController("cosi", provisionerName, 40)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we need to move identity and threads into command line flags?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identity is inferred from ProvisionerGetInfo, and thread count is not something that NEEDS to be configurable yet - it'll cause more confusion than any potential benefits

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good!

@krishchow
Copy link

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 6, 2021
@k8s-ci-robot k8s-ci-robot merged commit 42975ae into kubernetes-retired:master Apr 6, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants