-
Notifications
You must be signed in to change notification settings - Fork 270
✨ Reconcile SecurityGroups <-> Instances separately #1774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: EmilienM The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
1bc5fd9
to
f9f1bae
Compare
@@ -107,6 +107,11 @@ type OpenStackMachineStatus struct { | |||
// +optional | |||
InstanceState *InstanceState `json:"instanceState,omitempty"` | |||
|
|||
// SecurityGroups is the list of security groups that were applied to the ports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for myself: in comment I'll explain that we only populate that once all the rules have been applied.
Note for reviewers: do we want to have another field SecurityGroupsReady
once the reconciler has finished its job successfully? Now, let's say we managed to successfully reconcile once and Status.SecurityGroups
was populated once, then somehow the reconciler fails but we still have Status.SecurityGroups
, we have mark the instance as NotReady. Is that good enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation note: something like this would be implemented as a Condition rather than a status field. I don't think it's worth having a condition for this unless we think we have a consumer for it.
The question is: who consumed the Ready signal and what are their expectations of it? The answer is defined here: https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine:
The status object must at least one field defined:
- ready - a boolean field indicating if the infrastructure is ready to be used or not.
For machine creation I think it's clear that we would only set ready
after security groups have been applied.
For an update to applied security groups it's less clear, and the CAPI definition doesn't indicate if setting it to false after having set it to true is meaningful or allowed. I would not be inclined to set it to false during a rollout of new managed security groups. I think this would be better indicated in the cluster status as a summary of how many machines have been updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For machine creation I think it's clear that we would only set ready after security groups have been applied.
Exactly. That's what I meant when writing that code:
err = r.reconcileSecurityGroupsToInstance(scope.Logger(), openStackCluster, machine, openStackMachine, instanceStatus, computeService, networkingService)
if err != nil {
conditions.MarkFalse(openStackMachine, infrav1.InstanceReadyCondition, infrav1.SecurityGroupApplyFailedReason, clusterv1.ConditionSeverityError, "Applying security groups failed: %v", err)
return ctrl.Result{}, fmt.Errorf("add security groups to instance: %w", err)
}
For an update to applied security groups it's less clear, and the CAPI definition doesn't indicate if setting it to false after having set it to true is meaningful or allowed. I would not be inclined to set it to false during a rollout of new managed security groups. I think this would be better indicated in the cluster status as a summary of how many machines have been updated.
ok I'll try to think how the state of how many machines have been updated can be reflected in the OpenStackClusterStatus
.
I've updated my TODO in the PR and I'll close that comment once I came up with something.
if !contains(desiredSecurityGroupIDs, securityGroupID) { | ||
logger.Info("Port has security group that is not desired", "portID", portID, "securityGroupID", securityGroupID) | ||
portUpdateNeeded = true | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for myself: I'll remove the break
so we get all logs in the loop. We know we need to update the port anyway at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked in CI logs and I could see:
I1206 03:01:32.375132 1 securitygroups.go:434] "Port has security group that is not desired" controller="openstackcluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="OpenStackCluster" OpenStackCluster="e2e-hwo2dd/cluster-e2e-hwo2dd" namespace="e2e-hwo2dd" name="cluster-e2e-hwo2dd" reconcileID=8ce25157-7075-4f45-96db-1a5bc66468f3 cluster="cluster-e2e-hwo2dd" portID="06b4adb5-05b9-405f-ad05-a8f5e5eea001" securityGroupID="0f239ce1-0ab5-4e6e-9f91-1afe29e21e5d"
And 0f239ce1-0ab5-4e6e-9f91-1afe29e21e5d
is the default
security group (source). Do we want to ignore this one when checking unwanted SGs attached to the ports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Is it a bug that the default security group is attached to the ports?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ports that are created when port security is enabled will have the default
security group itself managed by Neutron.
I see two options:
- We keep it and look for its ID (once and early to avoid many Neutron API calls) and put it in
desiredSecuritygroupIDs
, therefore it's not a problem anymore. - We remove it in this loop
I personally prefer option 1 for safety & because that's default OpenStack behaviours here.
cc @stephenfin with your OpenStack dev hat, any preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tbh, until today I didn't know that adding a port would add the default security group. What is the source of these ports? If they are ports that we are creating ourselves then we can instruct neutron not to assign a security group to the port. From openstackclient this is done by e.g.:
openstack port create --no-security-group ...
which under the hood sets the security_group_ids
field to []
(empty list) when creating the port. Could we do the same thing?
If these ports are coming from elsewhere then it's possible to have other security groups than the default one assigned to the port (openstack port create --security-group <sec-group> ...
) so I'm not sure how we can ignore everything? Do we forbid specifying ports with security groups defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the above, we probably need to decide what's authoritative: the security groups on the port, or the security ports defined on the Machine
. I assume the latter, but if this is the case then we're going to modify something that the user gave us. If we go with the former, we're either modifying or ignoring something else that the user gave us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the source of these ports?
This is where we pre-build ports before creating an instance:
cluster-api-provider-openstack/pkg/cloud/services/compute/instance.go
Lines 177 to 222 in 7961f21
func (s *Service) constructPorts(openStackCluster *infrav1.OpenStackCluster, instanceSpec *InstanceSpec) ([]infrav1.PortOpts, error) { | |
// Ensure user-specified ports have all required fields | |
ports, err := s.normalizePorts(instanceSpec.Ports, openStackCluster, instanceSpec) | |
if err != nil { | |
return nil, err | |
} | |
// no networks or ports found in the spec, so create a port on the cluster network | |
if len(ports) == 0 { | |
port := infrav1.PortOpts{ | |
Network: &infrav1.NetworkFilter{ | |
ID: openStackCluster.Status.Network.ID, | |
}, | |
Trunk: &instanceSpec.Trunk, | |
} | |
for _, subnet := range openStackCluster.Status.Network.Subnets { | |
port.FixedIPs = append(port.FixedIPs, infrav1.FixedIP{ | |
Subnet: &infrav1.SubnetFilter{ | |
ID: subnet.ID, | |
}, | |
}) | |
} | |
ports = []infrav1.PortOpts{port} | |
} | |
// trunk support is required if any port has trunk enabled | |
portUsesTrunk := func() bool { | |
for _, port := range ports { | |
if port.Trunk != nil && *port.Trunk { | |
return true | |
} | |
} | |
return false | |
} | |
if portUsesTrunk() { | |
trunkSupported, err := s.isTrunkExtSupported() | |
if err != nil { | |
return nil, err | |
} | |
if !trunkSupported { | |
return nil, fmt.Errorf("there is no trunk support. please ensure that the trunk extension is enabled in your OpenStack deployment") | |
} | |
} | |
return ports, nil | |
} |
We don't pass any security groups.
Could we do the same thing?
I don't think so because we don't provide a list but a list of SecurityGroupFilter
, so I suspect that nothing will be passed to Neutron API if we pass an empty list of filters.
we probably need to decide what's authoritative:
I want CAPO to have the authority if possible: if users provide a security group for the Machine, then we add it to the port in the reconcile (not at instance creation) - so instance + initial ports have no SGs in theory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed e651abb - I have little hope that will fix this "default" SG, however this is addressing a real problem. We can probably classify this change as unrelated to this comment initially but I wanted to make it visible for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked with @pierreprinetti and the good news is that port.CreateOpts.SecurityGroups
in Gophercloud is *[]string
so if we pass an empty slice of strings, the port should be created with no security group. I'll test it.
@@ -107,6 +107,11 @@ type OpenStackMachineStatus struct { | |||
// +optional | |||
InstanceState *InstanceState `json:"instanceState,omitempty"` | |||
|
|||
// SecurityGroups is the list of security groups that were applied to the ports |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation note: something like this would be implemented as a Condition rather than a status field. I don't think it's worth having a condition for this unless we think we have a consumer for it.
The question is: who consumed the Ready signal and what are their expectations of it? The answer is defined here: https://cluster-api.sigs.k8s.io/developer/architecture/controllers/machine:
The status object must at least one field defined:
- ready - a boolean field indicating if the infrastructure is ready to be used or not.
For machine creation I think it's clear that we would only set ready
after security groups have been applied.
For an update to applied security groups it's less clear, and the CAPI definition doesn't indicate if setting it to false after having set it to true is meaningful or allowed. I would not be inclined to set it to false during a rollout of new managed security groups. I think this would be better indicated in the cluster status as a summary of how many machines have been updated.
051f8ea
to
22ac367
Compare
Originally, we were adding the SecurityGroups to the instanceSpec. Now, the openstackmachine controller will first create the instances without security groups. Then we have a loop that will reconcile the desired security groups that we want to apply in the cluster[1] with the ones that are added to the instance's ports. [1] for now, we have 2 sets of rules: the one that can be added to `OpenStackMachine.Spec.SecurityGroups` and the one managed by `OpenStackCluster.Spec.ManagedSecurityGroups`.
When defining an OpenStackMachine with `Ports`, we no longer allow the use of `SecurityGroupFilters` because we now have a separate loop to reconcile ports based on 2 things: * If `OpenStackCluster.Spec.ManagedSecurityGroups`, machine ports will be reconciled to have the security group of their role (cp/worker). * If `OpenStackMachine.Spec.SecurityGroups` is specified for the machine, we'll reconcile the machine ports to ensure they have the Security Groups asked by the user. However if a user provides `OpenStackMachine.Spec.Ports`, we will return an error if the Port has `SecurityGroupFilters` because we have a separate reconcile which update *all* ports with the same Security Groups.
/test pull-cluster-api-provider-openstack-e2e-test |
1 similar comment
/test pull-cluster-api-provider-openstack-e2e-test |
if err != nil { | ||
return fmt.Errorf("error getting port %s: %v", portID, err) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of doing 1593e43 - we could:
- Put a new struct which maps ports and their security group IDs in OpenStackMachine.Status so we do the lookup once
- we don't want to add these SG IDs to the slices in the openstackmachine_controller that we created for SG IDs to reconcile because they are specific to ports and not applied for all ports (unlike ManagedSecurityGroups & Spec.SecurityGroups)
- Here in the reconcile, if a port is found in the Status with SG IDs, we simply add them to the desired SGs.
This is a bit complex but this is to address the 3 current options that we have to attach security groups to machines:
- OpenStackCluster.Spec.ManagedSecurityGroups
- OpenStackMachine.Spec.SecurityGroups
- OpenStackMachine.Spec.Ports.SecurityGroupFilters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is we don't reconcile the security groups from ports defined in OpenStack.Machine.Spec.Ports
because we consider them immutable. That feels odd to me, and I can't find yet a good option here that doesn't bring complexity.
Required to use slices package.
@EmilienM: The following test failed, say
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
I'm taking a break from this PR as I figured that I really want the ports to be managed outside of the instance creation, see #1788 Reconcile Ports -> Reconcile Security groups -> Reconcile Instances |
/hold |
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Note: this one is not staled, I'll come back to it at some point this year, now that we have a new API for Security Groups. |
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale |
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle rotten |
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /close |
@k8s-triage-robot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
What this PR does / why we need it:
Originally, the openstackmachine controller was adding the SecurityGroups to the instanceSpec.
Now, the openstackmachine controller will first create the instances
without security groups. Then we have a loop that will reconcile the
desired security groups that we want to apply in the cluster[1] with the
ones that are added to the instance's ports.
[1] for now, we have 2 sets of rules: the one that can be added to
OpenStackMachine.Spec.SecurityGroups
and the one managedby
OpenStackCluster.Spec.ManagedSecurityGroups
.TODOs: