Commit b077d00

Merge pull request #3918 from krzykwas/master
KEP-1645, KEP-2149: Location-disambiguated addressing of Headless service pods
2 parents 17483a9 + 48a8c0b commit b077d00

3 files changed

+86
-44
lines changed


keps/sig-multicluster/1645-multi-cluster-services-api/README.md

+49-18
@@ -85,7 +85,7 @@ tags, and then generate with `hack/update-toc.sh`.
8585
- [Proposal](#proposal)
8686
- [Terminology](#terminology)
8787
- [User Stories](#user-stories)
88-
- [Different Services Each Deployed to Separate Cluster](#different-services-each-deployed-to-separate-cluster)
88+
- [Different ClusterIP Services Each Deployed to Separate Cluster](#different-clusterip-services-each-deployed-to-separate-cluster)
8989
- [Single Service Deployed to Multiple Clusters](#single-service-deployed-to-multiple-clusters)
9090
- [Constraints](#constraints)
9191
- [Risks and Mitigations](#risks-and-mitigations)
@@ -258,15 +258,26 @@ nitty-gritty.
258258
controllers, or a human using kubectl to create resources. This document aims
259259
to support any implementation that fulfills the behavioral expectations of
260260
this API.
261-
- **cluster name** - A unique name or identifier for the cluster, scoped to the
261+
- **cluster name** - A unique identifier for a cluster, scoped to the
262262
implementation's cluster registry. We do not attempt to define the registry.
263-
Each cluster must have a name that can uniquely identify it within the
264-
clusterset. A cluster name must be a valid [RFC
265-
1123](https://tools.ietf.org/html/rfc1123) DNS label.
263+
The cluster name must be a valid [RFC 1123](https://tools.ietf.org/html/rfc1123)
264+
DNS label.
266265

267266
The cluster name should be consistent for the life of a cluster and its
268267
membership in the clusterset. Implementations should treat name mutation as a
269268
delete of the membership followed by recreation with the new name.
269+
- **cluster id** - A unique identifier for a cluster, scoped to a clusterset.
270+
The cluster id must be either:
271+
- equal to cluster name,
272+
- or composed of two valid [RFC 1123](https://tools.ietf.org/html/rfc1123)
273+
DNS labels separated with a dot. The first label equals cluster name and the
274+
second one gives additional context, allowing the implementation to uniquely
275+
identify a cluster within a clusterset composed of clusters registered with
276+
multiple cluster registries.
277+
278+
The cluster id should be consistent for the life of a cluster and its
279+
membership in the clusterset. Implementations should treat id mutation as a
280+
delete of the membership followed by recreation with the new name.
270281

271282
[namespace sameness]:
272283
https://github.com/kubernetes/community/blob/master/sig-multicluster/namespace-sameness-position-statement.md
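Reviewer note: the **cluster id** rule added in the hunk above (either equal to the cluster name, or the cluster name plus one extra RFC 1123 DNS label after a dot) can be sketched as a small validator. This is only an illustration, not part of the KEP. The function names are hypothetical, and the label pattern assumes the Kubernetes-style reading of an RFC 1123 label (lowercase alphanumerics and `-`, alphanumeric at both ends, at most 63 characters).

```python
import re

# Assumption: Kubernetes-style RFC 1123 DNS label, i.e. lowercase
# alphanumerics and '-', starting and ending with an alphanumeric.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")


def is_dns_label(value: str) -> bool:
    """Return True if value looks like a single RFC 1123 DNS label."""
    return len(value) <= 63 and _DNS_LABEL.fullmatch(value) is not None


def is_valid_cluster_id(cluster_id: str, cluster_name: str) -> bool:
    """Check a cluster id against the two forms described in the hunk above.

    Hypothetical helper: the id is accepted when it equals the cluster name,
    or when it is "<cluster name>.<extra label>" with both parts being
    valid DNS labels.
    """
    labels = cluster_id.split(".")
    if len(labels) not in (1, 2):
        return False
    if labels[0] != cluster_name:
        return False
    return all(is_dns_label(label) for label in labels)
```

For example, `is_valid_cluster_id("cluster-a.registry-1", "cluster-a")` is accepted, while an id with three labels or with a first label that differs from the cluster name is rejected. The name `registry-1` is made up for illustration.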
@@ -293,8 +304,15 @@ be recognized as a single combined service. For example, if 5 clusters export
293304
all exporting clusters. Properties of the `ServiceImport` (e.g. ports, topology)
294305
will be derived from a merger of component `Service` properties.
295306

296-
This specification is not prescriptive on exact implementation details. Existing implementations of Kubernetes Service API (e.g. kube-proxy) can be
297-
extended to present `ServiceImports` alongside traditional `Services`. One often discussed implementation requiring no changes to kube-proxy is to have the mcs-controller maintain ServiceImports and create "dummy" or "shadow" Service objects, named after a mcs-controller managed EndpointSlice that aggregates all cross-cluster backend IPs, so that kube-proxy programs those endpoints like a regular Service. Other implementations are encouraged as long as the properties of the API described in this document are maintained.
307+
This specification is not prescriptive on exact implementation details. Existing
308+
implementations of Kubernetes Service API (e.g. kube-proxy) can be extended to
309+
present `ServiceImports` alongside traditional `Services`. One often discussed
310+
implementation requiring no changes to kube-proxy is to have the mcs-controller
311+
maintain ServiceImports and create "dummy" or "shadow" Service objects, named
312+
after a mcs-controller managed EndpointSlice that aggregates all cross-cluster
313+
backend IPs, so that kube-proxy programs those endpoints like a regular Service.
314+
Other implementations are encouraged as long as the properties of the API described
315+
in this document are maintained.
298316

299317
### User Stories
300318

@@ -305,11 +323,11 @@ the system. The goal here is to make this feel real for users without getting
305323
bogged down.
306324
-->
307325

308-
#### Different Services Each Deployed to Separate Cluster
326+
#### Different ClusterIP Services Each Deployed to Separate Cluster
309327

310-
I have 2 clusters, each running different services managed by different teams,
311-
where services from one team depend on services from the other team. I want to
312-
ensure that a service from one team can discover a service from the other team
328+
I have 2 clusters, each running different ClusterIP services managed by different
329+
teams, where services from one team depend on services from the other team. I want
330+
to ensure that a service from one team can discover a service from the other team
313331
(via DNS resolving to VIP), regardless of the cluster that they reside in. In
314332
addition, I want to make sure that if the dependent service is migrated to
315333
another cluster, the dependee is not impacted.
@@ -323,7 +341,7 @@ access instances of this service in priority order based on availability and
323341
locality. Requests to my replicated service should seamlessly transition (within
324342
SLO for dropped requests) between instances of my service in case of failure or
325343
removal without action by or impact on the caller. Routing to my replicated
326-
service should optimize for cost metric (e.g.prioritize traffic local to zone,
344+
service should optimize for cost metric (e.g. prioritize traffic local to zone,
327345
region).
328346

329347
### Constraints
@@ -534,11 +552,11 @@ given `EndpointSlice` will reference its `ServiceImport` using the label
534552
associated with its `Service` in a single cluster.
535553

536554
Each imported `EndpointSlice` will also have a
537-
`multicluster.kubernetes.io/source-cluster` label with the cluster name, a
538-
registry-scoped unique identifier for the cluster. The `EndpointSlice`s imported
539-
for a service are not guaranteed to exactly match the originally exported
540-
`EndpointSlice`s, but each slice is guaranteed to map only to a single source
541-
cluster.
555+
`multicluster.kubernetes.io/source-cluster` label with the cluster id, a
556+
clusterset-scoped unique identifier for the cluster. The `EndpointSlice`s
557+
imported for a service are not guaranteed to exactly match the originally
558+
exported `EndpointSlice`s, but each slice is guaranteed to map only to a single
559+
source cluster.
542560

543561
The mcs-controller is responsible for managing imported `EndpointSlice`s.
544562

@@ -860,6 +878,19 @@ required by virtue of being two different `ServiceExport`s.
860878
Note that this puts the burden of enforcing the boundaries of a
861879
`ServiceExport`'s fungibility on the name/namespace creator.
862880
881+
Individually addressing pods backing a Headless service is exempt from the rules
882+
described in this section. Such a pod may be addressed using the
883+
`<hostname>.<clusterid>.<svc>.<ns>.svc.clusterset.local` format, where `clusterid`
884+
must uniquely identify a cluster within a clusterset. The implementation may use
885+
cluster name as `clusterid`, and this is not ambiguous if all the clusters on
886+
the clusterset are registered with the same cluster registry. In case a
887+
clusterset contains clusters registered with multiple registries, cluster name
888+
may be ambiguous. The implementation may in such case use `clusterid` composed
889+
of cluster name and an additional DNS label, separated with a dot. The
890+
additional label gives additional context, which is implementation-dependent and
891+
may be used for instance to uniquely identify the cluster registry with which a
892+
cluster is registered.
893+
863894
864895
#### EndpointSlice
865896
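Reviewer note: the per-pod address format introduced in the hunk above, `<hostname>.<clusterid>.<svc>.<ns>.svc.clusterset.local`, can be illustrated with a short helper. This is a sketch under stated assumptions, not an implementation from the KEP: the function name and the example values (`web-0`, `registry-1`, `my-svc`, `my-ns`) are hypothetical, and the zone defaults to `clusterset.local` as the specification currently requires.

```python
def headless_pod_dns_name(
    hostname: str,
    cluster_id: str,
    service: str,
    namespace: str,
    zone: str = "clusterset.local",
) -> str:
    """Build the per-pod DNS name for a pod backing a Headless service.

    cluster_id may be a single label (the cluster name) or two labels
    separated by a dot; it is placed into the name unchanged.
    """
    return f"{hostname}.{cluster_id}.{service}.{namespace}.svc.{zone}"


# Cluster name used directly as the cluster id (single registry).
single = headless_pod_dns_name("web-0", "cluster-a", "my-svc", "my-ns")

# Cluster id extended with an extra label (multiple registries).
multi = headless_pod_dns_name("web-0", "cluster-a.registry-1", "my-svc", "my-ns")
```

With a two-label cluster id the resulting name simply gains one more DNS label between the hostname and the service name, which is what lets clusters with the same name in different registries be told apart.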
@@ -891,7 +922,7 @@ mcs-controller itself in distributed implementations.
891922
We recommend creating leases to represent connectivity with source clusters.
892923
These leases should be periodically renewed by the mcs-controller while the
893924
connection with the source cluster is confirmed alive. When a lease expires, the
894-
cluster name and `multicluster.kubernetes.io/source-cluster` label may be used
925+
cluster id and `multicluster.kubernetes.io/source-cluster` label may be used
895926
to find and remove all `EndpointSlices` containing endpoints from the
896927
unreachable cluster.
897928

keps/sig-multicluster/1645-multi-cluster-services-api/specification.md

+7-8
@@ -47,7 +47,7 @@ endpoint's `hostname` field, or b) a unique, system-assigned identifier for the
4747
endpoint. Of importance to highlight is that since the [default hostname of an
4848
endpoint is the Pod's `metadata.name`
4949
field](https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-hostname-and-subdomain-fields),
50-
this will likely often be the podname, but not always, and implementations must
50+
this will likely often be the pod name, but not always, and implementations must
5151
prefer a directly specified `hostname` value.
5252

5353
clusterset = as defined in [KEP-1645: Multi-Cluster Services API](README.md): “A
@@ -63,7 +63,7 @@ namespace.”
6363
`<clusterset-zone>` = domain for multi-cluster services in the clusterset, which
6464
must be `clusterset.local`; as this may become configurable in the future, this
6565
specification refers to it by the placeholder `<clusterset-zone>`, but per the
66-
MCS API it currently must be defined to be `clusterset.local`.
66+
MCS API it currently must be defined to be `clusterset.local`.
6767

6868
ClusterSetIP / `<clusterset-ip>` / clusterset IP = as defined in [KEP-1645:
6969
Multi-Cluster Services API](README.md): “A non-headless ServiceImport is
@@ -76,11 +76,10 @@ the aggregated Service.”
7676

7777
Cluster ID / `<clusterid>` = the cluster id stored in the `id.k8s.io
7878
ClusterProperty` as described in [KEP-2149: ClusterId for ClusterSet
79-
identification](../2149-clusterid/README.md). Though this can be any valid DNS
80-
label, the recommended value is a kube-system namespace uid ( such as
81-
`721ab723-13bc-11e5-aec2-42010af0021e`). For ease of KEP readability, this
82-
document uses human readable names `cluster-a` and `cluster-b` to represent the
83-
cluster IDs of two clusters in a ClusterSet.
79+
identification](../2149-clusterid/README.md). The recommended value is a
80+
kube-system namespace uid ( such as `721ab723-13bc-11e5-aec2-42010af0021e`). For
81+
ease of KEP readability, this document uses human readable names `cluster-a` and
82+
`cluster-b` to represent the cluster IDs of two clusters in a ClusterSet.
8483

8584

8685
### 2.2 - Record for Schema Version
@@ -324,4 +323,4 @@ depend on DNS records of this form.
324323

325324
(See the DNS section of the [KEP-1645: Multi-Cluster Services
326325
API](README.md#not-allowing-cluster-specific-targeting-via-dns) for more
327-
context.)
326+
context.)
