Commit 1f932b6

Merge pull request #3906 from MikeSpreitzer/exempt-borrowing
KEP-1040: Update APF for borrowing by exempt priority levels
2 parents d1b916c + ceb7f6f commit 1f932b6

1 file changed: +214 -74 lines changed
  • keps/sig-api-machinery/1040-priority-and-fairness

keps/sig-api-machinery/1040-priority-and-fairness/README.md

@@ -641,20 +641,24 @@ queue has a chance of eventually getting useful work done.
641641

642642
Requests of an exempt priority are never held up in a queue; they are
643643
always dispatched immediately. Following is how the other requests
644-
are dispatched at a given apiserver.
644+
are dispatched at a given apiserver. Note that the dispatching of
645+
exempt requests can affect the dispatching of non-exempt requests,
646+
through borrowing of concurrency allocations.
645647

646648
As mentioned [above](#non-goals), the functionality described here
647649
operates independently in each apiserver.
648650

649-
The concurrency limit of an apiserver is divided among the non-exempt
650-
priority levels, and they can do a limited amount of borrowing from
651-
each other.
651+
The concurrency limit of an apiserver is divided among all the
652+
priority levels, exempt as well as non-exempt. There is a nominal
653+
division according to configuration, and a limited amount of dynamic
654+
borrowing between priority levels that responds to recent load.
652655

653656
Two fields of `LimitedPriorityLevelConfiguration`, introduced in the
654-
midst of the `v1beta2` lifetime, limit the borrowing. The fields are
655-
added in all the versions (`v1alpha1`, `v1beta1`, and `v1beta2`). The
656-
following display shows the new fields along with the updated
657-
description for the `AssuredConcurrencyShares` field, in `v1beta2`.
657+
midst of the `v1beta2` lifetime, limit the borrowing by and from
658+
non-exempt priority levels. The fields are added in all the versions
659+
(`v1alpha1`, `v1beta1`, `v1beta2`, and `v1beta3`). The following
660+
display shows the new fields along with the updated description for
661+
the `AssuredConcurrencyShares` field, in `v1beta2`.
658662

659663
```go
660664
type LimitedPriorityLevelConfiguration struct {
@@ -687,7 +691,7 @@ type LimitedPriorityLevelConfiguration struct {
687691
//
688692
// +optional
689693
LendablePercent int32
690-
694+
691695
// `borrowingLimitPercent`, if present, specifies a limit on how many seats
692696
// this priority level can borrow from other priority levels. The limit
693697
// is known as this level's BorrowingConcurrencyLimit (BorrowingCL) and
@@ -713,6 +717,82 @@ existing systems will be more continuous if we keep the meaning of
713717
`AssuredConcurrencyShares` has been renamed to
714718
`NominalConcurrencyShares`.
715719

720+
In the midst of the `v1beta3` lifetime a field was added to
721+
`PriorityLevelConfigurationSpec` to make it possible to specify the
722+
`NominalConcurrencyShares` and `LendablePercent` of exempt priority
723+
levels. That field is shown next. Also, the definition of `sum_acs`
724+
in `LimitedPriorityLevelConfiguration` was updated to sum over all
725+
priority levels rather than just the non-exempt ones. Before this
726+
change, the exempt priority levels did not get any nominal concurrency
727+
allocation nor lending limit and did not participate in borrowing ---
728+
they simply had unlimited dispatching and that had no relation with
729+
the dispatching for non-exempt priority levels.
730+
731+
```go
732+
// `exempt` specifies how requests are handled for an exempt priority level.
733+
// This field MUST be empty if `type` is `"Limited"`.
734+
// This field MAY be non-empty if `type` is `"Exempt"`.
735+
// If empty and `type` is `"Exempt"` then the default values
736+
// for `ExemptPriorityLevelConfiguration` apply.
737+
// +optional
738+
Exempt *ExemptPriorityLevelConfiguration
739+
```
740+
741+
At the same time, the relevant new datatype was added. It is shown
742+
below. The default number of nominal concurrency shares is set to the
743+
minimal value (i.e., 0) so as to minimize disruption as this feature
744+
is rolled into existing clusters; authorized administrators can choose
745+
to set it higher.
746+
747+
```go
748+
// ExemptPriorityLevelConfiguration describes the configurable aspects
749+
// of the handling of exempt requests.
750+
// In the mandatory exempt configuration object the values in the fields
751+
// here can be modified by authorized users, unlike the rest of the `spec`.
752+
type ExemptPriorityLevelConfiguration struct {
753+
// `nominalConcurrencyShares` (NCS) contributes to the computation of the
754+
// NominalConcurrencyLimit (NominalCL) of this level.
755+
// This is the number of execution seats nominally reserved for this priority level.
756+
// This DOES NOT limit the dispatching from this priority level
757+
// but affects the other priority levels through the borrowing mechanism.
758+
// The server's concurrency limit (ServerCL) is divided among all the
759+
// priority levels in proportion to their NCS values:
760+
//
761+
// NominalCL(i) = ceil( ServerCL * NCS(i) / sum_ncs )
762+
// sum_ncs = sum[priority level k] NCS(k)
763+
//
764+
// Bigger numbers mean a larger nominal concurrency limit,
765+
// at the expense of every other Limited priority level.
766+
// This field has a default value of zero.
767+
// +optional
768+
NominalConcurrencyShares int32
769+
770+
// `lendablePercent` prescribes the fraction of the level's NominalCL that
771+
// can be borrowed by other priority levels. This value of this
772+
// field must be between 0 and 100, inclusive, and it defaults to 0.
773+
// The number of seats that other levels can borrow from this level, known
774+
// as this level's LendableConcurrencyLimit (LendableCL), is defined as follows.
775+
//
776+
// LendableCL(i) = round( NominalCL(i) * lendablePercent(i)/100.0 )
777+
//
778+
// +optional
779+
LendablePercent int32
780+
781+
// The `BorrowingCL` of an Exempt priority level is implicitly `ServerCL`.
782+
// In other words, an exempt priority level
783+
// has no meaningful limit on how much it borrows.
784+
// There is no explicit representation of that here.
785+
}
786+
```
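The two formulas in the comments above can be sketched in a few lines of Go. This is only an illustration, not the apiserver's code: the function names `nominalCL` and `lendableCL`, the `ServerCL` of 600, and the share total of 245 are all assumptions chosen for the example.

```go
package main

import (
	"fmt"
	"math"
)

// nominalCL follows NominalCL(i) = ceil( ServerCL * NCS(i) / sum_ncs ).
// The guard for sumNCS == 0 is an assumption of this sketch.
func nominalCL(serverCL, ncs, sumNCS int) int {
	if sumNCS == 0 {
		return 0
	}
	return int(math.Ceil(float64(serverCL) * float64(ncs) / float64(sumNCS)))
}

// lendableCL follows LendableCL(i) = round( NominalCL(i) * lendablePercent(i)/100.0 ).
func lendableCL(nominal, lendablePercent int) int {
	return int(math.Round(float64(nominal) * float64(lendablePercent) / 100.0))
}

func main() {
	// Hypothetical inputs: ServerCL = 600 and shares that sum to 245.
	const serverCL, sumNCS = 600, 245

	// A level with 40 shares and lendablePercent = 25.
	n := nominalCL(serverCL, 40, sumNCS)
	fmt.Println("NominalCL:", n, "LendableCL:", lendableCL(n, 25))

	// An exempt level left at the default of zero shares has NominalCL = 0,
	// so it has nothing to lend regardless of its lendablePercent.
	e := nominalCL(serverCL, 0, sumNCS)
	fmt.Println("NominalCL:", e, "LendableCL:", lendableCL(e, 50))
}
```

With zero shares the exempt level's nominal allocation is zero, which is why the default keeps existing clusters undisturbed until an administrator raises `nominalConcurrencyShares`.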
787+
788+
The fields of `ExemptPriorityLevelConfiguration` limit the borrowing
789+
from exempt priority levels. This type and its use are added in all
790+
the versions (`v1alpha1`, `v1beta1`, `v1beta2`, and `v1beta3`). In
791+
the next version, the common fields of
792+
`LimitedPriorityLevelConfiguration` and
793+
`ExemptPriorityLevelConfiguration` will move to their common ancestor
794+
`PriorityLevelConfigurationSpec`.
795+
716796
The limits on borrowing are two-sided: a given priority level has a
717797
limit on how much it may borrow and a limit on how much may be
718798
borrowed from it. The latter is a matter of protection, the former is
@@ -728,11 +808,11 @@ may continue to do so, but there will always remain the possibility
728808
that some class of requests is much "heavier" than the APF code
729809
estimates; for those, a deliberate jail is useful.
730810

731-
The following table shows the current default non-exempt priority
732-
levels and a proposal for their new configuration.
811+
The following table shows the values for the non-exempt priority
812+
levels in the default configuration.
733813

734-
| Name | Assured Shares | Proposed Lendable | Proposed Borrowing Limit |
735-
| ---- | -------------: | ----------------: | -----------------------: |
814+
| Name | Nominal Shares | Lendable | Proposed Borrowing Limit |
815+
| ---- | -------------: | -------: | -----------------------: |
736816
| leader-election | 10 | 0% | none |
737817
| node-high | 40 | 25% | none |
738818
| system | 30 | 33% | none |
@@ -741,14 +821,23 @@ levels and a proposal for their new configuration.
741821
| global-default | 20 | 50% | none |
742822
| catch-all | 5 | 0% | none |
743823

744-
Each non-exempt priority level `i` has two concurrency limits: its
824+
The following table shows the `ExemptPriorityLevelConfiguration`
825+
introduced for the exempt priority levels in the default
826+
configuration.
827+
828+
| Name | Nominal Shares | Lendable |
829+
| ---- | -------------- | -------- |
830+
| exempt | 0 | 50% |
831+
832+
Every priority level `i` has two concurrency limits: its
745833
NominalConcurrencyLimit (`NominalCL(i)`) as defined above by
746-
configuration, and a CurrentConcurrencyLimit (`CurrentCL(i)`) that is
747-
used in dispatching requests. The CurrentCLs are adjusted
748-
periodically, based on configuration, the current situation at
749-
adjustment time, and recent observations. The "borrowing" resides in
750-
the differences between CurrentCL and NominalCL. There are upper and lower
751-
bound on each non-exempt priority level's CurrentCL, as follows.
834+
configuration, and a CurrentConcurrencyLimit (`CurrentCL(i)`) ---
835+
which, for non-exempt priority levels, is used in dispatching
836+
requests. The CurrentCLs are adjusted periodically, based on
837+
configuration, the current situation at adjustment time, and recent
838+
observations. The "borrowing" resides in the differences between
839+
CurrentCL and NominalCL. There are upper and lower bounds on each
840+
non-exempt priority level's CurrentCL, as follows.
752841

753842
```
754843
MaxCL(i) = NominalCL(i) + BorrowingCL(i)
@@ -762,13 +851,15 @@ CurrentCLs is always equal to the server's concurrency limit
762851
the NominalCLs and plus or minus a little for rounding in the
763852
adjustment algorithm below.
764853

765-
Dispatching is done independently for each priority level. Whenever
766-
(1) a non-exempt priority level's number of occupied seats is zero or
767-
below the level's CurrentCL and (2) that priority level has a
768-
non-empty queue, it is time to consider dispatching another request
769-
for service. The Fair Queuing for Server Requests algorithm below is
770-
used to pick a non-empty queue at that priority level. Then the
771-
request at the head of that queue is dispatched if possible.
854+
Dispatching is done independently for each priority level.
855+
Dispatching for an exempt priority level is never held up. For a
856+
non-exempt priority level: whenever (1) that priority level's number
857+
of occupied seats is zero or below the level's CurrentCL and (2) that
858+
priority level has a non-empty queue, it is time to consider
859+
dispatching another request for service. The Fair Queuing for Server
860+
Requests algorithm below is used to pick a non-empty queue at that
861+
priority level. Then the request at the head of that queue is
862+
dispatched if possible.
772863

773864
Every 10 seconds, all the CurrentCLs are adjusted. We do smoothing on
774865
the inputs to the adjustment logic in order to dampen control
@@ -779,18 +870,28 @@ high watermark `HighSeatDemand(i)`, time-weighted average
779870
`StDevSeatDemand(i)` of each priority level `i`'s seat demand over the
780871
just-concluded adjustment period. A priority level's seat demand at
781872
any given moment is the sum of its occupied seats and the number of
782-
seats in the queued requests. We also define `EnvelopeSeatDemand(i) =
783-
AvgSeatDemand(i) + StDevSeatDemand(i)`. The adjustment logic is
784-
driven by a quantity called smoothed seat demand
785-
(`SmoothSeatDemand(i)`), which does an exponential averaging of
873+
seats in the queued requests (this second term is necessarily zero for
874+
an exempt priority level). We also define a quantity
875+
`EnvelopeSeatDemand` as follows.
876+
877+
```
878+
EnvelopeSeatDemand(i) = AvgSeatDemand(i) + StDevSeatDemand(i)
879+
```
880+
881+
The adjustment logic is driven by a quantity called smoothed seat
882+
demand (`SmoothSeatDemand(i)`), which does an exponential averaging of
786883
EnvelopeSeatDemand values using a coefficient A in the range (0,1) and
787884
immediately tracks EnvelopeSeatDemand when it exceeds
788885
SmoothSeatDemand. The rule for updating priority level `i`'s
789-
SmoothSeatDemand at the end of an adjustment period is
790-
`SmoothSeatDemand(i) := max( EnvelopeSeatDemand(i),
791-
A*SmoothSeatDemand(i) + (1-A)*EnvelopeSeatDemand(i) )`. The value of
792-
`A` is fixed at 0.977 in the code, which means that the half-life of
793-
the exponential decay is about 5 minutes.
886+
SmoothSeatDemand at the end of an adjustment period is as follows.
887+
888+
```
889+
SmoothSeatDemand(i) := max( EnvelopeSeatDemand(i),
890+
A*SmoothSeatDemand(i) + (1-A)*EnvelopeSeatDemand(i) )
891+
```
892+
893+
The value of `A` is fixed at 0.977 in the code, which means that the
894+
half-life of the exponential decay is about 5 minutes.
794895

795896
Adjustment is also done on configuration change, when a priority level
796897
is introduced or removed or its NominalCL, LendableCL, or BorrowingCL
@@ -803,54 +904,93 @@ SmoothSeatDemand to a higher value would risk creating an illusion of
803904
pressure that decays only slowly; initializing to zero is safe because
804905
the arrival of actual pressure gets a quick response.
805906

806-
For adjusting the CurrentCL values, each non-exempt priority level `i`
807-
has a lower bound (`MinCurrentCL(i)`) for the new value. It is simply
808-
HighSeatDemand clipped by the configured concurrency limits:
809-
`MinCurrentCL(i) = max( MinCL(i), min( NominalCL(i), HighSeatDemand(i)
810-
) )`.
907+
For adjusting the CurrentCL values, each priority level `i` has a
908+
lower bound (`MinCurrentCL(i)`) for the new value. It is
909+
HighSeatDemand clipped by the configured lower, and upper if
910+
non-exempt, limit. The more aggressive setting for exempt priority
911+
levels gives them precedence when borrowing: they get all they want,
912+
and the remainder is available to the non-exempt levels.
913+
914+
```
915+
MinCurrentCL(i) = max( MinCL(i), min( NominalCL(i), HighSeatDemand(i) ) ) -- if non-exempt
916+
MinCurrentCL(i) = max( MinCL(i), HighSeatDemand(i) ) -- if exempt
917+
```
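As a rough illustration of the two cases, the lower bound can be written as one Go function. The name `minCurrentCL` and the sample numbers are assumptions of this sketch; `MinCL` is taken as an input rather than derived.

```go
package main

import (
	"fmt"
	"math"
)

// minCurrentCL returns the lower bound for a level's new CurrentCL.
// Non-exempt levels have HighSeatDemand clipped to [MinCL, NominalCL];
// exempt levels are clipped only from below, so they can claim all they want.
func minCurrentCL(minCL, nominalCL, highSeatDemand float64, exempt bool) float64 {
	if exempt {
		return math.Max(minCL, highSeatDemand)
	}
	return math.Max(minCL, math.Min(nominalCL, highSeatDemand))
}

func main() {
	// Hypothetical non-exempt level: MinCL = 75, NominalCL = 100.
	for _, demand := range []float64{40, 90, 500} {
		fmt.Println("non-exempt, HighSeatDemand", demand, "->", minCurrentCL(75, 100, demand, false))
	}
	// Hypothetical exempt level with zero nominal allocation.
	fmt.Println("exempt, HighSeatDemand 12 ->", minCurrentCL(0, 0, 12, true))
}
```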
918+
919+
For the following logic we let the CurrentCL values be floating-point
920+
numbers, not necessarily integers.
811921

812-
If `MinCurrentCL(i) = NominalCL(i)` for every non-exempt priority
813-
level `i` then there is no wiggle room. In this situation, no
814-
priority level is willing to lend any seats. The new CurrentCL values
815-
must equal the NominalCL values. Otherwise there is wiggle room and
816-
the adjustment proceeds as follows. For the following logic we let
817-
the CurrentCL values be floating-point numbers, not necessarily
818-
integers.
819922

820-
The priority levels would all be fairly happy if we set CurrentCL =
821-
SmoothSeatDemand for each. We clip that by the lower bound just shown
822-
and define `Target(i)` as follows, taking it as a first-order target
823-
for each non-exempt priority level `i`.
923+
If `MinCurrentCL(i) = NominalCL(i)` for every priority level `i` then
924+
no adjustment is needed: the new CurrentCL values are set to the
925+
NominalCL values. Otherwise adjustment is in order and proceeds as
926+
follows.
927+
928+
For each exempt priority level, `CurrentCL` is set to `MinCurrentCL`.
929+
Not that this matters much, because dispatching for those is not
930+
actually limited. The sum of those limits, however, is subtracted
931+
from `ServerCL` to produce a value called `RemainingServerCL` that is
932+
used in computing the allocations for the non-exempt priority levels.
933+
If `RemainingServerCL` is zero or negative then all the non-exempt
934+
priority levels get `CurrentCL = 0`. Otherwise, the computation
935+
proceeds as follows.
936+
937+
Because of the borrowing by exempt priority levels, lower bounds could
938+
be problematic. Define `LowerBoundSum` as follows.
939+
940+
```
941+
LowerBoundSum = sum[non-exempt priority level i] MinCurrentCL(i)
942+
```
943+
944+
If `LowerBoundSum = RemainingServerCL` then there is no wiggle room:
945+
each non-exempt priority level gets `CurrentCL = MinCurrentCL`.
946+
947+
If `LowerBoundSum > RemainingServerCL` then the problem is
948+
over-constrained. The solution taken is to reduce all the lower
949+
bounds in the same proportion, to the point where their sum is
950+
feasible. At that point, there is no wiggle room. Thus, in this case
951+
the settings are as follows.
952+
953+
```
954+
CurrentCL(i) = MinCurrentCL(i) * RemainingServerCL / LowerBoundSum
955+
```
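The cases that need no `FairProp` computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation: the `level` type, the function name `allocateWithoutWiggleRoom`, and the level names `a` and `b` are hypothetical, and the initial check for `MinCurrentCL(i) = NominalCL(i)` at every level is left out.

```go
package main

import "fmt"

// level holds the per-priority-level inputs this sketch needs.
type level struct {
	name         string
	exempt       bool
	minCurrentCL float64
}

// allocateWithoutWiggleRoom sets CurrentCL for the exempt levels and, when
// there is no wiggle room, for the non-exempt levels too. The second result
// is false when wiggle room remains and the borrowing computation must run.
func allocateWithoutWiggleRoom(serverCL float64, levels []level) (map[string]float64, bool) {
	currentCL := make(map[string]float64)

	// Exempt levels get their lower bound; the rest of ServerCL remains.
	remainingServerCL := serverCL
	lowerBoundSum := 0.0
	for _, l := range levels {
		if l.exempt {
			currentCL[l.name] = l.minCurrentCL
			remainingServerCL -= l.minCurrentCL
		} else {
			lowerBoundSum += l.minCurrentCL
		}
	}

	// Nothing left: every non-exempt level gets zero.
	if remainingServerCL <= 0 {
		for _, l := range levels {
			if !l.exempt {
				currentCL[l.name] = 0
			}
		}
		return currentCL, true
	}

	// Exactly constrained (scale factor 1) or over-constrained (scale down
	// all lower bounds in the same proportion).
	if lowerBoundSum >= remainingServerCL {
		for _, l := range levels {
			if !l.exempt {
				currentCL[l.name] = l.minCurrentCL * remainingServerCL / lowerBoundSum
			}
		}
		return currentCL, true
	}

	return currentCL, false
}

func main() {
	levels := []level{
		{name: "exempt", exempt: true, minCurrentCL: 40},
		{name: "a", minCurrentCL: 60},
		{name: "b", minCurrentCL: 30},
	}
	currentCL, settled := allocateWithoutWiggleRoom(100, levels)
	fmt.Println(currentCL, settled)
}
```

In the example the exempt level takes 40 of 100 seats, leaving 60 for lower bounds that sum to 90, so both non-exempt bounds are scaled by the same factor of 60/90.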
956+
957+
Finally, when `LowerBoundSum < RemainingServerCL` there _is_ wiggle
958+
room and the borrowing computation proceeds as follows.
959+
960+
The non-exempt priority levels would all be fairly happy if we set
961+
CurrentCL = SmoothSeatDemand for each. We clip that by the lower
962+
bound just shown and define `Target(i)` as follows, taking it as a
963+
first-order target for each non-exempt priority level `i`.
824964

825965
```
826966
Target(i) = max( MinCurrentCL(i), SmoothSeatDemand(i) )
827967
```
828968

829969
Sadly, the sum of the Target values --- let's name that TargetSum ---
830-
is not necessarily equal to ServerCL. However, if `TargetSum <
831-
ServerCL` then all the Targets could be scaled up in the same
832-
proportion `FairProp = ServerCL / TargetSum` (if that did not violate
833-
any upper bound) to get the new concurrency limits `CurrentCL(i) :=
834-
FairProp * Target(i)` for each non-exempt priority level `i`.
835-
Similarly, if `TargetSum > ServerCL` then all the Targets could be
836-
scaled down in the same proportion (if that did not violate any lower
837-
bound) to get the new concurrency limits. This shares the wealth or
838-
the pain proportionally among the priority levels (but note: the upper
839-
bound does not affect the target, lest the pain of not achieving a
840-
high SmoothSeatDemand be distorted, while the lower bound _does_
841-
affect the target, so that merely achieving the lower bound is not
842-
considered a gain). The following computation generalizes this idea
843-
to respect the relevant bounds.
970+
is not necessarily equal to `RemainingServerCL`. However, if
971+
`TargetSum < RemainingServerCL` then all the Targets could be scaled
972+
up in the same proportion `FairProp = RemainingServerCL / TargetSum`
973+
(if that did not violate any upper bound) to get the new concurrency
974+
limits `CurrentCL(i) := FairProp * Target(i)` for each non-exempt
975+
priority level `i`. Similarly, if `TargetSum > RemainingServerCL`
976+
then all the Targets could be scaled down in the same proportion (if
977+
that did not violate any lower bound) to get the new concurrency
978+
limits. This shares the wealth or the pain proportionally among the
979+
priority levels (but note: the upper bound does not affect the target,
980+
lest the pain of not achieving a high SmoothSeatDemand be distorted,
981+
while the lower bound _does_ affect the target, so that merely
982+
achieving the lower bound is not considered a gain). The following
983+
computation generalizes this idea to respect the relevant bounds.
844984

845985
We cannot necessarily scale all the Targets by the same factor ---
846986
because that might violate some upper or lower bounds. The problem is
847-
to find a proportion `FairProp` that can be shared by all the priority
848-
levels except those with a bound that forbids it. This means to find
849-
a value of `FairProp` that simultaneously solves all the following
850-
conditions, for the non-exempt priority levels `i`, and also makes the
851-
CurrentCL values sum to ServerCL. In some cases there are many
852-
satisfactory values of `FairProp` --- and that is OK, because they all
853-
produce the same CurrentCL values.
987+
to find a proportion `FairProp` that can be shared by all the
988+
non-exempt priority levels except those with a bound that forbids it.
989+
This means to find a value of `FairProp` that simultaneously solves
990+
all the following conditions, for the non-exempt priority levels `i`,
991+
and also makes the CurrentCL values sum to `RemainingServerCL`. In
992+
some cases there are many satisfactory values of `FairProp` --- and
993+
that is OK, because they all produce the same CurrentCL values.
854994

855995
```
856996
CurrentCL(i) = min( MaxCL(i), max( MinCurrentCL(i), FairProp * Target(i) ))
@@ -1916,7 +2056,7 @@ spec:
19162056
match:
19172057
- and: [ ] # match everything
19182058
```
1919-
2059+
19202060
Following is a FlowSchema that might be used for the requests by the
19212061
aggregated apiservers of
19222062
https://github.com/MikeSpreitzer/kube-examples/tree/add-kos/staging/kos
