This repository was archived by the owner on Jan 16, 2025. It is now read-only.

Commit 896f473

maschwenk and npalm authored
feat(runners): add configurable eviction strategy to idle config (#3375)
We do some on-instance caching, so when we scale down we'd prefer to keep the older instances around instead of the new ones (because they will have a hotter cache). This adds a configurable setting to the `idleConfig` to pick a sorting strategy. I've never contributed to this repo, so please tell me if I'm doing something wrong!

---------

Co-authored-by: Niek Palm
1 parent 8b8116b commit 896f473


10 files changed, +103 -40 lines


Diff for: README.md

+8 -4

````diff
@@ -292,11 +292,15 @@ The pool is NOT enabled by default and can be enabled by setting at least one ob
 
 The module will scale down to zero runners by default. By specifying an `idle_config`, idle runners can be kept active. The scale down lambda checks if any of the cron expressions matches the current time with a margin of 5 seconds. When there is a match, the number of runners specified in the idle config will be kept active. If multiple cron expressions match, only the first one is taken into account. Below is an idle configuration for keeping runners active from 9:00am to 5:59pm on working days. The [cron expression generator by Cronhub](https://crontab.cronhub.io/) is a great resource to set up your idle config.
 
+By default, the oldest instances are evicted. This helps keep your environment up-to-date and reduce problems like running out of disk space or RAM. Alternatively, if your older instances hold a long-lived cache, you can set `evictionStrategy` to `newest_first` to evict the newest instances first instead.
+
 ```hcl
 idle_config = [{
-  cron      = "* * 9-17 * * 1-5"
-  timeZone  = "Europe/Amsterdam"
-  idleCount = 2
+  cron             = "* * 9-17 * * 1-5"
+  timeZone         = "Europe/Amsterdam"
+  idleCount        = 2
+  # Defaults to 'oldest_first'
+  evictionStrategy = "oldest_first"
 }]
 ```
 
````
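To make the README's wording concrete, here is a minimal TypeScript sketch of which idle runners would be terminated for a given `idleCount` and strategy. It assumes a simplified runner shape (`id`, `launchTime`) and a hypothetical helper `selectRunnersToTerminate`; it ignores the GitHub busy-state and minimum-running-time checks the real scale-down Lambda performs before terminating anything.

```typescript
type EvictionStrategy = 'oldest_first' | 'newest_first';

// Hypothetical, simplified idle runner (not the module's RunnerInfo type).
interface IdleRunner {
  id: string;
  launchTime: Date;
}

// Returns the ids of runners that would be terminated. The list is ordered so the
// runners to keep come first; the first `idleCount` are kept and the rest are evicted.
function selectRunnersToTerminate(
  runners: IdleRunner[],
  idleCount: number,
  strategy: EvictionStrategy,
): string[] {
  const ordered = [...runners].sort((a, b) => {
    const newestFirst = b.launchTime.getTime() - a.launchTime.getTime();
    // 'oldest_first' evicts the oldest, so the newest go to the front (kept), and vice versa.
    return strategy === 'oldest_first' ? newestFirst : -newestFirst;
  });
  return ordered.slice(Math.max(idleCount, 0)).map((runner) => runner.id);
}

const runners: IdleRunner[] = [
  { id: 'i-old', launchTime: new Date('2024-01-01T08:00:00Z') },
  { id: 'i-mid', launchTime: new Date('2024-01-01T09:00:00Z') },
  { id: 'i-new', launchTime: new Date('2024-01-01T10:00:00Z') },
];

console.log(selectRunnersToTerminate(runners, 2, 'oldest_first')); // → ['i-old']
console.log(selectRunnersToTerminate(runners, 2, 'newest_first')); // → ['i-new']
```

With `idleCount = 2` and three idle runners, exactly one is evicted; the strategy only decides whether that is the oldest or the newest instance.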

```diff
@@ -521,7 +525,7 @@ We welcome any improvement to the standard module to make the default as secure
 | <a name="input_ghes_ssl_verify"></a> [ghes\_ssl\_verify](#input\_ghes\_ssl\_verify) | GitHub Enterprise SSL verification. Set to 'false' when custom certificate (chains) is used for GitHub Enterprise Server (insecure). | `bool` | `true` | no |
 | <a name="input_ghes_url"></a> [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no |
 | <a name="input_github_app"></a> [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). | <pre>object({<br> key_base64 = string<br> id = string<br> webhook_secret = string<br> })</pre> | n/a | yes |
-| <a name="input_idle_config"></a> [idle\_config](#input\_idle\_config) | List of time periods, defined as a cron expression, to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | <pre>list(object({<br> cron = string<br> timeZone = string<br> idleCount = number<br> }))</pre> | `[]` | no |
+| <a name="input_idle_config"></a> [idle\_config](#input\_idle\_config) | List of time periods, defined as a cron expression, to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle. | <pre>list(object({<br> cron = string<br> timeZone = string<br> idleCount = number<br> evictionStrategy = optional(string, "oldest_first")<br> }))</pre> | `[]` | no |
 | <a name="input_instance_allocation_strategy"></a> [instance\_allocation\_strategy](#input\_instance\_allocation\_strategy) | The allocation strategy for spot instances. AWS recommends using `price-capacity-optimized` however the AWS default is `lowest-price`. | `string` | `"lowest-price"` | no |
 | <a name="input_instance_max_spot_price"></a> [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no |
 | <a name="input_instance_profile_path"></a> [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no |
```
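The variable type only declares `evictionStrategy` as `optional(string, "oldest_first")`, so any string is accepted on the Terraform side. In the diffed `scale-down.ts`, the comparator is picked with `evictionStrategy === 'oldest_first' ? ... : ...`, which means any other value, including a typo, behaves like `newest_first`. The sketch below shows one way to fail fast instead when parsing the `SCALE_DOWN_CONFIG` JSON; `parseScaleDownConfig` and `isEvictionStrategy` are hypothetical names, not part of the module.

```typescript
type EvictionStrategy = 'oldest_first' | 'newest_first';

interface ScalingDownConfig {
  cron: string;
  idleCount: number;
  timeZone: string;
  evictionStrategy?: EvictionStrategy;
}

const EVICTION_STRATEGIES: readonly string[] = ['oldest_first', 'newest_first'];

// Hypothetical type guard: narrows an unknown value to one of the two supported strategies.
function isEvictionStrategy(value: unknown): value is EvictionStrategy {
  return typeof value === 'string' && EVICTION_STRATEGIES.includes(value);
}

// Hypothetical parser: applies the documented 'oldest_first' default and rejects
// unknown strategies instead of letting them silently act as 'newest_first'.
function parseScaleDownConfig(json: string): ScalingDownConfig[] {
  const entries = JSON.parse(json) as Array<Record<string, unknown>>;
  return entries.map((entry) => {
    const strategy = entry.evictionStrategy ?? 'oldest_first';
    if (!isEvictionStrategy(strategy)) {
      throw new Error(`Unsupported evictionStrategy: ${String(strategy)}`);
    }
    return {
      cron: String(entry.cron),
      idleCount: Number(entry.idleCount),
      timeZone: String(entry.timeZone),
      evictionStrategy: strategy,
    };
  });
}

const parsed = parseScaleDownConfig(
  '[{"cron":"* * 9-17 * * 1-5","timeZone":"Europe/Amsterdam","idleCount":2}]',
);
console.log(parsed[0].evictionStrategy); // → oldest_first
```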

Diff for: lambdas/functions/control-plane/src/scale-runners/scale-down-config.test.ts

+24 -2

```diff
@@ -1,16 +1,21 @@
 import moment from 'moment-timezone';
 
-import { ScalingDownConfigList, getIdleRunnerCount } from './scale-down-config';
+import { EvictionStrategy, ScalingDownConfigList, getEvictionStrategy, getIdleRunnerCount } from './scale-down-config';
 
 const DEFAULT_TIMEZONE = 'America/Los_Angeles';
 const DEFAULT_IDLE_COUNT = 1;
+const DEFAULT_EVICTION_STRATEGY: EvictionStrategy = 'oldest_first';
 const now = moment.tz(new Date(), 'America/Los_Angeles');
 
-function getConfig(cronTabs: string[]): ScalingDownConfigList {
+function getConfig(
+  cronTabs: string[],
+  evictionStrategy: EvictionStrategy | undefined = undefined,
+): ScalingDownConfigList {
   return cronTabs.map((cron) => ({
     cron: cron,
     idleCount: DEFAULT_IDLE_COUNT,
     timeZone: DEFAULT_TIMEZONE,
+    evictionStrategy,
   }));
 }
 
```

```diff
@@ -31,4 +36,21 @@ describe('scaleDownConfig', () => {
       expect(getIdleRunnerCount(scaleDownConfig)).toEqual(DEFAULT_IDLE_COUNT);
     });
   });
+
+  describe('Determine eviction strategy.', () => {
+    it('Default eviction strategy', async () => {
+      const scaleDownConfig = getConfig(['* * * * * *']);
+      expect(getEvictionStrategy(scaleDownConfig)).toEqual('oldest_first');
+    });
+
+    it('Overriding eviction strategy to newest_first', async () => {
+      const scaleDownConfig = getConfig(['* * * * * *'], 'newest_first');
+      expect(getEvictionStrategy(scaleDownConfig)).toEqual('newest_first');
+    });
+
+    it('No active cron configuration', async () => {
+      const scaleDownConfig = getConfig(['* * * * * ' + ((now.day() + 1) % 7)]);
+      expect(getEvictionStrategy(scaleDownConfig)).toEqual(DEFAULT_EVICTION_STRATEGY);
+    });
+  });
 });
```

Diff for: lambdas/functions/control-plane/src/scale-runners/scale-down-config.ts

+16 -0

```diff
@@ -1,13 +1,18 @@
+import { createChildLogger } from '@terraform-aws-github-runner/aws-powertools-util';
 import parser from 'cron-parser';
 import moment from 'moment';
 
 export type ScalingDownConfigList = ScalingDownConfig[];
+export type EvictionStrategy = 'newest_first' | 'oldest_first';
 export interface ScalingDownConfig {
   cron: string;
   idleCount: number;
   timeZone: string;
+  evictionStrategy?: EvictionStrategy;
 }
 
+const logger = createChildLogger('scale-down-config.ts');
+
 function inPeriod(period: ScalingDownConfig): boolean {
   const now = moment(new Date());
   const expr = parser.parseExpression(period.cron, {
```

```diff
@@ -25,3 +30,14 @@ export function getIdleRunnerCount(scalingDownConfigs: ScalingDownConfigList): n
   }
   return 0;
 }
+
+export function getEvictionStrategy(scalingDownConfigs: ScalingDownConfigList): EvictionStrategy {
+  for (const scalingDownConfig of scalingDownConfigs) {
+    if (inPeriod(scalingDownConfig)) {
+      const evictionStrategy = scalingDownConfig.evictionStrategy ?? 'oldest_first';
+      logger.debug(`Using evictionStrategy '${evictionStrategy}' for period ${scalingDownConfig.cron}`);
+      return evictionStrategy;
+    }
+  }
+  return 'oldest_first';
+}
```
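`getEvictionStrategy` applies the README's rule that only the first matching cron expression counts: the first active period decides, and `oldest_first` is the fallback both when that entry omits the field and when no period is active. The sketch below reproduces that selection with the period check passed in as a function, a stand-in for the module's `inPeriod` (which uses `cron-parser` and `moment`), so it runs without those dependencies. `resolveEvictionStrategy` and `PeriodCheck` are hypothetical names.

```typescript
type EvictionStrategy = 'newest_first' | 'oldest_first';

interface ScalingDownConfig {
  cron: string;
  idleCount: number;
  timeZone: string;
  evictionStrategy?: EvictionStrategy;
}

// Stand-in for the module's cron-based inPeriod(); injected to keep the sketch dependency-free.
type PeriodCheck = (config: ScalingDownConfig) => boolean;

// Same selection rule as getEvictionStrategy in the diff: the first active period wins,
// and 'oldest_first' is used when that entry has no strategy or nothing is active.
function resolveEvictionStrategy(configs: ScalingDownConfig[], inPeriod: PeriodCheck): EvictionStrategy {
  const active = configs.find((config) => inPeriod(config));
  return active?.evictionStrategy ?? 'oldest_first';
}

const configs: ScalingDownConfig[] = [
  { cron: '* * 9-17 * * 1-5', idleCount: 2, timeZone: 'Europe/Amsterdam', evictionStrategy: 'newest_first' },
  { cron: '* * * * * *', idleCount: 1, timeZone: 'Europe/Amsterdam' },
];

// Pretend only the catch-all period is active right now (e.g. outside working hours).
const onlyCatchAll: PeriodCheck = (config) => config.cron === '* * * * * *';

console.log(resolveEvictionStrategy(configs, onlyCatchAll)); // → oldest_first (entry has no strategy)
console.log(resolveEvictionStrategy(configs, () => true)); // → newest_first (first entry wins)
console.log(resolveEvictionStrategy(configs, () => false)); // → oldest_first (no active period)
```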

Diff for: lambdas/functions/control-plane/src/scale-runners/scale-down.test.ts

+19 -7

```diff
@@ -394,14 +394,13 @@ describe('scaleDown', () => {
   });
 
   describe('With idle config', () => {
+    const defaultConfig = {
+      idleCount: 3,
+      cron: '* * * * * *',
+      timeZone: 'Europe/Amsterdam',
+    };
     beforeEach(() => {
-      process.env.SCALE_DOWN_CONFIG = JSON.stringify([
-        {
-          idleCount: 3,
-          cron: '* * * * * *',
-          timeZone: 'Europe/Amsterdam',
-        },
-      ]);
+      process.env.SCALE_DOWN_CONFIG = JSON.stringify([defaultConfig]);
     });
 
     it('Terminates 1 runner owned by orgs', async () => {
```

```diff
@@ -431,6 +430,19 @@ describe('scaleDown', () => {
       expect(mockOctokit.apps.getRepoInstallation).toBeCalled();
       expect(terminateRunner).not.toBeCalled();
     });
+
+    describe('With newest_first eviction strategy', () => {
+      beforeEach(() => {
+        process.env.SCALE_DOWN_CONFIG = JSON.stringify([{ ...defaultConfig, evictionStrategy: 'newest_first' }]);
+      });
+
+      it('Terminates the newest org', async () => {
+        mockListRunners.mockResolvedValue(RUNNERS_ORG_WITH_AUTO_SCALING_CONFIG);
+        await scaleDown();
+        expect(terminateRunner).toBeCalledTimes(1);
+        expect(terminateRunner).toHaveBeenCalledWith('i-idle-102');
+      });
+    });
   });
 
   it('No instances terminates when delete runner in github results in a non 204 status.', async () => {
```

Diff for: lambdas/functions/control-plane/src/scale-runners/scale-down.ts

+22 -16

```diff
@@ -6,7 +6,7 @@ import { createGithubAppAuth, createGithubInstallationAuth, createOctoClient } f
 import { bootTimeExceeded, listEC2Runners, terminateRunner } from './../aws/runners';
 import { RunnerInfo, RunnerList } from './../aws/runners.d';
 import { GhRunners, githubCache } from './cache';
-import { ScalingDownConfig, getIdleRunnerCount } from './scale-down-config';
+import { ScalingDownConfig, getEvictionStrategy, getIdleRunnerCount } from './scale-down-config';
 
 const logger = createChildLogger('scale-down');
 
```

```diff
@@ -148,10 +148,13 @@ async function evaluateAndRemoveRunners(
   scaleDownConfigs: ScalingDownConfig[],
 ): Promise<void> {
   let idleCounter = getIdleRunnerCount(scaleDownConfigs);
+  const evictionStrategy = getEvictionStrategy(scaleDownConfigs);
   const ownerTags = new Set(ec2Runners.map((runner) => runner.owner));
 
   for (const ownerTag of ownerTags) {
-    const ec2RunnersFiltered = ec2Runners.filter((runner) => runner.owner === ownerTag);
+    const ec2RunnersFiltered = ec2Runners
+      .filter((runner) => runner.owner === ownerTag)
+      .sort(evictionStrategy === 'oldest_first' ? oldestFirstStrategy : newestFirstStrategy);
     logger.debug(`Found: '${ec2RunnersFiltered.length}' active GitHub runners with owner tag: '${ownerTag}'`);
     for (const ec2Runner of ec2RunnersFiltered) {
       const ghRunners = await listGitHubRunners(ec2Runner);
```
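In `evaluateAndRemoveRunners`, the strategy only changes the order in which each owner's runners are visited, and `idleCounter` is declared once, before the owner loop, so a single idle budget is shared by all owners. The sketch below models that loop shape under a simplifying assumption not shown in this hunk: every visited runner is idle and eligible, takes an idle slot while `idleCounter > 0`, and is otherwise terminated. `planScaleDown`, `SimpleRunner`, and `ScaleDownPlan` are hypothetical names.

```typescript
type EvictionStrategy = 'oldest_first' | 'newest_first';

// Hypothetical, simplified runner: only the fields the ordering needs.
interface SimpleRunner {
  id: string;
  owner: string;
  launchTime: Date;
}

interface ScaleDownPlan {
  keep: string[];
  terminate: string[];
}

// Models the loop shape of evaluateAndRemoveRunners: one idle budget shared by all owners,
// each owner's runners visited in strategy order. Assumes every runner is idle and eligible.
function planScaleDown(runners: SimpleRunner[], idleCount: number, strategy: EvictionStrategy): ScaleDownPlan {
  const plan: ScaleDownPlan = { keep: [], terminate: [] };
  let idleCounter = idleCount; // shared across owners, as in the diff
  const owners = new Set(runners.map((runner) => runner.owner));
  for (const owner of owners) {
    const ordered = runners
      .filter((runner) => runner.owner === owner)
      .sort((a, b) => {
        const newestFirst = b.launchTime.getTime() - a.launchTime.getTime();
        return strategy === 'oldest_first' ? newestFirst : -newestFirst;
      });
    for (const runner of ordered) {
      if (idleCounter > 0) {
        idleCounter--;
        plan.keep.push(runner.id);
      } else {
        plan.terminate.push(runner.id);
      }
    }
  }
  return plan;
}

const fleet: SimpleRunner[] = [
  { id: 'a-old', owner: 'org-a', launchTime: new Date('2024-01-01T08:00:00Z') },
  { id: 'a-new', owner: 'org-a', launchTime: new Date('2024-01-01T10:00:00Z') },
  { id: 'b-old', owner: 'org-b', launchTime: new Date('2024-01-01T07:00:00Z') },
];

console.log(planScaleDown(fleet, 2, 'oldest_first'));
// → { keep: ['a-new', 'a-old'], terminate: ['b-old'] }
```

Under these assumptions, `org-a` is visited first and uses up both idle slots, so `org-b` keeps none regardless of strategy; the strategy only decides which of an owner's runners come first.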
```diff
@@ -191,17 +194,21 @@ async function terminateOrphan(instanceId: string): Promise<void> {
   }
 }
 
-async function listAndSortRunners(environment: string) {
-  return (
-    await listEC2Runners({
-      environment,
-    })
-  ).sort((a, b): number => {
-    if (a.launchTime === undefined) return 1;
-    if (b.launchTime === undefined) return 1;
-    if (a.launchTime < b.launchTime) return 1;
-    if (a.launchTime > b.launchTime) return -1;
-    return 0;
+function oldestFirstStrategy(a: RunnerInfo, b: RunnerInfo): number {
+  if (a.launchTime === undefined) return 1;
+  if (b.launchTime === undefined) return 1;
+  if (a.launchTime < b.launchTime) return 1;
+  if (a.launchTime > b.launchTime) return -1;
+  return 0;
+}
+
+function newestFirstStrategy(a: RunnerInfo, b: RunnerInfo): number {
+  return oldestFirstStrategy(a, b) * -1;
+}
+
+async function listRunners(environment: string) {
+  return await listEC2Runners({
+    environment,
   });
 }
 
```
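Two details of these comparators are easy to miss. First, `oldestFirstStrategy` orders the array newest first: runners at the front of the list are the ones kept idle, so the oldest end up last and are evicted first. Second, it returns `1` both when `a.launchTime` and when `b.launchTime` is `undefined`, so `compare(a, b)` and `compare(b, a)` can both be positive; that breaks the consistency `Array.prototype.sort` expects, and the position of runners without a launch time then depends on the engine's sort algorithm (the negation in `newestFirstStrategy` inherits this). Whether that matters depends on whether `launchTime` can actually be missing, which this diff does not show. Below is a sketch of a consistent pair, with hypothetical names: `keepNewestComparator` plays the role of `oldestFirstStrategy` and `keepOldestComparator` that of `newestFirstStrategy`. Always sorting runners without a launch time last is a design choice assumed here, not taken from the commit.

```typescript
// Hypothetical minimal shape; the module's RunnerInfo has more fields.
interface TimedRunner {
  instanceId: string;
  launchTime?: Date;
}

// Keep-order for 'oldest_first' eviction: newest first (kept idle), oldest last (evicted first).
// Runners without a launchTime always sort last; the comparator is antisymmetric.
function keepNewestComparator(a: TimedRunner, b: TimedRunner): number {
  if (a.launchTime === undefined && b.launchTime === undefined) return 0;
  if (a.launchTime === undefined) return 1;
  if (b.launchTime === undefined) return -1;
  return b.launchTime.getTime() - a.launchTime.getTime();
}

// Keep-order for 'newest_first' eviction: oldest first. Only the launch-time comparison
// is reversed; runners without a launchTime still sort last instead of being flipped to the front.
function keepOldestComparator(a: TimedRunner, b: TimedRunner): number {
  if (a.launchTime === undefined || b.launchTime === undefined) {
    return keepNewestComparator(a, b);
  }
  return a.launchTime.getTime() - b.launchTime.getTime();
}

const sample: TimedRunner[] = [
  { instanceId: 'i-unknown' },
  { instanceId: 'i-1', launchTime: new Date('2024-01-01T08:00:00Z') },
  { instanceId: 'i-2', launchTime: new Date('2024-01-01T09:00:00Z') },
];

const ids = (list: TimedRunner[]) => list.map((runner) => runner.instanceId);
console.log(ids([...sample].sort(keepNewestComparator))); // → ['i-2', 'i-1', 'i-unknown']
console.log(ids([...sample].sort(keepOldestComparator))); // → ['i-1', 'i-2', 'i-unknown']
```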

```diff
@@ -214,8 +221,7 @@ export async function scaleDown(): Promise<void> {
   const scaleDownConfigs = JSON.parse(process.env.SCALE_DOWN_CONFIG) as [ScalingDownConfig];
   const environment = process.env.ENVIRONMENT;
 
-  // list and sort runners, newest first. This ensure we keep the newest runners longer.
-  const ec2Runners = await listAndSortRunners(environment);
+  const ec2Runners = await listRunners(environment);
   const activeEc2RunnersCount = ec2Runners.length;
   logger.info(`Found: '${activeEc2RunnersCount}' active GitHub EC2 runner instances before clean-up.`);
 
```

```diff
@@ -227,6 +233,6 @@ export async function scaleDown(): Promise<void> {
   const runners = filterRunners(ec2Runners);
   await evaluateAndRemoveRunners(runners, scaleDownConfigs);
 
-  const activeEc2RunnersCountAfter = (await listAndSortRunners(environment)).length;
+  const activeEc2RunnersCountAfter = (await listRunners(environment)).length;
   logger.info(`Found: '${activeEc2RunnersCountAfter}' active GitHub EC2 runners instances after clean-up.`);
 }
```
