feat: adjust pool dynamically based on demand #3855


Open · wants to merge 12 commits into base: main
Changes from 6 commits
4 changes: 2 additions & 2 deletions README.md
@@ -181,11 +181,11 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
| <a name="input_logging_retention_in_days"></a> [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no |
| <a name="input_metrics_namespace"></a> [metrics\_namespace](#input\_metrics\_namespace) | The namespace for the metrics created by the module. Merics will only be created if explicit enabled. | `string` | `"GitHub Runners"` | no |
| <a name="input_minimum_running_time_in_minutes"></a> [minimum\_running\_time\_in\_minutes](#input\_minimum\_running\_time\_in\_minutes) | The time an ec2 action runner should be running at minimum before terminated, if not busy. | `number` | `null` | no |
| <a name="input_pool_config"></a> [pool\_config](#input\_pool\_config) | The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for weekdays to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. | <pre>list(object({<br> schedule_expression = string<br> size = number<br> }))</pre> | `[]` | no |
| <a name="input_pool_config"></a> [pool\_config](#input\_pool\_config) | The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for weekdays to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Setting pool size to -1 will adjust the pool based on the number of queued jobs. | <pre>list(object({<br> schedule_expression = string<br> size = number<br> }))</pre> | `[]` | no |
| <a name="input_pool_lambda_memory_size"></a> [pool\_lambda\_memory\_size](#input\_pool\_lambda\_memory\_size) | Memory size limit for scale-up lambda. | `number` | `512` | no |
| <a name="input_pool_lambda_reserved_concurrent_executions"></a> [pool\_lambda\_reserved\_concurrent\_executions](#input\_pool\_lambda\_reserved\_concurrent\_executions) | Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. | `number` | `1` | no |
| <a name="input_pool_lambda_timeout"></a> [pool\_lambda\_timeout](#input\_pool\_lambda\_timeout) | Time out for the pool lambda in seconds. | `number` | `60` | no |
| <a name="input_pool_runner_owner"></a> [pool\_runner\_owner](#input\_pool\_runner\_owner) | The pool will deploy runners to the GitHub org ID, set this value to the org to which you want the runners deployed. Repo level is not supported. | `string` | `null` | no |
| <a name="input_pool_runner_owner"></a> [pool\_runner\_owner](#input\_pool\_runner\_owner) | The pool will deploy runners to the GitHub org/repo ID(s), set this value to the org/repo(s) to which you want the runners deployed. Separate the entries by a comma. | `string` | `null` | no |
| <a name="input_prefix"></a> [prefix](#input\_prefix) | The prefix used for naming resources | `string` | `"github-actions"` | no |
| <a name="input_queue_encryption"></a> [queue\_encryption](#input\_queue\_encryption) | Configure how data on queues managed by the modules in ecrypted at REST. Options are encryped via SSE, non encrypted and via KMSS. By default encryptes via SSE is enabled. See for more details the Terraform `aws_sqs_queue` resource https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/sqs_queue. | <pre>object({<br> kms_data_key_reuse_period_seconds = number<br> kms_master_key_id = string<br> sqs_managed_sse_enabled = bool<br> })</pre> | <pre>{<br> "kms_data_key_reuse_period_seconds": null,<br> "kms_master_key_id": null,<br> "sqs_managed_sse_enabled": true<br>}</pre> | no |
| <a name="input_redrive_build_queue"></a> [redrive\_build\_queue](#input\_redrive\_build\_queue) | Set options to attach (optional) a dead letter queue to the build queue, the queue between the webhook and the scale up lambda. You have the following options. 1. Disable by setting `enabled` to false. 2. Enable by setting `enabled` to `true`, `maxReceiveCount` to a number of max retries. | <pre>object({<br> enabled = bool<br> maxReceiveCount = number<br> })</pre> | <pre>{<br> "enabled": false,<br> "maxReceiveCount": null<br>}</pre> | no |
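The `pool_config` entry above adds a demand-based mode: a pool `size` of -1 makes the pool lambda derive the pool size from the number of queued jobs, while non-negative sizes keep the existing scheduled behaviour. A minimal TypeScript sketch of that decision, using hypothetical names (`resolveTargetPoolSize` is illustrative, not part of the module):

// Minimal sketch, not the module's implementation: resolve the pool target from the
// configured size. Non-negative sizes are taken as-is, -1 follows demand (queued jobs
// with matching labels), and any other negative value is treated as invalid (no top-up).
export function resolveTargetPoolSize(configuredSize: number, queuedMatchingJobs: number): number {
  if (configuredSize >= 0) {
    return configuredSize; // static pool, driven purely by the schedule
  }
  if (configuredSize === -1) {
    return queuedMatchingJobs; // dynamic pool, driven by demand
  }
  return 0; // invalid configuration: leave the pool untouched
}

The `Should not top up if pool size is invalid.` test further down exercises the last branch with a pool size of -2.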
201 changes: 191 additions & 10 deletions lambdas/functions/control-plane/src/pool/pool.test.ts
@@ -1,4 +1,4 @@
import { Octokit } from '@octokit/rest';

Check notice on line 1 in lambdas/functions/control-plane/src/pool/pool.test.ts (CodeScene Delta Analysis, main): ℹ Getting worse: Code Duplication. Introduced similar code in: 'Should top up if there are more queued jobs with matching labels than idle runners.', 'Should top up pools for all runner owners', 'Should top up the repository runners pool dynamically', 'Should top up the repository runners pool'. Avoid duplicated (copy-pasted) code inside the module; more duplication lowers the code health.
import { mocked } from 'jest-mock';
import moment from 'moment-timezone';
import nock from 'nock';
@@ -9,13 +9,18 @@
import { adjust } from './pool';

const mockOctokit = {
paginate: jest.fn(),
paginate: (f: (arg0: unknown) => unknown[], o: unknown) => f(o),
checks: { get: jest.fn() },
actions: {
createRegistrationTokenForOrg: jest.fn(),
listJobsForWorkflowRunAttempt: jest.fn(),
listSelfHostedRunnersForOrg: jest.fn(),
listSelfHostedRunnersForRepo: jest.fn(),
listWorkflowRunsForRepo: jest.fn(),
},
apps: {
getOrgInstallation: jest.fn(),
listReposAccessibleToInstallation: jest.fn(),
},
};
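// Note on the mocks above: `paginate` is a pass-through that simply invokes the endpoint
// mock it is handed with the given options, so each test shapes its results per endpoint
// (listWorkflowRunsForRepo, listJobsForWorkflowRunAttempt, listReposAccessibleToInstallation, ...)
// instead of stubbing one shared paginate mock.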

@@ -42,6 +47,7 @@

const ORG = 'my-org';
const MINIMUM_TIME_RUNNING = 15;
const LABELS = ['label1', 'label2'];

const ec2InstancesRegistered = [
{
@@ -79,31 +85,46 @@
os: 'linux',
status: 'online',
busy: false,
labels: [],
labels: LABELS,
},
{
id: 2,
name: 'i-2-busy',
os: 'linux',
status: 'online',
busy: true,
labels: [],
labels: LABELS,
},
{
id: 3,
name: 'i-3-offline',
os: 'linux',
status: 'offline',
busy: false,
labels: [],
labels: LABELS,
},
{
id: 3,
name: 'i-4-idle-older-than-minimum-time-running',
os: 'linux',
status: 'online',
busy: false,
labels: [],
labels: LABELS,
},
];

const githubReposAccessibleToInstallation = [
{
owner: {
login: ORG,
},
name: 'my-repo-1',
},
{
owner: {
login: ORG,
},
name: 'my-repo-2',
},
];

@@ -126,6 +147,7 @@
process.env.INSTANCE_TARGET_CAPACITY_TYPE = 'spot';
process.env.RUNNER_OWNER = ORG;
process.env.RUNNER_BOOT_TIME_IN_MINUTES = MINIMUM_TIME_RUNNING.toString();
process.env.RUNNER_LABELS = LABELS.join(',');

const mockTokenReturnValue = {
data: {
@@ -134,7 +156,15 @@
};
mockOctokit.actions.createRegistrationTokenForOrg.mockImplementation(() => mockTokenReturnValue);

mockOctokit.paginate.mockImplementation(() => githubRunnersRegistered);
mockOctokit.actions.listSelfHostedRunnersForOrg.mockImplementation(() => githubRunnersRegistered);

mockOctokit.actions.listSelfHostedRunnersForRepo.mockImplementation(() => githubRunnersRegistered);

mockOctokit.apps.listReposAccessibleToInstallation.mockImplementation(() => githubReposAccessibleToInstallation);

mockOctokit.actions.listWorkflowRunsForRepo.mockImplementation(async () => []);

mockOctokit.actions.listJobsForWorkflowRunAttempt.mockImplementation(async () => []);

mockListRunners.mockImplementation(async () => ec2InstancesRegistered);

@@ -171,7 +201,7 @@
await expect(await adjust({ poolSize: 3 })).resolves;
expect(createRunners).toHaveBeenCalledTimes(1);
expect(createRunners).toHaveBeenCalledWith(
expect.anything(),
expect.objectContaining({ runnerOwner: ORG, runnerType: 'Org' }),
expect.objectContaining({ numberOfRunners: 1 }),
expect.anything(),
);
@@ -206,7 +236,7 @@
// 2 idle + 1 booting = 3, top up with 2 to match a pool of 5
await expect(await adjust({ poolSize: 5 })).resolves;
expect(createRunners).toHaveBeenCalledWith(
expect.anything(),
expect.objectContaining({ runnerOwner: ORG, runnerType: 'Org' }),
expect.objectContaining({ numberOfRunners: 2 }),
expect.anything(),
);
@@ -236,6 +266,12 @@
await expect(await adjust({ poolSize: 2 })).resolves;
expect(createRunners).not.toHaveBeenCalled();
});

it('Should not top up if pool size is invalid.', async () => {
process.env.RUNNER_LABELS = undefined;
await expect(await adjust({ poolSize: -2 })).resolves;
expect(createRunners).not.toHaveBeenCalled();
});
});

describe('With GHES', () => {
@@ -247,7 +283,7 @@
await expect(await adjust({ poolSize: 5 })).resolves;
// 2 idle, top up with 3 to match a pool of 5
expect(createRunners).toHaveBeenCalledWith(
expect.anything(),
expect.objectContaining({ runnerOwner: ORG, runnerType: 'Org' }),
expect.objectContaining({ numberOfRunners: 3 }),
expect.anything(),
);
@@ -261,7 +297,7 @@

it('Should top up with fewer runners when there are idle prefixed runners', async () => {
// Add prefixed runners to github
mockOctokit.paginate.mockImplementation(async () => [
mockOctokit.actions.listSelfHostedRunnersForOrg.mockImplementation(async () => [
...githubRunnersRegistered,
{
id: 5,
@@ -301,10 +337,155 @@
await expect(await adjust({ poolSize: 5 })).resolves;
// 2 idle, 2 prefixed idle top up with 1 to match a pool of 5
expect(createRunners).toHaveBeenCalledWith(
expect.objectContaining({ runnerOwner: ORG, runnerType: 'Org' }),
expect.objectContaining({ numberOfRunners: 1 }),
expect.anything(),
);
});
});

describe('With Negative Pool Size', () => {
// effective pool size is 2 (1 queued job with matching labels x 1 workflow x 2 accessible repositories)
it('Should not top up if there are fewer queued jobs than idle runners.', async () => {
mockOctokit.actions.listWorkflowRunsForRepo.mockImplementation(async ({ owner, repo }) => [
{
repository: {
owner: { login: owner },
name: repo,
},
id: 1,
attempt_number: 1,
},
]);
mockOctokit.actions.listJobsForWorkflowRunAttempt.mockImplementation(async () => [
{
status: 'completed',
labels: LABELS,
},
{
status: 'queued',
labels: LABELS,
},
{
status: 'queued',
labels: [...LABELS, 'label3'],
},
{
status: 'queued',
labels: [],
},
]);
await expect(await adjust({ poolSize: -1 })).resolves;
expect(createRunners).not.toHaveBeenCalled();
});

// effective pool size is 8 (2 queued jobs with matching labels x 2 workflows x 2 accessible repositories)
it('Should top up if there are more queued jobs with matching labels than idle runners.', async () => {
mockOctokit.actions.listWorkflowRunsForRepo.mockImplementation(async ({ owner, repo }) => [
{
repository: {
owner: { login: owner },
name: repo,
},
id: 1,
attempt_number: 1,
},
{
repository: {
owner: { login: owner },
name: repo,
},
id: 2,
attempt_number: 1,
},
]);
mockOctokit.actions.listJobsForWorkflowRunAttempt.mockImplementation(async () => [
{
status: 'queued',
labels: LABELS,
},
{
status: 'queued',
labels: LABELS,
},
]);
await expect(await adjust({ poolSize: -1 })).resolves;
expect(createRunners).toHaveBeenCalledTimes(1);
expect(createRunners).toHaveBeenCalledWith(
expect.objectContaining({ runnerOwner: ORG, runnerType: 'Org' }),
expect.objectContaining({ numberOfRunners: 6 }),
expect.anything(),
);
});
});
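// A worked version of the arithmetic in the comments above (a sketch using the numbers
// from the mocks, not the module's actual counting code):
const queuedMatchingJobsPerRun = 2; // jobs with status 'queued' and labels matching RUNNER_LABELS
const workflowRunsPerRepo = 2; // listWorkflowRunsForRepo returns two runs per repository
const accessibleRepos = 2; // my-repo-1 and my-repo-2
const effectivePoolSize = queuedMatchingJobsPerRun * workflowRunsPerRepo * accessibleRepos; // 8
const idleRunnersAlreadyRegistered = 2; // idle, online runners from the mocked GitHub state
const expectedTopUp = Math.max(0, effectivePoolSize - idleRunnersAlreadyRegistered); // 6 runners, as asserted above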

describe('With Runner Type Repo', () => {
it('Should top up the repository runners pool', async () => {
const runnerOwner = `${ORG}/my-repo-1`;
process.env.RUNNER_OWNER = runnerOwner;
await expect(await adjust({ poolSize: 3 })).resolves;
expect(createRunners).toHaveBeenCalledTimes(1);
expect(createRunners).toHaveBeenCalledWith(
expect.objectContaining({ runnerOwner, runnerType: 'Repo' }),
expect.objectContaining({ numberOfRunners: 1 }),
expect.anything(),
);
});

it('Should top up the repository runners pool dynamically', async () => {
const runnerOwner = `${ORG}/my-repo-1`;
process.env.RUNNER_OWNER = runnerOwner;
mockOctokit.actions.listWorkflowRunsForRepo.mockImplementation(async ({ owner, repo }) => [
{
repository: {
owner: { login: owner },
name: repo,
},
id: 1,
attempt_number: 1,
},
{
repository: {
owner: { login: owner },
name: repo,
},
id: 2,
attempt_number: 1,
},
]);
mockOctokit.actions.listJobsForWorkflowRunAttempt.mockImplementation(async () => [
{
status: 'queued',
labels: LABELS,
},
{
status: 'queued',
labels: LABELS,
},
]);
await expect(await adjust({ poolSize: -1 })).resolves;
expect(createRunners).toHaveBeenCalledTimes(1);
expect(createRunners).toHaveBeenCalledWith(
expect.objectContaining({ runnerOwner, runnerType: 'Repo' }),
expect.objectContaining({ numberOfRunners: 2 }),
expect.anything(),
);
});
});

describe('With Multiple Runner Owners', () => {
it('Should top up pools for all runner owners', async () => {
const runnerOwners = [`${ORG}/my-repo-1`, `${ORG}/my-repo-2`];
process.env.RUNNER_OWNER = runnerOwners.join(',');
await expect(await adjust({ poolSize: 3 })).resolves;
expect(createRunners).toHaveBeenCalledTimes(2);
for (const runnerOwner of runnerOwners) {
expect(createRunners).toHaveBeenCalledWith(
expect.objectContaining({ runnerOwner, runnerType: 'Repo' }),
expect.objectContaining({ numberOfRunners: 1 }),
expect.anything(),
);
}
});
});
});
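The repository and multi-owner tests rely on `RUNNER_OWNER` accepting a comma-separated list, where an entry containing a `/` targets a repository and a bare name targets an organization. A small sketch of that parsing, assuming a hypothetical helper (`parseRunnerOwners` is illustrative, not the module's code):

interface PoolRunnerOwner {
  runnerOwner: string;
  runnerType: 'Org' | 'Repo';
}

// Split the comma-separated RUNNER_OWNER value and classify each entry:
// 'my-org' becomes an Org owner, 'my-org/my-repo-1' a Repo owner.
export function parseRunnerOwners(runnerOwnerEnv: string): PoolRunnerOwner[] {
  return runnerOwnerEnv
    .split(',')
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry): PoolRunnerOwner => ({
      runnerOwner: entry,
      runnerType: entry.includes('/') ? 'Repo' : 'Org',
    }));
}

With `RUNNER_OWNER=my-org/my-repo-1,my-org/my-repo-2` this yields two Repo owners, matching the two `createRunners` calls asserted in the multi-owner test.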