This repository was archived by the owner on Jan 16, 2025. It is now read-only.

Commit 2f323d6

npalm, ScottGuymer, and mcaulifn authored
feat: add option ephemeral runners (#1374)
* add option ephemeral runners
* fix tests
* Add retry mechanisme for scaling errors
* Add retry mechanisme for scaling errors
  Add retry mechanisme for scaling errors
  Add retry mechanisme for scaling errors
  Add retry mechanisme for scaling errors
* Add tests for lamda handler
* Add basic test for ephemeral case
* Add basic test for scale down in lambda wrapper
* Ensure check_runs are ignored for ephemeral runners
* limit termination to only the instance itself
* fix: add logging context to runner lambda (#1399)
* fix(logging): Add context to scale logs
  Signed-off-by: Nathaniel McAuliffe <[email protected]>
* Remove testing
  Signed-off-by: Nathaniel McAuliffe <[email protected]>
* Remove unnecessary import
  Signed-off-by: Nathaniel McAuliffe <[email protected]>
* Moving log fields to end, adjusting format
* feat: Add hooks for prebuilt images (AMI), including amazon linux packer example (#1444)
* Initial creation of runner image
* Refactored startup script and added it to the per-boot folder
* Make the runner location a variable
  So we can pass the runner version in at packer build time if we want to update the runner version.
* Retrieve external config setting via tags
  Retrieve the required config via the instance tags so we dont have to pass in and set environment on the instance in an awkward way.
* Enable tag based config
  Give the instance the permission to query its own tags and set the correct tags on the instance.
* Add a CI job
* Fix the CI build
* Fix the formatting
* Retain user_data provisioning and remove duplication
  refactored to make sure user_data continues to work with minimal breaking changes. Use a single set of scripts shared between image and user_data provisioning.
* Fix interpolation issues in template file
* fix build
* Fix formatting
* minor tweaks and fixes
* Fixes from testing
* Enable docker on boot
* Add in output of start time for the runner
* Scoop up the runner log
* Add a powershell build script for windows users
* Fix formatting
* Use SSM parameters for configuration
  Its best practice to use SSM parameters for configuration of the runners. In adding this i have also added parameter path based config so its easy to extend in the future.
* Make the SSM policy more specific
* Update .github/workflows/packer-build.yml
  Co-authored-by: Niek Palm <[email protected]>
* Added condition to the describe tags policy
* Dont use templatefile on the tags policy
  Because of the use of ${} in the policy terraform is trying to replace it.
* Added an option to turn off userdata scripting
* Added/updated documentation
* Revert policy as it has no effect on the permissions
* Add reference to prebuilt images in the main readme
* Add an example of deploying with prebuilt images
* Update readme
* Use current user as ami_owner
* Update example to 5 secs
* Updated ami name to include the arch
* Fixed log file variable
* Added explicit info about required settings to the readme
* Change userdata_enabled to enabled_userdata
  Keep within existing naming convention
  Co-authored-by: Niek Palm <[email protected]>
* add option ephemeral runners
* Add retry mechanisme for scaling errors
  Add retry mechanisme for scaling errors
  Add retry mechanisme for scaling errors
  Add retry mechanisme for scaling errors
* add dead letter queue, and refactor
* cleanup
* cleanup
* sync develop
* review fix
  Co-authored-by: Scott Guymer <[email protected]>
* review fix
  Co-authored-by: Scott Guymer <[email protected]>
* review vfix
  Co-authored-by: Scott Guymer <[email protected]>
* review vfix
  Co-authored-by: Scott Guymer <[email protected]>
* fix review
* process review comments
* process review comments
* review comment
* process review comments
* Update examples/ephemeral/README.md
  Co-authored-by: Nathaniel McAuliffe <[email protected]>
* Process review comments
* Add docs
* review comments
* update docs

Co-authored-by: Scott Guymer <[email protected]>
Co-authored-by: Nathaniel McAuliffe <[email protected]>
1 parent 7cb73c8 commit 2f323d6

37 files changed, +803 -133 lines changed

Diff for: .ci/build-yarn.sh

+11
@@ -0,0 +1,11 @@
+#!/usr/bin/env bash
+
+# Build all the lambda's, output on the default place (inside the lambda module)
+
+lambdaSrcDirs=("modules/runner-binaries-syncer/lambdas/runner-binaries-syncer" "modules/runners/lambdas/runners" "modules/webhook/lambdas/webhook")
+repoRoot=$(dirname $(dirname $(realpath ${BASH_SOURCE[0]})))
+
+for lambdaDir in ${lambdaSrcDirs[@]}; do
+  cd "$repoRoot/${lambdaDir}"
+  yarn && yarn run dist
+done
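
The loop in this script leaves its array and path expansions unquoted, so a directory name containing whitespace would be split into several words. As a minimal sketch (not part of the commit), the same loop could be wrapped in a function with quoted expansions; `build_lambdas` is a hypothetical name introduced here for illustration, and the sketch assumes `yarn` is on the `PATH`.

```bash
#!/usr/bin/env bash
# Sketch only: build_lambdas is a hypothetical helper, not part of the commit.
# Usage: build_lambdas <repo root> <lambda dir>...
build_lambdas() {
  local repo_root="$1"
  shift
  local lambda_dir
  for lambda_dir in "$@"; do
    # Build inside a subshell so the caller's working directory is untouched,
    # and stop at the first directory whose build fails.
    ( cd "$repo_root/$lambda_dir" && yarn && yarn run dist ) || return 1
  done
}
```

With the variables from the script above, it could be called as `build_lambdas "$repoRoot" "${lambdaSrcDirs[@]}"`.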

Diff for: README.md

+41-6
Large diffs are not rendered by default.

Diff for: examples/default/main.tf

+9-3
@@ -30,11 +30,13 @@ module "runners" {
     webhook_secret = random_id.random.hex
   }
 
+  # Grab zip files via lambda_download
   webhook_lambda_zip                = "lambdas-download/webhook.zip"
   runner_binaries_syncer_lambda_zip = "lambdas-download/runner-binaries-syncer.zip"
   runners_lambda_zip                = "lambdas-download/runners.zip"
-  enable_organization_runners       = false
-  runner_extra_labels               = "default,example"
+
+  enable_organization_runners = false
+  runner_extra_labels         = "default,example"
 
   # enable access to the runners via SSM
   enable_ssm_on_runners = true
@@ -61,7 +63,11 @@ module "runners" {
   instance_types = ["m5.large", "c5.large"]
 
   # override delay of events in seconds
-  delay_webhook_event = 5
+  delay_webhook_event   = 5
+  runners_maximum_count = 1
+
+  # set up a fifo queue to preserve order
+  fifo_build_queue = true
 
   # override scaling down
   scale_down_schedule_expression = "cron(* * * * ? *)"

Diff for: examples/ephemeral/.terraform.lock.hcl

+57
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Diff for: examples/ephemeral/README.md

+30
@@ -0,0 +1,30 @@
+# Action runners deployment ephemeral example
+
+This example is based on the default setup, but shows how runners can be used with the ephemeral flag enabled. Once enabled, each ephemeral runner is used for one job only, so every job requires a fresh instance. This feature should be used in combination with the `workflow_job` event. See GitHub webhook endpoint configuration (link needed here). It is also suggested to use a pre-built AMI to minimize runner launch times.
+## Usages
+
+Steps for the full setup, such as creating a GitHub app, can be found in the root module's [README](../../README.md). First download the Lambda releases from GitHub. Alternatively, you can build the lambdas locally with Node or Docker; there is a simple build script in `<root>/.ci/build.sh`. In the `main.tf` you can simply remove the location of the lambda zip files; the default location will work in this case.
+
+> Ensure you have set the version in `lambdas-download/main.tf` for running the example. The version needs to be set to a GitHub release version, see https://github.com/philips-labs/terraform-aws-github-runner/releases
+
+```bash
+cd lambdas-download
+terraform init
+terraform apply
+cd ..
+```
+
+Before running Terraform, ensure the GitHub app is configured. See the [configuration details](../../README.md#usages) for more information.
+
+```bash
+terraform init
+terraform apply
+```
+
+You can retrieve the webhook secret by running:
+
+```bash
+terraform output -raw webhook_secret
+```
+
+Be aware that some shells print an end-of-line marker `%` after the output.
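
The `%` appears because `terraform output -raw` writes the value without a trailing newline, and some shells (zsh, for example) flag the missing newline with that marker. A minimal sketch of one way to avoid it, assuming `terraform` is on the `PATH`; `print_output` is a hypothetical helper name, not something the example provides.

```bash
#!/usr/bin/env bash
# Sketch only: print_output is a hypothetical helper, not part of the example.
print_output() {
  local name="$1" value
  # -raw emits the bare string with no trailing newline
  value="$(terraform output -raw "$name")" || return 1
  # Add the newline ourselves so the next prompt starts on a fresh line
  printf '%s\n' "$value"
}
```

Calling `print_output webhook_secret` then prints the secret followed by a newline; capturing it with `secret="$(terraform output -raw webhook_secret)"` works as well when the value is only needed in a variable.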

Diff for: examples/ephemeral/lambdas-download/main.tf

+25
@@ -0,0 +1,25 @@
+locals {
+  version = "<REPLACE_BY_GITHUB_RELEASE_VERSION>"
+}
+
+module "lambdas" {
+  source = "../../../modules/download-lambda"
+  lambdas = [
+    {
+      name = "webhook"
+      tag  = local.version
+    },
+    {
+      name = "runners"
+      tag  = local.version
+    },
+    {
+      name = "runner-binaries-syncer"
+      tag  = local.version
+    }
+  ]
+}
+
+output "files" {
+  value = module.lambdas.files
+}

Diff for: examples/ephemeral/main.tf

+71
@@ -0,0 +1,71 @@
+locals {
+  environment = "ephemeraal"
+  aws_region  = "eu-west-1"
+}
+
+resource "random_id" "random" {
+  byte_length = 20
+}
+
+data "aws_caller_identity" "current" {}
+
+module "runners" {
+  source                          = "../../"
+  create_service_linked_role_spot = true
+  aws_region                      = local.aws_region
+  vpc_id                          = module.vpc.vpc_id
+  subnet_ids                      = module.vpc.private_subnets
+
+  environment = local.environment
+  tags = {
+    Project = "ProjectX"
+  }
+
+  github_app = {
+    key_base64     = var.github_app_key_base64
+    id             = var.github_app_id
+    webhook_secret = random_id.random.hex
+  }
+
+  # Grab the lambda packages from local directory. Must run /.ci/build.sh first
+  webhook_lambda_zip                = "../../lambda_output/webhook.zip"
+  runner_binaries_syncer_lambda_zip = "../../lambda_output/runner-binaries-syncer.zip"
+  runners_lambda_zip                = "../../lambda_output/runners.zip"
+
+  enable_organization_runners = true
+  runner_extra_labels         = "default,example"
+
+  # enable access to the runners via SSM
+  enable_ssm_on_runners = true
+
+  # Let the module manage the service linked role
+  # create_service_linked_role_spot = true
+
+  instance_types = ["m5.large", "c5.large"]
+
+  # override delay of events in seconds
+  delay_webhook_event = 0
+
+  # Ensure you do not set the number too low, each build requires a new instance
+  runners_maximum_count = 20
+
+  # override scaling down
+  scale_down_schedule_expression = "cron(* * * * ? *)"
+
+  enable_ephemeral_runners = true
+
+  # configure your pre-built AMI
+  # enabled_userdata = false
+  # ami_filter = { name = ["github-runner-amzn2-x86_64-2021*"] }
+  # ami_owners = [data.aws_caller_identity.current.account_id]
+
+  # Enable logging
+  # log_level = "debug"
+
+  # Set up a dead letter queue; by default the scale up lambda will keep retrying to process an event in case of a scaling error.
+  # redrive_build_queue = {
+  #   enabled = true
+  #   maxReceiveCount = 50 # 50 retries every 30 seconds => 25 minutes
+  #   deadLetterTargetArn = null
+  # }
+}
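
The dead letter queue comment above estimates how long an event keeps being retried before it is moved aside: 50 receives at 30-second intervals. The arithmetic can be sketched as below; `retry_window_minutes` is a hypothetical helper name, and the 30-second interval is taken from the comment rather than from any setting shown in this example.

```bash
#!/usr/bin/env bash
# Sketch only: retry_window_minutes is a hypothetical helper for the arithmetic
# in the dead letter queue comment; it is not part of the module.
retry_window_minutes() {
  local max_receive_count="$1" retry_interval_seconds="$2"
  # Total seconds spent retrying, expressed in whole minutes (integer division)
  echo $(( max_receive_count * retry_interval_seconds / 60 ))
}

retry_window_minutes 50 30   # prints 25
```

Lowering `maxReceiveCount` shortens the window proportionally, for example 10 receives at the same interval give 5 minutes.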

Diff for: examples/ephemeral/outputs.tf

+15
@@ -0,0 +1,15 @@
+output "runners" {
+  value = {
+    lambda_syncer_name = module.runners.binaries_syncer.lambda.function_name
+  }
+}
+
+output "webhook_endpoint" {
+  value = module.runners.webhook.endpoint
+}
+
+output "webhook_secret" {
+  sensitive = true
+  value     = random_id.random.hex
+}
+

Diff for: examples/ephemeral/providers.tf

+3
@@ -0,0 +1,3 @@
+provider "aws" {
+  region = local.aws_region
+}

Diff for: examples/ephemeral/variables.tf

+5
@@ -0,0 +1,5 @@
+
+variable "github_app_key_base64" {}
+
+variable "github_app_id" {}
+
5+

Diff for: examples/ephemeral/versions.tf

+15
@@ -0,0 +1,15 @@
+terraform {
+  required_providers {
+    aws = {
+      source  = "hashicorp/aws"
+      version = ">= 3.27"
+    }
+    local = {
+      source = "hashicorp/local"
+    }
+    random = {
+      source = "hashicorp/random"
+    }
+  }
+  required_version = ">= 0.14"
+}

Diff for: examples/ephemeral/vpc.tf

+7
@@ -0,0 +1,7 @@
+module "vpc" {
+  source = "git::https://github.com/philips-software/terraform-aws-vpc.git?ref=2.2.0"
+
+  environment                = local.environment
+  aws_region                 = local.aws_region
+  create_private_hosted_zone = false
+}

Diff for: main.tf

+17-4
@@ -19,13 +19,24 @@ resource "random_string" "random" {
 }
 
 resource "aws_sqs_queue" "queued_builds" {
-  name                        = "${var.environment}-queued-builds.fifo"
+  name                        = "${var.environment}-queued-builds${var.fifo_build_queue ? ".fifo" : ""}"
   delay_seconds               = var.delay_webhook_event
   visibility_timeout_seconds  = var.runners_scale_up_lambda_timeout
   message_retention_seconds   = var.job_queue_retention_in_seconds
-  fifo_queue                  = true
-  receive_wait_time_seconds   = 10
-  content_based_deduplication = true
+  fifo_queue                  = var.fifo_build_queue
+  receive_wait_time_seconds   = 0
+  content_based_deduplication = var.fifo_build_queue
+  redrive_policy = var.redrive_build_queue.enabled ? jsonencode({
+    deadLetterTargetArn = aws_sqs_queue.queued_builds_dlq[0].arn,
+    maxReceiveCount     = var.redrive_build_queue.maxReceiveCount
+  }) : null
+
+  tags = var.tags
+}
+
+resource "aws_sqs_queue" "queued_builds_dlq" {
+  count = var.redrive_build_queue.enabled ? 1 : 0
+  name  = "${var.environment}-queued-builds_dead_letter"
 
   tags = var.tags
 }
@@ -48,6 +59,7 @@ module "webhook" {
   kms_key_arn = var.kms_key_arn
 
   sqs_build_queue               = aws_sqs_queue.queued_builds
+  sqs_build_queue_fifo          = var.fifo_build_queue
   github_app_webhook_secret_arn = module.ssm.parameters.github_app_webhook_secret.arn
 
   lambda_s3_bucket = var.lambda_s3_bucket
@@ -92,6 +104,7 @@ module "runners" {
   sqs_build_queue                 = aws_sqs_queue.queued_builds
   github_app_parameters           = local.github_app_parameters
   enable_organization_runners     = var.enable_organization_runners
+  enable_ephemeral_runners        = var.enable_ephemeral_runners
   scale_down_schedule_expression  = var.scale_down_schedule_expression
   minimum_running_time_in_minutes = var.minimum_running_time_in_minutes
   runner_boot_time_in_minutes     = var.runner_boot_time_in_minutes
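
The queue name in this diff now depends on `var.fifo_build_queue`, because SQS requires FIFO queue names to end in `.fifo` and standard queue names must not carry that suffix. A minimal shell sketch of how the conditional resolves; `queue_name` is a hypothetical helper used only for illustration, and the environment value `default` is an assumed example.

```bash
#!/usr/bin/env bash
# Sketch only: queue_name is a hypothetical helper mirroring the Terraform
# conditional "${var.environment}-queued-builds${var.fifo_build_queue ? ".fifo" : ""}".
queue_name() {
  local environment="$1" fifo="$2" suffix=""
  if [ "$fifo" = "true" ]; then
    suffix=".fifo"
  fi
  printf '%s-queued-builds%s\n' "$environment" "$suffix"
}

queue_name default true    # prints default-queued-builds.fifo
queue_name default false   # prints default-queued-builds
```

The same flag also drives `fifo_queue` and `content_based_deduplication` in the resource, so the name and the queue type change together.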
