diff --git a/README.md b/README.md index 36efe008c6..8432bf9fe7 100644 --- a/README.md +++ b/README.md @@ -147,7 +147,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack. | [enable\_jit\_config](#input\_enable\_jit\_config) | Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES check first if the JIT config API is avaialbe. In case you upgradeing from 3.x to 4.x you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI. | `bool` | `null` | no | | [enable\_job\_queued\_check](#input\_enable\_job\_queued\_check) | Only scale if the job event received by the scale up lambda is in the queued state. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior. | `bool` | `null` | no | | [enable\_managed\_runner\_security\_group](#input\_enable\_managed\_runner\_security\_group) | Enables creation of the default managed security group. Unmanaged security groups can be specified via `runner_additional_security_group_ids`. | `bool` | `true` | no | -| [enable\_metrics\_control\_plane](#input\_enable\_metrics\_control\_plane) | (Experimental) Enable or disable the metrics for the module. Feature can change or renamed without a major release. | `bool` | `false` | no | +| [enable\_metrics\_control\_plane](#input\_enable\_metrics\_control\_plane) | (Experimental) Enable or disable the metrics for the module. Feature can change or renamed without a major release. | `bool` | `null` | no | | [enable\_organization\_runners](#input\_enable\_organization\_runners) | Register runners to organization, instead of repo level | `bool` | `false` | no | | [enable\_runner\_binaries\_syncer](#input\_enable\_runner\_binaries\_syncer) | Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-build AMI. 
| `bool` | `true` | no | | [enable\_runner\_detailed\_monitoring](#input\_enable\_runner\_detailed\_monitoring) | Should detailed monitoring be enabled for the runner. Set this to true if you want to use detailed monitoring. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html for details. | `bool` | `false` | no | @@ -165,7 +165,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack. | [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no | | [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no | | [instance\_target\_capacity\_type](#input\_instance\_target\_capacity\_type) | Default lifecycle used for runner instances, can be either `spot` or `on-demand`. | `string` | `"spot"` | no | -| [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.

`enable`: Enable or disable the spot termination watcher.
`enable_metrics`: Enable or disable the metrics for the spot termination watcher.
`memory_size`: Memory size limit in MB of the lambda.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`timeout`: Time out of the lambda in seconds.
`zip`: File location of the lambda zip file. |
object({
enable = optional(bool, false)
enable_metric = optional(object({
spot_warning = optional(bool, false)
}))
memory_size = optional(number, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
timeout = optional(number, null)
zip = optional(string, null)
})
| `{}` | no | +| [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.

`enable`: Enable or disable the spot termination watcher.
`memory_size`: Memory size limit in MB of the lambda.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`timeout`: Time out of the lambda in seconds.
`zip`: File location of the lambda zip file. |
object({
enable = optional(bool, false)
enable_metric = optional(string, null) # deprecated
memory_size = optional(number, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
timeout = optional(number, null)
zip = optional(string, null)
})
| `{}` | no | | [instance\_types](#input\_instance\_types) | List of instance types for the action runner. Defaults are based on runner\_os (al2023 for linux and Windows Server Core for win). | `list(string)` |
[
"m5.large",
"c5.large"
]
| no | | [job\_queue\_retention\_in\_seconds](#input\_job\_queue\_retention\_in\_seconds) | The number of seconds the job is held in the queue before it is purged. | `number` | `86400` | no | | [job\_retry](#input\_job\_retry) | Experimental! Can be removed / changed without triggering a major release. Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the instances a message will be published to a job retry queue. The job retry check lambda is checking after a delay if the job is queued. If not the message will be published again on the scale-up (build queue). Using this feature can impact the rate limit of the GitHub app.

`enable`: Enable or disable the job retry feature.
`delay_in_seconds`: The delay in seconds before the job retry check lambda will check the job status.
`delay_backoff`: The backoff factor for the delay.
`lambda_memory_size`: Memory size limit in MB for the job retry check lambda.
`lambda_timeout`: Time out of the job retry check lambda in seconds.
`max_attempts`: The maximum number of attempts to retry the job. |
object({
enable = optional(bool, false)
delay_in_seconds = optional(number, 300)
delay_backoff = optional(number, 2)
lambda_memory_size = optional(number, 256)
lambda_timeout = optional(number, 30)
max_attempts = optional(number, 1)
})
| `{}` | no | @@ -183,7 +183,8 @@ Talk to the forestkeepers in the `runners-channel` on Slack. | [logging\_kms\_key\_id](#input\_logging\_kms\_key\_id) | Specifies the kms key id to encrypt the logs with. | `string` | `null` | no | | [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no | | [matcher\_config\_parameter\_store\_tier](#input\_matcher\_config\_parameter\_store\_tier) | The tier of the parameter store for the matcher configuration. Valid values are `Standard`, and `Advanced`. | `string` | `"Standard"` | no | -| [metrics\_namespace](#input\_metrics\_namespace) | The namespace for the metrics created by the module. Merics will only be created if explicit enabled. | `string` | `"GitHub Runners"` | no | +| [metrics](#input\_metrics) | Configuration for metrics created by the module, by default disabled to avoid additional costs. When metrics are enable all metrics are created unless explicit configured otherwise. |
object({
enable = optional(bool, false)
namespace = optional(string, "GitHub Runners")
metric = optional(object({
enable_github_app_rate_limit = optional(bool, true)
enable_job_retry = optional(bool, true)
enable_spot_termination_warning = optional(bool, true)
}), {})
})
| `{}` | no | +| [metrics\_namespace](#input\_metrics\_namespace) | The namespace for the metrics created by the module. Merics will only be created if explicit enabled. | `string` | `null` | no | | [minimum\_running\_time\_in\_minutes](#input\_minimum\_running\_time\_in\_minutes) | The time an ec2 action runner should be running at minimum before terminated, if not busy. | `number` | `null` | no | | [pool\_config](#input\_pool\_config) | The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for weekdays to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use `schedule_expression_timezone` to override the schedule time zone (defaults to UTC). |
list(object({
schedule_expression = string
schedule_expression_timezone = optional(string)
size = number
}))
| `[]` | no | | [pool\_lambda\_memory\_size](#input\_pool\_lambda\_memory\_size) | Memory size limit for scale-up lambda. | `number` | `512` | no | diff --git a/docs/configuration.md b/docs/configuration.md index 53a1dd83ee..0eae2195ec 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -191,6 +191,15 @@ This feature has been disabled by default. The watcher will act on all spot termination notificatins and log all onses relevant to the runner module. Therefor we suggest to only deploy the watcher once. You can either deploy the watcher by enabling in one of your deployments or deploy the watcher as a stand alone module. +## Metrics + +The module supports metrics (experimental feature) to monitor the system. The metrics are disabled by default. To enable the metrics set `metrics.enable = true`. If set to true, all module managed metrics are used, you can configure the one by one via the `metrics` object. The metrics are created in the namespace `GitHub Runners`. + +### Supported metrics + +- **GitHubAppRateLimitRemaining**: Remaining rate limit for the GitHub App. +- **JobRetry**: Number of job retries, only relevant when job retry is enabled. +- **SpotInterruptionWarning**: Number of spot interruption warnings received by the termination watcher, only relevant when the termination watcher is enabled. ## Debugging diff --git a/examples/default/main.tf b/examples/default/main.tf index cb6711e11a..a775872137 100644 --- a/examples/default/main.tf +++ b/examples/default/main.tf @@ -114,21 +114,25 @@ module "runners" { instance_termination_watcher = { enable = true - enable_metric = { - spot_warning = true - } } - # enable job_retry feature. Be careful with this feature, it can lead to API rate limits. + # enable metric creation (experimental) + # metrics = { + # enable = true + # metric = { + # enable_spot_termination_warning = true + # enable_job_retry = false + # enable_github_app_rate_limit = true + # } + # } + + # enable job_retry feature. 
Be careful with this feature, it can lead to you hitting API rate limits. # job_retry = { # enable = true # max_attempts = 1 # delay_in_seconds = 180 # } - # enable metric creation by the control plane (experimental) - # enable_metrics_control_plane = true - # enable CMK instead of aws managed key for encryptions # kms_key_arn = aws_kms_key.github.arn } diff --git a/examples/multi-runner/main.tf b/examples/multi-runner/main.tf index 0563831d26..2c53ff47cb 100644 --- a/examples/multi-runner/main.tf +++ b/examples/multi-runner/main.tf @@ -103,8 +103,15 @@ module "runners" { # Enable to track the spot instance termination warning # instance_termination_watcher = { # enable = true - # enable_metric = { - # spot_warning = true + # } + + # Enable metrics + # metrics = { + # enable = true + # metric = { + # enable_github_app_rate_limit = true + # enable_job_retry = false + # enable_spot_termination_warning = true # } # } } diff --git a/examples/termination-watcher/main.tf b/examples/termination-watcher/main.tf index 3580b298ed..7877d04bc1 100644 --- a/examples/termination-watcher/main.tf +++ b/examples/termination-watcher/main.tf @@ -2,12 +2,15 @@ module "spot_termination_watchter" { source = "../../modules/termination-watcher" config = { - enable_metric = { - spot_warning = true + metrics = { + enable = true + metric = { + enable_spot_termination_warning = true + } } prefix = "global" tag_filters = { "ghr:Application" = "github-action-runner" } } -} \ No newline at end of file +} diff --git a/lambdas/functions/control-plane/jest.config.ts b/lambdas/functions/control-plane/jest.config.ts index b768636f97..97935de994 100644 --- a/lambdas/functions/control-plane/jest.config.ts +++ b/lambdas/functions/control-plane/jest.config.ts @@ -6,10 +6,10 @@ const config: Config = { ...defaultConfig, coverageThreshold: { global: { - statements: 97.78, - branches: 96.61, - functions: 95.84, - lines: 97.71, + statements: 97.86, + branches: 96.68, + functions: 95.95, + lines: 97.8, }, }, 
}; diff --git a/lambdas/functions/control-plane/package.json b/lambdas/functions/control-plane/package.json index 7af8cbbae8..26dab45fd4 100644 --- a/lambdas/functions/control-plane/package.json +++ b/lambdas/functions/control-plane/package.json @@ -38,6 +38,7 @@ "ts-node-dev": "^2.0.0" }, "dependencies": { + "@aws-lambda-powertools/parameters": "^2.7.0", "@aws-sdk/client-ec2": "^3.637.0", "@aws-sdk/client-sqs": "^3.637.0", "@aws-sdk/types": "^3.609.0", diff --git a/lambdas/functions/control-plane/src/gh-auth/gh-auth.test.ts b/lambdas/functions/control-plane/src/github/auth.test.ts similarity index 98% rename from lambdas/functions/control-plane/src/gh-auth/gh-auth.test.ts rename to lambdas/functions/control-plane/src/github/auth.test.ts index 8930f368d6..f511d08958 100644 --- a/lambdas/functions/control-plane/src/gh-auth/gh-auth.test.ts +++ b/lambdas/functions/control-plane/src/github/auth.test.ts @@ -7,7 +7,7 @@ import { mocked } from 'jest-mock'; import { MockProxy, mock } from 'jest-mock-extended'; import nock from 'nock'; -import { createGithubAppAuth, createOctokitClient } from './gh-auth'; +import { createGithubAppAuth, createOctokitClient } from './auth'; jest.mock('@terraform-aws-github-runner/aws-ssm-util'); jest.mock('@octokit/auth-app'); diff --git a/lambdas/functions/control-plane/src/gh-auth/gh-auth.ts b/lambdas/functions/control-plane/src/github/auth.ts similarity index 100% rename from lambdas/functions/control-plane/src/gh-auth/gh-auth.ts rename to lambdas/functions/control-plane/src/github/auth.ts diff --git a/lambdas/functions/control-plane/src/gh-auth/gh-octokit.test.ts b/lambdas/functions/control-plane/src/github/octokit.test.ts similarity index 95% rename from lambdas/functions/control-plane/src/gh-auth/gh-octokit.test.ts rename to lambdas/functions/control-plane/src/github/octokit.test.ts index 161b787dbb..3f22c3c4bd 100644 --- a/lambdas/functions/control-plane/src/gh-auth/gh-octokit.test.ts +++ 
b/lambdas/functions/control-plane/src/github/octokit.test.ts @@ -1,6 +1,6 @@ import { Octokit } from '@octokit/rest'; import { ActionRequestMessage } from '../scale-runners/scale-up'; -import { getOctokit } from './gh-octokit'; +import { getOctokit } from './octokit'; const mockOctokit = { apps: { @@ -9,7 +9,7 @@ const mockOctokit = { }, }; -jest.mock('../gh-auth/gh-auth', () => ({ +jest.mock('../github/auth', () => ({ createGithubInstallationAuth: jest.fn().mockImplementation(async (installationId) => { return { token: 'token', type: 'installation', installationId: installationId }; }), @@ -21,7 +21,7 @@ jest.mock('@octokit/rest', () => ({ Octokit: jest.fn().mockImplementation(() => mockOctokit), })); -jest.mock('../gh-auth/gh-auth'); +jest.mock('../github/auth'); describe('Test getOctokit', () => { const data = [ diff --git a/lambdas/functions/control-plane/src/gh-auth/gh-octokit.ts b/lambdas/functions/control-plane/src/github/octokit.ts similarity index 97% rename from lambdas/functions/control-plane/src/gh-auth/gh-octokit.ts rename to lambdas/functions/control-plane/src/github/octokit.ts index 4fed2e17fe..a2cce5f55d 100644 --- a/lambdas/functions/control-plane/src/gh-auth/gh-octokit.ts +++ b/lambdas/functions/control-plane/src/github/octokit.ts @@ -1,6 +1,6 @@ import { Octokit } from '@octokit/rest'; import { ActionRequestMessage } from '../scale-runners/scale-up'; -import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from './gh-auth'; +import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from './auth'; export async function getInstallationId( ghesApiUrl: string, diff --git a/lambdas/functions/control-plane/src/github/rate-limit.test.ts b/lambdas/functions/control-plane/src/github/rate-limit.test.ts new file mode 100644 index 0000000000..cf23eb83b5 --- /dev/null +++ b/lambdas/functions/control-plane/src/github/rate-limit.test.ts @@ -0,0 +1,70 @@ +import { ResponseHeaders } from '@octokit/types'; +import 
{ createSingleMetric } from '@terraform-aws-github-runner/aws-powertools-util'; +import { MetricUnit } from '@aws-lambda-powertools/metrics'; +import { metricGitHubAppRateLimit } from './rate-limit'; + +process.env.PARAMETER_GITHUB_APP_ID_NAME = 'test'; +jest.mock('@terraform-aws-github-runner/aws-ssm-util', () => ({ + ...jest.requireActual('@terraform-aws-github-runner/aws-ssm-util'), + // get parameter name from process.env.PARAMETER_GITHUB_APP_ID_NAME rerunt 1234 + getParameter: jest.fn((name: string) => { + if (name === process.env.PARAMETER_GITHUB_APP_ID_NAME) { + return '1234'; + } else { + return ''; + } + }), +})); + +jest.mock('@terraform-aws-github-runner/aws-powertools-util', () => ({ + ...jest.requireActual('@terraform-aws-github-runner/aws-powertools-util'), + // eslint-disable-next-line @typescript-eslint/no-unused-vars + createSingleMetric: jest.fn((name: string, unit: string, value: number, dimensions?: Record) => { + return { + addMetadata: jest.fn(), + }; + }), +})); + +describe('metricGitHubAppRateLimit', () => { + beforeEach(() => { + jest.clearAllMocks(); + }); + + it('should update rate limit metric', async () => { + // set process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT to true + process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = 'true'; + const headers: ResponseHeaders = { + 'x-ratelimit-remaining': '10', + 'x-ratelimit-limit': '60', + }; + + await metricGitHubAppRateLimit(headers); + + expect(createSingleMetric).toHaveBeenCalledWith('GitHubAppRateLimitRemaining', MetricUnit.Count, 10, { + AppId: '1234', + }); + }); + + it('should not update rate limit metric', async () => { + // set process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT to false + process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = 'false'; + const headers: ResponseHeaders = { + 'x-ratelimit-remaining': '10', + 'x-ratelimit-limit': '60', + }; + + await metricGitHubAppRateLimit(headers); + + expect(createSingleMetric).not.toHaveBeenCalled(); + }); + + it('should not update rate limit 
metric if headers are undefined', async () => { + // set process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT to true + process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = 'true'; + + await metricGitHubAppRateLimit(undefined as unknown as ResponseHeaders); + + expect(createSingleMetric).not.toHaveBeenCalled(); + }); +}); diff --git a/lambdas/functions/control-plane/src/github/rate-limit.ts b/lambdas/functions/control-plane/src/github/rate-limit.ts new file mode 100644 index 0000000000..aab8aa51d2 --- /dev/null +++ b/lambdas/functions/control-plane/src/github/rate-limit.ts @@ -0,0 +1,25 @@ +import { ResponseHeaders } from '@octokit/types'; +import { createSingleMetric, logger } from '@terraform-aws-github-runner/aws-powertools-util'; +import { MetricUnit } from '@aws-lambda-powertools/metrics'; +import yn from 'yn'; +import { getParameter } from '@terraform-aws-github-runner/aws-ssm-util'; + +export async function metricGitHubAppRateLimit(headers: ResponseHeaders): Promise { + try { + const remaining = parseInt(headers['x-ratelimit-remaining'] as string); + const limit = parseInt(headers['x-ratelimit-limit'] as string); + + logger.debug(`Rate limit remaining: ${remaining}, limit: ${limit}`); + + const updateMetric = yn(process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT); + if (updateMetric) { + const appId = await getParameter(process.env.PARAMETER_GITHUB_APP_ID_NAME); + const metric = createSingleMetric('GitHubAppRateLimitRemaining', MetricUnit.Count, remaining, { + AppId: appId, + }); + metric.addMetadata('AppId', appId); + } + } catch (e) { + logger.debug(`Error updating rate limit metric`, { error: e }); + } +} diff --git a/lambdas/functions/control-plane/src/modules.d.ts b/lambdas/functions/control-plane/src/modules.d.ts index 319afa7755..7570f29035 100644 --- a/lambdas/functions/control-plane/src/modules.d.ts +++ b/lambdas/functions/control-plane/src/modules.d.ts @@ -1,6 +1,7 @@ declare namespace NodeJS { export interface ProcessEnv { AWS_REGION: string; + 
ENABLE_METRIC_GITHUB_APP_RATE_LIMIT: string; ENABLE_ON_DEMAND_FAILOVER_FOR_ERRORS: string; ENVIRONMENT: string; GHES_URL: string; diff --git a/lambdas/functions/control-plane/src/pool/pool.test.ts b/lambdas/functions/control-plane/src/pool/pool.test.ts index 64feb47044..a7ee7b9797 100644 --- a/lambdas/functions/control-plane/src/pool/pool.test.ts +++ b/lambdas/functions/control-plane/src/pool/pool.test.ts @@ -4,7 +4,7 @@ import moment from 'moment-timezone'; import nock from 'nock'; import { listEC2Runners } from '../aws/runners'; -import * as ghAuth from '../gh-auth/gh-auth'; +import * as ghAuth from '../github/auth'; import { createRunners } from '../scale-runners/scale-up'; import { adjust } from './pool'; @@ -27,7 +27,7 @@ jest.mock('./../aws/runners', () => ({ ...jest.requireActual('./../aws/runners'), listEC2Runners: jest.fn(), })); -jest.mock('./../gh-auth/gh-auth'); +jest.mock('./../github/auth'); jest.mock('./../scale-runners/scale-up'); const mocktokit = Octokit as jest.MockedClass; diff --git a/lambdas/functions/control-plane/src/pool/pool.ts b/lambdas/functions/control-plane/src/pool/pool.ts index 543280641b..93fbfbb4db 100644 --- a/lambdas/functions/control-plane/src/pool/pool.ts +++ b/lambdas/functions/control-plane/src/pool/pool.ts @@ -4,7 +4,7 @@ import yn from 'yn'; import { bootTimeExceeded, listEC2Runners } from '../aws/runners'; import { RunnerList } from '../aws/runners.d'; -import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from '../gh-auth/gh-auth'; +import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from '../github/auth'; import { createRunners } from '../scale-runners/scale-up'; const logger = createChildLogger('pool'); diff --git a/lambdas/functions/control-plane/src/scale-runners/job-retry.test.ts b/lambdas/functions/control-plane/src/scale-runners/job-retry.test.ts index 9317e42292..ab6d9ef052 100644 --- a/lambdas/functions/control-plane/src/scale-runners/job-retry.test.ts +++ 
b/lambdas/functions/control-plane/src/scale-runners/job-retry.test.ts @@ -1,7 +1,7 @@ import { publishMessage } from '../aws/sqs'; import { publishRetryMessage, checkAndRetryJob } from './job-retry'; import { ActionRequestMessage, ActionRequestMessageRetry } from './scale-up'; -import { getOctokit } from '../gh-auth/gh-octokit'; +import { getOctokit } from '../github/octokit'; import { Octokit } from '@octokit/rest'; import { mocked } from 'jest-mock'; import { createSingleMetric } from '@terraform-aws-github-runner/aws-powertools-util'; @@ -34,7 +34,7 @@ const mockOctokit = { jest.mock('@octokit/rest', () => ({ Octokit: jest.fn().mockImplementation(() => mockOctokit), })); -jest.mock('../gh-auth/gh-octokit'); +jest.mock('../github/octokit'); const mockCreateOctokitClient = mocked(getOctokit, { shallow: false }); mockCreateOctokitClient.mockResolvedValue(new (Octokit as jest.MockedClass)()); @@ -179,7 +179,7 @@ describe(`Test job retry check`, () => { process.env.ENABLE_ORGANIZATION_RUNNERS = 'true'; process.env.ENVIRONMENT = 'test'; process.env.RUNNER_NAME_PREFIX = 'test'; - process.env.ENABLE_METRICS = 'true'; + process.env.ENABLE_METRIC_JOB_RETRY = 'true'; process.env.JOB_QUEUE_SCALE_UP_URL = 'https://sqs.eu-west-1.amazonaws.com/123456789/webhook_events_workflow_job_queue'; diff --git a/lambdas/functions/control-plane/src/scale-runners/job-retry.ts b/lambdas/functions/control-plane/src/scale-runners/job-retry.ts index 6a2cbe5887..7ccf9c29bb 100644 --- a/lambdas/functions/control-plane/src/scale-runners/job-retry.ts +++ b/lambdas/functions/control-plane/src/scale-runners/job-retry.ts @@ -5,7 +5,7 @@ import { } from '@terraform-aws-github-runner/aws-powertools-util'; import { publishMessage } from '../aws/sqs'; import { ActionRequestMessage, ActionRequestMessageRetry, getGitHubEnterpriseApiUrl, isJobQueued } from './scale-up'; -import { getOctokit } from '../gh-auth/gh-octokit'; +import { getOctokit } from '../github/octokit'; import { MetricUnit } from 
'@aws-lambda-powertools/metrics'; import yn from 'yn'; @@ -46,7 +46,7 @@ export async function checkAndRetryJob(payload: ActionRequestMessageRetry): Prom const runnerOwner = enableOrgLevel ? payload.repositoryOwner : `${payload.repositoryOwner}/${payload.repositoryName}`; const runnerNamePrefix = process.env.RUNNER_NAME_PREFIX ?? ''; const jobQueueUrl = process.env.JOB_QUEUE_SCALE_UP_URL ?? ''; - const enableMetrics = yn(process.env.ENABLE_METRICS, { default: false }); + const enableMetrics = yn(process.env.ENABLE_METRIC_JOB_RETRY, { default: false }); const environment = process.env.ENVIRONMENT; addPersistentContextToChildLogger({ diff --git a/lambdas/functions/control-plane/src/scale-runners/scale-down.test.ts b/lambdas/functions/control-plane/src/scale-runners/scale-down.test.ts index c2255f8839..680247ced0 100644 --- a/lambdas/functions/control-plane/src/scale-runners/scale-down.test.ts +++ b/lambdas/functions/control-plane/src/scale-runners/scale-down.test.ts @@ -4,7 +4,7 @@ import moment from 'moment'; import nock from 'nock'; import { RunnerInfo, RunnerList } from '../aws/runners.d'; -import * as ghAuth from '../gh-auth/gh-auth'; +import * as ghAuth from '../github/auth'; import { listEC2Runners, terminateRunner, tag } from './../aws/runners'; import { githubCache } from './cache'; import { newestFirstStrategy, oldestFirstStrategy, scaleDown } from './scale-down'; @@ -34,7 +34,7 @@ jest.mock('./../aws/runners', () => ({ terminateRunner: jest.fn(), listEC2Runners: jest.fn(), })); -jest.mock('./../gh-auth/gh-auth'); +jest.mock('./../github/auth'); jest.mock('./cache'); const mocktokit = Octokit as jest.MockedClass; diff --git a/lambdas/functions/control-plane/src/scale-runners/scale-down.ts b/lambdas/functions/control-plane/src/scale-runners/scale-down.ts index 8b222b6a47..1688226f84 100644 --- a/lambdas/functions/control-plane/src/scale-runners/scale-down.ts +++ b/lambdas/functions/control-plane/src/scale-runners/scale-down.ts @@ -2,11 +2,12 @@ import { 
Octokit } from '@octokit/rest'; import { createChildLogger } from '@terraform-aws-github-runner/aws-powertools-util'; import moment from 'moment'; -import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from '../gh-auth/gh-auth'; +import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from '../github/auth'; import { bootTimeExceeded, listEC2Runners, tag, terminateRunner } from './../aws/runners'; import { RunnerInfo, RunnerList } from './../aws/runners.d'; import { GhRunners, githubCache } from './cache'; import { ScalingDownConfig, getEvictionStrategy, getIdleRunnerCount } from './scale-down-config'; +import { metricGitHubAppRateLimit } from '../github/rate-limit'; const logger = createChildLogger('scale-down'); @@ -63,6 +64,8 @@ async function getGitHubRunnerBusyState(client: Octokit, ec2runner: RunnerInfo, logger.info(`Runner '${ec2runner.instanceId}' - GitHub Runner ID '${runnerId}' - Busy: ${state.data.busy}`); + metricGitHubAppRateLimit(state.headers); + return state.data.busy; } diff --git a/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts b/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts index 0376d70671..83cc20faff 100644 --- a/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts +++ b/lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts @@ -1,4 +1,4 @@ -import { GetParameterCommand, PutParameterCommand, SSMClient } from '@aws-sdk/client-ssm'; +import { PutParameterCommand, SSMClient } from '@aws-sdk/client-ssm'; import { Octokit } from '@octokit/rest'; import { mockClient } from 'aws-sdk-client-mock'; import 'aws-sdk-client-mock-jest'; @@ -6,11 +6,12 @@ import { mocked } from 'jest-mock'; import nock from 'nock'; import { performance } from 'perf_hooks'; -import * as ghAuth from '../gh-auth/gh-auth'; +import * as ghAuth from '../github/auth'; import { createRunner, listEC2Runners } from './../aws/runners'; import { RunnerInputParameters } 
from './../aws/runners.d'; import ScaleError from './ScaleError'; import * as scaleUpModule from './scale-up'; +import { getParameter } from '@terraform-aws-github-runner/aws-ssm-util'; const mockOctokit = { paginate: jest.fn(), @@ -30,13 +31,20 @@ const mockOctokit = { const mockCreateRunner = mocked(createRunner); const mockListRunners = mocked(listEC2Runners); const mockSSMClient = mockClient(SSMClient); +const mockSSMgetParameter = mocked(getParameter); jest.mock('@octokit/rest', () => ({ Octokit: jest.fn().mockImplementation(() => mockOctokit), })); jest.mock('./../aws/runners'); -jest.mock('./../gh-auth/gh-auth'); +jest.mock('./../github/auth'); + +jest.mock('@terraform-aws-github-runner/aws-ssm-util', () => ({ + ...jest.requireActual('@terraform-aws-github-runner/aws-ssm-util'), + getParameter: jest.fn(), +})); + export type RunnerType = 'ephemeral' | 'non-ephemeral'; // for ephemeral and non-ephemeral runners @@ -77,6 +85,7 @@ let expectedRunnerParams: RunnerInputParameters; function setDefaults() { process.env = { ...cleanEnv }; + process.env.PARAMETER_GITHUB_APP_ID_NAME = 'github-app-id'; process.env.GITHUB_APP_KEY_BASE64 = 'TEST_CERTIFICATE_DATA'; process.env.GITHUB_APP_ID = '1337'; process.env.GITHUB_APP_CLIENT_ID = 'TEST_CLIENT_ID'; @@ -96,52 +105,9 @@ beforeEach(() => { jest.clearAllMocks(); setDefaults(); - mockOctokit.actions.getJobForWorkflowRun.mockImplementation(() => ({ - data: { - status: 'queued', - }, - })); - mockOctokit.paginate.mockImplementation(() => [ - { - id: 1, - name: 'Default', - }, - ]); - mockOctokit.actions.generateRunnerJitconfigForOrg.mockImplementation(() => ({ - data: { - encoded_jit_config: 'TEST_JIT_CONFIG_ORG', - }, - })); - mockOctokit.actions.generateRunnerJitconfigForRepo.mockImplementation(() => ({ - data: { - encoded_jit_config: 'TEST_JIT_CONFIG_REPO', - }, - })); - mockOctokit.checks.get.mockImplementation(() => ({ - data: { - status: 'queued', - }, - })); - const mockTokenReturnValue = { - data: { - token: 
'1234abcd', - }, - }; - const mockInstallationIdReturnValueOrgs = { - data: { - id: TEST_DATA.installationId, - }, - }; - const mockInstallationIdReturnValueRepos = { - data: { - id: TEST_DATA.installationId, - }, - }; + defaultSSMGetParameterMockImpl(); + defaultOctokitMockImpl(); - mockOctokit.actions.createRegistrationTokenForOrg.mockImplementation(() => mockTokenReturnValue); - mockOctokit.actions.createRegistrationTokenForRepo.mockImplementation(() => mockTokenReturnValue); - mockOctokit.apps.getOrgInstallation.mockImplementation(() => mockInstallationIdReturnValueOrgs); - mockOctokit.apps.getRepoInstallation.mockImplementation(() => mockInstallationIdReturnValueRepos); mockCreateRunner.mockImplementation(async () => { return ['i-12345']; }); @@ -213,7 +179,6 @@ describe('scaleUp with GHES', () => { expectedRunnerParams = { ...EXPECTED_RUNNER_PARAMS }; mockSSMClient.reset(); - mockSSMClient.on(GetParameterCommand).resolves({ Parameter: { Value: '1' } }); }); it('gets the current org level runners', async () => { @@ -270,7 +235,9 @@ describe('scaleUp with GHES', () => { it('Throws an error if runner group doesnt exist for ephemeral runners', async () => { process.env.RUNNER_GROUP_NAME = 'test-runner-group'; - mockSSMClient.on(GetParameterCommand).rejects(); + mockSSMgetParameter.mockImplementation(async () => { + throw new Error('ParameterNotFound'); + }); await expect(scaleUpModule.scaleUp('aws:sqs', TEST_DATA)).rejects.toBeInstanceOf(Error); expect(mockOctokit.paginate).toHaveBeenCalledTimes(1); }); @@ -284,7 +251,9 @@ describe('scaleUp with GHES', () => { }); it('create SSM parameter for runner group id if it doesnt exist', async () => { - mockSSMClient.on(GetParameterCommand).rejects(); + mockSSMgetParameter.mockImplementation(async () => { + throw new Error('ParameterNotFound'); + }); await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); expect(mockOctokit.paginate).toHaveBeenCalledTimes(1); 
expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 2); @@ -295,8 +264,7 @@ describe('scaleUp with GHES', () => { }); }); - it('Doesnt create SSM parameter for runner group id if it exists', async () => { - mockSSMClient.on(GetParameterCommand).resolves({ Parameter: { Value: '1' } }); + it('Does not create SSM parameter for runner group id if it exists', async () => { await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); expect(mockOctokit.paginate).toHaveBeenCalledTimes(0); expect(mockSSMClient).toHaveReceivedCommandTimes(PutParameterCommand, 1); @@ -304,7 +272,7 @@ describe('scaleUp with GHES', () => { it('create start runner config for ephemeral runners ', async () => { process.env.RUNNERS_MAXIMUM_COUNT = '2'; - mockSSMClient.on(GetParameterCommand).resolves({ Parameter: { Value: '1' } }); + await scaleUpModule.scaleUp('aws:sqs', TEST_DATA); expect(mockOctokit.actions.generateRunnerJitconfigForOrg).toBeCalledWith({ org: TEST_DATA.repositoryOwner, @@ -356,7 +324,6 @@ describe('scaleUp with GHES', () => { mockListRunners.mockImplementation(async () => { return []; }); - mockSSMClient.on(GetParameterCommand).resolves({ Parameter: { Value: '1' } }); const startTime = performance.now(); const instances = [ 'i-1234', @@ -707,3 +674,65 @@ describe('scaleUp with public GH', () => { }); }); }); + +function defaultOctokitMockImpl() { + mockOctokit.actions.getJobForWorkflowRun.mockImplementation(() => ({ + data: { + status: 'queued', + }, + })); + mockOctokit.paginate.mockImplementation(() => [ + { + id: 1, + name: 'Default', + }, + ]); + mockOctokit.actions.generateRunnerJitconfigForOrg.mockImplementation(() => ({ + data: { + encoded_jit_config: 'TEST_JIT_CONFIG_ORG', + }, + })); + mockOctokit.actions.generateRunnerJitconfigForRepo.mockImplementation(() => ({ + data: { + encoded_jit_config: 'TEST_JIT_CONFIG_REPO', + }, + })); + mockOctokit.checks.get.mockImplementation(() => ({ + data: { + status: 'queued', + }, + })); + + const mockTokenReturnValue = { + 
data: { + token: '1234abcd', + }, + }; + const mockInstallationIdReturnValueOrgs = { + data: { + id: TEST_DATA.installationId, + }, + }; + const mockInstallationIdReturnValueRepos = { + data: { + id: TEST_DATA.installationId, + }, + }; + + mockOctokit.actions.createRegistrationTokenForOrg.mockImplementation(() => mockTokenReturnValue); + mockOctokit.actions.createRegistrationTokenForRepo.mockImplementation(() => mockTokenReturnValue); + mockOctokit.apps.getOrgInstallation.mockImplementation(() => mockInstallationIdReturnValueOrgs); + mockOctokit.apps.getRepoInstallation.mockImplementation(() => mockInstallationIdReturnValueRepos); +} + +function defaultSSMGetParameterMockImpl() { + mockSSMgetParameter.mockImplementation(async (name: string) => { + if (name === `${process.env.SSM_CONFIG_PATH}/runner-group/${process.env.RUNNER_GROUP_NAME}`) { + return '1'; + } else if (name === `${process.env.PARAMETER_GITHUB_APP_ID_NAME}`) { + return `${process.env.GITHUB_APP_ID}`; + } else { + throw new Error(`ParameterNotFound: ${name}`); + } + }); +} diff --git a/lambdas/functions/control-plane/src/scale-runners/scale-up.ts b/lambdas/functions/control-plane/src/scale-runners/scale-up.ts index 6503a1255e..ab91b64cf5 100644 --- a/lambdas/functions/control-plane/src/scale-runners/scale-up.ts +++ b/lambdas/functions/control-plane/src/scale-runners/scale-up.ts @@ -3,11 +3,12 @@ import { addPersistentContextToChildLogger, createChildLogger } from '@terraform import { getParameter, putParameter } from '@terraform-aws-github-runner/aws-ssm-util'; import yn from 'yn'; -import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from '../gh-auth/gh-auth'; +import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from '../github/auth'; import { createRunner, listEC2Runners } from './../aws/runners'; import { RunnerInputParameters } from './../aws/runners.d'; import ScaleError from './ScaleError'; import { publishRetryMessage } from 
'./job-retry'; +import { metricGitHubAppRateLimit } from '../github/rate-limit'; const logger = createChildLogger('scale-up'); @@ -94,6 +95,9 @@ async function getGithubRunnerRegistrationToken(githubRunnerConfig: CreateGitHub owner: githubRunnerConfig.runnerOwner.split('/')[0], repo: githubRunnerConfig.runnerOwner.split('/')[1], }); + + const appId = parseInt(await getParameter(process.env.PARAMETER_GITHUB_APP_ID_NAME)); + logger.info('App id from SSM', { appId: appId }); return registrationToken.data.token; } @@ -142,6 +146,7 @@ export async function isJobQueued(githubInstallationClient: Octokit, payload: Ac owner: payload.repositoryOwner, repo: payload.repositoryName, }); + metricGitHubAppRateLimit(jobForWorkflowRun.headers); isQueued = jobForWorkflowRun.data.status === 'queued'; } else { throw Error(`Event ${payload.eventType} is not supported`); @@ -169,7 +174,7 @@ async function getRunnerGroupId(githubRunnerConfig: CreateGitHubRunnerConfig, gh } if (runnerGroup === undefined) { // get runner group id from GitHub - runnerGroupId = await GetRunnerGroupByName(ghClient, githubRunnerConfig); + runnerGroupId = await getRunnerGroupByName(ghClient, githubRunnerConfig); // store runner group id in SSM try { await putParameter( @@ -188,7 +193,7 @@ async function getRunnerGroupId(githubRunnerConfig: CreateGitHubRunnerConfig, gh return runnerGroupId; } -async function GetRunnerGroupByName(ghClient: Octokit, githubRunnerConfig: CreateGitHubRunnerConfig): Promise { +async function getRunnerGroupByName(ghClient: Octokit, githubRunnerConfig: CreateGitHubRunnerConfig): Promise { const runnerGroups: RunnerGroup[] = await ghClient.paginate(`GET /orgs/{org}/actions/runner-groups`, { org: githubRunnerConfig.runnerOwner, per_page: 100, @@ -432,6 +437,8 @@ async function createJitConfig(githubRunnerConfig: CreateGitHubRunnerConfig, ins labels: ephemeralRunnerConfig.runnerLabels, }); + metricGitHubAppRateLimit(runnerConfig.headers); + // store jit config in ssm parameter store 
logger.debug('Runner JIT config for ephemeral runner generated.', { instance: instance, diff --git a/lambdas/libs/aws-ssm-util/src/index.test.ts b/lambdas/libs/aws-ssm-util/src/index.test.ts index 07dbc4aa3f..ee4cd8ec2f 100644 --- a/lambdas/libs/aws-ssm-util/src/index.test.ts +++ b/lambdas/libs/aws-ssm-util/src/index.test.ts @@ -136,9 +136,6 @@ describe('Test getParameter and putParameter', () => { mockSSMClient.on(GetParameterCommand).resolves(output); // Act - const result = await getParameter(parameterName); - - // Assert - expect(result).toBe(undefined); + await expect(getParameter(parameterName)).rejects.toThrow(`Parameter ${parameterName} not found`); }); }); diff --git a/lambdas/libs/aws-ssm-util/src/index.ts b/lambdas/libs/aws-ssm-util/src/index.ts index 3824926d09..c60bb35424 100644 --- a/lambdas/libs/aws-ssm-util/src/index.ts +++ b/lambdas/libs/aws-ssm-util/src/index.ts @@ -1,10 +1,20 @@ -import { GetParameterCommand, PutParameterCommand, SSMClient, Tag } from '@aws-sdk/client-ssm'; +import { PutParameterCommand, SSMClient, Tag } from '@aws-sdk/client-ssm'; import { getTracedAWSV3Client } from '@terraform-aws-github-runner/aws-powertools-util'; +import { SSMProvider } from '@aws-lambda-powertools/parameters/ssm'; export async function getParameter(parameter_name: string): Promise { - const client = getTracedAWSV3Client(new SSMClient({ region: process.env.AWS_REGION })); - return (await client.send(new GetParameterCommand({ Name: parameter_name, WithDecryption: true }))).Parameter - ?.Value as string; + const ssmClient = getTracedAWSV3Client(new SSMClient({ region: process.env.AWS_REGION })); + const client = new SSMProvider({ awsSdkV3Client: ssmClient }); //getTracedAWSV3Client(); + const result = await client.get(parameter_name, { + decrypt: true, + maxAge: 30, // 30 seconds override default 5 seconds + }); + + // throw error if result is undefined + if (!result) { + throw new Error(`Parameter ${parameter_name} not found`); + } + return result; } export 
async function putParameter( diff --git a/lambdas/yarn.lock b/lambdas/yarn.lock index 8362882ecb..3286cbac2a 100644 --- a/lambdas/yarn.lock +++ b/lambdas/yarn.lock @@ -133,6 +133,35 @@ __metadata: languageName: node linkType: hard +"@aws-lambda-powertools/parameters@npm:^2.7.0": + version: 2.7.0 + resolution: "@aws-lambda-powertools/parameters@npm:2.7.0" + dependencies: + "@aws-lambda-powertools/commons": "npm:^2.7.0" + peerDependencies: + "@aws-sdk/client-appconfigdata": ">=3.x" + "@aws-sdk/client-dynamodb": ">=3.x" + "@aws-sdk/client-secrets-manager": ">=3.x" + "@aws-sdk/client-ssm": ">=3.x" + "@aws-sdk/util-dynamodb": ">=3.x" + "@middy/core": 4.x || 5.x + peerDependenciesMeta: + "@aws-sdk/client-appconfigdata": + optional: true + "@aws-sdk/client-dynamodb": + optional: true + "@aws-sdk/client-secrets-manager": + optional: true + "@aws-sdk/client-ssm": + optional: true + "@aws-sdk/util-dynamodb": + optional: true + "@middy/core": + optional: true + checksum: 10c0/7fc65a6ef975bfa2973a5babbc4e85a19e809e01b4209be491686d625810596ae51a8a4c99ff4877e898df3e045e870431677373c8f7415bcca353ddc4aab943 + languageName: node + linkType: hard + "@aws-lambda-powertools/tracer@npm:^2.7.0": version: 2.7.0 resolution: "@aws-lambda-powertools/tracer@npm:2.7.0" @@ -4815,6 +4844,7 @@ __metadata: version: 0.0.0-use.local resolution: "@terraform-aws-github-runner/control-plane@workspace:functions/control-plane" dependencies: + "@aws-lambda-powertools/parameters": "npm:^2.7.0" "@aws-sdk/client-ec2": "npm:^3.637.0" "@aws-sdk/client-sqs": "npm:^3.637.0" "@aws-sdk/types": "npm:^3.609.0" @@ -5178,16 +5208,7 @@ __metadata: languageName: node linkType: hard -"@types/node@npm:*": - version: 22.0.2 - resolution: "@types/node@npm:22.0.2" - dependencies: - undici-types: "npm:~6.11.1" - checksum: 10c0/59ee26fb1104674b2e23981d7569ad113aa8ee23c8449af8e4312aa9352ac738c5ffd0ae4d8077db0467704a3b9ccc662048e39716cb5ad51cdb24d106c7871b - languageName: node - linkType: hard - -"@types/node@npm:^22.4.1": 
+"@types/node@npm:*, @types/node@npm:^22.4.1": version: 22.4.1 resolution: "@types/node@npm:22.4.1" dependencies: @@ -5758,29 +5779,7 @@ __metadata: languageName: node linkType: hard -"axios@npm:^1.7.2": - version: 1.7.2 - resolution: "axios@npm:1.7.2" - dependencies: - follow-redirects: "npm:^1.15.6" - form-data: "npm:^4.0.0" - proxy-from-env: "npm:^1.1.0" - checksum: 10c0/cbd47ce380fe045313364e740bb03b936420b8b5558c7ea36a4563db1258c658f05e40feb5ddd41f6633fdd96d37ac2a76f884dad599c5b0224b4c451b3fa7ae - languageName: node - linkType: hard - -"axios@npm:^1.7.4": - version: 1.7.4 - resolution: "axios@npm:1.7.4" - dependencies: - follow-redirects: "npm:^1.15.6" - form-data: "npm:^4.0.0" - proxy-from-env: "npm:^1.1.0" - checksum: 10c0/5ea1a93140ca1d49db25ef8e1bd8cfc59da6f9220159a944168860ad15a2743ea21c5df2967795acb15cbe81362f5b157fdebbea39d53117ca27658bab9f7f17 - languageName: node - linkType: hard - -"axios@npm:^1.7.5": +"axios@npm:^1.7.2, axios@npm:^1.7.4, axios@npm:^1.7.5": version: 1.7.5 resolution: "axios@npm:1.7.5" dependencies: @@ -11040,13 +11039,6 @@ __metadata: languageName: node linkType: hard -"undici-types@npm:~6.11.1": - version: 6.11.1 - resolution: "undici-types@npm:6.11.1" - checksum: 10c0/d8f5739a8e6c779d72336c82deb49c56d5ac9f9f6e0eb2e8dd4d3f6929ae9db7cde370d2e46516fe6cad04ea53e790c5e16c4c75eed7cd0f9bd31b0763bb2fa3 - languageName: node - linkType: hard - "undici-types@npm:~6.19.2": version: 6.19.8 resolution: "undici-types@npm:6.19.8" diff --git a/main.tf b/main.tf index f23f2a3ebe..25346b30bd 100644 --- a/main.tf +++ b/main.tf @@ -293,10 +293,7 @@ module "runners" { ssm_housekeeper = var.runners_ssm_housekeeper ebs_optimized = var.runners_ebs_optimized - metrics_config = { - namespace = var.metrics_namespace - enable = var.enable_metrics_control_plane - } + metrics = var.metrics job_retry = var.job_retry } @@ -394,9 +391,9 @@ locals { logging_retention_in_days = var.logging_retention_in_days role_path = var.role_path role_permissions_boundary = 
var.role_permissions_boundary - metrics_namespace = var.metrics_namespace s3_bucket = var.lambda_s3_bucket tracing_config = var.tracing_config + metrics = var.metrics } } diff --git a/modules/multi-runner/README.md b/modules/multi-runner/README.md index c3bb9ae179..896b27d5a5 100644 --- a/modules/multi-runner/README.md +++ b/modules/multi-runner/README.md @@ -136,7 +136,7 @@ module "multi-runner" { | [ghes\_url](#input\_ghes\_url) | GitHub Enterprise Server URL. Example: https://github.internal.co - DO NOT SET IF USING PUBLIC GITHUB | `string` | `null` | no | | [github\_app](#input\_github\_app) | GitHub app parameters, see your github app. Ensure the key is the base64-encoded `.pem` file (the output of `base64 app.private-key.pem`, not the content of `private-key.pem`). |
object({
key_base64 = string
id = string
webhook_secret = string
})
| n/a | yes | | [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no | -| [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the spot termination watcher lambda function. This feature is Beta, changes will not trigger a major release as long in beta.

`enable`: Enable or disable the spot termination watcher.
'enable\_metrics': Enable metric for the lambda. If `spot_warning` is set to true, the lambda will emit a metric when it detects a spot termination warning.
`memory_size`: Memory size linit in MB of the lambda.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`timeout`: Time out of the lambda in seconds.
`zip`: File location of the lambda zip file. |
object({
enable = optional(bool, false)
enable_metric = optional(object({
spot_warning = optional(bool, false)
}))
memory_size = optional(number, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
timeout = optional(number, null)
zip = optional(string, null)
})
| `{}` | no | +| [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the spot termination watcher lambda function. This feature is Beta, changes will not trigger a major release as long in beta.

`enable`: Enable or disable the spot termination watcher.
`memory_size`: Memory size limit in MB of the lambda.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`timeout`: Time out of the lambda in seconds.
`zip`: File location of the lambda zip file. |
object({
enable = optional(bool, false)
enable_metrics = optional(string, null) # deprecated
memory_size = optional(number, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
timeout = optional(number, null)
zip = optional(string, null)
})
| `{}` | no | | [key\_name](#input\_key\_name) | Key pair name | `string` | `null` | no | | [kms\_key\_arn](#input\_kms\_key\_arn) | Optional CMK Key ARN to be used for Parameter Store. | `string` | `null` | no | | [lambda\_architecture](#input\_lambda\_architecture) | AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions. | `string` | `"arm64"` | no | @@ -150,7 +150,8 @@ module "multi-runner" { | [logging\_kms\_key\_id](#input\_logging\_kms\_key\_id) | Specifies the kms key id to encrypt the logs with | `string` | `null` | no | | [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no | | [matcher\_config\_parameter\_store\_tier](#input\_matcher\_config\_parameter\_store\_tier) | The tier of the parameter store for the matcher configuration. Valid values are `Standard`, and `Advanced`. | `string` | `"Standard"` | no | -| [metrics\_namespace](#input\_metrics\_namespace) | The namespace for the metrics created by the module. Merics will only be created if explicit enabled. | `string` | `"GitHub Runners"` | no | +| [metrics](#input\_metrics) | Configuration for metrics created by the module, by default metrics are disabled to avoid additional costs. When metrics are enable all metrics are created unless explicit configured otherwise. |
object({
enable = optional(bool, false)
namespace = optional(string, "GitHub Runners")
metric = optional(object({
enable_github_app_rate_limit = optional(bool, true)
enable_job_retry = optional(bool, true)
enable_spot_termination_warning = optional(bool, true)
}), {})
})
| `{}` | no | +| [metrics\_namespace](#input\_metrics\_namespace) | The namespace for the metrics created by the module. Merics will only be created if explicit enabled. | `string` | `null` | no | | [multi\_runner\_config](#input\_multi\_runner\_config) | multi\_runner\_config = {
runner\_config: {
runner\_os: "The EC2 Operating System type to use for action runner instances (linux,windows)."
runner\_architecture: "The platform architecture of the runner instance\_type."
runner\_metadata\_options: "(Optional) Metadata options for the ec2 runner instances."
ami\_filter: "(Optional) List of maps used to create the AMI filter for the action runner AMI. By default amazon linux 2 is used."
ami\_owners: "(Optional) The list of owners used to select the AMI of action runner instances."
create\_service\_linked\_role\_spot: "(Optional) Create the service-linked role for spot instances that is required by the scale-up lambda."
credit\_specification: "(Optional) The credit specification of the runner instance\_type. Can be unset, `standard` or `unlimited`."
delay\_webhook\_event: "The number of seconds the event accepted by the webhook is invisible on the queue before the scale up lambda will receive the event."
disable\_runner\_autoupdate: "Disable the auto update of the github runner agent. Be aware there is a grace period of 30 days, see also the [GitHub article](https://github.blog/changelog/2022-02-01-github-actions-self-hosted-runners-can-now-disable-automatic-updates/)"
ebs\_optimized: "The EC2 EBS optimized configuration."
enable\_ephemeral\_runners: "Enable ephemeral runners, runners will only be used once."
enable\_job\_queued\_check: "Only scale if the job event received by the scale up lambda is in the queued state. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior."
enable\_on\_demand\_failover\_for\_errors: "Enable on-demand failover. For example to fall back to on demand when no spot capacity is available the variable can be set to `InsufficientInstanceCapacity`. When not defined the default behavior is to retry later."
enable\_organization\_runners: "Register runners to organization, instead of repo level"
enable\_runner\_binaries\_syncer: "Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-build AMI."
enable\_ssm\_on\_runners: "Enable to allow access the runner instances for debugging purposes via SSM. Note that this adds additional permissions to the runner instances."
enable\_userdata: "Should the userdata script be enabled for the runner. Set this to false if you are using your own prebuilt AMI."
instance\_allocation\_strategy: "The allocation strategy for spot instances. AWS recommends to use `capacity-optimized` however the AWS default is `lowest-price`."
instance\_max\_spot\_price: "Max price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet."
instance\_target\_capacity\_type: "Default lifecycle used for runner instances, can be either `spot` or `on-demand`."
instance\_types: "List of instance types for the action runner. Defaults are based on runner\_os (al2023 for linux and Windows Server Core for win)."
job\_queue\_retention\_in\_seconds: "The number of seconds the job is held in the queue before it is purged"
minimum\_running\_time\_in\_minutes: "The time an ec2 action runner should be running at minimum before terminated if not busy."
pool\_runner\_owner: "The pool will deploy runners to the GitHub org ID, set this value to the org to which you want the runners deployed. Repo level is not supported."
runner\_additional\_security\_group\_ids: "List of additional security groups IDs to apply to the runner. If added outside the multi\_runner\_config block, the additional security group(s) will be applied to all runner configs. If added inside the multi\_runner\_config, the additional security group(s) will be applied to the individual runner."
runner\_as\_root: "Run the action runner under the root user. Variable `runner_run_as` will be ignored."
runner\_boot\_time\_in\_minutes: "The minimum time for an EC2 runner to boot and register as a runner."
runner\_extra\_labels: "Extra (custom) labels for the runners (GitHub). Separate each label by a comma. Labels checks on the webhook can be enforced by setting `multi_runner_config.matcherConfig.exactMatch`. GitHub read-only labels should not be provided."
runner\_group\_name: "Name of the runner group."
runner\_name\_prefix: "Prefix for the GitHub runner name."
runner\_run\_as: "Run the GitHub actions agent as user."
runners\_maximum\_count: "The maximum number of runners that will be created. Setting the variable to `-1` disables the maximum check."
scale\_down\_schedule\_expression: "Scheduler expression to check every x for scale down."
scale\_up\_reserved\_concurrent\_executions: "Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations."
userdata\_template: "Alternative user-data template, replacing the default template. By providing your own user\_data you have to take care of installing all required software, including the action runner. Variables userdata\_pre/post\_install are ignored."
enable\_jit\_config: "Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES, check first whether the JIT config API is available. If you are upgrading from 3.x to 4.x, you can set `enable_jit_config` to `false` to avoid a breaking change when using your own AMI."
enable\_runner\_detailed\_monitoring: "Should detailed monitoring be enabled for the runner. Set this to true if you want to use detailed monitoring. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html for details."
enable\_cloudwatch\_agent: "Enabling the cloudwatch agent on the ec2 runner instances, the runner contains default config. Configuration can be overridden via `cloudwatch_config`."
cloudwatch\_config: "(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details."
userdata\_pre\_install: "Script to be run before the GitHub Actions runner is installed on the EC2 instances"
userdata\_post\_install: "Script to be run after the GitHub Actions runner is installed on the EC2 instances"
runner\_ec2\_tags: "Map of tags that will be added to the launch template instance tag specifications."
runner\_iam\_role\_managed\_policy\_arns: "Attach AWS or customer-managed IAM policies (by ARN) to the runner IAM role"
vpc\_id: "The VPC for security groups of the action runners. If not set uses the value of `var.vpc_id`."
subnet\_ids: "List of subnets in which the action runners will be launched, the subnets need to be subnets in the `vpc_id`. If not set, uses the value of `var.subnet_ids`."
idle\_config: "List of time periods that can be defined as cron expressions to keep a minimum amount of runners active instead of scaling down to 0. By defining this list you can ensure that in time periods that match the cron expression within 5 seconds a runner is kept idle."
runner\_log\_files: "(optional) Replaces the module default cloudwatch log config. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-Configuration-File-Details.html for details."
block\_device\_mappings: "The EC2 instance block device configuration. Takes the following keys: `device_name`, `delete_on_termination`, `volume_type`, `volume_size`, `encrypted`, `iops`, `throughput`, `kms_key_id`, `snapshot_id`."
job\_retry: "Experimental! Can be removed / changed without triggering a major release. Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the instances a message will be published to a job retry queue. The job retry check lambda is checking after a delay if the job is queued. If not the message will be published again on the scale-up (build queue). Using this feature can impact the rate limit of the GitHub app."
pool\_config: "The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for week days to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use `schedule_expression_timezone` to override the schedule time zone (defaults to UTC)."
}
matcherConfig: {
labelMatchers: "The list of list of labels supported by the runner configuration. `[[self-hosted, linux, x64, example]]`"
exactMatch: "If set to true all labels in the workflow job must match the GitHub labels (os, architecture and `self-hosted`). When false if __any__ workflow label matches it will trigger the webhook."
priority: "If set it defines the priority of the matcher, the matcher with the lowest priority will be evaluated first. Default is 999, allowed values 0-999."
}
fifo: "Enable a FIFO queue to preserve the order of events received by the webhook. Setting this to true is suggested for repo level runners."
redrive\_build\_queue: "Set options to attach (optional) a dead letter queue to the build queue, the queue between the webhook and the scale up lambda. You have the following options. 1. Disable by setting `enabled` to false. 2. Enable by setting `enabled` to `true`, `maxReceiveCount` to a number of max retries."
} |
map(object({
runner_config = object({
runner_os = string
runner_architecture = string
runner_metadata_options = optional(map(any), {
instance_metadata_tags = "enabled"
http_endpoint = "enabled"
http_tokens = "required"
http_put_response_hop_limit = 1
})
ami_filter = optional(map(list(string)), { state = ["available"] })
ami_owners = optional(list(string), ["amazon"])
ami_id_ssm_parameter_name = optional(string, null)
ami_kms_key_arn = optional(string, "")
create_service_linked_role_spot = optional(bool, false)
credit_specification = optional(string, null)
delay_webhook_event = optional(number, 30)
disable_runner_autoupdate = optional(bool, false)
ebs_optimized = optional(bool, false)
enable_ephemeral_runners = optional(bool, false)
enable_job_queued_check = optional(bool, null)
enable_on_demand_failover_for_errors = optional(list(string), [])
enable_organization_runners = optional(bool, false)
enable_runner_binaries_syncer = optional(bool, true)
enable_ssm_on_runners = optional(bool, false)
enable_userdata = optional(bool, true)
instance_allocation_strategy = optional(string, "lowest-price")
instance_max_spot_price = optional(string, null)
instance_target_capacity_type = optional(string, "spot")
instance_types = list(string)
job_queue_retention_in_seconds = optional(number, 86400)
minimum_running_time_in_minutes = optional(number, null)
pool_runner_owner = optional(string, null)
runner_as_root = optional(bool, false)
runner_boot_time_in_minutes = optional(number, 5)
runner_extra_labels = optional(list(string), [])
runner_group_name = optional(string, "Default")
runner_name_prefix = optional(string, "")
runner_run_as = optional(string, "ec2-user")
runners_maximum_count = number
runner_additional_security_group_ids = optional(list(string), [])
scale_down_schedule_expression = optional(string, "cron(*/5 * * * ? *)")
scale_up_reserved_concurrent_executions = optional(number, 1)
userdata_template = optional(string, null)
userdata_content = optional(string, null)
enable_jit_config = optional(bool, null)
enable_runner_detailed_monitoring = optional(bool, false)
enable_cloudwatch_agent = optional(bool, true)
cloudwatch_config = optional(string, null)
userdata_pre_install = optional(string, "")
userdata_post_install = optional(string, "")
runner_ec2_tags = optional(map(string), {})
runner_iam_role_managed_policy_arns = optional(list(string), [])
vpc_id = optional(string, null)
subnet_ids = optional(list(string), null)
idle_config = optional(list(object({
cron = string
timeZone = string
idleCount = number
evictionStrategy = optional(string, "oldest_first")
})), [])
runner_log_files = optional(list(object({
log_group_name = string
prefix_log_group = bool
file_path = string
log_stream_name = string
})), null)
block_device_mappings = optional(list(object({
delete_on_termination = optional(bool, true)
device_name = optional(string, "/dev/xvda")
encrypted = optional(bool, true)
iops = optional(number)
kms_key_id = optional(string)
snapshot_id = optional(string)
throughput = optional(number)
volume_size = number
volume_type = optional(string, "gp3")
})), [{
volume_size = 30
}])
pool_config = optional(list(object({
schedule_expression = string
schedule_expression_timezone = optional(string)
size = number
})), [])
job_retry = optional(object({
enable = optional(bool, false)
delay_in_seconds = optional(number, 300)
delay_backoff = optional(number, 2)
lambda_memory_size = optional(number, 256)
lambda_timeout = optional(number, 30)
max_attempts = optional(number, 1)
}), {})
})
matcherConfig = object({
labelMatchers = list(list(string))
exactMatch = optional(bool, false)
priority = optional(number, 999)
})
fifo = optional(bool, false)
redrive_build_queue = optional(object({
enabled = bool
maxReceiveCount = number
}), {
enabled = false
maxReceiveCount = null
})
}))
| n/a | yes | | [pool\_lambda\_reserved\_concurrent\_executions](#input\_pool\_lambda\_reserved\_concurrent\_executions) | Amount of reserved concurrent executions for the scale-up lambda function. A value of 0 disables lambda from being triggered and -1 removes any concurrency limitations. | `number` | `1` | no | | [pool\_lambda\_timeout](#input\_pool\_lambda\_timeout) | Time out for the pool lambda in seconds. | `number` | `60` | no | diff --git a/modules/multi-runner/runners.tf b/modules/multi-runner/runners.tf index 0fdfe272b4..abea1cf3ee 100644 --- a/modules/multi-runner/runners.tf +++ b/modules/multi-runner/runners.tf @@ -114,8 +114,5 @@ module "runners" { job_retry = each.value.runner_config.job_retry - metrics_config = { - namespace = var.metrics_namespace - enable = var.enable_metrics_control_plane - } + metrics = var.metrics } diff --git a/modules/multi-runner/termination-watcher.tf b/modules/multi-runner/termination-watcher.tf index 8481c50399..f317b66adf 100644 --- a/modules/multi-runner/termination-watcher.tf +++ b/modules/multi-runner/termination-watcher.tf @@ -13,10 +13,10 @@ locals { logging_retention_in_days = var.logging_retention_in_days role_path = var.role_path role_permissions_boundary = var.role_permissions_boundary - metrics_namespace = var.metrics_namespace s3_bucket = var.lambda_s3_bucket tracing_config = var.tracing_config lambda_tags = var.lambda_tags + metrics = var.metrics } } diff --git a/modules/multi-runner/variables.deprecated.tf b/modules/multi-runner/variables.deprecated.tf new file mode 100644 index 0000000000..006af01810 --- /dev/null +++ b/modules/multi-runner/variables.deprecated.tf @@ -0,0 +1,23 @@ +# tflint-ignore: terraform_unused_declarations +variable "enable_metrics_control_plane" { + description = "(Experimental) Enable or disable the metrics for the module. Feature can change or renamed without a major release." 
+ type = bool + default = false + + validation { + condition = var.enable_metrics_control_plane == false + error_message = "The feature `enable_metrics_control_plane` is deprecated and will be removed in a future release. Please use the `metrics` variable instead." + } +} + +# tflint-ignore: terraform_unused_declarations +variable "metrics_namespace" { + description = "The namespace for the metrics created by the module. Merics will only be created if explicit enabled." + type = string + default = null + + validation { + condition = var.metrics_namespace == null + error_message = "The variable `metrics_namespace` is deprecated, use `metrics.namespace` instead." + } +} diff --git a/modules/multi-runner/variables.tf b/modules/multi-runner/variables.tf index b19561309f..fc1edefcf9 100644 --- a/modules/multi-runner/variables.tf +++ b/modules/multi-runner/variables.tf @@ -621,18 +621,11 @@ variable "runners_ssm_housekeeper" { default = { config = {} } } -variable "metrics_namespace" { - description = "The namespace for the metrics created by the module. Merics will only be created if explicit enabled." - type = string - default = "GitHub Runners" -} - variable "instance_termination_watcher" { description = <<-EOF Configuration for the spot termination watcher lambda function. This feature is Beta, changes will not trigger a major release as long in beta. `enable`: Enable or disable the spot termination watcher. - 'enable_metrics': Enable metric for the lambda. If `spot_warning` is set to true, the lambda will emit a metric when it detects a spot termination warning. `memory_size`: Memory size linit in MB of the lambda. `s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas. `s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket. 
@@ -641,10 +634,8 @@ variable "instance_termination_watcher" { EOF type = object({ - enable = optional(bool, false) - enable_metric = optional(object({ - spot_warning = optional(bool, false) - })) + enable = optional(bool, false) + enable_metrics = optional(string, null) # deprecated memory_size = optional(number, null) s3_key = optional(string, null) s3_object_version = optional(string, null) @@ -652,6 +643,11 @@ variable "instance_termination_watcher" { zip = optional(string, null) }) default = {} + + validation { + condition = var.instance_termination_watcher.enable_metrics == null + error_message = "The variable `instance_termination_watcher.enable_metrics` is deprecated, use `metrics` instead." + } } variable "lambda_tags" { @@ -670,8 +666,16 @@ variable "matcher_config_parameter_store_tier" { } } -variable "enable_metrics_control_plane" { - description = "(Experimental) Enable or disable the metrics for the module. Feature can change or renamed without a major release." - type = bool - default = false +variable "metrics" { + description = "Configuration for metrics created by the module, by default metrics are disabled to avoid additional costs. When metrics are enable all metrics are created unless explicit configured otherwise." 
+ type = object({ + enable = optional(bool, false) + namespace = optional(string, "GitHub Runners") + metric = optional(object({ + enable_github_app_rate_limit = optional(bool, true) + enable_job_retry = optional(bool, true) + enable_spot_termination_warning = optional(bool, true) + }), {}) + }) + default = {} } diff --git a/modules/runners/README.md b/modules/runners/README.md index 15d0290982..23ac6522cc 100644 --- a/modules/runners/README.md +++ b/modules/runners/README.md @@ -184,7 +184,7 @@ yarn run dist | [logging\_kms\_key\_id](#input\_logging\_kms\_key\_id) | Specifies the kms key id to encrypt the logs with | `string` | `null` | no | | [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no | | [metadata\_options](#input\_metadata\_options) | Metadata options for the ec2 runner instances. By default, the module uses metadata tags for bootstrapping the runner, only disable `instance_metadata_tags` when using custom scripts for starting the runner. | `map(any)` |
{
"http_endpoint": "enabled",
"http_put_response_hop_limit": 1,
"http_tokens": "required",
"instance_metadata_tags": "enabled"
}
| no | -| [metrics\_config](#input\_metrics\_config) | Configuraiton to enable metrics creation by the lambdas. |
object({
enable = optional(bool, false)
namespace = optional(string, null)
})
| `{}` | no | +| [metrics](#input\_metrics) | Configuration for metrics created by the module, by default metrics are disabled to avoid additional costs. When metrics are enable all metrics are created unless explicit configured otherwise. |
object({
enable = optional(bool, false)
namespace = optional(string, "GitHub Runners")
metric = optional(object({
enable_github_app_rate_limit = optional(bool, true)
enable_job_retry = optional(bool, true)
enable_spot_termination_warning = optional(bool, true)
}), {})
})
| `{}` | no | | [minimum\_running\_time\_in\_minutes](#input\_minimum\_running\_time\_in\_minutes) | The time an ec2 action runner should be running at minimum before terminated if non busy. If not set the default is calculated based on the OS. | `number` | `null` | no | | [overrides](#input\_overrides) | This map provides the possibility to override some defaults. The following attributes are supported: `name_sg` overrides the `Name` tag for all security groups created by this module. `name_runner_agent_instance` overrides the `Name` tag for the ec2 instance defined in the auto launch configuration. `name_docker_machine_runners` overrides the `Name` tag spot instances created by the runner agent. | `map(string)` |
{
"name_runner": "",
"name_sg": ""
}
| no | | [pool\_config](#input\_pool\_config) | The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for week days to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use `schedule_expression_timezone ` to override the schedule time zone (defaults to UTC). |
list(object({
schedule_expression = string
schedule_expression_timezone = optional(string)
size = number
}))
| `[]` | no | diff --git a/modules/runners/job-retry.tf b/modules/runners/job-retry.tf index 47edd1076c..99596e2a3c 100644 --- a/modules/runners/job-retry.tf +++ b/modules/runners/job-retry.tf @@ -15,7 +15,7 @@ locals { log_level = var.log_level logging_kms_key_id = var.logging_kms_key_id logging_retention_in_days = var.logging_retention_in_days - metrics_config = var.metrics_config + metrics = var.metrics role_path = var.role_path role_permissions_boundary = var.role_permissions_boundary s3_bucket = var.lambda_s3_bucket diff --git a/modules/runners/job-retry/README.md b/modules/runners/job-retry/README.md index d0e2f81bf9..cc00035fb3 100644 --- a/modules/runners/job-retry/README.md +++ b/modules/runners/job-retry/README.md @@ -42,7 +42,7 @@ The module is an inner module and used by the runner module when the opt-in feat | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| -| [config](#input\_config) | Configuration for the spot termination watcher lambda function.

`aws_partition`: Partition for the base arn if not 'aws'
`architecture`: AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions.
`environment_variables`: Environment variables for the lambda.
`enable_metric`: Enable metric for the lambda. If `spot_warning` is set to true, the lambda will emit a metric when it detects a spot termination warning.
'ghes\_url': Optional GitHub Enterprise Server URL.
'github\_app\_parameters': Parameter Store for GitHub App Parameters.
'kms\_key\_arn': Optional CMK Key ARN instead of using the default AWS managed key.
`lambda_principals`: Add extra principals to the role created for execution of the lambda, e.g. for local testing.
`lambda_tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`log_level`: Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.
`logging_kms_key_id`: Specifies the kms key id to encrypt the logs with
`logging_retention_in_days`: Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.
`memory_size`: Memory size linit in MB of the lambda.
`metrics_config`: Configuraiton to enable metrics creation by the lambda.
`prefix`: The prefix used for naming resources.
`role_path`: The path that will be added to the role, if not set the environment name will be used.
`role_permissions_boundary`: Permissions boundary that will be added to the created role for the lambda.
`runtime`: AWS Lambda runtime.
`s3_bucket`: S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`security_group_ids`: List of security group IDs associated with the Lambda function.
'sqs\_build\_queue': SQS queue for build events to re-publish job request.
`subnet_ids`: List of subnets in which the action runners will be launched, the subnets needs to be subnets in the `vpc_id`.
`tag_filters`: Map of tags that will be used to filter the resources to be tracked. Only for which all tags are present and starting with the same value as the value in the map will be tracked.
`tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`timeout`: Time out of the lambda in seconds.
`tracing_config`: Configuration for lambda tracing.
`zip`: File location of the lambda zip file. |
object({
aws_partition = optional(string, null)
architecture = optional(string, null)
enable_organization_runners = bool
environment_variables = optional(map(string), {})
ghes_url = optional(string, null)
github_app_parameters = object({
key_base64 = map(string)
id = map(string)
})
kms_key_arn = optional(string, null)
lambda_tags = optional(map(string), {})
log_level = optional(string, null)
logging_kms_key_id = optional(string, null)
logging_retention_in_days = optional(number, null)
memory_size = optional(number, null)
metrics_config = optional(object({
enable = optional(bool, false)
namespace = optional(string, null)
}), {})
prefix = optional(string, null)
principals = optional(list(object({
type = string
identifiers = list(string)
})), [])
queue_encryption = optional(object({
kms_data_key_reuse_period_seconds = optional(number, null)
kms_master_key_id = optional(string, null)
sqs_managed_sse_enabled = optional(bool, true)
}), {})
role_path = optional(string, null)
role_permissions_boundary = optional(string, null)
runtime = optional(string, null)
s3_bucket = optional(string, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
sqs_build_queue = object({
url = string
arn = string
})
tags = optional(map(string), {})
timeout = optional(number, 30)
tracing_config = optional(object({
mode = optional(string, null)
capture_http_requests = optional(bool, false)
capture_error = optional(bool, false)
}), {})
zip = optional(string, null)
})
| n/a | yes | +| [config](#input\_config) | Configuration for the spot termination watcher lambda function.

`aws_partition`: Partition for the base arn if not 'aws'
`architecture`: AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions.
`environment_variables`: Environment variables for the lambda.
`enable_metric`: Enable metric for the lambda. If `spot_warning` is set to true, the lambda will emit a metric when it detects a spot termination warning.
'ghes\_url': Optional GitHub Enterprise Server URL.
'github\_app\_parameters': Parameter Store for GitHub App Parameters.
'kms\_key\_arn': Optional CMK Key ARN instead of using the default AWS managed key.
`lambda_principals`: Add extra principals to the role created for execution of the lambda, e.g. for local testing.
`lambda_tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`log_level`: Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.
`logging_kms_key_id`: Specifies the kms key id to encrypt the logs with
`logging_retention_in_days`: Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.
`memory_size`: Memory size limit in MB of the lambda.
`metrics`: Configuration to enable metrics creation by the lambda.
`prefix`: The prefix used for naming resources.
`role_path`: The path that will be added to the role, if not set the environment name will be used.
`role_permissions_boundary`: Permissions boundary that will be added to the created role for the lambda.
`runtime`: AWS Lambda runtime.
`s3_bucket`: S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`security_group_ids`: List of security group IDs associated with the Lambda function.
'sqs\_build\_queue': SQS queue for build events to re-publish job request.
`subnet_ids`: List of subnets in which the action runners will be launched, the subnets needs to be subnets in the `vpc_id`.
`tag_filters`: Map of tags that will be used to filter the resources to be tracked. Only for which all tags are present and starting with the same value as the value in the map will be tracked.
`tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`timeout`: Time out of the lambda in seconds.
`tracing_config`: Configuration for lambda tracing.
`zip`: File location of the lambda zip file. |
object({
aws_partition = optional(string, null)
architecture = optional(string, null)
enable_organization_runners = bool
environment_variables = optional(map(string), {})
ghes_url = optional(string, null)
github_app_parameters = object({
key_base64 = map(string)
id = map(string)
})
kms_key_arn = optional(string, null)
lambda_tags = optional(map(string), {})
log_level = optional(string, null)
logging_kms_key_id = optional(string, null)
logging_retention_in_days = optional(number, null)
memory_size = optional(number, null)
metrics = optional(object({
enable = optional(bool, false)
namespace = optional(string, null)
metric = optional(object({
enable_github_app_rate_limit = optional(bool, true)
enable_job_retry = optional(bool, true)
}), {})
}), {})
prefix = optional(string, null)
principals = optional(list(object({
type = string
identifiers = list(string)
})), [])
queue_encryption = optional(object({
kms_data_key_reuse_period_seconds = optional(number, null)
kms_master_key_id = optional(string, null)
sqs_managed_sse_enabled = optional(bool, true)
}), {})
role_path = optional(string, null)
role_permissions_boundary = optional(string, null)
runtime = optional(string, null)
s3_bucket = optional(string, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
sqs_build_queue = object({
url = string
arn = string
})
tags = optional(map(string), {})
timeout = optional(number, 30)
tracing_config = optional(object({
mode = optional(string, null)
capture_http_requests = optional(bool, false)
capture_error = optional(bool, false)
}), {})
zip = optional(string, null)
})
| n/a | yes | ## Outputs diff --git a/modules/runners/job-retry/main.tf b/modules/runners/job-retry/main.tf index 32b50f8298..a223611f0b 100644 --- a/modules/runners/job-retry/main.tf +++ b/modules/runners/job-retry/main.tf @@ -4,7 +4,8 @@ locals { environment_variables = { ENABLE_ORGANIZATION_RUNNERS = var.config.enable_organization_runners - ENABLE_METRICS = var.config.metrics_config.enable + ENABLE_METRIC_JOB_RETRY = var.config.metrics.enable && var.config.metrics.metric.enable_job_retry + ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = var.config.metrics.enable && var.config.metrics.metric.enable_github_app_rate_limit GHES_URL = var.config.ghes_url JOB_QUEUE_SCALE_UP_URL = var.config.sqs_build_queue.url PARAMETER_GITHUB_APP_ID_NAME = var.config.github_app_parameters.id.name @@ -16,7 +17,7 @@ locals { handler = "index.jobRetryCheck", zip = local.lambda_zip, environment_variables = local.environment_variables - metrics_namespace = var.config.metrics_config.namespace + metrics_namespace = var.config.metrics.namespace }) } diff --git a/modules/runners/job-retry/variables.tf b/modules/runners/job-retry/variables.tf index 1255cceb1c..475944b0ed 100644 --- a/modules/runners/job-retry/variables.tf +++ b/modules/runners/job-retry/variables.tf @@ -48,9 +48,13 @@ variable "config" { logging_kms_key_id = optional(string, null) logging_retention_in_days = optional(number, null) memory_size = optional(number, null) - metrics_config = optional(object({ + metrics = optional(object({ enable = optional(bool, false) namespace = optional(string, null) + metric = optional(object({ + enable_github_app_rate_limit = optional(bool, true) + enable_job_retry = optional(bool, true) + }), {}) }), {}) prefix = optional(string, null) principals = optional(list(object({ diff --git a/modules/runners/scale-down.tf b/modules/runners/scale-down.tf index 08138dcf3e..60e3d47ecb 100644 --- a/modules/runners/scale-down.tf +++ b/modules/runners/scale-down.tf @@ -23,6 +23,7 @@ resource "aws_lambda_function" 
"scale_down" { environment { variables = { ENVIRONMENT = var.prefix + ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = var.metrics.enable && var.metrics.metric.enable_github_app_rate_limit GHES_URL = var.ghes_url LOG_LEVEL = var.log_level MINIMUM_RUNNING_TIME_IN_MINUTES = coalesce(var.minimum_running_time_in_minutes, local.min_runtime_defaults[var.runner_os]) @@ -33,6 +34,7 @@ resource "aws_lambda_function" "scale_down" { RUNNER_BOOT_TIME_IN_MINUTES = var.runner_boot_time_in_minutes SCALE_DOWN_CONFIG = jsonencode(var.idle_config) POWERTOOLS_SERVICE_NAME = "runners-scale-down" + POWERTOOLS_METRICS_NAMESPACE = var.metrics.namespace POWERTOOLS_TRACE_ENABLED = var.tracing_config.mode != null ? true : false POWERTOOLS_TRACER_CAPTURE_HTTPS_REQUESTS = var.tracing_config.capture_http_requests POWERTOOLS_TRACER_CAPTURE_ERROR = var.tracing_config.capture_error diff --git a/modules/runners/scale-up.tf b/modules/runners/scale-up.tf index 0e962ad973..9ab6b4c57e 100644 --- a/modules/runners/scale-up.tf +++ b/modules/runners/scale-up.tf @@ -30,6 +30,7 @@ resource "aws_lambda_function" "scale_up" { ENABLE_EPHEMERAL_RUNNERS = var.enable_ephemeral_runners ENABLE_JIT_CONFIG = var.enable_jit_config ENABLE_JOB_QUEUED_CHECK = local.enable_job_queued_check + ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = var.metrics.enable && var.metrics.metric.enable_github_app_rate_limit ENABLE_ORGANIZATION_RUNNERS = var.enable_organization_runners ENVIRONMENT = var.prefix GHES_URL = var.ghes_url @@ -44,6 +45,7 @@ resource "aws_lambda_function" "scale_up" { PARAMETER_GITHUB_APP_ID_NAME = var.github_app_parameters.id.name PARAMETER_GITHUB_APP_KEY_BASE64_NAME = var.github_app_parameters.key_base64.name POWERTOOLS_LOGGER_LOG_EVENT = var.log_level == "debug" ? "true" : "false" + POWERTOOLS_METRICS_NAMESPACE = var.metrics.namespace POWERTOOLS_TRACE_ENABLED = var.tracing_config.mode != null ? 
true : false POWERTOOLS_TRACER_CAPTURE_HTTPS_REQUESTS = var.tracing_config.capture_http_requests POWERTOOLS_TRACER_CAPTURE_ERROR = var.tracing_config.capture_error diff --git a/modules/runners/variables.tf b/modules/runners/variables.tf index 3fe2fa40b6..1c84bf9c02 100644 --- a/modules/runners/variables.tf +++ b/modules/runners/variables.tf @@ -670,11 +670,16 @@ variable "lambda_tags" { default = {} } -variable "metrics_config" { - description = "Configuraiton to enable metrics creation by the lambdas." +variable "metrics" { + description = "Configuration for metrics created by the module, by default metrics are disabled to avoid additional costs. When metrics are enable all metrics are created unless explicit configured otherwise." type = object({ enable = optional(bool, false) - namespace = optional(string, null) + namespace = optional(string, "GitHub Runners") + metric = optional(object({ + enable_github_app_rate_limit = optional(bool, true) + enable_job_retry = optional(bool, true) + enable_spot_termination_warning = optional(bool, true) + }), {}) }) default = {} } diff --git a/modules/termination-watcher/README.md b/modules/termination-watcher/README.md index fc6326b003..1735b11ccf 100644 --- a/modules/termination-watcher/README.md +++ b/modules/termination-watcher/README.md @@ -88,7 +88,7 @@ yarn run dist | Name | Description | Type | Default | Required | |------|-------------|------|---------|:--------:| -| [config](#input\_config) | Configuration for the spot termination watcher lambda function.

`aws_partition`: Partition for the base arn if not 'aws'
`architecture`: AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions.
`environment_variables`: Environment variables for the lambda.
`enable_metric`: Enable metric for the lambda. If `spot_warning` is set to true, the lambda will emit a metric when it detects a spot termination warning.
`lambda_principals`: Add extra principals to the role created for execution of the lambda, e.g. for local testing.
`lambda_tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`log_level`: Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.
`logging_kms_key_id`: Specifies the kms key id to encrypt the logs with
`logging_retention_in_days`: Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.
`memory_size`: Memory size linit in MB of the lambda.
`metrics_namespace`: Namespace for the metrics emitted by the lambda.
`prefix`: The prefix used for naming resources.
`role_path`: The path that will be added to the role, if not set the environment name will be used.
`role_permissions_boundary`: Permissions boundary that will be added to the created role for the lambda.
`runtime`: AWS Lambda runtime.
`s3_bucket`: S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`security_group_ids`: List of security group IDs associated with the Lambda function.
`subnet_ids`: List of subnets in which the action runners will be launched, the subnets needs to be subnets in the `vpc_id`.
`tag_filters`: Map of tags that will be used to filter the resources to be tracked. Only for which all tags are present and starting with the same value as the value in the map will be tracked.
`tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`timeout`: Time out of the lambda in seconds.
`tracing_config`: Configuration for lambda tracing.
`zip`: File location of the lambda zip file. |
object({
aws_partition = optional(string, null)
architecture = optional(string, null)
enable_metric = optional(object({
spot_warning = optional(bool, false)
}))
environment_variables = optional(map(string), {})
lambda_tags = optional(map(string), {})
log_level = optional(string, null)
logging_kms_key_id = optional(string, null)
logging_retention_in_days = optional(number, null)
memory_size = optional(number, null)
metrics_namespace = optional(string, null)
prefix = optional(string, null)
principals = optional(list(object({
type = string
identifiers = list(string)
})), [])
role_path = optional(string, null)
role_permissions_boundary = optional(string, null)
runtime = optional(string, null)
s3_bucket = optional(string, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
security_group_ids = optional(list(string), [])
subnet_ids = optional(list(string), [])
tag_filters = optional(map(string), null)
tags = optional(map(string), {})
timeout = optional(number, null)
tracing_config = optional(object({
mode = optional(string, null)
capture_http_requests = optional(bool, false)
capture_error = optional(bool, false)
}), {})
zip = optional(string, null)
})
| n/a | yes | +| [config](#input\_config) | Configuration for the spot termination watcher lambda function.

`aws_partition`: Partition for the base arn if not 'aws'
`architecture`: AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86\_64' functions.
`environment_variables`: Environment variables for the lambda.
`lambda_principals`: Add extra principals to the role created for execution of the lambda, e.g. for local testing.
`lambda_tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`log_level`: Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.
`logging_kms_key_id`: Specifies the kms key id to encrypt the logs with
`logging_retention_in_days`: Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.
`memory_size`: Memory size limit in MB of the lambda.
`prefix`: The prefix used for naming resources.
`role_path`: The path that will be added to the role, if not set the environment name will be used.
`role_permissions_boundary`: Permissions boundary that will be added to the created role for the lambda.
`runtime`: AWS Lambda runtime.
`s3_bucket`: S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly.
`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
`security_group_ids`: List of security group IDs associated with the Lambda function.
`subnet_ids`: List of subnets in which the action runners will be launched, the subnets needs to be subnets in the `vpc_id`.
`tag_filters`: Map of tags that will be used to filter the resources to be tracked. Only for which all tags are present and starting with the same value as the value in the map will be tracked.
`tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
`timeout`: Time out of the lambda in seconds.
`tracing_config`: Configuration for lambda tracing.
`zip`: File location of the lambda zip file. |
object({
aws_partition = optional(string, null)
architecture = optional(string, null)
enable_metric = optional(string, null)
environment_variables = optional(map(string), {})
lambda_tags = optional(map(string), {})
log_level = optional(string, null)
logging_kms_key_id = optional(string, null)
logging_retention_in_days = optional(number, null)
memory_size = optional(number, null)
metrics = optional(object({
enable = optional(bool, false)
namespace = optional(string, "GitHub Runners")
metric = optional(object({
enable_spot_termination_warning = optional(bool, true)
}), {})
}), {})
prefix = optional(string, null)
principals = optional(list(object({
type = string
identifiers = list(string)
})), [])
role_path = optional(string, null)
role_permissions_boundary = optional(string, null)
runtime = optional(string, null)
s3_bucket = optional(string, null)
s3_key = optional(string, null)
s3_object_version = optional(string, null)
security_group_ids = optional(list(string), [])
subnet_ids = optional(list(string), [])
tag_filters = optional(map(string), null)
tags = optional(map(string), {})
timeout = optional(number, null)
tracing_config = optional(object({
mode = optional(string, null)
capture_http_requests = optional(bool, false)
capture_error = optional(bool, false)
}), {})
zip = optional(string, null)
})
| n/a | yes |

 ## Outputs
diff --git a/modules/termination-watcher/main.tf b/modules/termination-watcher/main.tf
index 48e03fd6e3..acf41f83be 100644
--- a/modules/termination-watcher/main.tf
+++ b/modules/termination-watcher/main.tf
@@ -3,11 +3,17 @@ locals {
   name = "spot-termination-watcher"

   environment_variables = {
-    ENABLE_METRICS_SPOT_WARNING = var.config.enable_metric != null ? var.config.enable_metric.spot_warning : false
+    ENABLE_METRICS_SPOT_WARNING = var.config.metrics != null ? var.config.metrics.enable && var.config.metrics.metric.enable_spot_termination_warning : false
     TAG_FILTERS                 = jsonencode(var.config.tag_filters)
   }

-  config = merge(var.config, { name = local.name, handler = "index.interruptionWarning", zip = local.lambda_zip, environment_variables = local.environment_variables })
+  config = merge(var.config, {
+    name                  = local.name,
+    handler               = "index.interruptionWarning",
+    zip                   = local.lambda_zip,
+    environment_variables = local.environment_variables
+    metrics_namespace     = var.config.metrics.namespace
+  })
 }

 module "termination_warning_watcher" {
diff --git a/modules/termination-watcher/variables.tf b/modules/termination-watcher/variables.tf
index e343dc3445..968a35908f 100644
--- a/modules/termination-watcher/variables.tf
+++ b/modules/termination-watcher/variables.tf
@@ -5,14 +5,12 @@ variable "config" {
     `aws_partition`: Partition for the base arn if not 'aws'
     `architecture`: AWS Lambda architecture. Lambda functions using Graviton processors ('arm64') tend to have better price/performance than 'x86_64' functions.
     `environment_variables`: Environment variables for the lambda.
-    `enable_metric`: Enable metric for the lambda. If `spot_warning` is set to true, the lambda will emit a metric when it detects a spot termination warning.
     `lambda_principals`: Add extra principals to the role created for execution of the lambda, e.g. for local testing.
     `lambda_tags`: Map of tags that will be added to created resources. By default resources will be tagged with name and environment.
     `log_level`: Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'.
     `logging_kms_key_id`: Specifies the kms key id to encrypt the logs with
     `logging_retention_in_days`: Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653.
     `memory_size`: Memory size linit in MB of the lambda.
-    `metrics_namespace`: Namespace for the metrics emitted by the lambda.
     `prefix`: The prefix used for naming resources.
     `role_path`: The path that will be added to the role, if not set the environment name will be used.
     `role_permissions_boundary`: Permissions boundary that will be added to the created role for the lambda.
@@ -29,19 +27,23 @@ variable "config" {
     `zip`: File location of the lambda zip file.
   EOF
   type = object({
-    aws_partition = optional(string, null)
-    architecture  = optional(string, null)
-    enable_metric = optional(object({
-      spot_warning = optional(bool, false)
-    }))
+    aws_partition             = optional(string, null)
+    architecture              = optional(string, null)
+    enable_metric             = optional(string, null)
     environment_variables     = optional(map(string), {})
     lambda_tags               = optional(map(string), {})
     log_level                 = optional(string, null)
     logging_kms_key_id        = optional(string, null)
     logging_retention_in_days = optional(number, null)
     memory_size               = optional(number, null)
-    metrics_namespace         = optional(string, null)
-    prefix                    = optional(string, null)
+    metrics = optional(object({
+      enable    = optional(bool, false)
+      namespace = optional(string, "GitHub Runners")
+      metric = optional(object({
+        enable_spot_termination_warning = optional(bool, true)
+      }), {})
+    }), {})
+    prefix = optional(string, null)
     principals = optional(list(object({
       type        = string
       identifiers = list(string)
@@ -64,4 +66,9 @@ variable "config" {
     }), {})
     zip = optional(string, null)
   })
+
+  validation {
+    condition     = var.config.enable_metric == null
+    error_message = "enable_metric is deprecated, use metrics.enable instead."
+  }
 }
diff --git a/variables.deprecated.tf b/variables.deprecated.tf
index 9e4af44a81..c26c614510 100644
--- a/variables.deprecated.tf
+++ b/variables.deprecated.tf
@@ -27,4 +27,29 @@ variable "runners_scale_up_Lambda_memory_size" {
   description = "Memory size limit in MB for scale-up lambda."
   type        = number
   default     = null
-}
\ No newline at end of file
+}
+
+# tflint-ignore: terraform_unused_declarations
+variable "enable_metrics_control_plane" {
+  description = "(Experimental) Enable or disable the metrics for the module. Feature can change or renamed without a major release."
+  type        = bool
+  default     = null
+
+  # deprecated
+  validation {
+    condition     = var.enable_metrics_control_plane == null
+    error_message = "The variable `enable_metrics_control_plane` is deprecated, use `metrics.enable` instead."
+  }
+}
+
+# tflint-ignore: terraform_unused_declarations
+variable "metrics_namespace" {
+  description = "The namespace for the metrics created by the module. Merics will only be created if explicit enabled."
+  type        = string
+  default     = null
+
+  validation {
+    condition     = var.metrics_namespace == null
+    error_message = "The variable `metrics_namespace` is deprecated, use `metrics.namespace` instead."
+  }
+}
diff --git a/variables.tf b/variables.tf
index 7491296a3a..68b64fd8a7 100644
--- a/variables.tf
+++ b/variables.tf
@@ -862,10 +862,18 @@ variable "runners_ssm_housekeeper" {
   default = { config = {} }
 }

-variable "metrics_namespace" {
-  description = "The namespace for the metrics created by the module. Merics will only be created if explicit enabled."
-  type        = string
-  default     = "GitHub Runners"
+variable "metrics" {
+  description = "Configuration for metrics created by the module, by default disabled to avoid additional costs. When metrics are enable all metrics are created unless explicit configured otherwise."
+  type = object({
+    enable    = optional(bool, false)
+    namespace = optional(string, "GitHub Runners")
+    metric = optional(object({
+      enable_github_app_rate_limit    = optional(bool, true)
+      enable_job_retry                = optional(bool, true)
+      enable_spot_termination_warning = optional(bool, true)
+    }), {})
+  })
+  default = {}
 }

 variable "instance_termination_watcher" {
@@ -873,7 +881,6 @@ variable "instance_termination_watcher" {
     Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.

     `enable`: Enable or disable the spot termination watcher.
-    `enable_metrics`: Enable or disable the metrics for the spot termination watcher.
     `memory_size`: Memory size linit in MB of the lambda.
     `s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.
     `s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.
@@ -882,10 +889,8 @@ variable "instance_termination_watcher" {
   EOF

   type = object({
-    enable = optional(bool, false)
-    enable_metric = optional(object({
-      spot_warning = optional(bool, false)
-    }))
+    enable            = optional(bool, false)
+    enable_metric     = optional(string, null) # deprecated
     memory_size       = optional(number, null)
     s3_key            = optional(string, null)
     s3_object_version = optional(string, null)
@@ -893,6 +898,11 @@ variable "instance_termination_watcher" {
     zip               = optional(string, null)
   })
   default = {}
+
+  validation {
+    condition     = var.instance_termination_watcher.enable_metric == null
+    error_message = "The variable `instance_termination_watcher.enable_metric` is deprecated, use `metrics` instead."
+  }
 }

 variable "runners_ebs_optimized" {
@@ -907,12 +917,6 @@ variable "lambda_tags" {
   default = {}
 }

-variable "enable_metrics_control_plane" {
-  description = "(Experimental) Enable or disable the metrics for the module. Feature can change or renamed without a major release."
-  type        = bool
-  default     = false
-}
-
 variable "job_retry" {
   description = <<-EOF
     Experimental! Can be removed / changed without trigger a major release.Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the insances a message will be published to a job retry queue. The job retry check lambda is checking after a delay if the job is queued. If not the message will be published again on the scale-up (build queue). Using this feature can impact the reate limit of the GitHub app.