This repository was archived by the owner on Jan 16, 2025. It is now read-only.

Commit d7cdaed

npalm, forest-pr[bot], and stuartp44 authored
feat: Add metric to track GitHub app rate limit (#4088)
## Description

This PR adds an optional metric to keep track of the remaining rate limit for the GitHub app.

## Notes

- Refactored the metric configuration to align its usage across all submodules. All changes only impact experimental features, so they are non-breaking.
- Refactored the naming of the gh-auth package, see separate commit.

---------

Co-authored-by: forest-pr|bot <forest-pr[bot]@users.noreply.github.com>
Co-authored-by: Stuart Pearson <[email protected]>
1 parent 9fc5dbc commit d7cdaed


44 files changed, +441 -182 lines changed

Diff for: README.md

+4 -3
@@ -147,7 +147,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
 | <a name="input_enable_jit_config"></a> [enable\_jit\_config](#input\_enable\_jit\_config) | Overwrite the default behavior for JIT configuration. By default JIT configuration is enabled for ephemeral runners and disabled for non-ephemeral runners. In case of GHES check first if the JIT config API is avaialbe. In case you upgradeing from 3.x to 4.x you can set `enable_jit_config` to `false` to avoid a breaking change when having your own AMI. | `bool` | `null` | no |
 | <a name="input_enable_job_queued_check"></a> [enable\_job\_queued\_check](#input\_enable\_job\_queued\_check) | Only scale if the job event received by the scale up lambda is in the queued state. By default enabled for non ephemeral runners and disabled for ephemeral. Set this variable to overwrite the default behavior. | `bool` | `null` | no |
 | <a name="input_enable_managed_runner_security_group"></a> [enable\_managed\_runner\_security\_group](#input\_enable\_managed\_runner\_security\_group) | Enables creation of the default managed security group. Unmanaged security groups can be specified via `runner_additional_security_group_ids`. | `bool` | `true` | no |
-| <a name="input_enable_metrics_control_plane"></a> [enable\_metrics\_control\_plane](#input\_enable\_metrics\_control\_plane) | (Experimental) Enable or disable the metrics for the module. Feature can change or renamed without a major release. | `bool` | `false` | no |
+| <a name="input_enable_metrics_control_plane"></a> [enable\_metrics\_control\_plane](#input\_enable\_metrics\_control\_plane) | (Experimental) Enable or disable the metrics for the module. Feature can change or renamed without a major release. | `bool` | `null` | no |
 | <a name="input_enable_organization_runners"></a> [enable\_organization\_runners](#input\_enable\_organization\_runners) | Register runners to organization, instead of repo level | `bool` | `false` | no |
 | <a name="input_enable_runner_binaries_syncer"></a> [enable\_runner\_binaries\_syncer](#input\_enable\_runner\_binaries\_syncer) | Option to disable the lambda to sync GitHub runner distribution, useful when using a pre-build AMI. | `bool` | `true` | no |
 | <a name="input_enable_runner_detailed_monitoring"></a> [enable\_runner\_detailed\_monitoring](#input\_enable\_runner\_detailed\_monitoring) | Should detailed monitoring be enabled for the runner. Set this to true if you want to use detailed monitoring. See https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-cloudwatch-new.html for details. | `bool` | `false` | no |
@@ -165,7 +165,7 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
 | <a name="input_instance_max_spot_price"></a> [instance\_max\_spot\_price](#input\_instance\_max\_spot\_price) | Max price price for spot instances per hour. This variable will be passed to the create fleet as max spot price for the fleet. | `string` | `null` | no |
 | <a name="input_instance_profile_path"></a> [instance\_profile\_path](#input\_instance\_profile\_path) | The path that will be added to the instance\_profile, if not set the environment name will be used. | `string` | `null` | no |
 | <a name="input_instance_target_capacity_type"></a> [instance\_target\_capacity\_type](#input\_instance\_target\_capacity\_type) | Default lifecycle used for runner instances, can be either `spot` or `on-demand`. | `string` | `"spot"` | no |
-| <a name="input_instance_termination_watcher"></a> [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.<br><br>`enable`: Enable or disable the spot termination watcher.<br>`enable_metrics`: Enable or disable the metrics for the spot termination watcher.<br>`memory_size`: Memory size linit in MB of the lambda.<br>`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.<br>`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.<br>`timeout`: Time out of the lambda in seconds.<br>`zip`: File location of the lambda zip file. | <pre>object({<br> enable = optional(bool, false)<br> enable_metric = optional(object({<br> spot_warning = optional(bool, false)<br> }))<br> memory_size = optional(number, null)<br> s3_key = optional(string, null)<br> s3_object_version = optional(string, null)<br> timeout = optional(number, null)<br> zip = optional(string, null)<br> })</pre> | `{}` | no |
+| <a name="input_instance_termination_watcher"></a> [instance\_termination\_watcher](#input\_instance\_termination\_watcher) | Configuration for the instance termination watcher. This feature is Beta, changes will not trigger a major release as long in beta.<br><br>`enable`: Enable or disable the spot termination watcher.<br>`memory_size`: Memory size linit in MB of the lambda.<br>`s3_key`: S3 key for syncer lambda function. Required if using S3 bucket to specify lambdas.<br>`s3_object_version`: S3 object version for syncer lambda function. Useful if S3 versioning is enabled on source bucket.<br>`timeout`: Time out of the lambda in seconds.<br>`zip`: File location of the lambda zip file. | <pre>object({<br> enable = optional(bool, false)<br> enable_metric = optional(string, null) # deprectaed<br> memory_size = optional(number, null)<br> s3_key = optional(string, null)<br> s3_object_version = optional(string, null)<br> timeout = optional(number, null)<br> zip = optional(string, null)<br> })</pre> | `{}` | no |
 | <a name="input_instance_types"></a> [instance\_types](#input\_instance\_types) | List of instance types for the action runner. Defaults are based on runner\_os (al2023 for linux and Windows Server Core for win). | `list(string)` | <pre>[<br> "m5.large",<br> "c5.large"<br>]</pre> | no |
 | <a name="input_job_queue_retention_in_seconds"></a> [job\_queue\_retention\_in\_seconds](#input\_job\_queue\_retention\_in\_seconds) | The number of seconds the job is held in the queue before it is purged. | `number` | `86400` | no |
 | <a name="input_job_retry"></a> [job\_retry](#input\_job\_retry) | Experimental! Can be removed / changed without trigger a major release.Configure job retries. The configuration enables job retries (for ephemeral runners). After creating the insances a message will be published to a job retry queue. The job retry check lambda is checking after a delay if the job is queued. If not the message will be published again on the scale-up (build queue). Using this feature can impact the reate limit of the GitHub app.<br><br>`enable`: Enable or disable the job retry feature.<br>`delay_in_seconds`: The delay in seconds before the job retry check lambda will check the job status.<br>`delay_backoff`: The backoff factor for the delay.<br>`lambda_memory_size`: Memory size limit in MB for the job retry check lambda.<br>`lambda_timeout`: Time out of the job retry check lambda in seconds.<br>`max_attempts`: The maximum number of attempts to retry the job. | <pre>object({<br> enable = optional(bool, false)<br> delay_in_seconds = optional(number, 300)<br> delay_backoff = optional(number, 2)<br> lambda_memory_size = optional(number, 256)<br> lambda_timeout = optional(number, 30)<br> max_attempts = optional(number, 1)<br> })</pre> | `{}` | no |
@@ -183,7 +183,8 @@ Talk to the forestkeepers in the `runners-channel` on Slack.
 | <a name="input_logging_kms_key_id"></a> [logging\_kms\_key\_id](#input\_logging\_kms\_key\_id) | Specifies the kms key id to encrypt the logs with. | `string` | `null` | no |
 | <a name="input_logging_retention_in_days"></a> [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no |
 | <a name="input_matcher_config_parameter_store_tier"></a> [matcher\_config\_parameter\_store\_tier](#input\_matcher\_config\_parameter\_store\_tier) | The tier of the parameter store for the matcher configuration. Valid values are `Standard`, and `Advanced`. | `string` | `"Standard"` | no |
-| <a name="input_metrics_namespace"></a> [metrics\_namespace](#input\_metrics\_namespace) | The namespace for the metrics created by the module. Merics will only be created if explicit enabled. | `string` | `"GitHub Runners"` | no |
+| <a name="input_metrics"></a> [metrics](#input\_metrics) | Configuration for metrics created by the module, by default disabled to avoid additional costs. When metrics are enable all metrics are created unless explicit configured otherwise. | <pre>object({<br> enable = optional(bool, false)<br> namespace = optional(string, "GitHub Runners")<br> metric = optional(object({<br> enable_github_app_rate_limit = optional(bool, true)<br> enable_job_retry = optional(bool, true)<br> enable_spot_termination_warning = optional(bool, true)<br> }), {})<br> })</pre> | `{}` | no |
+| <a name="input_metrics_namespace"></a> [metrics\_namespace](#input\_metrics\_namespace) | The namespace for the metrics created by the module. Merics will only be created if explicit enabled. | `string` | `null` | no |
 | <a name="input_minimum_running_time_in_minutes"></a> [minimum\_running\_time\_in\_minutes](#input\_minimum\_running\_time\_in\_minutes) | The time an ec2 action runner should be running at minimum before terminated, if not busy. | `number` | `null` | no |
 | <a name="input_pool_config"></a> [pool\_config](#input\_pool\_config) | The configuration for updating the pool. The `pool_size` to adjust to by the events triggered by the `schedule_expression`. For example you can configure a cron expression for weekdays to adjust the pool to 10 and another expression for the weekend to adjust the pool to 1. Use `schedule_expression_timezone` to override the schedule time zone (defaults to UTC). | <pre>list(object({<br> schedule_expression = string<br> schedule_expression_timezone = optional(string)<br> size = number<br> }))</pre> | `[]` | no |
 | <a name="input_pool_lambda_memory_size"></a> [pool\_lambda\_memory\_size](#input\_pool\_lambda\_memory\_size) | Memory size limit for scale-up lambda. | `number` | `512` | no |

Diff for: docs/configuration.md

+9
@@ -191,6 +191,15 @@ This feature has been disabled by default.

 The watcher will act on all spot termination notificatins and log all onses relevant to the runner module. Therefor we suggest to only deploy the watcher once. You can either deploy the watcher by enabling in one of your deployments or deploy the watcher as a stand alone module.

+## Metrics
+
+The module supports metrics (experimental feature) to monitor the system. The metrics are disabled by default. To enable the metrics set `metrics.enable = true`. If set to true, all module managed metrics are used, you can configure the one by one via the `metrics` object. The metrics are created in the namespace `GitHub Runners`.
+
+### Supported metrics
+
+- **GitHubAppRateLimitRemaining**: Remaining rate limit for the GitHub App.
+- **JobRetry**: Number of job retries, only relevant when job retry is enabled.
+- **SpotInterruptionWarning**: Number of spot interruption warnings received by the termination watcher, only relevant when the termination watcher is enabled.

 ## Debugging
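The documentation added in this hunk describes a master switch (`metrics.enable`, default `false`) combined with per-metric flags that default to `true`. The following TypeScript sketch models that resolution as a reading aid. It assumes a metric is emitted only when both the master switch and its own flag are on; `MetricsConfig`, `ResolvedMetricFlags`, and `resolveMetricFlags` are hypothetical names that mirror the shape of the Terraform `metrics` object and are not code from this commit.

```typescript
// Hypothetical model of the Terraform `metrics` input (names are illustrative only).
interface MetricsConfig {
  enable?: boolean;
  namespace?: string;
  metric?: {
    enable_github_app_rate_limit?: boolean;
    enable_job_retry?: boolean;
    enable_spot_termination_warning?: boolean;
  };
}

interface ResolvedMetricFlags {
  namespace: string;
  githubAppRateLimit: boolean;
  jobRetry: boolean;
  spotTerminationWarning: boolean;
}

// Assumption: a metric is active only if the master switch AND its own flag are true.
// Defaults follow the README table: enable = false, every per-metric flag = true.
function resolveMetricFlags(config: MetricsConfig = {}): ResolvedMetricFlags {
  const enabled = config.enable ?? false;
  const metric = config.metric ?? {};
  return {
    namespace: config.namespace ?? 'GitHub Runners',
    githubAppRateLimit: enabled && (metric.enable_github_app_rate_limit ?? true),
    jobRetry: enabled && (metric.enable_job_retry ?? true),
    spotTerminationWarning: enabled && (metric.enable_spot_termination_warning ?? true),
  };
}
```

Under these assumptions, `resolveMetricFlags({ enable: true, metric: { enable_job_retry: false } })` turns on the rate limit and spot warning metrics and leaves the job retry metric off, which matches the commented example in `examples/default/main.tf`.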

Diff for: examples/default/main.tf

+11 -7
@@ -114,21 +114,25 @@ module "runners" {

   instance_termination_watcher = {
     enable = true
-    enable_metric = {
-      spot_warning = true
-    }
   }

-  # enable job_retry feature. Be careful with this feature, it can lead to API rate limits.
+  # enable metric creation (experimental)
+  # metrics = {
+  #   enable = true
+  #   metric = {
+  #     enable_spot_termination_warning = true
+  #     enable_job_retry = false
+  #     enable_github_app_rate_limit = true
+  #   }
+  # }
+
+  # enable job_retry feature. Be careful with this feature, it can lead to you hitting API rate limits.
   # job_retry = {
   #   enable = true
   #   max_attempts = 1
   #   delay_in_seconds = 180
   # }

-  # enable metric creation by the control plane (experimental)
-  # enable_metrics_control_plane = true
-
   # enable CMK instead of aws managed key for encryptions
   # kms_key_arn = aws_kms_key.github.arn
 }

Diff for: examples/multi-runner/main.tf

+9 -2
@@ -103,8 +103,15 @@ module "runners" {
   # Enable to track the spot instance termination warning
   # instance_termination_watcher = {
   #   enable = true
-  #   enable_metric = {
-  #     spot_warning = true
+  # }
+
+  # Enable metrics
+  # metrics = {
+  #   enable = true
+  #   metric = {
+  #     enable_github_app_rate_limit = true
+  #     enable_job_retry = false
+  #     enable_spot_termination_warning = true
   #   }
   # }
 }

Diff for: examples/termination-watcher/main.tf

+6 -3
@@ -2,12 +2,15 @@ module "spot_termination_watchter" {
   source = "../../modules/termination-watcher"

   config = {
-    enable_metric = {
-      spot_warning = true
+    metrics = {
+      enable = true
+      metric = {
+        enable_spot_termination_warning = true
+      }
     }
     prefix = "global"
     tag_filters = {
       "ghr:Application" = "github-action-runner"
     }
   }
-}
+}

Diff for: lambdas/functions/control-plane/jest.config.ts

+4 -4
@@ -6,10 +6,10 @@ const config: Config = {
   ...defaultConfig,
   coverageThreshold: {
     global: {
-      statements: 97.78,
-      branches: 96.61,
-      functions: 95.84,
-      lines: 97.71,
+      statements: 97.86,
+      branches: 96.68,
+      functions: 95.95,
+      lines: 97.8,
     },
   },
 };

Diff for: lambdas/functions/control-plane/package.json

+1
@@ -38,6 +38,7 @@
     "ts-node-dev": "^2.0.0"
   },
   "dependencies": {
+    "@aws-lambda-powertools/parameters": "^2.7.0",
     "@aws-sdk/client-ec2": "^3.637.0",
     "@aws-sdk/client-sqs": "^3.637.0",
     "@aws-sdk/types": "^3.609.0",

Diff for: lambdas/functions/control-plane/src/gh-auth/gh-auth.test.ts renamed to lambdas/functions/control-plane/src/github/auth.test.ts

+1 -1
@@ -7,7 +7,7 @@ import { mocked } from 'jest-mock';
 import { MockProxy, mock } from 'jest-mock-extended';
 import nock from 'nock';

-import { createGithubAppAuth, createOctokitClient } from './gh-auth';
+import { createGithubAppAuth, createOctokitClient } from './auth';

 jest.mock('@terraform-aws-github-runner/aws-ssm-util');
 jest.mock('@octokit/auth-app');

Diff for: lambdas/functions/control-plane/src/gh-auth/gh-octokit.test.ts renamed to lambdas/functions/control-plane/src/github/octokit.test.ts

+3 -3
@@ -1,6 +1,6 @@
 import { Octokit } from '@octokit/rest';
 import { ActionRequestMessage } from '../scale-runners/scale-up';
-import { getOctokit } from './gh-octokit';
+import { getOctokit } from './octokit';

 const mockOctokit = {
   apps: {
@@ -9,7 +9,7 @@ const mockOctokit = {
   },
 };

-jest.mock('../gh-auth/gh-auth', () => ({
+jest.mock('../github/auth', () => ({
   createGithubInstallationAuth: jest.fn().mockImplementation(async (installationId) => {
     return { token: 'token', type: 'installation', installationId: installationId };
   }),
@@ -21,7 +21,7 @@ jest.mock('@octokit/rest', () => ({
   Octokit: jest.fn().mockImplementation(() => mockOctokit),
 }));

-jest.mock('../gh-auth/gh-auth');
+jest.mock('../github/auth');

 describe('Test getOctokit', () => {
   const data = [

Diff for: lambdas/functions/control-plane/src/gh-auth/gh-octokit.ts renamed to lambdas/functions/control-plane/src/github/octokit.ts

+1 -1
@@ -1,6 +1,6 @@
 import { Octokit } from '@octokit/rest';
 import { ActionRequestMessage } from '../scale-runners/scale-up';
-import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from './gh-auth';
+import { createGithubAppAuth, createGithubInstallationAuth, createOctokitClient } from './auth';

 export async function getInstallationId(
   ghesApiUrl: string,

@@ -0,0 +1,70 @@
+import { ResponseHeaders } from '@octokit/types';
+import { createSingleMetric } from '@terraform-aws-github-runner/aws-powertools-util';
+import { MetricUnit } from '@aws-lambda-powertools/metrics';
+import { metricGitHubAppRateLimit } from './rate-limit';
+
+process.env.PARAMETER_GITHUB_APP_ID_NAME = 'test';
+jest.mock('@terraform-aws-github-runner/aws-ssm-util', () => ({
+  ...jest.requireActual('@terraform-aws-github-runner/aws-ssm-util'),
+  // get parameter name from process.env.PARAMETER_GITHUB_APP_ID_NAME rerunt 1234
+  getParameter: jest.fn((name: string) => {
+    if (name === process.env.PARAMETER_GITHUB_APP_ID_NAME) {
+      return '1234';
+    } else {
+      return '';
+    }
+  }),
+}));
+
+jest.mock('@terraform-aws-github-runner/aws-powertools-util', () => ({
+  ...jest.requireActual('@terraform-aws-github-runner/aws-powertools-util'),
+  // eslint-disable-next-line @typescript-eslint/no-unused-vars
+  createSingleMetric: jest.fn((name: string, unit: string, value: number, dimensions?: Record<string, string>) => {
+    return {
+      addMetadata: jest.fn(),
+    };
+  }),
+}));
+
+describe('metricGitHubAppRateLimit', () => {
+  beforeEach(() => {
+    jest.clearAllMocks();
+  });
+
+  it('should update rate limit metric', async () => {
+    // set process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT to true
+    process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = 'true';
+    const headers: ResponseHeaders = {
+      'x-ratelimit-remaining': '10',
+      'x-ratelimit-limit': '60',
+    };
+
+    await metricGitHubAppRateLimit(headers);
+
+    expect(createSingleMetric).toHaveBeenCalledWith('GitHubAppRateLimitRemaining', MetricUnit.Count, 10, {
+      AppId: '1234',
+    });
+  });
+
+  it('should not update rate limit metric', async () => {
+    // set process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT to false
+    process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = 'false';
+    const headers: ResponseHeaders = {
+      'x-ratelimit-remaining': '10',
+      'x-ratelimit-limit': '60',
+    };
+
+    await metricGitHubAppRateLimit(headers);
+
+    expect(createSingleMetric).not.toHaveBeenCalled();
+  });
+
+  it('should not update rate limit metric if headers are undefined', async () => {
+    // set process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT to true
+    process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT = 'true';
+
+    await metricGitHubAppRateLimit(undefined as unknown as ResponseHeaders);
+
+    expect(createSingleMetric).not.toHaveBeenCalled();
+  });
+});
+25
@@ -0,0 +1,25 @@
+import { ResponseHeaders } from '@octokit/types';
+import { createSingleMetric, logger } from '@terraform-aws-github-runner/aws-powertools-util';
+import { MetricUnit } from '@aws-lambda-powertools/metrics';
+import yn from 'yn';
+import { getParameter } from '@terraform-aws-github-runner/aws-ssm-util';
+
+export async function metricGitHubAppRateLimit(headers: ResponseHeaders): Promise<void> {
+  try {
+    const remaining = parseInt(headers['x-ratelimit-remaining'] as string);
+    const limit = parseInt(headers['x-ratelimit-limit'] as string);
+
+    logger.debug(`Rate limit remaining: ${remaining}, limit: ${limit}`);
+
+    const updateMetric = yn(process.env.ENABLE_METRIC_GITHUB_APP_RATE_LIMIT);
+    if (updateMetric) {
+      const appId = await getParameter(process.env.PARAMETER_GITHUB_APP_ID_NAME);
+      const metric = createSingleMetric('GitHubAppRateLimitRemaining', MetricUnit.Count, remaining, {
+        AppId: appId,
+      });
+      metric.addMetadata('AppId', appId);
+    }
+  } catch (e) {
+    logger.debug(`Error updating rate limit metric`, { error: e });
+  }
+}
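The function above reads `x-ratelimit-remaining` and `x-ratelimit-limit` with `parseInt`, which returns `NaN` when a header is missing from an otherwise valid headers object. The sketch below shows one possible way to guard against that before a value is published. It is an illustration only: `parseRateLimitHeaders`, `RateLimitSnapshot`, and `usedFraction` are hypothetical names, and nothing here is part of this commit.

```typescript
// Hypothetical helper; only the two header names are taken from the diff above.
type HeaderValue = string | number | undefined;

interface RateLimitSnapshot {
  remaining: number;
  limit: number;
  usedFraction: number; // share of the limit already consumed, between 0 and 1
}

function parseRateLimitHeaders(
  headers: Record<string, HeaderValue> | undefined,
): RateLimitSnapshot | undefined {
  if (!headers) return undefined;

  // String(undefined) is "undefined", which parseInt turns into NaN.
  const remaining = parseInt(String(headers['x-ratelimit-remaining']), 10);
  const limit = parseInt(String(headers['x-ratelimit-limit']), 10);

  // Reject missing or malformed values instead of passing NaN on to a metric.
  if (isNaN(remaining) || isNaN(limit) || limit <= 0) return undefined;

  return { remaining, limit, usedFraction: (limit - remaining) / limit };
}
```

A caller could skip the metric update whenever this helper returns `undefined`, instead of relying on the surrounding `try`/`catch`.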

Diff for: lambdas/functions/control-plane/src/modules.d.ts

+1
@@ -1,6 +1,7 @@
 declare namespace NodeJS {
   export interface ProcessEnv {
     AWS_REGION: string;
+    ENABLE_METRIC_GITHUB_APP_RATE_LIMIT: string;
     ENABLE_ON_DEMAND_FAILOVER_FOR_ERRORS: string;
     ENVIRONMENT: string;
     GHES_URL: string;
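`ENABLE_METRIC_GITHUB_APP_RATE_LIMIT` is declared as a string, so the lambda has to turn it into a boolean at runtime; the commit uses the `yn` package for that. The sketch below is a simplified, dependency-free stand-in that illustrates the idea. `parseEnvFlag` is a hypothetical name, the accepted spellings are an assumption, and it is not a reimplementation of `yn`.

```typescript
// Hypothetical stand-in for the `yn(...)` call; accepts a small, assumed set of spellings.
function parseEnvFlag(value: string | undefined, fallback: boolean = false): boolean {
  if (value === undefined) return fallback;

  const normalized = value.trim().toLowerCase();
  if (['true', 'yes', 'y', '1', 'on'].indexOf(normalized) !== -1) return true;
  if (['false', 'no', 'n', '0', 'off'].indexOf(normalized) !== -1) return false;

  // Unknown values fall back to the default instead of throwing.
  return fallback;
}
```

With this sketch, an unset variable behaves like `'false'`, so the metric would stay off unless it is explicitly enabled.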

Diff for: lambdas/functions/control-plane/src/pool/pool.test.ts

+2 -2
@@ -4,7 +4,7 @@ import moment from 'moment-timezone';
 import nock from 'nock';

 import { listEC2Runners } from '../aws/runners';
-import * as ghAuth from '../gh-auth/gh-auth';
+import * as ghAuth from '../github/auth';
 import { createRunners } from '../scale-runners/scale-up';
 import { adjust } from './pool';

@@ -27,7 +27,7 @@ jest.mock('./../aws/runners', () => ({
   ...jest.requireActual('./../aws/runners'),
   listEC2Runners: jest.fn(),
 }));
-jest.mock('./../gh-auth/gh-auth');
+jest.mock('./../github/auth');
 jest.mock('./../scale-runners/scale-up');

 const mocktokit = Octokit as jest.MockedClass<typeof Octokit>;
