
feat: added changes to enable tracing in lambdas. #3554

Merged: 39 commits, merged on Nov 8, 2023. The diff below shows the changes from 4 of these commits.

Commits:
- `6eabc8a` feat: added changes to enable tracing in lambdas. (Oct 20, 2023)
- `9060e15` docs: auto update terraform docs (github-actions[bot], Oct 20, 2023)
- `864e5ba` Merge branch 'main' into nav/enable-tracing (Oct 20, 2023)
- `dad0312` fix: missed this file. (Oct 20, 2023)
- `91abc18` Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor… (Oct 20, 2023)
- `85ae608` fix: multi runners. (Oct 21, 2023)
- `77a0aa6` docs: auto update terraform docs (github-actions[bot], Oct 21, 2023)
- `afdd086` fix: multi runner. (Oct 21, 2023)
- `8138f8c` Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor… (Oct 21, 2023)
- `dcd9611` fix: default. (Oct 21, 2023)
- `751443d` docs: auto update terraform docs (github-actions[bot], Oct 21, 2023)
- `a08798d` fix: more changes. (Oct 25, 2023)
- `926e345` docs: auto update terraform docs (github-actions[bot], Oct 25, 2023)
- `856f7ef` fix: added tracing for github apis. (Oct 28, 2023)
- `e4f62f2` Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor… (Oct 28, 2023)
- `9a306b1` docs: auto update terraform docs (github-actions[bot], Oct 28, 2023)
- `0c8a9b1` fix: more changes. (Oct 30, 2023)
- `696cc6d` fix: start script. (Oct 30, 2023)
- `a5caa41` fix: added tracing config section. (Oct 31, 2023)
- `1fb8f9e` Merge branch 'main' into nav/enable-tracing (Oct 31, 2023)
- `ab0a1c8` docs: auto update terraform docs (github-actions[bot], Oct 31, 2023)
- `0aab19e` fix: comments. (Oct 31, 2023)
- `115eb42` Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor… (Oct 31, 2023)
- `190ff01` docs: auto update terraform docs (github-actions[bot], Oct 31, 2023)
- `b23df30` fix: ami housekeeper. (Oct 31, 2023)
- `3fa0e8e` docs: auto update terraform docs (github-actions[bot], Oct 31, 2023)
- `b1e59e9` fix: ssm housekeeper. (Oct 31, 2023)
- `93f6adb` Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor… (Oct 31, 2023)
- `c88e508` fix: tests. (Oct 31, 2023)
- `fe3342d` comments. (Oct 31, 2023)
- `930ed0d` fix: added comment. (Nov 2, 2023)
- `f165fd0` Merge branch 'main' into nav/enable-tracing (npalm, Nov 3, 2023)
- `1677f9a` fix: comments. (Nov 6, 2023)
- `916b2c8` Merge branch 'nav/enable-tracing' of github.com:philips-labs/terrafor… (Nov 6, 2023)
- `a6d6df9` fix: comments. (Nov 6, 2023)
- `d7a5319` docs: auto update terraform docs (github-actions[bot], Nov 6, 2023)
- `ed7f05b` fix: comments. (Nov 7, 2023)
- `e9ffa1a` Merge branch 'main' into nav/enable-tracing (npalm, Nov 8, 2023)
- `1ed4561` Merge branch 'main' into nav/enable-tracing (Nov 8, 2023)

12 changes: 12 additions & 0 deletions README.md
@@ -32,6 +32,7 @@ This [Terraform](https://www.terraform.io/) module creates the required infrastr
- [Examples](#examples)
- [Sub modules](#sub-modules)
- [Logging](#logging)
- [Tracing](#tracing)
- [Debugging](#debugging)
- [Security Considerations](#security-considerations)
- [Requirements](#requirements)
@@ -427,6 +428,16 @@ An example log message of the scale-up function:
}
}
```
## Tracing
Because this application has a distributed architecture, troubleshooting it can be difficult.
We support enabling tracing for all Lambda functions created by this application. To enable it, provide the `tracing_config` option to the root module or to the inner modules.

With tracing enabled, timelines are generated for the following events:
- The basic lifecycle of each Lambda function
- Traces for GitHub API calls (controlled by `capture_http_requests`)
- Traces for all AWS SDK calls
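A minimal sketch of how such a config could be turned into tracer settings, assuming a `tracing_config` object with `mode`, `capture_http_requests` and `capture_error` fields that is mapped onto the environment variables of Powertools for AWS Lambda (TypeScript). Only `capture_http_requests` is named above; the other fields and the helper `tracingEnvironment` are illustrative assumptions, not the module's actual implementation. `POWERTOOLS_TRACE_ENABLED` is the variable the control-plane functions read.

```typescript
// Assumed shape of the `tracing_config` input; only `capture_http_requests`
// is named in the README, the other fields are illustrative.
interface TracingConfig {
  mode?: 'Active' | 'PassThrough' | null; // AWS Lambda tracing modes
  capture_http_requests?: boolean;
  capture_error?: boolean;
}

// Map the config onto Powertools tracer environment variables.
// Tracing is treated as enabled only when a Lambda tracing mode is set.
function tracingEnvironment(config: TracingConfig): Record<string, string> {
  const enabled = config.mode != null;
  return {
    POWERTOOLS_TRACE_ENABLED: String(enabled),
    // Capture outgoing HTTPS calls (such as GitHub API requests) only on request.
    POWERTOOLS_TRACER_CAPTURE_HTTPS_REQUESTS: String(enabled && config.capture_http_requests === true),
    POWERTOOLS_TRACER_CAPTURE_ERROR: String(enabled && config.capture_error === true),
  };
}

// Example: active tracing with GitHub API calls captured.
const lambdaEnv = tracingEnvironment({ mode: 'Active', capture_http_requests: true });
```

Gating the capture flags on `enabled` means a stray `capture_http_requests = true` has no effect while tracing itself is off.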



## Debugging

@@ -543,6 +554,7 @@ We welcome any improvement to the standard module to make the default as secure
| <a name="input_lambda_s3_bucket"></a> [lambda\_s3\_bucket](#input\_lambda\_s3\_bucket) | S3 bucket from which to specify lambda functions. This is an alternative to providing local files directly. | `string` | `null` | no |
| <a name="input_lambda_security_group_ids"></a> [lambda\_security\_group\_ids](#input\_lambda\_security\_group\_ids) | List of security group IDs associated with the Lambda function. | `list(string)` | `[]` | no |
| <a name="input_lambda_subnet_ids"></a> [lambda\_subnet\_ids](#input\_lambda\_subnet\_ids) | List of subnets in which the action runners will be launched, the subnets need to be subnets in the `vpc_id`. | `list(string)` | `[]` | no |
| <a name="input_lambda_tracing_mode"></a> [lambda\_tracing\_mode](#input\_lambda\_tracing\_mode) | DEPRECATED: Replaced by `tracing_config`. | `string` | `null` | no |
| <a name="input_log_level"></a> [log\_level](#input\_log\_level) | Logging level for lambda logging. Valid values are 'silly', 'trace', 'debug', 'info', 'warn', 'error', 'fatal'. | `string` | `"info"` | no |
| <a name="input_logging_kms_key_id"></a> [logging\_kms\_key\_id](#input\_logging\_kms\_key\_id) | Specifies the kms key id to encrypt the logs with. | `string` | `null` | no |
| <a name="input_logging_retention_in_days"></a> [logging\_retention\_in\_days](#input\_logging\_retention\_in\_days) | Specifies the number of days you want to retain log events for the lambda log group. Possible values are: 0, 1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731, 1827, and 3653. | `number` | `180` | no |
2 changes: 1 addition & 1 deletion lambdas/functions/control-plane/src/aws/runners.d.ts
@@ -39,5 +39,5 @@ export interface RunnerInputParameters {
};
numberOfRunners?: number;
amiIdSsmParameterName?: string;
runnerTracingEnabled?: boolean;
tracingEnabled?: boolean;
}
12 changes: 6 additions & 6 deletions lambdas/functions/control-plane/src/aws/runners.test.ts
@@ -240,10 +240,10 @@ describe('create runner', () => {
it('calls create fleet of 1 instance with runner tracing enabled', async () => {
tracer.getRootXrayTraceId = jest.fn().mockReturnValue('123');

await createRunner(createRunnerConfig({ ...defaultRunnerConfig, runnerTracingEnabled: true }));
await createRunner(createRunnerConfig({ ...defaultRunnerConfig, tracingEnabled: true }));

expect(mockEC2Client).toHaveReceivedCommandWith(CreateFleetCommand, {
...expectedCreateFleetRequest({ ...defaultExpectedFleetRequestValues, runnerTracingEnabled: true }),
...expectedCreateFleetRequest({ ...defaultExpectedFleetRequestValues, tracingEnabled: true }),
});
});
});
@@ -360,7 +360,7 @@ interface RunnerConfig {
allocationStrategy: SpotAllocationStrategy;
maxSpotPrice?: string;
amiIdSsmParameterName?: string;
runnerTracingEnabled?: boolean;
tracingEnabled?: boolean;
}

function createRunnerConfig(runnerConfig: RunnerConfig): RunnerInputParameters {
@@ -377,7 +377,7 @@ function createRunnerConfig(runnerConfig: RunnerConfig): RunnerInputParameters {
},
subnets: ['subnet-123', 'subnet-456'],
amiIdSsmParameterName: runnerConfig.amiIdSsmParameterName,
runnerTracingEnabled: runnerConfig.runnerTracingEnabled,
tracingEnabled: runnerConfig.tracingEnabled,
};
}

@@ -388,7 +388,7 @@ interface ExpectedFleetRequestValues {
maxSpotPrice?: string;
totalTargetCapacity: number;
imageId?: string;
runnerTracingEnabled?: boolean;
tracingEnabled?: boolean;
}

function expectedCreateFleetRequest(expectedValues: ExpectedFleetRequestValues): CreateFleetCommandInput {
@@ -398,7 +398,7 @@ function expectedCreateFleetRequest(expectedValues: ExpectedFleetRequestValues):
{ Key: 'ghr:Type', Value: expectedValues.type },
{ Key: 'ghr:Owner', Value: REPO_NAME },
];
if (expectedValues.runnerTracingEnabled) {
if (expectedValues.tracingEnabled) {
const traceId = tracer.getRootXrayTraceId();
tags.push({ Key: 'ghr:trace_id', Value: traceId! });
}
2 changes: 1 addition & 1 deletion lambdas/functions/control-plane/src/aws/runners.ts
@@ -154,7 +154,7 @@ export async function createRunner(runnerParameters: Runners.RunnerInputParamete
{ Key: 'ghr:Owner', Value: runnerParameters.runnerOwner },
];

if (runnerParameters.runnerTracingEnabled) {
if (runnerParameters.tracingEnabled) {
const traceId = tracer.getRootXrayTraceId();
tags.push({ Key: 'ghr:trace_id', Value: traceId! });
}
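The tagging logic above can be sketched in isolation. This version takes the trace-id lookup as a parameter, standing in for `tracer.getRootXrayTraceId()`, so it runs without Powertools; `RunnerTagParams` and `buildRunnerTags` are hypothetical names. One deliberate difference: the diff uses `traceId!`, a non-null assertion that only silences the compiler, whereas this sketch skips the `ghr:trace_id` tag when no root trace id is available.

```typescript
// EC2 tag entry, as used in the CreateFleet request.
interface Tag {
  Key: string;
  Value: string;
}

// Hypothetical subset of the runner parameters needed for tagging.
interface RunnerTagParams {
  runnerType: string;
  runnerOwner: string;
  tracingEnabled?: boolean;
}

// Build the runner tags; `getRootTraceId` stands in for tracer.getRootXrayTraceId().
function buildRunnerTags(params: RunnerTagParams, getRootTraceId: () => string | undefined): Tag[] {
  const tags: Tag[] = [
    { Key: 'ghr:Type', Value: params.runnerType },
    { Key: 'ghr:Owner', Value: params.runnerOwner },
  ];
  if (params.tracingEnabled) {
    const traceId = getRootTraceId();
    // Only propagate the trace when there is an active root trace id.
    if (traceId) {
      tags.push({ Key: 'ghr:trace_id', Value: traceId });
    }
  }
  return tags;
}
```

Passing the lookup in as a function also lets a test supply a fixed id directly, instead of reassigning `tracer.getRootXrayTraceId` with a Jest mock as the test in this diff does.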
1 change: 1 addition & 0 deletions lambdas/functions/control-plane/src/lambda.ts
@@ -59,6 +59,7 @@ export const addMiddleware = () => {
middy(scaleUpHandler).use(handler);
middy(scaleDownHandler).use(handler);
middy(adjustPool).use(handler);
middy(ssmHousekeeper).use(handler);
};
addMiddleware();

4 changes: 2 additions & 2 deletions lambdas/functions/control-plane/src/pool/pool.ts
@@ -36,7 +36,7 @@ export async function adjust(event: PoolEvent): Promise<void> {
const instanceAllocationStrategy = process.env.INSTANCE_ALLOCATION_STRATEGY || 'lowest-price'; // same as AWS default
const runnerOwner = process.env.RUNNER_OWNER;
const amiIdSsmParameterName = process.env.AMI_ID_SSM_PARAMETER_NAME;
const runnerTracingEnabled = yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false });
const tracingEnabled = yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false });

let ghesApiUrl = '';
if (ghesBaseUrl) {
@@ -119,7 +119,7 @@ export async function adjust(event: PoolEvent): Promise<void> {
subnets,
numberOfRunners: topUp,
amiIdSsmParameterName,
runnerTracingEnabled,
tracingEnabled,
},
githubInstallationClient,
);
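Both `adjust` in `pool.ts` and `scaleUp` read the tracing switch with `yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false })`, so an unset or unrecognised value leaves tracing off and no trace tag is added. The helper below is a hypothetical, dependency-free approximation of that behaviour, not `yn` itself: it may accept a smaller set of spellings than `yn` does, and it reads from an explicit environment record so it can run outside Lambda.

```typescript
// Hypothetical approximation of `yn(value, { default })` for env flags:
// common truthy/falsy spellings are recognised, anything else yields the default.
function parseBooleanEnv(value: string | undefined, defaultValue = false): boolean {
  if (value === undefined) {
    return defaultValue;
  }
  const normalized = value.trim().toLowerCase();
  if (/^(true|yes|y|1)$/.test(normalized)) {
    return true;
  }
  if (/^(false|no|n|0)$/.test(normalized)) {
    return false;
  }
  return defaultValue;
}

// Example with an explicit environment record instead of process.env.
const exampleEnv: Record<string, string | undefined> = { POWERTOOLS_TRACE_ENABLED: 'true' };
const tracingEnabled = parseBooleanEnv(exampleEnv.POWERTOOLS_TRACE_ENABLED);
```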
Original file line number Diff line number Diff line change
@@ -77,7 +77,7 @@ const EXPECTED_RUNNER_PARAMS: RunnerInputParameters = {
instanceAllocationStrategy: 'lowest-price',
},
subnets: ['subnet-123'],
runnerTracingEnabled: false,
tracingEnabled: false,
};
let expectedRunnerParams: RunnerInputParameters;

Original file line number Diff line number Diff line change
@@ -50,7 +50,7 @@ interface CreateEC2RunnerConfig {
ec2instanceCriteria: RunnerInputParameters['ec2instanceCriteria'];
numberOfRunners?: number;
amiIdSsmParameterName?: string;
runnerTracingEnabled?: boolean;
tracingEnabled?: boolean;
}

function generateRunnerServiceConfig(githubRunnerConfig: CreateGitHubRunnerConfig, token: string) {
@@ -236,7 +236,7 @@ export async function scaleUp(eventSource: string, payload: ActionRequestMessage
const amiIdSsmParameterName = process.env.AMI_ID_SSM_PARAMETER_NAME;
const runnerNamePrefix = process.env.RUNNER_NAME_PREFIX || '';
const ssmConfigPath = process.env.SSM_CONFIG_PATH || '';
const runnerTracingEnabled = yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false });
const tracingEnabled = yn(process.env.POWERTOOLS_TRACE_ENABLED, { default: false });

if (ephemeralEnabled && payload.eventType !== 'workflow_job') {
logger.warn(`${payload.eventType} event is not supported in combination with ephemeral runners.`);
@@ -306,7 +306,7 @@ export async function scaleUp(eventSource: string, payload: ActionRequestMessage
launchTemplateName,
subnets,
amiIdSsmParameterName,
runnerTracingEnabled,
tracingEnabled,
},
githubInstallationClient,
);
9 changes: 1 addition & 8 deletions modules/runners/templates/start-runner.sh
@@ -65,13 +65,6 @@ cleanup() {

if [ "$exit_code" -ne 0 ]; then
echo "ERROR: runner-start-failed with exit code $exit_code occurred on $error_location"
# Create a CloudWatch metric for error
aws cloudwatch put-metric-data \
--metric-name "RunnerInstanceUnhealthy" \
--namespace "Github Runners metrics" \
--value 1 \
--region "$region" \
--dimensions "InstanceId=$instance_id"
create_xray_error_segment "$SEGMENT" "runner-start-failed with exit code $exit_code occurred on $error_location - $error_lineno"
else
create_xray_success_segment "$SEGMENT"
@@ -160,7 +153,7 @@ if [[ "$xray_trace_id" != "" ]]; then
# run xray service
curl https://s3.us-east-2.amazonaws.com/aws-xray-assets.us-east-2/xray-daemon/aws-xray-daemon-linux-3.x.zip -o aws-xray-daemon-linux-3.x.zip
unzip aws-xray-daemon-linux-3.x.zip -d aws-xray-daemon-linux-3.x
sudo chmod +x ./aws-xray-daemon-linux-3.x/xray
chmod +x ./aws-xray-daemon-linux-3.x/xray
./aws-xray-daemon-linux-3.x/xray -o -n "$region" &
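When `xray_trace_id` is non-empty, the start script launches the X-Ray daemon, and its cleanup handler reports either an error or a success segment for the runner start. The `create_xray_*_segment` helpers are not shown in this diff, so the sketch below only illustrates the kind of document involved. It assumes the standard X-Ray segment fields (`name`, `id`, `trace_id`, `start_time`, `end_time`, `error`) and the daemon's UDP format of a JSON header line followed by the segment; `buildSegment` and `daemonPayload` are hypothetical names. X-Ray distinguishes `error` (client-side) from `fault` (server-side); which flag the script sets is not visible here.

```typescript
// Subset of the X-Ray segment document fields used here.
interface XraySegment {
  name: string;
  id: string; // 16 hexadecimal digits
  trace_id: string; // e.g. 1-5759e988-bd862e3fe1be46a994272793
  start_time: number; // epoch seconds
  end_time: number; // epoch seconds
  error?: boolean;
}

// Root trace ids look like 1-<8 hex digits>-<24 hex digits>.
const TRACE_ID_PATTERN = /^1-[0-9a-f]{8}-[0-9a-f]{24}$/;

// Random lowercase hex string of the given length.
function randomHex(length: number): string {
  let out = '';
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// Hypothetical helper: build a segment for the runner start, flagged on failure.
function buildSegment(name: string, traceId: string, startTime: number, endTime: number, failed: boolean): XraySegment {
  if (!TRACE_ID_PATTERN.test(traceId)) {
    throw new Error(`invalid X-Ray trace id: ${traceId}`);
  }
  const segment: XraySegment = { name, id: randomHex(16), trace_id: traceId, start_time: startTime, end_time: endTime };
  if (failed) {
    segment.error = true;
  }
  return segment;
}

// The daemon (UDP port 2000 by default) expects a header line, a newline, then the segment JSON.
function daemonPayload(segment: XraySegment): string {
  return '{"format": "json", "version": 1}\n' + JSON.stringify(segment);
}
```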


10 changes: 10 additions & 0 deletions variables.deprecated.tf
@@ -0,0 +1,10 @@
variable "lambda_tracing_mode" {
description = "DEPRECATED: Replaced by `tracing_config`."
type = string
default = null

validation {
condition = anytrue([var.lambda_tracing_mode == null])
error_message = "DEPRECATED, Replaced by `tracing_config`."
}
}
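The validation accepts only `null`. Because `anytrue` receives a one-element list, the condition reduces to `var.lambda_tracing_mode == null`, so any value set for the deprecated variable fails validation with the deprecation message. The same rule expressed in TypeScript, using a hypothetical function name:

```typescript
const DEPRECATION_MESSAGE = 'DEPRECATED, Replaced by `tracing_config`.';

// Hypothetical mirror of the Terraform validation block: returns the error
// message when the deprecated input is set, or null when it is left unset.
function validateLambdaTracingMode(value: string | null): string | null {
  // anytrue([cond]) over a single-element list is just cond.
  const valid = value === null;
  return valid ? null : DEPRECATION_MESSAGE;
}
```

Even an empty string counts as "set", so callers migrating away from `lambda_tracing_mode` need to remove it entirely and use `tracing_config` instead.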