fix: only tag spot requests if no on-demand fallback #4585

Open
pwo3 wants to merge 3 commits into main from fix-spot-tag-on-fallback

Conversation

@pwo3 pwo3 commented May 14, 2025

Hi,

This PR prevents tagging spot instance requests when an on-demand fallback is configured.

It addresses my comment on this issue.

The previous fix only works when instance_target_capacity_type is set to on-demand; it does not cover an on-demand fallback from a spot request.

This approach isn’t ideal, but I couldn’t find a cleaner way to cancel the tagging directly in lambdas/functions/control-plane/src/aws/runners.ts when the on-demand fallback is triggered.

@pwo3 pwo3 requested a review from a team as a code owner May 14, 2025 09:35
@@ -207,7 +207,7 @@ resource "aws_launch_template" "runner" {
   }

   dynamic "tag_specifications" {
-    for_each = var.instance_target_capacity_type == "spot" ? [1] : [] # Include the block only if the value is "spot"
+    for_each = var.instance_target_capacity_type == "spot" && var.enable_on_demand_failover_for_errors == null ? [1] : [] # Include the block only if the value is "spot" and on_demand_failover_for_errors is not enabled
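Read as a truth table, the changed condition behaves roughly as follows. This is only a TypeScript model of the expression in the diff, not module code: `includesSpotRequestTagBlock` is a hypothetical name, and the error code is just an example value.

```typescript
// Hypothetical model of the for_each condition in the diff above.
// The parameter names mirror the Terraform variables; nothing here is module code.
type CapacityType = 'spot' | 'on-demand';

function includesSpotRequestTagBlock(
  instanceTargetCapacityType: CapacityType,
  enableOnDemandFailoverForErrors: string[] | null,
): boolean {
  // Mirrors: instance_target_capacity_type == "spot"
  //          && enable_on_demand_failover_for_errors == null
  return instanceTargetCapacityType === 'spot' && enableOnDemandFailoverForErrors === null;
}

const cases = {
  // Spot only, no failover configured: the tag block is rendered.
  spotWithoutFailover: includesSpotRequestTagBlock('spot', null),
  // Spot with failover configured: the tag block is skipped.
  spotWithFailover: includesSpotRequestTagBlock('spot', ['InsufficientInstanceCapacity']),
  // An empty list is not null, so this model also skips the block here.
  spotWithEmptyList: includesSpotRequestTagBlock('spot', []),
  // On-demand: the tag block is never rendered.
  onDemand: includesSpotRequestTagBlock('on-demand', null),
};
```

Under this model an empty list also disables the block, because it is not null. Whether that matters depends on the variable's default, which is not shown in this diff.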
Member

This will solve the problem, but it also means the spot request is never tagged when on-demand failover is active. In my view, the lambda would be a better place for this.

const instancesOnDemand = await createRunner({

Would you have time to provide a fix in the lambda?
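For context, the fallback being discussed can be pictured roughly as below. This is a simplified, synchronous sketch under stated assumptions: `createWithFailover`, `LaunchFleet`, `FleetResult`, and the fake launcher are all hypothetical names, and the real lambda is asynchronous and talks to EC2.

```typescript
// Simplified sketch of a spot-to-on-demand fallback. All names are hypothetical.
type CapacityType = 'spot' | 'on-demand';

interface FleetResult {
  instanceIds: string[];
  errorCodes: string[];
}

// Stand-in for the call that creates a fleet with a given capacity type.
type LaunchFleet = (capacityType: CapacityType, count: number) => FleetResult;

function createWithFailover(launch: LaunchFleet, count: number, failoverErrors: string[]): string[] {
  const spot = launch('spot', count);
  const missing = count - spot.instanceIds.length;
  const shouldFailover =
    missing > 0 && spot.errorCodes.some((code) => failoverErrors.includes(code));
  if (!shouldFailover) {
    return spot.instanceIds;
  }
  // Per the discussion, the retry goes through the same launch template, so any
  // tag specification defined there also applies to this on-demand attempt.
  const onDemand = launch('on-demand', missing);
  return [...spot.instanceIds, ...onDemand.instanceIds];
}

// Fake launcher: spot capacity is unavailable, on-demand succeeds.
const calls: CapacityType[] = [];
const fakeLaunch: LaunchFleet = (capacityType, count) => {
  calls.push(capacityType);
  if (capacityType === 'spot') {
    return { instanceIds: [], errorCodes: ['InsufficientInstanceCapacity'] };
  }
  const instanceIds = Array.from({ length: count }, (_, i) => `i-ondemand-${i}`);
  return { instanceIds, errorCodes: [] };
};

const launched = createWithFailover(fakeLaunch, 2, ['InsufficientInstanceCapacity']);
```

Injecting the launcher as a function keeps the sketch runnable without AWS credentials; it says nothing about how the lambda is actually structured.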

Author

That was my initial approach but I couldn't find a way to overwrite the tags directly in the lambda.

I'll take a second look to see if I can find a solution

Author

I don’t think we can apply a simple fix in the Lambda.

Since the tags are defined in the launch template, it’s not possible to override them using the CreateFleetCommand.

We could remove the spot-instances-request tags from the launch template and set them only in the Lambda’s CreateFleetCommand, but in that case we would lose access to the tags defined in the Terraform configuration.

I don’t see a simple solution for now. Do you have any thoughts on this?

Thanks


I arrived at the same conclusion when I investigated a workaround.

Member

@npalm npalm May 21, 2025

I think we should consider moving the spot request tagging to runners.ts instead of setting it in the template. That way we can do it properly, based on whether a spot instance is requested or not. Some tags are already set there. Would you like to give it a shot?

see

    TagSpecifications: [
      {
        ResourceType: 'instance',
        Tags: tags,
      },
      {
        ResourceType: 'volume',
        Tags: tags,
      },
    ],
    Type: 'instant',
  });
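A minimal sketch of that suggestion, assuming the tag specifications are built from the requested capacity type. `buildTagSpecifications` and the local types are hypothetical, not the project's code. Whether a `spot-instances-request` entry passed this way is actually applied by EC2 is an assumption here; a later test in this thread reported that the spot request stayed untagged.

```typescript
// Hypothetical helper: choose which resource types to tag based on capacity type.
type CapacityType = 'spot' | 'on-demand';

interface Tag {
  Key: string;
  Value: string;
}

interface TagSpecification {
  ResourceType: string;
  Tags: Tag[];
}

function buildTagSpecifications(capacityType: CapacityType, tags: Tag[]): TagSpecification[] {
  // Instance and volume tags are always requested, as in the snippet above.
  const specs: TagSpecification[] = [
    { ResourceType: 'instance', Tags: tags },
    { ResourceType: 'volume', Tags: tags },
  ];
  // Assumption: the spot request entry is only added when spot is requested,
  // so an on-demand fallback never carries it.
  if (capacityType === 'spot') {
    specs.push({ ResourceType: 'spot-instances-request', Tags: tags });
  }
  return specs;
}

const exampleTags: Tag[] = [{ Key: 'ghr:Application', Value: 'github-action-runner' }];
const spotSpecs = buildTagSpecifications('spot', exampleTags);
const onDemandSpecs = buildTagSpecifications('on-demand', exampleTags);
```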

Author

@npalm I tried this approach, but we would lose all the tags defined in the Terraform module; only these tags would be applied:

  const tags = [
    { Key: 'ghr:Application', Value: 'github-action-runner' },
    { Key: 'ghr:created_by', Value: runnerParameters.numberOfRunners === 1 ? 'scale-up-lambda' : 'pool-lambda' },
    { Key: 'ghr:Type', Value: runnerParameters.runnerType },
    { Key: 'ghr:Owner', Value: runnerParameters.runnerOwner },
  ];

However, I can try fetching the tags from the launch template first and then applying them to the spot-instances-request via the TagSpecifications.
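One way to picture the merge step is sketched below, under assumptions. The launch template tags are passed in as plain data; in practice they would presumably come from the EC2 DescribeLaunchTemplateVersions API, which needs its own IAM permission. `mergeTemplateTags`, the `sourceResourceType` parameter, and the sample values are hypothetical. Tags are de-duplicated by key so the same key is never sent twice.

```typescript
// Hypothetical helper: merge launch template tags with the lambda's own tags.
interface Tag {
  Key: string;
  Value: string;
}

interface TemplateTagSpecification {
  ResourceType: string;
  Tags: Tag[];
}

function mergeTemplateTags(
  templateSpecs: TemplateTagSpecification[],
  lambdaTags: Tag[],
  sourceResourceType: string,
): Tag[] {
  const byKey = new Map<string, string>();
  // Start with the tags the template defines for the chosen resource type.
  for (const spec of templateSpecs) {
    if (spec.ResourceType !== sourceResourceType) continue;
    for (const tag of spec.Tags) {
      byKey.set(tag.Key, tag.Value);
    }
  }
  // Lambda tags win on conflict; each key ends up in the result exactly once.
  for (const tag of lambdaTags) {
    byKey.set(tag.Key, tag.Value);
  }
  return Array.from(byKey.entries()).map(([Key, Value]) => ({ Key, Value }));
}

// Sample data standing in for a launch template's tag specifications.
const templateSpecs: TemplateTagSpecification[] = [
  {
    ResourceType: 'instance',
    Tags: [
      { Key: 'Environment', Value: 'dev' },
      { Key: 'ghr:Application', Value: 'from-template' },
    ],
  },
];
const lambdaTags: Tag[] = [{ Key: 'ghr:Application', Value: 'github-action-runner' }];
const mergedTags = mergeTemplateTags(templateSpecs, lambdaTags, 'instance');
```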

Author

I pushed the changes, but I’m not really sure how to test it in real conditions

Member

I will run some tests to check that spot requests are correctly tagged. As far as I know, the case where spot is not available is not testable.

@pwo3 pwo3 force-pushed the fix-spot-tag-on-fallback branch 3 times, most recently from 2af3700 to 272fce0 Compare May 15, 2025 13:15

r-bk commented May 20, 2025

@pwo3 @npalm
Any updates on this PR?
Are there any outstanding blockers to merge it?

Member

npalm commented May 20, 2025

I will have a look at the PR asap

@pwo3 pwo3 requested a review from npalm May 23, 2025 08:36
Member

@npalm npalm left a comment

I have tested the PR; the code was not working because the flat map was creating duplicated tags and the describe permission for the launch template was missing.

After making some changes the lambda was working, but there were still no tags on the spot request. I changed several things but could not get it working at all. Really strange: debug output clearly showed the correct elements in the TagSpecification.

After updating to main, I got my tag on the spot request back. Maybe we should revert to the previous approach and add a note in the Terraform code that this should not be the place, but that tagging via the spot fleet request was not working at all.

@pwo3 pwo3 force-pushed the fix-spot-tag-on-fallback branch from 56799e4 to cd31b74 Compare May 26, 2025 07:39
Author

pwo3 commented May 26, 2025

> I have tested the PR; the code was not working because the flat map was creating duplicated tags and the describe permission for the launch template was missing.
>
> After making some changes the lambda was working, but there were still no tags on the spot request. I changed several things but could not get it working at all. Really strange: debug output clearly showed the correct elements in the TagSpecification.
>
> After updating to main, I got my tag on the spot request back. Maybe we should revert to the previous approach and add a note in the Terraform code that this should not be the place, but that tagging via the spot fleet request was not working at all.

Thanks for your tests @npalm, too bad it's not working... I reverted to the previous fix and added a comment in the code to explain why we're using this approach

@pwo3 pwo3 requested a review from npalm May 26, 2025 07:41