Skip to content

powertools-cloudformation custom resource exception handling #1081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
scottgerring opened this issue Mar 9, 2023 · 2 comments
Closed

powertools-cloudformation custom resource exception handling #1081

scottgerring opened this issue Mar 9, 2023 · 2 comments
Assignees
Labels
bug Something isn't working triage

Comments

@scottgerring
Copy link
Contributor

Expected Behavior

I expect powertools-cloudformation to use the physicalResourceId sent by CloudFormation during UPDATE events when (A) none are explicitly set, or (B) a RuntimeException was thrown.

Current Behavior

Expand for a primer in CloudFormation Custom Resources

Checkout this page for a complete explanation: CloudFormation Custom Resources

For the context of this issue, we will focus on the three main high level resource lifecycle events: CREATE, UPDATE, DELETE.

For each event, a unique identifier called Physical Resource ID is used to control whether a resource needs to be created, updated, replaced, or deleted. From here on, we'll refer to this as physicalResourceId.

In this context, the typical workflow for a custom resource runs as follows:

  • CREATE event is invoked when a resource is being provisioned for the first time.

    • Expected response. CloudFormation expects a physicalResourceId in a successful custom resource response, where its uniqueness is determined by the custom resource itself. From here on, CloudFormation will send the same physicalResourceId for any subsequent operation related to this resource.
  • UPDATE event is invoked when a resource had a property changed. It sends the physicalResourceId previously known for this resource.

    • Expected response. For a successful update of the same resource, CloudFormation expects the same physicalResourceId. However, if physicalResourceId is different, CloudFormation recognizes it needs to dispatch a DELETE event to clean up the previous physicalResourceId it had stored - effectively a resource replacement.
  • DELETE event is invoked when a resource was either removed from the template or its stack is being deleted. It sends the physicalResourceId previously known for this resource.

    • Expected Response. CloudFormation expects a non-error response to be sure the resource was deleted successfully.

Problem

For successful operations (no-exception), powertools-cloudformation implicitly defines the physicalResourceId value with the CloudWatch Log Stream name.

For unsuccessful operations where you throw a RuntimeException, powertools-cloudformation implements a catch-all exception handler that also sets the physicalResourceId value with the CloudWatch Log Stream name.

This behaviour creates the following unexpected side effects:

Physical resource ID defaulting to CloudWatch Log Stream name on exception

An UPDATE operation that experienced a RuntimeException might lead to a resource replacement, because powertools-cloudformation always set the physicalResourceId value with the CloudWatch Log Stream name, which might be different to the previously recorded physicalResourceId.

@Override
protected Response update(CloudFormationCustomResourceEvent cloudFormationCustomResourceEvent, Context context) {

    // this will lead to a response containing `physicalResourceId: context.getLogStreamName()`
    throw new RuntimeException("something went wrong"); 

    return Response.builder().physicalResourceId(cloudFormationCustomResourceEvent.getPhysicalResourceId()).build();
}

Distinct physical resource IDs in CREATE and UPDATE events

You might explicitly set an unique identifier for physicalResourceId during CREATE event, but doesn't use the same value in UPDATE event, or let powertools-cloudformation implicitly discover one.

@Override
protected Response create(CloudFormationCustomResourceEvent cloudFormationCustomResourceEvent, Context context) {

    final String resourceId = UUID.randomUUID().toString();

    return Response.builder().physicalResourceId(resourceId).build();
}

@Override
protected Response update(CloudFormationCustomResourceEvent cloudFormationCustomResourceEvent, Context context) {

    return Response.success() // physicalResourceID will be the value of CloudWatch Log Group name
}

Distinct physical resource IDs in infrequent UPDATE events

In subsequent UPDATE events, CloudWatch Log Stream name might differ due to Lambda scaling model, leading to a resource replacement.

@Override
protected Response update(CloudFormationCustomResourceEvent cloudFormationCustomResourceEvent, Context context) {

    // FIRST update: 2023/03/08/[$LATEST]aabbccdd
    // SECOND update: 2023/03/08/[$LATEST]eeffggaa
    log.info(context.getLogStreamName()); // FIRST and SECOND update might have different results
    return Response.success();
}

Possible Solution

Backwards compatible change

Since the behaviour is incorrect (a bug), a non-breaking change would be to update the Response class to use the physicalResourceId available in the UPDATE event instead of CloudWatch Log Stream name that is prone to error.

Deprecate 'Response.success()' and 'Response.failure()' and add new functionality which easily allows customers to follow best practices.

If a customer handler throws an exception, we would still provide the physicalResourceId from the CloudFormation event.

Backwards incompatible change

Do all of (1), and modify the interface to the user to throw if no ID is provided. I believe we should make it explicit this is necessary, and never try and guess something to plug in for them - the confusion here arises due to the impedance mismatch between Cloudformation's interface and powertools-cloudformations.

Remove 'Response.success()' and 'Response.failure()' in favour of a new one as described in 1

Documentation update

We should provide (1) a more complete custom resource example to demonstrate a create, update, and delete handler, and (2) an example on how to explicitly set physicalResourceId with the Response builder.

Steps to Reproduce (for bugs)

  1. Create any custom resource using powertools-cloudformation for create, update, and delete handler
  2. For create, explicitly set physicalResourceId with any String value
  3. For update, simply return a successful response Response.success()
  4. Launch a CloudFormation stack
  5. Modify any Custom Resource attribute in the CloudFormation template and trigger a Stack Update
  6. The resource created should be replaced in the update

Environment

  • Powertools version used: v1.14.0
  • Packaging format (Layers, Maven/Gradle): Maven
  • AWS Lambda function runtime: Java11
  • Debugging logs None

How to enable debug mode**


CloudFormation Events

{
  "RequestType": "Create",
  "ResponseURL": "http://pre-signed-s3-url-for-response/",
  "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/MyStack/guid",
  "RequestId": "unique id for this create request",
  "ResourceType": "Custom::TestResource",
  "ResourceProperties": {
    "StackName": "MyStack",
  }
}
{
  "RequestType": "Update",
  "ResponseURL": "http://pre-signed-s3-url-for-response/",
  "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/MyStack/guid",
  "RequestId": "unique id for this create request",
  "ResourceType": "Custom::TestResource",
  "LogicalResourceId": "MyTestResource",
  "PhysicalResourceId": "physicalResource-123",
  "ResourceProperties": {
    "StackName": "MyStack"
  }
}
{
  "RequestType": "Delete",
  "ResponseURL": "http://pre-signed-s3-url-for-response/",
  "StackId": "arn:aws:cloudformation:us-east-1:123456789012:stack/MyStack/guid",
  "RequestId": "unique id for this create request",
  "ResourceType": "Custom::TestResource",
  "LogicalResourceId": "MyTestResource",
  "PhysicalResourceId": "physicalResource-123",
  "ResourceProperties": {
    "StackName": "MyStack"
  }
}
@scottgerring
Copy link
Contributor Author

Re-opening until docs are done.

@scottgerring
Copy link
Contributor Author

Change + docs implemented, in next release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
Status: Shipped
Development

No branches or pull requests

2 participants