Skip to content

Bug: Custom log formatter code not being reached #5503

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
austin-ironside opened this issue Nov 4, 2024 · 5 comments
Closed

Bug: Custom log formatter code not being reached #5503

austin-ironside opened this issue Nov 4, 2024 · 5 comments
Assignees
Labels
bug Something isn't working triage Pending triage from maintainers

Comments

@austin-ironside
Copy link

Expected Behaviour

We have hundreds of lambda functions deployed with SAM. We globally set POWERTOOLS_LOGGER_LOG_EVENT to True in order for the event of each invocation to be logged. In order to not log sensitve information, such as authoriztion headers from an APIGW event, I'm redacting certain fields due to security and in efforts to also reduce the overall size of the log in order to keep CloudWatch costs down.

The custom logger should delete these certain fields from the event payload before being logged.

Current Behaviour

The serialize() method never gets reached and none of my logger logic is performed. When I import the logger in a python shell, it works as expected and the custom code is ran. However, my logger doesn't appear to call the def serializer(self) method unless I add these two lines below. Problem with this method is that the lambda event also gets modified. So when I try to redact the Authorization key in the headers, my lambda authorizer throws a key error because it's missing from the event payload now.:

handler = logger.handlers[0]
handler.setFormatter(CustomFormatter())

Code snippet

Lambda handler:

logger = Logger(logger_formatter=CustomFormatter())

@tracer.capture_lambda_handler(
    capture_response=False
) 
@logger.inject_lambda_context()
def lambda_authorizer_handler(event: dict, context: LambdaContext) -> dict:
    return authorizer.handler(event, context, logger)

Custom Logger

class CustomFormatter(LambdaPowertoolsFormatter):
    """
    A custom logging formatter that redacts specific fields from log data before serialization.
    This is in order to remove sensitive informtion and reduce CloudWatch log costs as it greatly
    reduces the size of the log

    Attributes:
        HEADERS_TO_REDACT (Set[str]): Headers to be redacted from the log's 'headers' field.
        REQUEST_CONTEXT_TO_REDACT (Set[str]): Request context fields to redact from the log's 'requestContext'.
        MESSAGE_FIELDS_TO_REDACT (Set[str]): Message-level fields to redact from the log's 'message' field.
        LAMBDA_EVENT_FIELDS_TO_REDACT (Set[str]): Top-level Fields specific to Lambda events to redact from the main log body.
    """

    HEADERS_TO_REDACT: Set[str] = {
        "Authorization",
        "baggage",
    }

    REQUEST_CONTEXT_TO_REDACT: Set[str] = {
        "resourcePath",
        "extendedRequestId",
        "protocol",
        "stage",
        "domainPrefix",
        "identity",
        "domainName",
        "deploymentId",
        "apiId",
    }

    MESSAGE_FIELDS_TO_REDACT: Set[str] = {
        "multiValueHeaders",
        "multiValueQueryStringParameters",
        "pathParameters",
        "stageVariables",
        "isBase64Encoded",
    }

    LAMBDA_EVENT_FIELDS_TO_REDACT: Set[str] = {
        "function_memory_size",
        "function_arn",
        "cold_start",
    }

    def __init__(self, *args, **kwargs):
        print("Inside INIT of custom logger")
        super().__init__(*args, **kwargs)

    def serialize(self, log: LogRecord) -> str:
        print(f"In Serialize method: {log}")
        try:
            self._redact_fields(log)
        except Exception:
            print("FAILURE")
            pass

        return self.json_serializer(log)

    def _pop_fields(self, log: Dict[str, Any], keys_to_redact: Set[str]) -> None:
        for key in keys_to_redact:
            log.pop(key, None)

    def _redact_fields(self, log: LogRecord) -> None:
        message = log.get("message", {})
        if isinstance(message, dict):
            headers = message.get("headers")
            if isinstance(headers, dict):
                self._pop_fields(headers, self.HEADERS_TO_REDACT)

            request_context = message.get("requestContext")
            if isinstance(request_context, dict):
                self._pop_fields(request_context, self.REQUEST_CONTEXT_TO_REDACT)

            self._pop_fields(message, self.MESSAGE_FIELDS_TO_REDACT)

        self._pop_fields(log, self.LAMBDA_EVENT_FIELDS_TO_REDACT)  # type: ignore
        log["message"] = message

Possible Solution

No response

Steps to Reproduce

Setup lambda function to use the CustomLogger as the logger_formatter and invoke the function locally or in the cloud

Powertools for AWS Lambda (Python) version

3.2.0

AWS Lambda function runtime

3.9

Packaging format used

Lambda Layers

Debugging logs

No response

@austin-ironside austin-ironside added bug Something isn't working triage Pending triage from maintainers labels Nov 4, 2024
Copy link

boring-cyborg bot commented Nov 4, 2024

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our Powertools for AWS Lambda Discord: Invite link

@leandrodamascena
Copy link
Contributor

Hi @austin-ironside! Reading this now and trying to replicate the scenario in my environment. I'll get back to you with some updates as soon as possible.

@leandrodamascena leandrodamascena self-assigned this Nov 6, 2024
@austin-ironside
Copy link
Author

Thanks for the response! I was able to instantiate the CustomFormatter without having to alter the handlers directly. Not sure exactly where/when it started working. Might have been a SAM local build situation.

The other issue of the lambda event being mutated from my changes, I worked around by using a deepcopy on the log record and made my changes there before returning it. My custom formatter is really aimed at API Gateway events and trimming those log records down. Not sure if there are any side effects of this yet, but for now, this seems to perform the way I was expecting.

from aws_lambda_powertools.logging.formatter import (
    LambdaPowertoolsFormatter,
)
from aws_lambda_powertools.logging import Logger
from aws_lambda_powertools.logging.types import LogRecord

from copy import deepcopy
from typing import Any, Dict, Set


class CustomFormatter(LambdaPowertoolsFormatter):
    """
    A custom logging formatter that redacts specific fields from log data before serialization.
    This is in order to remove sensitive informtion and reduce CloudWatch log costs as it greatly
    reduces the size of the log
    Attributes:
        HEADERS_TO_REDACT (Set[str]): Headers to be redacted from the APIGW log's 'headers' field.
        REQUEST_CONTEXT_TO_REDACT (Set[str]): Request context fields to redact from the APIGW log's 'requestContext'.
        MESSAGE_FIELDS_TO_REDACT (Set[str]): Message-level fields to redact from the log's 'message' field.
        LAMBDA_EVENT_FIELDS_TO_REDACT (Set[str]): Top-level Fields specific to Lambda events to redact from the main log body.
    """

    FIELDS_TO_REDACT_HEADERS: Set[str] = {
        "Authorization",
        "baggage",
    }

    FIELDS_TO_REDACT_REQUEST_CONTEXT: Set[str] = {
        "resourcePath",
        "extendedRequestId",
        "protocol",
        "stage",
        "domainPrefix",
        "identity",
        "domainName",
        "deploymentId",
        "apiId",
    }

    FIELDS_TO_REDACT_MESSAGE: Set[str] = {
        "multiValueHeaders",
        "multiValueQueryStringParameters",
        "pathParameters",
        "stageVariables",
        "isBase64Encoded",
    }

    FIELDS_TO_REDACT_LAMBDA_EVENT: Set[str] = {
        "function_memory_size",
        "function_arn",
        "cold_start",
    }

    def serialize(self, log: LogRecord) -> str:
        """
        Update the log record ito redact sensitive  or non-essential fields.
        If any failures occur in the process, fall back to the original log
        statement in order to prevent throwing errors to users.

        Args:
            log: The log record to serialize
        """
        try:
            self._pop_fields(log, self.FIELDS_TO_REDACT_LAMBDA_EVENT)

            if self._needs_redaction(log):
                # deepcopy ensures lambda event payload remains unaltered.
                log_copy = deepcopy(log)
                self._redact_fields(log_copy)  # type: ignore
                return self.json_serializer(log_copy)
            return self.json_serializer(log)
        except Exception as e:
            self.append_keys(redaction_failed=True, redaction_error=str(e))
            return self.json_serializer(log)

    def _redact_fields(self, log: Dict[str, Any]) -> None:
        """
        Remove sensitive or unnecessary fields from the log record.
        """
        message = log.get("message")
        if isinstance(message, Dict):
            headers = message.get("headers")
            if isinstance(headers, Dict):
                self._pop_fields(headers, self.FIELDS_TO_REDACT_HEADERS)

            request_context = message.get("requestContext")
            if isinstance(request_context, Dict):
                self._pop_fields(request_context, self.FIELDS_TO_REDACT_REQUEST_CONTEXT)

            self._pop_fields(message, self.FIELDS_TO_REDACT_MESSAGE)

        log["message"] = message

    def _needs_redaction(self, log: LogRecord) -> bool:
        """
        Check if the log record contains any fields that need to be redacted.
        A log needs redactions if it is an API Gateway event.

        Args:
            log: The log record to check

        Returns:
            bool: True if the log record needs redaction, False otherwise
        """
        message = log.get("message")
        if not isinstance(message, dict):
            return False

        return (
            "headers" in message
            or "requestContext" in message
            or any(field in message for field in self.FIELDS_TO_REDACT_MESSAGE)
        )

    def _pop_fields(self, log: LogRecord, keys_to_redact: Set[str]) -> None:
        """
        Remove the specified keys from the log record.

        Args:
            log: The log record to remove keys from
            keys_to_redact: The keys to remove from the log record
        """
        if isinstance(log, dict):
            for key in keys_to_redact:
                log.pop(key, None)  # type: ignore

@leandrodamascena
Copy link
Contributor

Might have been a SAM local build situation.

I don't think this was created by SAM or anything related to local build, but I'm glad you were able to make it work.

I worked around by using a deepcopy on the log record and made my changes there before returning it

The payload doesn't change much, but yes, it's better to do a deep copy than a shallow copy, that way you'll be sure you're copying the entire structure.

Not sure if there are any side effects of this yet, but for now, this seems to perform the way I was expecting.

I don't see any problem here. You're wrapping your main logic in a try/except block, so you can see when things go wrong and not raise exceptions.

Thanks for reporting that it's working @austin-ironside! I'm closing this issue, feel free to reopen.

Copy link
Contributor

github-actions bot commented Nov 6, 2024

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

@leandrodamascena leandrodamascena moved this from Coming soon to Closed in Powertools for AWS Lambda (Python) Jan 27, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage Pending triage from maintainers
Projects
Development

No branches or pull requests

2 participants