Nesting log_metrics only results in one set of metrics being logged #1668

Closed
tibbe opened this issue Oct 28, 2022 · 9 comments · Fixed by #1676
Labels: documentation, feature-request, not-a-bug

@tibbe
Contributor

tibbe commented Oct 28, 2022

Expected Behaviour

When using two Metrics objects and two nested calls to log_metrics, I expected both sets of metrics to be serialized. The reason for two Metrics objects is that every metric is output twice, with a different set of dimensions. Using single_metric for everything works but seems sub-optimal from a performance standpoint.

Example:

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.metrics import Metrics
from mangum import Mangum

logger = Logger()  # Powertools Logger; stdlib loggers have no inject_lambda_context
app = ...  # FastAPI app

# API handler metrics with dimensions: Stage, service
api_metrics = Metrics()
api_metrics.set_default_dimensions(Stage=os.environ['STAGE'])

# Detailed API handler metrics with dimensions: Method, Resource, Stage, service
detailed_api_metrics = Metrics()
detailed_api_metrics.set_default_dimensions(Stage=os.environ['STAGE'])

handler = Mangum(app)
handler = logger.inject_lambda_context(handler, clear_state=True)
# Add metrics last to properly flush metrics.
handler = api_metrics.log_metrics(handler, capture_cold_start_metric=True)
handler = detailed_api_metrics.log_metrics(handler)

Current Behaviour

Looking in the CloudWatch logs, only the JSON for one set of metrics (api_metrics above) is output.

Code snippet

from aws_lambda_powertools.metrics import Metrics, MetricUnit

# API handler metrics with dimensions: Stage, service
api_metrics = Metrics()
api_metrics.set_default_dimensions(Stage='Test')

# API handler metrics with dimensions: Method, Resource, Stage, service
detailed_api_metrics = Metrics()
detailed_api_metrics.set_default_dimensions(Stage='Test')

def handler(event, context):
    detailed_api_metrics.add_dimension(
        name='Method', value='GET')
    detailed_api_metrics.add_dimension(
        name='Resource', value='/some/path')

    api_metrics.add_metric(name='Count',
                           unit=MetricUnit.Count, value=1)
    detailed_api_metrics.add_metric(
        name='Count', unit=MetricUnit.Count, value=1)

# Add metrics last to properly flush metrics.
handler = api_metrics.log_metrics(handler, capture_cold_start_metric=True)
handler = detailed_api_metrics.log_metrics(handler)

Possible Solution

No response

Steps to Reproduce

  • Run the above code on Lambda.
  • Look in the logs.
  • See that the JSON from one set of metrics is missing.

AWS Lambda Powertools for Python version

1.25.10

AWS Lambda function runtime

3.9

Packaging format used

PyPI

Debugging logs

I do see the following two things in the log:


/opt/python/aws_lambda_powertools/metrics/metrics.py:189: UserWarning: No metrics to publish, skipping
warnings.warn("No metrics to publish, skipping")

Note that the two Metrics objects are used unconditionally, right after each other, so it would be odd for one to have metrics and the other to have none.

{
    "_aws": {
        "Timestamp": 1666944145372,
        "CloudWatchMetrics": [
            {
                "Namespace": "Benetics/Backend",
                "Dimensions": [
                    [
                        "Stage",
                        "service"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "Count",
                        "Unit": "Count"
                    },
                    {
                        "Name": "4xx",
                        "Unit": "Count"
                    },
                    {
                        "Name": "5xx",
                        "Unit": "Count"
                    },
                    {
                        "Name": "Latency",
                        "Unit": "Milliseconds"
                    }
                ]
            }
        ]
    },
    "Stage": "prod",
    "service": "Api",
    "Count": [
        1,
        1
    ],
    "4xx": [
        0,
        0
    ],
    "5xx": [
        0,
        0
    ],
    "Latency": [
        184,
        184
    ]
}
tibbe added labels: bug, triage on Oct 28, 2022
@boring-cyborg

boring-cyborg bot commented Oct 28, 2022

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our AWS Lambda Powertools Discord: Invite link

@tibbe
Contributor Author

tibbe commented Oct 28, 2022

Here's a standalone example with the logging turned up to 11 showing that only one set of metrics gets output:

import logging
import sys

from aws_lambda_powertools.metrics import Metrics, MetricUnit

# API handler metrics with dimensions: Stage, service
api_metrics = Metrics()
api_metrics.set_default_dimensions(Stage='Test')

# API handler metrics with dimensions: Method, Resource, Stage, service
detailed_api_metrics = Metrics()
detailed_api_metrics.set_default_dimensions(Stage='Test')


class MockLambdaContext:
    function_name: str = 'FUNCTION_NAME'


def handler(event, context):
    detailed_api_metrics.add_dimension(
        name='Method', value='GET')
    detailed_api_metrics.add_dimension(
        name='Resource', value='/some/path')

    api_metrics.add_metric(name='Count',
                           unit=MetricUnit.Count, value=1)
    detailed_api_metrics.add_metric(
        name='Count', unit=MetricUnit.Count, value=1)


# Add metrics last to properly flush metrics.
handler = api_metrics.log_metrics(handler, capture_cold_start_metric=True)
handler = detailed_api_metrics.log_metrics(handler)


def test_handler(caplog):
    loggers = [logging.getLogger(name)
               for name in logging.root.manager.loggerDict]
    for logger in loggers:
        stream = logging.StreamHandler(sys.stdout)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(stream)

    handler({}, MockLambdaContext())
    raise ValueError()  # Force pytest to fail and emit logs.

Run with pytest.

@tibbe
Copy link
Contributor Author

tibbe commented Oct 28, 2022

The issue seems to be these 4 class attributes in Metrics:

    _metrics: Dict[str, Any] = {}
    _dimensions: Dict[str, str] = {}
    _metadata: Dict[str, Any] = {}
    _default_dimensions: Dict[str, Any] = {}

Because these are class attributes, their state is shared across all Metrics instances, so the two objects don't behave independently.

Simply making these instance attributes fixes the issue. That said, I don't know what we'd lose. Presumably, if people create Metrics instances all over the place instead of, say, using one instance per dimension combination, there would be some extra allocation.
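
For illustration, here is a minimal sketch (independent of Powertools) of why class-level dict attributes behave this way; the class names are made up for the example:

from typing import Any, Dict


class SharedState:
    # Class attribute: a single dict object shared by every instance
    _metrics: Dict[str, Any] = {}


class InstanceState:
    def __init__(self) -> None:
        # Instance attribute: each instance gets its own dict
        self._metrics: Dict[str, Any] = {}


a, b = SharedState(), SharedState()
a._metrics["Count"] = 1
print(b._metrics)  # {'Count': 1} -- the mutation is visible from the other instance

c, d = InstanceState(), InstanceState()
c._metrics["Count"] = 1
print(d._metrics)  # {} -- state stays isolated per instance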

heitorlessa added labels: feature-request, not-a-bug; removed label: bug on Oct 28, 2022
@heitorlessa
Contributor

We briefly discussed on Discord why this is expected behaviour, and I'll provide a proper answer as soon as I can.

I'll rephrase my last comment on that Discord thread as a proper response here and gather your ideas on what UX would be good to unlock this and other ISV use cases (e.g., multiple namespaces).


heitorlessa self-assigned this on Oct 28, 2022
heitorlessa added the documentation label and removed the triage label on Oct 28, 2022
@heitorlessa
Contributor

Confirming that we'll be working on this for this Friday's release. In that thread (Discord), we weren't able to find a better class name. Based on what we know, the most suitable candidate would be a flag in the existing Metrics class.

  • Example: Metrics(singleton=False)

At first glance, the only "side effect" is that the set_default_dimensions method would have the same effect as add_dimension in this case -- when singleton=True (the default), set_default_dimensions ensures these dimensions are automatically re-added whenever metrics are flushed/cleared.

If you have a better naming idea for a separate class altogether, please shout!

github-actions bot added the pending-release label on Nov 1, 2022
@heitorlessa
Contributor

heitorlessa commented Nov 1, 2022

Just merged the new class EphemeralMetrics to address this and similar use cases. I've also updated the docs to go into detail on the different metric ingestion mechanisms in CloudWatch that led us to have the default Metrics "singleton/borg" design.

We intend to release it this Friday by EOD.

As promised, here's the full answer to your original comments: why a new class is needed, why we're not making Metrics a non-"singleton" by default (a breaking change), and the common challenges that led us to this design.

Staged docs: https://awslabs.github.io/aws-lambda-powertools-python/develop/core/metrics/#metrics-isolation


Metrics isolation

You can use EphemeralMetrics class when looking to isolate multiple instances of metrics with distinct namespaces and/or dimensions.

NOTE: "A typical use case is multi-tenant applications, or emitting the same metrics for distinct applications."

from aws_lambda_powertools.metrics import EphemeralMetrics, MetricUnit
from aws_lambda_powertools.utilities.typing import LambdaContext

metrics = EphemeralMetrics()


@metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    metrics.add_metric(name="SuccessfulBooking", unit=MetricUnit.Count, value=1)

Differences between EphemeralMetrics and Metrics

EphemeralMetrics has only two differences while keeping nearly the exact same set of features:

Feature | Metrics | EphemeralMetrics
Share data across instances (metrics, dimensions, metadata, etc.) | Yes | -
Default dimensions that persist across Lambda invocations (metric flush) | Yes | -
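
As a rough sketch of what that first row means in practice (the namespace and service values here are illustrative):

from aws_lambda_powertools.metrics import EphemeralMetrics, Metrics, MetricUnit

# Metrics: state is shared, so a metric added via one instance is visible to the other
shared_a = Metrics(namespace="Demo", service="Api")
shared_b = Metrics(namespace="Demo", service="Api")
shared_a.add_metric(name="Count", unit=MetricUnit.Count, value=1)
# shared_b would now flush "Count" as well

# EphemeralMetrics: each instance keeps its own metrics, dimensions and metadata
isolated_a = EphemeralMetrics(namespace="Demo", service="Api")
isolated_b = EphemeralMetrics(namespace="Demo", service="Api")
isolated_a.add_metric(name="Count", unit=MetricUnit.Count, value=1)
# isolated_b still has nothing to flush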

"Why not changing the default Metrics behaviour to not share data across instances?"

This is an intentional design to prevent accidental data deduplication or data loss issues due to the CloudWatch EMF metric dimension constraint.

In CloudWatch, there are two metric ingestion mechanisms: EMF (async) and PutMetricData API (sync).

The former creates metrics asynchronously via CloudWatch Logs, and the latter uses a synchronous and more flexible ingestion API.

Pause for a key concept

CloudWatch considers a metric unique by a combination of metric name, metric namespace, and zero or more metric dimensions.

With EMF, metric dimensions are shared with any metrics you define. With PutMetricData API, you can set a list defining one or more metrics with distinct dimensions.
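
For contrast, here is a minimal boto3 sketch of the synchronous path, where each entry in MetricData can carry its own dimension set (the namespace, metric names, and values are illustrative):

import boto3

cloudwatch = boto3.client("cloudwatch")

# PutMetricData (sync): every metric in the list can declare distinct dimensions
cloudwatch.put_metric_data(
    Namespace="Demo/Backend",
    MetricData=[
        {
            "MetricName": "RequestCount",
            "Dimensions": [
                {"Name": "service", "Value": "booking"},
                {"Name": "tenant_id", "Value": "sample"},
            ],
            "Unit": "Count",
            "Value": 1,
        },
        {
            "MetricName": "Latency",
            "Dimensions": [
                {"Name": "service", "Value": "booking"},
                {"Name": "function_name", "Value": "sample"},
            ],
            "Unit": "Milliseconds",
            "Value": 184,
        },
    ],
)

With EMF, by contrast, every dimension defined in a blob applies to all metrics in that blob.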

This is a subtle yet important distinction. Imagine you had the following metrics to emit:

Metric Name | Dimension | Intent
SuccessfulBooking | service="booking", tenant_id="sample" | Application metric
IntegrationLatency | service="booking", function_name="sample" | Operational metric
ColdStart | service="booking", function_name="sample" | Operational metric

The tenant_id dimension could vary, leading to two common issues:

  1. The ColdStart metric will be created multiple times (N * the number of unique tenant_id dimension values), despite function_name being the same
  2. The IntegrationLatency metric will also be created multiple times, due to tenant_id as well as function_name (which may or may not be intentional)

These issues are exacerbated when you (A) create metric dimensions conditionally, or (B) create multiple metric instances throughout your code instead of reusing them (globals). Subsequent metric instances will have (or lack) different metric dimensions, resulting in different metrics and data points with the same name.

Intentional design to address these scenarios

On 1, when you enable the capture_cold_start_metric feature, we transparently create and flush an additional EMF JSON blob that is independent from your application metrics. This prevents data pollution.

On 2, you can use EphemeralMetrics to create an additional EMF JSON blob for your application metric (SuccessfulBooking). This ensures that IntegrationLatency operational metric data points aren't tied to dynamic dimension values like tenant_id.
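
A sketch of that split under the staged guidance, using illustrative namespace, service, and tenant_id values:

from aws_lambda_powertools.metrics import EphemeralMetrics, Metrics, MetricUnit
from aws_lambda_powertools.utilities.typing import LambdaContext

# Shared instance: operational metrics with stable dimensions only
metrics = Metrics(namespace="Demo", service="booking")

# Isolated instance: application metrics carrying dynamic dimensions such as tenant_id
tenant_metrics = EphemeralMetrics(namespace="Demo", service="booking")


@metrics.log_metrics(capture_cold_start_metric=True)
@tenant_metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    tenant_metrics.add_dimension(name="tenant_id", value=event.get("tenant_id", "unknown"))
    tenant_metrics.add_metric(name="SuccessfulBooking", unit=MetricUnit.Count, value=1)

    metrics.add_metric(name="IntegrationLatency", unit=MetricUnit.Milliseconds, value=100)

Each decorator flushes its own EMF blob, so SuccessfulBooking can carry tenant_id without attaching it to IntegrationLatency or ColdStart.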

That is why Metrics shares data across instances by default, as that covers 80% of use cases and different personas using Powertools. This allows them to instantiate Metrics in multiple places throughout their code, be it a separate file, a middleware, or an abstraction that sets default dimensions.

heitorlessa linked a pull request on Nov 1, 2022 that will close this issue
@github-actions
Contributor

github-actions bot commented Nov 7, 2022

This is now released under version 2.2.0!

github-actions bot closed this as completed on Nov 7, 2022
github-actions bot removed the pending-release label on Nov 7, 2022
@heitorlessa
Contributor

@tibbe this is now available as part of v2.2.0 release (Lambda Layer v13): https://github.com/awslabs/aws-lambda-powertools-python/releases/tag/v2.2.0

Let me know if that doesn't address your original ask.

@tibbe
Contributor Author

tibbe commented Nov 8, 2022

@heitorlessa thanks! I've now integrated 2.2.0 in our backend and things seem to be working well so far!
