Nesting log_metrics only results in one set of metrics being logged #1668

Closed
tibbe opened this issue Oct 28, 2022 · 9 comments · Fixed by #1676
Labels: documentation, feature-request, not-a-bug

@tibbe
Contributor

tibbe commented Oct 28, 2022

Expected Behaviour

When using two Metrics objects and two nested calls to log_metrics, I expected both sets of metrics to be serialized. The reason for two Metrics objects is that every metric is output twice, with a different set of dimensions. Using single_metric for everything works but seems sub-optimal from a performance standpoint.

Example:

import os

from aws_lambda_powertools import Logger
from aws_lambda_powertools.metrics import Metrics
from mangum import Mangum

logger = Logger()  # Powertools Logger; stdlib loggers have no inject_lambda_context
app = ...  # FastAPI app

# API handler metrics with dimensions: Stage, service
api_metrics = Metrics()
api_metrics.set_default_dimensions(Stage=os.environ['STAGE'])

# Detailed API handler metrics with dimensions: Method, Resource, Stage, service
detailed_api_metrics = Metrics()
detailed_api_metrics.set_default_dimensions(Stage=os.environ['STAGE'])

handler = Mangum(app)
handler = logger.inject_lambda_context(handler, clear_state=True)
# Add metrics last to properly flush metrics.
handler = api_metrics.log_metrics(handler, capture_cold_start_metric=True)
handler = detailed_api_metrics.log_metrics(handler)

Current Behaviour

Looking in the CloudWatch logs, only the JSON for one set of metrics (api_metrics above) is output.

Code snippet

from aws_lambda_powertools.metrics import Metrics, MetricUnit

# API handler metrics with dimensions: Stage, service
api_metrics = Metrics()
api_metrics.set_default_dimensions(Stage='Test')

# API handler metrics with dimensions: Method, Resource, Stage, service
detailed_api_metrics = Metrics()
detailed_api_metrics.set_default_dimensions(Stage='Test')

def handler(event, context):
    detailed_api_metrics.add_dimension(
        name='Method', value='GET')
    detailed_api_metrics.add_dimension(
        name='Resource', value='/some/path')

    api_metrics.add_metric(name='Count',
                           unit=MetricUnit.Count, value=1)
    detailed_api_metrics.add_metric(
        name='Count', unit=MetricUnit.Count, value=1)

# Add metrics last to properly flush metrics.
handler = api_metrics.log_metrics(handler, capture_cold_start_metric=True)
handler = detailed_api_metrics.log_metrics(handler)

Possible Solution

No response

Steps to Reproduce

  • Run the above code on Lambda.
  • Look in the logs.
  • See that the JSON from one set of metrics is missing.

AWS Lambda Powertools for Python version

1.25.10

AWS Lambda function runtime

3.9

Packaging format used

PyPI

Debugging logs

I do see the following two things in the log:


/opt/python/aws_lambda_powertools/metrics/metrics.py:189: UserWarning: No metrics to publish, skipping
warnings.warn("No metrics to publish, skipping")

Note that the two Metrics objects are used unconditionally, right after each other, so it would be odd for one to have metrics and the other to have none.

{
    "_aws": {
        "Timestamp": 1666944145372,
        "CloudWatchMetrics": [
            {
                "Namespace": "Benetics/Backend",
                "Dimensions": [
                    [
                        "Stage",
                        "service"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "Count",
                        "Unit": "Count"
                    },
                    {
                        "Name": "4xx",
                        "Unit": "Count"
                    },
                    {
                        "Name": "5xx",
                        "Unit": "Count"
                    },
                    {
                        "Name": "Latency",
                        "Unit": "Milliseconds"
                    }
                ]
            }
        ]
    },
    "Stage": "prod",
    "service": "Api",
    "Count": [
        1,
        1
    ],
    "4xx": [
        0,
        0
    ],
    "5xx": [
        0,
        0
    ],
    "Latency": [
        184,
        184
    ]
}
tibbe added labels: bug, triage on Oct 28, 2022
@boring-cyborg

boring-cyborg bot commented Oct 28, 2022

Thanks for opening your first issue here! We'll come back to you as soon as we can.
In the meantime, check out the #python channel on our AWS Lambda Powertools Discord: Invite link

@tibbe
Contributor Author

tibbe commented Oct 28, 2022

Here's a standalone example with the logging turned up to 11 showing that only one set of metrics gets output:

import logging
import sys

from aws_lambda_powertools.metrics import Metrics, MetricUnit

# API handler metrics with dimensions: Stage, service
api_metrics = Metrics()
api_metrics.set_default_dimensions(Stage='Test')

# API handler metrics with dimensions: Method, Resource, Stage, service
detailed_api_metrics = Metrics()
detailed_api_metrics.set_default_dimensions(Stage='Test')


class MockLambdaContext:
    function_name: str = 'FUNCTION_NAME'


def handler(event, context):
    detailed_api_metrics.add_dimension(
        name='Method', value='GET')
    detailed_api_metrics.add_dimension(
        name='Resource', value='/some/path')

    api_metrics.add_metric(name='Count',
                           unit=MetricUnit.Count, value=1)
    detailed_api_metrics.add_metric(
        name='Count', unit=MetricUnit.Count, value=1)


# Add metrics last to properly flush metrics.
handler = api_metrics.log_metrics(handler, capture_cold_start_metric=True)
handler = detailed_api_metrics.log_metrics(handler)


def test_handler(caplog):
    loggers = [logging.getLogger(name)
               for name in logging.root.manager.loggerDict]
    for logger in loggers:
        stream = logging.StreamHandler(sys.stdout)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(stream)

    handler({}, MockLambdaContext())
    raise ValueError()  # Force pytest to fail and emit logs.

Run with pytest.

@tibbe
Copy link
Contributor Author

tibbe commented Oct 28, 2022

The issue seems to be these 4 class attributes in Metrics:

    _metrics: Dict[str, Any] = {}
    _dimensions: Dict[str, str] = {}
    _metadata: Dict[str, Any] = {}
    _default_dimensions: Dict[str, Any] = {}

Because these are class attributes, their state is shared across all Metrics instances, so the two objects don't behave independently.

Simply making these instance attributes fixes the issue. That said, I don't know what we'd lose. Presumably, if people create Metrics instances all over the place instead of, say, using one instance per dimension combination, there would be some extra allocation.
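
For illustration, here is a minimal sketch (independent of Powertools) of why class-level dict attributes behave this way; the class names are made up for the example:

from typing import Any, Dict


class SharedState:
    # Class attribute: a single dict object shared by every instance
    _metrics: Dict[str, Any] = {}


class InstanceState:
    def __init__(self) -> None:
        # Instance attribute: each instance gets its own dict
        self._metrics: Dict[str, Any] = {}


a, b = SharedState(), SharedState()
a._metrics["Count"] = 1
print(b._metrics)  # {'Count': 1} -- the mutation is visible from the other instance

c, d = InstanceState(), InstanceState()
c._metrics["Count"] = 1
print(d._metrics)  # {} -- state stays isolated per instance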

heitorlessa added labels: feature-request, not-a-bug; removed label: bug on Oct 28, 2022
@heitorlessa
Contributor

We briefly discussed on Discord why this is expected behaviour, and I'll provide a proper answer as soon as I can.

I'll rephrase my last comment on that Discord thread as a proper response here and gather your ideas on what UX would be good to unlock this and other ISV use cases (e.g., multiple namespaces).


heitorlessa self-assigned this on Oct 28, 2022
heitorlessa added the documentation label and removed the triage label on Oct 28, 2022
@heitorlessa
Contributor

Confirming that we'll be working on this for this Friday's release. In that thread (Discord), we weren't able to find a better class name. Based on what we know, the most suitable candidate would be a flag in the existing Metrics class.

  • Example: Metrics(singleton=False)

At first glance, the only "side effect" is that the set_default_dimensions method would have the same effect as add_dimension in this case -- when singleton=True (the default), set_default_dimensions ensures these dimensions are automatically re-added whenever metrics are flushed/cleared.

If you have a better naming idea for a separate class altogether, please shout!

github-actions bot added the pending-release label on Nov 1, 2022
@heitorlessa
Contributor

heitorlessa commented Nov 1, 2022

Just merged the new class EphemeralMetrics to address this and similar use cases. I've also updated the docs to go into detail on the different metric ingestion mechanisms in CloudWatch that led us to have the default Metrics "singleton/borg" design.

We intend to release it this Friday by EOD.

As promised, here's the full answer to your original comments: why a new class is needed, why we're not making Metrics a non-"singleton" by default (a breaking change), and the common challenges that led us to this design.

Staged docs: https://awslabs.github.io/aws-lambda-powertools-python/develop/core/metrics/#metrics-isolation


Metrics isolation

You can use EphemeralMetrics class when looking to isolate multiple instances of metrics with distinct namespaces and/or dimensions.

NOTE: "A typical use case is multi-tenant applications, or emitting the same metrics for distinct applications."

from aws_lambda_powertools.metrics import EphemeralMetrics, MetricUnit
from aws_lambda_powertools.utilities.typing import LambdaContext

metrics = EphemeralMetrics()


@metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    metrics.add_metric(name="SuccessfulBooking", unit=MetricUnit.Count, value=1)

Differences between EphemeralMetrics and Metrics

EphemeralMetrics has only two differences while keeping nearly the exact same set of features:

Feature | Metrics | EphemeralMetrics
Share data across instances (metrics, dimensions, metadata, etc.) | Yes | -
Default dimensions that persist across Lambda invocations (metric flush) | Yes | -
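
As a rough sketch of what that first row means in practice (the namespace and service values here are illustrative):

from aws_lambda_powertools.metrics import EphemeralMetrics, Metrics, MetricUnit

# Metrics: state is shared, so a metric added via one instance is visible to the other
shared_a = Metrics(namespace="Demo", service="Api")
shared_b = Metrics(namespace="Demo", service="Api")
shared_a.add_metric(name="Count", unit=MetricUnit.Count, value=1)
# shared_b would now flush "Count" as well

# EphemeralMetrics: each instance keeps its own metrics, dimensions and metadata
isolated_a = EphemeralMetrics(namespace="Demo", service="Api")
isolated_b = EphemeralMetrics(namespace="Demo", service="Api")
isolated_a.add_metric(name="Count", unit=MetricUnit.Count, value=1)
# isolated_b still has nothing to flush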

"Why not changing the default Metrics behaviour to not share data across instances?"

This is an intentional design to prevent accidental data deduplication or data loss issues due to the CloudWatch EMF metric dimension constraint.

In CloudWatch, there are two metric ingestion mechanisms: EMF (async) and PutMetricData API (sync).

The former creates metrics asynchronously via CloudWatch Logs, and the latter uses a synchronous and more flexible ingestion API.

Pause for a key concept

CloudWatch considers a metric unique by a combination of metric name, metric namespace, and zero or more metric dimensions.

With EMF, metric dimensions are shared with any metrics you define. With PutMetricData API, you can set a list defining one or more metrics with distinct dimensions.
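
For contrast, here is a minimal boto3 sketch of the synchronous path, where each entry in MetricData can carry its own dimension set (the namespace, metric names, and values are illustrative):

import boto3

cloudwatch = boto3.client("cloudwatch")

# PutMetricData (sync): every metric in the list can declare distinct dimensions
cloudwatch.put_metric_data(
    Namespace="Demo/Backend",
    MetricData=[
        {
            "MetricName": "RequestCount",
            "Dimensions": [
                {"Name": "service", "Value": "booking"},
                {"Name": "tenant_id", "Value": "sample"},
            ],
            "Unit": "Count",
            "Value": 1,
        },
        {
            "MetricName": "Latency",
            "Dimensions": [
                {"Name": "service", "Value": "booking"},
                {"Name": "function_name", "Value": "sample"},
            ],
            "Unit": "Milliseconds",
            "Value": 184,
        },
    ],
)

With EMF, by contrast, every dimension defined in a blob applies to all metrics in that blob.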

This is a subtle yet important distinction. Imagine you had the following metrics to emit:

Metric Name | Dimension | Intent
SuccessfulBooking | service="booking", tenant_id="sample" | Application metric
IntegrationLatency | service="booking", function_name="sample" | Operational metric
ColdStart | service="booking", function_name="sample" | Operational metric

The tenant_id dimension could vary, leading to two common issues:

  1. The ColdStart metric will be created multiple times (N * the number of unique tenant_id dimension values), despite function_name being the same
  2. The IntegrationLatency metric will also be created multiple times, due to tenant_id as well as function_name (which may or may not be intentional)

These issues are exacerbated when you (A) create metric dimensions conditionally, or (B) create multiple metric instances throughout your code instead of reusing them (globals). Subsequent metric instances will have (or lack) different metric dimensions, resulting in different metrics and data points with the same name.

Intentional design to address these scenarios

On 1, when you enable the capture_cold_start_metric feature, we transparently create and flush an additional EMF JSON blob that is independent from your application metrics. This prevents data pollution.

On 2, you can use EphemeralMetrics to create an additional EMF JSON blob for your application metric (SuccessfulBooking). This ensures that IntegrationLatency operational metric data points aren't tied to dynamic dimension values like tenant_id.
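
A sketch of that split under the staged guidance, using illustrative namespace, service, and tenant_id values:

from aws_lambda_powertools.metrics import EphemeralMetrics, Metrics, MetricUnit
from aws_lambda_powertools.utilities.typing import LambdaContext

# Shared instance: operational metrics with stable dimensions only
metrics = Metrics(namespace="Demo", service="booking")

# Isolated instance: application metrics carrying dynamic dimensions such as tenant_id
tenant_metrics = EphemeralMetrics(namespace="Demo", service="booking")


@metrics.log_metrics(capture_cold_start_metric=True)
@tenant_metrics.log_metrics
def lambda_handler(event: dict, context: LambdaContext):
    tenant_metrics.add_dimension(name="tenant_id", value=event.get("tenant_id", "unknown"))
    tenant_metrics.add_metric(name="SuccessfulBooking", unit=MetricUnit.Count, value=1)

    metrics.add_metric(name="IntegrationLatency", unit=MetricUnit.Milliseconds, value=100)

Each decorator flushes its own EMF blob, so SuccessfulBooking can carry tenant_id without attaching it to IntegrationLatency or ColdStart.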

That is why Metrics shares data across instances by default, as that covers 80% of use cases and different personas using Powertools. This allows them to instantiate Metrics in multiple places throughout their code, be it a separate file, a middleware, or an abstraction that sets default dimensions.

heitorlessa linked a pull request on Nov 1, 2022 that will close this issue
@github-actions
Contributor

github-actions bot commented Nov 7, 2022

This is now released under version 2.2.0!

github-actions bot closed this as completed on Nov 7, 2022
github-actions bot removed the pending-release label on Nov 7, 2022
@heitorlessa
Contributor

@tibbe this is now available as part of v2.2.0 release (Lambda Layer v13): https://github.com/awslabs/aws-lambda-powertools-python/releases/tag/v2.2.0

Let me know if that doesn't address your original ask.

@tibbe
Contributor Author

tibbe commented Nov 8, 2022

@heitorlessa thanks! I've now integrated 2.2.0 in our backend and things seem to be working well so far!
