Writing EMF metrics significantly increases lambda execution time #303

Closed · tgip-work opened this issue Mar 3, 2021 · 13 comments

@tgip-work

I am using the metrics functionality of Lambda Powertools to generate custom CloudWatch metrics (by writing EMF records into CloudWatch Logs).
The overall execution time of our Lambda functions has increased significantly since we started using Powertools.
By "significantly" I mean approx. 50 to 100 ms per metric on a 128 MB Lambda function.

Expected Behavior

Adding metrics should only increase the execution time of a Lambda function by < 1ms per metric.

Current Behavior

Each metric added increases execution time by approx. 50 to 100 ms on a Lambda function running with 128 MB of RAM.

Possible Solution

I drilled down into the code and suspect that the schema validation is causing the performance issue. See [this code](https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/aws_lambda_powertools/metrics/base.py#L214).

Possible solutions:

  1. Offer option to disable/skip validation
  2. Precompile the schema and keep the compiled validator object in a global variable/scope so it can be reused - see https://horejsek.github.io/python-fastjsonschema/ (a sketch of this follows below)
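
A minimal sketch of option 2, assuming the validator is compiled once at module scope with fastjsonschema (the schema and function names here are illustrative, not the actual Powertools internals):

import fastjsonschema

# Illustrative schema; Powertools ships its own CloudWatch EMF schema internally.
CLOUDWATCH_EMF_SCHEMA = {"type": "object", "required": ["_aws"]}

# Compile once at import time so every warm invocation reuses the compiled
# validator instead of re-parsing the JSON Schema on each call.
validate_emf = fastjsonschema.compile(CLOUDWATCH_EMF_SCHEMA)

def serialize_metric_set(metric_set: dict) -> dict:
    # Raises fastjsonschema.JsonSchemaException if the payload is invalid
    validate_emf(metric_set)
    return metric_set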

Steps to Reproduce (for bugs)

from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit
metrics = Metrics(namespace="Company/Product", service="some-service")

@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event, context):
    metrics.add_metric(name="Ok", unit=MetricUnit.Count, value=1)
    metrics.add_metric(name="Errors", unit=MetricUnit.Count, value=1)

Environment

  • Powertools version used: 1.10.5
  • Packaging format (Layers, PyPi): PyPi
  • AWS Lambda function runtime: Python 3.6 and Python 3.8, each with 128 MB RAM
  • Debugging logs

2021-03-03T11:28:05.105+01:00 START RequestId: 20607f86-e159-4f0e-a89b-7298f2ddbbed Version: $LATEST
[INFO]	2021-03-03T10:28:05.105Z		Found credentials in environment variables.
2021-03-03 10:28:05,229 aws_lambda_powertools.metrics.metrics [DEBUG] Decorator called with parameters
2021-03-03 10:28:05,275 aws_lambda_powertools.metrics.base [DEBUG] Adding metric: RecordsInserted with defaultdict(<class 'list'>, {'Unit': 'Count', 'Value': [1.0]})
2021-03-03 10:28:05,278 aws_lambda_powertools.metrics.metrics [DEBUG] Adding cold start metric and function_name dimension
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] Adding metric: ColdStart with defaultdict(<class 'list'>, {'Unit': 'Count', 'Value': [1.0]})
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] Adding dimension: function_name:my-function
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] Adding dimension: service:my-service
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] {'details': 'Serializing metrics', 'metrics': {'ColdStart': defaultdict(<class 'list'>, {'Unit': 'Count', 'Value': [1.0]})}, 'dimensions': {'function_name': 'my-function', 'service': 'my-service'}}
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] Validating serialized metrics against CloudWatch EMF schema
2021-03-03T11:28:05.398+01:00
{
    "_aws": {
        "Timestamp": 1614767285279,
        "CloudWatchMetrics": [
            {
                "Namespace": "Company/Product",
                "Dimensions": [
                    [
                        "function_name",
                        "service"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "ColdStart",
                        "Unit": "Count"
                    }
                ]
            }
        ]
    },
    "function_name": "my-function",
    "service": "my-service",
    "ColdStart": [
        1
    ]
}
2021-03-03 10:28:05,398 aws_lambda_powertools.metrics.base [DEBUG] {'details': 'Serializing metrics', 'metrics': {'RecordsInserted': defaultdict(<class 'list'>, {'Unit': 'Count', 'Value': [1.0]})}, 'dimensions': {'service': 'my-service'}}
2021-03-03 10:28:05,398 aws_lambda_powertools.metrics.base [DEBUG] Validating serialized metrics against CloudWatch EMF schema
2021-03-03 10:28:05,498 aws_lambda_powertools.metrics.metrics [DEBUG] Clearing out existing metric set from memory
2021-03-03T11:28:05.498+01:00
{
    "_aws": {
        "Timestamp": 1614767285398,
        "CloudWatchMetrics": [
            {
                "Namespace": "Company/Product",
                "Dimensions": [
                    [
                        "service"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "RecordsInserted",
                        "Unit": "Count"
                    }
                ]
            }
        ]
    },
    "service": "my-service",
    "RecordsInserted": [
        1
    ]
}
END RequestId: 20607f86-e159-4f0e-a89b-7298f2ddbbed
tgip-work added the bug and triage labels on Mar 3, 2021
@heitorlessa
Contributor

Hey @tgip-work - thanks for the detailed issue.

Question: Do you see the behaviour consistently in non-cold start executions?

Last time I checked fastjsonschema, cold start validation and serialisation took roughly 25ms, and single-digit milliseconds thereafter (mostly JSON serialisation).

If that is happening even after cold start, however, then it's a problem I'd like to prioritise for our next release (a big one already).

Thanks

@tgip-work
Author

Hi @heitorlessa ,

thanks for your quick reply.
This also happens on non-cold-start executions; it does not seem to matter whether the invocation is cold or warm.

heitorlessa added the enhancement label and removed the bug and triage labels on Mar 3, 2021
@heitorlessa
Contributor

Understood, thank you! While I don't think we can get to <1 ms because of how JSON serialisation works in Python (ujson could, but that would be another dependency), we can surely decrease this time with a few perf improvements.

I'll post an update this week on how far I get. In the meantime, I've also finished an optimization of import times that will be available in the next release - especially if you don't use Tracer (75% improvement).

#301 (comment)

heitorlessa self-assigned this on Mar 3, 2021
@heitorlessa
Contributor

Initial perf test on my local machine (8 cores, 32 GB RAM):

Scenario: Add and serialize 99 metrics, one dimension, and a namespace

  • JSON Schema validation (as-is): 25ms
  • No validation: 100 microseconds
  • Native Python validation: TBW

Fair to say we might be able to reach <1ms indeed @tgip-work !! I'll experiment with native Python validation and keep you posted


import pytest

from aws_lambda_powertools import Metrics

# `timing` is a context-manager helper and METRICS_VALIDATION_SLA a threshold
# constant, both defined elsewhere in the test suite (not shown here).


@pytest.mark.perf
def test_metrics_validation_sla(namespace, dimension, ninety_nine_metrics):
    # GIVEN Metrics is initialized with a namespace and one dimension
    my_metrics = Metrics(namespace=namespace)
    my_metrics.add_dimension(**dimension)

    # WHEN we add 99 metrics
    for _metric in ninety_nine_metrics:
        my_metrics.add_metric(**_metric)

    # THEN serializing the metric set should complete within the SLA
    with timing() as t:
        my_metrics.serialize_metric_set()

    elapsed = t()
    if elapsed > METRICS_VALIDATION_SLA:
        pytest.fail(f"Meh, can't let @tgip-work down; at least below 10ms we should do it...: {elapsed}")

@tgip-work
Author

tgip-work commented Mar 3, 2021

Hi @heitorlessa : Yes, great work. This looks very promising! Happily looking forward to the next release :-)

@heitorlessa
Contributor

Turns out my test was wrong - I've converted from JSON Schema validation (~25ms) to native Python validation (500-700usec); however, what was missing in the test was JSON serialization.

Adding JSON serialization and printing the metrics makes us miss the goal of <1ms, as we go from microseconds to ~7ms in total.

IF we add ujson we easily reach <1ms, and other utilities that need JSON serialization, like the upcoming idempotency utility, would benefit from it too; HOWEVER, it'll increase the package size by 1M.

I honestly think it's a great tradeoff, and it even means customers could benefit from it too since it would be installed in their Lambda functions -- allow me to consult with the rest of the team to check on that.

Worst case, we'll go from ~25ms to <10ms without even having to disable validation.
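
For a rough sense of the serializer cost, a minimal comparison of the standard library json against ujson (the payload here is illustrative and timings vary per machine):

import json
import timeit

import ujson  # third-party dependency (~1 MB), the tradeoff discussed above

payload = {"_aws": {"Timestamp": 1614767285279, "CloudWatchMetrics": []}, "ColdStart": [1]}

# Time 10,000 serializations of the same payload with each library
std = timeit.timeit(lambda: json.dumps(payload), number=10_000)
fast = timeit.timeit(lambda: ujson.dumps(payload), number=10_000)
print(f"json: {std:.3f}s  ujson: {fast:.3f}s")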

Thanks

@tgip-work
Author

tgip-work commented Mar 4, 2021

Hi @heitorlessa,
Have you tried serializing into compact JSON: json.dumps(data, separators=(',', ':'))?
This reduces the length of the generated string and should improve performance a little bit.
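
For illustration, compact separators simply drop the whitespace that the default serializer adds after each ',' and ':' (standard library only; the payload is just an example):

import json

data = {"service": "my-service", "RecordsInserted": [1]}
default = json.dumps(data)                         # '{"service": "my-service", ...}'
compact = json.dumps(data, separators=(",", ":"))  # '{"service":"my-service",...}'
print(len(default), len(compact))  # the compact string is a few bytes shorter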

I am absolutely fine with the reduced execution time, especially if it is also possible to turn off validation, as you already implemented as an option in your changes.

@heitorlessa
Contributor

Actually no, lemme try that ;) thanks @tgip-work

@heitorlessa
Contributor

Yep, not much - it reduces the length, but the hog really is the str conversion. I'm going to stop chasing stars, commit what I've got, run it in a function with low compute to see how much it actually brings beyond perf tests, and swap in ujson to measure it loosely, since cProfile will lie here.

@heitorlessa
Contributor

hahaha this is getting to be the micro of micro-optimization... by removing static method calls and a dictionary lookup before validation I can get down to 0.9usec-1.08ms now.

I'll seriously stop now, commit this, and try it on the lowest function memory :)

@heitorlessa
Contributor

heitorlessa commented Mar 4, 2021

Alright, here are the results with the lowest memory (128 MB) using the same 99 metrics -- @tgip-work @nmoutschen @michaelbrewer

I'm pretty happy with it, and will not proceed with ujson, as on the underlying machines Lambda uses it doesn't seem to make much of a difference, tbh - it could be different in other scenarios. I'll adjust the test SLA to account for GitHub CI's slower machines, but I'd consider this solved... let me know otherwise @tgip-work :)

BEFORE - Current public Powertools

Fastest Lambda execution according to X-Ray: 54ms

[X-Ray trace screenshot]

All virtual users finished
Summary report @ 16:09:49(+0100) 2021-03-04
Scenarios launched: 10
Scenarios completed: 10
Requests completed: 2000
Mean response/sec: 48.23
Response time (msec):
min: 124.1
max: 1212.6
median: 189.9
p95: 244.5
p99: 385.2
Scenario counts:
0: 10 (100%)
Codes:
200: 1999
502: 1


AFTER -- New Powertools with Lazy Loading and this PR optimization

Fastest Lambda execution according to X-Ray: 2.2ms

[X-Ray trace screenshot]

All virtual users finished
Summary report @ 16:13:18(+0100) 2021-03-04
Scenarios launched: 10
Scenarios completed: 10
Requests completed: 2000
Mean response/sec: 100.5
Response time (msec):
min: 63.1
max: 1128.9
median: 81.1
p95: 132.5
p99: 209
Scenario counts:
0: 10 (100%)
Codes:
200: 2000


AFTER - New optimizations + ujson

Fastest Lambda execution according to X-Ray: 2.6ms

[X-Ray trace screenshot]

All virtual users finished
Summary report @ 16:28:12(+0100) 2021-03-04
Scenarios launched: 10
Scenarios completed: 10
Requests completed: 2000
Mean response/sec: 105.71
Response time (msec):
min: 62.3
max: 525.1
median: 81
p95: 134
p99: 211.5
Scenario counts:
0: 10 (100%)
Codes:
200: 2000


Details

Load test with artillery using REST API + Lambda (could be even faster with an HTTP API): artillery quick --count 10 -n 200 <endpoint>

Source code

from aws_lambda_powertools import Metrics

# https://awslabs.github.io/aws-lambda-powertools-python/#features
metrics = Metrics(namespace="perftest", service="perftest")


@metrics.log_metrics
def lambda_handler(event, context):
    for i in range(99):
        metrics.add_metric(name=f"metric_{i}", value=1, unit="Count")

    return {"statusCode": 200, "body": "hello world"}

heitorlessa added the pending-release label on Mar 4, 2021
@heitorlessa
Contributor

Merged :) It'll be available in the next release. Once PR #306 gets merged, we'll start drafting the release ;)

Thanks a lot for raising this with us @tgip-work and for the tip on json separator, much appreciated!

@heitorlessa
Contributor

This is now available in 1.11.0 - Literally just launched ;) 🎉

heitorlessa removed the pending-release label on Mar 5, 2021