Writing EMF metrics significantly increases lambda execution time #303

Closed · tgip-work opened this issue Mar 3, 2021 · 13 comments

@tgip-work

I am using the metrics functionality of Lambda Powertools to generate custom CloudWatch metrics (by writing EMF records into CloudWatch Logs).
The overall execution time of our Lambda functions has increased significantly since we started using Powertools.
By "significantly" I mean approx. 50 to 100 ms per metric on a 128 MB Lambda function.

Expected Behavior

Adding metrics should only increase the execution time of a Lambda function by < 1ms per metric.

Current Behavior

Each metric added increases execution time by approx. 50 to 100 ms on a Lambda function running with 128 MB of RAM.

Possible Solution

I drilled down into the code and suspect that the schema validation is causing the performance issue. See [this code](https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/aws_lambda_powertools/metrics/base.py#L214).

Possible solutions:

  1. Offer option to disable/skip validation
  2. Precompile the schema and keep the compiled validator object in a global variable/scope so it can be reused - see https://horejsek.github.io/python-fastjsonschema/ (a sketch of this follows below)
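
A minimal sketch of option 2, assuming the validator is compiled once at module scope with fastjsonschema (the schema and function names here are illustrative, not the actual Powertools internals):

import fastjsonschema

# Illustrative schema; Powertools ships its own CloudWatch EMF schema internally.
CLOUDWATCH_EMF_SCHEMA = {"type": "object", "required": ["_aws"]}

# Compile once at import time so every warm invocation reuses the compiled
# validator instead of re-parsing the JSON Schema on each call.
validate_emf = fastjsonschema.compile(CLOUDWATCH_EMF_SCHEMA)

def serialize_metric_set(metric_set: dict) -> dict:
    # Raises fastjsonschema.JsonSchemaException if the payload is invalid
    validate_emf(metric_set)
    return metric_set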

Steps to Reproduce (for bugs)

from aws_lambda_powertools import Metrics
from aws_lambda_powertools.metrics import MetricUnit
metrics = Metrics(namespace="Company/Product", service="some-service")

@metrics.log_metrics(capture_cold_start_metric=True)
def lambda_handler(event, context):
    metrics.add_metric(name="Ok", unit=MetricUnit.Count, value=1)
    metrics.add_metric(name="Errors", unit=MetricUnit.Count, value=1)

Environment

  • Powertools version used: 1.10.5
  • Packaging format (Layers, PyPi): PyPi
  • AWS Lambda function runtime: Python 3.6 and Python 3.8, each with 128 MB RAM
  • Debugging logs

2021-03-03T11:28:05.105+01:00 START RequestId: 20607f86-e159-4f0e-a89b-7298f2ddbbed Version: $LATEST
[INFO]	2021-03-03T10:28:05.105Z		Found credentials in environment variables.
2021-03-03 10:28:05,229 aws_lambda_powertools.metrics.metrics [DEBUG] Decorator called with parameters
2021-03-03 10:28:05,275 aws_lambda_powertools.metrics.base [DEBUG] Adding metric: RecordsInserted with defaultdict(<class 'list'>, {'Unit': 'Count', 'Value': [1.0]})
2021-03-03 10:28:05,278 aws_lambda_powertools.metrics.metrics [DEBUG] Adding cold start metric and function_name dimension
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] Adding metric: ColdStart with defaultdict(<class 'list'>, {'Unit': 'Count', 'Value': [1.0]})
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] Adding dimension: function_name:my-function
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] Adding dimension: service:my-service
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] {'details': 'Serializing metrics', 'metrics': {'ColdStart': defaultdict(<class 'list'>, {'Unit': 'Count', 'Value': [1.0]})}, 'dimensions': {'function_name': 'my-function', 'service': 'my-service'}}
2021-03-03 10:28:05,279 aws_lambda_powertools.metrics.base [DEBUG] Validating serialized metrics against CloudWatch EMF schema
2021-03-03T11:28:05.398+01:00
{
    "_aws": {
        "Timestamp": 1614767285279,
        "CloudWatchMetrics": [
            {
                "Namespace": "Company/Product",
                "Dimensions": [
                    [
                        "function_name",
                        "service"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "ColdStart",
                        "Unit": "Count"
                    }
                ]
            }
        ]
    },
    "function_name": "my-function",
    "service": "my-service",
    "ColdStart": [
        1
    ]
}
2021-03-03 10:28:05,398 aws_lambda_powertools.metrics.base [DEBUG] {'details': 'Serializing metrics', 'metrics': {'RecordsInserted': defaultdict(<class 'list'>, {'Unit': 'Count', 'Value': [1.0]})}, 'dimensions': {'service': 'my-service'}}
2021-03-03 10:28:05,398 aws_lambda_powertools.metrics.base [DEBUG] Validating serialized metrics against CloudWatch EMF schema
2021-03-03 10:28:05,498 aws_lambda_powertools.metrics.metrics [DEBUG] Clearing out existing metric set from memory
2021-03-03T11:28:05.498+01:00
{
    "_aws": {
        "Timestamp": 1614767285398,
        "CloudWatchMetrics": [
            {
                "Namespace": "Company/Product",
                "Dimensions": [
                    [
                        "service"
                    ]
                ],
                "Metrics": [
                    {
                        "Name": "RecordsInserted",
                        "Unit": "Count"
                    }
                ]
            }
        ]
    },
    "service": "my-service",
    "RecordsInserted": [
        1
    ]
}
END RequestId: 20607f86-e159-4f0e-a89b-7298f2ddbbed
tgip-work added the bug and triage labels on Mar 3, 2021
@heitorlessa
Contributor

Hey @tgip-work - thanks for the detailed issue.

Question: Do you see the behaviour consistently in non-cold start executions?

Last time I checked fastjsonschema, cold start validation and serialisation took roughly 25ms, and single-digit milliseconds thereafter (mostly JSON serialisation).

If that is happening even after cold start, however, then it's a problem I'd like to prioritise for our next release (a big one already).

Thanks

@tgip-work
Author

Hi @heitorlessa ,

thanks for your quick reply.
This also happens on non-cold-start executions; it does not seem to matter whether the invocation is cold or warm.

heitorlessa added the enhancement label and removed the bug and triage labels on Mar 3, 2021
@heitorlessa
Contributor

Understood, thank you! While I don't think we can get to <1 ms because of how JSON serialisation works in Python (ujson could, but that would be another dependency), we can surely decrease this time with a few perf improvements.

I'll post an update this week on how far I get. In the meantime, I've also finished an optimization of import times that will be available in the next release - especially if you don't use Tracer (75% improvement).

#301 (comment)

heitorlessa self-assigned this on Mar 3, 2021
@heitorlessa
Contributor

Initial perf test on my local machine (8 cores, 32 GB RAM):

Scenario: Add and serialize 99 metrics, one dimension, and a namespace

  • JSON Schema validation (as-is): 25ms
  • No validation: 100 microseconds
  • Native Python validation: TBW

Fair to say we might be able to reach <1ms indeed @tgip-work !! I'll experiment with native Python validation and keep you posted


import pytest

from aws_lambda_powertools import Metrics

# `timing` is a context-manager helper and METRICS_VALIDATION_SLA a threshold
# constant, both defined elsewhere in the test suite (not shown here).


@pytest.mark.perf
def test_metrics_validation_sla(namespace, dimension, ninety_nine_metrics):
    # GIVEN Metrics is initialized with a namespace and one dimension
    my_metrics = Metrics(namespace=namespace)
    my_metrics.add_dimension(**dimension)

    # WHEN we add 99 metrics
    for _metric in ninety_nine_metrics:
        my_metrics.add_metric(**_metric)

    # THEN serializing the metric set should complete within the SLA
    with timing() as t:
        my_metrics.serialize_metric_set()

    elapsed = t()
    if elapsed > METRICS_VALIDATION_SLA:
        pytest.fail(f"Meh, can't let @tgip-work down; at least below 10ms we should do it...: {elapsed}")

@tgip-work
Author

tgip-work commented Mar 3, 2021

Hi @heitorlessa : Yes, great work. This looks very promising! Happily looking forward to the next release :-)

@heitorlessa
Contributor

Turns out my test was wrong - I've converted from JSON Schema validation (~25ms) to native Python validation (500-700usec); however, what was missing in the test was JSON serialization.

Adding JSON serialization and printing the metrics makes us miss the goal of <1ms, as we go from microseconds to ~7ms in total.

IF we add ujson we easily reach <1ms, and other utilities that need JSON serialization, like the upcoming idempotency utility, would benefit from it too; HOWEVER, it'll increase the package size by 1M.

I honestly think it's a great tradeoff, and it even means customers could benefit from it too since it would be installed in their Lambda functions -- allow me to consult with the rest of the team to check on that.

Worst case, we'll go from ~25ms to <10ms without even having to disable validation.
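
For a rough sense of the serializer cost, a minimal comparison of the standard library json against ujson (the payload here is illustrative and timings vary per machine):

import json
import timeit

import ujson  # third-party dependency (~1 MB), the tradeoff discussed above

payload = {"_aws": {"Timestamp": 1614767285279, "CloudWatchMetrics": []}, "ColdStart": [1]}

# Time 10,000 serializations of the same payload with each library
std = timeit.timeit(lambda: json.dumps(payload), number=10_000)
fast = timeit.timeit(lambda: ujson.dumps(payload), number=10_000)
print(f"json: {std:.3f}s  ujson: {fast:.3f}s")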

Thanks

@tgip-work
Author

tgip-work commented Mar 4, 2021

Hi @heitorlessa,
Have you tried serializing into compact JSON: json.dumps(data, separators=(',', ':'))?
This reduces the length of the generated string and should improve performance a little bit.
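
For illustration, compact separators simply drop the whitespace that the default serializer adds after each ',' and ':' (standard library only; the payload is just an example):

import json

data = {"service": "my-service", "RecordsInserted": [1]}
default = json.dumps(data)                         # '{"service": "my-service", ...}'
compact = json.dumps(data, separators=(",", ":"))  # '{"service":"my-service",...}'
print(len(default), len(compact))  # the compact string is a few bytes shorter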

I am absolutely fine with the reduced execution time, especially if it is also possible to turn off validation, as you already implemented as an option in your changes.

@heitorlessa
Contributor

Actually no, lemme try that ;) thanks @tgip-work

@heitorlessa
Contributor

Yep, not much - it reduces the length, but the hog really is the str conversion. I'm going to stop chasing stars, commit what I've got, run it in a function with low compute to see how much it actually brings beyond perf tests, and swap in ujson to measure it loosely, since cProfile will lie here.

@heitorlessa
Contributor

hahaha this is getting to be the micro of micro-optimization... by removing static method calls and a dictionary lookup before validation I can get down to 0.9usec-1.08ms now.

I'll seriously stop now, commit this, and try it on the lowest function memory :)

@heitorlessa
Contributor

heitorlessa commented Mar 4, 2021

Alright, here are the results with the lowest memory (128 MB) using the same 99 metrics -- @tgip-work @nmoutschen @michaelbrewer

I'm pretty happy with it, and will not proceed with ujson, as on the underlying machines Lambda uses it doesn't seem to make much of a difference, tbh - it could be different in other scenarios. I'll adjust the test SLA to account for GitHub CI's slower machines, but I'd consider this solved... let me know otherwise @tgip-work :)

BEFORE - Current public Powertools

Fastest Lambda execution according to X-Ray: 54ms

[X-Ray trace screenshot]

All virtual users finished
Summary report @ 16:09:49(+0100) 2021-03-04
Scenarios launched: 10
Scenarios completed: 10
Requests completed: 2000
Mean response/sec: 48.23
Response time (msec):
min: 124.1
max: 1212.6
median: 189.9
p95: 244.5
p99: 385.2
Scenario counts:
0: 10 (100%)
Codes:
200: 1999
502: 1


AFTER -- New Powertools with Lazy Loading and this PR optimization

Fastest Lambda execution according to X-Ray: 2.2ms

[X-Ray trace screenshot]

All virtual users finished
Summary report @ 16:13:18(+0100) 2021-03-04
Scenarios launched: 10
Scenarios completed: 10
Requests completed: 2000
Mean response/sec: 100.5
Response time (msec):
min: 63.1
max: 1128.9
median: 81.1
p95: 132.5
p99: 209
Scenario counts:
0: 10 (100%)
Codes:
200: 2000


AFTER - New optimizations + ujson

Fastest Lambda execution according to X-Ray: 2.6ms

[X-Ray trace screenshot]

All virtual users finished
Summary report @ 16:28:12(+0100) 2021-03-04
Scenarios launched: 10
Scenarios completed: 10
Requests completed: 2000
Mean response/sec: 105.71
Response time (msec):
min: 62.3
max: 525.1
median: 81
p95: 134
p99: 211.5
Scenario counts:
0: 10 (100%)
Codes:
200: 2000


Details

Load test with artillery using REST API + Lambda (could be even faster with an HTTP API): artillery quick --count 10 -n 200 <endpoint>

Source code

from aws_lambda_powertools import Metrics

# https://awslabs.github.io/aws-lambda-powertools-python/#features
metrics = Metrics(namespace="perftest", service="perftest")


@metrics.log_metrics
def lambda_handler(event, context):
    for i in range(99):
        metrics.add_metric(name=f"metric_{i}", value=1, unit="Count")

    return {"statusCode": 200, "body": "hello world"}

heitorlessa added the pending-release label on Mar 4, 2021
@heitorlessa
Contributor

Merged :) It'll be available in the next release. Once PR #306 gets merged, we'll start drafting the release ;)

Thanks a lot for raising this with us @tgip-work and for the tip on json separator, much appreciated!

@heitorlessa
Contributor

This is now available in 1.11.0 - Literally just launched ;) 🎉

heitorlessa removed the pending-release label on Mar 5, 2021