
Feature request: Concurrency control (avoidance of duplicated runs) #2023

Closed
pmarko1711 opened this issue Mar 17, 2023 · 11 comments
Labels: feature-request

pmarko1711 (Contributor) commented Mar 17, 2023

Use case

Sometimes I need to limit the concurrency of a Lambda function at the level of its payload. Concurrent invocations are fine, but only if they have different payloads.

Background:

I already faced this need in two of my projects:

  • The solution I came up with in my first project was very similar to what's behind Powertools' Idempotency - I created a decorator that, before running my handler, placed a lock item (with a TTL set) in a DynamoDB table. The same decorator then released the lock after the handler finished successfully. In my case, I also wrote any caught exceptions and "dynamic" parameters to the table that I wanted to make available to subsequent invocations (with the same payload). This was before I was aware of Idempotency.
  • In my second project I needed the same kind of concurrency control, but without the need to preserve additional info in the underlying table. This time I considered using Idempotency, but I couldn't, as it always preserves the function's output in its persistent storage layer (e.g. DynamoDB). It also does not release the cache/lock on success. I understand why - it offers idempotency, not concurrency control. Still, it felt like I could (mis)use it for my use case as well. Not wanting to hack it, I created another decorator ...

Solution/User Experience

Implement a decorator similar to @idempotent that would:

  • prevent concurrent executions of a given Lambda handler with the same payload, making use of a lock placed in a DynamoDB table
  • release the lock on exceptions, but also when the decorated handler finishes successfully
  • to safeguard against Lambda timeouts, also set a TTL attribute (which should correspond to the Lambda function timeout)
  • in contrast to @idempotent, not cache the result and, again, release the lock on success

Similarly to @idempotent, the payload should be either the entire event or a subset of it.

Optionally

  • the decorator could (perhaps optionally) write any caught exceptions to the underlying DynamoDB table
  • functionality to preserve parameters in the underlying DynamoDB table, so that they are available to subsequent executions of the same Lambda function with the same payload

Alternative solutions

Modify the @idempotent decorator to allow switching off caching and releasing the lock on success. However, the decorator could then hardly be called idempotent ...

pmarko1711 added the feature-request and triage (Pending triage from maintainers) labels on Mar 17, 2023
heitorlessa added the need-customer-feedback (Requires more customer feedback before making or revisiting a decision) label and removed the triage label on Mar 20, 2023
heitorlessa (Contributor) commented

hey @pmarko1711 - thank you for a great feature request (and the details!). I have an initial question: have you considered Step Functions instead?

I've seen this a number of times where a semaphore pattern would work out great, but implemented in an upstream component instead - a workflow of sorts.

I can guess a few reasons why you'd want to do this in Lambda instead, so I'd love to hear more from you on why inside the Lambda runtime vs. an orchestration service like Step Functions.

Hope that makes sense

heitorlessa added the need-more-information (Pending information to continue) label on Mar 20, 2023
pmarko1711 (Contributor, Author) commented Mar 20, 2023

Hi @heitorlessa, many thanks for replying. To be honest, I was not aware of this pattern, so many thanks for sharing.

Indeed, it seems doable this way. Nevertheless, I still feel that in my case (limiting concurrency to one concurrent run of the same Lambda with the same payload, while allowing concurrent runs for different payloads) it's just much easier with a decorator. Once I have it defined and shared across my projects, activating such concurrency control is just a matter of adding one or two lines of code. And I don't need to understand state machines (although those are arguably easier than a custom decorator :) )

Btw, my current logic looks like this:

from collections import namedtuple

from aws_lambda_powertools.middleware_factory import lambda_handler_decorator

MutexSettings = namedtuple(
    "MutexSettings",
    (
        "dyndb_resource", "dyndb_table",
        "attribute_partition_key", "attribute_sort_key", "attribute_ttl", "ttl",
        "attribute_mutex",  # (optional) name of the mutex attribute
        "mutex_locked",  # (optional) value indicating a locked mutex
        "mutex_unlocked",  # (optional) value indicating an unlocked mutex
        "suffix_partition_key",  # (optional) suffix added to partition key values of mutex items
        "suffix_sort_key",  # (optional) suffix added to sort key values of mutex items
    ),
    defaults=(
        "mutex",
        "LOCKED",
        "UNLOCKED",
        "",
        "",
    ),  # defaults are applied to the rightmost parameters
)


@lambda_handler_decorator
def add_mutex(handler, event, context):
    # MUTEX_SETTINGS is a module-level MutexSettings namedtuple instance.
    # Here I also parse the event to get the partition_key and sort_key for
    # which I want to set the lock; this could be replaced with something
    # like an idempotency_key. (parse_lock_keys is my own helper, elided here.)
    partition_key, sort_key = parse_lock_keys(event)

    # Try locking (raises an exception if already locked).
    # My own function: a conditional update_item that also sets a TTL attribute.
    acquire_mutex(partition_key, sort_key, MUTEX_SETTINGS)

    # Run the handler and release the lock whether it succeeds or fails.
    try:
        return handler(event, context)
    finally:
        # my own function that releases the lock
        release_mutex(partition_key, sort_key, MUTEX_SETTINGS)


@add_mutex
def lambda_handler(event, context):
    pass

I can also provide my acquire_mutex and release_mutex (they are a bit larger), but they just do what the comments above indicate, taking into account the settings object passed; a simplified sketch follows below. I was also thinking of making the decorator parametrizable, but that needs to be done carefully.
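
For reference, a minimal sketch of what acquire_mutex and release_mutex could look like on top of boto3 conditional writes, assuming the MutexSettings fields above; the actual implementations were not shared in the thread, so names and details here are illustrative only:

import time


def acquire_mutex(partition_key, sort_key, settings):
    # Conditional write: succeeds only if no lock exists, the previous lock was
    # released, or the previous lock's TTL has passed (stale lock after a timeout).
    table = settings.dyndb_resource.Table(settings.dyndb_table)
    now = int(time.time())
    try:
        table.update_item(
            Key={
                settings.attribute_partition_key: partition_key + settings.suffix_partition_key,
                settings.attribute_sort_key: sort_key + settings.suffix_sort_key,
            },
            ConditionExpression=(
                "attribute_not_exists(#mutex) OR #mutex = :unlocked OR #ttl < :now"
            ),
            UpdateExpression="SET #mutex = :locked, #ttl = :expires",
            ExpressionAttributeNames={
                "#mutex": settings.attribute_mutex,
                "#ttl": settings.attribute_ttl,
            },
            ExpressionAttributeValues={
                ":locked": settings.mutex_locked,
                ":unlocked": settings.mutex_unlocked,
                ":now": now,
                ":expires": now + settings.ttl,  # TTL should match the Lambda timeout
            },
        )
    except settings.dyndb_resource.meta.client.exceptions.ConditionalCheckFailedException:
        raise RuntimeError("mutex already locked for this payload") from None


def release_mutex(partition_key, sort_key, settings):
    # Mark the lock as released so the next invocation with the same payload can run.
    table = settings.dyndb_resource.Table(settings.dyndb_table)
    table.update_item(
        Key={
            settings.attribute_partition_key: partition_key + settings.suffix_partition_key,
            settings.attribute_sort_key: sort_key + settings.suffix_sort_key,
        },
        UpdateExpression="SET #mutex = :unlocked",
        ExpressionAttributeNames={"#mutex": settings.attribute_mutex},
        ExpressionAttributeValues={":unlocked": settings.mutex_unlocked},
    )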

heitorlessa (Contributor) commented Mar 22, 2023

Activating such a concurrency control is just adding one or two lines of code. And I don't need to understand state machines

Yeah, I figured that was the reason. It's about trade-offs in the end. In simplistic terms: on one hand, this makes Lambda responsible for many things and it may go out of sync if that spans other functions; on the other, delegating to an orchestrator requires domain knowledge and brings additional failure modes to be aware of (but solves semantics, concurrent invocations, etc.).

I'd love to hear whether this is useful for other customers too (I'll ping on Discord), so we can prioritize it.

If anything, I'd love to find a better umbrella for this feature, so we could incrementally improve customers' operational posture - think retry mechanisms (a thin wrapper over tenacity), a circuit breaker, and whatnot. If we work on a Concurrency Control utility, I'm unsure whether these would fall under it too, to ease discoverability.

I don't have any answers for now, just thinking out loud.

heitorlessa removed the need-more-information label on Mar 22, 2023
heitorlessa (Contributor) commented

@pmarko1711 I'd love to have a call with you to learn a bit more about your mutex use case - I think there's a pattern across all the new feature requests, RFCs, and customer interviews we've done recently... a call will be quicker than writing a narrative.

If you're interested and free next week or later, please ping us an email and we'll reply with a link to schedule a meeting at your earliest convenience: [email protected]

Thank you!!

walmsles (Contributor) commented

🤔 Hmmm, this seems to be a FIFO-based requirement. Could FIFO SQS be used as a buffer here, with the message group ID customized to group like messages? That way you can also get safe batch execution within the Lambda concurrency, for efficiency.

Although it also depends on other constraints and requirements regarding the ordering of event processing: once you start using virtual FIFO queues, ordering outside of groups goes away.

FIFO SQS message group id docs

I think we should always look to the infrastructure to drive our code rather than adding more actual code.
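
For illustration, a sketch of what sending to a FIFO queue with a per-payload message group ID could look like with boto3 (the queue URL and the payload-hashing scheme are placeholders, not from the thread):

import hashlib
import json

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/example.fifo"  # placeholder


def send_payload(payload: dict) -> None:
    # Serialize deterministically so identical payloads hash identically.
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256(body.encode()).hexdigest()
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=body,
        MessageGroupId=digest,          # identical payloads land in the same group
        MessageDeduplicationId=digest,  # deduplication applies only within a 5-minute window
    )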

pmarko1711 (Contributor, Author) commented Mar 24, 2023

Well, I'm not sure one can rely on SQS FIFO queues alone:

  • (SQS FIFO) offers message deduplication, but only within 5-minute intervals. So if a message triggers a Lambda that runs for 7 minutes and the same message is sent to the queue after 6 minutes, the function will be triggered again.
  • It guarantees exactly-once processing, not exactly-once delivery. There are edge cases where a message can be delivered multiple times (and, in my case, thus trigger my Lambda function again).

https://ably.com/blog/sqs-fifo-queues-message-ordering-and-exactly-once-processing-guaranteed
https://www.spektor.dev/is-sqs-really-exactly-once/

It gets complex and I don't fully understand all of it, but my takeaway is that if I want exactly-once execution of my Lambda function (with a given payload), I need to implement it in the Lambda function itself.

(The group IDs would help me put the same payloads together, but I'd still be exposed to these problems.)

Please let me know if I'm wrong.

heitorlessa (Contributor) commented

Peter, quickly sharing the example we just discussed on the call, which satisfies processing the same payload after X time (e.g., a pre-aggregation use case) while limiting concurrency: https://github.com/awslabs/aws-lambda-powertools-python/blob/develop/tests/e2e/idempotency/handlers/ttl_cache_timeout_handler.py#L12

Contrary to DynamoDB TTL, which is primarily used to delete DynamoDB items to save on cost (plus other use cases like archiving), Idempotency's expires_after_seconds is used to signal when an existing COMPLETED record should be updated to EXPIRED and completely ignored (allowing the incoming transaction and recording its output).

As for the output, whatever your function returns, we'll store it for later. But in your case you can simply return an empty string to save on DynamoDB costs, since you're not going to use it either way -- https://awslabs.github.io/aws-lambda-powertools-python/2.10.0/utilities/idempotency/#idempotent_function-decorator
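
A minimal sketch of that approach (the table name and do_work are placeholders):

from aws_lambda_powertools.utilities.idempotency import (
    DynamoDBPersistenceLayer,
    IdempotencyConfig,
    idempotent,
)

persistence_layer = DynamoDBPersistenceLayer(table_name="IdempotencyTable")  # placeholder name
config = IdempotencyConfig(expires_after_seconds=10)  # record expires 10s after completion


def do_work(event):
    ...  # placeholder for the actual business logic


@idempotent(config=config, persistence_store=persistence_layer)
def lambda_handler(event, context):
    do_work(event)
    return ""  # empty output: nothing useful is cached, keeping DynamoDB costs minimal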

heitorlessa self-assigned this on Mar 30, 2023
heitorlessa removed the need-customer-feedback label on Mar 30, 2023
heitorlessa (Contributor) commented

It took me longer than expected, but I've just completed a documentation revamp of Idempotency to prevent this confusion from happening again - pending @leandrodamascena's review.

What did I change?

I've added a new piece of terminology that was missing - Idempotency Record. While it's internal to us, it's key information for later addressing the confusion between our expiration validation mechanism and DynamoDB TTL (which deletes the actual record within 48 hours of expiration).

I've added new sequence diagrams and revamped existing ones to explain how this feature works in different scenarios. For this issue, the key one is the new Expired idempotency records diagram - it walks through how we handle a record that is not yet removed from the persistence layer but is already expired based on timestamps.

I've created an explicit documentation banner to call out the differences between Idempotency Record expiration and DynamoDB TTL.

In tandem, this could've saved Peter a few days. I'll close this issue once the PR is merged, as this feature will no longer be needed (e.g., expires_after_seconds=10 with an empty string return "" has the same effect here).

Thank you @pmarko1711 for your patience and more importantly for taking the time to meet with me earlier this week to clarify this - hope you like this improvement.


New and revamped sequence diagrams depicting the Idempotency Record status [image]

New Idempotency Record terminology [image]

Addressing the confusion between Idempotency Record expiration and DynamoDB time-to-live (TTL) [image]

github-actions bot added the pending-release (Fix or implementation already in dev, waiting to be released) label on Mar 31, 2023
heitorlessa (Contributor) commented

The latest docs have now been rebuilt with these changes - closing. Please feel free to reopen if that didn't work, Peter!

Once again, thank you for jumping on a call to clarify the need and helping us improve the documentation for everyone.


github-actions bot commented Apr 3, 2023

⚠️ COMMENT VISIBILITY WARNING ⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue, feel free to do so.

github-actions bot commented Apr 7, 2023

This is now released under version 2.12.0!

github-actions bot removed the pending-release label on Apr 7, 2023