Add support for MemoryDB/ElasticCache/Redis as Idempotency backend #1181

Closed
whardier opened this issue Aug 21, 2021 · 15 comments · Fixed by #2567
Labels
feature-request feature request idempotency Idempotency utility

Comments

@whardier
Contributor

Updated topic to mention ElasticCache and general Redis as well.

https://aws.amazon.com/about-aws/whats-new/2021/08/amazon-memorydb-redis/

@gmcrocetti
Contributor

I'm making myself available to tackle this issue.

@heitorlessa
Contributor

heitorlessa commented Sep 4, 2021 via email

@danielloader

Isn't it meant to be the same client implementation for both?

@heitorlessa
Contributor

heitorlessa commented Sep 4, 2021 via email

@heitorlessa
Contributor

heitorlessa commented Sep 4, 2021 via email

@whardier
Contributor Author

whardier commented Sep 4, 2021

Redis in general would be fairly rad to have available via an extra requirement. I wasn't entirely sure if a broad implementation in aws-lambda-powertools for redis support would ultimately support both ElasticCache and MemoryDB.

@whardier whardier changed the title Add support for MemoryDB as Idemopotency backend Add support for MemoryDB/ElasticCache/Redis as Idemopotency backend Sep 4, 2021
@gmcrocetti
Contributor

@heitorlessa . Things have changed, won't have bandwidth to tackle this issue :/. I hope someone else will volunteer. Enjoy your time ! Obrigado 👍

@heitorlessa heitorlessa transferred this issue from aws-powertools/powertools-lambda-python Nov 13, 2021
@heitorlessa heitorlessa transferred this issue from aws-powertools/powertools-lambda Apr 28, 2022
@heitorlessa heitorlessa changed the title Add support for MemoryDB/ElasticCache/Redis as Idemopotency backend Add support for MemoryDB/ElasticCache/Redis as Idempotency backend Apr 28, 2022
@leandrodamascena
Contributor

leandrodamascena commented Jun 30, 2022

Hi all!

I was looking at some issues and opportunities to start contributing to Lambda PowerTools and came across this issue. This implementation looks very interesting, and adding a new backend to persist Idempotency data seems like a great idea. Even though DynamoDB is a great lightweight option, not everyone wants to create a new table and/or manage a new resource on AWS.

I'll try to summarize some information before I start writing code and tests.

1 - As everyone mentioned, MemoryDB is fully compatible with the Redis protocol and we can confirm that here https://docs.aws.amazon.com/memorydb/latest/devguide/memorydb-guide.pdf.pdf (page 1)

2 - @heitorlessa I'm not sure the https://github.com/Grokzen/redis-py-cluster library is a good choice at this point. It might have been a good choice when this thread started, but its maintainers now suggest migrating to the official RedisLabs library (https://github.com/redis/redis-py)

3 - The official library supports connection and operations on a single node or cluster. https://github.com/redis/redis-py#cluster-mode

4 - Even though the Idempotency utility will write and read only a few keys in Redis, a performance test is warranted to see whether it has an impact anywhere. I'll take care of that and share the results as I write the code.

5 - I created a new class to test how we can implement this feature, and it is already saving and getting keys from Redis. Of course it's missing a lot of things like tests, parameters, code optimization, comments and other things, but I think it can be a start and I can begin working on this.

image
image
image
image

6 - I will try to cover most scenarios in the first version.

I believe this project is frozen until next month, so there will be plenty of time to write code and test.

Thanks for reading and suggestions are welcome.

@heitorlessa
Contributor

Quick update: @Vandita2020 is working on this

@leandrodamascena
Contributor

leandrodamascena commented Jan 29, 2023

Hey all! Here is an update on the work that @Vandita2020 and I are doing, so that everyone can follow along and give feedback.

  • For this implementation, we are using the official Redis Library - https://pypi.org/project/redis/

  • Vandita created a code base and together we are modifying it to make it ready to open PR.

  • Redis supports three types of deployment architecture: Standalone, Cluster and Sentinel. Each one has its own characteristics and restrictions, which we won't detail here, but they matter a great deal when deciding which connection types to support. Currently the code is ready to support Standalone and Cluster. Adding Sentinel brings some complexity in how to initiate the connection and select the host, so we plan to add it at the end.

  • Redis supports several types of keys, including a HASH type key. We chose the HASH type key because it supports a collection of field value pairs. This key was chosen due to the characteristics of the Idempotency utility, where we need to have fields such as: status, data_response, in_progress.
    image

  • Redis supports database indexes, which act as a kind of namespace that keeps stored data separate. Our code already supports this.

  • Redis supports User ACL and we've added this support as well.

  • Below is a list of what we've done, what's missing and bugs. We already intend to open the PR as a draft this week so that everyone has access to the work done.

  • Support for Standalone Redis

  • Support for Redis Cluster

  • Support for Redis Sentinel

  • User ACL support

  • Database index support

  • Add, update, get and delete idempotency record

  • Tests

  • Documentation

  • Fix TTL logic

  • Review the get_record logic - It uses logic that only works with DynamoDB, but fails with everything else.

  • Improve docstring

We're making progress and hope to have this code ready for merge soon.

@leandrodamascena
Contributor

I've been writing code to connect to Redis Sentinel and it changes parameters and options a lot, so I thought of a new UX to make everything simpler. The way it is programmed now is:

from aws_lambda_powertools.utilities.idempotency import (
       idempotent,
       RedisCachePersistenceLayer,
       IdempotencyConfig
)

persistence_layer = RedisCachePersistenceLayer(host="192.168.68.112", port="6379", user="xxx", password="xxxx", db_index=0, static_pk_value="test",...)

A more readable way to create a connection would be:

Standalone

from aws_lambda_powertools.utilities.idempotency.redis.connection import RedisStandalone
from aws_lambda_powertools.utilities.idempotency import RedisCachePersistenceLayer

conn_config = RedisStandalone(host.. pass.. user)
persistence_layer = RedisCachePersistenceLayer(connection=conn_config, ...)

Cluster

from aws_lambda_powertools.utilities.idempotency.redis.connection import RedisCluster
from aws_lambda_powertools.utilities.idempotency import RedisCachePersistenceLayer

conn_config = RedisCluster(host... startup_nodes)
persistence_layer = RedisCachePersistenceLayer(connection=conn_config, ...)

Sentinel

from aws_lambda_powertools.utilities.idempotency.redis.connection import RedisSentinel
from aws_lambda_powertools.utilities.idempotency import RedisCachePersistenceLayer

conn_config = RedisSentinel(sentinels..socket…)
persistence_layer = RedisCachePersistenceLayer(connection=conn_config, ....)

Makes sense? I would like to hear feedback on this.

@heitorlessa
Contributor

We're reviewing the last edge case on concurrency and docs tomorrow - hoping to get this released next week

Contributor

⚠️COMMENT VISIBILITY WARNING⚠️

This issue is now closed. Please be mindful that future comments are hard for our team to see.

If you need more assistance, please either tag a team member or open a new issue that references this one.

If you wish to keep having a conversation with other community members under this issue feel free to do so.

@github-actions github-actions bot added the pending-release Fix or implementation already in dev waiting to be released label Jan 10, 2024
Contributor

This is now released under 2.32.0 version!

@github-actions github-actions bot removed the pending-release Fix or implementation already in dev waiting to be released label Jan 19, 2024
@leandrodamascena leandrodamascena moved this from Coming soon to Shipped in Powertools for AWS Lambda (Python) Jan 23, 2024
@leandrodamascena
Contributor

To enable other runtimes and customers to benefit from the research and implementation carried out in this pull request, I have provided the Redis implementation text below.

Many thanks to @roger-zhangg for the partnership and deep dive into this extensive and fantastic work, which undoubtedly raised the bar for this project!

API Design

We kept the same user experience when switching from DynamoDB to Redis. This is very important because one of the core principles of Powertools is to ensure the developer experience is as seamless as possible.

By keeping the persistence abstraction layer interchangeable between DynamoDB and Redis, we enabled a smooth transition that required minimal code changes for developers. The interfaces remained consistent, reducing friction when migrating the data storage technology.

image

Connection

In the initial version, we provided a dedicated Connection Class to assist our customers in creating Redis connections. This class wrapped the Redis client, both standalone and cluster, enabling customers to provide their Redis connection details (host, port, passwords) for establishing connections. Once the connection was established, this class could be passed to the Idempotency Layer for use. However, this design had a few significant drawbacks. Firstly, it was challenging for this design to support Redis sentinel connections, as sentinel connections are set up differently from standalone and cluster connections. Secondly, the default Redis client had more than 40 parameters. In our connection design, we chose to support only the most commonly used ones, passing all other parameters using **kwargs directly to the wrapped Redis Client. This could result in a less-than-optimal experience for our customers, as they would need to figure out the parameter names without IDE typing hints when passing parameters using **kwargs.

Keeping these considerations in mind, we drafted the second version of the Redis Connection. We made a few changes to the logic in the Idempotency Layer class so that it can now accept an established Redis client. With this design, customers can pass in any Redis client they prefer, as long as it adheres to the schema defined in the protocol class. Additionally, using the original Redis clients allows customers to leverage their prior experience with Redis and easily transfer and adapt their existing code. However, after some discussions, we concluded that this design may not be user-friendly for individuals without prior Redis experience. Therefore, we believe it's still beneficial to provide a helper class for creating connections to assist such users.

In the third and final design, we have opted to implement a helper connection class that assists customers in creating Redis connections with only the most commonly used Redis parameters (host, port, username, password, db_index, url, ssl). Simultaneously, we enable our customers to bring their own Redis connection if they prefer. One common use case, for example, is when customers want to use Redis with their certificates. This added flexibility allows them to establish secure Redis connections using their custom certificates while benefiting from the simplified connection setup provided by our helper class.

Example

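from redis import Redis

from aws_lambda_powertools.utilities.idempotency.persistence.redis import (
    RedisCachePersistenceLayer,
)

# Placeholder certificate paths (hypothetical); replace with your own files.
ssl_certfile = "/path/to/client.crt"
ssl_keyfile = "/path/to/client.key"
ssl_ca_certs = "/path/to/ca.crt"

# Bring-your-own client: a TLS connection that uses custom certificates.
client = Redis(
    host="host",
    port=6379,
    ssl=True,
    ssl_certfile=ssl_certfile,
    ssl_keyfile=ssl_keyfile,
    ssl_cert_reqs="required",
    ssl_ca_certs=ssl_ca_certs,
)

# Pass the established client to the persistence layer.
persistence_layer = RedisCachePersistenceLayer(client=client)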

Orphan Records

Each idempotency record includes attributes such as expire_time and inprogress_expire_time. A record should be deleted automatically when the current time reaches expire_time while its status is "completed", or when it reaches inprogress_expire_time while its status is "in_progress".

However, due to factors like Lambda handler timeouts, exceptions, or potential Redis expiration issues, there may be instances where idempotency records persist in Redis even after the current time has exceeded inprogress_expire_time while the record status is still "in_progress," or the current time has surpassed expire_time while the record status is "completed." We refer to these invalid records as "Orphan Records."

It's important to note that any method used to clean up these orphan records must be executed with caution to avoid race conditions. The following two sections elaborate on this issue.
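As a minimal sketch of the orphan definition above, the check below classifies a record from its status and timestamps. The `IdempotencyRecord` dataclass and `is_orphan` function are hypothetical names for illustration, and the assumption that both timestamps are epoch seconds is mine, not something stated in this write-up.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdempotencyRecord:
    # Hypothetical record shape; field names follow the text above.
    status: str  # "in_progress" or "completed"
    expire_time: float  # assumed to be epoch seconds
    inprogress_expire_time: Optional[float] = None  # assumed epoch seconds


def is_orphan(record: IdempotencyRecord, now: Optional[float] = None) -> bool:
    """Return True if the record should already have expired but still exists."""
    now = time.time() if now is None else now
    if record.status == "in_progress":
        # Still "in_progress" after the in-progress deadline has passed.
        return (
            record.inprogress_expire_time is not None
            and now > record.inprogress_expire_time
        )
    if record.status == "completed":
        # Still present after the overall expiry has passed.
        return now > record.expire_time
    return False
```

A caller would run this check only after reading an existing record, and would then need one of the race-safe repair strategies discussed in the next sections.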

Redis HSET vs SET

In the idempotency workflow, we need to store idempotency records with multiple attributes in Redis. This can be achieved by using HSET with the idempotency key followed by attributes. Alternatively, we can encapsulate the idempotency key and all its attributes into a JSON format and employ SET to store the entire JSON structure.

In the initial design, we employed HSET to set idempotency records. HSET offers several advantages over SET: hash lookups are typically faster, and HSET allows us to set multiple attributes using the same hash key. This aligns perfectly with the requirements of idempotency records, as we need to store the idempotency key, expire_time, in_progress_expire_time, status, and payload under a single key. Thus, the utilization of HSET for storing idempotency records appears to be an optimal choice for this project.

However, there are two major drawbacks using this method.

  • Firstly, SET allows us to set an expiration time when creating the record, but HSET doesn't offer this capability. To set an expiration time, we must make an additional Redis call using EXPIRE for the respective key after using HSET. This reduces performance, since sending two Redis calls doubles the Round Trip Time (RTT) to Redis. It also introduces the potential for orphan records. For example, if the Lambda handler times out immediately after creating the Idempotency record but before setting the expiration time, the record becomes an orphan record because it won't expire automatically in Redis. One way to mitigate this is to use a Redis pipeline to combine HSET and EXPIRE into a single Redis request. However, this workaround also increases the complexity of the code.

  • Secondly, SET allows us to use the "nx" (non-exist) parameter to write only on keys that do not exist. However, HSET does not support this "nx" parameter. "nx" plays a crucial role in preventing race conditions, which will be explained in the following section on Redis Race Conditions.

During our experiments, we used pipelines with HSET to reduce the Round Trip Time (RTT). However, pipelines introduce complexity that can make code writing and testing more challenging:

  • Batch Logic - Pipelines do not execute commands immediately, requiring the inclusion of batch logic to group related operations and ensure they are executed together. This complexity adds to the application code.
  • Error Handling - Errors occur asynchronously after pipelined commands are sent, making error handling more challenging compared to synchronous execution.
  • Testing Complexity - Writing isolated unit tests that assert on command outcomes becomes more challenging, as test code must handle asynchronous results. Additionally, race conditions can occur more easily.
  • Debugging Challenges - In the event of a pipeline failure, it becomes more challenging to identify which specific command in the batch caused the issues. Extra logging must be implemented.
  • Transactional Integrity - Pipelines do not offer transactional integrity. If commands require transactions, additional logic is necessary to manually implement this on top of pipelining.
  • Compatibility - Some versions of Redis Cluster don't support pipelines in transactional mode, such as pipe = r.pipeline(transaction=True)

Due to HSET lacking support for "EX" (expiration time) and "NX" (non-exist), and the challenges associated with pipelines, we have made the decision to transition to using SET in our final design.
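The sketch below illustrates the SET-based approach: the record is serialized to JSON and written with expiry and "only if absent" semantics in one call. It is not the library's implementation. `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server; its `set(name, value, ex=..., nx=...)` signature mirrors the keyword names redis-py uses, and `save_in_progress` and the record fields are illustrative assumptions.

```python
import json
import time
from typing import Dict, Optional, Tuple


class FakeRedis:
    """Hypothetical in-memory stand-in for the `set`/`get` subset used here."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def _purge(self, name: str) -> None:
        # Drop the key if its expiry deadline has passed.
        entry = self._data.get(name)
        if entry and entry[1] is not None and time.monotonic() >= entry[1]:
            del self._data[name]

    def set(self, name: str, value: str, ex: Optional[int] = None, nx: bool = False):
        self._purge(name)
        if nx and name in self._data:
            return None  # redis-py returns None when NX blocks the write
        deadline = time.monotonic() + ex if ex is not None else None
        self._data[name] = (value, deadline)
        return True

    def get(self, name: str) -> Optional[str]:
        self._purge(name)
        entry = self._data.get(name)
        return entry[0] if entry else None


def save_in_progress(client, key: str, ttl_seconds: int) -> bool:
    """Write the record, its TTL, and the 'only if absent' guard in one call."""
    record = {
        "status": "in_progress",
        "expire_time": int(time.time()) + ttl_seconds,
    }
    return bool(client.set(key, json.dumps(record), ex=ttl_seconds, nx=True))
```

Because the value, the expiry, and the existence check travel in a single command, there is no window in which a record exists without a TTL. Note that a real redis-py client returns bytes from `get` unless it was created with `decode_responses=True`.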

Redis Race condition

There are two potential race conditions in the Redis Idempotency workflow. Although the probability of these race conditions occurring is low, if not addressed effectively, certain payloads may bypass the idempotency layer. This could lead to the underlying Lambda handler executing more than once for identical payloads, which is an undesirable scenario for our customers when utilizing our idempotency layer.

The first potential race condition occurs when two Lambda handlers simultaneously perform SET/HSET operations without using 'nx' on a non-existent record. Consider a scenario where, at the same time, two Lambda handlers with the same payload both use GET or HGET to verify whether a particular record is empty. Because they both assume the record is empty, they both decide to write to the record using SET/HSET, resulting in the record being updated twice. Eventually, both Lambda handlers with the same payload pass the idempotency check and proceed to the execution stage. In this case, the idempotency layer fails to prevent the handler from running twice on the same payload. See the diagram below.

image

This race condition can be resolved by adding nx=True when using SET. Once we apply nx=True to SET, even if two handlers execute SET at the exact same time, only one of them will succeed, and the other handler will recognize that the record exists and return accordingly.
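To make the effect of nx=True concrete, here is a small simulation in which several threads play the role of concurrent Lambda handlers racing to claim the same key. `FakeRedis`, `try_claim`, and `race` are hypothetical names; the internal lock in the fake is an assumption that models Redis executing each individual command atomically.

```python
import threading
from typing import Dict, List


class FakeRedis:
    """Hypothetical in-memory client; the lock models per-command atomicity."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def set(self, name: str, value: str, nx: bool = False):
        with self._lock:
            if nx and name in self._data:
                return None  # key already exists, write refused
            self._data[name] = value
            return True


def try_claim(client, key, results: List[bool], index: int, barrier) -> None:
    barrier.wait()  # release all "handlers" at the same moment
    results[index] = bool(client.set(key, "in_progress", nx=True))


def race(n_handlers: int = 8) -> List[bool]:
    """Run n_handlers concurrent claims and report which ones succeeded."""
    client = FakeRedis()
    results = [False] * n_handlers
    barrier = threading.Barrier(n_handlers)
    threads = [
        threading.Thread(
            target=try_claim,
            args=(client, "idempotency#payload-hash", results, i, barrier),
        )
        for i in range(n_handlers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

However the threads interleave, exactly one entry in the returned list is True; every other handler sees the failed write and can treat the record as already existing.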

The second potential race condition is similar to the first one but more complicated. It occurs when two Lambda handlers try to fix the same orphan record at the same time. In our idempotency workflow, if a Lambda handler encounters an existing idempotency record and identifies it as a corrupted or orphaned record, it proceeds to overwrite it with a valid record. However, because we are overwriting an existing record in Redis, we cannot use the nx=True flag in SET. Consequently, two SET operations may be executed simultaneously by two different Lambda handlers, resulting in the record being updated twice and both Lambda handlers advancing to the execution stage. This is a situation that the idempotency mechanism should prevent. One solution is to use SET with nx on a separate key that serves as a lock, ensuring that only the handler that successfully acquires the lock proceeds to update the orphan record. The diagrams below show each scenario:

without lock

image

with lock

image
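The lock-based repair described above could look roughly like the following sketch. The `:lock` key suffix, the `overwrite_orphan` function, the lock TTL, and the choice to let the lock expire on its own rather than delete it are all assumptions for illustration. `FakeRedis` is a hypothetical in-memory stand-in that accepts `ex` but does not enforce it.

```python
from typing import Dict, Optional


class FakeRedis:
    """Hypothetical in-memory client; `ex` is accepted but not enforced here."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, name: str, value: str, ex: Optional[int] = None, nx: bool = False):
        if nx and name in self._data:
            return None  # NX refused: the key already exists
        self._data[name] = value
        return True

    def get(self, name: str) -> Optional[str]:
        return self._data.get(name)


def overwrite_orphan(
    client, key: str, new_value: str, ttl_seconds: int, lock_ttl_seconds: int = 10
) -> bool:
    """Replace an orphan record only if this caller wins the lock."""
    lock_key = f"{key}:lock"  # assumed naming scheme for the lock key
    # SET NX on the lock key: only one concurrent handler can create it.
    if not client.set(lock_key, "1", ex=lock_ttl_seconds, nx=True):
        return False  # another handler is already repairing this record
    # Plain SET (no NX) is safe here because the lock serializes writers.
    client.set(key, new_value, ex=ttl_seconds)
    return True
```

In this sketch the lock is left to expire through its TTL, so a second handler arriving moments later cannot immediately repeat the overwrite. The handler that receives False should behave as if the record already exists.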

Tests

We had challenges while writing tests for the Redis interface and functionality, primarily because testing the Protocol without establishing a real connection is a little hard. Our initial approach was to create integration tests by running Redis locally in Docker containers, but spinning up container environments locally has some downsides, such as:

  • Slow - Launching containers locally introduced overhead and slowed down test execution.
  • Fragile - Tests became vulnerable to failures if containers didn't start correctly or if the local environment wasn't properly configured to support containers.
  • Resource intensive - Containers consume substantial amounts of RAM and CPU on developers' local machines.
  • Docker Registry Issues - Occasionally, downloading a container from Docker registries encountered failures.

Our solution was to shift towards more functional testing by injecting a fake "Redis client" class instead of using a real Redis connection. This approach offers the following advantages:

  • Speed - There is no spin-up time, resulting in significantly faster test execution.
  • Isolation - Tests concentrate on the application code logic without relying on external dependencies.
  • Lightweight - There's no requirement to install and configure containers on local machines.
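The fake-client approach can be sketched as follows. Everything here is illustrative: `RedisClientProtocol` is a guess at what a minimal protocol class might require, `ToyPersistenceLayer` is a deliberately tiny stand-in rather than the real persistence layer, and `FakeRedisClient` records calls so a test can assert on the arguments that were sent.

```python
from typing import Dict, List, Optional, Protocol, Tuple


class RedisClientProtocol(Protocol):
    """Assumed minimal surface a client must offer to be injectable."""

    def get(self, name: str) -> Optional[str]: ...

    def set(
        self, name: str, value: str, ex: Optional[int] = None, nx: bool = False
    ) -> Optional[bool]: ...

    def delete(self, *names: str) -> int: ...


class FakeRedisClient:
    """In-memory fake that satisfies the protocol and records `set` calls."""

    def __init__(self) -> None:
        self.store: Dict[str, str] = {}
        self.calls: List[Tuple[str, str, Optional[int], bool]] = []

    def get(self, name: str) -> Optional[str]:
        return self.store.get(name)

    def set(self, name, value, ex=None, nx=False):
        self.calls.append(("set", name, ex, nx))
        if nx and name in self.store:
            return None
        self.store[name] = value
        return True

    def delete(self, *names: str) -> int:
        # Return the number of keys that were actually removed.
        return sum(1 for n in names if self.store.pop(n, None) is not None)


class ToyPersistenceLayer:
    """Hypothetical layer under test; it only knows the protocol."""

    def __init__(self, client: RedisClientProtocol, ttl_seconds: int = 3600) -> None:
        self.client = client
        self.ttl_seconds = ttl_seconds

    def claim(self, key: str) -> bool:
        return bool(self.client.set(key, "in_progress", ex=self.ttl_seconds, nx=True))

    def release(self, key: str) -> None:
        self.client.delete(key)


def test_second_claim_is_rejected() -> None:
    fake = FakeRedisClient()
    layer = ToyPersistenceLayer(client=fake, ttl_seconds=60)
    assert layer.claim("order-42") is True
    assert layer.claim("order-42") is False
    assert fake.calls[0] == ("set", "order-42", 60, True)
    layer.release("order-42")
    assert layer.claim("order-42") is True
```

Because the layer depends only on the protocol, the same test body could later be pointed at a real client in an integration run without changing the code under test.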

Thanks

Projects
Status: Shipped
7 participants