
Feature request: support for Redis in Idempotency #3183


Open · 1 of 2 tasks

dreamorosi opened this issue Oct 10, 2024 · 12 comments
Labels: discussing (The issue needs to be discussed, elaborated, or refined), feature-request (This item refers to a feature request for an existing or new utility), idempotency (This item relates to the Idempotency Utility), need-customer-feedback (Requires more customer feedback before making or revisiting a decision)

Comments

@dreamorosi (Contributor)

Use case

The Idempotency utility currently supports only DynamoDB as a persistence layer.

With AWS announcing Amazon ElastiCache for Valkey, we would like to understand if there's demand for the Idempotency utility in Powertools for AWS Lambda (TypeScript) supporting Redis-compatible persistence layers.

Important

We are opening this issue to gauge demand for this feature. If you're interested please leave a 👍 under this issue. If you'd like, consider also leaving a comment with your use case. If you are not comfortable sharing details in public, you can also do so by emailing us at [email protected] with your work email.

Solution/User Experience

From a customer perspective, using ElastiCache as a persistence layer should be as transparent as possible: the DX should look the same as today, except that instead of instantiating a DynamoDBPersistenceLayer, you'd instantiate an ElastiCachePersistenceLayer (name TBD).

Below is a high-level example of how it would look:

import { randomUUID } from 'node:crypto';
import { makeIdempotent } from '@aws-lambda-powertools/idempotency';
import { ElastiCachePersistenceLayer } from '@aws-lambda-powertools/idempotency/elasticache';
import type { Context } from 'aws-lambda';
import type { Request, Response, SubscriptionResult } from './types.js';

const persistenceStore = new ElastiCachePersistenceLayer({
  url: 'redis://<cache-name>.serverless.<region-id>.cache.amazonaws.com:6379',
  // password: presignedUrl, (optional) - default is RBAC with serverless mode
  // username: 'default', (optional) - default is RBAC with serverless mode
  // clientConfig: {} (optional) - object to configure underlying client
});

const createSubscriptionPayment = async (
  event: Request
): Promise<SubscriptionResult> => {
  // ... create payment
  return {
    id: randomUUID(),
    productId: event.productId,
  };
};

export const handler = makeIdempotent(
  async (event: Request, _context: Context): Promise<Response> => {
    try {
      const payment = await createSubscriptionPayment(event);

      return {
        paymentId: payment.id,
        message: 'success',
        statusCode: 200,
      };
    } catch (error) {
      throw new Error('Error creating payment');
    }
  },
  {
    persistenceStore,
  }
);

Note

The API shown above is for illustration purposes only and might differ in the final implementation. However, we welcome any comments and feedback.

Alternative solutions

The feature is already available in Powertools for AWS Lambda (Python), so we should use that as a reference.

Acknowledgment

Future readers

Please react with 👍 and your use case to help us understand customer demand.

@dreamorosi added the discussing, feature-request, idempotency, and need-customer-feedback labels on Oct 10, 2024
@dreamorosi pinned this issue on Oct 10, 2024
@dreamorosi changed the title from "Feedback wanted: support for ElastiCache in Idempotency" to "Feedback wanted: support for Redis in Idempotency" on Oct 10, 2024
@dreamorosi changed the title from "Feedback wanted: support for Redis in Idempotency" to "Feature request: support for Redis in Idempotency" on Feb 10, 2025
@arnabrahman (Contributor)

arnabrahman commented Apr 9, 2025

Sounds interesting. Since Powertools for Python has it, the TypeScript version should have this feature too. I'm interested in contributing @dreamorosi

@dreamorosi (Contributor, Author)

Hi @arnabrahman, nice to see you again here - thank you for offering to help!

I have to admit that I am not very familiar with Redis myself, so please bear with me.

Based on my understanding, this is where the Python implementation is.

There are a few items I'd like to discuss/mention before we move on to the implementation:

  • in terms of naming, we should probably name the persistence layer the same as theirs, so RedisCachePersistenceLayer instead of what I had put in the code examples above
  • we need to choose a client library that we'll use under the hood - I'd like us to evaluate options, so if you could propose something it'd be great. We're looking for something 1/ widely used, 2/ maintained and preferably with defined ownership/release processes, 3/ optimized for Node.js, 4/ performant & not too bloated
  • the Python implementation has a concept of a lock here that we don't have in the DynamoDB persistence layer - I'm unsure why that is
  • the Python implementation has a class called RedisClientProtocol here - I'm unsure if we need it as well

I have assigned the issue to you, please take a look at the reference implementation if you haven't and let us know if you have any questions. Once we have addressed the points above I think we can start the implementation.

Also, if you think it's useful before or during the implementation, we're happy to jump on a call and discuss this issue at any point - especially with @leandrodamascena, who worked on the Python implementation.

Thanks again, this is exciting!

@dreamorosi moved this issue from Ideas to Backlog in Powertools for AWS Lambda (TypeScript) on Apr 9, 2025
@leandrodamascena (Contributor)

leandrodamascena commented Apr 9, 2025

Hi @arnabrahman and @dreamorosi! It's super nice that we'll be adding support for Redis in TS. Let me share some ideas/challenges I ran into while implementing the Python version. I may repeat some points Andrea has already shared.

1/ We allow customers to create a new instance of RedisCachePersistenceLayer by passing basic (most used) parameters like username, password, db_index and others, but it is true that Redis has some complex connection configurations (ssl, retries, timeout and others) and we cannot predict all of them, so we allow customers to bring a pre-built Redis client.

2/ All the idempotency logic must be handled by the Idempotency classes, as currently happens with DynamoDB; this new class only serves to save and retrieve the record in Redis.

3/ This implementation should support both standalone Redis connections and Redis clusters. In theory, you only need to change a few things during the connection; the underlying commands remain the same. We are planning to add support for Sentinel clients, but we haven't heard any customer demand yet.

4/ Serverless Cache as a Service is a new trend in the serverless market, so the library/client we use must implement the RESP protocol to be compatible with any service that implements Redis. I want to evaluate the AWS Valkey GLIDE client for use in Python, but I haven't had time yet. The official Redis client for Node is a good option.
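
To illustrate point 3, here is a minimal sketch of standalone vs. cluster connections using the official Node client mentioned above. The endpoints are placeholders and this is only an illustration of the idea, not a finalized setup:

import { createClient, createCluster } from '@redis/client';

// Standalone connection
const client = createClient({
  url: 'rediss://my-cache.example.com:6379', // hypothetical endpoint
});
await client.connect();

// Cluster connection - only the construction differs
const cluster = createCluster({
  rootNodes: [{ url: 'rediss://my-cluster.example.com:6379' }], // hypothetical endpoint
});
await cluster.connect();

// The command surface is the same for both
await client.set('key', 'value');
await cluster.set('key', 'value');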

5/ In Python we are not enforcing a Redis version, but it would be interesting to see if we can enforce Redis 7+ for performance reasons. This is not mandatory, just a tip.

6/ In the first implementation (during the PoC), we considered using pipelines to handle multiple commands and reduce round-trip time (RTT) to optimize Redis data/network/connection exchange, but we opted out and are using set and delete commands to perform single operations. We could use hset to store the data as a hash - especially for the returned payload - but since we are not handling the request/response on behalf of the customer, we serialize this data and insert it as a single item. The same happens in DynamoDB.
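
To make point 6 concrete, here is a rough sketch of "serialize and insert as a single item" in TypeScript. The record shape and key format are illustrative assumptions, not the actual Powertools schema, and redis is assumed to be an already-connected node-redis-style client:

const idempotencyKey = 'myFunction#abc123'; // hypothetical key format
const record = {
  status: 'COMPLETED',
  expiryTimestamp: Math.floor(Date.now() / 1000) + 3600,
  responseData: JSON.stringify({ paymentId: '12345' }),
};

// One SET with the whole serialized record, instead of one HSET per field
await redis.set(idempotencyKey, JSON.stringify(record), { EX: 3600 });

// Retrieval is a single GET plus deserialization
const raw = await redis.get(idempotencyKey);
const stored = raw !== null ? JSON.parse(raw) : null;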

7/ To implement atomic operations or optimistic locking in Redis, Lua scripts are required. Redis does not natively support optimistic locking without Lua scripts. While Lua scripts provide atomic execution by running all commands within the script as a single operation, they are restricted in some managed services or may be disallowed due to security policies. This limitation can be a blocker for adoption by clients who are not authorized to use Lua scripts.

To address concurrency challenges, such as those arising from simultaneous transactions in environments like AWS Lambda, we wrote a lock acquisition mechanism to ensure execution uniqueness and prevent race conditions. This approach avoids the need for Lua scripts and relies on native Redis commands like SET with the nx flag for mutual exclusion.
I have to be honest: I have a feeling that in very specific race condition scenarios with super high-concurrency this solution might fail, but this is just an impression and I don't have real data. I tested with 10k concurrent Lambda executions and got no errors.
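
For illustration, a minimal sketch of the SET-with-NX locking idea described above, with made-up key names and timeouts, again assuming an already-connected node-redis-style client:

const lockKey = 'myFunction#abc123#lock'; // hypothetical lock key

// NX makes the SET succeed only if the key does not exist yet, giving
// mutual exclusion without Lua scripts; PX caps how long a crashed
// invocation can hold the lock
const acquired = await redis.set(lockKey, 'locked', { NX: true, PX: 1000 });

if (acquired === null) {
  // Another concurrent invocation holds the lock
  throw new Error('Record is being processed by another invocation');
}

try {
  // ... save the INPROGRESS idempotency record ...
} finally {
  await redis.del(lockKey);
}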

As @dreamorosi said: I'm happy to connect if you need any help.

@arnabrahman (Contributor)

Thanks a lot @leandrodamascena and @dreamorosi, really appreciate the thoughtful responses and solid starting points. I’ll dig into these and share an update once I’ve made some headway.

@asafpamzn
Hi @leandrodamascena, @dreamorosi , @arnabrahman ,

I'm a valkey-glide maintainer and have been part of the ElastiCache team for the past 5 years. I have extensive knowledge of Redis/Valkey best practices and would be happy to help with design or coding - whatever is needed.

Valkey-glide was designed to be a robust client for Valkey and Redis while minimizing downtime. The idea was to create a solid core written in Rust with thin wrappers for the various programming languages. Currently we support Python, Java, Node.js, and Go, with .NET, Ruby, and C++ support in development. The API and behavior are consistent across languages, so if you have a working version in Python, it will work in Node.js as well.

I would be happy to help with this migration. Perhaps we could schedule a quick call next week to meet (which would be nice) and share knowledge to determine the fastest and most appropriate way to move forward. While I'm not very familiar with this package or Lambda functions, I can share my expertise regarding Valkey/Redis clients and Redis/Valkey databases.

@arnabrahman
Copy link
Contributor

arnabrahman commented Apr 13, 2025

Ok, I had an initial look at the Python implementation and, thanks to the clean nature of the code and the well-described comments, I think I understand the high-level flow. I'll go over some of the points that @dreamorosi mentioned:

  • I agree that the class name should be RedisCachePersistenceLayer to stay consistent with the Python implementation.

  • Regarding the client libraries, it looks like there are a few options for Redis clients. I’ll go through them one by one:

    • Valkey-Glide: This is the recommended library in the AWS ElastiCache console, and it's supported by AWS. It supports Valkey 7.2+ and Redis OSS versions 6.2, 7.0, 7.1, and 7.2; Redis 8 is not supported. It supports both standalone and cluster modes, but I didn't see any mention of Sentinel support in the docs. @asafpamzn, correct me if I am wrong.

    • node-redis: This is the official Redis client. However, it currently lacks Sentinel support; this is planned for their next major release, v5 (reference).

    • ioredis: This is the library used in the Node.js example code from the AWS ElastiCache console. The library was acquired by Redis (reference) and is still maintained. That said, the package description recommends node-redis for new projects. However, ioredis does support Sentinel mode along with the other modes.

      My personal opinion is that we should use valkey-glide, since it's now AWS's recommended library for connecting to ElastiCache. However, if a customer is using Redis 8, the library might not be compatible. That said, if we're only targeting AWS ElastiCache, we might not need to worry about Redis 8, as it's currently only possible to create Redis OSS versions through the console.

      We also need to consider the lack of Sentinel support, although the official Redis client does not have this mode either.

  • Given the various ways to connect to Redis, we should allow customers to bring their own client, similar to the Python implementation. So an approach similar to RedisClientProtocol would be needed - see the interface sketch after this list. If we decide to use valkey-glide, this can be an option for customers who need Redis 8 or Sentinel mode.

  • As for the lock, @leandrodamascena already explained why it's needed, and we should keep it. I found this diagram from the Python implementation visualizing how the lock works.
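
As a rough TypeScript equivalent of the protocol idea above, a small structural interface could describe the handful of commands the persistence layer needs. The method list below is an assumption for illustration, not a finalized API:

// Hypothetical minimal contract for a bring-your-own client; the real
// method list would follow whatever the persistence layer ends up needing
interface RedisCompatibleClient {
  get(key: string): Promise<string | null>;
  set(
    key: string,
    value: string,
    options?: { EX?: number; NX?: boolean; PX?: number }
  ): Promise<string | null>;
  del(key: string): Promise<number>;
}

// The persistence layer could then accept any structurally-compatible
// client: node-redis, a thin valkey-glide adapter, or a mock in tests
class RedisCachePersistenceLayer {
  constructor(private readonly client: RedisCompatibleClient) {}
  // ... save/get/delete record methods built on this.client
}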

Let me know what you guys think of this.

@asafpamzn
AWS ElastiCache supports Valkey 8.0. AWS ElastiCache does not support Sentinel. We will be able to add missing features to valkey-glide. There is also cooperation with GCP, and we work together to make valkey-glide better and better. See the dev pace in the repo. I recommend cluster mode but, to be honest, I don't fully understand your requirements.
I will be happy to help.

@dreamorosi (Contributor, Author)

Thank you both, especially @arnabrahman for the comparison.

I would not worry about Sentinel at this stage since ElastiCache doesn't support it.

Regarding the client library selection, based on the above I would automatically exclude ioredis.

I went ahead and ran some very basic tests, and I have a couple of additional considerations regarding the other two libraries that are important for this project.

CommonJS / ES Modules support

The @valkey/valkey-glide library seems to work with functions built using both CJS and ESM, while the @redis/client one only works with CJS unless you add an import { createRequire } from 'module'; const require = createRequire(import.meta.url); banner to your function bundle.

This is not a huge deal since we do the same with Tracer & X-Ray SDK, and starting from Node.js 24 either of the two should be able to import the other.

Overall usage

@redis/client has 3.1M weekly downloads.

@valkey/valkey-glide has 3.7K weekly downloads.

While it's true that the GLIDE library is only 7 months old vs. the 2+ year head start of the other, it's clear that the Redis one appears to be used a few orders of magnitude more than the newer one.

Low usage/downloads is not a disqualifying factor by itself, but if we are thinking in terms of DX and we want to allow customers to pass their own client to the persistence layer, then @redis/client has the potential to cover a larger customer base.

Provenance & Supply chain

When choosing a 3rd-party dependency we look at two things when it comes to OSS supply chain security & governance: provenance statements and the dependency tree.

Neither of the two libraries publishes provenance statements with their releases.

In terms of dependencies:

  • @redis/client has 3 direct dependencies, none of them has dependencies
  • @valkey/valkey-glide has 1 direct dependency plus 3 dependencies brought in by the architecture-specific package (e.g. @valkey/valkey-glide-linux-arm64); of these:
    • one has 0 dependencies (long)
    • one has 1 dependency (npmignore) with one transitive dependency - however this looks like it should be a dev dependency instead
    • one (protobufjs) brings in a dozen plus transitive dependencies

While having a provenance statement would be a big differentiator for us, if we look at the dependency tree alone, the Redis client seems to have a smaller surface area in terms of modules brought into node_modules.

Architecture

@redis/client is written in pure JavaScript, this means it can be bundled, minified, and deployed without any special considerations about architecture.

@valkey/valkey-glide on the other hand has a core written in Rust, which means it cannot be bundled and must be shipped as a pre-built, architecture-specific native binary.

As a customer, when it comes to TypeScript/JavaScript functions, having native libraries in the dependency tree means I now have to choose between two options:

  • build my function using a Docker container with the same architecture as my Lambda target on my host/CI
  • create and manage a Lambda layer for this dependency for each target architecture in my Lambda functions

As a library author, since we publish and offer public Lambda layers that include all Powertools for AWS utilities and their dependencies, it means we would need to start publishing two sets of the layer, one for each architecture, in every region - functionally doubling our deployment targets.

Given that the change above will also result in new ARNs for the Lambda layers, we'll need to do this in a major release (no ETA as of today) and introduce additional management overhead for our customers - over a feature that, at this point, and also considering the low interest on the post above, is marginal at best in the context of Powertools for AWS.

All of the above is not necessarily a disqualifying factor for using @valkey/valkey-glide, but in order to take on all the complexity above and ask our customers to do the same, we'd need to see a clear performance gain or another compelling argument.

Performance

I deployed a Valkey Serverless ElastiCache in my account and created two Lambda functions, one using @valkey/valkey-glide and another @redis/client. Both functions:

  • are in the same VPC as the ElastiCache
  • have the same exact memory allocation (512MB)
  • are fronted by an API Gateway (HTTP API).
  • instantiate the client during the INIT_PHASE and reuse it across requests
  • have no other package installed/imported
  • read & write a value
Click here to see CDK stack
import {
  Stack,
  type StackProps,
  CfnOutput,
  RemovalPolicy,
  Duration,
} from 'aws-cdk-lib';
import type { Construct } from 'constructs';
import {
  Architecture,
  Code,
  LayerVersion,
  Runtime,
  Tracing,
} from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction, OutputFormat } from 'aws-cdk-lib/aws-lambda-nodejs';
import { LogGroup, RetentionDays } from 'aws-cdk-lib/aws-logs';
import { aws_elasticache } from '@open-constructs/aws-cdk';
import { Port, SecurityGroup, Vpc } from 'aws-cdk-lib/aws-ec2';
import { HttpApi, HttpMethod } from 'aws-cdk-lib/aws-apigatewayv2';
import { HttpLambdaIntegration } from 'aws-cdk-lib/aws-apigatewayv2-integrations';

export class ValkeyStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // #region Shared

    const vpc = new Vpc(this, 'MyVpc', {
      maxAzs: 2, // Default is all AZs in the region
    });

    const fnSecurityGroup = new SecurityGroup(this, 'ValkeyFnSecurityGroup', {
      vpc,
      allowAllOutbound: true,
      description: 'Security group for Valkey function',
    });

    // #region Valkey Cluster

    const serverlessCacheSecurityGroup = new SecurityGroup(
      this,
      'ServerlessCacheSecurityGroup',
      {
        vpc,
        allowAllOutbound: true,
        description: 'Security group for serverless cache',
      }
    );
    serverlessCacheSecurityGroup.addIngressRule(
      fnSecurityGroup,
      Port.tcp(6379),
      'Allow Lambda to connect to serverless cache'
    );

    const serverlessCache = new aws_elasticache.ServerlessCache(
      this,
      'ServerlessCache',
      {
        engine: aws_elasticache.Engine.VALKEY,
        majorEngineVersion: aws_elasticache.MajorVersion.VER_8,
        serverlessCacheName: 'my-serverless-cache',
        vpc,
        securityGroups: [serverlessCacheSecurityGroup],
      }
    );

    // #region Glide Valkey version

    const valkeyLayer = new LayerVersion(this, 'ValkeyLayer', {
      removalPolicy: RemovalPolicy.DESTROY,
      compatibleArchitectures: [Architecture.ARM_64],
      compatibleRuntimes: [Runtime.NODEJS_22_X],
      code: Code.fromAsset('./lib/layers/valkey-glide'),
    });

    const fnName = 'ValkeyFn';
    const logGroup = new LogGroup(this, 'MyLogGroup', {
      logGroupName: `/aws/lambda/${fnName}`,
      removalPolicy: RemovalPolicy.DESTROY,
      retention: RetentionDays.ONE_DAY,
    });
    const fn = new NodejsFunction(this, 'MyFunction', {
      functionName: fnName,
      logGroup,
      runtime: Runtime.NODEJS_22_X,
      architecture: Architecture.ARM_64,
      memorySize: 512,
      timeout: Duration.seconds(30),
      entry: './src/index.ts',
      handler: 'handler',
      layers: [valkeyLayer],
      bundling: {
        minify: true,
        mainFields: ['module', 'main'],
        sourceMap: true,
        format: OutputFormat.ESM,
        externalModules: ['@valkey/valkey-glide'],
        metafile: true,
      },
      vpc,
      securityGroups: [fnSecurityGroup],
    });
    fn.addEnvironment('CACHE_ENDPOINT', serverlessCache.endpointAddress);
    fn.addEnvironment('CACHE_PORT', serverlessCache.endpointPort.toString());

    // #region Redis Client version

    const fnName2 = 'RedisFn';
    const logGroup2 = new LogGroup(this, 'MyLogGroup2', {
      logGroupName: `/aws/lambda/${fnName2}`,
      removalPolicy: RemovalPolicy.DESTROY,
      retention: RetentionDays.ONE_DAY,
    });
    const fn2 = new NodejsFunction(this, 'MyFunction2', {
      functionName: fnName2,
      logGroup: logGroup2,
      runtime: Runtime.NODEJS_22_X,
      architecture: Architecture.ARM_64,
      memorySize: 512,
      timeout: Duration.seconds(30),
      entry: './src/redis-client.ts',
      handler: 'handler',
      bundling: {
        minify: true,
        mainFields: ['module', 'main'],
        sourceMap: true,
        format: OutputFormat.ESM,
        banner:
          "import { createRequire } from 'module';const require = createRequire(import.meta.url);",
        metafile: true,
      },
      vpc,
      securityGroups: [fnSecurityGroup],
    });
    fn2.addEnvironment('CACHE_ENDPOINT', serverlessCache.endpointAddress);
    fn2.addEnvironment('CACHE_PORT', serverlessCache.endpointPort.toString());

    // #region API Gateway

    const api = new HttpApi(this, 'HttpApi');

    api.addRoutes({
      path: '/valkey',
      methods: [HttpMethod.GET],
      integration: new HttpLambdaIntegration('ValkeyIntegration', fn),
    });

    api.addRoutes({
      path: '/redis',
      methods: [HttpMethod.GET],
      integration: new HttpLambdaIntegration('RedisIntegration', fn2),
    });

    new CfnOutput(this, 'APIEndpoint', {
      value: api.apiEndpoint,
    });
  }
}
Click here to see `@valkey/valkey-glide` function
import { GlideClient } from '@valkey/valkey-glide';

const endpoint = process.env.CACHE_ENDPOINT || '';
const port = process.env.CACHE_PORT || '6379';

const redis = await GlideClient.createClient({
  addresses: [
    {
      host: endpoint,
      port: Number(port),
    },
  ],
  useTLS: true,
});

export const handler = async () => {
  // write
  await redis.set('valkey-key', 'value');
  console.log('Set key to value');

  // read
  const value = await redis.get('valkey-key');
  console.log('Got value:', value);

  return {
    statusCode: 200,
    body: JSON.stringify('Hello, World!'),
  };
};
Click here to see `@redis/client` function
import { createClient } from '@redis/client';

const endpoint = process.env.CACHE_ENDPOINT || '';
const port = process.env.CACHE_PORT || '6379';

const redis = createClient({
  username: 'default',
  socket: {
    tls: true,
    host: endpoint,
    port: Number(port),
  },
});
await redis.connect();

export const handler = async () => {
  // write
  await redis.set('redis-key', 'value');
  console.log('Set key to value');

  // read
  const value = await redis.get('redis-key');
  console.log('Got value:', value);

  return {
    statusCode: 200,
    body: JSON.stringify('Hello, World!'),
  };
};

I ran the test by making 2K requests with 5 concurrent connections made using 5 parallel requests - aka 25 workers. The load test was carried out using oha.

oha -n 2000 -c 5 -p 5 --latency-correction --disable-keepalive $API_ENDPOINT/valkey -o valkey.txt --no-tui
oha -n 2000 -c 5 -p 5 --latency-correction --disable-keepalive $API_ENDPOINT/redis -o redis.txt --no-tui

I repeated the tests 3 times and here's a sample of results for both:

@valkey/valkey-glide

Summary:
  Success rate:	100.00%
  Total:	61.9826 secs
  Slowest:	0.2301 secs
  Fastest:	0.1313 secs
  Average:	0.1548 secs
  Requests/sec:	32.2671

  Total data:	29.30 KiB
  Size/request:	15 B
  Size/sec:	484 B

Response time histogram:
  0.131 [1]   |
  0.141 [285] |■■■■■■■■■■■■■■
  0.151 [620] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.161 [456] |■■■■■■■■■■■■■■■■■■■■■■■
  0.171 [391] |■■■■■■■■■■■■■■■■■■■■
  0.181 [203] |■■■■■■■■■■
  0.191 [37]  |■
  0.200 [4]   |
  0.210 [1]   |
  0.220 [1]   |
  0.230 [1]   |

Response time distribution:
  10.00% in 0.1398 secs
  25.00% in 0.1441 secs
  50.00% in 0.1530 secs
  75.00% in 0.1641 secs
  90.00% in 0.1727 secs
  95.00% in 0.1766 secs
  99.00% in 0.1843 secs
  99.90% in 0.2198 secs
  99.99% in 0.2301 secs


Details (average, fastest, slowest):
  DNS+dialup:	0.0925 secs, 0.0785 secs, 0.1533 secs
  DNS-lookup:	0.0001 secs, 0.0000 secs, 0.0462 secs

Status code distribution:
  [200] 2000 responses

@redis/client

Summary:
  Success rate:	100.00%
  Total:	62.0723 secs
  Slowest:	0.2635 secs
  Fastest:	0.1305 secs
  Average:	0.1550 secs
  Requests/sec:	32.2205

  Total data:	29.30 KiB
  Size/request:	15 B
  Size/sec:	483 B

Response time histogram:
  0.130 [1]   |
  0.144 [458] |■■■■■■■■■■■■■■■■■■■■
  0.157 [726] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.170 [557] |■■■■■■■■■■■■■■■■■■■■■■■■
  0.184 [231] |■■■■■■■■■■
  0.197 [16]  |
  0.210 [4]   |
  0.224 [2]   |
  0.237 [0]   |
  0.250 [0]   |
  0.264 [5]   |

Response time distribution:
  10.00% in 0.1395 secs
  25.00% in 0.1445 secs
  50.00% in 0.1533 secs
  75.00% in 0.1635 secs
  90.00% in 0.1726 secs
  95.00% in 0.1773 secs
  99.00% in 0.1848 secs
  99.90% in 0.2560 secs
  99.99% in 0.2635 secs


Details (average, fastest, slowest):
  DNS+dialup:	0.0923 secs, 0.0778 secs, 0.1525 secs
  DNS-lookup:	0.0001 secs, 0.0000 secs, 0.0621 secs

Status code distribution:
  [200] 2000 responses

Both performed quite similarly, with less than 1% variance across all key metrics:

| Metric | Valkey | Redis | Difference |
| --- | --- | --- | --- |
| Success rate | 100.00% | 100.00% | No difference |
| Total benchmark time | 61.98 secs | 62.07 secs | +0.09 secs (0.15%) for Redis |
| Throughput (req/sec) | 32.27 | 32.22 | -0.05 req/sec (0.15%) for Redis |
| Average response time | 0.1548 secs | 0.1550 secs | +0.0002 secs (0.13%) for Redis |

The latency profiles are nearly identical:

  • Median (50%): ~0.153 seconds for both
  • 75th percentile: ~0.164 seconds for both
  • 95th percentile: ~0.177 seconds for both

Redis has slightly higher maximum latency (0.2635s vs 0.2301s for Valkey), but this affects only the very top percentiles (99.9%+).


Conclusion

Based on what I see above, I am inclined to choose @redis/client as the default, based on these points (in order of importance):

  • it showed comparable performance to the alternative
  • it brings less deployment complexity and fewer dependencies
  • it has a larger customer base - one that we want to cater to

With that said, even if we go with @redis/client by default in our dev environment and Lambda layer, I would like us to explore the possibility of supporting both.

I expect our use case to really just use a handful of methods - get, set, and the locking mechanism - so there's a good chance we can abstract away the differences between the two libraries if they are not already the same. The other main difference I see is that, based on the scarce docs for @valkey/valkey-glide, when you create a client it comes already connected, while @redis/client requires you to call .connect() on it.
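
For illustration, here is a sketch of how that connection difference could be absorbed behind a common factory. The types are simplified, the client calls mirror the test functions above, and none of this is the final design:

import { createClient } from '@redis/client';
import { GlideClient } from '@valkey/valkey-glide';

// Simplified command surface the persistence layer would rely on (assumption)
interface CacheClient {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<unknown>;
}

// @redis/client needs an explicit .connect() before use...
const makeRedisClient = async (
  host: string,
  port: number
): Promise<CacheClient> => {
  const client = createClient({ socket: { tls: true, host, port } });
  await client.connect();
  return client;
};

// ...while a GLIDE client is ready once createClient resolves; a thin
// wrapper normalizes the return types to the shared interface
const makeGlideClient = async (
  host: string,
  port: number
): Promise<CacheClient> => {
  const client = await GlideClient.createClient({
    addresses: [{ host, port }],
    useTLS: true,
  });
  return {
    get: async (key) => {
      const value = await client.get(key);
      return value === null ? null : value.toString();
    },
    set: (key, value) => client.set(key, value),
  };
};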

Finally, if you see any mistake or inaccuracy in the arguments above or in the benchmarks, please do point them out - I will be more than happy to amend the recommendation.

@arnabrahman (Contributor)

@dreamorosi Great analysis. After reading your comment, I now think it makes more sense to use @redis/client for Powertools for AWS Lambda (TypeScript). I don't have any further observations. Since we've addressed the points raised above, I'll start the initial implementation, hopefully this weekend.

@avifenesh

avifenesh commented Apr 18, 2025

I apologize for the long comment; it's out of love for our craft, not argumentativeness.

Performance-wise, did you consider the ability to use mget and mset on a cluster that comes with GLIDE (which users are excited about when they learn it exists)?
Compare it to Promise.all on separate set commands.
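
Roughly, the comparison being made - sketched here with node-redis method names; GLIDE exposes equivalent mget/mset commands:

const keys = ['user:1', 'user:2', 'user:3'];

// One GET per key - N round trips (or N pipelined commands)
const viaPromiseAll = await Promise.all(keys.map((key) => redis.get(key)));

// A single MGET - one round trip for all keys
const viaMGet = await redis.mGet(keys);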

What about the security of a client that follows, and will keep following, Valkey and open source?
For example, the AZ affinity feature of Valkey is currently available only in GLIDE, and it was ready the second Valkey released it. The feature lets the user prefer nodes in the same AZ, which in most cases improves performance but in all cases - and that's the big deal - saves a huge amount of money: traffic within the same AZ doesn't cost anything.
To add to that, compare the performance of the last Redis OSS version to Valkey 8.1. You'll see why it's clear that the shift is to Valkey, and users will follow.

For the next version (in a month, give or take), we're introducing batching for clusters as well, meaning you can pipeline multi-shard commands together and the library handles it.
Moreover, we are introducing clear performance improvements for the Node client.
I'd love to run your benchmark on the new implementation and share the results.

But on top of performance, what GLIDE was designed for in the first place, and excels at, is reliability, fault tolerance, and solving real pains we've learned about through years of working with users.
See GLIDE's ability to restore PubSub and sharded PubSub, benchmark the time it takes to get back in action after a failover, and test how long it takes the app to pick up the new topology after resharding.

GLIDE also has an awesome community and strong backing; while the giants push forward, there's also a lot of community initiative, which is amazing to see.

For the different types of connection, we will release a lazy-connection feature; then the behavior will be the same, if chosen.

For a comparison of fault tolerance, see the number of errors per client; for Valkey 8.1 vs. Redis 7 performance, see iovalkey vs. ioredis, which share the same code and are connected to the same cluster:
Rate limit bench results

Bench repo with the code to fork and test, and an explanatory README.

The bench is of the glide, iovalkey, and ioredis implementations for rate-limiter-flexible, and doesn't compare the direct performance of the clients.
When adding support for glide I picked a different usage pattern than the iovalkey and ioredis ones - the pros of knowing the guts of the server and the client.
But the error counts of the clients, and the Valkey vs. Redis results, are relevant.

But whatever you decide, you are great, and this is really just because I love our project.
If I can help with something - whether for GLIDE or not, also for generic ElastiCache/Valkey/Redis-OSS - just ping 😺 tagging will work.

@dreamorosi (Contributor, Author)

Thank you for your reply - no need to apologize for being passionate; it's nice to see, and I get it.

Let me address some of your points:

Performance-wise, did you consider the ability to use mget and mset on a cluster that comes with GLIDE (which users are excited about when they learn it exists)?
Compare it to Promise.all on separate set commands.

No, I didn't consider it, because our use case reads and writes exactly and at most one value at a time for each request coming to an AWS Lambda function.

If I understand the docs for these two methods correctly, they're used to set/get a list of values. If so, our use case won't benefit from them because Lambda's programming model always processes one request at a time, so our Idempotency utility also only needs to set/get one item at a time.

See the request flow diagrams in our docs to understand what I mean.

What about the security of a client that follows, and will keep following, Valkey and open source?

I hear you, and I am aware of the history in this space, but I am not going to make a decision based on FUD.

As of today, both @redis/client (MIT) and @valkey/valkey-glide (Apache-2.0) have licenses that fit with our requirements. If in the future things change we'll evaluate alternatives.

Conversely, knowing that a client is following Valkey, while nice to have, is not necessarily a goal for us. I'm very excited about Valkey personally, and I am glad that AWS is investing in it, but when it comes to Powertools for AWS, we want to make sure our customers can use our Idempotency utility with as many engines as possible.

For example, the AZ affinity feature of Valkey is currently available only in GLIDE, and it was ready the second Valkey released it. The feature lets the user prefer nodes in the same AZ, which in most cases improves performance but in all cases - and that's the big deal - saves a huge amount of money: traffic within the same AZ doesn't cost anything.

This is interesting, and so far it's the only tangible benefit in favor of the @valkey/valkey-glide client; however, I am not sure how it would work in Lambda.

Based on what I see here, the developer needs to provide the client AZ to the @valkey/valkey-glide client for AZ affinity to work. While EC2 instances and some containers have instance metadata endpoints that can help with getting this info, I am not aware of any method to get this dynamically in Lambda.

This means that in practice, customers need to hardcode the value and also configure their functions to run in a single Subnet, which is the only way to guarantee that they'll run in a given AZ. Not sure that these are good ideas - but if there's a way to get the current AZ from within a Lambda function, then my entire argument is wrong and this is actually a plus.

To add to that, compare the performance of the last Redis OSS version to Valkey 8.1. You'll see why it's clear that the shift is to Valkey, and users will follow.

I agree that when wanting to run an in-memory, high performance, key-value datastore on AWS, Valkey is probably the top option today, however this is not what this argument is about. In this discussion we're trying to choose which Node.js client we'll support in Powertools for AWS.

For the next version (in a month, give or take), we're introducing batching for clusters as well, meaning you can pipeline multi-shard commands together and the library handles it.

As mentioned above in relation to mget & mset, I don't think this will benefit our use case.

See GLIDE's ability to restore PubSub and sharded PubSub, benchmark the time it takes to get back in action after a failover, and test how long it takes the app to pick up the new topology after resharding.

Same as mget/mset and command pipelining, PubSub is not something that applies to this use case.

For the different types of connection, we will release a lazy-connection feature; then the behavior will be the same, if chosen.

This is nice and it'd be useful for us. Do you have an ETA for when this will be released?

Moreover, we are introducing clear performance improvements for the Node client.

That is great to hear; obviously my tests above evaluate what's available now. Once @arnabrahman's implementation is nearly done, I think it will be easy enough to swap the client and run the benchmark again - this way we can compare the actual implementation and not a toy example like the one above.


Overall it's great to see that there's a lot of movement on the Valkey client and also a clear roadmap. The main concerns about the tradeoffs of deployment complexity still stand, though. Is your team by any chance planning on publishing public Lambda layers for the client? This would definitely help the argument, and our team is happy to do a knowledge transfer to help you set things up if needed.

Like I said in my previous comment, I still want us to explore the option of supporting both clients and making it very clear in the docs that we support both.

However when it comes to our Powertools for AWS Lambda layer, we're not ready to take on the complexity of supporting architecture-specific dependencies. There are a couple of ideas around it that I'd like to test, but I won't be able to do so before a couple of weeks from now.

To be clear, for now @arnabrahman can continue the implementation with whichever of the two clients. Once we have a PR up I'll spend some time seeing if we can make it generic enough to support both. Then we'll take it from there.

@avifenesh

@dreamorosi Thanks for the comprehensive answer!

I'll react to some points.

If I understand the docs for these two methods correctly, they're used to set/get a list of values. If so, our use case won't benefit from them because Lambda's programming model always processes one request at a time, so our Idempotency utility also only needs to set/get one item at a time.

That is a wrong assumption: one trigger != one set of commands.
Let's say you are an online music record store (real example, you may even know them).
If a user connects, it is one trigger, but you store the user session and pull the first page of records from the cache; if it's not in the cache, you pull from the DB and set it in the cache. The same goes when the user scrolls down, when the user finishes choosing records and checks their basket, and so on.
These are real users who migrated, running on Lambda, from client X to Valkey GLIDE for Node.js.
They were using the Node best practices, Promise.all, and the benches of mset and mget crushed it.

Conversely, knowing that a client is following Valkey, while nice to have, is not necessarily a goal for us. I'm very excited about Valkey personally, and I am glad that AWS is investing in it, but when it comes to Powertools for AWS, we want to make sure our customers can use our Idempotency utility with as many engines as possible.

It is actually the opposite: we support every open-source version of Redis from 6.2 forward; it's part of our test matrices.
We give it the same emphasis as any other version. We'll do our best to avoid breaking it, but we can't check closed-source code or control Redis's compatibility issues with Valkey.
While we try to maintain alignment, the other client you're deciding on doesn't make such a promise.

Your Lambda users mainly need support for the ElastiCache versions, or for the stores available in the Linux distros they use, for their KV store.
EC won't support the closed-source versions of Redis - meaning everything after version 7.2, which we support and have promised to keep supporting. It will be Valkey from now on, as evidenced by the 20-30% price drop for choosing Valkey instances on ElastiCache compared to Redis instances.
Every known Linux distro will replace Redis's closed-source version with Valkey.

Therefore, the chances that Lambda users will want integration with Redis 7.4 are lower than with the versions we support, for example Redis 7 or Valkey 8.

On the other hand, Jedis's client-side caching being for Redis 7.4 forward only is a clear vendor lock-in move by that client. Client-side caching has nothing to do with newer versions; it's a feature on the client side, not a missing feature in the engine.
We're working on client-side caching, and support will start from version 6.2, possibly even lower (though not tested). We won't introduce the feature only from Valkey 8.0.

Following Valkey means benefiting from new features without locking users out of OSS versions of Redis.
Neither Valkey nor we intend to break backward or forward compatibility.
Breaking will happen from Redis's side if they break compatibility.

This means that in practice, customers need to hardcode the value and also configure their functions to run in a single Subnet, which is the only way to guarantee that they'll run in a given AZ. Not sure that these are good ideas - but if there's a way to get the current AZ from within a Lambda function, then my entire argument is wrong and this is actually a plus.

Two points -
A. While functions go up and down frequently, the best practice with clients - especially fast, in-memory clients - is to keep them alive globally and use them from the functions; so even if your functions are dynamic, your client should be static. Establishing a connection is time-consuming, and you should leave it connected.

B. You can - not directly through the Lambda API, but it is possible if your Lambda is attached to a VPC.
See https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html

I know users who are doing it, so I'm sure it is possible, and it's a huge cost reduction; EC2 doesn't charge you for same-AZ traffic.
See the example from the blog written by the amazing @adarovadya:

Example: an application in AWS with a Valkey cluster of 2 shards, each with 1 primary and 2 replicas; the instance type is m7g.xlarge. The cluster processes 250 MB of data per second and, to simplify the example, 100% of the traffic is read operations. 50% of this traffic crosses AZs at a cost of $0.01 per GB, so the monthly cross-AZ data transfer cost would be approximately $3,285. In addition, the cost of the cluster is $0.252 per hour per node, a total of $1,088 per month. By implementing AZ affinity routing, you can reduce the total cost from $4,373 to $1,088, as all traffic remains within the same AZ.
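
As a back-of-envelope check of those figures (approximate month lengths, so the totals line up only roughly with the quoted numbers):

// 250 MB/s = 0.25 GB/s, 50% crossing AZs, ~30.4 days/month, $0.01 per GB
const crossAzCost = 0.25 * 0.5 * 86400 * 30.4 * 0.01; // ≈ $3,283/month

// 6 nodes (2 shards × 3 nodes each), $0.252/hour, 720 hours/month
const clusterCost = 6 * 0.252 * 720; // ≈ $1,089/month

// Total ≈ $4,372/month; with AZ affinity the cross-AZ portion drops to ~$0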

I agree that when wanting to run an in-memory, high performance, key-value datastore on AWS, Valkey is probably the top option today, however this is not what this argument is about. In this discussion we're trying to choose which Node.js client we'll support in Powertools for AWS.

I think we've answered this, and shared our view on it, elsewhere in the discussion. My main point is that people will follow Valkey in the AWS environment, and you'd better have a client that supports the features it offers, rather than one offering features that exist only in closed-source versions of Redis.

Same as mget/mset and command pipelining, PubSub is not something that applies to this use case

Same answer, but with more emphasis: you can accumulate all the actions you need for something and perform them at once - both setting the user session and getting their recommended audio tracks page.

This is nice and it'd be useful for us. Do you have an ETA for when this will be released?

A little more than a month. But a workaround already exists until then.

Overall it's great to see that there's a lot of movement on the Valkey client and also a clear roadmap. The main concerns about the tradeoffs of deployment complexity still stand, though. Is your team by any chance planning on publishing public Lambda layers for the client? This would definitely help the argument, and our team is happy to do a knowledge transfer to help you set things up if needed.

Like I said in my previous comment, I still want us to explore the option of supporting both clients and making it very clear in the docs that we support both.

However when it comes to our Powertools for AWS Lambda layer, we're not ready to take on the complexity of supporting architecture-specific dependencies. There are a couple of ideas around it that I'd like to test, but I won't be able to do so before a couple of weeks from now.

Let's meet and talk about that; if we can work together to create such a thing, there will be an enormous number of happy users. I think that's exactly the kind of thing we can create together, each bringing his own specialty and knowledge, and building a healthy collaboration.
