
Commit a615c24

feat(batch): add option to continue processing other group IDs on failure in SqsFifoPartialProcessor (#2590)
1 parent 9def9fa commit a615c24

9 files changed, +414 -26 lines changed


Diff for: docs/utilities/batch.md

+43-7
````diff
@@ -141,14 +141,25 @@ Processing batches from SQS works in three stages:
 
 #### FIFO queues
 
-When using [SQS FIFO queues](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html){target="_blank"}, we will stop processing messages after the first failure, and return all failed and unprocessed messages in `batchItemFailures`.
-This helps preserve the ordering of messages in your queue.
+When using [SQS FIFO queues](https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-fifo-queues.html){target="_blank"}, a batch may include messages from different group IDs.
 
-```typescript hl_lines="1-4 8 20-22"
---8<-- "examples/snippets/batch/gettingStartedSQSFifo.ts"
-```
+By default, we will stop processing at the first failure and mark unprocessed messages as failed to preserve ordering. However, this behavior may not be optimal for customers who wish to proceed with processing messages from a different group ID.
+
+Enable the `skipGroupOnError` option for seamless processing of messages from various group IDs. This setup ensures that messages from a failed group ID are sent back to SQS, enabling uninterrupted processing of messages from the subsequent group ID.
 
-1. **Step 1**. Creates a partial failure batch processor for SQS FIFO queues. See [partial failure mechanics for details](#partial-failure-mechanics)
+=== "Recommended"
+
+    ```typescript hl_lines="1-4 8"
+    --8<-- "examples/snippets/batch/gettingStartedSQSFifo.ts"
+    ```
+
+    1. **Step 1**. Creates a partial failure batch processor for SQS FIFO queues. See [partial failure mechanics for details](#partial-failure-mechanics)
+
+=== "Enabling skipGroupOnError flag"
+
+    ```typescript hl_lines="1-4 13 30"
+    --8<-- "examples/snippets/batch/gettingStartedSQSFifoSkipGroupOnError.ts"
+    ```
 
 !!! Note
     Note that SqsFifoPartialProcessor is synchronous using `processPartialResponseSync`.
````
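
The two FIFO modes described in this documentation change can be compared through the message IDs that end up in `batchItemFailures`. The following is a minimal, self-contained sketch that simulates both modes. `FifoRecord`, `BatchItemFailure`, `simulateFifoBatch` and the trimmed record shape are hypothetical names used only for illustration. They are not part of `@aws-lambda-powertools/batch`, whose processor works on full `SQSRecord` objects.

```typescript
// Hypothetical, trimmed-down record: only the fields this sketch needs.
interface FifoRecord {
  messageId: string;
  messageGroupId?: string;
}

interface BatchItemFailure {
  itemIdentifier: string;
}

// Sketch of which records each FIFO mode reports as failed (not the library code).
const simulateFifoBatch = (
  records: FifoRecord[],
  handler: (record: FifoRecord) => void,
  skipGroupOnError: boolean
): BatchItemFailure[] => {
  const failures: BatchItemFailure[] = [];
  const failedGroups = new Set<string>();
  let position = 0;

  for (const record of records) {
    const current = position++;

    // Default mode: after the first failure, every remaining record is failed unprocessed.
    if (!skipGroupOnError && failures.length > 0) {
      failures.push(
        ...records.slice(current).map((r) => ({ itemIdentifier: r.messageId }))
      );
      break;
    }

    const groupId = record.messageGroupId;

    // skipGroupOnError mode: a group that already failed is not processed any further.
    if (skipGroupOnError && groupId && failedGroups.has(groupId)) {
      failures.push({ itemIdentifier: record.messageId });
      continue;
    }

    try {
      handler(record);
    } catch {
      failures.push({ itemIdentifier: record.messageId });
      // Remember the group so its later messages are also returned to SQS.
      if (skipGroupOnError && groupId) {
        failedGroups.add(groupId);
      }
    }
  }

  return failures;
};

// Group A is interleaved with group B, and message "2" (group A) fails.
const batch: FifoRecord[] = [
  { messageId: '1', messageGroupId: 'A' },
  { messageId: '2', messageGroupId: 'A' },
  { messageId: '3', messageGroupId: 'B' },
  { messageId: '4', messageGroupId: 'A' },
];
const failOnSecond = (record: FifoRecord): void => {
  if (record.messageId === '2') throw new Error('processing failed');
};

const ids = (failures: BatchItemFailure[]): string[] =>
  failures.map((failure) => failure.itemIdentifier);

console.log(ids(simulateFifoBatch(batch, failOnSecond, false))); // [ '2', '3', '4' ]
console.log(ids(simulateFifoBatch(batch, failOnSecond, true))); // [ '2', '4' ]
```

In the second call, message `3` is still processed because it belongs to group `B`. Message `4` is reported without being processed, so group `A` can be retried in order.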
````diff
@@ -283,7 +294,7 @@ sequenceDiagram
 
 > Read more about [Batch Failure Reporting feature in AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html#services-sqs-batchfailurereporting){target="_blank"}.
 
-Sequence diagram to explain how [`SqsFifoPartialProcessor` works](#fifo-queues) with SQS FIFO queues.
+Sequence diagram to explain how [`SqsFifoPartialProcessor` works](#fifo-queues) with SQS FIFO queues without `skipGroupOnError` flag.
 
 <center>
 ```mermaid
@@ -307,6 +318,31 @@ sequenceDiagram
 <i>SQS FIFO mechanism with Batch Item Failures</i>
 </center>
 
+Sequence diagram to explain how [`SqsFifoPartialProcessor` works](#fifo-queues) with SQS FIFO queues with `skipGroupOnError` flag.
+
+<center>
+```mermaid
+sequenceDiagram
+    autonumber
+    participant SQS queue
+    participant Lambda service
+    participant Lambda function
+    Lambda service->>SQS queue: Poll
+    Lambda service->>Lambda function: Invoke (batch event)
+    activate Lambda function
+    Lambda function-->Lambda function: Process 2 out of 10 batch items
+    Lambda function--xLambda function: Fail on 3rd batch item
+    Lambda function-->Lambda function: Process messages from another MessageGroupID
+    Lambda function->>Lambda service: Report 3rd batch item and all messages within the same MessageGroupID as failure
+    deactivate Lambda function
+    activate SQS queue
+    Lambda service->>SQS queue: Delete successful messages processed
+    SQS queue-->>SQS queue: Failed messages return
+    deactivate SQS queue
+```
+<i>SQS FIFO mechanism with Batch Item Failures</i>
+</center>
+
 #### Kinesis and DynamoDB Streams
 
 > Read more about [Batch Failure Reporting feature](https://docs.aws.amazon.com/lambda/latest/dg/with-kinesis.html#services-kinesis-batchfailurereporting){target="_blank"}.
````

Diff for: examples/snippets/batch/gettingStartedSQSFifoSkipGroupOnError.ts

+32

```diff
@@ -0,0 +1,32 @@
+import {
+  SqsFifoPartialProcessor,
+  processPartialResponseSync,
+} from '@aws-lambda-powertools/batch';
+import { Logger } from '@aws-lambda-powertools/logger';
+import type {
+  SQSEvent,
+  SQSRecord,
+  Context,
+  SQSBatchResponse,
+} from 'aws-lambda';
+
+const processor = new SqsFifoPartialProcessor();
+const logger = new Logger();
+
+const recordHandler = (record: SQSRecord): void => {
+  const payload = record.body;
+  if (payload) {
+    const item = JSON.parse(payload);
+    logger.info('Processed item', { item });
+  }
+};
+
+export const handler = async (
+  event: SQSEvent,
+  context: Context
+): Promise<SQSBatchResponse> => {
+  return processPartialResponseSync(event, recordHandler, processor, {
+    context,
+    skipGroupOnError: true,
+  });
+};
```

Diff for: packages/batch/src/SqsFifoPartialProcessor.ts

+101-12
```diff
@@ -1,7 +1,17 @@
+import { SQSRecord } from 'aws-lambda';
 import { BatchProcessorSync } from './BatchProcessorSync.js';
 import { EventType } from './constants.js';
-import { SqsFifoShortCircuitError } from './errors.js';
-import type { FailureResponse, SuccessResponse } from './types.js';
+import {
+  BatchProcessingError,
+  SqsFifoMessageGroupShortCircuitError,
+  SqsFifoShortCircuitError,
+} from './errors.js';
+import type {
+  BaseRecord,
+  EventSourceDataClassTypes,
+  FailureResponse,
+  SuccessResponse,
+} from './types.js';
 
 /**
  * Batch processor for SQS FIFO queues
@@ -35,8 +45,36 @@ import type { FailureResponse, SuccessResponse } from './types.js';
  * ```
  */
 class SqsFifoPartialProcessor extends BatchProcessorSync {
+  /**
+   * The ID of the current message group being processed.
+   */
+  #currentGroupId?: string;
+  /**
+   * A set of group IDs that have already encountered failures.
+   */
+  #failedGroupIds: Set<string>;
+
   public constructor() {
     super(EventType.SQS);
+    this.#failedGroupIds = new Set<string>();
+  }
+
+  /**
+   * Handles a failure for a given record.
+   * Adds the current group ID to the set of failed group IDs if `skipGroupOnError` is true.
+   * @param record - The record that failed.
+   * @param exception - The error that occurred.
+   * @returns The failure response.
+   */
+  public failureHandler(
+    record: EventSourceDataClassTypes,
+    exception: Error
+  ): FailureResponse {
+    if (this.options?.skipGroupOnError && this.#currentGroupId) {
+      this.#addToFailedGroup(this.#currentGroupId);
+    }
+
+    return super.failureHandler(record, exception);
   }
 
   /**
@@ -48,8 +86,11 @@ class SqsFifoPartialProcessor extends BatchProcessorSync {
    * The method calls the prepare hook to initialize the processor and then
    * iterates over each record in the batch, processing them one by one.
    *
-   * If one of them fails, the method short circuits the processing and fails
-   * the remaining records in the batch.
+   * If one of them fails and `skipGroupOnError` is not true, the method short circuits
+   * the processing and fails the remaining records in the batch.
+   *
+   * If one of them fails and `skipGroupOnError` is true, then the method fails the current record
+   * if the message group has any previous failure, otherwise keeps processing.
    *
    * Then, it calls the clean hook to clean up the processor and returns the
    * processed records.
@@ -60,13 +101,31 @@ class SqsFifoPartialProcessor extends BatchProcessorSync {
     const processedRecords: (SuccessResponse | FailureResponse)[] = [];
     let currentIndex = 0;
     for (const record of this.records) {
-      // If we have any failed messages, it means the last message failed
-      // We should then short circuit the process and fail remaining messages
-      if (this.failureMessages.length != 0) {
+      this.#setCurrentGroup((record as SQSRecord).attributes?.MessageGroupId);
+
+      // If we have any failed messages, we should then short circuit the process and
+      // fail remaining messages unless `skipGroupOnError` is true
+      const shouldShortCircuit =
+        !this.options?.skipGroupOnError && this.failureMessages.length !== 0;
+      if (shouldShortCircuit) {
         return this.shortCircuitProcessing(currentIndex, processedRecords);
       }
 
-      processedRecords.push(this.processRecordSync(record));
+      // If `skipGroupOnError` is true and the current group has previously failed,
+      // then we should skip processing the current group.
+      const shouldSkipCurrentGroup =
+        this.options?.skipGroupOnError &&
+        this.#currentGroupId &&
+        this.#failedGroupIds.has(this.#currentGroupId);
+
+      const result = shouldSkipCurrentGroup
+        ? this.#processFailRecord(
+            record,
+            new SqsFifoMessageGroupShortCircuitError()
+          )
+        : this.processRecordSync(record);
+
+      processedRecords.push(result);
       currentIndex++;
     }
 
@@ -94,16 +153,46 @@ class SqsFifoPartialProcessor extends BatchProcessorSync {
     const remainingRecords = this.records.slice(firstFailureIndex);
 
     for (const record of remainingRecords) {
-      const data = this.toBatchType(record, this.eventType);
-      processedRecords.push(
-        this.failureHandler(data, new SqsFifoShortCircuitError())
-      );
+      this.#processFailRecord(record, new SqsFifoShortCircuitError());
     }
 
     this.clean();
 
     return processedRecords;
   }
+
+  /**
+   * Adds the specified group ID to the set of failed group IDs.
+   *
+   * @param group - The group ID to be added to the set of failed group IDs.
+   */
+  #addToFailedGroup(group: string): void {
+    this.#failedGroupIds.add(group);
+  }
+
+  /**
+   * Processes a fail record.
+   *
+   * @param record - The record that failed.
+   * @param exception - The error that occurred.
+   */
+  #processFailRecord(
+    record: BaseRecord,
+    exception: BatchProcessingError
+  ): FailureResponse {
+    const data = this.toBatchType(record, this.eventType);
+
+    return this.failureHandler(data, exception);
+  }
+
+  /**
+   * Sets the current group ID for the message being processed.
+   *
+   * @param group - The group ID of the current message being processed.
+   */
+  #setCurrentGroup(group?: string): void {
+    this.#currentGroupId = group;
+  }
 }
 
 export { SqsFifoPartialProcessor };
```
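
The processor change works by overriding `failureHandler`. Every failure, whether thrown by the record handler or produced by skipping, marks the current message group before delegating to the base class. The minimal sketch below reproduces that override-and-delegate pattern with hypothetical stand-in classes. `RecordSketch`, `BaseProcessorSketch`, `FifoProcessorSketch` and their string results are illustrative only and are not the library's types. It also shows an edge case that follows from the `this.#currentGroupId &&` checks in the diff: a record without a `MessageGroupId` is never skipped, and its failure does not block any group. Real FIFO queue messages always carry a group ID, so this mainly matters for hand-built test events.

```typescript
// Hypothetical stand-ins that mirror the structure of the diff above,
// not the real @aws-lambda-powertools/batch classes.
interface RecordSketch {
  messageId: string;
  attributes?: { MessageGroupId?: string };
}

class BaseProcessorSketch {
  public readonly failureMessages: RecordSketch[] = [];

  // Base bookkeeping: remember the failed record and describe the failure.
  public failureHandler(record: RecordSketch, error: Error): string {
    this.failureMessages.push(record);
    return `fail:${record.messageId}:${error.message}`;
  }
}

class FifoProcessorSketch extends BaseProcessorSketch {
  #currentGroupId?: string;
  readonly #failedGroupIds = new Set<string>();
  readonly #skipGroupOnError: boolean;

  public constructor(skipGroupOnError: boolean) {
    super();
    this.#skipGroupOnError = skipGroupOnError;
  }

  // Override: mark the current group as failed, then delegate to the base class.
  public override failureHandler(record: RecordSketch, error: Error): string {
    if (this.#skipGroupOnError && this.#currentGroupId) {
      this.#failedGroupIds.add(this.#currentGroupId);
    }
    return super.failureHandler(record, error);
  }

  // Only the skipGroupOnError path is sketched; whole-batch short-circuiting is omitted.
  public process(
    records: RecordSketch[],
    handler: (record: RecordSketch) => void
  ): string[] {
    const results: string[] = [];
    for (const record of records) {
      const groupId = record.attributes?.MessageGroupId;
      this.#currentGroupId = groupId;

      // Records without a group ID are never skipped.
      if (this.#skipGroupOnError && groupId && this.#failedGroupIds.has(groupId)) {
        results.push(
          this.failureHandler(record, new Error('previous record in group failed'))
        );
        continue;
      }

      try {
        handler(record);
        results.push(`success:${record.messageId}`);
      } catch (error) {
        results.push(this.failureHandler(record, error as Error));
      }
    }
    return results;
  }
}

const processor = new FifoProcessorSketch(true);
const results = processor.process(
  [
    { messageId: '1', attributes: { MessageGroupId: 'A' } },
    { messageId: '2' }, // no group ID
    { messageId: '3' }, // no group ID
    { messageId: '4', attributes: { MessageGroupId: 'A' } },
    { messageId: '5', attributes: { MessageGroupId: 'A' } },
  ],
  (record) => {
    if (record.messageId === '2' || record.messageId === '4') throw new Error('boom');
  }
);
console.log(results);
```

Message `3` still runs after `2` fails, because neither record has a group. Message `5` is failed without running, because `4` already failed in group `A`.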

Diff for: packages/batch/src/errors.ts

+12
```diff
@@ -37,6 +37,17 @@ class SqsFifoShortCircuitError extends BatchProcessingError {
   }
 }
 
+/**
+ * Error thrown by the Batch Processing utility when a previous record from
+ * SQS FIFO queue message group fails processing.
+ */
+class SqsFifoMessageGroupShortCircuitError extends BatchProcessingError {
+  public constructor() {
+    super('A previous record from this message group failed processing');
+    this.name = 'SqsFifoMessageGroupShortCircuitError';
+  }
+}
+
 /**
  * Error thrown by the Batch Processing utility when a partial processor receives an unexpected
  * batch type.
@@ -56,5 +67,6 @@ export {
   BatchProcessingError,
   FullBatchFailureError,
   SqsFifoShortCircuitError,
+  SqsFifoMessageGroupShortCircuitError,
   UnexpectedBatchTypeError,
 };
```
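
`SqsFifoMessageGroupShortCircuitError` sets its own `name` and message. As a result, a failure caused by group skipping can be told apart from a whole-batch short circuit and from an error thrown by the record handler. Below is a minimal sketch using local stand-ins for the error classes. Two parts are assumptions for illustration: the `SqsFifoShortCircuitError` message (that class's constructor is not shown in this diff) and the `classifyFailure` helper.

```typescript
// Local stand-ins mirroring errors.ts; the SqsFifoShortCircuitError message is assumed.
class BatchProcessingError extends Error {
  public constructor(message: string) {
    super(message);
    this.name = 'BatchProcessingError';
  }
}

class SqsFifoShortCircuitError extends BatchProcessingError {
  public constructor() {
    super('A previous record failed processing');
    this.name = 'SqsFifoShortCircuitError';
  }
}

class SqsFifoMessageGroupShortCircuitError extends BatchProcessingError {
  public constructor() {
    super('A previous record from this message group failed processing');
    this.name = 'SqsFifoMessageGroupShortCircuitError';
  }
}

type FailureKind = 'group-skipped' | 'batch-short-circuit' | 'handler-error';

// Hypothetical helper: classify by `name`, which each class sets explicitly.
const classifyFailure = (error: Error): FailureKind => {
  switch (error.name) {
    case 'SqsFifoMessageGroupShortCircuitError':
      return 'group-skipped';
    case 'SqsFifoShortCircuitError':
      return 'batch-short-circuit';
    default:
      return 'handler-error';
  }
};

console.log(classifyFailure(new SqsFifoMessageGroupShortCircuitError())); // group-skipped
console.log(classifyFailure(new SqsFifoShortCircuitError())); // batch-short-circuit
console.log(classifyFailure(new TypeError('bad payload'))); // handler-error
```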

Diff for: packages/batch/src/index.ts

+1
```diff
@@ -3,6 +3,7 @@ export {
   BatchProcessingError,
   FullBatchFailureError,
   SqsFifoShortCircuitError,
+  SqsFifoMessageGroupShortCircuitError,
   UnexpectedBatchTypeError,
 } from './errors.js';
 export { BasePartialBatchProcessor } from './BasePartialBatchProcessor.js';
```

Diff for: packages/batch/src/processPartialResponseSync.ts

+30-4
```diff
@@ -42,16 +42,42 @@ import type {
  * });
  * ```
  *
+ * When working with SQS FIFO queues, we will stop processing at the first failure
+ * and mark unprocessed messages as failed to preserve ordering. However, if you want to
+ * continue processing messages from different group IDs, you can enable the `skipGroupOnError`
+ * option for seamless processing of messages from various group IDs.
+ *
+ * @example
+ * ```typescript
+ * import {
+ *   SqsFifoPartialProcessor,
+ *   processPartialResponseSync,
+ * } from '@aws-lambda-powertools/batch';
+ * import type { SQSRecord, SQSHandler } from 'aws-lambda';
+ *
+ * const processor = new SqsFifoPartialProcessor();
+ *
+ * const recordHandler = async (record: SQSRecord): Promise<void> => {
+ *   const payload = JSON.parse(record.body);
+ * };
+ *
+ * export const handler: SQSHandler = async (event, context) =>
+ *   processPartialResponseSync(event, recordHandler, processor, {
+ *     context,
+ *     skipGroupOnError: true
+ *   });
+ * ```
+ *
  * @param event The event object containing the batch of records
  * @param recordHandler Sync function to process each record from the batch
  * @param processor Batch processor instance to handle the batch processing
- * @param options Batch processing options
+ * @param options Batch processing options, which can vary with chosen batch processor implementation
  */
-const processPartialResponseSync = (
+const processPartialResponseSync = <T extends BasePartialBatchProcessor>(
   event: { Records: BaseRecord[] },
   recordHandler: CallableFunction,
-  processor: BasePartialBatchProcessor,
-  options?: BatchProcessingOptions
+  processor: T,
+  options?: BatchProcessingOptions<T>
 ): PartialItemFailureResponse => {
   if (!event.Records || !Array.isArray(event.Records)) {
     throw new UnexpectedBatchTypeError();
```

Diff for: packages/batch/src/types.ts

+11-2
```diff
@@ -4,18 +4,27 @@ import type {
   KinesisStreamRecord,
   SQSRecord,
 } from 'aws-lambda';
+import { SqsFifoPartialProcessor } from './SqsFifoPartialProcessor.js';
+import { BasePartialBatchProcessor } from './BasePartialBatchProcessor.js';
 
 /**
  * Options for batch processing
  *
+ * @template T The type of the batch processor, defaults to BasePartialBatchProcessor
  * @property context The context object provided by the AWS Lambda runtime
+ * @property skipGroupOnError The option to group on error during processing
  */
-type BatchProcessingOptions = {
+type BatchProcessingOptions<T = BasePartialBatchProcessor> = {
   /**
    * The context object provided by the AWS Lambda runtime. When provided,
    * it's made available to the handler function you specify
    */
-  context: Context;
+  context?: Context;
+  /**
+   * This option is only available for SqsFifoPartialProcessor.
+   * If true skip the group on error during processing.
+   */
+  skipGroupOnError?: T extends SqsFifoPartialProcessor ? boolean : never;
 };
 
 /**
```
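
The `types.ts` change makes `skipGroupOnError` a conditional type. It resolves to `boolean` only when the processor type `T` extends `SqsFifoPartialProcessor`, and to `never` otherwise. Meanwhile, `processPartialResponseSync` infers `T` from its `processor` argument. The sketch below reproduces that pattern with hypothetical stand-in classes and a hypothetical `run` function in place of the real ones. TypeScript compares classes structurally, so the stand-in FIFO class needs a member the others lack. In the diff, the `#` private fields of `SqsFifoPartialProcessor` should play that role, because a type with ECMAScript private fields is only compatible with instances of the class that declares them.

```typescript
// Hypothetical stand-ins for the processor classes; only their shapes matter.
class BaseProcessor {
  public readonly eventType: string = 'SQS';
}

class FifoProcessor extends BaseProcessor {
  // A member the other classes lack, so they are not structurally FifoProcessors.
  public readonly fifo = true;
}

class StreamProcessor extends BaseProcessor {}

// Same shape as BatchProcessingOptions<T> in the diff, with `context` simplified.
type Options<T = BaseProcessor> = {
  context?: unknown;
  skipGroupOnError?: T extends FifoProcessor ? boolean : never;
};

// Mirrors the generic signature of processPartialResponseSync: T comes from `processor`.
const run = <T extends BaseProcessor>(processor: T, options?: Options<T>) => ({
  eventType: processor.eventType,
  skipGroupOnError: options?.skipGroupOnError ?? false,
});

run(new FifoProcessor(), { skipGroupOnError: true }); // OK: the option is typed boolean

// @ts-expect-error skipGroupOnError is typed as never for non-FIFO processors
run(new StreamProcessor(), { skipGroupOnError: true });
```

The restriction is compile-time only. At runtime the second call still receives the flag, so plain JavaScript callers are not prevented from passing it.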

Diff for: packages/batch/tests/helpers/factories.ts

+2-1
```diff
@@ -5,7 +5,7 @@ import type {
 } from 'aws-lambda';
 import { randomInt, randomUUID } from 'node:crypto';
 
-const sqsRecordFactory = (body: string): SQSRecord => {
+const sqsRecordFactory = (body: string, messageGroupId?: string): SQSRecord => {
   return {
     messageId: randomUUID(),
     receiptHandle: 'AQEBwJnKyrHigUMZj6rYigCgxlaS3SLy0a',
@@ -15,6 +15,7 @@ const sqsRecordFactory = (body: string): SQSRecord => {
       SentTimestamp: '1545082649183',
       SenderId: 'AIDAIENQZJOLO23YVJ4VO',
       ApproximateFirstReceiveTimestamp: '1545082649185',
+      ...(messageGroupId ? { MessageGroupId: messageGroupId } : {}),
     },
     messageAttributes: {},
     md5OfBody: 'e4e68fb7bd0e697a0ae8f1bb342846b3',
```
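
The factory change uses a conditional spread, so `MessageGroupId` is only present on records built with a group ID. This matches the `attributes?.MessageGroupId` lookup in the processor. Below is a minimal sketch of the same pattern for building a mixed-group test batch. `FifoTestAttributes`, `FifoTestRecord`, `buildAttributes` and the trimmed attribute set are hypothetical, not the real `sqsRecordFactory` or the `aws-lambda` types.

```typescript
// Hypothetical, trimmed-down attribute shape for a test SQS record.
interface FifoTestAttributes {
  ApproximateReceiveCount: string;
  MessageGroupId?: string;
}

interface FifoTestRecord {
  body: string;
  attributes: FifoTestAttributes;
}

// Conditional spread: the key is omitted entirely (not set to undefined) when no ID is given.
const buildAttributes = (messageGroupId?: string): FifoTestAttributes => ({
  ApproximateReceiveCount: '1',
  ...(messageGroupId ? { MessageGroupId: messageGroupId } : {}),
});

// Interleave two message groups, the situation skipGroupOnError is meant for.
const batch: FifoTestRecord[] = ['A', 'B', 'A', 'B'].map((groupId, index) => ({
  body: JSON.stringify({ index }),
  attributes: buildAttributes(groupId),
}));

console.log('MessageGroupId' in buildAttributes()); // false
console.log(batch.map((record) => record.attributes.MessageGroupId).join(',')); // A,B,A,B
```

Omitting the key, rather than writing `MessageGroupId: undefined`, keeps `in` checks and `Object.keys` consistent with a real record that has no group. It also stays valid if `exactOptionalPropertyTypes` is enabled.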
