Commit c50e62c

feat(bedrock): expose bda parsing strategy for data sources (#1096)
* feat(parsing): add bda support as a parsing strategy for data sources
1 parent 93d844d commit c50e62c

22 files changed, +133 -43 lines changed

apidocs/@cdklabs/namespaces/bedrock/README.md

+3-3
@@ -33,7 +33,7 @@
 - [ModalityType](enumerations/ModalityType.md)
 - [OrchestrationType](enumerations/OrchestrationType.md)
 - [ParsingModality](enumerations/ParsingModality.md)
-- [ParsingStategyType](enumerations/ParsingStategyType.md)
+- [ParsingStrategyType](enumerations/ParsingStrategyType.md)
 - [PromptTemplateType](enumerations/PromptTemplateType.md)
 - [RelayConversationHistoryType](enumerations/RelayConversationHistoryType.md)
 - [SalesforceDataSourceAuthType](enumerations/SalesforceDataSourceAuthType.md)
@@ -80,7 +80,7 @@
 - [Memory](classes/Memory.md)
 - [OrchestrationExecutor](classes/OrchestrationExecutor.md)
 - [ParentActionGroupSignature](classes/ParentActionGroupSignature.md)
-- [ParsingStategy](classes/ParsingStategy.md)
+- [ParsingStrategy](classes/ParsingStrategy.md)
 - [Prompt](classes/Prompt.md)
 - [PromptBase](classes/PromptBase.md)
 - [PromptOverrideConfiguration](classes/PromptOverrideConfiguration.md)
@@ -128,7 +128,7 @@
 - [CustomTopicProps](interfaces/CustomTopicProps.md)
 - [DataSourceAssociationProps](interfaces/DataSourceAssociationProps.md)
 - [FoundationModelContextEnrichmentProps](interfaces/FoundationModelContextEnrichmentProps.md)
-- [FoundationModelParsingStategyProps](interfaces/FoundationModelParsingStategyProps.md)
+- [FoundationModelParsingStrategyProps](interfaces/FoundationModelParsingStrategyProps.md)
 - [GraphKnowledgeBaseAttributes](interfaces/GraphKnowledgeBaseAttributes.md)
 - [GraphKnowledgeBaseProps](interfaces/GraphKnowledgeBaseProps.md)
 - [GuardrailAttributes](interfaces/GuardrailAttributes.md)

apidocs/@cdklabs/namespaces/bedrock/classes/ParsingStategy.md renamed to apidocs/@cdklabs/namespaces/bedrock/classes/ParsingStrategy.md

+21-7
@@ -2,9 +2,9 @@
 
 ***
 
-[@cdklabs/generative-ai-cdk-constructs](../../../../README.md) / [bedrock](../README.md) / ParsingStategy
+[@cdklabs/generative-ai-cdk-constructs](../../../../README.md) / [bedrock](../README.md) / ParsingStrategy
 
-# Class: `abstract` ParsingStategy
+# Class: `abstract` ParsingStrategy
 
 Represents an advanced parsing strategy configuration for Knowledge Base ingestion.
 
@@ -16,11 +16,11 @@ https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking-parsing.html#kb
 
 ### Constructor
 
-> **new ParsingStategy**(): `ParsingStategy`
+> **new ParsingStrategy**(): `ParsingStrategy`
 
 #### Returns
 
-`ParsingStategy`
+`ParsingStrategy`
 
 ## Properties
 
@@ -42,9 +42,23 @@ The CloudFormation property representation of this configuration
 
 ***
 
+### bedrockDataAutomation()
+
+> `static` **bedrockDataAutomation**(): `ParsingStrategy`
+
+Creates a Bedrock Data Automation-based parsing strategy for processing multimodal data.
+It leverages generative AI to automate the transformation of multi-modal data into structured formats.
+If the parsing fails, the Amazon Bedrock default parser is used instead.
+
+#### Returns
+
+`ParsingStrategy`
+
+***
+
 ### foundationModel()
 
-> `static` **foundationModel**(`props`): `ParsingStategy`
+> `static` **foundationModel**(`props`): `ParsingStrategy`
 
 Creates a Foundation Model-based parsing strategy for extracting non-textual information
 from documents such as tables and charts.
@@ -55,11 +69,11 @@ from documents such as tables and charts.
 
 ##### props
 
-[`FoundationModelParsingStategyProps`](../interfaces/FoundationModelParsingStategyProps.md)
+[`FoundationModelParsingStrategyProps`](../interfaces/FoundationModelParsingStrategyProps.md)
 
 #### Returns
 
-`ParsingStategy`
+`ParsingStrategy`
 
 #### See
 

apidocs/@cdklabs/namespaces/bedrock/enumerations/ParsingStategyType.md renamed to apidocs/@cdklabs/namespaces/bedrock/enumerations/ParsingStrategyType.md

+2-2
@@ -2,9 +2,9 @@
 
 ***
 
-[@cdklabs/generative-ai-cdk-constructs](../../../../README.md) / [bedrock](../README.md) / ParsingStategyType
+[@cdklabs/generative-ai-cdk-constructs](../../../../README.md) / [bedrock](../README.md) / ParsingStrategyType
 
-# Enumeration: ParsingStategyType
+# Enumeration: ParsingStrategyType
 
 Enum representing the types of parsing strategies available for Amazon Bedrock Knowledge Bases.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/ConfluenceDataSourceAssociationProps.md

+1-1
@@ -200,7 +200,7 @@ The KMS key to use to encrypt the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/ConfluenceDataSourceProps.md

+1-1
@@ -220,7 +220,7 @@ The knowledge base to associate with the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/CustomDataSourceAssociationProps.md

+1-1
@@ -148,7 +148,7 @@ The KMS key to use to encrypt the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/CustomDataSourceProps.md

+1-1
@@ -152,7 +152,7 @@ The knowledge base to associate with the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/DataSourceAssociationProps.md

+1-1
@@ -121,7 +121,7 @@ The KMS key to use to encrypt the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/FoundationModelParsingStategyProps.md renamed to apidocs/@cdklabs/namespaces/bedrock/interfaces/FoundationModelParsingStrategyProps.md

+2-2
@@ -2,9 +2,9 @@
 
 ***
 
-[@cdklabs/generative-ai-cdk-constructs](../../../../README.md) / [bedrock](../README.md) / FoundationModelParsingStategyProps
+[@cdklabs/generative-ai-cdk-constructs](../../../../README.md) / [bedrock](../README.md) / FoundationModelParsingStrategyProps
 
-# Interface: FoundationModelParsingStategyProps
+# Interface: FoundationModelParsingStrategyProps
 
 Properties for configuring a Foundation Model parsing strategy.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/S3DataSourceAssociationProps.md

+1-1
@@ -170,7 +170,7 @@ The KMS key to use to encrypt the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/S3DataSourceProps.md

+1-1
@@ -182,7 +182,7 @@ The knowledge base to associate with the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/SalesforceDataSourceAssociationProps.md

+1-1
@@ -186,7 +186,7 @@ The KMS key to use to encrypt the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/SalesforceDataSourceProps.md

+1-1
@@ -202,7 +202,7 @@ The knowledge base to associate with the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/SharePointDataSourceAssociationProps.md

+1-1
@@ -186,7 +186,7 @@ The KMS key to use to encrypt the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/SharePointDataSourceProps.md

+1-1
@@ -202,7 +202,7 @@ The knowledge base to associate with the data source.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/WebCrawlerDataSourceAssociationProps.md

+1-1
@@ -208,7 +208,7 @@ no web pages will be ingested.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

apidocs/@cdklabs/namespaces/bedrock/interfaces/WebCrawlerDataSourceProps.md

+1-1
@@ -228,7 +228,7 @@ no web pages will be ingested.
 
 ### parsingStrategy?
 
-> `readonly` `optional` **parsingStrategy**: [`ParsingStategy`](../classes/ParsingStategy.md)
+> `readonly` `optional` **parsingStrategy**: [`ParsingStrategy`](../classes/ParsingStrategy.md)
 
 The parsing strategy to use.
 

src/cdk-lib/bedrock/data-sources/README.md

+21-3
@@ -453,24 +453,42 @@ two parsing strategies:
 
 - **Foundation Model Parsing Strategy**: This strategy uses a foundation model to describe
 the contents of the document. It is particularly useful for improved processing of PDF files
-with tables and images. To use this strategy, set the `parsingStrategy` in a data source as below.
+with tables and images. To use this strategy, set the `parsingStrategy` in a data source as below.
+For the list of supported models, please refer to the [documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-supported.html#knowledge-base-supported-parsing)
 
 #### TypeScript
 
 ```ts
-bedrock.ParsingStategy.foundationModel({
+bedrock.ParsingStrategy.foundationModel({
   model: BedrockFoundationModel.ANTHROPIC_CLAUDE_SONNET_V1_0,
 });
 ```
 
 #### Python
 
 ```python
-bedrock.ParsingStategy.foundation_model(
+bedrock.ParsingStrategy.foundation_model(
     parsing_model=BedrockFoundationModel.ANTHROPIC_CLAUDE_SONNET_V1_0
 )
 ```
 
+- **Bedrock Data Automation**: A fully-managed service that effectively processes multimodal data, without the need to provide any additional prompting. The cost of this parser depends on the number of pages in the document or number of images to be processed. Currently, only documents and images are supported, using standard output.
+
+#### TypeScript
+```ts
+const parsingStrategy = ParsingStrategy.bedrockDataAutomation();
+```
+
+#### Python
+```python
+bedrock.ParsingStrategy.bedrock_data_automation()
+```
+
+If the chosen parsing strategy fails to parse a file, the Amazon Bedrock default parser is used as a fallback.
+
+> warning
+> If you choose Amazon Bedrock Data Automation or foundation models as a parser, the method that you choose will be used to parse all .pdf files in your data source, even if the .pdf files contain only text. The default parser won’t be used to parse these .pdf files. Your account incurs charges for the use of Amazon Bedrock Data Automation or the foundation model in parsing these files.
+
 For additional information regarding parsing, please refer to the [parsing documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/kb-advanced-parsing.html)
 
 ### Context Enrichment

src/cdk-lib/bedrock/data-sources/base-data-source.ts

+2-2
@@ -21,7 +21,7 @@ import { IKnowledgeBase } from './../knowledge-bases/knowledge-base';
 import { ChunkingStrategy } from './chunking';
 import { ContextEnrichment } from './context-enrichment';
 import { CustomTransformation } from './custom-transformation';
-import { ParsingStategy } from './parsing';
+import { ParsingStrategy } from './parsing';
 /**
  * Specifies the policy for handling data when a data source resource is deleted.
  * This policy affects the vector embeddings created from the data source.
@@ -153,7 +153,7 @@ export interface DataSourceAssociationProps {
    *
    * @default - No Parsing Stategy is used.
    */
-  readonly parsingStrategy?: ParsingStategy;
+  readonly parsingStrategy?: ParsingStrategy;
 
   /**
    * The custom transformation strategy to use.

src/cdk-lib/bedrock/data-sources/parsing.ts

+44-6
@@ -11,6 +11,7 @@
  * and limitations under the License.
  */
 
+import { Aws } from 'aws-cdk-lib';
 import { CfnDataSource } from 'aws-cdk-lib/aws-bedrock';
 import { PolicyStatement } from 'aws-cdk-lib/aws-iam';
 import { DEFAULT_PARSING_PROMPT } from './default-parsing-prompt';
@@ -26,7 +27,7 @@ export enum ParsingModality {
  * Enum representing the types of parsing strategies available for Amazon Bedrock Knowledge Bases.
  * @see https://docs.aws.amazon.com/bedrock/latest/userguide/kb-advanced-parsing.html
  */
-export enum ParsingStategyType {
+export enum ParsingStrategyType {
   /**
    * Uses a Bedrock Foundation Model for advanced parsing of non-textual information from documents.
    */
@@ -43,7 +44,7 @@ export enum ParsingStategyType {
 /**
  * Properties for configuring a Foundation Model parsing strategy.
  */
-export interface FoundationModelParsingStategyProps {
+export interface FoundationModelParsingStrategyProps {
   /**
    * The Foundation Model to use for parsing non-textual information.
    * Currently supported models are Claude 3 Sonnet and Claude 3 Haiku.
@@ -69,7 +70,7 @@ export interface FoundationModelParsingStategyProps {
  * Represents an advanced parsing strategy configuration for Knowledge Base ingestion.
  * @see https://docs.aws.amazon.com/bedrock/latest/userguide/kb-chunking-parsing.html#kb-advanced-parsing
  */
-export abstract class ParsingStategy {
+export abstract class ParsingStrategy {
   // ------------------------------------------------------
   // FM Parsing Strategy
   // ------------------------------------------------------
@@ -80,8 +81,8 @@ export abstract class ParsingStategy {
    * - There are limits on file types (PDF) and total data that can be parsed using advanced parsing.
    * @see https://docs.aws.amazon.com/bedrock/latest/userguide/knowledge-base-ds.html#kb-ds-supported-doc-formats-limits
    */
-  public static foundationModel(props: FoundationModelParsingStategyProps): ParsingStategy {
-    class FoundationModelTransformation extends ParsingStategy {
+  public static foundationModel(props: FoundationModelParsingStrategyProps): ParsingStrategy {
+    class FoundationModelTransformation extends ParsingStrategy {
       /** The CloudFormation property representation of this configuration */
       public readonly configuration = {
         bedrockFoundationModelConfiguration: {
@@ -90,7 +91,7 @@ export abstract class ParsingStategy {
             parsingPromptText: props.parsingPrompt ?? DEFAULT_PARSING_PROMPT,
           },
         },
-        parsingStrategy: ParsingStategyType.FOUNDATION_MODEL,
+        parsingStrategy: ParsingStrategyType.FOUNDATION_MODEL,
       };
 
       public generatePolicyStatements(): PolicyStatement[] {
@@ -105,6 +106,43 @@ export abstract class ParsingStategy {
 
     return new FoundationModelTransformation();
   }
+
+  /**
+   * Creates a Bedrock Data Automation-based parsing strategy for processing multimodal data.
+   * It leverages generative AI to automate the transformation of multi-modal data into structured formats.
+   * If the parsing fails, the Amazon Bedrock default parser is used instead.
+   */
+  public static bedrockDataAutomation(): ParsingStrategy {
+    class BedrockDataAutomationTransformation extends ParsingStrategy {
+      /** The CloudFormation property representation of this configuration */
+      public readonly configuration = {
+        bedrockDataAutomationConfiguration: {
+          parsingModality: ParsingModality.MULTIMODAL,
+        },
+        parsingStrategy: ParsingStrategyType.DATA_AUTOMATION,
+      };
+
+      public generatePolicyStatements(): PolicyStatement[] {
+        return [
+          new PolicyStatement({
+            actions: ['bedrock:InvokeDataAutomationAsync'],
+            resources: [
+              `arn:${Aws.PARTITION}:bedrock:${Aws.REGION}:${Aws.PARTITION}:data-automation-project/public-rag-default`,
+              `arn:${Aws.PARTITION}:bedrock:*:${Aws.ACCOUNT_ID}:data-automation-profile/us.data-automation-v1`, // see https://docs.aws.amazon.com/bedrock/latest/userguide/bda-cris.html
+            ],
+          }),
+          new PolicyStatement({
+            actions: ['bedrock:GetDataAutomationStatus'],
+            resources: [
+              `arn:${Aws.PARTITION}:bedrock:${Aws.REGION}:${Aws.ACCOUNT_ID}:data-automation-invocation/*`,
+            ],
+          }),
+        ];
+      }
+    }
+
+    return new BedrockDataAutomationTransformation();
+  }
   // ------------------------------------------------------
   // Properties
   // ------------------------------------------------------
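
For readers who want to see what the new `bedrockDataAutomation()` factory yields without deploying a stack, the following is a rough, dependency-free sketch of its shape. It is not the library's code: every name ending in `Sketch` is a hypothetical stand-in, the enum string values are assumed to mirror the CloudFormation `ParsingStrategy` values, `'MULTIMODAL'` is assumed to be the value of `ParsingModality.MULTIMODAL`, and the region and account ID are sample values. The real construct builds its ARNs from unresolved `Aws.PARTITION`, `Aws.REGION`, and `Aws.ACCOUNT_ID` tokens rather than literal strings.

```typescript
// Simplified stand-in for the construct added in this commit (no CDK dependency).
// Names ending in "Sketch" are hypothetical and are not part of the library.

// Assumed to mirror the CloudFormation ParsingStrategy values.
enum ParsingStrategyTypeSketch {
  FOUNDATION_MODEL = 'BEDROCK_FOUNDATION_MODEL',
  DATA_AUTOMATION = 'BEDROCK_DATA_AUTOMATION',
}

// Plain-object replacement for aws-cdk-lib's PolicyStatement.
interface PolicySketch {
  actions: string[];
  resources: string[];
}

// Concrete values standing in for the Aws.* deploy-time tokens.
interface EnvSketch {
  partition: string;
  region: string;
  accountId: string;
}

abstract class ParsingStrategySketch {
  public static bedrockDataAutomation(): ParsingStrategySketch {
    class BedrockDataAutomationSketch extends ParsingStrategySketch {
      public readonly configuration = {
        bedrockDataAutomationConfiguration: { parsingModality: 'MULTIMODAL' },
        parsingStrategy: ParsingStrategyTypeSketch.DATA_AUTOMATION,
      };

      public generatePolicyStatements(env: EnvSketch): PolicySketch[] {
        const prefix = `arn:${env.partition}:bedrock`;
        return [
          {
            actions: ['bedrock:InvokeDataAutomationAsync'],
            resources: [
              // As in the commit, the partition is also interpolated into the account field.
              `${prefix}:${env.region}:${env.partition}:data-automation-project/public-rag-default`,
              `${prefix}:*:${env.accountId}:data-automation-profile/us.data-automation-v1`,
            ],
          },
          {
            actions: ['bedrock:GetDataAutomationStatus'],
            resources: [`${prefix}:${env.region}:${env.accountId}:data-automation-invocation/*`],
          },
        ];
      }
    }
    return new BedrockDataAutomationSketch();
  }

  public abstract readonly configuration: { parsingStrategy: ParsingStrategyTypeSketch };
  public abstract generatePolicyStatements(env: EnvSketch): PolicySketch[];
}

// Sample values only; the account ID is a placeholder.
const strategy = ParsingStrategySketch.bedrockDataAutomation();
const statements = strategy.generatePolicyStatements({
  partition: 'aws',
  region: 'us-west-2',
  accountId: '111122223333',
});

console.log(strategy.configuration.parsingStrategy);
// prints "BEDROCK_DATA_AUTOMATION"
console.log(statements[0].resources[0]);
// prints "arn:aws:bedrock:us-west-2:aws:data-automation-project/public-rag-default"
```

With the standard `aws` partition, the first resource resolves to an ARN whose account field is the literal `aws`, which appears to be the intent of interpolating the partition there. The commit does not say how this should behave in other partitions, so that case may be worth checking during review.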
