
Commit 04e1efb

aws-rafams and krokoko authored
feat(bedrock): implement new data source structure (#668)
* feat(bedrock): add data source implementation and new chunking strategies

---------

Co-authored-by: Alain Krok <[email protected]>
1 parent a686a3e commit 04e1efb


69 files changed: +9920 / -639 lines

Some content is hidden: large commits have some content hidden by default.


Diff for: .gitignore

-9
Some generated files are not rendered by default.

Diff for: .npmignore

-2
Some generated files are not rendered by default.

Diff for: .projen/tasks.json

-69
Some generated files are not rendered by default.

Diff for: apidocs/namespaces/bedrock/README.md

+40 -1
@@ -11,8 +11,12 @@
 ### Enumerations
 
 - [CanadaSpecific](enumerations/CanadaSpecific.md)
-- [ChunkingStrategy](enumerations/ChunkingStrategy.md)
+- [ConfluenceDataSourceAuthType](enumerations/ConfluenceDataSourceAuthType.md)
+- [ConfluenceObjectType](enumerations/ConfluenceObjectType.md)
 - [ContextualGroundingFilterConfigType](enumerations/ContextualGroundingFilterConfigType.md)
+- [CrawlingScope](enumerations/CrawlingScope.md)
+- [DataDeletionPolicy](enumerations/DataDeletionPolicy.md)
+- [DataSourceType](enumerations/DataSourceType.md)
 - [FiltersConfigStrength](enumerations/FiltersConfigStrength.md)
 - [FiltersConfigType](enumerations/FiltersConfigType.md)
 - [Finance](enumerations/Finance.md)
@@ -24,6 +28,11 @@
 - [PromptState](enumerations/PromptState.md)
 - [PromptTemplateType](enumerations/PromptTemplateType.md)
 - [PromptType](enumerations/PromptType.md)
+- [SalesforceDataSourceAuthType](enumerations/SalesforceDataSourceAuthType.md)
+- [SalesforceObjectType](enumerations/SalesforceObjectType.md)
+- [SharePointDataSourceAuthType](enumerations/SharePointDataSourceAuthType.md)
+- [SharePointObjectType](enumerations/SharePointObjectType.md)
+- [TransformationStep](enumerations/TransformationStep.md)
 - [UKSpecific](enumerations/UKSpecific.md)
 - [USASpecific](enumerations/USASpecific.md)
 
@@ -34,18 +43,28 @@
 - [AgentAlias](classes/AgentAlias.md)
 - [ApiSchema](classes/ApiSchema.md)
 - [BedrockFoundationModel](classes/BedrockFoundationModel.md)
+- [ChunkingStrategy](classes/ChunkingStrategy.md)
+- [ConfluenceDataSource](classes/ConfluenceDataSource.md)
 - [ContentPolicyConfig](classes/ContentPolicyConfig.md)
+- [CustomTransformation](classes/CustomTransformation.md)
+- [DataSource](classes/DataSource.md)
+- [DataSourceBase](classes/DataSourceBase.md)
+- [DataSourceNew](classes/DataSourceNew.md)
 - [Guardrail](classes/Guardrail.md)
 - [GuardrailVersion](classes/GuardrailVersion.md)
 - [InlineApiSchema](classes/InlineApiSchema.md)
 - [KnowledgeBase](classes/KnowledgeBase.md)
+- [ParsingStategy](classes/ParsingStategy.md)
 - [Prompt](classes/Prompt.md)
 - [PromptVariant](classes/PromptVariant.md)
 - [PromptVersion](classes/PromptVersion.md)
 - [S3ApiSchema](classes/S3ApiSchema.md)
 - [S3DataSource](classes/S3DataSource.md)
+- [SalesforceDataSource](classes/SalesforceDataSource.md)
 - [SensitiveInformationPolicyConfig](classes/SensitiveInformationPolicyConfig.md)
+- [SharePointDataSource](classes/SharePointDataSource.md)
 - [Topic](classes/Topic.md)
+- [WebCrawlerDataSource](classes/WebCrawlerDataSource.md)
 
 ### Interfaces
 
@@ -57,23 +76,43 @@
 - [ApiSchemaConfig](interfaces/ApiSchemaConfig.md)
 - [BedrockFoundationModelProps](interfaces/BedrockFoundationModelProps.md)
 - [CommonPromptVariantProps](interfaces/CommonPromptVariantProps.md)
+- [ConfluenceCrawlingFilters](interfaces/ConfluenceCrawlingFilters.md)
+- [ConfluenceDataSourceAssociationProps](interfaces/ConfluenceDataSourceAssociationProps.md)
+- [ConfluenceDataSourceProps](interfaces/ConfluenceDataSourceProps.md)
 - [ContentPolicyConfigProps](interfaces/ContentPolicyConfigProps.md)
 - [ContextualGroundingPolicyConfigProps](interfaces/ContextualGroundingPolicyConfigProps.md)
+- [CrawlingFilters](interfaces/CrawlingFilters.md)
+- [DataSourceAssociationProps](interfaces/DataSourceAssociationProps.md)
+- [FoundationModelParsingStategyProps](interfaces/FoundationModelParsingStategyProps.md)
 - [GuardrailConfiguration](interfaces/GuardrailConfiguration.md)
 - [GuardrailProps](interfaces/GuardrailProps.md)
+- [HierarchicalChunkingProps](interfaces/HierarchicalChunkingProps.md)
 - [IAgentAlias](interfaces/IAgentAlias.md)
+- [IDataSource](interfaces/IDataSource.md)
+- [IKnowledgeBase](interfaces/IKnowledgeBase.md)
 - [InferenceConfiguration](interfaces/InferenceConfiguration.md)
 - [IPrompt](interfaces/IPrompt.md)
+- [KnowledgeBaseAttributes](interfaces/KnowledgeBaseAttributes.md)
 - [KnowledgeBaseProps](interfaces/KnowledgeBaseProps.md)
+- [LambdaCustomTransformationProps](interfaces/LambdaCustomTransformationProps.md)
 - [PromptConfiguration](interfaces/PromptConfiguration.md)
 - [PromptOverrideConfiguration](interfaces/PromptOverrideConfiguration.md)
 - [PromptProps](interfaces/PromptProps.md)
 - [PromptVersionProps](interfaces/PromptVersionProps.md)
+- [S3DataSourceAssociationProps](interfaces/S3DataSourceAssociationProps.md)
 - [S3DataSourceProps](interfaces/S3DataSourceProps.md)
 - [S3Identifier](interfaces/S3Identifier.md)
+- [SalesforceCrawlingFilters](interfaces/SalesforceCrawlingFilters.md)
+- [SalesforceDataSourceAssociationProps](interfaces/SalesforceDataSourceAssociationProps.md)
+- [SalesforceDataSourceProps](interfaces/SalesforceDataSourceProps.md)
 - [SensitiveInformationPolicyConfigProps](interfaces/SensitiveInformationPolicyConfigProps.md)
+- [SharePointCrawlingFilters](interfaces/SharePointCrawlingFilters.md)
+- [SharePointDataSourceAssociationProps](interfaces/SharePointDataSourceAssociationProps.md)
+- [SharePointDataSourceProps](interfaces/SharePointDataSourceProps.md)
 - [TextPromptVariantProps](interfaces/TextPromptVariantProps.md)
 - [TopicProps](interfaces/TopicProps.md)
+- [WebCrawlerDataSourceAssociationProps](interfaces/WebCrawlerDataSourceAssociationProps.md)
+- [WebCrawlerDataSourceProps](interfaces/WebCrawlerDataSourceProps.md)
 
 ### Functions
 
Diff for: apidocs/namespaces/bedrock/classes/ChunkingStrategy.md

+129
@@ -0,0 +1,129 @@
[**@cdklabs/generative-ai-cdk-constructs**](../../../README.md)**Docs**

***

[@cdklabs/generative-ai-cdk-constructs](../../../README.md) / [bedrock](../README.md) / ChunkingStrategy

# Class: `abstract` ChunkingStrategy

## Properties

### configuration

> `abstract` **configuration**: `ChunkingConfigurationProperty`

The CloudFormation property representation of this configuration.

***

### DEFAULT

> `readonly` `static` **DEFAULT**: [`ChunkingStrategy`](ChunkingStrategy.md)

Fixed-size chunking with the default chunk size of 300 tokens and 20% overlap.

***

### FIXED\_SIZE

> `readonly` `static` **FIXED\_SIZE**: [`ChunkingStrategy`](ChunkingStrategy.md)

Fixed-size chunking with the default chunk size of 300 tokens and 20% overlap.
You can adjust these values to your specific requirements with the
`ChunkingStrategy.fixedSize(params)` method.
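To make the 300-token / 20% overlap defaults concrete, here is a minimal TypeScript sketch of fixed-size windowing. It is an illustration, not Amazon Bedrock's implementation: it assumes the input is already a token array and that the overlap is `floor(maxTokens * overlapPercentage / 100)` tokens, so the defaults give a 60-token overlap and a 240-token stride. The function name `fixedSizeChunks` is hypothetical.

```typescript
// Illustrative sketch only: NOT how Amazon Bedrock chunks documents.
// Assumes the input is already a token array and that the overlap is
// floor(maxTokens * overlapPercentage / 100) tokens.
function fixedSizeChunks<T>(tokens: T[], maxTokens: number, overlapPercentage: number): T[][] {
  if (maxTokens <= 0) throw new Error("maxTokens must be positive");
  if (overlapPercentage < 0 || overlapPercentage >= 100) {
    throw new Error("overlapPercentage must be in [0, 100)");
  }
  const overlap = Math.floor((maxTokens * overlapPercentage) / 100); // 300 * 20% = 60
  const stride = maxTokens - overlap; // 300 - 60 = 240 new tokens per chunk
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += stride) {
    chunks.push(tokens.slice(start, start + maxTokens));
    if (start + maxTokens >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Usage with the documented defaults: 300 tokens, 20% overlap
const tokens = Array.from({ length: 700 }, (_, i) => i);
const chunks = fixedSizeChunks(tokens, 300, 20);
console.log(chunks.map((c) => [c[0], c[c.length - 1]]));
```

With 700 tokens the sketch yields three chunks covering tokens 0-299, 240-539 and 480-699, so each chunk repeats the last 60 tokens of the previous one.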

***

### HIERARCHICAL\_COHERE

> `readonly` `static` **HIERARCHICAL\_COHERE**: [`ChunkingStrategy`](ChunkingStrategy.md)

Hierarchical chunking with the defaults for Cohere models:
- Overlap tokens: 30
- Max parent token size: 500
- Max child token size: 100

***

### HIERARCHICAL\_TITAN

> `readonly` `static` **HIERARCHICAL\_TITAN**: [`ChunkingStrategy`](ChunkingStrategy.md)

Hierarchical chunking with the defaults for Titan models:
- Overlap tokens: 60
- Max parent token size: 1500
- Max child token size: 300
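The parent/child relationship behind these defaults can be sketched as a two-level split. This is a simplified, hypothetical model (`hierarchicalChunks` and `windows` are illustrative names, not library APIs): it assumes `overlapTokens` is repeated between neighbouring chunks at both levels and that child chunks are cut out of each parent, reflecting the usual motivation that a search matches a small child chunk and then returns its larger parent for context.

```typescript
// Simplified, hypothetical model of two-level (parent/child) chunking.
// NOT Amazon Bedrock's algorithm: assumes overlapTokens is repeated between
// neighbouring chunks at both levels and that children are cut from parents.
interface HierarchicalChunk<T> {
  parent: T[];
  children: T[][];
}

// Split tokens into windows of `size` that share `overlap` tokens.
function windows<T>(tokens: T[], size: number, overlap: number): T[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const out: T[][] = [];
  for (let start = 0; start < tokens.length; start += size - overlap) {
    out.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break;
  }
  return out;
}

function hierarchicalChunks<T>(
  tokens: T[],
  maxParentTokenSize: number,
  maxChildTokenSize: number,
  overlapTokens: number,
): HierarchicalChunk<T>[] {
  return windows(tokens, maxParentTokenSize, overlapTokens).map((parent) => ({
    parent,
    // Small children are what a search would match; the parent supplies context.
    children: windows(parent, maxChildTokenSize, overlapTokens),
  }));
}

// HIERARCHICAL_TITAN defaults: parent 1500, child 300, overlap 60
const doc = Array.from({ length: 2000 }, (_, i) => i);
const result = hierarchicalChunks(doc, 1500, 300, 60);
console.log(result.map((h) => ({ parentStart: h.parent[0], children: h.children.length })));
```

With the Titan defaults and a 2000-token input, this sketch produces two parents (starting at tokens 0 and 1440) holding six and three children respectively.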

***

### NONE

> `readonly` `static` **NONE**: [`ChunkingStrategy`](ChunkingStrategy.md)

Amazon Bedrock treats each file as one chunk. This is suitable for documents
that have already been preprocessed or split into chunks.

***

### SEMANTIC

> `readonly` `static` **SEMANTIC**: [`ChunkingStrategy`](ChunkingStrategy.md)

Semantic chunking with the defaults `bufferSize: 0`,
`breakpointPercentileThreshold: 95`, and `maxTokens: 300`.
You can adjust these values to your specific requirements with the
`ChunkingStrategy.semantic(params)` method.
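Semantic chunking places chunk boundaries where adjacent text is most dissimilar, and `breakpointPercentileThreshold` sets how extreme a gap must be to count as a boundary. The sketch below illustrates only that percentile rule and is not Bedrock's implementation. It assumes the per-pair dissimilarities are already known (in practice they come from sentence embeddings), uses linear-interpolation percentiles, and ignores both `bufferSize` (how many neighbouring sentences are combined with each sentence before embedding) and the `maxTokens` cap. The names `percentile` and `semanticGroups` are hypothetical.

```typescript
// Illustrative sketch of a percentile breakpoint rule, NOT Bedrock's code.
// distances[i] is an assumed dissimilarity between sentence i and i + 1
// (in practice derived from embeddings; supplied directly here).

// Percentile with linear interpolation between the closest ranks.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const pos = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Group sentences, starting a new chunk after any gap above the threshold.
function semanticGroups(
  sentences: string[],
  distances: number[],
  breakpointPercentileThreshold: number,
): string[][] {
  if (sentences.length === 0) return [];
  if (distances.length !== sentences.length - 1) {
    throw new Error("expected one distance per adjacent sentence pair");
  }
  if (distances.length === 0) return [sentences];
  const threshold = percentile(distances, breakpointPercentileThreshold);
  const groups: string[][] = [[sentences[0]]];
  for (let i = 1; i < sentences.length; i++) {
    if (distances[i - 1] > threshold) groups.push([]); // breakpoint before sentence i
    groups[groups.length - 1].push(sentences[i]);
  }
  return groups;
}

const sentences = ["a", "b", "c", "d", "e", "f"];
const gaps = [0.1, 0.15, 0.8, 0.12, 0.1];
console.log(semanticGroups(sentences, gaps, 95));
```

With the default 95th percentile over these five gaps, the threshold falls at about 0.67, so only the 0.8 gap becomes a breakpoint and the six sentences form two chunks.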

## Methods

### fixedSize()

> `static` **fixedSize**(`props`): [`ChunkingStrategy`](ChunkingStrategy.md)

Method for customizing a fixed-size chunking strategy.

#### Parameters

**props**: `FixedSizeChunkingConfigurationProperty`

#### Returns

[`ChunkingStrategy`](ChunkingStrategy.md)
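The abstract-class-plus-static-factory pattern documented on this page can be mirrored in a few lines. In this self-contained sketch, `ChunkingStrategySketch` is a hypothetical stand-in for `ChunkingStrategy`, and the two interfaces only approximate the CloudFormation chunking shapes (the field names `chunkingStrategy`, `fixedSizeChunkingConfiguration`, `maxTokens` and `overlapPercentage` are assumptions) instead of importing the real types from `aws-cdk-lib`.

```typescript
// Hypothetical, self-contained mirror of the ChunkingStrategy pattern.
// These interfaces approximate the CloudFormation chunking shapes; they are
// assumptions, not the aws-cdk-lib types.
interface FixedSizeChunkingConfigurationProperty {
  maxTokens: number;
  overlapPercentage: number;
}

interface ChunkingConfigurationProperty {
  chunkingStrategy: "FIXED_SIZE" | "HIERARCHICAL" | "SEMANTIC" | "NONE";
  fixedSizeChunkingConfiguration?: FixedSizeChunkingConfigurationProperty;
}

abstract class ChunkingStrategySketch {
  // Each concrete strategy exposes its CloudFormation-style payload.
  public abstract readonly configuration: ChunkingConfigurationProperty;

  // Analogous to ChunkingStrategy.fixedSize(props): wrap props in an anonymous subclass.
  public static fixedSize(props: FixedSizeChunkingConfigurationProperty): ChunkingStrategySketch {
    return new (class extends ChunkingStrategySketch {
      public readonly configuration: ChunkingConfigurationProperty = {
        chunkingStrategy: "FIXED_SIZE",
        fixedSizeChunkingConfiguration: props,
      };
    })();
  }

  // Preset matching the documented FIXED_SIZE values (300 tokens, 20% overlap).
  public static readonly FIXED_SIZE: ChunkingStrategySketch = ChunkingStrategySketch.fixedSize({
    maxTokens: 300,
    overlapPercentage: 20,
  });
}

const custom = ChunkingStrategySketch.fixedSize({ maxTokens: 512, overlapPercentage: 10 });
console.log(JSON.stringify(custom.configuration));
```

Returning an anonymous subclass keeps `configuration` abstract on the base class while letting each factory call, and each preset such as `FIXED_SIZE`, carry its own CloudFormation payload.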

***

### hierarchical()

> `static` **hierarchical**(`props`): [`ChunkingStrategy`](ChunkingStrategy.md)

Method for customizing a hierarchical chunking strategy.
For custom chunking, the maximum chunk size in tokens depends on the embedding model:
- Amazon Titan Text Embeddings: 8192
- Cohere Embed models: 512

#### Parameters

**props**: [`HierarchicalChunkingProps`](../interfaces/HierarchicalChunkingProps.md)

#### Returns

[`ChunkingStrategy`](ChunkingStrategy.md)
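A hypothetical helper shows how hierarchical props might be checked against the model limits listed above and turned into a two-level, CloudFormation-style configuration. The prop names (`overlapTokens`, `maxParentTokenSize`, `maxChildTokenSize`) and the output fields (`levelConfigurations`, `overlapTokens`) are assumptions modelled on `HierarchicalChunkingProps` and the `AWS::Bedrock::DataSource` hierarchical chunking configuration; checking the parent size against the limit and requiring children to be smaller than parents are sanity rules added for this illustration.

```typescript
// Hypothetical helper; prop and output field names are assumptions modelled on
// HierarchicalChunkingProps and the CloudFormation hierarchical chunking shape.
type EmbeddingModel = "titan" | "cohere";

// Maximum chunk sizes quoted in the method description
const MODEL_MAX_TOKENS: Record<EmbeddingModel, number> = { titan: 8192, cohere: 512 };

interface HierarchicalChunkingProps {
  overlapTokens: number;
  maxParentTokenSize: number;
  maxChildTokenSize: number;
}

interface HierarchicalChunkingConfiguration {
  levelConfigurations: { maxTokens: number }[];
  overlapTokens: number;
}

function toHierarchicalConfig(
  props: HierarchicalChunkingProps,
  model: EmbeddingModel,
): HierarchicalChunkingConfiguration {
  const limit = MODEL_MAX_TOKENS[model];
  // Assumed rule: the largest (parent) chunk must fit the model limit.
  if (props.maxParentTokenSize > limit) {
    throw new Error(`maxParentTokenSize ${props.maxParentTokenSize} exceeds the ${model} limit of ${limit}`);
  }
  // Extra sanity rule for this sketch: children must be smaller than parents.
  if (props.maxChildTokenSize >= props.maxParentTokenSize) {
    throw new Error("maxChildTokenSize must be smaller than maxParentTokenSize");
  }
  return {
    // First level = parent chunks, second level = child chunks
    levelConfigurations: [{ maxTokens: props.maxParentTokenSize }, { maxTokens: props.maxChildTokenSize }],
    overlapTokens: props.overlapTokens,
  };
}

const titanDefaults = { overlapTokens: 60, maxParentTokenSize: 1500, maxChildTokenSize: 300 };
console.log(JSON.stringify(toHierarchicalConfig(titanDefaults, "titan")));
```

Under these assumptions the Titan defaults pass for a Titan model, while the same sizes are rejected for a Cohere model because 1500 exceeds 512.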

***

### semantic()

> `static` **semantic**(`props`): [`ChunkingStrategy`](ChunkingStrategy.md)

Method for customizing a semantic chunking strategy.
For custom chunking, the maximum chunk size in tokens depends on the embedding model:
- Amazon Titan Text Embeddings: 8192
- Cohere Embed models: 512

#### Parameters

**props**: `SemanticChunkingConfigurationProperty`

#### Returns

[`ChunkingStrategy`](ChunkingStrategy.md)
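For semantic chunking, a hypothetical builder could start from the documented `SEMANTIC` defaults (`bufferSize: 0`, `breakpointPercentileThreshold: 95`, `maxTokens: 300`), apply caller overrides, and reject a `maxTokens` value above the embedding model's limit. Accepting a `Partial` override object is a design choice of this sketch only; the documented `semantic(props)` method takes a `SemanticChunkingConfigurationProperty`, whose field names are assumed here.

```typescript
// Hypothetical builder; field names follow the assumed CloudFormation
// SemanticChunkingConfiguration shape, defaults follow the SEMANTIC preset.
interface SemanticChunkingConfigurationProperty {
  bufferSize: number;
  breakpointPercentileThreshold: number;
  maxTokens: number;
}

const SEMANTIC_DEFAULTS: SemanticChunkingConfigurationProperty = {
  bufferSize: 0,
  breakpointPercentileThreshold: 95,
  maxTokens: 300,
};

function semanticConfig(
  overrides: Partial<SemanticChunkingConfigurationProperty>,
  modelMaxTokens: number, // 8192 for Amazon Titan Text Embeddings, 512 for Cohere Embed
): SemanticChunkingConfigurationProperty {
  // Caller values win over the preset defaults
  const merged: SemanticChunkingConfigurationProperty = { ...SEMANTIC_DEFAULTS, ...overrides };
  if (merged.maxTokens > modelMaxTokens) {
    throw new Error(`maxTokens ${merged.maxTokens} exceeds the model limit of ${modelMaxTokens}`);
  }
  return merged;
}

console.log(semanticConfig({ maxTokens: 1000 }, 8192));
```

Here a 1000-token maximum is accepted for a Titan model and keeps the other two defaults, while the same request against Cohere's 512-token limit throws.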
