
Commit 1396385

Dinesh Sajwan committed
feat(visualqa): updated documentation for image embeddings
1 parent b635df8 commit 1396385

File tree

1 file changed: +25 -6 lines changed

  • src/patterns/gen-ai/aws-rag-appsync-stepfn-opensearch


src/patterns/gen-ai/aws-rag-appsync-stepfn-opensearch/README.md

+25 -6
@@ -36,14 +36,16 @@

This CDK construct creates a pipeline for a RAG (retrieval augmented generation) source. It ingests documents and converts them into text format. The output can be used for scenarios with long context windows, meaning your system can consider and analyze a significant amount of surrounding information when processing and understanding text. This is especially valuable in tasks like language understanding and document summarization.

-Files in PDF format are uploaded to an input Amazon Simple Storage Service (S3) bucket. Authorized clients (Amazon Cognito user pool) will trigger an AWS AppSync mutation to start the ingestion process, and can use subscriptions to get notifications on the ingestion status. The mutation call will trigger an AWS Step Function with three different steps:
+PDF files and images (.jpg, .jpeg, .svg, .png) are uploaded to an input Amazon Simple Storage Service (S3) bucket. Authorized clients (Amazon Cognito user pool) will trigger an AWS AppSync mutation to start the ingestion process, and can use subscriptions to get notifications on the ingestion status. The mutation call will trigger an AWS Step Functions workflow with three different steps:
- Input validation: an AWS Lambda function will verify the input formats of the files requested for ingestion. If the files are in a format which is not supported by the pipeline, an error message will be returned.
- Transformation: the input files are processed in parallel using a [Map](https://docs.aws.amazon.com/step-functions/latest/dg/amazon-states-language-map-state.html) state through an AWS Lambda. The function uses the [LangChain](https://www.langchain.com/) client to get the content of each file and store the text file in the output bucket. This is useful for workflows which want to use a long context window approach and send the entire file as context to a large language model. If the file name already exists in the output bucket, the input file will not be processed. (See the first sketch after this list.)
-- Embeddings step: Files processed and stored in the output S3 bucket are consumed by an AWS Lambda function. Chunks from documents are created, as well as text embeddings using Amazon Bedrock (model: amazon.titan-embed-text-v1). The chunks and embeddings are then stored in a knowledge base (OpenSearch provisioned cluster). Make sure the model (amazon.titan-embed-text-v1) is enabled in your account. Please follow the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for steps related to enabling model access.
+For image files, the transformation step uses [Amazon Rekognition](https://aws.amazon.com/rekognition/) to detect labels and run image moderation. It then generates a descriptive text of the image using anthropic.claude-v2:1 and saves the text file in the processed S3 bucket. (See the second sketch after this list.)
+- Embeddings step: Files processed and stored in the output S3 bucket are consumed by an AWS Lambda function. Chunks from documents are created, as well as text embeddings using Amazon Bedrock (model: amazon.titan-embed-text-v1). For uploaded images, multimodal embeddings are created using Amazon Bedrock (model: amazon.titan-embed-image-v1). (See the third sketch after this list.)
+The chunks and embeddings are then stored in a knowledge base (OpenSearch provisioned cluster). Make sure the models (amazon.titan-embed-text-v1, amazon.titan-embed-image-v1, anthropic.claude-v2:1) are enabled in your account. Please follow the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html) for steps related to enabling model access.
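As a first sketch, here is roughly what the text transformation could look like, assuming a Python Lambda, boto3, and the langchain-community PDF loader; the bucket names and the helper name are illustrative, not the construct's actual code:

```
# Illustrative sketch only; the construct's real transformation Lambda may differ.
import boto3
from langchain_community.document_loaders import PyPDFLoader

s3 = boto3.client("s3")

def transform_pdf(local_path: str, output_bucket: str, output_key: str) -> None:
    # Extract the text of every page with the LangChain PDF loader.
    pages = PyPDFLoader(local_path).load()
    text = "\n".join(page.page_content for page in pages)

    # Store the full text in the processed bucket so it can later be sent
    # as long-context input to a large language model.
    s3.put_object(Bucket=output_bucket, Key=output_key, Body=text.encode("utf-8"))
```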
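A second sketch, for the image branch: detect labels and moderation findings with Amazon Rekognition, then ask anthropic.claude-v2:1 for a description. The prompt wording, the moderation handling, and the helper name are assumptions:

```
# Illustrative sketch only; not the construct's actual Lambda code.
import json

import boto3

rekognition = boto3.client("rekognition")
bedrock = boto3.client("bedrock-runtime")
s3 = boto3.client("s3")

def describe_image(input_bucket: str, key: str, output_bucket: str) -> None:
    image = {"S3Object": {"Bucket": input_bucket, "Name": key}}

    # Label detection and image moderation with Amazon Rekognition.
    labels = rekognition.detect_labels(Image=image, MaxLabels=10)
    moderation = rekognition.detect_moderation_labels(Image=image)
    if moderation["ModerationLabels"]:
        return  # flagged content: skip description generation in this sketch

    # Seed Claude v2.1 with the detected labels to generate a description.
    label_names = ", ".join(l["Name"] for l in labels["Labels"])
    body = json.dumps({
        "prompt": f"\n\nHuman: Describe an image containing: {label_names}\n\nAssistant:",
        "max_tokens_to_sample": 300,
    })
    response = bedrock.invoke_model(modelId="anthropic.claude-v2:1", body=body)
    description = json.loads(response["body"].read())["completion"]

    # Save the description as a text file in the processed bucket.
    s3.put_object(Bucket=output_bucket, Key=f"{key}.txt", Body=description.encode("utf-8"))
```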
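And a third sketch for the embeddings step, using the documented request shapes of the two Titan models (the surrounding functions are illustrative):

```
# Illustrative sketch; both models return {"embedding": [...]} from invoke_model.
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def embed_text_chunk(chunk: str) -> list:
    # amazon.titan-embed-text-v1 expects {"inputText": "..."}.
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": chunk}),
    )
    return json.loads(response["body"].read())["embedding"]

def embed_image(image_bytes: bytes) -> list:
    # amazon.titan-embed-image-v1 expects a base64-encoded image
    # (an optional "inputText" can be supplied alongside it).
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-image-v1",
        body=json.dumps({"inputImage": base64.b64encode(image_bytes).decode()}),
    )
    return json.loads(response["body"].read())["embedding"]
```

The resulting vectors, together with the metadata below, would then be indexed into the OpenSearch cluster.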

Documents stored in the knowledge base contain the following metadata:
- Timestamp: when the embeddings were created (current time in seconds since the Epoch)
-- Embeddings model used: amazon.titan-embed-text-v1
+- Embeddings model used: amazon.titan-embed-text-v1, amazon.titan-embed-image-v1

If you have multiple workflows using GraphQL endpoints and want to use a single endpoint, you can use an [AppSync Merged API](https://docs.aws.amazon.com/appsync/latest/devguide/merged-api.html). This construct can take as a parameter an existing AppSync Merged API; if provided, the mutation call and subscription updates will be targeted at the Merged API.

@@ -136,6 +138,7 @@ subscription MySubscription {
    files {
      name
      status
+     imageurl
    }
  }
}
@@ -148,11 +151,13 @@ Expected response:
  "files": [
    {
      "name": "a.pdf",
-     "status": "succeed"
+     "status": "succeed",
+     "imageurl": "s3presignedurl"
    }
    {
      "name": "b.pdf",
-     "status": "succeed"
+     "status": "succeed",
+     "imageurl": "s3presignedurl"
    }
  ]
}
@@ -169,7 +174,20 @@ Mutation call to trigger the ingestion process:

```
mutation MyMutation {
-  ingestDocuments(ingestioninput: {files: [{status: "", name: "a.pdf"}, {status: "", name: "b.pdf"}], ingestionjobid: "123"}) {
+  ingestDocuments(ingestioninput: {
+    embeddings_model:
+    {
+      provider: "Bedrock",
+      modelId: "amazon.titan-embed-text-v1",
+      streaming: true
+    },
+    files: [{status: "", name: "a.pdf"}],
+    ingestionjobid: "123",
+    ignore_existing: true}) {
+    files {
+      imageurl
+      status
+    }
    ingestionjobid
  }
}
@@ -188,6 +206,7 @@ Where:
- files.status: this field will be used by the subscription to update the status of the ingestion for the file specified
- files.name: name of the file stored in the input S3 bucket
- ingestionjobid: id which can be used to filter subscriptions on client side
+- embeddings_model: based on the type of modality (text or image), the corresponding model provider and model ID are used (see the example call below)
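For instance, a client ingesting an image could select the multimodal model. A minimal sketch of such a call over HTTP follows; the endpoint placeholder, the use of the requests library, and the raw Cognito ID token header are assumptions for illustration:

```
# Hypothetical client-side call; endpoint and token are placeholders.
import requests

APPSYNC_URL = "https://<api-id>.appsync-api.<region>.amazonaws.com/graphql"

MUTATION = """
mutation MyMutation {
  ingestDocuments(ingestioninput: {
    embeddings_model: {
      provider: "Bedrock",
      modelId: "amazon.titan-embed-image-v1",
      streaming: true
    },
    files: [{status: "", name: "photo.png"}],
    ingestionjobid: "456",
    ignore_existing: true}) {
    files {
      imageurl
      status
    }
    ingestionjobid
  }
}
"""

response = requests.post(
    APPSYNC_URL,
    json={"query": MUTATION},
    headers={"Authorization": "<cognito-id-token>"},  # Cognito user pool token
    timeout=30,
)
print(response.json())
```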
191210

192211

