
Commit f4ddc8f

Authored by Dinesh Sajwan (dineshSajwan), with github-actions and krokoko
feat(construct): updated summarization architecture and added support… (#351)
* feat(construct): updated summarization architecture and added support for images with claude 3
* chore: self mutation
  Signed-off-by: github-actions <[email protected]>
* feat(construct): removed unwanted dead code

---------

Signed-off-by: github-actions <[email protected]>
Signed-off-by: Dinesh Sajwan <[email protected]>
Co-authored-by: Dinesh Sajwan <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Alain Krok <[email protected]>
1 parent a5987ff commit f4ddc8f

File tree

17 files changed: +508 additions, -462 deletions


apidocs/classes/SummarizationAppsyncStepfn.md

Lines changed: 0 additions & 9 deletions

@@ -27,7 +27,6 @@
 - [lambdaTracing](SummarizationAppsyncStepfn.md#lambdatracing)
 - [node](SummarizationAppsyncStepfn.md#node)
 - [processedAssetBucket](SummarizationAppsyncStepfn.md#processedassetbucket)
-- [redisCluster](SummarizationAppsyncStepfn.md#rediscluster)
 - [retention](SummarizationAppsyncStepfn.md#retention)
 - [securityGroup](SummarizationAppsyncStepfn.md#securitygroup)
 - [stage](SummarizationAppsyncStepfn.md#stage)
@@ -200,14 +199,6 @@ Returns the instance of s3.IBucket used by the construct
 
 ___
 
-### redisCluster
-
-`Readonly` **redisCluster**: `CfnCacheCluster`
-
-Returns an instance of redis cluster created by the construct
-
-___
-
 ### retention
 
 **retention**: `RetentionDays` = `logs.RetentionDays.TEN_YEARS`

apidocs/interfaces/SummarizationAppsyncStepfnProps.md

Lines changed: 0 additions & 38 deletions

@@ -8,7 +8,6 @@
 
 - [bucketInputsAssetsProps](SummarizationAppsyncStepfnProps.md#bucketinputsassetsprops)
 - [bucketProcessedAssetsProps](SummarizationAppsyncStepfnProps.md#bucketprocessedassetsprops)
-- [cfnCacheClusterProps](SummarizationAppsyncStepfnProps.md#cfncacheclusterprops)
 - [cognitoUserPool](SummarizationAppsyncStepfnProps.md#cognitouserpool)
 - [customDocumentReaderDockerLambdaProps](SummarizationAppsyncStepfnProps.md#customdocumentreaderdockerlambdaprops)
 - [customInputValidationDockerLambdaProps](SummarizationAppsyncStepfnProps.md#custominputvalidationdockerlambdaprops)
@@ -19,7 +18,6 @@
 - [existingInputAssetsBucketObj](SummarizationAppsyncStepfnProps.md#existinginputassetsbucketobj)
 - [existingMergedApi](SummarizationAppsyncStepfnProps.md#existingmergedapi)
 - [existingProcessedAssetsBucketObj](SummarizationAppsyncStepfnProps.md#existingprocessedassetsbucketobj)
-- [existingRedisCulster](SummarizationAppsyncStepfnProps.md#existingredisculster)
 - [existingSecurityGroup](SummarizationAppsyncStepfnProps.md#existingsecuritygroup)
 - [existingVpc](SummarizationAppsyncStepfnProps.md#existingvpc)
 - [isFileTransformationRequired](SummarizationAppsyncStepfnProps.md#isfiletransformationrequired)
@@ -61,27 +59,6 @@ Providing both this and `existingProcessedAssetsBucketObj` will cause an error.
 
 ___
 
-### cfnCacheClusterProps
-
-`Optional` `Readonly` **cfnCacheClusterProps**: `CfnCacheClusterProps`
-
-Optional. Custom cfnCacheClusterProps for Redis.
-Providing existingRedisCulster and cfnCacheClusterProps together will result in error.
-
-**`Default`**
-
-```ts
-cacheNodeType - 'cache.r6g.xlarge'
-```
-
-**`Default`**
-
-```ts
-numCacheNodes- 1
-```
-
-___
-
 ### cognitoUserPool
 
 `Readonly` **cognitoUserPool**: `IUserPool`
@@ -217,21 +194,6 @@ If None is provided then this contruct will create one.
 
 ___
 
-### existingRedisCulster
-
-`Optional` `Readonly` **existingRedisCulster**: `CfnCacheCluster`
-
-Optional. Existing Redis cluster to cache the generated summary
-for subsequent request of same document.
-
-**`Default`**
-
-```ts
-- none
-```
-
-___
-
 ### existingSecurityGroup
 
 `Optional` `Readonly` **existingSecurityGroup**: `ISecurityGroup`

docs/generative_ai_cdk_constructs.drawio

Lines changed: 68 additions & 71 deletions
Large diffs are not rendered by default.

lambda/aws-summarization-appsync-stepfn/document_reader/helper.py

Lines changed: 40 additions & 12 deletions

@@ -21,6 +21,7 @@
 tracer = Tracer(service="SUMMARY_DOCUMENT_READER")
 
 s3 = boto3.resource('s3')
+rekognition_client=boto3.client('rekognition')
 
 @tracer.capture_method
 def read_file_from_s3(bucket, key):
@@ -59,17 +60,44 @@ def get_file_transformation(transformed_asset_bucket,transformed_file_name,
     }
     if (check_file_exists(transformed_asset_bucket, transformed_file_name) is False):
         logger.info("Starting file transformation")
-        loader = S3FileLoaderInMemory(input_asset_bucket, original_file_name)
-        document_content = loader.load()
-        if not document_content:
-            response['status'] = 'Error'
-            response['summary'] = 'Not able to transform the file.'
-            return response
-        encoded_string = document_content.encode("utf-8")
-        s3.Bucket(transformed_asset_bucket).put_object(Key=transformed_file_name, Body=encoded_string)
-        response['status'] = 'File transformed'
-        response['name'] = transformed_file_name
-        response['summary']=''
+        if(original_file_name.endswith('.pdf')):
+            loader = S3FileLoaderInMemory(input_asset_bucket, original_file_name)
+            document_content = loader.load()
+            if not document_content:
+                response['status'] = 'Error'
+                response['summary'] = 'Not able to transform the file.'
+                return response
+            encoded_string = document_content.encode("utf-8")
+            s3.Bucket(transformed_asset_bucket).put_object(Key=transformed_file_name, Body=encoded_string)
+            response['status'] = 'File transformed'
+            response['name'] = transformed_file_name
+            response['summary']=''
+        else:
+            with open(original_file_name, "rb") as img_file:
+                image_bytes = {"Bytes": img_file.read()}
+            if(moderate_image(image_bytes) is False):
+                logger.info("Upload image to processed assets bucket")
+                s3.Bucket(transformed_asset_bucket).put_object(Key=original_file_name, Body=image_bytes)
+                response['status'] = 'File transformed'
+                response['name'] = original_file_name
+                response['summary']=''
+
     else:
         logger.info("File already exists,skip transformation.")
-    return response
+    return response
+
+
+def moderate_image(image_bytes)-> str:
+    isToxicImage = False
+    try:
+        rekognition_response = rekognition_client.detect_moderation_labels(
+            Image=image_bytes)
+        print(rekognition_response)
+        for label in rekognition_response['ModerationLabels']:
+            if(label['Confidence'] > 0.60):
+                isToxicImage=True
+                print(f'Image failed moderation check, exit image uploading')
+                break
+    except Exception as exp:
+        print(f"Couldn't analyze image: {exp}")
+
+    return isToxicImage
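For reference, the moderation gate added above can be reduced to a small, testable predicate. One assumption worth flagging: Rekognition's `detect_moderation_labels` reports `Confidence` as a percentage on a 0-100 scale, so a meaningful threshold would be e.g. `60.0` (the diff's `0.60` would match almost any returned label). `is_toxic` is a hypothetical helper name for this sketch; the boto3 wiring appears only in comments.

```python
def is_toxic(moderation_labels, min_confidence=60.0):
    """Return True if any Rekognition moderation label meets the
    confidence threshold. Rekognition reports Confidence as a
    percentage (0-100), so the threshold is on that scale."""
    return any(
        label.get("Confidence", 0.0) >= min_confidence
        for label in moderation_labels
    )


# With boto3 (not executed here), the labels would come from:
#   client = boto3.client("rekognition")
#   resp = client.detect_moderation_labels(Image={"Bytes": image_bytes})
#   is_toxic(resp["ModerationLabels"])

print(is_toxic([{"Name": "Violence", "Confidence": 87.5}]))  # True
print(is_toxic([{"Name": "Smoking", "Confidence": 12.0}]))   # False
```

Keeping the threshold as an explicit parameter also makes the percentage-scale assumption visible at the call site.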

lambda/aws-summarization-appsync-stepfn/document_reader/lambda.py

Lines changed: 16 additions & 57 deletions

@@ -12,7 +12,6 @@
 #
 import os
 from helper import check_file_exists,get_file_transformation
-import redis
 
 from update_summary_status import updateSummaryJobStatus
 from aws_lambda_powertools import Logger, Tracer, Metrics
@@ -34,40 +33,30 @@
 def handler(event, context: LambdaContext):
 
     logger.info(f"{event=}")
-    ignore_existing = event.get("ignore_existing", False)
 
     original_file_name = event["name"]
-    job_id = event["jobid"]
+    job_id = event["jobid"]
+    summary_model = event["summary_model"]
+    language = event["language"]
     response = {
-        "is_summary_available": False,
         "summary_job_id": job_id,
         "file_name": original_file_name,
-        "status": "Pending",
+        "status": "Working on generating the summary",
        "summary": "",
        "transformed_file_name":'',
+        "summary_model": summary_model,
+        "language": language,
    }
 
    logger.set_correlation_id(job_id)
    metrics.add_metadata(key='correlationId', value=job_id)
    tracer.put_annotation(key="correlationId", value=job_id)
 
-    filesummary = get_summary_from_cache(original_file_name) if not ignore_existing else None
 
-    if filesummary is not None:
-        metrics.add_metric(name="summary_cache_hit",unit=MetricUnit.Count, value=1)
-        response.update(
-            {
-                "file_name": original_file_name,
-                "status": "Completed",
-                "summary": filesummary,
-                "is_summary_available": True,
-            }
-        )
-    else:
-        metrics.add_metric(name="summary_llm_hit", unit=MetricUnit.Count, value=1)
-        transformed_file_name = original_file_name.replace(".pdf", ".txt")
+    metrics.add_metric(name="summary_llm_hit", unit=MetricUnit.Count, value=1)
+    transformed_file_name = original_file_name.replace(".pdf", ".txt")
 
-        if(is_file_tranformation_required):
+    if(is_file_tranformation_required):
        logger.info("File transformation required")
        transformed_file = get_file_transformation(transformed_bucket_name,
                                                   transformed_file_name,
@@ -79,13 +68,12 @@ def handler(event, context: LambdaContext):
                "status": transformed_file['status'],
                "summary": transformed_file['summary'],
                "transformed_file_name":transformed_file_name,
-                "is_summary_available": False
            }
        )
-    else:
-        pdf_transformed_file = check_file_exists(transformed_bucket_name,
+    else:
+        pdf_transformed_file = check_file_exists(transformed_bucket_name,
                                                 transformed_file_name)
-        if pdf_transformed_file is False:
+        if pdf_transformed_file is False:
            response.update(
                {
                    "file_name": original_file_name,
@@ -97,37 +85,8 @@ def handler(event, context: LambdaContext):
 
 
    logger.info({"document reader response:::": response})
-    updateSummaryJobStatus({'jobid': job_id, 'files':
-        [{ 'status':response["status"],
-           'name':response['file_name'] ,
-           'summary':response["summary"] }]})
+    updateSummaryJobStatus({'summary_job_id': job_id,
+                            'status':response["status"],
+                            'name':response['file_name'] ,
+                            'summary':response["summary"] })
    return response
-
-
-@tracer.capture_method
-def get_summary_from_cache(file_name):
-
-    logger.info({"Searching Redis for cached summary file: "+file_name})
-    redis_host = os.environ.get("REDIS_HOST")
-    redis_port = os.environ.get("REDIS_PORT")
-
-    logger.info(f"Redis host: {redis_host}")
-    logger.info(f"Redis port: {redis_port}")
-
-    if redis_host is None or redis_port is None:
-        logger.exception({"Redis host or port is not set"})
-    else:
-        try:
-            logger.info({"Connecting Redis......"})
-            redis_client = redis.Redis(host=redis_host, port=redis_port)
-            fileSummary = redis_client.get(file_name)
-        except (ValueError, redis.ConnectionError) as e:
-            logger.exception({"An error occured while connecting to Redis" : e})
-            return
-
-        if fileSummary:
-            logger.info({"File summary found in cache: ": fileSummary})
-            return fileSummary.decode()
-
-        logger.info("File summary not found in cache, generating it from llm")

lambda/aws-summarization-appsync-stepfn/document_reader/update_summary_status.py

Lines changed: 9 additions & 14 deletions

@@ -45,29 +45,24 @@ def get_credentials(secret_id: str, region_name: str) -> str:
 def updateSummaryJobStatus(variables):
 
    logger.info(f"send status variables :: {variables}")
+    summary = variables['summary']
+
    query = """
        mutation updateSummaryJobStatus {
-            updateSummaryJobStatus(files: $files, summary_job_id: \"$jobid\") {
-                files {
+            updateSummaryJobStatus(summary_job_id: \"$summary_job_id\",
+                name: \"$name\",status: \"$status\",summary: \""""+summary+"""\",) {
+                summary_job_id
                name
                status
                summary
-                }
-                summary_job_id
+
            }
        }
    """
 
-    query = query.replace("$jobid", variables['jobid'])
-    query = query.replace("$files", str(variables['files']).replace("\'", "\""))
-    query = query.replace("\"name\"", "name")
-    query = query.replace("\"status\"", "status")
-    query = query.replace("\"summary\"", "summary")
-
-
-    # query = query.replace("\"file_name\"", "file_name")
-    # query = query.replace("\"status\"", "status")
-    query = query.replace("\n", "")
+    query = query.replace("$summary_job_id", variables['summary_job_id'])
+    query = query.replace("$name", variables['name'])
+    query = query.replace("$status", variables['status'])
 
    request = {'query':query}
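The mutation above is still assembled by string substitution, which is fragile whenever the summary itself contains quotes or newlines. A hedged alternative sketch that sends GraphQL variables alongside the query document (field names mirror the diff; the variable type declarations and the AppSync schema's acceptance of them are assumptions):

```python
import json


def build_appsync_request(variables: dict) -> dict:
    """Build an AppSync-style GraphQL request body using variables
    instead of string substitution, so special characters in the
    summary cannot break the query document."""
    query = """
    mutation UpdateSummaryJobStatus($summary_job_id: ID!, $name: String!,
                                    $status: String!, $summary: String!) {
      updateSummaryJobStatus(summary_job_id: $summary_job_id, name: $name,
                             status: $status, summary: $summary) {
        summary_job_id
        name
        status
        summary
      }
    }
    """
    return {"query": query, "variables": variables}


payload = build_appsync_request({
    "summary_job_id": "job-1",
    "name": "report.pdf",
    "status": "Completed",
    "summary": 'Contains "quotes" and\nnewlines safely',
})
body = json.dumps(payload)  # what would be POSTed to the GraphQL endpoint
```

Because the summary travels in the JSON `variables` object, `json.dumps` handles all escaping, and none of the `query.replace(...)` calls are needed.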
