
Commit 1aa86a6

AWS committed
Amazon Textract Update: This release adds support for asynchronously analyzing invoice and receipt documents through two new APIs: StartExpenseAnalysis and GetExpenseAnalysis
1 parent 927860c commit 1aa86a6


2 files changed, with 144 additions and 4 deletions

@@ -0,0 +1,6 @@
+{
+"type": "feature",
+"category": "Amazon Textract",
+"contributor": "",
+"description": "This release adds support for asynchronously analyzing invoice and receipt documents through two new APIs: StartExpenseAnalysis and GetExpenseAnalysis"
+}

services/textract/src/main/resources/codegen-resources/service-2.json

+138 -4
@@ -116,6 +116,26 @@
 ],
 "documentation":"<p>Gets the results for an Amazon Textract asynchronous operation that detects text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.</p> <p>You start asynchronous text detection by calling <a>StartDocumentTextDetection</a>, which returns a job identifier (<code>JobId</code>). When the text detection operation finishes, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that's registered in the initial call to <code>StartDocumentTextDetection</code>. To get the results of the text-detection operation, first check that the status value published to the Amazon SNS topic is <code>SUCCEEDED</code>. If so, call <code>GetDocumentTextDetection</code>, and pass the job identifier (<code>JobId</code>) from the initial call to <code>StartDocumentTextDetection</code>.</p> <p> <code>GetDocumentTextDetection</code> returns an array of <a>Block</a> objects. </p> <p>Each document page has as an associated <code>Block</code> of type PAGE. Each PAGE <code>Block</code> object is the parent of LINE <code>Block</code> objects that represent the lines of detected text on a page. A LINE <code>Block</code> object is a parent for each word that makes up the line. Words are represented by <code>Block</code> objects of type WORD.</p> <p>Use the MaxResults parameter to limit the number of blocks that are returned. If there are more results than specified in <code>MaxResults</code>, the value of <code>NextToken</code> in the operation response contains a pagination token for getting the next set of results. To get the next page of results, call <code>GetDocumentTextDetection</code>, and populate the <code>NextToken</code> request parameter with the token value that's returned from the previous call to <code>GetDocumentTextDetection</code>.</p> <p>For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/how-it-works-detecting.html\">Document Text Detection</a>.</p>"
 },
+"GetExpenseAnalysis":{
+"name":"GetExpenseAnalysis",
+"http":{
+"method":"POST",
+"requestUri":"/"
+},
+"input":{"shape":"GetExpenseAnalysisRequest"},
+"output":{"shape":"GetExpenseAnalysisResponse"},
+"errors":[
+{"shape":"InvalidParameterException"},
+{"shape":"AccessDeniedException"},
+{"shape":"ProvisionedThroughputExceededException"},
+{"shape":"InvalidJobIdException"},
+{"shape":"InternalServerError"},
+{"shape":"ThrottlingException"},
+{"shape":"InvalidS3ObjectException"},
+{"shape":"InvalidKMSKeyException"}
+],
+"documentation":"<p>Gets the results for an Amazon Textract asynchronous operation that analyzes invoices and receipts. Amazon Textract finds contact information, items purchased, and vendor name, from input invoices and receipts.</p> <p>You start asynchronous invoice/receipt analysis by calling <a>StartExpenseAnalysis</a>, which returns a job identifier (<code>JobId</code>). Upon completion of the invoice/receipt analysis, Amazon Textract publishes the completion status to the Amazon Simple Notification Service (Amazon SNS) topic. This topic must be registered in the initial call to <code>StartExpenseAnalysis</code>. To get the results of the invoice/receipt analysis operation, first ensure that the status value published to the Amazon SNS topic is <code>SUCCEEDED</code>. If so, call <code>GetExpenseAnalysis</code>, and pass the job identifier (<code>JobId</code>) from the initial call to <code>StartExpenseAnalysis</code>.</p> <p>Use the MaxResults parameter to limit the number of blocks that are returned. If there are more results than specified in <code>MaxResults</code>, the value of <code>NextToken</code> in the operation response contains a pagination token for getting the next set of results. To get the next page of results, call <code>GetExpenseAnalysis</code>, and populate the <code>NextToken</code> request parameter with the token value that's returned from the previous call to <code>GetExpenseAnalysis</code>.</p> <p>For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/invoices-receipts.html\">Analyzing Invoices and Receipts</a>.</p>"
+},
 "StartDocumentAnalysis":{
 "name":"StartDocumentAnalysis",
 "http":{
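The GetExpenseAnalysis documentation above describes a paginated read: call it with the `JobId`, then keep passing back `NextToken` until the response no longer contains one. A minimal Python sketch of that loop follows. It assumes a client object whose `get_expense_analysis(**kwargs)` method accepts and returns dictionaries keyed by the member names in this model (a boto3 Textract client behaves this way); the helper name `collect_expense_documents` is hypothetical, and the sketch treats any `JobStatus` other than `SUCCEEDED` as an error rather than reading partial results.

```python
from typing import Any, Dict, List


def collect_expense_documents(client: Any, job_id: str, max_results: int = 20) -> List[Dict[str, Any]]:
    """Gather ExpenseDocuments from every page of a finished expense-analysis job.

    Assumes `client.get_expense_analysis(**kwargs)` returns a dict shaped like
    GetExpenseAnalysisResponse (as a boto3 Textract client does).
    """
    documents: List[Dict[str, Any]] = []
    request: Dict[str, Any] = {"JobId": job_id, "MaxResults": max_results}
    while True:
        response = client.get_expense_analysis(**request)
        status = response.get("JobStatus")
        if status != "SUCCEEDED":
            # The docs say to fetch results only after the job reports SUCCEEDED.
            raise RuntimeError(f"job {job_id} status is {status}: {response.get('StatusMessage', '')}")
        documents.extend(response.get("ExpenseDocuments", []))
        token = response.get("NextToken")
        if not token:
            return documents  # no token: this was the last page
        # The first request omits NextToken; later requests echo the last token back.
        request["NextToken"] = token
```

`NextToken` is only added once the service has returned one, so the first request carries just `JobId` and `MaxResults`.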
@@ -138,7 +158,7 @@
 {"shape":"ThrottlingException"},
 {"shape":"LimitExceededException"}
 ],
-"documentation":"<p>Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.</p> <p> <code>StartDocumentAnalysis</code> can analyze text in documents that are in JPEG, PNG, and PDF format. The documents are stored in an Amazon S3 bucket. Use <a>DocumentLocation</a> to specify the bucket name and file name of the document. </p> <p> <code>StartDocumentAnalysis</code> returns a job identifier (<code>JobId</code>) that you use to get the results of the operation. When text analysis is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you specify in <code>NotificationChannel</code>. To get the results of the text analysis operation, first check that the status value published to the Amazon SNS topic is <code>SUCCEEDED</code>. If so, call <a>GetDocumentAnalysis</a>, and pass the job identifier (<code>JobId</code>) from the initial call to <code>StartDocumentAnalysis</code>.</p> <p>For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html\">Document Text Analysis</a>.</p>"
+"documentation":"<p>Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.</p> <p> <code>StartDocumentAnalysis</code> can analyze text in documents that are in JPEG, PNG, TIFF, and PDF format. The documents are stored in an Amazon S3 bucket. Use <a>DocumentLocation</a> to specify the bucket name and file name of the document. </p> <p> <code>StartDocumentAnalysis</code> returns a job identifier (<code>JobId</code>) that you use to get the results of the operation. When text analysis is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you specify in <code>NotificationChannel</code>. To get the results of the text analysis operation, first check that the status value published to the Amazon SNS topic is <code>SUCCEEDED</code>. If so, call <a>GetDocumentAnalysis</a>, and pass the job identifier (<code>JobId</code>) from the initial call to <code>StartDocumentAnalysis</code>.</p> <p>For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/how-it-works-analyzing.html\">Document Text Analysis</a>.</p>"
 },
 "StartDocumentTextDetection":{
 "name":"StartDocumentTextDetection",
@@ -162,7 +182,31 @@
 {"shape":"ThrottlingException"},
 {"shape":"LimitExceededException"}
 ],
-"documentation":"<p>Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.</p> <p> <code>StartDocumentTextDetection</code> can analyze text in documents that are in JPEG, PNG, and PDF format. The documents are stored in an Amazon S3 bucket. Use <a>DocumentLocation</a> to specify the bucket name and file name of the document. </p> <p> <code>StartTextDetection</code> returns a job identifier (<code>JobId</code>) that you use to get the results of the operation. When text detection is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you specify in <code>NotificationChannel</code>. To get the results of the text detection operation, first check that the status value published to the Amazon SNS topic is <code>SUCCEEDED</code>. If so, call <a>GetDocumentTextDetection</a>, and pass the job identifier (<code>JobId</code>) from the initial call to <code>StartDocumentTextDetection</code>.</p> <p>For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/how-it-works-detecting.html\">Document Text Detection</a>.</p>"
+"documentation":"<p>Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.</p> <p> <code>StartDocumentTextDetection</code> can analyze text in documents that are in JPEG, PNG, TIFF, and PDF format. The documents are stored in an Amazon S3 bucket. Use <a>DocumentLocation</a> to specify the bucket name and file name of the document. </p> <p> <code>StartTextDetection</code> returns a job identifier (<code>JobId</code>) that you use to get the results of the operation. When text detection is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you specify in <code>NotificationChannel</code>. To get the results of the text detection operation, first check that the status value published to the Amazon SNS topic is <code>SUCCEEDED</code>. If so, call <a>GetDocumentTextDetection</a>, and pass the job identifier (<code>JobId</code>) from the initial call to <code>StartDocumentTextDetection</code>.</p> <p>For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/how-it-works-detecting.html\">Document Text Detection</a>.</p>"
+},
+"StartExpenseAnalysis":{
+"name":"StartExpenseAnalysis",
+"http":{
+"method":"POST",
+"requestUri":"/"
+},
+"input":{"shape":"StartExpenseAnalysisRequest"},
+"output":{"shape":"StartExpenseAnalysisResponse"},
+"errors":[
+{"shape":"InvalidParameterException"},
+{"shape":"InvalidS3ObjectException"},
+{"shape":"InvalidKMSKeyException"},
+{"shape":"UnsupportedDocumentException"},
+{"shape":"DocumentTooLargeException"},
+{"shape":"BadDocumentException"},
+{"shape":"AccessDeniedException"},
+{"shape":"ProvisionedThroughputExceededException"},
+{"shape":"InternalServerError"},
+{"shape":"IdempotentParameterMismatchException"},
+{"shape":"ThrottlingException"},
+{"shape":"LimitExceededException"}
+],
+"documentation":"<p>Starts the asynchronous analysis of invoices or receipts for data like contact information, items purchased, and vendor names.</p> <p> <code>StartExpenseAnalysis</code> can analyze text in documents that are in JPEG, PNG, and PDF format. The documents must be stored in an Amazon S3 bucket. Use the <a>DocumentLocation</a> parameter to specify the name of your S3 bucket and the name of the document in that bucket. </p> <p> <code>StartExpenseAnalysis</code> returns a job identifier (<code>JobId</code>) that you will provide to <code>GetExpenseAnalysis</code> to retrieve the results of the operation. When the analysis of the input invoices/receipts is finished, Amazon Textract publishes a completion status to the Amazon Simple Notification Service (Amazon SNS) topic that you provide to the <code>NotificationChannel</code>. To obtain the results of the invoice and receipt analysis operation, ensure that the status value published to the Amazon SNS topic is <code>SUCCEEDED</code>. If so, call <a>GetExpenseAnalysis</a>, and pass the job identifier (<code>JobId</code>) that was returned by your call to <code>StartExpenseAnalysis</code>.</p> <p>For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/invoice-receipts.html\">Analyzing Invoices and Receipts</a>.</p>"
 }
 },
 "shapes":{
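StartExpenseAnalysis, like the other `Start*` operations, reports completion through the SNS topic given in `NotificationChannel`, and the caller should only call GetExpenseAnalysis once that status is `SUCCEEDED`. The Python sketch below shows that gate. It assumes the SNS message body is a JSON object with `JobId` and `Status` fields plus an `API` field naming the operation that started the job; confirm the actual notification format before relying on it. `job_id_to_fetch` is a hypothetical helper name.

```python
import json
from typing import Optional


def job_id_to_fetch(sns_message: str, expected_api: str = "StartExpenseAnalysis") -> Optional[str]:
    """Return the JobId to pass to GetExpenseAnalysis, or None if nothing should be fetched.

    Assumes the SNS message body is JSON with "JobId", "Status" and "API" keys.
    """
    payload = json.loads(sns_message)
    if payload.get("API") != expected_api:
        return None  # a completion notice for a different operation on the same topic
    if payload.get("Status") != "SUCCEEDED":
        return None  # e.g. FAILED: do not call GetExpenseAnalysis for results
    return payload["JobId"]
```

Filtering on the operation name lets one SNS topic serve several Textract job types without a handler fetching results from the wrong `Get*` operation.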
@@ -296,7 +340,7 @@
 },
 "Page":{
 "shape":"UInteger",
-"documentation":"<p>The page on which a block was detected. <code>Page</code> is returned by asynchronous operations. Page values greater than 1 are only returned for multipage documents that are in PDF format. A scanned image (JPEG/PNG), even if it contains multiple document pages, is considered to be a single-page document. The value of <code>Page</code> is always 1. Synchronous operations don't return <code>Page</code> because every input document is considered to be a single-page document.</p>"
+"documentation":"<p>The page on which a block was detected. <code>Page</code> is returned by asynchronous operations. Page values greater than 1 are only returned for multipage documents that are in PDF or TIFF format. A scanned image (JPEG/PNG), even if it contains multiple document pages, is considered to be a single-page document. The value of <code>Page</code> is always 1. Synchronous operations don't return <code>Page</code> because every input document is considered to be a single-page document.</p>"
 }
 },
 "documentation":"<p>A <code>Block</code> represents items that are recognized in a document within a group of pixels close to each other. The information returned in a <code>Block</code> object depends on the type of operation. In text detection for documents (for example <a>DetectDocumentText</a>), you get information about the detected words and lines of text. In text analysis (for example <a>AnalyzeDocument</a>), you can also get information about the fields, tables, and selection elements that are detected in the document.</p> <p>An array of <code>Block</code> objects is returned by both synchronous and asynchronous operations. In synchronous operations, such as <a>DetectDocumentText</a>, the array of <code>Block</code> objects is the entire set of results. In asynchronous operations, such as <a>GetDocumentAnalysis</a>, the array is returned over one or more responses.</p> <p>For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/how-it-works.html\">How Amazon Textract Works</a>.</p>"
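Because `Page` is only populated by asynchronous operations, and only exceeds 1 for multipage PDF or TIFF input, code that consumes `Block` lists from both synchronous and asynchronous calls can treat a missing `Page` as page 1. A small Python sketch, assuming blocks arrive as dictionaries keyed by the model's member names (`blocks_by_page` is a hypothetical helper):

```python
from collections import defaultdict
from typing import Any, Dict, List


def blocks_by_page(blocks: List[Dict[str, Any]]) -> Dict[int, List[Dict[str, Any]]]:
    """Group Block dicts by their Page value, preserving input order within each page."""
    pages: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for block in blocks:
        # Synchronous responses omit Page: the whole input counts as page 1.
        pages[block.get("Page", 1)].append(block)
    return dict(pages)
```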
@@ -646,6 +690,57 @@
 }
 }
 },
+"GetExpenseAnalysisRequest":{
+"type":"structure",
+"required":["JobId"],
+"members":{
+"JobId":{
+"shape":"JobId",
+"documentation":"<p>A unique identifier for the text detection job. The <code>JobId</code> is returned from <code>StartExpenseAnalysis</code>. A <code>JobId</code> value is only valid for 7 days.</p>"
+},
+"MaxResults":{
+"shape":"MaxResults",
+"documentation":"<p>The maximum number of results to return per paginated call. The largest value you can specify is 20. If you specify a value greater than 20, a maximum of 20 results is returned. The default value is 20.</p>"
+},
+"NextToken":{
+"shape":"PaginationToken",
+"documentation":"<p>If the previous response was incomplete (because there are more blocks to retrieve), Amazon Textract returns a pagination token in the response. You can use this pagination token to retrieve the next set of blocks.</p>"
+}
+}
+},
+"GetExpenseAnalysisResponse":{
+"type":"structure",
+"members":{
+"DocumentMetadata":{
+"shape":"DocumentMetadata",
+"documentation":"<p>Information about a document that Amazon Textract processed. <code>DocumentMetadata</code> is returned in every page of paginated responses from an Amazon Textract operation.</p>"
+},
+"JobStatus":{
+"shape":"JobStatus",
+"documentation":"<p>The current status of the text detection job.</p>"
+},
+"NextToken":{
+"shape":"PaginationToken",
+"documentation":"<p>If the response is truncated, Amazon Textract returns this token. You can use this token in the subsequent request to retrieve the next set of text-detection results.</p>"
+},
+"ExpenseDocuments":{
+"shape":"ExpenseDocumentList",
+"documentation":"<p>The expenses detected by Amazon Textract.</p>"
+},
+"Warnings":{
+"shape":"Warnings",
+"documentation":"<p>A list of warnings that occurred during the text-detection operation for the document.</p>"
+},
+"StatusMessage":{
+"shape":"StatusMessage",
+"documentation":"<p>Returns if the detection job could not be completed. Contains explanation for what error occured. </p>"
+},
+"AnalyzeExpenseModelVersion":{
+"shape":"String",
+"documentation":"<p>The current model version of AnalyzeExpense.</p>"
+}
+}
+},
 "HumanLoopActivationConditionsEvaluationResults":{
 "type":"string",
 "max":10240
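`GetExpenseAnalysisRequest` documents two numeric limits: `MaxResults` defaults to 20 and is capped at 20, and a `JobId` is only valid for 7 days. The Python sketch below encodes those documented rules so a caller can predict page sizes and decide when a stored `JobId` is too old to query. Both helper names are hypothetical, and treating a job that is exactly 7 days old as already expired is a conservative choice of this sketch, not something the documentation specifies.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_RESULTS_LIMIT = 20               # "The largest value you can specify is 20"
JOB_ID_LIFETIME = timedelta(days=7)  # "A JobId value is only valid for 7 days"


def effective_max_results(requested: Optional[int]) -> int:
    """Page size the documentation says the service will use for a given MaxResults."""
    if requested is None:
        return MAX_RESULTS_LIMIT  # the default value is 20
    return min(requested, MAX_RESULTS_LIMIT)  # values above 20 are capped at 20


def job_id_still_valid(started_at: datetime, now: datetime) -> bool:
    """True while the JobId is younger than the documented 7-day lifetime."""
    return now - started_at < JOB_ID_LIFETIME
```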
@@ -982,7 +1077,7 @@
 },
 "Name":{
 "shape":"S3ObjectName",
-"documentation":"<p>The file name of the input document. Synchronous operations can use image files that are in JPEG or PNG format. Asynchronous operations also support PDF format files.</p>"
+"documentation":"<p>The file name of the input document. Synchronous operations can use image files that are in JPEG or PNG format. Asynchronous operations also support PDF and TIFF format files.</p>"
 },
 "Version":{
 "shape":"S3ObjectVersion",
@@ -1101,6 +1196,45 @@
 }
 }
 },
+"StartExpenseAnalysisRequest":{
+"type":"structure",
+"required":["DocumentLocation"],
+"members":{
+"DocumentLocation":{
+"shape":"DocumentLocation",
+"documentation":"<p>The location of the document to be processed.</p>"
+},
+"ClientRequestToken":{
+"shape":"ClientRequestToken",
+"documentation":"<p>The idempotent token that's used to identify the start request. If you use the same token with multiple <code>StartDocumentTextDetection</code> requests, the same <code>JobId</code> is returned. Use <code>ClientRequestToken</code> to prevent the same job from being accidentally started more than once. For more information, see <a href=\"https://docs.aws.amazon.com/textract/latest/dg/api-async.html\">Calling Amazon Textract Asynchronous Operations</a> </p>"
+},
+"JobTag":{
+"shape":"JobTag",
+"documentation":"<p>An identifier you specify that's included in the completion notification published to the Amazon SNS topic. For example, you can use <code>JobTag</code> to identify the type of document that the completion notification corresponds to (such as a tax form or a receipt).</p>"
+},
+"NotificationChannel":{
+"shape":"NotificationChannel",
+"documentation":"<p>The Amazon SNS topic ARN that you want Amazon Textract to publish the completion status of the operation to. </p>"
+},
+"OutputConfig":{
+"shape":"OutputConfig",
+"documentation":"<p>Sets if the output will go to a customer defined bucket. By default, Amazon Textract will save the results internally to be accessed by the <code>GetExpenseAnalysis</code> operation.</p>"
+},
+"KMSKeyId":{
+"shape":"KMSKeyId",
+"documentation":"<p>The KMS key used to encrypt the inference results. This can be in either Key ID or Key Alias format. When a KMS key is provided, the KMS key will be used for server-side encryption of the objects in the customer bucket. When this parameter is not enabled, the result will be encrypted server side,using SSE-S3.</p>"
+}
+}
+},
+"StartExpenseAnalysisResponse":{
+"type":"structure",
+"members":{
+"JobId":{
+"shape":"JobId",
+"documentation":"<p>A unique identifier for the text detection job. The <code>JobId</code> is returned from <code>StartExpenseAnalysis</code>. A <code>JobId</code> value is only valid for 7 days.</p>"
+}
+}
+},
 "StatusMessage":{"type":"string"},
 "String":{"type":"string"},
 "TextType":{
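`StartExpenseAnalysisRequest` requires only `DocumentLocation`; `ClientRequestToken` makes retries safe because a repeated token returns the same `JobId`. The Python sketch below builds the request parameters as a dictionary keyed by the model's member names and derives the token from the bucket and object name, so a retried start reuses it. It assumes `DocumentLocation` wraps an `S3Object` with `Bucket` and `Name`, that `NotificationChannel` takes `SNSTopicArn` and `RoleArn`, and that a 64-character hexadecimal string satisfies the `ClientRequestToken` shape's length and pattern constraints; the helper name is hypothetical. With boto3, the result could be passed as `client.start_expense_analysis(**kwargs)`.

```python
import hashlib
from typing import Any, Dict, Optional


def start_expense_analysis_kwargs(
    bucket: str,
    name: str,
    sns_topic_arn: str,
    role_arn: str,
    job_tag: Optional[str] = None,
) -> Dict[str, Any]:
    """Build StartExpenseAnalysis parameters with a deterministic idempotency token."""
    # Same bucket/name -> same token, so a retried start returns the same JobId.
    token = hashlib.sha256(f"{bucket}/{name}".encode("utf-8")).hexdigest()  # 64 hex chars
    kwargs: Dict[str, Any] = {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": name}},
        "ClientRequestToken": token,
        "NotificationChannel": {"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    }
    if job_tag is not None:
        kwargs["JobTag"] = job_tag  # echoed back in the SNS completion notification
    return kwargs
```

Deriving the token from the object name means re-submitting the same key after the object changes returns the original job; hashing the S3 object `Version` as well avoids that. Reusing a token with different parameters (for example, another SNS topic) is what `IdempotentParameterMismatchException` in the operation's error list covers.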
