Commit 86c5ba1

author
AWS
committed
Amazon Transcribe Streaming Service Update: Amazon Transcribe now supports PII Identification and Redaction for streaming transcription.
1 parent 343015e commit 86c5ba1

2 files changed

+104
-10
lines changed

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+{
+"type": "feature",
+"category": "Amazon Transcribe Streaming Service",
+"contributor": "",
+"description": "Amazon Transcribe now supports PII Identification and Redaction for streaming transcription."
+}

services/transcribestreaming/src/main/resources/codegen-resources/service-2.json

Lines changed: 98 additions & 10 deletions
@@ -44,7 +44,7 @@
 {"shape":"ConflictException"},
 {"shape":"ServiceUnavailableException"}
 ],
-"documentation":"<p>Starts a bidirectional HTTP2 stream where audio is streamed to Amazon Transcribe and the transcription results are streamed to your application.</p> <p>The following are encoded as HTTP2 headers:</p> <ul> <li> <p>x-amzn-transcribe-language-code</p> </li> <li> <p>x-amzn-transcribe-media-encoding</p> </li> <li> <p>x-amzn-transcribe-sample-rate</p> </li> <li> <p>x-amzn-transcribe-session-id</p> </li> </ul>"
+"documentation":"<p>Starts a bidirectional HTTP/2 stream where audio is streamed to Amazon Transcribe and the transcription results are streamed to your application.</p> <p>The following are encoded as HTTP/2 headers:</p> <ul> <li> <p>x-amzn-transcribe-language-code</p> </li> <li> <p>x-amzn-transcribe-media-encoding</p> </li> <li> <p>x-amzn-transcribe-sample-rate</p> </li> <li> <p>x-amzn-transcribe-session-id</p> </li> </ul> <p>See the <a href=\"https://docs.aws.amazon.com/sdk-for-go/api/service/transcribestreamingservice/#TranscribeStreamingService.StartStreamTranscription\"> SDK for Go API Reference</a> for more detail.</p>"
 }
 },
 "shapes":{
@@ -58,6 +58,10 @@
 "Items":{
 "shape":"ItemList",
 "documentation":"<p>One or more alternative interpretations of the input audio. </p>"
+},
+"Entities":{
+"shape":"EntityList",
+"documentation":"<p>Contains the entities identified as personally identifiable information (PII) in the transcription output.</p>"
 }
 },
 "documentation":"<p>A list of possible transcriptions for the audio.</p>"
@@ -76,15 +80,15 @@
 "eventpayload":true
 }
 },
-"documentation":"<p>Provides a wrapper for the audio chunks that you are sending.</p> <p>For information on audio encoding in Amazon Transcribe, see <a>input</a>. For information on audio encoding formats in Amazon Transcribe Medical, see <a>input-med</a>.</p>",
+"documentation":"<p>Provides a wrapper for the audio chunks that you are sending.</p> <p>For information on audio encoding in Amazon Transcribe, see <a href=\"https://docs.aws.amazon.com/transcribe/latest/dg/input.html\">Speech input</a>. For information on audio encoding formats in Amazon Transcribe Medical, see <a href=\"https://docs.aws.amazon.com/transcribe/latest/dg/input-med.html\">Speech input</a>.</p>",
 "event":true
 },
 "AudioStream":{
 "type":"structure",
 "members":{
 "AudioEvent":{
 "shape":"AudioEvent",
-"documentation":"<p>A blob of audio from your application. You audio stream consists of one or more audio events.</p> <p>For information on audio encoding formats in Amazon Transcribe, see <a>input</a>. For information on audio encoding formats in Amazon Transcribe Medical, see <a>input-med</a>.</p> <p>For more information on stream encoding in Amazon Transcribe, see <a>event-stream</a>. For information on stream encoding in Amazon Transcribe Medical, see <a>event-stream-med</a>.</p>"
+"documentation":"<p>A blob of audio from your application. You audio stream consists of one or more audio events.</p> <p>For information on audio encoding formats in Amazon Transcribe, see <a href=\"https://docs.aws.amazon.com/transcribe/latest/dg/input.html\">Speech input</a>. For information on audio encoding formats in Amazon Transcribe Medical, see <a href=\"https://docs.aws.amazon.com/transcribe/latest/dg/input-med.html\">Speech input</a>.</p> <p>For more information on stream encoding in Amazon Transcribe, see <a href=\"https://docs.aws.amazon.com/transcribe/latest/dg/event-stream.html\">Event stream encoding</a>. For information on stream encoding in Amazon Transcribe Medical, see <a href=\"https://docs.aws.amazon.com/transcribe/latest/dg/event-stream-med.html\">Event stream encoding</a>.</p>"
 }
 },
 "documentation":"<p>Represents the audio stream from your application to Amazon Transcribe.</p>",
@@ -110,7 +114,49 @@
 "error":{"httpStatusCode":409},
 "exception":true
 },
+"ContentIdentificationType":{
+"type":"string",
+"enum":["PII"]
+},
+"ContentRedactionType":{
+"type":"string",
+"enum":["PII"]
+},
 "Double":{"type":"double"},
+"Entity":{
+"type":"structure",
+"members":{
+"StartTime":{
+"shape":"Double",
+"documentation":"<p>The start time of speech that was identified as PII.</p>"
+},
+"EndTime":{
+"shape":"Double",
+"documentation":"<p>The end time of speech that was identified as PII.</p>"
+},
+"Category":{
+"shape":"String",
+"documentation":"<p>The category of of information identified in this entity; for example, PII.</p>"
+},
+"Type":{
+"shape":"String",
+"documentation":"<p>The type of PII identified in this entity; for example, name or credit card number.</p>"
+},
+"Content":{
+"shape":"String",
+"documentation":"<p>The words in the transcription output that have been identified as a PII entity.</p>"
+},
+"Confidence":{
+"shape":"Confidence",
+"documentation":"<p>A value between zero and one that Amazon Transcribe assigns to PII identified in the source audio. Larger values indicate a higher confidence in PII identification.</p>"
+}
+},
+"documentation":"<p>The entity identified as personally identifiable information (PII).</p>"
+},
+"EntityList":{
+"type":"list",
+"member":{"shape":"Entity"}
+},
 "InternalFailureException":{
 "type":"structure",
 "members":{
@@ -382,6 +428,12 @@
 "low"
 ]
 },
+"PiiEntityTypes":{
+"type":"string",
+"max":300,
+"min":1,
+"pattern":"^[A-Z_, ]+"
+},
 "RequestId":{"type":"string"},
 "Result":{
 "type":"structure",
@@ -463,7 +515,7 @@
 },
 "MediaSampleRateHertz":{
 "shape":"MediaSampleRateHertz",
-"documentation":"<p>The sample rate of the input audio in Hertz. Sample rates of 16000 Hz or higher are accepted.</p>",
+"documentation":"<p>The sample rate of the input audio in Hertz.</p>",
 "location":"header",
 "locationName":"x-amzn-transcribe-sample-rate"
 },
@@ -542,7 +594,7 @@
 },
 "MediaSampleRateHertz":{
 "shape":"MediaSampleRateHertz",
-"documentation":"<p>The sample rate of the input audio in Hertz. Valid value: 16000 Hz.</p>",
+"documentation":"<p>The sample rate of the input audio in Hertz.</p>",
 "location":"header",
 "locationName":"x-amzn-transcribe-sample-rate"
 },
@@ -624,7 +676,7 @@
 },
 "MediaSampleRateHertz":{
 "shape":"MediaSampleRateHertz",
-"documentation":"<p>The sample rate, in Hertz, of the input audio. We suggest that you use 8000 Hz for low quality audio and 16000 Hz for high quality audio.</p>",
+"documentation":"<p>The sample rate, in Hertz, of the input audio. We suggest that you use 8,000 Hz for low quality audio and 16,000 Hz for high quality audio.</p>",
 "location":"header",
 "locationName":"x-amzn-transcribe-sample-rate"
 },
@@ -648,17 +700,17 @@
 },
 "AudioStream":{
 "shape":"AudioStream",
-"documentation":"<p>PCM-encoded stream of audio blobs. The audio stream is encoded as an HTTP2 data frame.</p>"
+"documentation":"<p>PCM-encoded stream of audio blobs. The audio stream is encoded as an HTTP/2 data frame.</p>"
 },
 "VocabularyFilterName":{
 "shape":"VocabularyFilterName",
-"documentation":"<p>The name of the vocabulary filter you've created that is unique to your AWS account. Provide the name in this field to successfully use it in a stream.</p>",
+"documentation":"<p>The name of the vocabulary filter you've created that is unique to your account. Provide the name in this field to successfully use it in a stream.</p>",
 "location":"header",
 "locationName":"x-amzn-transcribe-vocabulary-filter-name"
 },
 "VocabularyFilterMethod":{
 "shape":"VocabularyFilterMethod",
-"documentation":"<p>The manner in which you use your vocabulary filter to filter words in your transcript. <code>Remove</code> removes filtered words from your transcription results. <code>Mask</code> masks those words with a <code>***</code> in your transcription results. <code>Tag</code> keeps the filtered words in your transcription results and tags them. The tag appears as <code>VocabularyFilterMatch</code> equal to <code>True</code> </p>",
+"documentation":"<p>The manner in which you use your vocabulary filter to filter words in your transcript. <code>Remove</code> removes filtered words from your transcription results. <code>Mask</code> masks filtered words with a <code>***</code> in your transcription results. <code>Tag</code> keeps the filtered words in your transcription results and tags them. The tag appears as <code>VocabularyFilterMatch</code> equal to <code>True</code> </p>",
 "location":"header",
 "locationName":"x-amzn-transcribe-vocabulary-filter-method"
 },
@@ -691,6 +743,24 @@
 "documentation":"<p>You can use this field to set the stability level of the transcription results. A higher stability level means that the transcription results are less likely to change. Higher stability levels can come with lower overall transcription accuracy.</p>",
 "location":"header",
 "locationName":"x-amzn-transcribe-partial-results-stability"
+},
+"ContentIdentificationType":{
+"shape":"ContentIdentificationType",
+"documentation":"<p>Set this field to PII to identify personally identifiable information (PII) in the transcription output. Content identification is performed only upon complete transcription of the audio segments.</p> <p>You can’t set both <code>ContentIdentificationType</code> and <code>ContentRedactionType</code> in the same request. If you set both, your request returns a <code>BadRequestException</code>.</p>",
+"location":"header",
+"locationName":"x-amzn-transcribe-content-identification-type"
+},
+"ContentRedactionType":{
+"shape":"ContentRedactionType",
+"documentation":"<p>Set this field to PII to redact personally identifiable information (PII) in the transcription output. Content redaction is performed only upon complete transcription of the audio segments.</p> <p>You can’t set both <code>ContentRedactionType</code> and <code>ContentIdentificationType</code> in the same request. If you set both, your request returns a <code>BadRequestException</code>.</p>",
+"location":"header",
+"locationName":"x-amzn-transcribe-content-redaction-type"
+},
+"PiiEntityTypes":{
+"shape":"PiiEntityTypes",
+"documentation":"<p>List the PII entity types you want to identify or redact. In order to specify entity types, you must have either <code>ContentIdentificationType</code> or <code>ContentRedactionType</code> enabled.</p> <p> <code>PIIEntityTypes</code> must be comma-separated; the available values are: <code>BANK_ACCOUNT_NUMBER</code>, <code>BANK_ROUTING</code>, <code>CREDIT_DEBIT_NUMBER</code>, <code>CREDIT_DEBIT_CVV</code>, <code>CREDIT_DEBIT_EXPIRY</code>, <code>PIN</code>, <code>EMAIL</code>, <code>ADDRESS</code>, <code>NAME</code>, <code>PHONE</code>, <code>SSN</code>, and <code>ALL</code>.</p> <p> <code>PiiEntityTypes</code> is an optional parameter with a default value of <code>ALL</code>.</p>",
+"location":"header",
+"locationName":"x-amzn-transcribe-pii-entity-types"
 }
 },
 "payload":"AudioStream"
@@ -712,7 +782,7 @@
 },
 "MediaSampleRateHertz":{
 "shape":"MediaSampleRateHertz",
-"documentation":"<p>The sample rate for the input audio stream. Use 8000 Hz for low quality audio and 16000 Hz for high quality audio.</p>",
+"documentation":"<p>The sample rate for the input audio stream. Use 8,000 Hz for low quality audio and 16,000 Hz for high quality audio.</p>",
 "location":"header",
 "locationName":"x-amzn-transcribe-sample-rate"
 },
@@ -779,6 +849,24 @@
 "documentation":"<p>If partial results stabilization has been enabled in the stream, shows the stability level.</p>",
 "location":"header",
 "locationName":"x-amzn-transcribe-partial-results-stability"
+},
+"ContentIdentificationType":{
+"shape":"ContentIdentificationType",
+"documentation":"<p>Shows whether content identification was enabled in this stream.</p>",
+"location":"header",
+"locationName":"x-amzn-transcribe-content-identification-type"
+},
+"ContentRedactionType":{
+"shape":"ContentRedactionType",
+"documentation":"<p>Shows whether content redaction was enabled in this stream.</p>",
+"location":"header",
+"locationName":"x-amzn-transcribe-content-redaction-type"
+},
+"PiiEntityTypes":{
+"shape":"PiiEntityTypes",
+"documentation":"<p>Lists the PII entity types you specified in your request.</p>",
+"location":"header",
+"locationName":"x-amzn-transcribe-pii-entity-types"
 }
 },
 "payload":"TranscriptResultStream"
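
For readers who want to see how the three new request headers fit together, here is a minimal sketch in Python of a client-side check that mirrors the rules stated in the documentation strings above. The helper `build_pii_headers` and its constants are hypothetical and not part of any AWS SDK. Only the header names, the `PII` enum value, the `PiiEntityTypes` length and pattern constraints, and the list of entity types come from the diff. Per the model, the service itself rejects a request that sets both types with a `BadRequestException`; checking this before sending is an assumption of this sketch.

```python
import re

# Header names and constraints copied from the service-2.json diff above.
IDENTIFICATION_HEADER = "x-amzn-transcribe-content-identification-type"
REDACTION_HEADER = "x-amzn-transcribe-content-redaction-type"
ENTITY_TYPES_HEADER = "x-amzn-transcribe-pii-entity-types"
ENTITY_TYPES_PATTERN = re.compile(r"^[A-Z_, ]+")
ENTITY_TYPES_MIN, ENTITY_TYPES_MAX = 1, 300
KNOWN_ENTITY_TYPES = {
    "BANK_ACCOUNT_NUMBER", "BANK_ROUTING", "CREDIT_DEBIT_NUMBER",
    "CREDIT_DEBIT_CVV", "CREDIT_DEBIT_EXPIRY", "PIN", "EMAIL",
    "ADDRESS", "NAME", "PHONE", "SSN", "ALL",
}


def build_pii_headers(identification=None, redaction=None, entity_types=None):
    """Hypothetical helper (not an AWS SDK call): build the PII request headers."""
    # The model says setting both types in one request is an error.
    if identification is not None and redaction is not None:
        raise ValueError("set ContentIdentificationType or ContentRedactionType, not both")
    # "PII" is the only enum value defined for either type.
    for value in (identification, redaction):
        if value is not None and value != "PII":
            raise ValueError("the only supported value is 'PII'")

    headers = {}
    if identification is not None:
        headers[IDENTIFICATION_HEADER] = identification
    if redaction is not None:
        headers[REDACTION_HEADER] = redaction

    if entity_types is not None:
        # Entity types require identification or redaction to be enabled.
        if not headers:
            raise ValueError("PiiEntityTypes needs identification or redaction enabled")
        joined = ",".join(entity_types)  # the value must be comma-separated
        if not ENTITY_TYPES_MIN <= len(joined) <= ENTITY_TYPES_MAX:
            raise ValueError("PiiEntityTypes must be 1 to 300 characters long")
        if not ENTITY_TYPES_PATTERN.match(joined):
            raise ValueError("PiiEntityTypes does not match ^[A-Z_, ]+")
        unknown = sorted(set(entity_types) - KNOWN_ENTITY_TYPES)
        if unknown:
            raise ValueError(f"unknown PII entity types: {unknown}")
        headers[ENTITY_TYPES_HEADER] = joined
    return headers
```

Calling `build_pii_headers(redaction="PII", entity_types=["NAME", "SSN"])` yields the redaction header plus `x-amzn-transcribe-pii-entity-types: NAME,SSN`. Omitting `entity_types` leaves that header out, in which case the documented default of `ALL` applies.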
