AWS Certified Solutions Architect - Associate / Question #1412 of 1019

Question #1412

A healthcare provider records patient consultations and stores the audio files in an Amazon S3 bucket. They need to convert these recordings into text and ensure that all protected health information (PHI) is automatically removed from the transcriptions. What should a solutions architect recommend to fulfill these requirements?

A. Process the audio files using Amazon Kinesis Data Streams. Use an AWS Lambda function to scan the text for PHI patterns and remove them.

B. When an audio file is uploaded to S3, invoke a Lambda function to start an Amazon Transcribe job without redaction. Use another Lambda function to parse the output and remove PHI before storing in another S3 bucket.

C. Configure an Amazon Transcribe transcription job with PHI redaction enabled. Trigger a Lambda function upon audio file upload to S3 to start the job and store the redacted text in a different S3 bucket.

D. Use Amazon Comprehend to detect PHI in the audio files. Set up an AWS Step Functions workflow triggered by S3 uploads to process the files through Comprehend and Lambda, then store the results.

Explanation

Option C is correct because Amazon Transcribe offers built-in redaction that runs as part of the transcription job itself, so identifiers such as patient names are masked before the transcript is written. This removes the need for custom PHI detection logic and the risk of missed identifiers that comes with it. A Lambda function triggered by the S3 upload starts the Transcribe job, and the redacted transcript is written to a separate S3 bucket, keeping it apart from the raw recordings.
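The flow in Option C can be sketched as a small Lambda handler. This is a minimal sketch, not a production implementation: the bucket name `example-redacted-transcripts` and the helper `build_job_request` are hypothetical, and the language is assumed to be `en-US`. Note that in the standard Amazon Transcribe API the setting is `ContentRedaction` with a `RedactionType` of `PII`; whether that covers every PHI category a given workload needs should be verified against the service documentation.

```python
import re
from urllib.parse import unquote_plus

# Hypothetical destination bucket for redacted transcripts (an assumption).
OUTPUT_BUCKET = "example-redacted-transcripts"


def build_job_request(bucket, key, output_bucket=OUTPUT_BUCKET):
    """Build StartTranscriptionJob parameters with content redaction enabled."""
    # Job names may contain only letters, digits, '.', '_' and '-' (max 200 chars).
    # Names must also be unique per account; this sketch does not handle re-uploads.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "-", key)[:200]
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",  # assumed language of the recordings
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": output_bucket,
        "ContentRedaction": {
            "RedactionType": "PII",
            "RedactionOutput": "redacted",  # write only the redacted transcript
        },
    }


def lambda_handler(event, context, transcribe=None):
    """Start one redacting transcription job per uploaded audio object."""
    if transcribe is None:
        # Imported lazily so the request-building logic can be tested without boto3.
        import boto3
        transcribe = boto3.client("transcribe")

    started = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        request = build_job_request(bucket, key)
        transcribe.start_transcription_job(**request)
        started.append(request["TranscriptionJobName"])
    return {"started": started}
```

Passing the client in as an optional argument is a design choice for this sketch only: it lets the handler be exercised with a stub, while Lambda itself calls it with just `event` and `context`.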

Why other options are incorrect:
- A: Kinesis Data Streams ingests real-time data streams; it does not transcribe audio, and the recordings here are already stored in S3. A custom Lambda function that pattern-matches PHI is also more error-prone than Transcribe's native redaction.
- B: Running Transcribe without redaction produces an unredacted transcript and then relies on a second Lambda function to strip PHI. That adds custom code to maintain and is less reliable than the built-in feature.
- D: Amazon Comprehend analyzes text, not audio, so it cannot process the audio files directly. Transcribing first and then passing the text to Comprehend would work, but it adds unnecessary complexity compared to enabling redaction in Transcribe.
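To illustrate why the hand-rolled pattern matching in options A and B is error-prone, here is a small sketch of a regex-based redactor. The patterns and the sample sentence are invented for illustration; they are not taken from any AWS service.

```python
import re

# Hypothetical patterns a custom Lambda function might use (illustrative only).
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def naive_redact(text):
    """Replace every match of each known pattern with a bracketed label."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = (
    "Patient John Doe, born 4/12/1980, can be reached at 555-123-4567 "
    "or five five five, one two three four."
)
redacted = naive_redact(sample)
print(redacted)
```

The numeric date and phone number are masked, but the patient's name and the spoken-out phone number pass through untouched, because fixed patterns only catch the formats someone thought to write down.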

Key Points:
1. Prefer a service's native feature (e.g., Transcribe redaction) over custom logic when it meets the compliance requirement.
2. Serverless architectures (Lambda + S3 triggers) minimize operational overhead.
3. Writing redacted output to a separate bucket lets access to the raw recordings be restricted independently.

Answer

The correct answer is: C