AWS Certified Solutions Architect - Associate

Question #1084

A medical clinic uses Amazon API Gateway and AWS Lambda to process patient records stored in PDF and JPEG formats. The clinic needs to update its Lambda function to detect protected health information (PHI) within these records. Which solution meets these requirements with the LEAST operational overhead?

A

Use open-source Java libraries to extract text from the documents and identify PHI within the extracted text.

B

Use Amazon Textract to extract text from the documents. Use Amazon Comprehend to identify PHI from the extracted text.

C

Use Amazon Textract to extract text from the documents. Use Amazon Comprehend Medical to identify PHI from the extracted text.

D

Use Amazon Rekognition to extract text from the documents. Use Amazon Comprehend Medical to identify PHI from the extracted text.

Explanation

Option C is correct because:
1. Amazon Textract is purpose-built for extracting text and structured data from scanned documents (PDF/JPEG), eliminating the need for custom code (unlike Option A).
2. Amazon Comprehend Medical is purpose-built for medical text. Its PHI detection identifies items such as patient names, dates, addresses, and ID numbers, and the service is HIPAA eligible. Generic Amazon Comprehend (Option B) has no equivalent PHI detection.
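The Option C pipeline can be sketched in Python with boto3 as a Lambda handler behind API Gateway. This is a minimal sketch, not a reference implementation, and it assumes: the documents are already stored in Amazon S3; the API Gateway proxy request carries a hypothetical JSON body of the form `{"bucket": ..., "key": ...}`; and each document is a single page, because Textract's synchronous `DetectDocumentText` call handles only single-page PDFs (multi-page PDFs need the asynchronous `StartDocumentTextDetection` API). The `bucket`/`key` fields and the `textract`/`comprehend_medical` parameters are illustrative names, not part of the question.

```python
import json
from typing import Any, Dict, List, Optional


def extract_text(textract: Any, bucket: str, key: str) -> str:
    """Extract text lines from a single-page PDF or JPEG stored in S3.

    DetectDocumentText is Textract's synchronous API; multi-page PDFs
    require the asynchronous StartDocumentTextDetection API instead.
    """
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Keep LINE blocks only; WORD blocks repeat the same text word by word,
    # and PAGE blocks carry no text.
    lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
    return "\n".join(lines)


def detect_phi(comprehend_medical: Any, text: str) -> List[Dict[str, Any]]:
    """Return the PHI entities Comprehend Medical finds in the text."""
    response = comprehend_medical.detect_phi(Text=text)
    # Each entity also carries Category, offsets, and traits; keep a summary.
    return [
        {"Type": e["Type"], "Text": e["Text"], "Score": e["Score"]}
        for e in response["Entities"]
    ]


def lambda_handler(
    event: Dict[str, Any],
    context: Any,
    textract: Optional[Any] = None,
    comprehend_medical: Optional[Any] = None,
) -> Dict[str, Any]:
    """API Gateway (Lambda proxy integration) entry point.

    Assumes a hypothetical request body: {"bucket": "...", "key": "..."}.
    The client parameters exist only so the logic can be tested with stubs.
    """
    if textract is None or comprehend_medical is None:
        import boto3  # deferred import: not needed when stubs are injected

        if textract is None:
            textract = boto3.client("textract")
        if comprehend_medical is None:
            comprehend_medical = boto3.client("comprehendmedical")

    body = json.loads(event.get("body") or "{}")
    text = extract_text(textract, body["bucket"], body["key"])
    # DetectPHI rejects empty input, so skip the call when no text was found.
    entities = detect_phi(comprehend_medical, text) if text else []
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"phi": entities}),
    }
```

Both services are called through managed APIs, so the function contains no OCR or PHI-matching logic of its own, which is the operational-overhead argument for Option C. In a deployed function you would typically create the boto3 clients once at module scope so they are reused across invocations; they are created inside the handler here only to keep the stub injection simple. Long documents may also need their text split into chunks to stay within the DetectPHI input size limit.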

Other options are incorrect because:
- A: Open-source libraries require manual implementation, increasing maintenance and operational overhead.
- B: Amazon Comprehend can detect general personally identifiable information (PII), but it does not provide PHI detection for medical text; that capability is part of Comprehend Medical.
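- D: Amazon Rekognition's text detection is designed for short text in images and video (such as signs or captions) and does not accept PDF input, so it is a poor fit for extracting text from document files. Textract is built for that task.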

Key Points:
- Use Textract for document text extraction.
- Use Comprehend Medical for PHI detection in healthcare data.
- Managed AWS services reduce operational overhead.

Answer

The correct answer is: C