Question #623
A company has migrated its invoice-processing application to AWS. Users upload scanned invoices through a web application that runs on Amazon EC2 instances. The application stores metadata in an Amazon RDS for PostgreSQL database and stores the files in Amazon S3. When an invoice is uploaded, a notification is sent through Amazon SNS, after which the invoice is validated manually and its data is entered into another system through an API. A solutions architect must automate this process with accurate extraction, minimal time to market, and low operational overhead. Which solution meets these requirements?
A. Develop custom OCR libraries and deploy them on an Amazon EKS cluster. Process invoices upon upload, store output in S3, parse into DynamoDB, and submit via API using EC2-hosted tiers.
B. Use AWS Step Functions and Lambda with EC2-hosted AI/ML models for OCR. Store output in S3, parse within the tier, and submit data via API.
C. Host EC2 instances to call SageMaker-hosted AI/ML models for OCR. Store output in Amazon ElastiCache, parse within the tier, and submit via API.
D. Implement AWS Step Functions and Lambda with Amazon Textract and Comprehend for OCR. Store output in S3, parse within the tier, and submit data via API.
Explanation
Option D is correct because:
1. Amazon Textract is a purpose-built, managed OCR service optimized for accurate text/data extraction from documents like invoices, eliminating the need for custom OCR development (unlike Option A).
2. AWS Step Functions and Lambda provide serverless orchestration and compute, minimizing operational overhead compared to managing EC2/EKS clusters (Options A/B/C).
3. Amazon Comprehend adds managed natural language processing (NLP), such as entity detection, to interpret the text that Textract extracts.
4. The solution leverages fully managed services, ensuring low maintenance and faster deployment.
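The extraction step in Option D can be sketched as a Lambda handler. This is a minimal sketch, not the exam's reference implementation. It assumes the Textract AnalyzeExpense operation is used (the question names only Textract in general), that the event carries hypothetical `bucket` and `key` fields, and that a confidence threshold of 80 is acceptable. The helper `summary_fields` is an illustrative name.

```python
from typing import Any, Dict


def summary_fields(response: Dict[str, Any], min_confidence: float = 80.0) -> Dict[str, str]:
    """Flatten AnalyzeExpense summary fields into {field type: value}.

    Values whose detection confidence is below min_confidence are dropped
    so that they can be routed to manual review instead (assumption).
    """
    fields: Dict[str, str] = {}
    for document in response.get("ExpenseDocuments", []):
        for field in document.get("SummaryFields", []):
            label = field.get("Type", {}).get("Text")
            value = field.get("ValueDetection", {})
            text = value.get("Text")
            if label and text and value.get("Confidence", 0.0) >= min_confidence:
                # Keep the first confident value seen for each field type.
                fields.setdefault(label, text)
    return fields


def handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
    """Lambda entry point; 'bucket' and 'key' are hypothetical event fields."""
    import boto3  # imported lazily so summary_fields can be exercised offline

    textract = boto3.client("textract")
    response = textract.analyze_expense(
        Document={"S3Object": {"Bucket": event["bucket"], "Name": event["key"]}}
    )
    return summary_fields(response)
```

Keeping the parsing in a pure function means it can be checked against a saved response without calling AWS, which fits the low-overhead goal of the question.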
Other options fail because:
- A: Custom OCR on EKS introduces high development/operational costs.
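- B: EC2-hosted AI/ML models must be built, deployed, and maintained by the company, which adds infrastructure management and lengthens time to market.
- C: SageMaker-hosted models still have to be built and tuned, the EC2 instances that call them must be managed, and ElastiCache is an in-memory cache that this use case does not need.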
Key Points: Use managed services (Textract, Step Functions, Lambda) for serverless, low-overhead workflows requiring OCR/data extraction.
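The serverless workflow described above can be sketched as an Amazon States Language definition built in Python. This is a minimal sketch under stated assumptions: the three state names, the Lambda function names, and the retry settings are hypothetical and do not come from the question. Each state invokes a Lambda function through the Step Functions Lambda integration.

```python
import json
from typing import Any, Dict, List, Optional, Tuple


def lambda_task(function_name: str, next_state: Optional[str]) -> Dict[str, Any]:
    """Build a Task state that invokes a Lambda function and passes on its payload."""
    state: Dict[str, Any] = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "OutputPath": "$.Payload",
        # Retry settings are illustrative values, not a recommendation.
        "Retry": [
            {
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }
        ],
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def build_definition(steps: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Chain (state name, Lambda function name) pairs into a linear workflow."""
    names = [name for name, _ in steps]
    states: Dict[str, Any] = {}
    for index, (name, function_name) in enumerate(steps):
        following = names[index + 1] if index + 1 < len(names) else None
        states[name] = lambda_task(function_name, following)
    return {
        "Comment": "Invoice processing sketch (hypothetical names)",
        "StartAt": names[0],
        "States": states,
    }


# Hypothetical state and function names for the three stages in Option D.
DEFINITION = build_definition(
    [
        ("ExtractInvoiceData", "extract-invoice-data"),
        ("ParseFields", "parse-invoice-fields"),
        ("SubmitToApi", "submit-invoice-to-api"),
    ]
)
```

Serializing the result with `json.dumps(DEFINITION)` gives a string of the kind that Step Functions accepts as a state machine definition. An S3 upload event would typically start an execution, but that wiring is outside this sketch.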
Answer
The correct answer is: D