Question #621
A company stores medical imaging data in an Amazon S3 bucket. The bucket already holds several petabytes, and the data grows by hundreds of gigabytes every day. The company regularly analyzes data from the past 18 months for research but must retain all data indefinitely to meet regulatory requirements. Which approach is the MOST cost-effective while meeting these requirements?
A. Use S3 Select to query the data. Create an S3 Lifecycle policy to transition data older than 18 months to S3 Glacier Flexible Retrieval.
B. Use Amazon Redshift Spectrum to query the data. Create an S3 Lifecycle policy to transition data older than 18 months to S3 Intelligent-Tiering.
C. Use an AWS Glue Data Catalog and Amazon Athena to query the data. Create an S3 Lifecycle policy to transition data older than 18 months to S3 Glacier Deep Archive.
D. Use Amazon EMR with Hive to query the data. Create an S3 Lifecycle policy to transition data older than 18 months to S3 Standard-Infrequent Access.
Explanation
Option C is correct because:
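1. AWS Glue Data Catalog and Amazon Athena: The Glue Data Catalog stores the table schema and partitions, and Athena queries the data in place in S3 with serverless, pay-per-query SQL billed by the amount of data scanned. This suits repeated analysis of the most recent 18 months without provisioning or managing any clusters.
2. S3 Glacier Deep Archive: This storage class has the lowest storage price in Amazon S3, which makes it the most cost-effective choice for data that must be retained indefinitely but is rarely accessed. The trade-off is slower retrieval (within 12 hours for standard retrievals), which is acceptable for a compliance archive.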
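The lifecycle half of option C can be sketched as a boto3-style configuration. This is a minimal sketch, assuming the rule should cover the whole bucket; the rule ID and bucket name are hypothetical. Lifecycle transitions count object age in days rather than months, so 18 months is approximated as 548 days, and no expiration action is included because the data must be kept indefinitely.

```python
from typing import Any

# Lifecycle rules count object age in days, not months, so 18 months is
# approximated as 548 days (18 * 365.25 / 12 = 547.875, rounded up).
DAYS_IN_18_MONTHS = 548


def deep_archive_rule(days: int = DAYS_IN_18_MONTHS, prefix: str = "") -> dict[str, Any]:
    """Build a lifecycle configuration that moves objects to Glacier Deep Archive.

    An empty prefix applies the rule to every object in the bucket. No
    Expiration action is included because the data must be retained indefinitely.
    """
    return {
        "Rules": [
            {
                "ID": "archive-after-18-months",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "DEEP_ARCHIVE"}],
            }
        ]
    }


def apply_rule(bucket: str) -> None:
    """Attach the rule to a bucket (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so building the config needs no AWS SDK

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=deep_archive_rule(),
    )


if __name__ == "__main__":
    import json

    print(json.dumps(deep_archive_rule(), indent=2))
```

Note that `put_bucket_lifecycle_configuration` replaces the bucket's entire existing lifecycle configuration, so any rules already on the bucket would need to be merged into `Rules` before calling it.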
Other options are incorrect because:
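- A: S3 Select retrieves a subset of data from a single object per request, so it is not suited to analytics across a multi-petabyte dataset. Glacier Flexible Retrieval also has a higher storage price than Deep Archive for long-term retention.
- B: S3 Intelligent-Tiering is designed for unknown or changing access patterns and adds a per-object monitoring and automation charge, so it is not the cheapest choice for data that is known to be archival. Redshift Spectrum also needs an Amazon Redshift cluster (or Redshift Serverless) to run queries, which adds cost compared with serverless Athena.
- D: S3 Standard-Infrequent Access (Standard-IA) has a much higher storage price than Glacier Deep Archive, and Amazon EMR requires provisioning and managing clusters, which adds unnecessary complexity and cost for this use case.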
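The storage-price argument against options A and D can be made concrete with simple arithmetic. This sketch ranks the three archive targets by storage-only monthly cost for one petabyte. The per-GB prices are assumed, approximate us-east-1 list prices used purely as illustrative inputs; they change over time and exclude request, retrieval, and minimum-storage-duration charges, so check the current Amazon S3 pricing page before relying on them. Intelligent-Tiering (option B) is left out because its price depends on which access tier each object ends up in.

```python
# ASSUMED storage-only list prices in USD per GB-month (us-east-1, approximate).
# Illustrative inputs only, not authoritative figures.
ASSUMED_PRICE_PER_GB_MONTH = {
    "STANDARD_IA": 0.0125,    # option D target
    "GLACIER": 0.0036,        # option A target (Glacier Flexible Retrieval)
    "DEEP_ARCHIVE": 0.00099,  # option C target
}

GB_PER_PB = 1024 ** 2  # binary units: 1 PB = 1024 TB = 1024 * 1024 GB


def monthly_storage_cost(petabytes: float, price_per_gb: float) -> float:
    """Storage-only monthly cost; ignores requests, retrieval, and minimum-duration fees."""
    return petabytes * GB_PER_PB * price_per_gb


def rank_by_cost(petabytes: float) -> list[tuple[str, float]]:
    """Return (storage class, monthly cost) pairs, cheapest first."""
    costs = [
        (cls, monthly_storage_cost(petabytes, price))
        for cls, price in ASSUMED_PRICE_PER_GB_MONTH.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for cls, cost in rank_by_cost(petabytes=1.0):
        print(f"{cls:<13} ${cost:>12,.2f} per month")
```

Under these assumed prices, Standard-IA costs more than twelve times as much as Deep Archive per petabyte-month, and the gap grows linearly as the archive accumulates.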
Key Points: Use Athena/Glue for analytics on recent data and Glacier Deep Archive for cost-effective, compliant long-term storage.
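Because Athena bills by data scanned, the research queries should touch only the 18-month window. The sketch below assumes a hypothetical Glue database `research` with a table `imaging_metadata` partitioned by integer `year` and `month` columns (the table, database, and selected column names are all hypothetical). It filters only on partition columns so Athena can skip older partitions, and submits the query with boto3's `start_query_execution`.

```python
import datetime as dt

# Hypothetical names: a Glue Data Catalog database "research" and a table
# "imaging_metadata" partitioned by integer columns "year" and "month".
DATABASE = "research"
TABLE = "imaging_metadata"
WINDOW_MONTHS = 18


def cutoff_year_month(today: dt.date, months_back: int = WINDOW_MONTHS) -> tuple[int, int]:
    """Return the (year, month) that is `months_back` months before `today`."""
    index = today.year * 12 + (today.month - 1) - months_back
    return index // 12, index % 12 + 1


def recent_window_query(today: dt.date) -> str:
    """Build SQL whose WHERE clause filters only on partition columns,
    so Athena can prune partitions older than the window."""
    year, month = cutoff_year_month(today)
    return (
        "SELECT study_id, modality, acquired_at\n"  # hypothetical columns
        f"FROM {DATABASE}.{TABLE}\n"
        f"WHERE year > {year} OR (year = {year} AND month >= {month})"
    )


def run_query(sql: str, output_location: str) -> str:
    """Submit the query with boto3 (requires AWS credentials); returns the execution ID."""
    import boto3  # imported lazily so the SQL builder has no SDK dependency

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    print(recent_window_query(dt.date(2025, 6, 15)))
```

For a reference date of 15 June 2025, the cutoff is December 2023, so the query reads only partitions from 2023-12 onward, which are also the objects the lifecycle rule has not yet moved to Deep Archive.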
Answer
The correct answer is: C