AWS Certified Solutions Architect - Professional / Question #986 of 529

A company analyzes sensor data on premises and receives 2 million new files each day, averaging 500 KB per file. Files are processed in 500 MB batches, zipped, and archived on an NFS server. The company runs Microsoft Hyper-V on premises with available compute capacity but requires AWS for storage. Archived data must be retrievable within 5 days of a request. The company has a 10 Gbps AWS Direct Connect connection and wants to limit bandwidth use by scheduling transfers during off-peak hours.

Which solution will meet these requirements MOST cost-effectively?

A

Deploy an AWS DataSync agent on a new GPU-based Amazon EC2 instance. Configure the agent to copy batches from the on-premises NFS server to Amazon S3 Glacier Instant Retrieval. Delete the data on-premises after successful transfer.

B

Deploy an AWS DataSync agent as a Hyper-V VM on premises. Configure the agent to copy batches from the NFS server to Amazon S3 Glacier Deep Archive. Delete the data on-premises after successful transfer.

C

Deploy an AWS DataSync agent on a new general-purpose Amazon EC2 instance. Configure the agent to copy batches to Amazon S3 Standard, then apply a lifecycle rule to transition objects to S3 Glacier Deep Archive after 1 day. Delete the data on-premises after transfer.

D

Deploy an AWS Storage Gateway Tape Gateway on premises in the Hyper-V environment. Configure it with an S3 Glacier Deep Archive pool and automatic tape creation. Eject tapes after batches are copied.

Explanation

Option B is correct because:
1. Cost-Effective Storage: S3 Glacier Deep Archive is the lowest-cost S3 storage class, and its retrieval times (within 12 hours for standard retrievals, within 48 hours for bulk retrievals) fall well inside the 5-day requirement.
2. On-Premises Deployment: Running the DataSync agent as a Hyper-V VM uses the existing spare compute capacity and avoids EC2 costs. It also keeps the agent close to the NFS server it reads from.
3. Bandwidth Management: DataSync tasks can run on a schedule, so transfers over Direct Connect can be confined to off-peak hours, and a task-level bandwidth limit can cap throughput.
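
As a rough illustration of points 1 and 3, the sketch below builds the request parameters that a scheduled, bandwidth-capped DataSync task writing directly to Deep Archive might use. The parameter names follow the boto3 DataSync client (`create_location_s3`, `create_task`). Every ARN, the task name, the 01:00 UTC schedule, and the 2,000 Mbps cap are hypothetical placeholders, not values taken from the question.

```python
# Sketch only: request parameters for a scheduled, throttled DataSync task.
# All ARNs, the task name, the schedule, and the cap are hypothetical.


def s3_location_params(bucket_arn: str, role_arn: str) -> dict:
    """Parameters for create_location_s3, writing straight to Deep Archive."""
    return {
        "S3BucketArn": bucket_arn,
        "S3StorageClass": "DEEP_ARCHIVE",
        "S3Config": {"BucketAccessRoleArn": role_arn},
    }


def task_params(source_arn: str, dest_arn: str, max_mbps: int) -> dict:
    """Parameters for create_task with a nightly schedule and a bandwidth cap."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": dest_arn,
        "Name": "nightly-sensor-archive",  # hypothetical name
        # 01:00 UTC every day; substitute the actual off-peak window.
        "Schedule": {"ScheduleExpression": "cron(0 1 * * ? *)"},
        # BytesPerSecond caps throughput; convert megabits/s to bytes/s.
        "Options": {"BytesPerSecond": max_mbps * 1_000_000 // 8},
    }


def create_scheduled_task(source_arn, bucket_arn, role_arn, max_mbps=2000):
    """Create the S3 location and the task (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builders above run without AWS access

    datasync = boto3.client("datasync")
    dest = datasync.create_location_s3(**s3_location_params(bucket_arn, role_arn))
    return datasync.create_task(
        **task_params(source_arn, dest["LocationArn"], max_mbps)
    )
```

The source location ARN is assumed to point at an NFS location already registered through the on-premises Hyper-V agent.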

Why other options are incorrect:
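- A: Runs the agent on a GPU-based EC2 instance, which DataSync does not need and which adds compute cost. It also stores data in S3 Glacier Instant Retrieval, which costs more per GB than Deep Archive and provides millisecond access that a 5-day retrieval window does not call for.
- C: Adds EC2 cost for the agent and stages data in S3 Standard for a day, which incurs Standard storage and lifecycle transition request charges that a direct write to Deep Archive avoids.
- D: Tape Gateway presents a virtual tape library over iSCSI for backup software, not a file copy path for NFS data, so it adds components and operational overhead for this file-based workload.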
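
The workload figures in the question can be checked with a little arithmetic. The sketch below assumes decimal units (1 KB = 1,000 bytes) and ignores compression from zipping, so the results are upper bounds. If each archived batch is stored as one object, the batch count also approximates the number of daily lifecycle transition requests that Option C would add.

```python
# Figures from the question; decimal units are an assumption.
FILES_PER_DAY = 2_000_000
FILE_SIZE_BYTES = 500 * 1_000            # 500 KB average file
BATCH_SIZE_BYTES = 500 * 1_000_000       # 500 MB batch before zipping
LINK_BITS_PER_SECOND = 10 * 1_000_000_000  # 10 Gbps Direct Connect

# Total new data per day, before compression.
daily_bytes = FILES_PER_DAY * FILE_SIZE_BYTES

# Number of 500 MB batches (and so archive objects) produced per day.
batches_per_day = daily_bytes // BATCH_SIZE_BYTES

# Time to move one day of data if the whole link were available.
seconds_at_line_rate = daily_bytes * 8 / LINK_BITS_PER_SECOND

print(f"{daily_bytes / 1e12:.1f} TB per day")
print(f"{batches_per_day} batches per day")
print(f"{seconds_at_line_rate / 60:.1f} minutes at full line rate")
```

Even a fraction of the link moves a day of data within a short off-peak window, which is why a scheduled, throttled transfer is practical here.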

Key Points:
- Use S3 Glacier Deep Archive for the lowest-cost archival when retrieval within hours to days is acceptable.
- DataSync suits scheduled, bandwidth-limited transfers from on-premises NFS to Amazon S3.
- Avoid unnecessary EC2 costs by deploying DataSync on existing Hyper-V infrastructure.
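
To make the retrieval point concrete, the sketch below compares the Deep Archive retrieval tiers with the 5-day deadline and builds the body for an S3 `restore_object` request. The tier names and `RestoreRequest` shape follow the S3 API. The helper names and the 7-day availability of the restored copy are hypothetical.

```python
# Typical retrieval times for S3 Glacier Deep Archive, in hours.
DEEP_ARCHIVE_RETRIEVAL_HOURS = {"Standard": 12, "Bulk": 48}
DEADLINE_HOURS = 5 * 24  # the 5-day requirement from the question


def cheapest_tier_within(deadline_hours):
    """Return the lowest-cost tier that meets the deadline, or None."""
    # Bulk is the lower-cost tier, so it is tried first.
    for tier in ("Bulk", "Standard"):
        if DEEP_ARCHIVE_RETRIEVAL_HOURS[tier] <= deadline_hours:
            return tier
    return None


def restore_request(tier, days_available=7):
    """Body for the RestoreRequest parameter of S3 restore_object."""
    # days_available is a hypothetical choice for how long the copy stays readable.
    return {"Days": days_available, "GlacierJobParameters": {"Tier": tier}}


tier = cheapest_tier_within(DEADLINE_HOURS)
print(tier, restore_request(tier))
```

With a 120-hour deadline, even the slower bulk tier fits, so the requirement does not force a more expensive storage class or retrieval tier.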

Answer

The correct answer is: B