Question #986
A company analyzes sensor data stored on premises and receives 2 million new files each day, averaging 500 KB per file. Files are processed in 500 MB batches, zipped, and archived on an NFS server. The company runs Microsoft Hyper-V on premises with available compute capacity but needs to use AWS for storage. Archived data must be retrievable within 5 days of a request. The company has a 10 Gbps AWS Direct Connect connection and wants to limit bandwidth use by scheduling transfers during off-peak hours.
Which solution will meet these requirements MOST cost-effectively?
A. Deploy an AWS DataSync agent on a new GPU-based Amazon EC2 instance. Configure the agent to copy batches from the on-premises NFS server to Amazon S3 Glacier Instant Retrieval. Delete the data on-premises after successful transfer.
B. Deploy an AWS DataSync agent as a Hyper-V VM on premises. Configure the agent to copy batches from the NFS server to Amazon S3 Glacier Deep Archive. Delete the data on-premises after successful transfer.
C. Deploy an AWS DataSync agent on a new general-purpose Amazon EC2 instance. Configure the agent to copy batches to Amazon S3 Standard, then apply a lifecycle rule to transition objects to S3 Glacier Deep Archive after 1 day. Delete the data on-premises after transfer.
D. Deploy an AWS Storage Gateway Tape Gateway on premises in the Hyper-V environment. Configure it with an S3 Glacier Deep Archive pool and automatic tape creation. Eject tapes after batches are copied.
Explanation
Option B is correct because:
1. Cost-Effective Storage: S3 Glacier Deep Archive is the lowest-cost S3 storage class, and its retrieval times (typically within 12 hours for standard retrievals and within 48 hours for bulk retrievals) are well under the 5-day requirement.
2. On-Premises Deployment: Deploying DataSync as a Hyper-V VM leverages existing compute capacity, avoiding EC2 costs.
3. Bandwidth Management: DataSync tasks can be scheduled to run during off-peak hours and can be given a bandwidth limit, so transfers over Direct Connect do not compete with daytime traffic.
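The figures in the question can be turned into a rough sizing check. The sketch below is only an estimate under stated assumptions: it uses decimal units, ignores the size reduction from zipping (the compression ratio is not given), and assumes a hypothetical 50% usable share of the 10 Gbps link.

```python
import math

# Figures taken from the question.
FILES_PER_DAY = 2_000_000
AVG_FILE_KB = 500
BATCH_MB = 500
LINK_GBPS = 10


def daily_volume_gb(files=FILES_PER_DAY, avg_kb=AVG_FILE_KB):
    """Uncompressed daily volume in GB (decimal units: 1 GB = 1,000,000 KB)."""
    return files * avg_kb / 1_000_000


def batches_per_day(volume_gb, batch_mb=BATCH_MB):
    """Number of batches needed to hold the daily volume."""
    return math.ceil(volume_gb * 1000 / batch_mb)


def transfer_hours(volume_gb, link_gbps=LINK_GBPS, utilization=0.5):
    """Hours to move volume_gb over the link.

    utilization is an assumed usable fraction of the link, not a value
    from the question.
    """
    bits = volume_gb * 8e9
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 3600


if __name__ == "__main__":
    volume = daily_volume_gb()
    print(f"daily volume: {volume:.0f} GB")
    print(f"batches per day: {batches_per_day(volume)}")
    print(f"transfer time at 50% of link: {transfer_hours(volume):.2f} h")
```

Under these assumptions the daily volume is about 1 TB, which fits comfortably inside a short off-peak window on a 10 Gbps link.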
Why other options are incorrect:
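- A: Uses an expensive GPU-based EC2 instance, which DataSync does not need, and Glacier Instant Retrieval, which costs more than Deep Archive and provides millisecond access that a 5-day retrieval window does not require.
- C: Pays for an EC2 instance instead of using the spare Hyper-V capacity, and stages data in S3 Standard before the lifecycle transition, which adds S3 Standard storage and transition request charges. DataSync can write directly to Deep Archive.
- D: Tape Gateway presents a virtual tape library over iSCSI for backup software, so it is a poor fit for file-based NFS data and adds operational complexity and cost.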
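The storage-class argument can be made concrete with a small comparison. The per-GB prices below are illustrative assumptions, roughly in line with published list prices for one US Region at one point in time; they are not taken from the question and should be checked against current AWS pricing. Retrieval, request, and minimum-duration charges are left out.

```python
# Assumed storage prices in USD per GB-month (illustrative placeholders only).
ASSUMED_PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "GLACIER_IR": 0.004,
    "DEEP_ARCHIVE": 0.00099,
}


def monthly_storage_cost(stored_gb, storage_class):
    """Storage-only monthly cost for stored_gb in the given class."""
    return stored_gb * ASSUMED_PRICE_PER_GB_MONTH[storage_class]


def cheapest_class(stored_gb):
    """Return the class with the lowest storage-only cost."""
    return min(
        ASSUMED_PRICE_PER_GB_MONTH,
        key=lambda cls: monthly_storage_cost(stored_gb, cls),
    )


if __name__ == "__main__":
    # About 30 days of data at roughly 1 TB per day, uncompressed.
    stored_gb = 30_000
    for cls in ASSUMED_PRICE_PER_GB_MONTH:
        print(f"{cls}: ${monthly_storage_cost(stored_gb, cls):,.2f} per month")
    print("cheapest:", cheapest_class(stored_gb))
```

With these assumed prices, Deep Archive is the cheapest of the three by a wide margin, which is the basis for ruling out options A and C on storage cost.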
Key Points:
- Use S3 Glacier Deep Archive for cost-effective archival with acceptable retrieval times.
- DataSync is optimal for scheduled, efficient transfers from on-premises NFS to AWS.
- Avoid unnecessary EC2 costs by deploying DataSync on existing Hyper-V infrastructure.
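As a sketch of how option B might be configured, the helper functions below build the parameter dictionaries one could pass to the boto3 DataSync client calls `create_location_s3` and `create_task`. The task name, the 01:00 UTC cron expression, the 5 Gbps cap, and all ARNs are hypothetical values chosen for illustration. Nothing here calls AWS, so it runs without credentials.

```python
def s3_location_params(bucket_arn, access_role_arn):
    """Parameters for create_location_s3, writing directly to Deep Archive."""
    return {
        "S3BucketArn": bucket_arn,
        "S3StorageClass": "DEEP_ARCHIVE",
        "S3Config": {"BucketAccessRoleArn": access_role_arn},
    }


def task_params(source_arn, destination_arn, cron_expression, max_bytes_per_second):
    """Parameters for create_task with a schedule and a bandwidth cap."""
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": "nightly-sensor-archive",  # hypothetical task name
        "Schedule": {"ScheduleExpression": cron_expression},
        "Options": {"BytesPerSecond": max_bytes_per_second},
    }


if __name__ == "__main__":
    # Hypothetical ARNs; real values come from your own account.
    location = s3_location_params(
        "arn:aws:s3:::example-sensor-archive",
        "arn:aws:iam::111122223333:role/example-datasync-role",
    )
    task = task_params(
        "arn:aws:datasync:us-east-1:111122223333:location/loc-example-nfs",
        "arn:aws:datasync:us-east-1:111122223333:location/loc-example-s3",
        "cron(0 1 * * ? *)",  # assumed off-peak start: 01:00 UTC daily
        625_000_000,          # assumed cap: 5 Gbps expressed in bytes per second
    )
    print(location)
    print(task)
```

In practice these dictionaries would be unpacked into the client calls, for example `client.create_task(**task)`, after the on-premises agent has been activated and the NFS source location created.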
Answer
The correct answer is: B