AWS Certified Solutions Architect - Professional / Question #888 of 529

Question #888

A company stores approximately 2 million .csv files on-premises across multiple VMs, with an initial dataset of 500 TB growing by 2 TB weekly. The company requires automated daily backups to AWS Cloud, with the ability to apply custom filters to back up only specific subsets of data from predefined source directories. They have an existing AWS Direct Connect connection. Which solution meets these requirements with the LEAST operational overhead?

A

Create an AWS Backup plan targeting Amazon S3 Glacier Flexible Retrieval. Configure a lifecycle policy to transition backups to Amazon S3 Standard-Infrequent Access after 30 days.

B

Deploy an AWS Storage Gateway file gateway on-premises. Use a scheduled AWS Lambda function to trigger backups to Amazon S3 daily via the file gateway interface.

C

Install an AWS DataSync agent on the on-premises VMs. Configure a DataSync task with custom filters to replicate designated data to Amazon S3 daily.

D

Use AWS Snowcone devices for the initial data transfer. Schedule AWS CLI commands to sync incremental changes to Amazon S3 daily using cron jobs.

Explanation

Answer C is correct because AWS DataSync is purpose-built for automated data transfer between on-premises storage and AWS. Its task-level include and exclude filters restrict a transfer to specific files or directories, which matches the requirement to back up only subsets of data from the predefined source directories. Its built-in task scheduling covers the daily run without extra scripting. DataSync can send traffic over the existing Direct Connect connection and, after the initial copy, transfers only data that has changed, which reduces bandwidth usage. Once the agent, locations, and task are configured, it requires minimal operational overhead.
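As a rough illustration of option C, the sketch below builds the parameters for a DataSync `CreateTask` call with an include filter and a daily schedule. The task name, directory paths, ARNs, and run hour are hypothetical placeholders, and the source and destination locations are assumed to exist already; treat it as a starting point rather than a definitive configuration.

```python
def build_task_request(source_location_arn, destination_location_arn,
                       include_dirs, hour_utc=2):
    """Build keyword arguments for the DataSync CreateTask API.

    include_dirs: paths relative to the source location, e.g. "/finance".
    All values passed in here are illustrative assumptions.
    """
    return {
        "Name": "daily-csv-backup",  # hypothetical task name
        "SourceLocationArn": source_location_arn,
        "DestinationLocationArn": destination_location_arn,
        # DataSync takes one SIMPLE_PATTERN filter; patterns are joined with "|".
        "Includes": [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(include_dirs)}
        ],
        # Copy only data that differs between source and destination.
        "Options": {
            "TransferMode": "CHANGED",
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
        },
        # Six-field AWS cron expression: run once a day at hour_utc:00 UTC.
        "Schedule": {"ScheduleExpression": f"cron(0 {hour_utc} * * ? *)"},
    }


def create_daily_task(request):
    """Submit the request; needs boto3 and AWS credentials to be configured."""
    import boto3  # imported lazily so the builder works without boto3 installed

    return boto3.client("datasync").create_task(**request)["TaskArn"]


# Placeholder ARNs and directories, for illustration only.
request = build_task_request(
    "arn:aws:datasync:us-east-1:111122223333:location/loc-source-example",
    "arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example",
    ["/finance", "/sales/2024"],
)
```

Passing `request` to `create_daily_task` would create the scheduled task. Keeping the filter and schedule in the task definition means no separate cron job or Lambda function has to be maintained.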

Other options are less suitable:
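- A: AWS Backup does not offer custom include/exclude filtering over arbitrary on-premises source directories. The lifecycle described is also invalid, because S3 lifecycle rules cannot transition objects from S3 Glacier Flexible Retrieval back to S3 Standard-Infrequent Access.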
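- B: A Storage Gateway file gateway exposes an NFS/SMB share backed by Amazon S3 and has no built-in filtering or backup scheduling. Selecting the data and triggering copies from Lambda would require custom scripts and trigger management.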
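- D: Snowcone devices hold only a few terabytes each, so a 500 TB initial load would need many devices. The cron-driven AWS CLI sync also requires custom scripting and monitoring, increasing operational effort.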

Key Points:
1. DataSync supports custom filters and automated scheduling.
2. The existing Direct Connect connection provides a dedicated, consistent network path for the transfers.
3. DataSync minimizes operational overhead compared to script-based solutions.
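To put the data volumes in perspective, the following back-of-the-envelope sketch estimates transfer times for the 500 TB initial dataset and the 2 TB weekly growth. The question does not state the Direct Connect bandwidth, so the 1 Gbps and 10 Gbps link speeds are assumptions, as are full link utilization and an 8 TB usable capacity per Snowcone HDD device.

```python
import math

TB = 10**12  # decimal terabyte, in bytes


def transfer_seconds(size_bytes, link_gbps, efficiency=1.0):
    """Idealized transfer time over a link of link_gbps gigabits per second.

    efficiency scales the usable share of the link (1.0 = fully utilized).
    """
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9 * efficiency)


# Initial 500 TB load at two assumed Direct Connect speeds.
initial_days_1g = transfer_seconds(500 * TB, 1) / 86400
initial_days_10g = transfer_seconds(500 * TB, 10) / 86400

# Daily share of the 2 TB weekly growth, at an assumed 1 Gbps.
daily_minutes_1g = transfer_seconds(2 * TB / 7, 1) / 60

# Devices needed for the initial load, assuming 8 TB usable per Snowcone.
snowcones_needed = math.ceil(500 / 8)

print(f"Initial load: {initial_days_1g:.1f} days at 1 Gbps, "
      f"{initial_days_10g:.1f} days at 10 Gbps")
print(f"Daily increment: {daily_minutes_1g:.0f} minutes at 1 Gbps")
print(f"Snowcone devices for initial load: {snowcones_needed}")
```

Under these assumptions the initial load takes roughly 46 days at 1 Gbps or under 5 days at 10 Gbps, and the daily increment takes well under an hour, while a Snowcone-based initial transfer would need dozens of devices. Real throughput depends on the actual link speed, the file count, and other traffic on the connection.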

Answer

The correct answer is: C