Question #702
A company copies 200 TB of data from a recent ocean survey onto AWS Snowball Edge Storage Optimized devices. The company has a high-performance computing (HPC) cluster that is hosted on AWS to look for oil and gas deposits. A solutions architect must provide the cluster with consistent sub-millisecond latency and high-throughput access to the data on the Snowball Edge Storage Optimized devices. The company is sending the devices back to AWS.
Which solution will meet these requirements?
A. Create an Amazon S3 bucket. Import the data into the S3 bucket. Configure an AWS Storage Gateway file gateway to use the S3 bucket. Access the file gateway from the HPC cluster instances.
B. Create an Amazon S3 bucket. Import the data into the S3 bucket. Configure an Amazon FSx for Lustre file system, and integrate it with the S3 bucket. Access the FSx for Lustre file system from the HPC cluster instances.
C. Create an Amazon S3 bucket and an Amazon Elastic File System (Amazon EFS) file system. Import the data into the S3 bucket. Copy the data from the S3 bucket to the EFS file system. Access the EFS file system from the HPC cluster instances.
D. Create an Amazon FSx for Lustre file system. Import the data directly into the FSx for Lustre file system. Access the FSx for Lustre file system from the HPC cluster instances.
Explanation
The correct answer is B. Snowball Edge import jobs load data into an Amazon S3 bucket, so the survey data lands in S3 when AWS receives the returned devices. An Amazon FSx for Lustre file system linked to that bucket presents the S3 objects as files and serves them to the HPC cluster with the sub-millisecond latency and high throughput the cluster requires. FSx for Lustre is designed for high-performance workloads, which makes it well suited to HPC applications.
Option A is incorrect because a Storage Gateway file gateway serves S3 data through a cache over NFS or SMB. That extra layer adds latency, and a file gateway is not built to deliver the consistent sub-millisecond, high-throughput access an HPC cluster needs.
Option C is incorrect because it adds a second copy of 200 TB from S3 into Amazon EFS, and EFS, a general-purpose NFS file system, generally delivers lower throughput and higher latency than FSx for Lustre for large parallel HPC workloads.
Option D is incorrect because Snowball Edge devices cannot import data directly into an FSx for Lustre file system; AWS imports the data from the devices into Amazon S3. FSx for Lustre is the right file system here, but it must be populated from S3, which is what option B does.
In essence, FSx for Lustre is the best fit when fast data access and high throughput are critical, as in this scenario, and data that arrives on Snowball devices reaches it through S3. Key takeaways: understand the use cases for FSx for Lustre, remember that Snowball imports land in S3, and recognize that not all storage services are designed for high-performance computing workloads.
Answer
The correct answer is: B