AWS Certified Solutions Architect - Professional / Question #764 of 529

Question #764

A company operates a high-performance computing (HPC) cluster on AWS for a tightly coupled workload that relies on shared files stored in Amazon EFS. The cluster performed optimally with 200 EC2 instances but experienced significant performance degradation when scaled to 2,000 instances. Which combination of design changes should a solutions architect implement to maximize the cluster's performance? (Choose three.)

A. Deploy the HPC cluster within a single Availability Zone.

B. Configure EC2 instances with elastic network interfaces in multiples of four.

C. Use EC2 instance types that support Elastic Fabric Adapter (EFA).

D. Distribute the cluster across multiple Availability Zones.

E. Replace Amazon EFS with Amazon S3 for shared file storage.

F. Replace Amazon EFS with Amazon FSx for Lustre.

Explanation

The degradation at 2,000 instances points to two bottlenecks: inter-node network latency and the throughput of the shared file system.

- A: Deploying within a single Availability Zone minimizes cross-AZ network latency, critical for tightly coupled workloads.
- C: Elastic Fabric Adapter (EFA) provides low-latency, high-throughput networking, essential for scaling HPC workloads.
- F: Amazon FSx for Lustre is a parallel file system built for HPC; it delivers higher throughput and lower latency than EFS for shared file storage.
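
As a rough illustration of how the three choices fit together, the sketch below builds request parameters shaped for the EC2 `RunInstances` and FSx `CreateFileSystem` APIs. It only constructs dictionaries and sends nothing to AWS. The instance type, deployment type, storage capacity, and every ID are assumptions chosen for illustration, not values from the question.

```python
# Sketch only: builds request parameters, nothing is sent to AWS.
# All IDs, the instance type, and the capacity are hypothetical placeholders.

def efa_launch_params(ami_id, subnet_id, security_group_id, count):
    """Parameters shaped for EC2 RunInstances with one EFA per instance."""
    return {
        "ImageId": ami_id,
        "InstanceType": "c5n.18xlarge",  # assumed; use any EFA-capable type
        "MinCount": count,
        "MaxCount": count,
        "NetworkInterfaces": [
            {
                "DeviceIndex": 0,
                "InterfaceType": "efa",     # choice C: request an EFA
                "SubnetId": subnet_id,      # choice A: one subnet, one AZ
                "Groups": [security_group_id],
            }
        ],
    }


def lustre_params(subnet_id, security_group_id, capacity_gib=1200):
    """Parameters shaped for FSx CreateFileSystem (choice F)."""
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": capacity_gib,    # assumed size in GiB
        "SubnetIds": [subnet_id],           # same subnet as the compute nodes
        "SecurityGroupIds": [security_group_id],
        "LustreConfiguration": {"DeploymentType": "SCRATCH_2"},  # assumed
    }


if __name__ == "__main__":
    subnet = "subnet-0123456789abcdef0"     # hypothetical
    sg = "sg-0123456789abcdef0"             # hypothetical
    ami = "ami-0123456789abcdef0"           # hypothetical
    print(efa_launch_params(ami, subnet, sg, count=2000))
    print(lustre_params(subnet, sg))
```

With boto3 installed and credentials configured, these dictionaries could be passed as keyword arguments to `ec2.run_instances(**...)` and `fsx.create_file_system(**...)`. Reusing the same subnet for compute and storage is what keeps everything in a single Availability Zone.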

Why others are incorrect:
- B: Attaching additional ENIs does not, in general, increase an instance's network bandwidth or reduce inter-node latency; EFA is the interface designed for tightly coupled HPC traffic.
- D: Distributing across AZs increases latency, worsening performance for tightly coupled workloads.
- E: Amazon S3 is object storage, not a shared file system, and cannot provide the low-latency file access that HPC workloads require.

Key Points:
1. Tightly coupled workloads require low-latency networking and high-throughput storage.
2. EFA and single-AZ deployment optimize network performance.
3. FSx for Lustre outperforms EFS in HPC scenarios.

Answer

The correct answer is: A, C, F