AWS Certified Solutions Architect - Professional

Question #792

A solutions architect is evaluating the architecture of an Amazon EMR cluster utilizing EMRFS. The cluster handles essential business operations and currently runs on Amazon EC2 On-Demand Instances for all node types (primary, core, and task). The EMR jobs execute nightly, commencing at 10:00 PM and completing in 8 hours. Processing duration is not critical as the data isn't required until the following afternoon. The goal is to reduce compute costs without compromising reliability.

Which solution should the solutions architect implement to meet these requirements?

A

Deploy all nodes on Spot Instances within an instance fleet. Terminate the entire cluster after processing completes.

B

Use On-Demand Instances for primary and core nodes. Launch task nodes on Spot Instances in an instance fleet. Terminate the entire cluster after processing. Purchase Compute Savings Plans for On-Demand usage.

C

Keep all nodes on On-Demand Instances. Terminate the cluster after processing completes. Purchase Compute Savings Plans for On-Demand usage.

D

Run primary and core nodes on On-Demand Instances. Use Spot Instances for task nodes in an instance fleet. Terminate only the task nodes after processing completes. Purchase Compute Savings Plans for On-Demand usage.
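The options differ mainly in how many hours the On-Demand primary and core nodes run each day and at what rate. The back-of-envelope sketch below compares one node that runs only for the 8-hour nightly job against one left running around the clock. It assumes a hypothetical price of $1.00 per node-hour and simplifies Savings Plans to a flat discount on every hour the node runs (real Savings Plans bill a fixed hourly commitment). The 66% figure is the maximum discount AWS advertises for Compute Savings Plans, used here as a best case for the always-on node.

```python
def daily_node_cost(hours_running: float, hourly_rate: float, discount: float = 0.0) -> float:
    """Daily On-Demand cost of one node; discount is a fraction in [0, 1)."""
    return hours_running * hourly_rate * (1.0 - discount)


def breakeven_discount(job_hours: float, day_hours: float = 24.0) -> float:
    """Discount at which an always-on node costs the same as a transient one.

    Solves day_hours * (1 - d) == job_hours for d.
    """
    return 1.0 - job_hours / day_hours


RATE = 1.00  # hypothetical On-Demand price per node-hour, in USD
JOB_HOURS = 8.0  # nightly run length from the scenario
MAX_COMPUTE_SP_DISCOUNT = 0.66  # advertised "up to 66%" for Compute Savings Plans

# Transient node: pays only for the job, with no discount at all.
transient = daily_node_cost(JOB_HOURS, RATE)
# Always-on node: pays for 24 hours, even at the best-case discount.
always_on = daily_node_cost(24.0, RATE, MAX_COMPUTE_SP_DISCOUNT)

print(f"transient node:      ${transient:.2f}/day")
print(f"always-on node:      ${always_on:.2f}/day")
print(f"break-even discount: {breakeven_discount(JOB_HOURS):.1%}")
```

Under these assumptions, a node kept running all day costs less than a transient one only if its discount exceeds two thirds, which is above the advertised Compute Savings Plans maximum.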

Explanation

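Answer B is correct because:
1. Primary and core nodes on On-Demand Instances: The primary node manages the cluster, and core nodes run HDFS in addition to processing tasks. A Spot interruption on either can fail the nightly job, so On-Demand capacity keeps these critical nodes reliable.
2. Task nodes on Spot Instances in an instance fleet: Task nodes only run processing and store no HDFS data, and with EMRFS the job's input and output live in Amazon S3. A Spot interruption therefore costs rework rather than data, and the flexible deadline absorbs any delay. An instance fleet can specify several instance types, which improves the chance of obtaining Spot capacity.
3. Terminate the entire cluster after processing: Because EMRFS keeps the data in S3, nothing is lost when the cluster terminates. The job uses the cluster for about 8 of every 24 hours, so a transient cluster avoids paying for roughly 16 idle On-Demand hours per primary and core node each day. Launching a fresh cluster each night adds only startup time, which the deadline easily allows.
4. Compute Savings Plans: A 1- or 3-year commitment lowers the hourly rate of the On-Demand EC2 usage that remains for the primary and core nodes.

Other options are incorrect because:
- A: Running the primary and core nodes on Spot Instances risks interruption mid-job. Losing the primary node terminates the cluster, and losing core nodes can lose HDFS data, which compromises reliability.
- C: Keeping every node On-Demand forgoes the Spot discount on task nodes, so it costs more than B even with Savings Plans.
- D: Terminating only the task nodes leaves the primary and core nodes running idle for about 16 hours a day. Savings Plans discount those hours but do not eliminate them, and keeping the nodes alive adds no reliability because the data already persists in S3.

Key Points:
- Use Spot for task nodes, which store no HDFS data and tolerate interruptions.
- Keep primary and core nodes On-Demand when the workload is critical.
- With EMRFS, data persists in S3, so a transient cluster that terminates after each run avoids paying for idle capacity.
- Savings Plans lower the rate for the On-Demand usage that remains.
- Instance fleets improve Spot capacity availability by drawing on multiple instance types.

Answer

The correct answer is: B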
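Options B and D share the same node layout (On-Demand primary and core fleets, a Spot task fleet) and differ in what happens after processing. The sketch below builds the keyword arguments for boto3's EMR `run_job_flow` call for that layout without contacting AWS. The cluster name, release label, instance types, capacities, Spot timeout, and S3 script path are all hypothetical placeholders. `terminate_after_steps=True` maps to `KeepJobFlowAliveWhenNoSteps=False`, so the whole cluster terminates once its steps finish (as in option B); EMR has no equivalent flag for terminating only the task fleet.

```python
import json


def build_emr_request(terminate_after_steps: bool = True) -> dict:
    """Return keyword arguments for boto3's EMR client run_job_flow()."""
    on_demand_types = [{"InstanceType": "m5.xlarge", "WeightedCapacity": 1}]
    # Several interchangeable types widen the pool of Spot capacity to draw from.
    spot_types = [
        {"InstanceType": t, "WeightedCapacity": 1}
        for t in ("m5.xlarge", "m5a.xlarge", "m4.xlarge", "r5.xlarge")
    ]
    return {
        "Name": "nightly-batch",  # hypothetical name
        "ReleaseLabel": "emr-6.15.0",  # hypothetical; pick a current release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceFleets": [
                {   # Primary node: On-Demand for reliability.
                    "Name": "primary",
                    "InstanceFleetType": "MASTER",
                    "TargetOnDemandCapacity": 1,
                    "InstanceTypeConfigs": on_demand_types,
                },
                {   # Core nodes hold HDFS: On-Demand for reliability.
                    "Name": "core",
                    "InstanceFleetType": "CORE",
                    "TargetOnDemandCapacity": 2,
                    "InstanceTypeConfigs": on_demand_types,
                },
                {   # Task nodes hold no HDFS data: Spot for cost.
                    "Name": "task",
                    "InstanceFleetType": "TASK",
                    "TargetSpotCapacity": 8,
                    "InstanceTypeConfigs": spot_types,
                    "LaunchSpecifications": {
                        "SpotSpecification": {
                            # Fall back to On-Demand if Spot is unavailable at launch.
                            "TimeoutDurationMinutes": 30,
                            "TimeoutAction": "SWITCH_TO_ON_DEMAND",
                            "AllocationStrategy": "capacity-optimized",
                        }
                    },
                },
            ],
            # False means the cluster terminates itself once all steps finish.
            "KeepJobFlowAliveWhenNoSteps": not terminate_after_steps,
        },
        "Steps": [
            {
                "Name": "nightly-job",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": ["spark-submit", "s3://example-bucket/jobs/nightly.py"],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    print(json.dumps(build_emr_request(), indent=2))
```

With credentials configured, `boto3.client("emr").run_job_flow(**build_emr_request())` would submit the request; the sketch stops at building the dictionary so it runs without an AWS account.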