AWS Certified Solutions Architect - Associate

Question #1622

A company is designing a serverless architecture on AWS to handle high-volume, real-time processing of diverse data types including application logs, image files, and sensor telemetry stored in Amazon S3. The solution must process each data item independently with maximum parallelism and scalability. Which approach provides the MOST scalable and efficient processing?

A

Use the AWS Step Functions Map state in Inline mode to process each item sequentially within a single execution.

B

Use the AWS Step Functions Map state in Distributed mode to process each item in a separate execution for maximum scalability.

C

Use AWS Glue to dynamically scale Apache Spark workers for parallel data processing.

D

Use multiple AWS Lambda functions invoked concurrently via Amazon S3 event notifications.

Explanation

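Answer B is correct because the Distributed Map state in AWS Step Functions runs its iterations as child workflow executions, up to 10,000 of them in parallel. Each data item (or a small batch of items) is therefore processed independently, which fits a high-volume workload. Option A is weaker because an Inline Map runs every iteration inside the parent execution. That caps concurrency at 40 iterations and counts every step against that single execution's 25,000-event history quota. As worded, Option A also processes the items sequentially. Option C, AWS Glue, can scale Apache Spark workers, but it is designed for Spark-based ETL jobs rather than per-item workflows, and you still have to choose worker types and counts. Option D, Lambda invoked by S3 event notifications, scales per object, but each notification corresponds to one S3 object. Records packed inside a large log or CSV file must then be looped over within a single invocation instead of being processed independently. Distributed Map's ItemReader, by contrast, can iterate over the objects under an S3 prefix or over the individual records inside a JSON or CSV file.

Key points: Distributed Map scales to thousands of parallel child workflow executions, reads items either from individual S3 objects or from within large files, and is fully serverless. This makes it the most scalable and efficient choice.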
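To make the Distributed Map mechanism concrete, here is a minimal sketch that builds an Amazon States Language (ASL) definition as a Python dict. It assumes a single CSV object in S3 whose first row is a header and a Lambda function that processes one item. The bucket, key, and function ARN are hypothetical placeholders, and `build_distributed_map` is an illustrative helper, not part of any AWS SDK. The settings that make this "Distributed" are `"Mode": "DISTRIBUTED"` in `ProcessorConfig` and the `ItemReader` block, which Step Functions supports only in Distributed mode.

```python
import json

# Hypothetical resource names, for illustration only.
BUCKET = "example-telemetry-bucket"
KEY = "incoming/sensor-readings.csv"
FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:process-item"


def build_distributed_map(bucket: str, key: str, function_arn: str,
                          max_concurrency: int = 1000) -> dict:
    """Return an ASL state machine whose Map state runs in DISTRIBUTED mode.

    The ItemReader reads rows from one CSV object in S3. Each row becomes an
    item, and each item is handed to a child workflow that invokes Lambda.
    """
    # Task run once per item inside the child workflow.
    process_item = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_arn, "Payload.$": "$"},
        "End": True,
    }
    return {
        "StartAt": "ProcessAllItems",
        "States": {
            "ProcessAllItems": {
                "Type": "Map",
                # ItemReader (Distributed mode only): split one S3 file into items.
                "ItemReader": {
                    "Resource": "arn:aws:states:::s3:getObject",
                    "ReaderConfig": {"InputType": "CSV",
                                     "CSVHeaderLocation": "FIRST_ROW"},
                    "Parameters": {"Bucket": bucket, "Key": key},
                },
                "ItemProcessor": {
                    # DISTRIBUTED runs iterations as child workflow executions;
                    # INLINE would run them inside the parent execution instead.
                    "ProcessorConfig": {"Mode": "DISTRIBUTED",
                                        "ExecutionType": "EXPRESS"},
                    "StartAt": "ProcessItem",
                    "States": {"ProcessItem": process_item},
                },
                # Upper bound on child executions running at the same time.
                "MaxConcurrency": max_concurrency,
                "End": True,
            }
        },
    }


if __name__ == "__main__":
    definition = build_distributed_map(BUCKET, KEY, FUNCTION_ARN)
    print(json.dumps(definition, indent=2))
```

To iterate over many separate S3 objects instead of the rows of one file, the `ItemReader` would use `"Resource": "arn:aws:states:::s3:listObjectsV2"` with `Bucket` and `Prefix` parameters and no `ReaderConfig`. The JSON output could then be passed as the `definition` string to boto3's `create_state_machine`, together with an IAM role that can read the S3 object, invoke the function, and start child executions. `MaxConcurrency` of 1000 is an arbitrary example value, not a recommendation.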

Answer

The correct answer is: B