AWS Certified Solutions Architect - Associate / Question #1823 of 1019

Question #1823

A company ingests large volumes of data into Amazon S3 every day. The company needs to transform this data and load it into a data warehouse with massively parallel processing (MPP) capabilities. Data analysts must then develop machine learning models by running SQL queries against the data warehouse. The solution should use serverless AWS services where possible.

Which design meets these requirements?

A. Schedule daily AWS Glue jobs to process the data into Amazon Redshift. Use Amazon Redshift ML to train models with SQL.

B. Use AWS Step Functions to orchestrate daily Amazon EMR jobs for transformation, loading into Amazon Redshift Serverless. Utilize Amazon SageMaker for ML training via SQL.

C. Run AWS Glue ETL jobs daily to transform data and load into Amazon Redshift Serverless. Create ML models using Redshift ML with SQL commands.

D. Process data with AWS Lambda functions triggered by Amazon S3 events, loading into Amazon Athena. Use Athena ML to train models via SQL.

Explanation

Option C is correct because:
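1. AWS Glue is a serverless ETL service, so the daily transformation jobs run at scale without provisioning or managing clusters.
2. Amazon Redshift Serverless provides an MPP data warehouse without any infrastructure to manage.
3. Redshift ML lets analysts train ML models with the SQL CREATE MODEL statement, so model development stays inside the data warehouse and matches the analysts' SQL workflow.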
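
To make point 3 concrete, here is a minimal sketch of what an analyst's Redshift ML statement could look like. It only assembles the CREATE MODEL text in Python and does not connect to AWS. The model, table, column, function, and bucket names are hypothetical placeholders, not values from the question, and the helper build_create_model_sql is an illustrative name rather than part of any AWS library.

```python
def build_create_model_sql(model_name, source_query, target_column,
                           function_name, s3_bucket, iam_role="default"):
    """Assemble a Redshift ML CREATE MODEL statement as a string.

    All arguments are illustrative placeholders. Values are interpolated
    without validation, so this is only suitable as a teaching sketch.
    """
    # IAM_ROLE accepts the keyword `default` or a quoted role ARN.
    role_clause = "default" if iam_role == "default" else f"'{iam_role}'"
    return (
        f"CREATE MODEL {model_name}\n"
        f"FROM ({source_query})\n"
        f"TARGET {target_column}\n"
        f"FUNCTION {function_name}\n"
        f"IAM_ROLE {role_clause}\n"
        f"SETTINGS (S3_BUCKET '{s3_bucket}');"
    )


# Hypothetical example: predict a `churned` flag from customer activity.
sql = build_create_model_sql(
    model_name="customer_churn",
    source_query="SELECT age, plan, monthly_usage, churned FROM customer_activity",
    target_column="churned",
    function_name="predict_churn",
    s3_bucket="example-redshift-ml-bucket",
)
print(sql)
```

Once a model like this has finished training, the function named in the FUNCTION clause can be called from ordinary SELECT queries, which is why no separate ML tooling is needed for the analysts.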

Other options fail because:
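- A: Names Amazon Redshift without the Serverless option, which implies a provisioned cluster that must be sized and managed.
- B: Runs the transformation on Amazon EMR clusters rather than a serverless service, and SageMaker training is not driven by SQL queries in the data warehouse.
- D: Athena is a query service over data in S3, not an MPP data warehouse that data is loaded into, and its ML integration invokes SageMaker models for inference rather than training them. Lambda's 15-minute execution limit also makes it a poor fit for large-volume transformations.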

Key Points:
- Use serverless services (Glue, Redshift Serverless).
- Redshift ML enables SQL-based ML training.
- MPP data warehouse requirement is met by Redshift.

Answer

The correct answer is: C