Question #1175
A company receives batch data from multiple relational databases and real-time streaming data from IoT devices. The company requires a solution to unify all data into a central repository for analytics. The solution must process incoming data, store it in designated Amazon S3 buckets, and support ad-hoc queries for integration with business intelligence (BI) tools to visualize key performance indicators (KPIs) with minimal operational effort. Which combination of steps meets these requirements with the LEAST operational overhead? (Choose two.)
A. Use Amazon Athena for ad-hoc queries. Use Amazon QuickSight to create KPI dashboards.
B. Deploy Amazon Kinesis Data Analytics to process streaming data. Use Amazon Redshift Spectrum for querying data directly in S3.
C. Develop custom AWS Lambda functions to transform and load data into Amazon RDS instances.
D. Use AWS Glue ETL jobs to process batch data into Parquet format. Use Amazon Kinesis Data Firehose to ingest streaming data into Amazon S3.
E. Configure AWS Glue crawlers to catalog data sources. Use AWS Lake Formation to automate data lake setup and enforce access controls.
Explanation
A and D are correct because:
- A: Amazon Athena runs ad-hoc SQL queries directly on data in S3 without any infrastructure to manage, and Amazon QuickSight is AWS's managed BI service, which can use Athena as a data source for KPI dashboards. Both fit the minimal-effort requirement.
- D: AWS Glue ETL jobs process the batch data from the relational databases into Parquet, a columnar format that reduces storage and the amount of data Athena scans per query, and Kinesis Data Firehose delivers the streaming IoT data to S3 without custom code or servers to manage, reducing operational overhead.
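The streaming half of option D can be illustrated with the Firehose `PutRecordBatch` API. The following is a minimal sketch, assuming a boto3 `firehose` client is passed in; the function names and reading fields are hypothetical. Because Firehose does not add delimiters between records by default, each reading is written as a newline-terminated JSON line so the resulting S3 objects stay readable by Glue and Athena.

```python
import json
from typing import Any, Iterable, Iterator

# PutRecordBatch accepts at most 500 records per call (it also caps each call
# at 4 MiB, which this sketch does not enforce).
MAX_RECORDS_PER_BATCH = 500


def to_firehose_records(readings: Iterable[dict[str, Any]]) -> list[dict[str, bytes]]:
    """Encode each reading as one newline-terminated JSON line (hypothetical helper)."""
    return [
        {"Data": (json.dumps(r, separators=(",", ":")) + "\n").encode("utf-8")}
        for r in readings
    ]


def chunked(records: list[dict[str, bytes]],
            size: int = MAX_RECORDS_PER_BATCH) -> Iterator[list[dict[str, bytes]]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def send_readings(firehose_client, stream_name: str,
                  readings: Iterable[dict[str, Any]]) -> int:
    """Send readings in batches; return how many records Firehose reported as failed.

    Retrying failed records (listed in the response's RequestResponses) is
    omitted to keep the sketch short.
    """
    failed = 0
    for batch in chunked(to_firehose_records(readings)):
        response = firehose_client.put_record_batch(
            DeliveryStreamName=stream_name, Records=batch
        )
        failed += response["FailedPutCount"]
    return failed
```

With real credentials you would pass `boto3.client("firehose")` and the name of an existing delivery stream; Firehose then buffers the records and writes them to the configured S3 bucket with no servers for the company to operate.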
Other options are incorrect because:
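- B: Redshift Spectrum runs its queries through an Amazon Redshift data warehouse, which adds a component to provision and manage compared with serverless Athena. Kinesis Data Analytics also requires writing and operating stream-processing applications, and this option does not address batch ingestion.
- C: Loading data into Amazon RDS contradicts the requirement to use S3 as the central repository, and custom Lambda functions add code to write and maintain.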
- E: While Glue crawlers and Lake Formation assist with cataloging and access control, they do not address data processing or ingestion.
Key Points:
1. Use managed services (Glue, Firehose, Athena) to minimize operational effort.
2. Store data in S3 for scalability and cost-effectiveness.
3. Optimize data formats (Parquet) for efficient analytics.
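The ad-hoc query side of option A can also be scripted with the same managed services. The following is a minimal sketch, assuming a boto3 `athena` client is passed in; it uses the Athena calls `start_query_execution`, `get_query_execution`, and `get_query_results`. The database, the `iot_readings` table, its columns, the partition column `dt`, and the S3 output location are hypothetical.

```python
import time
from typing import Optional

# Hypothetical KPI query over a Parquet table catalogued by Glue; filtering on
# the partition column limits how much data Athena has to scan.
KPI_SQL = """
SELECT device_id, avg(temperature) AS avg_temp
FROM iot_readings
WHERE dt = '2024-01-01'
GROUP BY device_id
"""

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(athena_client, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0,
                     timeout_seconds: float = 300.0) -> list[list[Optional[str]]]:
    """Run one query and return all result rows (header row first) as strings."""
    query_id = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state or the timeout expires.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena_client.get_query_execution(
            QueryExecutionId=query_id)["QueryExecution"]["Status"]
        state = status["State"]
        if state in TERMINAL_STATES:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"query {query_id} still {state}")
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} {state}: {status.get('StateChangeReason', '')}")

    # Page through results; NULL cells carry no VarCharValue, so they become None.
    rows: list[list[Optional[str]]] = []
    kwargs = {"QueryExecutionId": query_id}
    while True:
        page = athena_client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            rows.append([cell.get("VarCharValue") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token
```

For dashboards, QuickSight can query the same table through its Athena data source, so a script like this is only needed for programmatic ad-hoc access. A production version might also call `stop_query_execution` when the timeout is hit.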
Answer
The correct answer is: AD