AWS Certified Developer – Associate / Question #836 of 557

Question #836

A video processing platform uses Amazon S3 to store video files. Each uploaded video is analyzed by an external machine learning service to generate metadata, which takes between 1 and 24 hours. The results are stored in an Amazon DynamoDB table with the S3 object key as the primary key. The application must automatically tag each S3 object with the generated metadata once it is available.

What should a developer do to meet this requirement in the MOST operationally efficient manner?

A

Create an AWS Lambda function triggered by s3:ObjectCreated events. The function writes the S3 key to an Amazon SQS queue with a 24-hour visibility timeout. A second Lambda function polls the queue, retrieves metadata from DynamoDB, and tags the S3 object.

B

Develop an AWS Lambda function triggered by s3:ObjectCreated events. Integrate this into an AWS Step Functions workflow with a Wait state set to 24 hours. A second Lambda function retrieves metadata from DynamoDB and applies the tags after the wait.

C

Implement a Lambda function that lists all untagged S3 objects, fetches metadata via a REST API, and tags them. Use an Amazon EventBridge scheduled rule to invoke this function periodically.

D

Deploy a script on an EC2 instance that queries DynamoDB for new metadata and tags S3 objects. Use crontab to run the script hourly.

Explanation

Option B is correct because AWS Step Functions provides a managed workflow that handles long waits and retries without custom code. When an S3 object is created, a Lambda function starts a Step Functions execution whose Wait state pauses for 24 hours. After the wait, a second Lambda function retrieves the metadata from DynamoDB and tags the S3 object. Because the wait covers the maximum time the external ML service can take (24 hours), the metadata should already be in the table when the lookup runs, so no polling or premature checks are needed. The solution is operationally efficient because every component is serverless and there is no infrastructure to manage.
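As a minimal sketch of the workflow described above, the state machine could be expressed in Amazon States Language roughly as follows. The state names, the `TagObjectFunction` ARN, the account ID, and the retry settings are hypothetical placeholders; only the 24-hour Wait followed by a Lambda task comes from the explanation.

```python
import json

# Hypothetical ARN of the second Lambda function (placeholder account and name).
TAG_FUNCTION_ARN = "arn:aws:lambda:us-east-1:123456789012:function:TagObjectFunction"

# 24 hours: the upper bound on the external ML service's processing time.
WAIT_SECONDS = 24 * 60 * 60


def build_definition(tag_function_arn: str, wait_seconds: int = WAIT_SECONDS) -> dict:
    """Return an Amazon States Language definition: wait, then tag."""
    return {
        "Comment": "Wait for the ML metadata, then tag the S3 object",
        "StartAt": "WaitForMetadata",
        "States": {
            "WaitForMetadata": {
                "Type": "Wait",
                "Seconds": wait_seconds,
                "Next": "TagObject",
            },
            "TagObject": {
                "Type": "Task",
                "Resource": tag_function_arn,
                # Assumed retry policy: retry failed task attempts with backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 60,
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            },
        },
    }


# Serialize to the JSON document that Step Functions expects as a definition.
definition_json = json.dumps(build_definition(TAG_FUNCTION_ARN), indent=2)
print(definition_json)
```

The first Lambda function would start one execution of this state machine per uploaded object, passing the bucket and key as the execution input.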

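Other options are less efficient:
- A: An SQS visibility timeout does not delay delivery; it only hides a message after a consumer has received it, and its maximum is 12 hours, so a 24-hour setting is not possible.
- C: Each scheduled run must list the bucket and inspect tags to find untagged objects, which becomes resource-intensive as the bucket grows, and tagging is delayed until the next scheduled run.
- D: EC2 with crontab requires provisioning, patching, and monitoring an instance, which reduces operational efficiency.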
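
The timing limits behind option A can be checked with simple arithmetic. The constants below reflect the documented Amazon SQS maximums (12 hours for a visibility timeout, 15 minutes for a delay); the helper name is made up for illustration.

```python
# Documented Amazon SQS maximums, in seconds.
SQS_MAX_VISIBILITY_TIMEOUT = 12 * 60 * 60  # 43,200 s (12 hours)
SQS_MAX_DELAY_SECONDS = 15 * 60            # 900 s (15 minutes)

# Wait required by the scenario.
REQUIRED_WAIT = 24 * 60 * 60               # 86,400 s (24 hours)


def fits_limit(required_seconds: int, limit_seconds: int) -> bool:
    """Return True if the required wait fits within a service limit."""
    return required_seconds <= limit_seconds


print(fits_limit(REQUIRED_WAIT, SQS_MAX_VISIBILITY_TIMEOUT))  # False
print(fits_limit(REQUIRED_WAIT, SQS_MAX_DELAY_SECONDS))       # False
```

Neither SQS setting can hold a message for the full 24 hours, whereas a Step Functions Wait state can.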

Key points: Use Step Functions to manage long delays in serverless workflows; avoid polling and self-managed infrastructure when operational efficiency is the goal.
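
The tagging step that runs after the wait might look like the sketch below. The table's key attribute `object_key`, the `metadata` attribute, and the helper names are assumptions, not part of the question. `get_item` and `put_object_tagging` are the standard boto3 operations; the clients are passed in as arguments so the logic can be exercised without AWS access.

```python
MAX_TAGS = 10        # S3 allows at most 10 tags per object
MAX_KEY_LEN = 128    # maximum tag key length in characters
MAX_VALUE_LEN = 256  # maximum tag value length in characters


def to_tag_set(metadata: dict) -> list:
    """Convert a flat metadata dict into an S3 TagSet.

    Simplification: keys and values are truncated and extra entries are
    dropped so the result stays within S3 tag limits.
    """
    tags = [
        {"Key": str(k)[:MAX_KEY_LEN], "Value": str(v)[:MAX_VALUE_LEN]}
        for k, v in sorted(metadata.items())
    ]
    return tags[:MAX_TAGS]


def tag_object(s3_client, table, bucket: str, key: str) -> bool:
    """Look up metadata for `key` in DynamoDB and tag the S3 object.

    `s3_client` is expected to be a boto3 S3 client and `table` a boto3
    DynamoDB Table resource. Returns False if no metadata exists yet.
    """
    # "object_key" and "metadata" are assumed attribute names.
    item = table.get_item(Key={"object_key": key}).get("Item")
    if not item or "metadata" not in item:
        return False
    # Note: put_object_tagging replaces any tags already on the object.
    s3_client.put_object_tagging(
        Bucket=bucket,
        Key=key,
        Tagging={"TagSet": to_tag_set(item["metadata"])},
    )
    return True
```

In a Lambda handler, `s3_client` would typically be `boto3.client("s3")` and `table` would be `boto3.resource("dynamodb").Table(...)` with the real table name.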

Answer

The correct answer is: B