Question #805
A company has implemented a payment processing system using an event-driven architecture. During initial testing, the system stopped processing transactions. Log analysis revealed that one transaction message in an Amazon Simple Queue Service (Amazon SQS) standard queue was causing a persistent error on the backend, blocking all subsequent messages. The visibility timeout of the queue is set to 120 seconds, and the backend processing timeout is set to 30 seconds. A solutions architect needs to analyze the faulty transaction messages and ensure the system continues to process subsequent messages.
Which step should the solutions architect take to meet these requirements?
A. Increase the backend processing timeout to 120 seconds to match the visibility timeout.
B. Reduce the visibility timeout of the queue to 30 seconds to automatically remove the faulty message.
C. Configure a new SQS FIFO queue as a dead-letter queue to isolate the faulty messages.
D. Configure a new SQS standard queue as a dead-letter queue to isolate the faulty messages.
Explanation
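The issue arises because one faulty message in the SQS standard queue fails on every processing attempt. The consumer never deletes it, so each time its visibility timeout (120 seconds) expires the message becomes visible again and is received and retried indefinitely. This ties up the backend and holds up subsequent messages. The mismatch between the visibility timeout and the 30-second backend processing timeout is not the root cause, so changing either timeout does not fix the problem. To resolve this: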
- Option D is correct: Configuring a standard queue as a dead-letter queue (DLQ) and attaching a redrive policy to the main queue causes Amazon SQS to move a message to the DLQ once its receive count exceeds the policy's maxReceiveCount. This isolates the faulty message so the main queue continues processing other messages, and the message remains in the DLQ where the solutions architect can analyze it.
- Option A is incorrect: Increasing the backend timeout to 120 seconds only aligns the two timeouts. The faulty message still fails on every attempt and is still retried indefinitely, and each failed attempt can now take longer, prolonging processing delays.
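- Option B is incorrect: An expired visibility timeout does not remove a message; it only makes the message visible again for another receive. Setting the visibility timeout equal to the 30-second processing timeout also risks messages reappearing before processing completes, causing duplicate processing, while the faulty message is still never isolated.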
- Option C is incorrect: Amazon SQS requires a dead-letter queue to be the same type as its source queue. The source here is a standard queue, so its DLQ must also be a standard queue; a FIFO queue cannot be used.
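The redrive behavior behind option D can be illustrated with a toy, in-memory model. This is a sketch under stated assumptions, not the SQS API: the names `drain`, `handle`, and `max_receive_count` are hypothetical, a single consumer is assumed, and visibility-timeout expiry is modeled simply as putting the failed message back at the end of the queue (ignoring the 120-second delay and the lack of ordering guarantees in standard queues).

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    receive_count: int = 0  # plays the role of SQS's ApproximateReceiveCount


def drain(bodies, handler, max_receive_count):
    """Hypothetical single-consumer model of a queue with a redrive policy."""
    queue = deque(Message(b) for b in bodies)
    processed, dlq = [], []
    while queue:
        msg = queue.popleft()
        if msg.receive_count >= max_receive_count:
            # Another receive would exceed maxReceiveCount: move to the DLQ instead.
            dlq.append(msg.body)
            continue
        msg.receive_count += 1
        try:
            handler(msg.body)
        except Exception:
            # Not deleted, so it becomes visible again after the visibility timeout.
            queue.append(msg)
        else:
            # Success: the consumer deletes the message, so it is not re-enqueued.
            processed.append(msg.body)
    return processed, dlq


def handle(body):
    # Hypothetical backend that always fails on one poison-pill transaction.
    if body == "txn-bad":
        raise ValueError("persistent backend error")


processed, dlq = drain(["txn-1", "txn-bad", "txn-2", "txn-3"], handle, max_receive_count=3)
print(processed)  # ['txn-1', 'txn-2', 'txn-3']
print(dlq)        # ['txn-bad']
```

In this model the poison message is attempted three times, then parked in the DLQ, while every other transaction is processed normally.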
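Key Points: Use DLQs to isolate poison-pill messages (messages that fail on every attempt) in SQS. The DLQ must match the source queue type: a standard DLQ for a standard queue, a FIFO DLQ for a FIFO queue. Configure a redrive policy with an appropriate maxReceiveCount to limit retries and keep the system resilient.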
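A redrive policy is set as the `RedrivePolicy` queue attribute, a JSON string containing `deadLetterTargetArn` and `maxReceiveCount`. The helper below is a minimal sketch that builds that attribute and enforces the same-type rule; the function name, queue URL, ARN, account ID, and queue names are hypothetical, and the `.fifo` suffix check relies on FIFO queue names having to end in `.fifo`. The actual AWS call is shown only as a comment and is not executed here.

```python
import json


def redrive_attributes(source_queue_url, dlq_arn, max_receive_count=5):
    """Build a hypothetical Attributes dict for SQS SetQueueAttributes."""
    # FIFO queue names end in ".fifo"; a DLQ must match its source queue's type.
    if source_queue_url.endswith(".fifo") != dlq_arn.endswith(".fifo"):
        raise ValueError("dead-letter queue type must match the source queue type")
    policy = {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": str(max_receive_count)}
    return {"RedrivePolicy": json.dumps(policy)}


# Hypothetical standard source queue and standard DLQ.
source_url = "https://sqs.us-east-1.amazonaws.com/123456789012/payments"
attrs = redrive_attributes(
    source_url,
    "arn:aws:sqs:us-east-1:123456789012:payments-dlq",
    max_receive_count=3,
)
# With boto3 (not run here):
#   boto3.client("sqs").set_queue_attributes(QueueUrl=source_url, Attributes=attrs)
```

Passing a FIFO DLQ ARN (ending in `.fifo`) for this standard source queue raises `ValueError`, mirroring why option C is rejected.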
Answer
The correct answer is: D