Message Queues

Message queues enable asynchronous communication between services by decoupling producers (senders) from consumers (receivers). They're essential for building resilient, scalable systems that handle load spikes and service failures gracefully.

Why Message Queues?

Synchronous communication (direct HTTP calls) creates tight coupling:

  • If the consumer is slow, the producer waits
  • If the consumer is down, the producer fails
  • Traffic spikes propagate directly to all consumers

A message queue addresses all of these problems:

  • Decoupling: Producer and consumer don't need to be up simultaneously
  • Buffering: Queue absorbs traffic spikes; consumers process at their own pace
  • Reliability: Messages persist in the queue until acknowledged
  • Load leveling: Smooth out bursty traffic patterns
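The buffering and load-leveling points can be illustrated with a minimal sketch, using Python's standard-library `queue` as a stand-in for a real broker (illustration only, not a broker API):

```python
import queue

# A bounded in-memory buffer standing in for a real message broker.
buf = queue.Queue(maxsize=100)

# Producer: a burst of 50 messages arrives all at once.
for i in range(50):
    buf.put(f"msg-{i}")

# The queue absorbs the spike instead of overwhelming the consumer.
burst_depth = buf.qsize()          # 50 messages waiting

# Consumer: drains the backlog one message at a time, at its own pace.
processed = []
while not buf.empty():
    processed.append(buf.get())
    buf.task_done()
```

The producer finished immediately even though the consumer hadn't started, which is exactly the decoupling a queue buys you.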

Core Concepts

Producer

Publishes messages to the queue. Doesn't know or care who consumes them or when.

Queue / Topic

The durable buffer that holds messages. Messages persist until consumed and acknowledged.

Consumer

Reads messages from the queue and processes them. Acknowledges successful processing.

Acknowledgment

After processing, the consumer sends an ACK. The queue then deletes the message. If the consumer crashes before ACKing, the queue re-delivers the message (at-least-once delivery).
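The ACK/redelivery cycle can be sketched with a toy in-memory queue (the class and method names here are illustrative, not any real broker's API):

```python
from collections import deque

class AckQueue:
    """Toy queue illustrating acknowledgment semantics."""
    def __init__(self):
        self._ready = deque()       # messages awaiting delivery
        self._in_flight = {}        # delivered but not yet acknowledged

    def publish(self, msg_id, body):
        self._ready.append((msg_id, body))

    def receive(self):
        msg_id, body = self._ready.popleft()
        self._in_flight[msg_id] = body   # held until ACKed, not deleted
        return msg_id, body

    def ack(self, msg_id):
        del self._in_flight[msg_id]      # now safe to delete permanently

    def redeliver_unacked(self):
        # Simulates a consumer crash: unACKed messages return to the queue.
        for msg_id, body in self._in_flight.items():
            self._ready.append((msg_id, body))
        self._in_flight.clear()

q = AckQueue()
q.publish("m1", "charge order")
msg_id, body = q.receive()
q.redeliver_unacked()           # consumer "crashed" before ACKing
msg_id2, _ = q.receive()        # the same message comes back: at-least-once
q.ack(msg_id2)                  # processed successfully; message deleted
```

Note that the crash-and-redeliver path is precisely where duplicate deliveries come from, which motivates the idempotency discussion below.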

Delivery Guarantees

| Guarantee | Description | Trade-off |
|---|---|---|
| At-most-once | Message delivered 0 or 1 times | Can lose messages |
| At-least-once | Message delivered 1+ times | Can have duplicates |
| Exactly-once | Message delivered exactly once | Complex, slower |

Most systems use at-least-once delivery and design consumers to be idempotent (processing the same message twice has no extra effect).
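A minimal sketch of an idempotent consumer, assuming each message carries a unique ID (the handler name and in-memory set are illustrative; production systems would track processed IDs in a durable store):

```python
processed_ids = set()   # in production: a durable store, e.g. a database table

def handle_payment(msg_id, amount, ledger):
    """Idempotent handler: a redelivered message is a no-op."""
    if msg_id in processed_ids:
        return                      # duplicate delivery: skip silently
    ledger.append(amount)           # apply the effect exactly once
    processed_ids.add(msg_id)

ledger = []
handle_payment("m1", 100, ledger)
handle_payment("m1", 100, ledger)   # duplicate from at-least-once redelivery
# The charge was applied only once despite two deliveries.
```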

Queue Patterns

Work Queue (Competing Consumers)

Multiple consumers compete for messages from a single queue. Each message is processed by exactly one consumer. Used for task distribution and parallel processing.

Producer → [Queue] → Consumer A
                   → Consumer B  (one of them gets each message)
                   → Consumer C

Best for: Background job processing, image resizing, email sending
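The competing-consumers pattern can be sketched with standard-library threads pulling from one shared queue (an in-process stand-in for a real work queue):

```python
import queue
import threading
from collections import Counter

jobs = queue.Queue()
for i in range(30):
    jobs.put(i)                     # producer enqueues 30 tasks

results = Counter()
lock = threading.Lock()

def worker(name):
    while True:
        try:
            job = jobs.get_nowait()  # workers compete for the next message
        except queue.Empty:
            return                   # queue drained; worker exits
        with lock:
            results[name] += 1       # each job handled by exactly one worker
        jobs.task_done()

threads = [threading.Thread(target=worker, args=(n,)) for n in ("A", "B", "C")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every job is processed exactly once in total; how the 30 jobs split across A, B, and C depends on scheduling, which is the point of the pattern.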

Pub/Sub (Fan-Out)

One message is delivered to all subscribers. Producer publishes to a topic; each subscriber gets a copy.

Producer → [Topic] → Consumer A (all get the message)
                   → Consumer B
                   → Consumer C

Best for: Event broadcasting, cache invalidation, notifications to multiple services
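The fan-out behavior can be sketched with a toy topic that copies each message into every subscriber's inbox (the `Topic` class is illustrative, not a real pub/sub API):

```python
from collections import defaultdict

class Topic:
    """Toy pub/sub topic: every subscriber receives its own copy."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, name):
        return self._subscribers[name]   # each subscriber gets an inbox

    def publish(self, msg):
        for inbox in self._subscribers.values():
            inbox.append(msg)            # fan-out: one copy per subscriber

topic = Topic()
cache = topic.subscribe("cache-invalidator")
notify = topic.subscribe("notifier")
topic.publish("user:42 updated")
# Both subscribers received the same event independently.
```

Contrast this with the work queue above: there, each message reaches one consumer; here, each message reaches all of them.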

Popular Queue Systems

| System | Strengths | Best For |
|---|---|---|
| Kafka | High throughput, durable log, replay | Event streaming, audit logs, millions of msg/sec |
| RabbitMQ | Flexible routing, mature | General task queues, complex routing needs |
| Amazon SQS | Fully managed, simple | AWS-native apps, serverless |
| Amazon SNS | Pub/sub fan-out | Notifications, multi-subscriber events |
| Redis Streams | Simple, fast, in-memory | Low-latency queuing |

Kafka Deep Dive

Kafka is the industry standard for high-throughput event streaming.

Key concepts:

  • Topic: Named stream of records (like a category)
  • Partition: A topic is split into ordered, immutable partitions for parallelism
  • Offset: Each message has an offset (position) in its partition. Consumers track their offset.
  • Consumer Group: Multiple consumers sharing a topic's partitions. Each partition is consumed by one member of the group.
  • Retention: Messages are retained for a configurable period (days/weeks) and can be replayed

Why Kafka is fast:

  • Sequential disk writes (fast) instead of random writes
  • Zero-copy file transfers
  • Batch compression

Common Use Cases

  • Async email/SMS sending: Order service publishes event → Notification service sends email
  • Image/video processing: Upload service queues files → Processing service resizes/transcodes
  • Event sourcing: All state changes as events in Kafka, replay to reconstruct state
  • Log aggregation: Collect logs from all services into Kafka → push to Elasticsearch
  • Stream processing: Real-time analytics on live data

Interview Tips

  • Mention message queues when you need to decouple a write-heavy or spike-prone operation: "Instead of synchronously resizing images on upload, I'd publish to a queue and process asynchronously"
  • Kafka when scale is huge (millions of events/sec) or you need event replay. SQS/RabbitMQ for simpler task queues
  • Discuss idempotency: at-least-once delivery means duplicates can happen. Design consumers to handle this
  • Dead Letter Queues (DLQ): Where messages go after failing repeated processing. Essential for production systems
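The retry-then-dead-letter flow can be sketched as follows (a minimal in-process model; real brokers track delivery counts and route to a configured DLQ for you):

```python
MAX_ATTEMPTS = 3

def process_with_dlq(messages, handler):
    """Retry each message up to MAX_ATTEMPTS; park repeat failures in a DLQ."""
    dlq = []
    for msg in messages:
        for _attempt in range(MAX_ATTEMPTS):
            try:
                handler(msg)
                break                # success: move on to the next message
            except Exception:
                continue             # failure: retry
        else:
            dlq.append(msg)          # retries exhausted: dead-letter it
    return dlq

def flaky_handler(msg):
    if msg == "poison":              # a "poison pill" that always fails
        raise ValueError("cannot parse message")

dlq = process_with_dlq(["ok-1", "poison", "ok-2"], flaky_handler)
# Healthy messages processed normally; only the poison pill lands in the DLQ.
```

The key property: one unprocessable message no longer blocks the queue, and the DLQ preserves it for inspection and manual replay.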