UDP in System Design: Fast Connectionless Protocol
After this topic, you will be able to:
- Explain UDP’s connectionless model and why it achieves lower latency than TCP
- Evaluate when UDP’s lack of reliability guarantees is acceptable or even preferable
- Justify the choice of UDP for specific use cases like video streaming, gaming, and DNS
TL;DR
UDP is a connectionless transport protocol that trades reliability for speed, delivering datagrams without handshakes, acknowledgments, or ordering guarantees. It’s the protocol of choice when low latency matters more than perfect delivery—think video calls, online gaming, and DNS queries. While TCP ensures every byte arrives correctly, UDP gets data there fast and lets applications decide what to do about losses.
Cheat Sheet:
- No connection setup: Send immediately, no 3-way handshake
- No reliability: Packets may be lost, duplicated, or arrive out of order
- No congestion control: Sender doesn’t slow down for network conditions
- Minimal overhead: 8-byte header vs TCP’s 20+ bytes
- Use when: latency matters more than reliability (gaming, VoIP, live video, DNS, IoT telemetry)
Background
UDP emerged in 1980 as part of the original Internet Protocol suite, designed by David P. Reed. While TCP was built for reliable data transfer—perfect for file transfers and web pages—engineers needed something faster for applications where occasional data loss was acceptable. The problem TCP solved (guaranteed delivery) created a new problem: latency from handshakes, acknowledgments, and retransmissions.
Consider a video call: if a packet containing 20 milliseconds of audio is lost, retransmitting it 100ms later is useless—the conversation has moved on. Better to skip that packet and keep the stream flowing. This insight drove UDP’s design philosophy: provide addressing and basic error detection, but leave reliability decisions to the application layer. DNS queries exemplify this perfectly—a 512-byte question-answer exchange doesn’t need TCP’s overhead; if the response is lost, just retry the query.
UDP’s simplicity became its superpower. By stripping away TCP’s reliability machinery, UDP achieves lower latency, smaller packet overhead, and the ability to broadcast or multicast to multiple recipients simultaneously. Modern protocols like QUIC (used in HTTP/3) and WebRTC build custom reliability on top of UDP, getting the best of both worlds: UDP’s speed with application-specific reliability where needed.
Architecture
UDP operates at the transport layer (Layer 4) with a remarkably simple architecture. Unlike TCP’s stateful connections, UDP is completely stateless—the protocol maintains no information about previous datagrams or expected future ones.
The UDP datagram consists of two parts: an 8-byte header and the payload. The header contains just four fields: source port (16 bits), destination port (16 bits), length (16 bits), and checksum (16 bits). That’s it. No sequence numbers, no acknowledgment numbers, no window sizes—just enough information to route the datagram to the correct application and verify it wasn’t corrupted in transit.
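The four header fields can be packed and parsed with Python's `struct` module; a minimal sketch (the port numbers and payload length are illustrative):

```python
# Sketch: building and parsing the 8-byte UDP header described above.
# Per RFC 768, all four fields are 16 bits, in network (big-endian) byte order.
import struct

def build_udp_header(src_port: int, dst_port: int, payload_len: int,
                     checksum: int = 0) -> bytes:
    # The length field covers the 8-byte header plus the payload.
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, checksum)

def parse_udp_header(header: bytes) -> dict:
    src, dst, length, csum = struct.unpack("!HHHH", header[:8])
    return {"src_port": src, "dst_port": dst, "length": length, "checksum": csum}

hdr = build_udp_header(53000, 53, payload_len=32)   # e.g., a DNS query
fields = parse_udp_header(hdr)
# fields -> {'src_port': 53000, 'dst_port': 53, 'length': 40, 'checksum': 0}
```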
When an application sends data via UDP, the operating system wraps it in a UDP header, hands it to the IP layer for routing, and forgets about it. There’s no connection state to maintain, no send buffer for retransmissions, and no receive buffer for reordering. The receiving application gets datagrams in whatever order they arrive, and it’s the application’s job to handle duplicates, losses, or out-of-order delivery.
This stateless design enables UDP’s key architectural features: broadcasting (sending to all devices on a subnet, crucial for DHCP before a client has an IP address), multicasting (efficient one-to-many delivery for live video streams), and the ability to switch between network interfaces mid-stream without breaking anything—because there’s nothing to break.
UDP Datagram Structure and Flow
graph LR
subgraph Application Layer
App["Application<br/><i>Video Call App</i>"]
end
subgraph Transport Layer - UDP
Header["UDP Header<br/><i>8 bytes</i>"]
SrcPort["Source Port<br/><i>16 bits</i>"]
DstPort["Dest Port<br/><i>16 bits</i>"]
Length["Length<br/><i>16 bits</i>"]
Checksum["Checksum<br/><i>16 bits</i>"]
Payload["Payload<br/><i>Application Data</i>"]
end
subgraph Network Layer
IP["IP Layer<br/><i>Routing</i>"]
end
App --"1. sendto() call"--> Header
Header --> SrcPort
Header --> DstPort
Header --> Length
Header --> Checksum
Header --"2. Attach payload"--> Payload
Payload --"3. Pass to IP<br/>(no state stored)"--> IP
IP --"4. Send immediately<br/>(no handshake)"--> Network["Network<br/><i>Ethernet/WiFi</i>"]
UDP’s minimal 8-byte header contains only essential fields for addressing and error detection. Unlike TCP, no connection state is maintained—the datagram is sent immediately and forgotten.
Internals
UDP’s internal operation is deliberately minimal. When an application calls sendto(), the kernel performs just three operations: (1) calculate the checksum over a pseudo-header (source/destination IPs, protocol, and UDP length), the UDP header, and the payload; (2) check the datagram against the network MTU (Maximum Transmission Unit, typically 1500 bytes for Ethernet)—oversized datagrams get fragmented by the IP layer; and (3) pass it to the IP layer for routing. No state is stored.
The checksum calculation is optional in IPv4 (though almost always used) and mandatory in IPv6. It uses a simple 16-bit one’s complement sum, which catches most transmission errors but isn’t cryptographically secure. If the checksum fails on receipt, the datagram is silently dropped—no error notification to the sender.
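The 16-bit one’s complement sum can be sketched in a few lines of Python (simplified—the real calculation also covers the pseudo-header):

```python
def internet_checksum(data: bytes) -> int:
    # 16-bit one's complement sum over 16-bit words (RFC 1071 method).
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return (~total) & 0xFFFF

csum = internet_checksum(b"abcd")
# A receiver verifies by summing data + checksum: the result folds to 0.
assert internet_checksum(b"abcd" + csum.to_bytes(2, "big")) == 0
```

The verification trick is why the field works: the sum of the data plus its own checksum is all ones, which complements to zero.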
Fragmentation deserves special attention because it’s a common source of problems. If your application sends a 5000-byte UDP datagram, the IP layer fragments it into multiple IP packets (roughly four 1500-byte packets). If any fragment is lost, the entire datagram is lost—UDP has no mechanism to request retransmission of just the missing fragment. This is why applications using UDP typically keep datagrams under the MTU (around 1400 bytes to account for IP and UDP headers) to avoid fragmentation.
On the receive side, the kernel maintains a receive queue per socket. When a datagram arrives, it’s placed in the queue if there’s space; otherwise, it’s dropped. The application retrieves datagrams with recvfrom(), which returns the sender’s address—enabling request-response patterns like DNS without maintaining connection state. If the application doesn’t read fast enough, the queue fills and packets are lost. The kernel provides no backpressure signal to the sender; it’s the application’s responsibility to size buffers appropriately.
UDP sockets can be connected (calling connect() to set a default remote address) or unconnected. Connecting exchanges no packets—it merely stores the destination in the socket. Connected UDP sockets are slightly more efficient because the kernel doesn’t repeat the destination lookup on every send, and the kernel delivers only datagrams arriving from the connected peer, silently discarding all others.
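A connected UDP socket in Python—note that connect() here sends nothing on the wire (addresses are illustrative):

```python
import socket

# A peer socket to receive the datagram.
peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer.bind(("127.0.0.1", 0))
peer.settimeout(5)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# connect() on a UDP socket just records a default destination -- no
# handshake occurs. Afterwards, send()/recv() replace sendto()/recvfrom(),
# and the kernel drops datagrams from any other source address.
sock.connect(peer.getsockname())
sock.send(b"ping")

data, addr = peer.recvfrom(64)
# data == b"ping"
sock.close()
peer.close()
```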
UDP Fragmentation and MTU Impact
graph TB
subgraph Application Sends 5000-byte Datagram
App["Application<br/><i>sendto(5000 bytes)</i>"]
end
subgraph IP Layer Fragmentation
Frag1["Fragment 1<br/><i>1500 bytes</i>"]
Frag2["Fragment 2<br/><i>1500 bytes</i>"]
Frag3["Fragment 3<br/><i>1500 bytes</i>"]
Frag4["Fragment 4<br/><i>528 bytes</i>"]
end
subgraph Network Transmission
Net1["✓ Delivered"]
Net2["❌ Lost"]
Net3["✓ Delivered"]
Net4["✓ Delivered"]
end
subgraph Receiver
Drop["Entire Datagram<br/>DROPPED<br/><i>Cannot reassemble</i>"]
end
App --"Exceeds MTU<br/>(1500 bytes)"--> Frag1
App --> Frag2
App --> Frag3
App --> Frag4
Frag1 --> Net1
Frag2 --> Net2
Frag3 --> Net3
Frag4 --> Net4
Net1 & Net2 & Net3 & Net4 --"Missing fragment"--> Drop
Note["Best Practice:<br/>Keep datagrams ≤ 1400 bytes<br/>to avoid fragmentation"]
When a UDP datagram exceeds the network MTU, IP fragments it into multiple packets. If any fragment is lost, the entire datagram is discarded—UDP cannot request retransmission of individual fragments. Applications should keep datagrams under 1400 bytes to avoid this issue.
Performance Characteristics
UDP’s performance advantage over TCP is dramatic in latency-sensitive scenarios. A TCP connection requires a 3-way handshake—three packets, costing at least one full round trip before the client can send its first byte of application data; UDP sends immediately. For a client 50ms from the server (100ms round trip), TCP adds at least 100ms of latency before any application data flows. For a DNS query, UDP delivers the response in one round trip (100ms total); TCP would need 200ms minimum.
Throughput-wise, UDP’s 8-byte header versus TCP’s minimum 20-byte header (often 32+ with options) means roughly 0.8% less overhead on a 1500-byte packet—negligible for large transfers but meaningful for small, frequent messages. More importantly, UDP doesn’t implement congestion control. While this sounds dangerous, it’s actually essential for real-time applications. TCP’s congestion control can cut throughput by 50% when it detects packet loss, causing video streams to stutter. UDP maintains constant throughput, letting the application decide how to handle congestion (reduce video quality, drop frames, etc.).
Packet loss tolerance varies by application. Voice calls remain intelligible with 5% loss; video streaming uses forward error correction to handle 10-20% loss; online games can tolerate 2-3% loss by interpolating player positions. DNS queries simply retry after a timeout (typically 2-5 seconds).
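The DNS-style timeout-and-retry pattern can be sketched as a small helper; `query_with_retry` and the `flaky` stand-in below are hypothetical, standing in for a real sendto()/recvfrom() exchange with sock.settimeout():

```python
def query_with_retry(attempt_fn, max_attempts: int = 3) -> bytes:
    """DNS-style reliability: re-send the whole query when a response times out.

    attempt_fn performs one request-response attempt and raises TimeoutError
    on loss; with real sockets it would wrap sendto() plus a recvfrom() guarded
    by sock.settimeout() (socket.timeout is an alias of TimeoutError in 3.10+).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the last attempt

# Usage with a fake transport that loses the first response:
attempts = {"n": 0}
def flaky() -> bytes:
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise TimeoutError("response lost")
    return b"93.184.216.34"

answer = query_with_retry(flaky)
# answer == b"93.184.216.34" after one retry
```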
Scalability is where UDP truly shines. A single UDP socket can handle millions of clients without per-connection state. The Cloudflare DNS resolver (1.1.1.1) handles over 1 trillion DNS queries per day using UDP, with each server processing millions of queries per second. TCP would require maintaining millions of connection states, consuming gigabytes of memory.
The performance ceiling for UDP is typically limited by the application’s processing speed, not the protocol. A well-tuned UDP server on modern hardware can process 10+ million packets per second per core using kernel bypass techniques (DPDK, XDP).
TCP vs UDP Latency Comparison for Real-Time Communication
sequenceDiagram
participant Client
participant Server
Note over Client,Server: TCP Connection (150ms total before data)
Client->>Server: 1. SYN (50ms)
Server->>Client: 2. SYN-ACK (50ms)
Client->>Server: 3. ACK (50ms)
Client->>Server: 4. Application Data
Server->>Client: 5. Response Data
Note over Client,Server: Total: 250ms for first response
Note over Client,Server: <br/>UDP Communication (100ms total)
Client->>Server: 1. UDP Datagram (50ms)
Server->>Client: 2. UDP Response (50ms)
Note over Client,Server: Total: 100ms for response
Note over Client,Server: <br/>UDP with Packet Loss
Client->>Server: 1. UDP Datagram (50ms)
Server-xClient: 2. Response Lost ❌
Note over Client: Timeout (2-5 sec)
Client->>Server: 3. Retry (50ms)
Server->>Client: 4. Response (50ms)
Note over Client,Server: Total: 2.1-5.1 seconds with retry
UDP eliminates TCP’s 3-way handshake, reducing latency by 60% for single request-response exchanges like DNS queries. However, applications must implement their own retry logic for lost packets.
Trade-offs
UDP’s primary trade-off is reliability for speed. You get lower latency and higher throughput, but you accept that packets may be lost, duplicated, or arrive out of order. This is perfect for real-time applications where old data is worthless, but problematic for applications requiring perfect delivery.
The lack of congestion control is both a strength and a weakness. UDP won’t slow down when the network is congested, which is great for maintaining consistent latency in video calls. But it also means poorly written UDP applications can flood the network, causing problems for everyone. This is why protocols like QUIC and WebRTC implement their own congestion control on top of UDP—they get UDP’s low latency while being good network citizens.
UDP’s simplicity means you must implement reliability yourself if you need it. For DNS, this is trivial: set a timeout and retry. For video streaming, you might use forward error correction (sending redundant data so losses can be recovered without retransmission). For gaming, you might send position updates every 50ms and let the client interpolate missing updates. Each application needs a custom solution.
Security is another consideration. UDP is vulnerable to amplification attacks: an attacker sends small requests with a spoofed source IP, and the server sends large responses to the victim. DNS and NTP have both been exploited this way. TCP’s handshake makes this harder (though not impossible). Modern UDP services implement rate limiting and response size restrictions to mitigate this.
The lack of flow control means a fast sender can overwhelm a slow receiver. TCP automatically slows down; UDP just drops packets at the receiver’s queue. Applications must implement their own pacing mechanisms or risk self-inflicted packet loss.
Building Reliability on Top of UDP
sequenceDiagram
participant Client
participant Server
Note over Client,Server: Custom Reliable UDP Protocol
Client->>Server: Seq=1: Data Packet 1
Client->>Server: Seq=2: Data Packet 2
Client->>Server: Seq=3: Data Packet 3
Note over Server: Packet 2 lost ❌
Server->>Client: ACK: [1, 3] (selective)
Note over Client: Detect missing Seq=2
Client->>Server: Seq=2: Retransmit Packet 2
Client->>Server: Seq=4: Data Packet 4
Server->>Client: ACK: [2, 4] (selective)
Note over Client,Server: <br/>Alternative: Forward Error Correction
Client->>Server: Data Packets 1-10
Client->>Server: FEC Packets A-C (redundant)
Note over Server: Can recover 1-2 lost packets<br/>from FEC without retransmission
Applications can implement custom reliability on UDP using selective acknowledgments (like QUIC) or forward error correction (like video streaming). This provides UDP’s low latency while adding reliability only where needed, avoiding TCP’s head-of-line blocking.
When to Use (and When Not To)
Choose UDP when latency matters more than reliability and you can tolerate some data loss. Specific scenarios:
Real-time communications: VoIP, video conferencing, live streaming. Zoom, Google Meet, and Twitch all use UDP (via WebRTC or custom protocols) because a delayed packet is worse than a lost packet. If you’re building anything where humans are waiting for real-time feedback, UDP is likely the right choice.
Online gaming: Multiplayer games send frequent state updates (player positions, actions) where the latest update supersedes previous ones. Fortnite and Call of Duty use UDP because retransmitting a 100ms-old position update is pointless—the player has moved. TCP’s head-of-line blocking (waiting for a lost packet before delivering subsequent packets) would make games feel laggy.
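The interpolation trick can be sketched as a small helper (the function name and coordinates are illustrative):

```python
def interpolate_position(p0: tuple, t0: float, p1: tuple, t1: float, t: float) -> tuple:
    """Linearly interpolate (or extrapolate) a player position for render time t,
    given the last two received position updates (p0 at t0, p1 at t1)."""
    if t1 == t0:
        return p1
    alpha = (t - t0) / (t1 - t0)
    return tuple(a + (b - a) * alpha for a, b in zip(p0, p1))

# Updates arrived at t=0.00s (x=0) and t=0.05s (x=5); the t=0.10s update was lost.
pos = interpolate_position((0.0, 0.0), 0.00, (5.0, 0.0), 0.05, 0.10)
# pos == (10.0, 0.0) -- extrapolated along the last known velocity, no retransmit needed
```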
DNS and service discovery: Short request-response exchanges where retrying is cheaper than connection setup. DNS queries are typically under 512 bytes and complete in one round trip. TCP would triple the latency.
IoT and telemetry: Sensors sending frequent measurements where occasional loss is acceptable. A temperature sensor sending readings every 10 seconds doesn’t need TCP’s reliability—if one reading is lost, the next one arrives soon. UDP’s lower overhead also saves battery life.
Broadcasting and multicasting: DHCP, IPTV, and network discovery protocols need to reach multiple recipients simultaneously. TCP’s one-to-one model doesn’t support this.
Avoid UDP when: You need guaranteed delivery (file transfers, financial transactions, API calls), ordered delivery matters (chat messages, database replication), or you’re building a general-purpose protocol and don’t want to reinvent reliability (use TCP or QUIC instead).
For comparison: TCP for web APIs and database connections; QUIC (UDP-based) for HTTP/3; SCTP for telecom signaling. See TCP for reliable transport alternatives.
Real-World Examples
Discord (Voice and Video): Discord handles millions of concurrent voice channels using UDP for media streams. They initially used WebRTC but switched to a custom UDP protocol for better control over codec selection and bandwidth adaptation. When packet loss exceeds 5%, they automatically reduce audio quality rather than letting TCP’s retransmissions cause stuttering. Their edge servers use UDP’s stateless nature to handle 10,000+ concurrent voice connections per server without per-connection overhead.
Cloudflare DNS (1.1.1.1): Cloudflare’s DNS resolver processes over 1 trillion queries per day using UDP. Each query is a single datagram (typically 30-100 bytes), and responses are usually under 512 bytes to avoid fragmentation. They implement aggressive rate limiting to prevent amplification attacks: if a client sends more than 100 queries per second, responses are rate-limited. UDP’s stateless design lets them scale horizontally—any server can answer any query without coordination.
Netflix’s Streaming Telemetry: While Netflix uses TCP for video delivery (via HTTP), they use UDP for real-time telemetry from client devices. Millions of devices send playback metrics (buffer health, bitrate changes, errors) via UDP every few seconds. Occasional packet loss is acceptable because they’re looking for aggregate trends, not perfect data from every device. This reduces server load by 40% compared to TCP-based telemetry and provides faster insights into streaming quality issues.
UDP Scalability: Cloudflare DNS Architecture
graph TB
subgraph Internet
Client1["Client 1<br/><i>DNS Query</i>"]
Client2["Client 2<br/><i>DNS Query</i>"]
Client3["Client 3<br/><i>DNS Query</i>"]
ClientN["... millions more"]
end
subgraph Cloudflare Edge - Single Server
Anycast["Anycast IP: 1.1.1.1<br/><i>UDP Port 53</i>"]
subgraph Stateless Processing
Queue["Receive Queue<br/><i>~1M packets/sec</i>"]
Worker1["Worker Thread 1"]
Worker2["Worker Thread 2"]
Worker3["Worker Thread 3"]
WorkerN["Worker Thread N"]
end
Cache["DNS Cache<br/><i>In-Memory</i>"]
end
Client1 & Client2 & Client3 & ClientN --"UDP datagrams<br/>(30-100 bytes)"--> Anycast
Anycast --> Queue
Queue --> Worker1 & Worker2 & Worker3 & WorkerN
Worker1 & Worker2 & Worker3 & WorkerN <--"Lookup"--> Cache
Worker1 & Worker2 & Worker3 & WorkerN --"UDP responses<br/>(no connection state)"--> Client1 & Client2 & Client3 & ClientN
Note["Key: No per-client state<br/>10,000+ concurrent clients<br/>per server with minimal memory"]
Cloudflare’s DNS resolver handles over 1 trillion queries per day using UDP’s stateless design. Each server processes millions of queries per second without maintaining per-client connection state, enabling massive horizontal scalability that would be impossible with TCP’s connection overhead.
Interview Essentials
Mid-Level
Explain UDP’s connectionless model and why it’s faster than TCP. Describe the UDP header structure (source port, destination port, length, checksum) and what each field does. Discuss when you’d choose UDP over TCP with concrete examples (DNS, video streaming). Explain what happens when a UDP packet is lost—nothing, the application must detect and handle it. Understand that UDP doesn’t guarantee ordering, so packets can arrive out of sequence.
Senior
Analyze UDP’s trade-offs in depth: why no congestion control is both a feature and a risk. Explain how to build reliability on top of UDP (selective acknowledgments, forward error correction, sequence numbers). Discuss UDP’s role in modern protocols like QUIC and WebRTC—why they chose UDP as a foundation. Calculate the overhead difference between TCP and UDP for small packets (e.g., 100-byte payload: UDP = 8% overhead, TCP = 20%+). Explain UDP amplification attacks and mitigation strategies (rate limiting, response size limits, source validation).
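The overhead figures quoted above, computed as header bytes relative to the payload:

```python
def overhead_vs_payload(payload_bytes: int, header_bytes: int) -> float:
    """Transport-layer header size as a fraction of the application payload."""
    return header_bytes / payload_bytes

udp = overhead_vs_payload(100, 8)    # 0.08 -> 8%
tcp = overhead_vs_payload(100, 20)   # 0.20 -> 20% (more with TCP options)
```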
Staff+
Design a custom reliable protocol on top of UDP, explaining your choices for acknowledgment strategy, congestion control, and flow control. Discuss UDP’s performance at scale: kernel bypass techniques (DPDK, XDP), receive queue sizing, and CPU affinity for packet processing. Explain why QUIC chose UDP over creating a new transport protocol (middlebox compatibility, faster iteration). Analyze the latency breakdown for UDP vs TCP in a real-time application (e.g., gaming: UDP = 50ms one-way, TCP = 125ms due to handshake + head-of-line blocking). Discuss UDP’s limitations for mobile networks (NAT traversal, cellular network buffering) and how protocols like WebRTC handle them.
Common Interview Questions
Why doesn’t UDP have a handshake like TCP? (No connection state to establish; send immediately)
How do you handle packet loss with UDP? (Application-layer retransmission, FEC, or accept loss)
Can UDP be used for reliable data transfer? (Yes, with custom reliability layer—see QUIC)
Why is UDP faster than TCP? (No handshake, no acknowledgments, no head-of-line blocking)
What’s the maximum size of a UDP datagram? (Theoretically 65,507 bytes, practically ~1400 to avoid fragmentation)
How does DNS use UDP? (Single request-response datagram, retry on timeout)
What’s UDP hole punching? (NAT traversal technique for peer-to-peer connections)
Red Flags to Avoid
Claiming UDP is ‘unreliable’ without explaining it’s a design choice, not a flaw
Not understanding that UDP provides no ordering guarantees
Thinking UDP is only for ‘unimportant’ data (it’s for time-sensitive data)
Not knowing about UDP amplification attacks and mitigation
Confusing UDP’s lack of congestion control with ‘unlimited speed’
Not understanding why fragmentation is problematic for UDP
Claiming UDP is always faster than TCP without context (bulk transfers favor TCP)
Key Takeaways
UDP trades reliability for speed: No handshakes, acknowledgments, or retransmissions means lower latency but no delivery guarantees. Perfect for real-time applications where old data is worthless.
Stateless and minimal overhead: 8-byte header and no connection state enable massive scalability (millions of clients per server) and features like broadcasting and multicasting.
Application-layer responsibility: UDP provides addressing and basic error detection; applications must implement their own reliability, ordering, and congestion control if needed.
Fragmentation is the enemy: Keep datagrams under ~1400 bytes to avoid IP fragmentation, which causes entire datagram loss if any fragment is lost.
Modern protocols build on UDP: QUIC (HTTP/3), WebRTC, and custom gaming protocols use UDP as a foundation, adding reliability where needed while keeping UDP’s low latency.