UDP in System Design: Fast Connectionless Protocol
After this topic, you will be able to:
- Explain UDP’s connectionless model and why it achieves lower latency than TCP
- Evaluate when UDP’s lack of reliability guarantees is acceptable or even preferable
- Justify the choice of UDP for specific use cases like video streaming, gaming, and DNS
TL;DR
UDP is a connectionless transport protocol that trades reliability for speed, delivering datagrams without handshakes, acknowledgments, or ordering guarantees. It’s the protocol of choice when low latency matters more than perfect delivery—think video calls, online gaming, and DNS queries. While TCP ensures every byte arrives correctly, UDP gets data there fast and lets applications decide what to do about losses.
Cheat Sheet:
- No connection setup: Send immediately, no 3-way handshake
- No reliability: Packets may be lost, duplicated, or arrive out of order
- No congestion control: Sender doesn’t slow down for network conditions
- Minimal overhead: 8-byte header vs TCP’s 20+ bytes
- Use when: latency matters more than reliability (gaming, VoIP, live video, DNS, IoT telemetry)
Background
UDP emerged in 1980 as part of the original Internet Protocol suite, designed by David P. Reed. While TCP was built for reliable data transfer—perfect for file transfers and web pages—engineers needed something faster for applications where occasional data loss was acceptable. The problem TCP solved (guaranteed delivery) created a new problem: latency from handshakes, acknowledgments, and retransmissions.
Consider a video call: if a packet containing 20 milliseconds of audio is lost, retransmitting it 100ms later is useless—the conversation has moved on. Better to skip that packet and keep the stream flowing. This insight drove UDP’s design philosophy: provide addressing and basic error detection, but leave reliability decisions to the application layer. DNS queries exemplify this perfectly—a 512-byte question-answer exchange doesn’t need TCP’s overhead; if the response is lost, just retry the query.
UDP’s simplicity became its superpower. By stripping away TCP’s reliability machinery, UDP achieves lower latency, smaller packet overhead, and the ability to broadcast or multicast to multiple recipients simultaneously. Modern protocols like QUIC (used in HTTP/3) and WebRTC build custom reliability on top of UDP, getting the best of both worlds: UDP’s speed with application-specific reliability where needed.
Architecture
UDP operates at the transport layer (Layer 4) with a remarkably simple architecture. Unlike TCP’s stateful connections, UDP is completely stateless—the protocol maintains no information about previous datagrams or expected future ones.
The UDP datagram consists of two parts: an 8-byte header and the payload. The header contains just four fields: source port (16 bits), destination port (16 bits), length (16 bits), and checksum (16 bits). That’s it. No sequence numbers, no acknowledgment numbers, no window sizes—just enough information to route the datagram to the correct application and verify it wasn’t corrupted in transit.
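The four header fields can be packed and parsed with Python's `struct` module; a minimal sketch (the port numbers and payload length are illustrative):

```python
# Sketch: building and parsing the 8-byte UDP header described above.
# Per RFC 768, all four fields are 16 bits, in network (big-endian) byte order.
import struct

def build_udp_header(src_port: int, dst_port: int, payload_len: int,
                     checksum: int = 0) -> bytes:
    # The length field covers the 8-byte header plus the payload.
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, checksum)

def parse_udp_header(header: bytes) -> dict:
    src, dst, length, csum = struct.unpack("!HHHH", header[:8])
    return {"src_port": src, "dst_port": dst, "length": length, "checksum": csum}

hdr = build_udp_header(53000, 53, payload_len=32)   # e.g., a DNS query
fields = parse_udp_header(hdr)
# fields -> {'src_port': 53000, 'dst_port': 53, 'length': 40, 'checksum': 0}
```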
When an application sends data via UDP, the operating system wraps it in a UDP header, hands it to the IP layer for routing, and forgets about it. There’s no connection state to maintain, no send buffer for retransmissions, and no receive buffer for reordering. The receiving application gets datagrams in whatever order they arrive, and it’s the application’s job to handle duplicates, losses, or out-of-order delivery.
This stateless design enables UDP’s key architectural features: broadcasting (sending to all devices on a subnet, crucial for DHCP before a client has an IP address), multicasting (efficient one-to-many delivery for live video streams), and the ability to switch between network interfaces mid-stream without breaking anything—because there’s nothing to break.
UDP Datagram Structure and Flow
graph LR
subgraph Application Layer
App["Application<br/><i>Video Call App</i>"]
end
subgraph Transport Layer - UDP
Header["UDP Header<br/><i>8 bytes</i>"]
SrcPort["Source Port<br/><i>16 bits</i>"]
DstPort["Dest Port<br/><i>16 bits</i>"]
Length["Length<br/><i>16 bits</i>"]
Checksum["Checksum<br/><i>16 bits</i>"]
Payload["Payload<br/><i>Application Data</i>"]
end
subgraph Network Layer
IP["IP Layer<br/><i>Routing</i>"]
end
App --"1. sendto() call"--> Header
Header --> SrcPort
Header --> DstPort
Header --> Length
Header --> Checksum
Header --"2. Attach payload"--> Payload
Payload --"3. Pass to IP<br/>(no state stored)"--> IP
IP --"4. Send immediately<br/>(no handshake)"--> Network["Network<br/><i>Ethernet/WiFi</i>"]
UDP’s minimal 8-byte header contains only essential fields for addressing and error detection. Unlike TCP, no connection state is maintained—the datagram is sent immediately and forgotten.
Internals
UDP’s internal operation is deliberately minimal. When an application calls sendto(), the kernel performs just three operations: (1) calculate the checksum over a pseudo-header (source/destination IPs, protocol, and UDP length), the UDP header, and the payload; (2) check the datagram against the network MTU (Maximum Transmission Unit, typically 1500 bytes for Ethernet)—oversized datagrams get fragmented by the IP layer; and (3) pass it to the IP layer for routing. No state is stored.
The checksum calculation is optional in IPv4 (though almost always used) and mandatory in IPv6. It uses a simple 16-bit one’s complement sum, which catches most transmission errors but isn’t cryptographically secure. If the checksum fails on receipt, the datagram is silently dropped—no error notification to the sender.
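The 16-bit one’s complement sum can be sketched in a few lines of Python (simplified—the real calculation also covers the pseudo-header):

```python
def internet_checksum(data: bytes) -> int:
    # 16-bit one's complement sum over 16-bit words (RFC 1071 method).
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return (~total) & 0xFFFF

csum = internet_checksum(b"abcd")
# A receiver verifies by summing data + checksum: the result folds to 0.
assert internet_checksum(b"abcd" + csum.to_bytes(2, "big")) == 0
```

The verification trick is why the field works: the sum of the data plus its own checksum is all ones, which complements to zero.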
Fragmentation deserves special attention because it’s a common source of problems. If your application sends a 5000-byte UDP datagram, the IP layer fragments it into multiple IP packets (roughly four 1500-byte packets). If any fragment is lost, the entire datagram is lost—UDP has no mechanism to request retransmission of just the missing fragment. This is why applications using UDP typically keep datagrams under the MTU (around 1400 bytes to account for IP and UDP headers) to avoid fragmentation.
On the receive side, the kernel maintains a receive queue per socket. When a datagram arrives, it’s placed in the queue if there’s space; otherwise, it’s dropped. The application retrieves datagrams with recvfrom(), which returns the sender’s address—enabling request-response patterns like DNS without maintaining connection state. If the application doesn’t read fast enough, the queue fills and packets are lost. The kernel provides no backpressure signal to the sender; it’s the application’s responsibility to size buffers appropriately.
UDP sockets can be connected (calling connect() to set a default remote address) or unconnected. Connecting exchanges no packets—it merely stores the destination in the socket. Connected UDP sockets are slightly more efficient because the kernel doesn’t repeat the destination lookup on every send, and the kernel delivers only datagrams arriving from the connected peer, silently discarding all others.
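A connected UDP socket in Python—note that connect() here sends nothing on the wire (addresses are illustrative):

```python
import socket

# A peer socket to receive the datagram.
peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer.bind(("127.0.0.1", 0))
peer.settimeout(5)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# connect() on a UDP socket just records a default destination -- no
# handshake occurs. Afterwards, send()/recv() replace sendto()/recvfrom(),
# and the kernel drops datagrams from any other source address.
sock.connect(peer.getsockname())
sock.send(b"ping")

data, addr = peer.recvfrom(64)
# data == b"ping"
sock.close()
peer.close()
```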
UDP Fragmentation and MTU Impact
graph TB
subgraph Application Sends 5000-byte Datagram
App["Application<br/><i>sendto(5000 bytes)</i>"]
end
subgraph IP Layer Fragmentation
Frag1["Fragment 1<br/><i>1500 bytes</i>"]
Frag2["Fragment 2<br/><i>1500 bytes</i>"]
Frag3["Fragment 3<br/><i>1500 bytes</i>"]
Frag4["Fragment 4<br/><i>528 bytes</i>"]
end
subgraph Network Transmission
Net1["✓ Delivered"]
Net2["❌ Lost"]
Net3["✓ Delivered"]
Net4["✓ Delivered"]
end
subgraph Receiver
Drop["Entire Datagram<br/>DROPPED<br/><i>Cannot reassemble</i>"]
end
App --"Exceeds MTU<br/>(1500 bytes)"--> Frag1
App --> Frag2
App --> Frag3
App --> Frag4
Frag1 --> Net1
Frag2 --> Net2
Frag3 --> Net3
Frag4 --> Net4
Net1 & Net2 & Net3 & Net4 --"Missing fragment"--> Drop
Note["Best Practice:<br/>Keep datagrams ≤ 1400 bytes<br/>to avoid fragmentation"]
When a UDP datagram exceeds the network MTU, IP fragments it into multiple packets. If any fragment is lost, the entire datagram is discarded—UDP cannot request retransmission of individual fragments. Applications should keep datagrams under 1400 bytes to avoid this issue.
Performance Characteristics
UDP’s performance advantage over TCP is dramatic in latency-sensitive scenarios. A TCP connection requires a 3-way handshake—three packets, costing at least one full round trip before the client can send its first byte of application data; UDP sends immediately. For a client 50ms from the server (100ms round trip), TCP adds at least 100ms of latency before any application data flows. For a DNS query, UDP delivers the response in one round trip (100ms total); TCP would need 200ms minimum.
Throughput-wise, UDP’s 8-byte header versus TCP’s minimum 20-byte header (often 32+ with options) means roughly 0.8% less overhead on a 1500-byte packet—negligible for large transfers but meaningful for small, frequent messages. More importantly, UDP doesn’t implement congestion control. While this sounds dangerous, it’s actually essential for real-time applications. TCP’s congestion control can cut throughput by 50% when it detects packet loss, causing video streams to stutter. UDP maintains constant throughput, letting the application decide how to handle congestion (reduce video quality, drop frames, etc.).
Packet loss tolerance varies by application. Voice calls remain intelligible with 5% loss; video streaming uses forward error correction to handle 10-20% loss; online games can tolerate 2-3% loss by interpolating player positions. DNS queries simply retry after a timeout (typically 2-5 seconds).
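The DNS-style timeout-and-retry pattern can be sketched as a small helper; `query_with_retry` and the `flaky` stand-in below are hypothetical, standing in for a real sendto()/recvfrom() exchange with sock.settimeout():

```python
def query_with_retry(attempt_fn, max_attempts: int = 3) -> bytes:
    """DNS-style reliability: re-send the whole query when a response times out.

    attempt_fn performs one request-response attempt and raises TimeoutError
    on loss; with real sockets it would wrap sendto() plus a recvfrom() guarded
    by sock.settimeout() (socket.timeout is an alias of TimeoutError in 3.10+).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_fn()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up after the last attempt

# Usage with a fake transport that loses the first response:
attempts = {"n": 0}
def flaky() -> bytes:
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise TimeoutError("response lost")
    return b"93.184.216.34"

answer = query_with_retry(flaky)
# answer == b"93.184.216.34" after one retry
```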
Scalability is where UDP truly shines. A single UDP socket can handle millions of clients without per-connection state. The Cloudflare DNS resolver (1.1.1.1) handles over 1 trillion DNS queries per day using UDP, with each server processing millions of queries per second. TCP would require maintaining millions of connection states, consuming gigabytes of memory.
The performance ceiling for UDP is typically limited by the application’s processing speed, not the protocol. A well-tuned UDP server on modern hardware can process 10+ million packets per second per core using kernel bypass techniques (DPDK, XDP).
TCP vs UDP Latency Comparison for Real-Time Communication
sequenceDiagram
participant Client
participant Server
Note over Client,Server: TCP Connection (150ms total before data)
Client->>Server: 1. SYN (50ms)
Server->>Client: 2. SYN-ACK (50ms)
Client->>Server: 3. ACK (50ms)
Client->>Server: 4. Application Data
Server->>Client: 5. Response Data
Note over Client,Server: Total: 250ms for first response
Note over Client,Server: <br/>UDP Communication (100ms total)
Client->>Server: 1. UDP Datagram (50ms)
Server->>Client: 2. UDP Response (50ms)
Note over Client,Server: Total: 100ms for response
Note over Client,Server: <br/>UDP with Packet Loss
Client->>Server: 1. UDP Datagram (50ms)
Server-xClient: 2. Response Lost ❌
Note over Client: Timeout (2-5 sec)
Client->>Server: 3. Retry (50ms)
Server->>Client: 4. Response (50ms)
Note over Client,Server: Total: 2.1-5.1 seconds with retry
UDP eliminates TCP’s 3-way handshake, reducing latency by 60% for single request-response exchanges like DNS queries. However, applications must implement their own retry logic for lost packets.
Trade-offs
UDP’s primary trade-off is reliability for speed. You get lower latency and higher throughput, but you accept that packets may be lost, duplicated, or arrive out of order. This is perfect for real-time applications where old data is worthless, but problematic for applications requiring perfect delivery.
The lack of congestion control is both a strength and a weakness. UDP won’t slow down when the network is congested, which is great for maintaining consistent latency in video calls. But it also means poorly written UDP applications can flood the network, causing problems for everyone. This is why protocols like QUIC and WebRTC implement their own congestion control on top of UDP—they get UDP’s low latency while being good network citizens.
UDP’s simplicity means you must implement reliability yourself if you need it. For DNS, this is trivial: set a timeout and retry. For video streaming, you might use forward error correction (sending redundant data so losses can be recovered without retransmission). For gaming, you might send position updates every 50ms and let the client interpolate missing updates. Each application needs a custom solution.
Security is another consideration. UDP is vulnerable to amplification attacks: an attacker sends small requests with a spoofed source IP, and the server sends large responses to the victim. DNS and NTP have both been exploited this way. TCP’s handshake makes this harder (though not impossible). Modern UDP services implement rate limiting and response size restrictions to mitigate this.
The lack of flow control means a fast sender can overwhelm a slow receiver. TCP automatically slows down; UDP just drops packets at the receiver’s queue. Applications must implement their own pacing mechanisms or risk self-inflicted packet loss.
Building Reliability on Top of UDP
sequenceDiagram
participant Client
participant Server
Note over Client,Server: Custom Reliable UDP Protocol
Client->>Server: Seq=1: Data Packet 1
Client->>Server: Seq=2: Data Packet 2
Client->>Server: Seq=3: Data Packet 3
Note over Server: Packet 2 lost ❌
Server->>Client: ACK: [1, 3] (selective)
Note over Client: Detect missing Seq=2
Client->>Server: Seq=2: Retransmit Packet 2
Client->>Server: Seq=4: Data Packet 4
Server->>Client: ACK: [2, 4] (selective)
Note over Client,Server: <br/>Alternative: Forward Error Correction
Client->>Server: Data Packets 1-10
Client->>Server: FEC Packets A-C (redundant)
Note over Server: Can recover 1-2 lost packets<br/>from FEC without retransmission
Applications can implement custom reliability on UDP using selective acknowledgments (like QUIC) or forward error correction (like video streaming). This provides UDP’s low latency while adding reliability only where needed, avoiding TCP’s head-of-line blocking.
When to Use (and When Not To)
Choose UDP when latency matters more than reliability and you can tolerate some data loss. Specific scenarios:
Real-time communications: VoIP, video conferencing, live streaming. Zoom, Google Meet, and Twitch all use UDP (via WebRTC or custom protocols) because a delayed packet is worse than a lost packet. If you’re building anything where humans are waiting for real-time feedback, UDP is likely the right choice.
Online gaming: Multiplayer games send frequent state updates (player positions, actions) where the latest update supersedes previous ones. Fortnite and Call of Duty use UDP because retransmitting a 100ms-old position update is pointless—the player has moved. TCP’s head-of-line blocking (waiting for a lost packet before delivering subsequent packets) would make games feel laggy.
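The interpolation trick can be sketched as a small helper (the function name and coordinates are illustrative):

```python
def interpolate_position(p0: tuple, t0: float, p1: tuple, t1: float, t: float) -> tuple:
    """Linearly interpolate (or extrapolate) a player position for render time t,
    given the last two received position updates (p0 at t0, p1 at t1)."""
    if t1 == t0:
        return p1
    alpha = (t - t0) / (t1 - t0)
    return tuple(a + (b - a) * alpha for a, b in zip(p0, p1))

# Updates arrived at t=0.00s (x=0) and t=0.05s (x=5); the t=0.10s update was lost.
pos = interpolate_position((0.0, 0.0), 0.00, (5.0, 0.0), 0.05, 0.10)
# pos == (10.0, 0.0) -- extrapolated along the last known velocity, no retransmit needed
```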
DNS and service discovery: Short request-response exchanges where retrying is cheaper than connection setup. DNS queries are typically under 512 bytes and complete in one round trip. TCP would triple the latency.
IoT and telemetry: Sensors sending frequent measurements where occasional loss is acceptable. A temperature sensor sending readings every 10 seconds doesn’t need TCP’s reliability—if one reading is lost, the next one arrives soon. UDP’s lower overhead also saves battery life.
Broadcasting and multicasting: DHCP, IPTV, and network discovery protocols need to reach multiple recipients simultaneously. TCP’s one-to-one model doesn’t support this.
Avoid UDP when: You need guaranteed delivery (file transfers, financial transactions, API calls), ordered delivery matters (chat messages, database replication), or you’re building a general-purpose protocol and don’t want to reinvent reliability (use TCP or QUIC instead).
For comparison: TCP for web APIs and database connections; QUIC (UDP-based) for HTTP/3; SCTP for telecom signaling. See TCP for reliable transport alternatives.
Real-World Examples
Discord (Voice and Video): Discord handles millions of concurrent voice channels using UDP for media streams. They initially used WebRTC but switched to a custom UDP protocol for better control over codec selection and bandwidth adaptation. When packet loss exceeds 5%, they automatically reduce audio quality rather than letting TCP’s retransmissions cause stuttering. Their edge servers use UDP’s stateless nature to handle 10,000+ concurrent voice connections per server without per-connection overhead.
Cloudflare DNS (1.1.1.1): Cloudflare’s DNS resolver processes over 1 trillion queries per day using UDP. Each query is a single datagram (typically 30-100 bytes), and responses are usually under 512 bytes to avoid fragmentation. They implement aggressive rate limiting to prevent amplification attacks: if a client sends more than 100 queries per second, responses are rate-limited. UDP’s stateless design lets them scale horizontally—any server can answer any query without coordination.
Netflix’s Streaming Telemetry: While Netflix uses TCP for video delivery (via HTTP), they use UDP for real-time telemetry from client devices. Millions of devices send playback metrics (buffer health, bitrate changes, errors) via UDP every few seconds. Occasional packet loss is acceptable because they’re looking for aggregate trends, not perfect data from every device. This reduces server load by 40% compared to TCP-based telemetry and provides faster insights into streaming quality issues.
UDP Scalability: Cloudflare DNS Architecture
graph TB
subgraph Internet
Client1["Client 1<br/><i>DNS Query</i>"]
Client2["Client 2<br/><i>DNS Query</i>"]
Client3["Client 3<br/><i>DNS Query</i>"]
ClientN["... millions more"]
end
subgraph Cloudflare Edge - Single Server
Anycast["Anycast IP: 1.1.1.1<br/><i>UDP Port 53</i>"]
subgraph Stateless Processing
Queue["Receive Queue<br/><i>~1M packets/sec</i>"]
Worker1["Worker Thread 1"]
Worker2["Worker Thread 2"]
Worker3["Worker Thread 3"]
WorkerN["Worker Thread N"]
end
Cache["DNS Cache<br/><i>In-Memory</i>"]
end
Client1 & Client2 & Client3 & ClientN --"UDP datagrams<br/>(30-100 bytes)"--> Anycast
Anycast --> Queue
Queue --> Worker1 & Worker2 & Worker3 & WorkerN
Worker1 & Worker2 & Worker3 & WorkerN <--"Lookup"--> Cache
Worker1 & Worker2 & Worker3 & WorkerN --"UDP responses<br/>(no connection state)"--> Client1 & Client2 & Client3 & ClientN
Note["Key: No per-client state<br/>10,000+ concurrent clients<br/>per server with minimal memory"]
Cloudflare’s DNS resolver handles over 1 trillion queries per day using UDP’s stateless design. Each server processes millions of queries per second without maintaining per-client connection state, enabling massive horizontal scalability that would be impossible with TCP’s connection overhead.
Interview Essentials
Mid-Level
Explain UDP’s connectionless model and why it’s faster than TCP. Describe the UDP header structure (source port, destination port, length, checksum) and what each field does. Discuss when you’d choose UDP over TCP with concrete examples (DNS, video streaming). Explain what happens when a UDP packet is lost—nothing, the application must detect and handle it. Understand that UDP doesn’t guarantee ordering, so packets can arrive out of sequence.
Senior
Analyze UDP’s trade-offs in depth: why no congestion control is both a feature and a risk. Explain how to build reliability on top of UDP (selective acknowledgments, forward error correction, sequence numbers). Discuss UDP’s role in modern protocols like QUIC and WebRTC—why they chose UDP as a foundation. Calculate the overhead difference between TCP and UDP for small packets (e.g., 100-byte payload: UDP = 8% overhead, TCP = 20%+). Explain UDP amplification attacks and mitigation strategies (rate limiting, response size limits, source validation).
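The overhead figures quoted above, computed as header bytes relative to the payload:

```python
def overhead_vs_payload(payload_bytes: int, header_bytes: int) -> float:
    """Transport-layer header size as a fraction of the application payload."""
    return header_bytes / payload_bytes

udp = overhead_vs_payload(100, 8)    # 0.08 -> 8%
tcp = overhead_vs_payload(100, 20)   # 0.20 -> 20% (more with TCP options)
```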
Staff+
Design a custom reliable protocol on top of UDP, explaining your choices for acknowledgment strategy, congestion control, and flow control. Discuss UDP’s performance at scale: kernel bypass techniques (DPDK, XDP), receive queue sizing, and CPU affinity for packet processing. Explain why QUIC chose UDP over creating a new transport protocol (middlebox compatibility, faster iteration). Analyze the latency breakdown for UDP vs TCP in a real-time application (e.g., gaming: UDP = 50ms one-way, TCP = 125ms due to handshake + head-of-line blocking). Discuss UDP’s limitations for mobile networks (NAT traversal, cellular network buffering) and how protocols like WebRTC handle them.
Common Interview Questions
Why doesn’t UDP have a handshake like TCP? (No connection state to establish; send immediately)
How do you handle packet loss with UDP? (Application-layer retransmission, FEC, or accept loss)
Can UDP be used for reliable data transfer? (Yes, with custom reliability layer—see QUIC)
Why is UDP faster than TCP? (No handshake, no acknowledgments, no head-of-line blocking)
What’s the maximum size of a UDP datagram? (Theoretically 65,507 bytes, practically ~1400 to avoid fragmentation)
How does DNS use UDP? (Single request-response datagram, retry on timeout)
What’s UDP hole punching? (NAT traversal technique for peer-to-peer connections)
Red Flags to Avoid
Claiming UDP is ‘unreliable’ without explaining it’s a design choice, not a flaw
Not understanding that UDP provides no ordering guarantees
Thinking UDP is only for ‘unimportant’ data (it’s for time-sensitive data)
Not knowing about UDP amplification attacks and mitigation
Confusing UDP’s lack of congestion control with ‘unlimited speed’
Not understanding why fragmentation is problematic for UDP
Claiming UDP is always faster than TCP without context (bulk transfers favor TCP)
Key Takeaways
UDP trades reliability for speed: No handshakes, acknowledgments, or retransmissions means lower latency but no delivery guarantees. Perfect for real-time applications where old data is worthless.
Stateless and minimal overhead: 8-byte header and no connection state enable massive scalability (millions of clients per server) and features like broadcasting and multicasting.
Application-layer responsibility: UDP provides addressing and basic error detection; applications must implement their own reliability, ordering, and congestion control if needed.
Fragmentation is the enemy: Keep datagrams under ~1400 bytes to avoid IP fragmentation, which causes entire datagram loss if any fragment is lost.
Modern protocols build on UDP: QUIC (HTTP/3), WebRTC, and custom gaming protocols use UDP as a foundation, adding reliability where needed while keeping UDP’s low latency.