Valet Key Pattern: Secure Direct Client Access
TL;DR
The Valet Key pattern provides clients with time-limited, scoped tokens that grant direct access to specific cloud resources (like storage or queues) without routing data through your application servers. This offloads bandwidth costs, reduces latency, and improves scalability by letting clients interact directly with cloud services while maintaining security through token-based access control.
Cheat Sheet: Generate signed URL/token → Client uses token to access resource directly → Token expires automatically → Application server never touches the data payload.
The Analogy
Think of a hotel valet parking service. You give the valet a special key that only starts the car and opens the driver’s door—it can’t access the trunk or glove compartment. The valet can move your car without having full access to everything inside it. Similarly, the Valet Key pattern gives clients a limited-access token to interact with specific cloud resources directly, without giving them your master credentials or forcing them to go through your servers as a middleman. The token has restrictions (time limits, specific operations, particular files) just like the valet key has physical limitations.
Why This Matters in Interviews
This pattern comes up when discussing file upload/download systems, media streaming architectures, or any scenario involving large data transfers in cloud environments. Interviewers want to see if you understand the cost and performance implications of proxying large files through application servers versus direct client-to-storage communication. Strong candidates recognize when to use pre-signed URLs (AWS S3), SAS tokens (Azure), or signed URLs (GCP), and can articulate the security tradeoffs. This pattern often appears in questions about building systems like Dropbox, Instagram, or video streaming platforms where bandwidth costs and latency matter significantly.
Core Concept
The Valet Key pattern addresses a fundamental challenge in cloud-native applications: how do you allow clients to access cloud resources directly without compromising security or overwhelming your application servers? In traditional architectures, when a user wants to upload a photo or download a video, the data flows through your application servers—the client sends data to your API, which then forwards it to cloud storage. This creates a bottleneck: your servers consume bandwidth, CPU cycles, and memory just shuffling bytes around, while you pay for both the compute resources and the data transfer costs.
The Valet Key pattern solves this by generating temporary, restricted-access credentials (tokens) that clients can use to interact directly with cloud services like object storage, message queues, or databases. Your application server authenticates the user, validates their permissions, generates a time-limited token with specific access rights, and hands it to the client. The client then uses this token to communicate directly with the cloud service, completely bypassing your application infrastructure for the actual data transfer.
This pattern is particularly powerful in modern cloud environments where storage services (S3, Azure Blob Storage, Google Cloud Storage) and message queues (SQS, Service Bus) are designed to handle massive scale and provide built-in security mechanisms for token-based access. By leveraging these capabilities, you transform your application from a data proxy into a lightweight orchestrator that issues credentials and tracks metadata, while the heavy lifting of data transfer happens directly between clients and cloud services.
Traditional vs. Valet Key Architecture
graph TB
subgraph Traditional Architecture
C1[Client] --"1. Upload 500MB file"--> A1[API Server]
A1 --"2. Forward 500MB"--> S1[(Cloud Storage)]
Note1["❌ Server bandwidth cost<br/>❌ Server CPU/memory load<br/>❌ Latency overhead<br/>❌ Scalability bottleneck"]
end
subgraph Valet Key Architecture
C2[Client] --"1. Request token<br/>(metadata only)"--> A2[API Server]
A2 --"2. Return signed URL<br/>(~200 bytes)"--> C2
C2 --"3. Upload 500MB directly"--> S2[(Cloud Storage)]
Note2["✅ Zero server bandwidth<br/>✅ Minimal server load<br/>✅ Lower latency<br/>✅ Infinite scale"]
end
Comparison showing how traditional architecture routes all data through API servers (creating bottlenecks and costs), while the valet key pattern enables direct client-to-storage communication with only lightweight token exchange through the API.
How It Works
Step 1: Client Request Initiation
The client application makes a request to your application server indicating it wants to perform an operation on a resource—for example, uploading a profile photo or downloading a video file. This request includes authentication credentials (JWT, session token, OAuth token) and specifies the desired operation and resource. Your server validates the user’s identity and checks authorization rules to ensure they have permission to perform this operation.
Step 2: Token Generation
Once authorization is confirmed, your application server generates a valet key (signed URL, pre-signed URL, SAS token, or similar). This token encodes several critical parameters: the specific resource identifier (bucket name, object key, queue name), allowed operations (read-only, write-only, or both), expiration timestamp (typically minutes to hours), and optionally IP restrictions or other security constraints. The token is cryptographically signed using credentials that your application server possesses (AWS IAM role, Azure storage account key, service account credentials), ensuring the cloud service can verify its authenticity.
Step 3: Token Delivery
Your server returns the generated token to the client, usually as a URL (for storage operations) or as a credential object (for queue operations). This response is lightweight—just a few hundred bytes—and happens quickly since no data payload is involved. The client now possesses a time-limited capability to interact with the cloud resource directly.
Step 4: Direct Resource Access
The client uses the token to communicate directly with the cloud service. For uploads, the client makes an HTTP PUT request to the signed URL with the file data. For downloads, it makes a GET request. For queue operations, it uses the token to publish or consume messages. Critically, this traffic flows directly between the client and the cloud service—your application servers are not involved in the data transfer at all. The cloud service validates the token’s signature, checks expiration, and enforces the specified permissions before allowing the operation.
Step 5: Token Expiration and Cleanup
The token automatically expires after its configured lifetime, typically 15 minutes to 1 hour for most use cases. After expiration, the token becomes invalid and cannot be used for further operations. Your application server may track metadata about the operation (file size, upload completion status) through separate mechanisms like webhooks, event notifications, or polling, but the valet key itself requires no explicit revocation—it simply stops working when time runs out.
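The steps above can be condensed into a toy sketch. The signing key, resource names, and token format below are illustrative stand-ins for a real cloud SDK (which would produce a pre-signed URL or SAS token instead of a dict), but the HMAC-over-parameters construction is the same idea:

```python
# Minimal valet-key lifecycle sketch using only the standard library.
# All names here are hypothetical; a production system would call a
# cloud SDK rather than roll its own signing scheme.
import hashlib
import hmac
import time

SERVER_SECRET = b"demo-signing-key"  # stands in for the cloud credential


def issue_valet_key(resource: str, operation: str, ttl_seconds: int = 900) -> dict:
    """Step 2: sign (resource, operation, expiry) so the storage side
    can verify the grant without calling back to the application."""
    expires = int(time.time()) + ttl_seconds
    message = f"{resource}|{operation}|{expires}".encode()
    signature = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    return {"resource": resource, "operation": operation,
            "expires": expires, "signature": signature}


def validate_valet_key(token: dict, resource: str, operation: str) -> bool:
    """Steps 4-5: the storage service re-computes the signature and
    enforces scope and expiration before allowing the operation."""
    message = f"{token['resource']}|{token['operation']}|{token['expires']}".encode()
    expected = hmac.new(SERVER_SECRET, message, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, token["signature"]):
        return False                      # forged or tampered token
    if time.time() > token["expires"]:
        return False                      # Step 5: expired, no revocation needed
    return token["resource"] == resource and token["operation"] == operation


token = issue_valet_key("uploads/user-42/photo.jpg", "PUT")
assert validate_valet_key(token, "uploads/user-42/photo.jpg", "PUT")
assert not validate_valet_key(token, "uploads/user-42/photo.jpg", "DELETE")
```

Note that validation needs only the shared secret and the token itself—no database lookup and no call back to the application server, which is what makes direct client-to-storage access possible.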
Valet Key Pattern: Complete Request Flow
sequenceDiagram
participant Client as Client App
participant API as API Server
participant Cloud as Cloud Storage<br/>(S3/Azure/GCS)
Client->>API: 1. Request upload permission<br/>(auth token + file metadata)
activate API
Note over API: Authenticate user<br/>Check authorization<br/>Validate quota
API->>API: 2. Generate valet key<br/>(pre-signed URL)<br/>15-min expiration
API-->>Client: 3. Return signed URL
deactivate API
Client->>Cloud: 4. PUT file data directly<br/>(using signed URL)
activate Cloud
Note over Cloud: Validate signature<br/>Check expiration<br/>Enforce permissions
Cloud-->>Client: 5. Upload success (200 OK)
deactivate Cloud
Client->>API: 6. Confirm upload complete<br/>(upload ID + metadata)
activate API
Note over API: Create DB record<br/>Trigger async processing<br/>(thumbnails, scanning)
API-->>Client: 7. Confirmation
deactivate API
Note over Client,Cloud: Data transfer bypasses API server<br/>Token expires automatically after 15 minutes
The complete valet key flow showing how clients obtain time-limited tokens from the API server, upload data directly to cloud storage, and confirm completion. Notice that the actual file data (step 4) never touches the API server, eliminating bandwidth costs and server load.
Key Principles
Principle 1: Least Privilege Access
Every valet key should grant the minimum permissions necessary for the specific operation. If a user needs to upload a single file, the token should only allow PUT operations on that exact object path, not list or delete operations on the entire bucket. This principle limits the blast radius if a token is compromised—an attacker who intercepts a token can only perform the narrowly scoped operation it was designed for. For example, Netflix generates pre-signed URLs for video uploads that only allow writing to a specific S3 key with a specific content type, preventing attackers from uploading malicious executables or overwriting other users’ content.
Principle 2: Time-Bound Credentials
Valet keys must have explicit expiration times, typically measured in minutes or hours rather than days. Short-lived tokens reduce the window of opportunity for token theft or misuse. If a token is accidentally logged, shared in an error message, or intercepted via network sniffing, it becomes useless after expiration. The tradeoff is that longer operations (like uploading a 10GB video on a slow connection) require longer token lifetimes, but you should still cap expiration at the maximum reasonable duration. Dropbox, for instance, uses 4-hour expiration windows for large file uploads, balancing user experience with security.
Principle 3: Cryptographic Integrity
The token generation process must use cryptographic signatures that the cloud service can verify without contacting your application server. This ensures that clients cannot forge or modify tokens to gain unauthorized access. AWS S3 pre-signed URLs, for example, include an HMAC-SHA256 signature computed from the request parameters and your AWS secret key. Azure SAS tokens use similar HMAC-based signatures. This cryptographic binding means the cloud service can independently validate tokens, enabling true direct access without callback verification.
Principle 4: Separation of Control and Data Planes
Your application servers handle the control plane (authentication, authorization, token generation, metadata tracking) while cloud services handle the data plane (actual bytes moving in and out of storage). This separation allows each component to scale independently. Your API servers can be lightweight, handling thousands of token generation requests per second, while cloud storage services handle the bandwidth-intensive data transfers. Instagram’s architecture exemplifies this: their API servers generate S3 pre-signed URLs for photo uploads, but the actual image data flows directly from mobile clients to S3, allowing Instagram’s servers to focus on feed generation, recommendations, and social features.
Principle 5: Audit Trail Independence
Even though data bypasses your application servers, you still need visibility into what’s happening. Cloud services provide their own logging (S3 access logs, Azure Storage Analytics, GCS audit logs) that record all operations performed with valet keys. Your application should correlate these logs with the tokens you issued to maintain a complete audit trail. Additionally, consider implementing token generation logging in your application to track which users requested access to which resources, creating a paper trail even if the actual data access happens elsewhere.
Deep Dive
Types / Variants
Pre-Signed URLs (AWS S3, GCS)
Pre-signed URLs are HTTP URLs that include authentication information in query parameters, allowing anyone with the URL to perform a specific operation without additional credentials. AWS S3 pre-signed URLs are the most common implementation—you generate a URL like https://bucket.s3.amazonaws.com/object?AWSAccessKeyId=...&Expires=...&Signature=... that grants temporary access. These work for both uploads (PUT) and downloads (GET), and you can specify content type, cache headers, and other HTTP metadata during generation. Pre-signed URLs are ideal for browser-based uploads and downloads because they work with standard HTTP clients without special SDK requirements. The downside is that URLs can be long (200+ characters) and expose some metadata in the URL structure. Use pre-signed URLs when you need simple, stateless access from web browsers or mobile apps. Spotify uses pre-signed URLs to let users download their offline playlists directly from S3 without routing gigabytes of audio through Spotify’s API servers.
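The real S3 signing process (Signature Version 4) is considerably more involved, but the overall shape can be illustrated with a toy signer using only the standard library. The domain, parameter names, and signing scheme below are simplified stand-ins, not AWS's actual format:

```python
# Toy illustration of the pre-signed URL shape: the entire grant
# (expiry + signature) travels in query parameters, so any plain HTTP
# client can use the URL. NOT real SigV4 signing.
import hashlib
import hmac
import time
from urllib.parse import urlencode

SECRET = b"demo-secret"  # stands in for the storage account credential


def presign_url(bucket: str, key: str, method: str = "GET", ttl: int = 900) -> str:
    expires = int(time.time()) + ttl
    to_sign = f"{method}\n{bucket}/{key}\n{expires}".encode()
    sig = hmac.new(SECRET, to_sign, hashlib.sha256).hexdigest()
    query = urlencode({"Expires": expires, "Signature": sig})
    # hypothetical storage endpoint, mirroring the bucket-as-subdomain style
    return f"https://{bucket}.example-storage.com/{key}?{query}"


url = presign_url("user-uploads", "photos/abc.jpg", method="PUT")
```

Because the method is part of the signed payload, a URL signed for PUT cannot be replayed as a DELETE—the storage side's recomputed signature would not match.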
Shared Access Signatures (Azure Storage)
Azure’s SAS tokens provide more granular control than pre-signed URLs, with support for account-level, service-level, and resource-level permissions. A SAS token can grant access to multiple blobs, containers, or even entire storage accounts, with fine-grained permissions like read, write, delete, list, and add. SAS tokens support IP restrictions, protocol requirements (HTTPS-only), and can be associated with stored access policies for centralized revocation. The flexibility makes SAS tokens powerful but more complex to configure correctly. Use SAS tokens when you need to grant access to multiple resources with a single token or when you need advanced features like IP whitelisting. Microsoft Teams uses SAS tokens to let users upload meeting recordings and files directly to Azure Blob Storage, with tokens scoped to specific containers per team.
Signed URLs (Google Cloud Storage)
GCS signed URLs are similar to AWS pre-signed URLs but use Google’s authentication mechanisms. They support v2 (legacy) and v4 (current) signing processes, with v4 providing better security through improved signature algorithms. GCS signed URLs can include custom headers and support resumable uploads for large files. One unique feature is the ability to sign URLs using service account keys or IAM-based signing, giving you flexibility in credential management. Use GCS signed URLs when building on Google Cloud Platform or when you need resumable upload capabilities for large files. YouTube uses signed URLs to enable direct video uploads from creators’ browsers to GCS, with support for resumable uploads that can handle multi-gigabyte video files over unreliable connections.
Temporary Security Credentials (AWS STS)
For more complex scenarios, AWS Security Token Service (STS) can generate temporary AWS credentials (access key, secret key, session token) that clients can use with AWS SDKs to access multiple services. Unlike pre-signed URLs, which are scoped to a single operation on a single resource, STS credentials provide full programmatic access to AWS APIs within defined permission boundaries. These credentials typically last 15 minutes to 12 hours and can be scoped using IAM policies. Use STS credentials when clients need to perform multiple operations across different AWS services or when you need more sophisticated access patterns than simple upload/download. Airbnb uses STS credentials to let their data science teams access specific S3 buckets and Athena tables for analytics, with credentials scoped to only the datasets relevant to each team.
Queue Access Tokens (SQS, Service Bus)
For message queue operations, valet keys take the form of temporary credentials or connection strings that allow clients to publish or consume messages without persistent queue access. Azure Service Bus SAS tokens can grant send-only, receive-only, or manage permissions on specific queues or topics. AWS SQS uses temporary credentials from STS with policies that restrict access to specific queues. These tokens enable event-driven architectures where clients can directly interact with queues for asynchronous processing. Use queue access tokens when building webhook receivers, IoT device communication, or distributed task processing systems. Uber uses SQS with temporary credentials to let driver apps directly publish location updates to queues, which are then consumed by backend services for real-time positioning and ETA calculations.
Valet Key Implementation Variants Across Cloud Providers
graph LR
subgraph AWS Ecosystem
A1[S3 Pre-Signed URL] -->|"GET/PUT operations"| S1[(S3 Bucket)]
A2[STS Temporary Credentials] -->|"Full SDK access"| S2[Multiple AWS Services]
A3[CloudFront Signed URL] -->|"CDN downloads"| S3[Edge Locations]
end
subgraph Azure Ecosystem
B1[SAS Token] -->|"Blob operations"| S4[(Blob Storage)]
B2[Service Bus SAS] -->|"Queue access"| S5[Service Bus Queue]
B3[Stored Access Policy] -->|"Centralized control"| S4
end
subgraph GCP Ecosystem
C1[Signed URL v4] -->|"Object operations"| S6[(Cloud Storage)]
C2[Resumable Upload URL] -->|"Large files"| S6
end
Note["Token Characteristics:<br/>• Time-limited (15min-24hr)<br/>• Scoped permissions<br/>• Cryptographically signed<br/>• No server involvement"]
Different valet key implementations across major cloud providers. AWS offers pre-signed URLs for S3 and STS for multi-service access; Azure provides SAS tokens with granular control; GCP uses signed URLs with resumable upload support. All share the core principle of time-limited, cryptographically verified access.
Trade-offs
Token Lifetime: Short vs. Long Expiration
Short-lived tokens (5-15 minutes) minimize security risk if tokens are compromised, but require clients to handle token refresh logic for long-running operations. Long-lived tokens (1-24 hours) simplify client implementation and support slow uploads, but create larger security windows and complicate revocation. The decision framework: use short tokens for quick operations (profile photo upload, document download) and when you have reliable client-server communication for refresh. Use longer tokens for large file transfers, batch operations, or scenarios where clients may go offline (mobile apps with intermittent connectivity). Consider implementing a progressive approach: start with 15-minute tokens, and if the client reports slow upload speeds, issue a refreshed token with extended lifetime.
Scope: Single-Resource vs. Multi-Resource Access
Single-resource tokens (one URL for one file) provide maximum security isolation—each token can only affect one object. Multi-resource tokens (access to a folder or bucket) reduce the number of token generation requests and simplify client logic when operating on multiple files. Choose single-resource tokens for user-generated content where each file has independent access controls (social media posts, user documents). Choose multi-resource tokens for batch operations, administrative tools, or scenarios where a user legitimately needs access to many related resources (downloading an entire photo album, backing up a project folder). Slack uses single-resource tokens for individual file uploads but multi-resource tokens for workspace export operations where admins download all messages and files.
Direct Access vs. Proxied Access
Direct access (valet key pattern) eliminates server bandwidth costs and reduces latency but gives up fine-grained monitoring and the ability to transform data in flight. Proxied access (traditional approach) maintains full control and visibility but creates bottlenecks and increases infrastructure costs. Use direct access for large files (>10MB), high-throughput scenarios, or when cloud service logging is sufficient. Use proxied access when you need to scan uploads for malware, resize images on the fly, enforce complex business logic, or when compliance requires all data to flow through audited systems. Pinterest uses direct S3 uploads for original images but proxies downloads through CDN edge servers that perform real-time image resizing and format conversion.
Client-Side vs. Server-Side Token Generation
Server-side generation (standard approach) keeps signing keys secure and allows authorization checks before issuing tokens. Client-side generation (using temporary credentials) reduces server load and enables offline operation but requires distributing credentials to clients. Always use server-side generation for initial token issuance to enforce authorization. Consider client-side generation only for trusted clients (internal tools, backend services) that need to generate many tokens quickly. AWS Mobile SDK uses a hybrid approach: the app authenticates with Cognito to get temporary STS credentials, then uses those credentials to generate pre-signed URLs client-side for rapid photo uploads without hitting the app’s API servers.
URL-Based vs. Header-Based Authentication
URL-based tokens (pre-signed URLs) work with any HTTP client and can be shared easily but expose credentials in URLs (which may be logged or cached). Header-based tokens (SAS tokens in headers, OAuth bearer tokens) keep credentials out of URLs but require SDK support or custom HTTP client configuration. Use URL-based tokens for browser downloads, email links, or when you need maximum compatibility. Use header-based tokens for API-to-API communication, mobile apps with SDKs, or when security policies prohibit credentials in URLs. Dropbox uses URL-based tokens for shared file links (so users can paste them in browsers) but header-based tokens for their mobile app’s direct S3 uploads.
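The two transports can be contrasted in a few lines; the endpoint and token value below are placeholders:

```python
# Same grant, two transports. Query-string transport works in any
# browser or email link but lands in access logs; header transport
# needs a programmable client but keeps the credential out of URLs.
from urllib.parse import urlencode


def url_based_request(endpoint: str, token: str) -> str:
    # credential embedded in the URL itself
    return f"{endpoint}?{urlencode({'token': token})}"


def header_based_request(token: str) -> dict:
    # credential carried out-of-band in a request header
    return {"Authorization": f"Bearer {token}"}


link = url_based_request("https://storage.example.com/file.jpg", "abc123")
headers = header_based_request("abc123")
```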
Common Pitfalls
Pitfall 1: Overly Permissive Token Scopes
Developers often generate tokens with broader permissions than necessary—for example, granting read-write access to an entire S3 bucket when only write access to a specific object is needed. This happens because it’s easier to use wildcard permissions or reuse existing IAM policies than to craft precise, operation-specific policies. The risk: if a token is compromised, attackers can access or modify far more data than the legitimate operation required. To avoid this, always generate tokens with the minimum required permissions. Use object-level policies (not bucket-level), specify exact operations (PUT only, not PUT and DELETE), and include content-type restrictions when possible. Implement a token generation library that enforces least-privilege by default, requiring explicit justification for broader permissions.
Pitfall 2: Insufficient Token Expiration Handling
Clients often fail to handle token expiration gracefully, leading to failed uploads or cryptic error messages when tokens expire mid-operation. This is especially problematic for large file uploads on slow connections where the upload duration exceeds token lifetime. The issue arises because developers test with fast connections and short files, never encountering expiration in development. To avoid this, implement client-side logic that checks token expiration before starting operations and requests token refresh if needed. For long-running uploads, use resumable upload protocols (multipart upload in S3, resumable uploads in GCS) that can pause and resume with new tokens. Set token lifetimes based on 95th percentile upload times, not average times, and monitor token expiration errors in production to tune lifetimes appropriately.
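The refresh logic can be sketched as a control-flow skeleton. The token and chunk-upload calls below are simulated stand-ins for the API server and storage service; the point is the pattern: check remaining token lifetime before each chunk and refresh proactively, never mid-chunk:

```python
# Refresh-aware chunked upload sketch. request_token() simulates the
# API server; the loop body simulates uploading one chunk to storage.
TOKEN_TTL = 900          # 15-minute tokens
REFRESH_MARGIN = 120     # refresh when under ~2 minutes would remain


def request_token(now: float) -> dict:
    return {"expires": now + TOKEN_TTL}   # simulated API call


def upload_chunks(num_chunks: int, chunk_seconds: float) -> int:
    """Simulate an upload; returns how many tokens were consumed."""
    now = 0.0
    token = request_token(now)
    tokens_used = 1
    for _ in range(num_chunks):
        # would the token expire before this chunk (plus margin) finishes?
        if token["expires"] - now < REFRESH_MARGIN + chunk_seconds:
            token = request_token(now)    # proactive refresh
            tokens_used += 1
        now += chunk_seconds              # simulated chunk transfer time
    return tokens_used
```

A 400-chunk upload at 10 seconds per chunk (roughly 67 minutes of transfer) consumes several tokens, while a short upload finishes on its first token—exactly the behavior the diagram below the pitfalls illustrates.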
Pitfall 3: Missing Token Generation Rate Limiting
Without rate limiting on token generation endpoints, attackers can request thousands of tokens, potentially exhausting cloud service quotas or generating excessive costs. This happens because developers focus on rate limiting data operations but forget that token generation itself is a resource-intensive operation (cryptographic signing, database lookups for authorization). An attacker could request tokens for non-existent resources, forcing your system to perform expensive authorization checks. To avoid this, implement rate limiting on token generation endpoints (e.g., 100 tokens per user per hour), add CAPTCHA or proof-of-work for anonymous requests, and monitor for unusual token generation patterns. Cache authorization decisions when possible to reduce database load during token generation.
Pitfall 4: Inadequate Token Revocation Strategy
Valet keys are designed to be self-expiring, but sometimes you need to revoke access immediately—for example, when a user’s account is compromised or they lose a device. Many implementations have no revocation mechanism beyond waiting for expiration. This happens because true revocation requires maintaining state (a blacklist of revoked tokens), which contradicts the stateless nature of the pattern. To address this, implement a hybrid approach: use short token lifetimes (15 minutes) as the primary security mechanism, and maintain a small, time-limited revocation list for emergency cases. When a user reports a compromised device, add recently issued tokens to the revocation list and force re-authentication. The revocation list only needs to store tokens issued in the last hour, keeping it small and fast to check.
Pitfall 5: Exposing Tokens in Logs or Error Messages
Developers accidentally log full pre-signed URLs or SAS tokens in application logs, error messages, or analytics events, exposing credentials that could be extracted by attackers with log access. This happens because URLs are treated as safe-to-log identifiers, and developers forget they contain embedded credentials. To avoid this, implement log sanitization that strips query parameters from S3 URLs before logging, redacts SAS tokens from Azure URLs, and masks credentials in error messages. Use structured logging with separate fields for resource identifiers and access tokens, logging only the identifiers. Configure log retention policies that automatically purge logs after token expiration (e.g., if tokens last 1 hour, logs containing URLs should be deleted after 2 hours).
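For pre-signed URLs, the credential lives entirely in the query string, so a sanitizer only needs to keep the path. A minimal stdlib sketch:

```python
# Strip query parameters (where pre-signed credentials live) before a
# URL reaches the logs, keeping the resource path as a safe identifier.
from urllib.parse import urlsplit, urlunsplit


def sanitize_url_for_logging(url: str) -> str:
    parts = urlsplit(url)
    # drop query and fragment; add a marker so readers know parameters
    # were removed rather than never present
    redacted = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return redacted + ("?<redacted>" if parts.query else "")


safe = sanitize_url_for_logging(
    "https://bucket.s3.amazonaws.com/file.jpg?AWSAccessKeyId=AKID&Signature=abc"
)
# safe == "https://bucket.s3.amazonaws.com/file.jpg?<redacted>"
```

Wiring this into a logging filter (rather than calling it ad hoc) ensures every log line passes through the same redaction path.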
Token Expiration Handling for Large Uploads
sequenceDiagram
participant Client
participant API
participant S3
Note over Client,S3: Scenario: 2GB file upload on slow connection
Client->>API: Request upload token
API-->>Client: Token (15-min expiration)
Client->>S3: Start multipart upload
S3-->>Client: Upload ID
loop Upload chunks (5MB each)
Client->>S3: Upload chunk with token
S3-->>Client: Chunk ETag
end
Note over Client: ⚠️ Token expires after 15 min<br/>Still 1GB remaining!
Client->>API: Request token refresh<br/>(include upload ID)
API-->>Client: New token (15-min)
loop Continue with remaining chunks
Client->>S3: Upload chunk with NEW token
S3-->>Client: Chunk ETag
end
Client->>S3: Complete multipart upload<br/>(all ETags)
S3-->>Client: Success
Client->>API: Confirm completion
Note over Client,S3: ✅ Proper handling: Refresh before expiration<br/>❌ Common pitfall: No refresh logic = failed upload
Proper token expiration handling using multipart upload with token refresh. The client monitors token lifetime and proactively requests new tokens before expiration, allowing large uploads to complete successfully. The common pitfall is not implementing refresh logic, causing uploads to fail when tokens expire mid-transfer.
Math & Calculations
Token Lifetime Calculation Based on Upload Speed
When determining appropriate token expiration times, you need to account for file size and expected network speeds to ensure tokens don’t expire during legitimate operations.
Formula:
Token Lifetime (seconds) = (File Size / Expected Upload Speed) × Safety Factor
Variables:
- File Size: Maximum expected file size in bytes
- Expected Upload Speed: Conservative estimate of client upload bandwidth (bytes/second)
- Safety Factor: Multiplier to account for network variability (typically 2-3×)
Worked Example: Suppose you’re building a video upload feature where users can upload videos up to 500MB. Your analytics show that 95th percentile upload speed is 2 Mbps (250 KB/s).
File Size = 500 MB = 500 × 1024 × 1024 bytes = 524,288,000 bytes
Upload Speed = 2 Mbps ÷ 8 = 250 KB/s ≈ 256,000 bytes/second (using 1 KB = 1,024 bytes)
Safety Factor = 2.5× (accounting for network variability and overhead)
Base Upload Time = 524,288,000 / 256,000 = 2,048 seconds ≈ 34 minutes
Token Lifetime = 34 minutes × 2.5 = 85 minutes
Round up to: 90 minutes (5,400 seconds)
For this scenario, you’d configure tokens with a 90-minute expiration. If you want to support resumable uploads with token refresh, you could use shorter tokens (15 minutes) and implement refresh logic that requests new tokens every 10 minutes.
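The same calculation as a small helper function; the 5-minute rounding granularity is a convenience choice, not part of the formula:

```python
# Token lifetime = (file size / upload speed) x safety factor,
# rounded up to a configurable boundary (here, 5 minutes).
import math


def token_lifetime_seconds(file_size_bytes: int,
                           upload_bytes_per_sec: int,
                           safety_factor: float = 2.5,
                           round_to: int = 300) -> int:
    base = file_size_bytes / upload_bytes_per_sec
    padded = base * safety_factor
    return math.ceil(padded / round_to) * round_to


# 500 MB at ~256 KB/s with a 2.5x safety factor -> 5,400 s (90 minutes)
lifetime = token_lifetime_seconds(500 * 1024 * 1024, 256_000)
```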
Cost Savings Calculation
The Valet Key pattern eliminates data transfer costs through your application servers. Here’s how to calculate savings:
Formula:
Monthly Savings = (Data Transfer Volume × Bandwidth Unit Cost + Proxy Compute Cost) - (Token Generation Cost + Residual Compute Cost)
Worked Example: Your application handles 10TB of file uploads per month. Without valet keys, this data flows through your application servers.
Data Transfer Volume = 10 TB/month
Server Bandwidth Cost = $0.09/GB (typical cloud egress pricing)
Token Generation Cost = $0.0001 per token × 1,000,000 tokens = $100
Without Valet Key:
- Server bandwidth: 10,000 GB × $0.09 = $900
- Server compute (to handle data): ~$500 (estimated for instances to handle 10TB)
- Total: $1,400/month
With Valet Key:
- Direct S3 uploads: $0 (ingress is free)
- Token generation: $100
- Minimal server compute: $50
- Total: $150/month
Monthly Savings = $1,400 - $150 = $1,250/month = $15,000/year
This calculation doesn’t include the latency improvements and scalability benefits, which are often more valuable than the direct cost savings. By eliminating the server bottleneck, you can handle 10× more concurrent uploads without scaling your application tier.
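The worked example above reduces to a few lines of arithmetic; all dollar figures are the illustrative estimates from the text, not real cloud pricing:

```python
# Monthly savings from moving data transfer off the application tier.
def monthly_savings(transfer_gb: float, egress_per_gb: float,
                    proxy_compute: float, token_cost: float,
                    residual_compute: float) -> float:
    without_valet = transfer_gb * egress_per_gb + proxy_compute
    with_valet = token_cost + residual_compute   # direct-to-storage ingress is free
    return without_valet - with_valet


savings = monthly_savings(
    transfer_gb=10_000, egress_per_gb=0.09,   # 10 TB at $0.09/GB
    proxy_compute=500,                        # instances shuttling the data
    token_cost=100,                           # 1M tokens at $0.0001 each
    residual_compute=50,
)
# savings ≈ 1,250 per month, about $15,000/year
```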
Real-World Examples
Netflix: Video Upload Pipeline
Netflix uses the Valet Key pattern extensively in their content ingestion pipeline where studios and content creators upload master video files, often hundreds of gigabytes per file. When a content partner initiates an upload through Netflix’s partner portal, the portal authenticates the user, validates they have permission to upload to the specific title, and generates an S3 pre-signed URL with a 24-hour expiration. The pre-signed URL is scoped to a specific S3 key that includes the title ID and upload session ID, preventing partners from overwriting other content.
The interesting detail: Netflix generates separate pre-signed URLs for each chunk in a multipart upload, with each URL having a 4-hour expiration. This allows their upload client to pause and resume uploads over multiple days for extremely large files (4K HDR masters can exceed 500GB). Each chunk URL is generated on-demand when the client is ready to upload that chunk, ensuring tokens are only active when needed. This approach reduced Netflix’s data transfer costs by approximately $2M annually by eliminating the need to proxy hundreds of terabytes through their API servers, and improved upload reliability by leveraging S3’s built-in multipart upload durability.
Dropbox: Direct-to-Storage Architecture
Dropbox’s file sync architecture uses a sophisticated implementation of the Valet Key pattern to handle billions of file uploads daily. When a Dropbox client needs to upload a file, it first sends metadata (filename, size, hash) to Dropbox’s API servers. The API server checks if the file already exists (deduplication), validates the user has sufficient storage quota, and generates a pre-signed URL to their block storage service (a custom system built on top of S3 and Google Cloud Storage).
The interesting detail: Dropbox implements a two-phase commit protocol with valet keys. The client uploads the file directly to storage using the pre-signed URL, but the file isn’t visible to the user until the client sends a commit request to the API server confirming successful upload. This prevents partial uploads from appearing in users’ folders and allows Dropbox to perform virus scanning and content policy checks before making files accessible. If the client never sends the commit (due to crash or network failure), the uploaded blocks are automatically garbage collected after 24 hours. This pattern handles over 1 billion file uploads per day while keeping Dropbox’s API servers focused on metadata operations and sync logic rather than data transfer.
Instagram: Photo Upload Flow
Instagram’s photo upload architecture demonstrates the Valet Key pattern at massive scale—over 100 million photos uploaded daily. When a user taps the upload button, Instagram’s mobile app requests an upload token from their API servers. The server generates an S3 pre-signed URL with a 15-minute expiration, scoped to a specific object key that includes the user ID, timestamp, and random UUID. The app then uploads the photo directly to S3 using the pre-signed URL, completely bypassing Instagram’s application servers for the image data.
The interesting detail: Instagram uses CloudFront signed URLs (a variant of the valet key pattern) for photo downloads, but with a twist. Instead of generating a new signed URL for every photo view, they generate signed cookies that grant access to all of a user’s feed photos for 1 hour. This reduces token generation load from millions per second to thousands per second while maintaining security—the signed cookie only works for photos the user is authorized to see based on their follow graph. This optimization was critical for scaling Instagram to 2 billion users, as it eliminated token generation as a bottleneck in the feed rendering path. The pattern reduced API server load by 40% and improved feed load times by 200ms by enabling direct CDN access without per-image authorization checks.
Instagram Photo Upload Architecture with Valet Keys
```mermaid
graph TB
    subgraph "Mobile Client"
        App[Instagram App]
    end
    subgraph "API Layer - us-east-1"
        LB[Load Balancer]
        API1[API Server 1]
        API2[API Server 2]
        Cache[(Redis Cache<br/>Auth tokens)]
    end
    subgraph "Storage Layer"
        S3[(S3 Bucket<br/>Raw Photos)]
        CF[CloudFront CDN<br/>Signed Cookies]
    end
    subgraph "Processing Layer"
        Lambda[Lambda Functions<br/>Thumbnail Generation]
        Queue[SQS Queue]
    end
    App --"1. POST /upload/request<br/>(auth + metadata)"--> LB
    LB --> API1 & API2
    API1 & API2 -."Check auth".-> Cache
    API1 --"2. Generate pre-signed URL<br/>(15-min expiration)"--> App
    App --"3. PUT photo (direct)<br/>bypasses API"--> S3
    S3 --"4. Upload event"--> Queue
    Queue --"5. Trigger"--> Lambda
    Lambda --"6. Generate thumbnails"--> S3
    App --"7. Confirm upload"--> API2
    API2 --"8. Create DB record"--> App
    App --"9. Download feed photos"--> CF
    CF --"Signed cookie (1-hr)<br/>grants access"--> S3
    Note["Optimization: Signed cookies<br/>grant 1-hour access to all<br/>feed photos, reducing token<br/>generation from millions/sec<br/>to thousands/sec"]
```
Instagram’s production architecture handling 100M+ daily photo uploads using valet keys. The API servers generate pre-signed S3 URLs for uploads (bypassing API for data transfer) and CloudFront signed cookies for downloads (granting 1-hour access to feed photos). This pattern reduced API server load by 40% and improved feed load times by 200ms.
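The signed-cookie optimization can be sketched as one HMAC-signed grant covering a whole path prefix instead of one URL per photo. The key and cookie shape here are illustrative assumptions; CloudFront's real signed cookies use an RSA-signed JSON policy, but the scoping idea is the same.

```python
import hashlib
import hmac

SECRET = b"cdn-signing-key"  # hypothetical demo key


def issue_cookie(resource_prefix, expires):
    """One cookie covers every object under a prefix (e.g. a user's feed),
    replacing per-photo signed URLs."""
    sig = hmac.new(SECRET, f"{resource_prefix}|{expires}".encode(),
                   hashlib.sha256).hexdigest()
    return {"prefix": resource_prefix, "expires": expires, "sig": sig}


def cookie_allows(cookie, path, now):
    """CDN-side check: signature integrity, expiry, and prefix scoping only."""
    expected = hmac.new(SECRET, f"{cookie['prefix']}|{cookie['expires']}".encode(),
                        hashlib.sha256).hexdigest()
    return (hmac.compare_digest(expected, cookie["sig"])
            and now <= cookie["expires"]
            and path.startswith(cookie["prefix"]))
```

One signature check at issue time amortizes over every photo fetch under the prefix, which is exactly why token generation drops from millions to thousands per second.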
Interview Expectations
Mid-Level
What You Should Know: At the mid-level, you should understand the basic concept of the Valet Key pattern and be able to explain why routing large files through application servers is problematic (bandwidth costs, latency, server load). You should know what pre-signed URLs are and how they work at a high level—that they’re time-limited URLs with embedded authentication that allow direct access to cloud storage. Be able to describe a simple implementation: client requests upload permission, server generates pre-signed URL, client uploads directly to S3. You should recognize common use cases such as file uploads and downloads, and understand the security benefit of time-limited access.
Bonus Points: Demonstrate awareness of token expiration handling—what happens if a token expires during a large file upload? Mention multipart uploads as a solution for large files. Show you’ve thought about the tradeoff between token lifetime (security) and user experience (long uploads). Reference a specific cloud service you’ve used (S3 pre-signed URLs, Azure SAS tokens) with correct terminology. Discuss how you’d monitor token usage or detect abuse (rate limiting, logging). Understanding that tokens should be generated server-side, not embedded in client code, shows security awareness.
Senior
What You Should Know: Senior engineers should demonstrate deep understanding of the pattern’s tradeoffs and implementation nuances across different cloud providers. You should be able to design a complete upload/download system using valet keys, including error handling, token refresh logic, and monitoring. Explain the security implications in detail: why least-privilege scoping matters, how cryptographic signatures work (HMAC-SHA256), and what happens if tokens are compromised. Discuss the cost implications quantitatively—calculate bandwidth savings for a given upload volume. Be able to compare different implementations (S3 pre-signed URLs vs. Azure SAS tokens vs. GCS signed URLs) and explain when to use each.
Bonus Points: Propose a hybrid architecture that uses valet keys for large files but proxies small files for additional processing (virus scanning, image resizing). Discuss how to implement token revocation despite the pattern’s stateless nature (short lifetimes + emergency blacklist). Explain how to handle resumable uploads with token refresh—describe the protocol for pausing an upload, requesting a new token, and resuming. Show awareness of edge cases: what if the client uploads successfully but never confirms completion? How do you garbage collect orphaned uploads? Discuss how to implement audit logging when data bypasses your servers (correlating cloud service logs with token generation logs). Mention specific production issues you’ve encountered or prevented, like tokens appearing in error logs or insufficient rate limiting on token generation endpoints.
Staff+
What You Should Know: Staff+ engineers should be able to architect enterprise-scale systems using the Valet Key pattern with sophisticated security, compliance, and operational requirements. Discuss how to implement the pattern in regulated environments (HIPAA, GDPR) where you need audit trails despite direct client-to-storage access. Explain how to build a token generation service that handles millions of requests per second with sub-10ms latency, including caching strategies for authorization decisions. Design a system that supports multiple cloud providers (multi-cloud strategy) with a unified token generation interface. Discuss the organizational implications—how this pattern affects team boundaries, as storage operations move from application teams to platform teams.
Distinguishing Signals: Propose novel extensions to the pattern, such as progressive token elevation (start with read-only, upgrade to read-write after additional verification) or token chaining (one token grants access to generate other tokens for related resources). Discuss how to implement the pattern for non-HTTP protocols (gRPC, WebSocket, custom binary protocols). Explain how to build a token generation system that’s resilient to cloud provider outages—if AWS is down, can you fail over to GCP with minimal disruption? Design a compliance-aware system that automatically adjusts token lifetimes and scopes based on data classification (PII gets 5-minute tokens, public data gets 1-hour tokens). Discuss the security implications of client-side token generation using temporary credentials (STS) and when this is appropriate. Show awareness of advanced attack vectors: token prediction attacks, timing attacks on signature validation, or token reuse across different resources. Propose metrics and SLIs for token generation services (token generation latency, token validation success rate, time-to-expiration distribution) and explain how these metrics inform system design decisions.
Common Interview Questions
Question 1: “How would you design a file upload system for a social media app where users upload photos and videos?”
Concise Answer (60 seconds): I’d use the Valet Key pattern with S3 pre-signed URLs. When a user wants to upload, the mobile app calls our API with file metadata. The API authenticates the user, validates their storage quota, and generates a pre-signed URL with 15-minute expiration scoped to a specific S3 key. The app uploads directly to S3 using this URL. After successful upload, the app notifies our API, which creates a database record with the S3 location and triggers async processing (thumbnail generation, virus scanning). This keeps our API servers lightweight and leverages S3’s scalability for data transfer.
Detailed Answer (2 minutes): I’d implement a multi-phase upload flow using valet keys. First, the client sends metadata (filename, size, content type, hash) to our API. The API performs authorization checks (user authentication, storage quota, rate limiting) and generates an S3 pre-signed URL with specific constraints: PUT-only operation, 15-minute expiration, specific content-type header requirement, and a unique object key that includes user ID and UUID to prevent collisions. For large videos, I’d use S3 multipart upload with separate pre-signed URLs for each part, allowing resumable uploads. The client uploads directly to S3, then sends a commit request to our API with the upload ID. Our API verifies the upload succeeded (checking S3 metadata), creates a database record, and enqueues async jobs for processing (thumbnail generation using Lambda, virus scanning with ClamAV, video transcoding with MediaConvert). I’d implement monitoring for token generation rate, upload success rate, and time-to-commit to detect issues. For security, I’d add rate limiting on token generation (100 tokens per user per hour) and implement a short-lived revocation list for emergency cases. This architecture handles millions of uploads daily while keeping API servers focused on metadata and orchestration rather than data transfer.
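The authorize-then-mint flow from the answer can be sketched as follows. The names, secret, and HMAC scheme are illustrative assumptions; in production you would call the provider's SDK (e.g. boto3's `generate_presigned_url`) rather than sign URLs yourself.

```python
import hashlib
import hmac
import time
import uuid

SECRET = b"server-side-signing-key"  # never shipped to clients


def issue_upload_token(user_id, content_type, quota_left, size, ttl_s=900, now=None):
    """Authorize first, then mint a PUT-only token scoped to one object key."""
    if size > quota_left:
        raise PermissionError("quota exceeded")      # business-logic authorization
    if not content_type.startswith("image/"):        # happens here, not at storage
        raise PermissionError("content type not allowed")
    now = time.time() if now is None else now
    key = f"uploads/{user_id}/{uuid.uuid4()}"        # user-scoped, collision-free key
    expires = int(now + ttl_s)
    payload = f"PUT|{key}|{content_type}|{expires}"  # method + key + type are all signed
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return {"key": key, "expires": expires, "sig": sig}
```

Note that the operation (PUT), the exact object key, and the content type are all part of the signed payload, so the token cannot be bent to any other use.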
Red Flags: Saying you’d upload files through your API servers without mentioning the bandwidth and scalability implications. Not discussing token expiration or security constraints. Failing to mention the commit/confirmation phase that ensures upload completion. Not considering large file handling (multipart uploads).
Question 2: “What are the security risks of the Valet Key pattern and how do you mitigate them?”
Concise Answer (60 seconds): The main risks are token theft, overly broad permissions, and insufficient expiration. Mitigate token theft with short lifetimes (15 minutes), HTTPS-only transmission, and avoiding logging tokens. Prevent overly broad permissions by scoping tokens to specific resources and operations (PUT-only on one object, not full bucket access). Handle expiration with client-side refresh logic and resumable uploads. Add IP restrictions when possible and implement rate limiting on token generation to prevent abuse.
Detailed Answer (2 minutes): Security risks fall into several categories. Token theft is the primary concern—if an attacker intercepts a token, they gain temporary access to the resource. Mitigate this with short expiration times (15 minutes for most operations), HTTPS-only URLs, and careful logging practices (never log full pre-signed URLs). Implement token sanitization in logs and error messages. Overly permissive tokens are another risk—developers often grant broader access than needed. Enforce least-privilege by default: scope tokens to specific object keys, limit to required operations (read-only or write-only), and add content-type restrictions. For example, an image upload token should only allow PUT operations with image/* content types on a specific S3 key. Token reuse and replay attacks are concerns—ensure tokens are single-use when possible or implement nonce-based replay protection. For sensitive operations, add IP address restrictions to tokens, though this can cause issues with mobile clients on changing networks. Insufficient revocation is a challenge—since tokens are stateless, you can’t revoke them directly. Implement a hybrid approach: use short lifetimes as primary security, maintain a small time-limited revocation list for emergencies, and force re-authentication when accounts are compromised. Rate limiting on token generation prevents attackers from exhausting quotas or generating excessive costs. Finally, implement monitoring for anomalous patterns: unusual token generation rates, tokens used from unexpected geographic locations, or high failure rates that might indicate attack attempts.
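The storage-side check is deliberately narrow: signature and expiry, nothing else. A sketch of that validation, using the same illustrative HMAC scheme as above (not a real provider's verifier):

```python
import hashlib
import hmac

SECRET = b"server-side-signing-key"  # shared only with the storage layer


def verify_token(method, key, content_type, expires, sig, now):
    """What the storage service checks: cryptographic integrity and expiry only."""
    if now > expires:  # short lifetimes are the primary defence against theft
        return False
    payload = f"{method}|{key}|{content_type}|{expires}"
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    # compare_digest runs in constant time, closing off timing attacks
    # on signature validation.
    return hmac.compare_digest(expected, sig)
```

Any tampering with the method, key, content type, or expiry changes the payload and invalidates the signature, which is how least-privilege scoping is enforced without the storage layer knowing any business logic.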
Red Flags: Saying tokens are “completely secure” without acknowledging risks. Not mentioning token expiration as a security mechanism. Suggesting long-lived tokens (days or weeks) without justification. Not understanding the difference between token theft and credential compromise. Failing to mention least-privilege scoping.
Question 3: “When would you NOT use the Valet Key pattern?”
Concise Answer (60 seconds): Don’t use valet keys when you need to transform data in flight (image resizing, video transcoding), perform real-time content scanning (virus detection, content moderation), or enforce complex business logic during upload. Also avoid it when compliance requires all data to flow through audited systems, when dealing with very small files where token generation overhead exceeds transfer time, or when clients can’t handle token expiration and refresh logic. For these cases, proxy through your application servers.
Detailed Answer (2 minutes): Several scenarios make the Valet Key pattern inappropriate. First, when you need to transform or process data during transfer—for example, resizing images on upload, transcoding video formats, or compressing files. The pattern gives clients direct storage access, so you can’t intercept and modify data in flight. Second, when you need real-time content validation—virus scanning, content moderation for inappropriate images, or checking file formats. While you can scan after upload, some applications require blocking malicious content before it reaches storage. Third, when compliance or regulatory requirements mandate that all data flows through specific audited systems. Some industries require data to pass through certified security appliances or logging systems that can’t be bypassed. Fourth, for very small files (< 100KB) where the overhead of token generation and additional HTTP round trip exceeds the time saved by direct upload. In these cases, a simple POST to your API is faster and simpler. Fifth, when clients are unreliable or untrusted—if you can’t trust clients to properly handle token expiration, implement retry logic, or confirm upload completion, proxying through your servers gives you more control. Sixth, when you need to implement complex rate limiting or quota enforcement based on real-time data transfer—valet keys make it harder to throttle bandwidth per user since data bypasses your servers. Finally, when you need to support legacy clients that can’t handle modern authentication mechanisms or don’t have HTTP client capabilities for direct cloud service access. In these cases, a traditional upload-through-API approach is more appropriate, accepting the scalability and cost tradeoffs.
Red Flags: Saying you’d always use valet keys for any file upload scenario. Not recognizing the tradeoff between control and scalability. Failing to mention compliance or regulatory considerations. Not understanding that some operations require data transformation that’s incompatible with direct access.
Question 4: “How do you handle token expiration for large file uploads that take longer than the token lifetime?”
Concise Answer (60 seconds): Use resumable upload protocols like S3 multipart upload or GCS resumable uploads. Break large files into chunks, upload each chunk with its own token, and track progress. If a token expires mid-upload, pause the upload, request a new token for the remaining chunks, and resume. Implement client-side logic that monitors token expiration time and proactively requests refresh before expiration. For very large files, generate tokens with longer lifetimes based on expected upload duration.
Detailed Answer (2 minutes): Handling token expiration for large uploads requires a multi-layered approach. First, use cloud-native resumable upload protocols. S3 multipart upload allows you to break files into chunks (5MB-5GB each), upload chunks independently, and assemble them server-side. Each chunk can have its own pre-signed URL with independent expiration, so if one expires, you only need a fresh URL for that chunk. GCS resumable uploads use a different approach—you initiate an upload session, receive a session URL, and can upload data in chunks to that URL, resuming from the last byte received if interrupted. Second, implement intelligent token lifetime calculation. Analyze historical upload speeds (95th percentile, not average) and set token lifetimes to 2-3× the expected upload duration. For a 1GB file with 2Mbps typical upload speed, the transfer alone takes roughly 65-70 minutes, so a 2-3 hour lifetime is appropriate. Third, build client-side token refresh logic. The client should track token expiration time and proactively request a new token when 80% of lifetime has elapsed, continuing the upload with the new token. This requires your API to support token refresh endpoints that can issue new tokens for in-progress uploads. Fourth, implement upload session tracking. When issuing the initial token, create a server-side upload session record that tracks progress. Token refresh requests reference this session ID, allowing your server to validate that the refresh is for a legitimate in-progress upload. Fifth, handle edge cases: what if the client crashes mid-upload? Implement upload session expiration (24 hours) with automatic cleanup of incomplete uploads. What if network conditions are worse than expected? Allow clients to request token lifetime extensions with justification. Finally, monitor token expiration errors in production—if you see high rates of expiration during upload, it indicates your lifetime calculations need adjustment.
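The two calculations in the answer can be sketched directly: lifetime from the p95 transfer rate with a safety margin, and proactive refresh at 80% of lifetime. The clamp bounds reflect the typical 15-minute to 2-hour range discussed here, not a universal rule; uploads that would exceed the cap fall back to resumable chunks plus refresh.

```python
def token_lifetime_s(size_bytes, p95_bps, margin=2.5, floor_s=15 * 60, cap_s=2 * 3600):
    """Lifetime = expected transfer time x safety margin, clamped to sane bounds.
    size_bytes is in bytes; p95_bps is the 95th-percentile upload speed in bits/s."""
    expected_s = size_bytes * 8 / p95_bps  # bits divided by bits-per-second
    return int(min(max(expected_s * margin, floor_s), cap_s))


def refresh_deadline(issued_at, ttl_s, fraction=0.8):
    """Client-side rule: request a new token once 80% of the lifetime has elapsed,
    so users never see an expiration error mid-upload."""
    return issued_at + ttl_s * fraction
```

Using the p95 speed rather than the average is the important choice: sizing for the slow tail is what keeps expiration errors rare.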
Red Flags: Suggesting extremely long token lifetimes (days) without discussing security implications. Not knowing about multipart or resumable upload protocols. Saying “just make the token last longer” without considering security tradeoffs. Not implementing client-side refresh logic.
Question 5: “How would you implement audit logging when using the Valet Key pattern?”
Concise Answer (60 seconds): Implement two-layer logging: application-level logs for token generation (who requested access, to what resource, when) and cloud service logs for actual operations (S3 access logs, CloudTrail). Correlate these logs using a unique request ID embedded in the token or object metadata. Store token generation events in your database with user ID, resource ID, token expiration, and operation type. Configure cloud service logging to capture all access events, then build a pipeline that joins application logs with cloud logs to create complete audit trails.
Detailed Answer (2 minutes): Audit logging with valet keys requires a comprehensive multi-source approach since data operations bypass your application servers. First, implement detailed token generation logging. Every time your API generates a valet key, log: user ID, timestamp, requested resource, granted permissions, token expiration time, client IP, and a unique request ID. Store this in a durable audit log database (not just application logs) with retention matching your compliance requirements. Second, enable cloud service access logging. For S3, enable server access logs and CloudTrail; for Azure, enable Storage Analytics; for GCS, enable Cloud Audit Logs. These capture every operation performed with your tokens: who accessed what, when, from where, and whether it succeeded. Third, implement correlation between application and cloud logs. Embed your request ID in the S3 object metadata or as a custom header in the upload request. When analyzing cloud logs, you can trace back to the original token generation event. Fourth, build a log aggregation pipeline. Use a system like ELK stack, Splunk, or cloud-native solutions (CloudWatch Insights, Azure Monitor) to collect logs from both sources and join them. Create dashboards showing: tokens generated per user, successful vs. failed access attempts, data volume transferred per user, and anomalous access patterns. Fifth, implement real-time alerting for suspicious activity: tokens used from unexpected geographic locations, high failure rates indicating potential token theft, or unusual access patterns. Sixth, ensure log immutability and retention. Audit logs should be write-once, stored in tamper-evident systems (S3 Object Lock, Azure immutable storage), and retained according to compliance requirements (often 7 years for financial data). Finally, implement periodic audit log analysis—automated jobs that detect policy violations, generate compliance reports, and identify security incidents. 
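The correlation step can be sketched as a join on the request ID embedded at token-generation time. Field names here are hypothetical; in practice the cloud-side records would come from S3 access logs or CloudTrail.

```python
def correlate(token_logs, access_logs):
    """Join app-side token-generation events with cloud-side access events
    on the request ID carried in object metadata."""
    by_request = {t["request_id"]: t for t in token_logs}
    trail = []
    for a in access_logs:
        t = by_request.get(a.get("request_id"))
        trail.append({
            "request_id": a.get("request_id"),
            "user_id": t["user_id"] if t else None,  # None = access with no known grant
            "operation": a["operation"],
            "granted_at": t["ts"] if t else None,
            "accessed_at": a["ts"],
        })
    return trail
```

An access event with no matching grant (user_id of None) is itself a signal worth alerting on: it means something touched storage outside your token-issuance path.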
This comprehensive approach maintains full audit trails despite data bypassing your application infrastructure.
Red Flags: Saying you can’t audit operations because data bypasses your servers. Not knowing about cloud service access logs. Failing to mention log correlation or how to connect token generation with actual usage. Not discussing compliance retention requirements or log immutability.
Red Flags to Avoid
Red Flag 1: “Valet keys are less secure than traditional authentication because they’re temporary.”
Why It’s Wrong: This fundamentally misunderstands the security model. Temporary credentials are MORE secure than permanent credentials precisely because they’re time-limited. If a valet key is compromised, it automatically becomes useless after expiration, limiting the window of vulnerability. Permanent credentials, if stolen, remain valid until explicitly revoked, which often doesn’t happen quickly enough. The principle of least privilege combined with short lifetimes makes valet keys more secure for specific operations than sharing long-lived credentials.
What to Say Instead: “Valet keys enhance security through time-limited, scoped access. Unlike permanent credentials that remain valid indefinitely if compromised, valet keys automatically expire, limiting the blast radius of any security breach. The key is to set appropriate expiration times—long enough for legitimate operations but short enough to minimize risk. Combined with least-privilege scoping (specific resources and operations only), valet keys provide better security than proxying everything through servers with permanent credentials.”
Red Flag 2: “You should generate valet keys with long expiration times (days or weeks) to avoid user frustration with expired tokens.”
Why It’s Wrong: Long-lived tokens defeat the primary security benefit of the pattern—automatic expiration. A token valid for days creates a large attack window if stolen. The correct approach is to use short tokens (minutes to hours) and implement proper token refresh logic on the client side. User frustration comes from poor error handling, not short token lifetimes. If your client properly handles expiration and refreshes tokens proactively, users never experience token expiration errors.
What to Say Instead: “Token lifetime should be based on the expected operation duration with a safety margin, typically 15 minutes to 2 hours. For operations that might exceed this (large file uploads on slow connections), implement resumable upload protocols with token refresh logic. The client should monitor token expiration and request new tokens before expiration, making the process transparent to users. Long-lived tokens (days) should only be used in exceptional cases with strong justification and additional security controls like IP restrictions.”
Red Flag 3: “The Valet Key pattern eliminates the need for authorization checks since the cloud service handles access control.”
Why It’s Wrong: This confuses authentication with authorization. The cloud service validates that the token is cryptographically valid and not expired (authentication), but YOUR application must perform authorization checks BEFORE generating the token. The cloud service doesn’t know your business logic—whether this user should access this specific resource based on your application’s rules (ownership, permissions, subscription status, etc.). Authorization happens at token generation time in your application code.
What to Say Instead: “Authorization is critical and happens before token generation. When a client requests a valet key, your application must authenticate the user and perform full authorization checks: Does this user own this resource? Do they have permission for this operation? Are they within quota limits? Only after confirming authorization do you generate the token. The cloud service then validates the token’s cryptographic signature and expiration, but it doesn’t re-evaluate your business logic. This separation means you maintain full control over who can access what, while the cloud service handles the mechanics of validating and enforcing the token.”
Red Flag 4: “You can’t revoke valet keys, so they’re unsuitable for sensitive data.”
Why It’s Wrong: While valet keys can’t be revoked in the traditional sense (they’re stateless), this doesn’t make them unsuitable for sensitive data. The security model is different: instead of relying on revocation, you rely on short expiration times and least-privilege scoping. For the rare cases where immediate revocation is needed (account compromise), you can implement a small, time-limited revocation list. The pattern is widely used for sensitive data at companies like Netflix, Dropbox, and financial institutions.
What to Say Instead: “Valet keys use a different security model than traditional credentials. Instead of long-lived credentials with revocation, they use short-lived credentials with automatic expiration. For most scenarios, 15-minute tokens provide adequate security—if compromised, they’re useless after 15 minutes. For cases requiring immediate revocation (account compromise, lost device), implement a hybrid approach: maintain a small revocation list for recently issued tokens (last hour) and force re-authentication. The revocation list stays small because you only need to track tokens issued in the last expiration window. This approach combines the scalability benefits of stateless tokens with the security of revocation when needed.”
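The bounded-size property of the emergency revocation list can be sketched as follows; the class and token IDs are illustrative, and a real deployment would back this with a shared cache like Redis rather than process memory.

```python
class RevocationList:
    """Emergency revocation for stateless tokens. Only tokens issued within the
    last max-lifetime window can still be live, so the list never grows beyond
    one expiration window's worth of entries."""

    def __init__(self, max_token_ttl_s=3600):
        self.max_ttl = max_token_ttl_s
        self.revoked = {}  # token_id -> revoked_at

    def revoke(self, token_id, now):
        self.revoked[token_id] = now

    def is_revoked(self, token_id, now):
        self.prune(now)
        return token_id in self.revoked

    def prune(self, now):
        # Entries older than the max lifetime can no longer match a live token,
        # so they are safe to drop -- this is what keeps the list small.
        cutoff = now - self.max_ttl
        self.revoked = {t: ts for t, ts in self.revoked.items() if ts >= cutoff}
```

The design depends on short token lifetimes: the shorter the maximum TTL, the smaller the revocation list and the cheaper the check on every access.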
Red Flag 5: “Valet keys should be generated on the client side to reduce server load.”
Why It’s Wrong: Generating tokens client-side requires distributing your signing credentials (AWS secret keys, Azure storage account keys) to clients, which is a massive security risk. If credentials are embedded in client code, they can be extracted and used to generate unlimited tokens with any permissions. Token generation must happen server-side where you can securely store credentials and enforce authorization. The only exception is when clients have temporary credentials from a secure token service (AWS STS), but even then, the initial credential issuance must be server-side.
What to Say Instead: “Token generation must always happen server-side to protect signing credentials. Your application server securely stores the credentials needed to sign tokens (AWS secret keys, Azure storage account keys) and generates tokens only after performing authorization checks. Clients should never have access to signing credentials. For high-scale scenarios where token generation becomes a bottleneck, optimize server-side generation with caching (cache authorization decisions), use faster signing algorithms, or implement a dedicated token generation service. The only scenario for client-side generation is when clients have temporary credentials from AWS STS or similar services, but those temporary credentials themselves must be issued server-side after authentication and authorization.”
Key Takeaways
- The Valet Key pattern offloads data transfer from application servers to cloud services, using time-limited, cryptographically signed tokens that grant direct access to specific resources. This eliminates bandwidth costs, reduces latency, and improves scalability by letting clients interact directly with storage or queues while maintaining security through token-based access control.
- Security depends on three pillars: least-privilege scoping, short expiration times, and cryptographic integrity. Every token should grant the minimum permissions necessary (specific resource, specific operation), expire quickly (typically 15 minutes to 2 hours), and use cryptographic signatures that cloud services can verify independently. This ensures that even if tokens are compromised, the blast radius is limited.
- Token lifetime must balance security and user experience, calculated based on expected operation duration with safety margins. For large file uploads, implement resumable protocols (S3 multipart upload, GCS resumable uploads) with token refresh logic rather than using dangerously long token lifetimes. Monitor token expiration errors in production to tune lifetimes appropriately.
- Authorization happens at token generation time, not at resource access time. Your application must authenticate users and perform full authorization checks before generating tokens—the cloud service only validates token authenticity and expiration, not your business logic. This separation maintains security while enabling direct access.
- Audit logging requires a two-layer approach: application logs for token generation (who requested what, when) and cloud service logs for actual operations (who accessed what, when). Correlate these logs using unique request IDs to maintain complete audit trails despite data bypassing your application servers. This is critical for compliance and security monitoring in production systems.
Related Topics
Prerequisites:
- API Gateway Patterns - Understanding how API gateways handle authentication and routing helps contextualize where valet key generation fits in request flow
- Authentication & Authorization - Core concepts of authn/authz are essential before understanding how valet keys delegate limited access
- Object Storage - Knowledge of S3, Azure Blob Storage, and GCS is necessary to understand what valet keys are granting access to
Related Patterns:
- Claim Check Pattern - Another pattern for handling large payloads, but uses reference tokens instead of direct access tokens
- Throttling Pattern - Rate limiting applies to valet key generation endpoints to prevent abuse
- CDN & Edge Caching - Signed URLs in CDNs are a variant of valet keys for content delivery
Follow-up Topics:
- Multipart Upload Protocols - Deep dive into resumable uploads that work with valet keys for large files
- Security Token Service (STS) - Advanced credential management for client-side token generation scenarios
- Audit Logging at Scale - Comprehensive logging strategies when data bypasses application servers