Volatile vs Atomic Variables: Java Concurrency Guide
TL;DR
Volatile and atomic variables enable lock-free concurrency by ensuring memory visibility and atomic operations across threads. While volatile guarantees visibility of changes, atomic operations provide both visibility and thread-safe read-modify-write operations using hardware-level compare-and-swap (CAS) instructions.
Core Concept
What Are Volatile and Atomic Variables?
Volatile variables guarantee that reads and writes are visible across all threads immediately. When a thread writes to a volatile variable, that write is flushed to main memory. When another thread reads it, the value is fetched from main memory, not from a CPU cache. This solves visibility problems but does NOT make compound operations (like increment) thread-safe.
Atomic variables go further: they provide both visibility AND thread-safe operations. An atomic integer’s increment operation is guaranteed to complete without interference from other threads. Under the hood, atomic operations use compare-and-swap (CAS) — a CPU instruction that atomically checks if a value matches an expected value and, if so, updates it.
Why Do We Need Them?
Modern CPUs use caching and instruction reordering for performance. Thread A might write x = 5, but Thread B might still see x = 0 because:
- The write is cached in Thread A’s CPU core
- The compiler or CPU reordered instructions
Locks solve this but are heavyweight. Volatile and atomic variables provide lock-free alternatives for specific scenarios.
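The lost-update hazard behind these problems can be made concrete without real threads by replaying the interleaving by hand. This is a deterministic simulation of two threads racing on counter++, not actual concurrency:

```python
# counter += 1 is really three steps: read, add, write.
# Replay the problematic interleaving of two "threads" explicitly.
counter = 0

t1_read = counter       # Thread 1 reads 0
t2_read = counter       # Thread 2 also reads 0, before Thread 1 writes back
counter = t1_read + 1   # Thread 1 writes 1
counter = t2_read + 1   # Thread 2 overwrites with 1 -- one increment is lost

print(counter)  # 1, not 2
```

Two increments ran, but the final value is 1: exactly the failure mode volatile alone cannot prevent and atomics are designed to fix.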
Compare-and-Swap (CAS)
CAS is the foundation of lock-free programming. The operation works like:
if current_value == expected_value:
    current_value = new_value
    return True
else:
    return False
This entire check-and-update happens atomically at the hardware level. If CAS fails (another thread changed the value), you retry with the updated expected value. This is called a CAS loop or spin loop.
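The CAS loop can be turned into a runnable Python sketch. CPython exposes no user-level CAS instruction, so the compare-and-swap itself is emulated with a lock here (CASCell is a made-up helper name); the point is the retry-loop structure, not the emulation:

```python
import threading

class CASCell:
    """Holds an int; compare_and_swap emulates the hardware CAS with a lock."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        # Hardware does this check-and-update as one instruction;
        # a lock stands in for that atomicity in this sketch.
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(cell):
    # Classic CAS loop: read, compute, attempt, retry on failure.
    while True:
        current = cell.get()
        if cell.compare_and_swap(current, current + 1):
            return

cell = CASCell()
threads = [threading.Thread(target=lambda: [increment(cell) for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.get())  # 4000: no increments lost
```

Because every update goes through the CAS loop, all 4 x 1000 increments survive, with no lock held while computing the new value.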
When to Use Each
- Volatile: Use for simple flags or status variables read by multiple threads but written by one thread.
- Atomic: Use when you need thread-safe read-modify-write operations (increment, add, compare-and-set).
- Locks: Use when you need to protect multiple operations or complex state changes as a single transaction.
Visual Guide
Memory Visibility Problem Without Volatile
sequenceDiagram
participant T1 as Thread 1
participant C1 as CPU1 Cache
participant M as Main Memory
participant C2 as CPU2 Cache
participant T2 as Thread 2
T1->>C1: Write flag=true
Note over C1: Value cached locally
T2->>C2: Read flag
C2->>M: Fetch from memory
M->>C2: Returns flag=false
Note over T2: Sees stale value!
C1->>M: Eventually flushes
Note over M: Too late for Thread 2
Without volatile, Thread 2 may read a stale cached value because Thread 1’s write hasn’t been flushed to main memory yet.
Compare-and-Swap Operation Flow
graph TD
A[Start: counter=10] --> B{CAS: expected=10, new=11}
B -->|Match| C[Atomically set counter=11]
C --> D[Return True]
B -->|No Match| E[Another thread changed it]
E --> F[Return False]
F --> G[Retry with new expected value]
CAS atomically checks if the current value matches the expected value before updating. If another thread modified the value, CAS fails and you retry.
Atomic vs Lock Performance
graph LR
subgraph LB[Lock-Based]
L1[Thread 1] -->|acquire lock| L2[Critical Section]
L2 --> L3[release lock]
L4[Thread 2] -.->|blocked| L2
end
subgraph LF[Lock-Free Atomic]
A1[Thread 1] -->|CAS attempt| A2[Success/Retry]
A3[Thread 2] -->|CAS attempt| A4[Success/Retry]
end
Note1[Lock: Context switch overhead] --> LB
Note2[Atomic: No blocking, just retry] --> LF
Locks cause threads to block and context switch. Atomic operations allow threads to retry immediately without blocking, reducing overhead for low-contention scenarios.
Examples
Example 1: Volatile Flag (Python with threading)
import threading
import time

# Python doesn't have a volatile keyword, but this demonstrates the concept.
# In practice, use threading.Event or atomic libraries.
class VolatileFlag:
    def __init__(self):
        self._flag = False
        self._lock = threading.Lock()  # Ensures visibility

    def set(self):
        with self._lock:
            self._flag = True

    def is_set(self):
        with self._lock:
            return self._flag

# Worker thread
def worker(flag):
    print("Worker: Starting...")
    while not flag.is_set():
        pass  # Busy wait
    print("Worker: Flag detected, exiting")

flag = VolatileFlag()
thread = threading.Thread(target=worker, args=(flag,))
thread.start()
time.sleep(1)
print("Main: Setting flag")
flag.set()
thread.join()
print("Main: Done")
Expected Output:
Worker: Starting...
Main: Setting flag
Worker: Flag detected, exiting
Main: Done
Java Equivalent:
private volatile boolean flag = false;
// In worker thread:
while (!flag) {
    // Busy wait
}
Key Point: In Java/C++, the volatile keyword ensures visibility. Python requires explicit synchronization (locks or atomic operations from libraries).
Try it yourself: Remove the lock from the Python example and see whether the worker thread still detects the flag change. In CPython the GIL usually makes the update visible anyway, but that is an implementation detail, not a language guarantee; in Java or C++, omitting volatile or synchronization can genuinely leave the reader spinning on a stale value.
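In idiomatic Python, the lock-wrapped flag above is usually replaced by threading.Event, which gives the same publish-a-flag semantics plus a blocking wait instead of a busy loop. A minimal sketch (the log list is just for observing the order of events):

```python
import threading

stop = threading.Event()
log = []

def worker():
    log.append("Worker: Starting...")
    stop.wait()  # Blocks until the event is set -- no busy-wait loop needed
    log.append("Worker: Flag detected, exiting")

t = threading.Thread(target=worker)
t.start()
stop.set()   # Publish the flag; wait() returns in the worker
t.join()
print(log)   # ['Worker: Starting...', 'Worker: Flag detected, exiting']
```

Event.wait() also avoids burning a CPU core the way the busy-wait version does while the flag is unset.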
Example 2: Atomic Counter with CAS
import threading
from threading import Lock

class AtomicCounter:
    def __init__(self):
        self._value = 0
        self._lock = Lock()

    def increment(self):
        """Thread-safe increment using a lock (simulating atomic)."""
        with self._lock:
            self._value += 1

    def get(self):
        with self._lock:
            return self._value

# Process-shared variant: multiprocessing.Value. Note it also synchronizes
# with a lock internally -- the standard library has no true user-level
# CAS atomics.
from ctypes import c_int
import multiprocessing

class SharedCounter:
    def __init__(self):
        self._value = multiprocessing.Value(c_int, 0)

    def increment(self):
        with self._value.get_lock():
            self._value.value += 1

    def get(self):
        return self._value.value

# Test with multiple threads
def increment_counter(counter, times):
    for _ in range(times):
        counter.increment()

counter = AtomicCounter()
threads = []
for _ in range(10):
    t = threading.Thread(target=increment_counter, args=(counter, 1000))
    threads.append(t)
    t.start()
for t in threads:
    t.join()

print(f"Final count: {counter.get()}")
print(f"Expected: {10 * 1000}")
Expected Output:
Final count: 10000
Expected: 10000
Java Equivalent with Real Atomics:
import java.util.concurrent.atomic.AtomicInteger;
AtomicInteger counter = new AtomicInteger(0);
// In threads:
counter.incrementAndGet(); // Atomic increment using CAS
// Or manual CAS:
int current, next;
do {
    current = counter.get();
    next = current + 1;
} while (!counter.compareAndSet(current, next));
C++ Equivalent:
#include <atomic>
std::atomic<int> counter(0);
counter.fetch_add(1); // Atomic increment
// Manual CAS:
int expected = counter.load();
int desired = expected + 1;
while (!counter.compare_exchange_weak(expected, desired)) {
    desired = expected + 1;  // on failure, expected was updated to the current value
}
Try it yourself: Implement a lock-free stack using CAS operations where push and pop operations use compare-and-swap to update the head pointer.
Example 3: ABA Problem with CAS
import threading

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LockFreeStack:
    def __init__(self):
        self._head = None
        self._lock = threading.Lock()

    def push(self, value):
        new_node = Node(value)
        while True:
            # Read current head
            current_head = self._head
            new_node.next = current_head
            # Try CAS (simulated with a lock for demonstration)
            with self._lock:
                if self._head is current_head:  # Identity check, like a pointer CAS
                    self._head = new_node
                    return True
            # CAS failed, retry

    def pop(self):
        while True:
            current_head = self._head
            if current_head is None:
                return None
            next_node = current_head.next
            # Try CAS
            with self._lock:
                if self._head is current_head:
                    self._head = next_node
                    return current_head.value
            # CAS failed, retry

# Exercise the stack (the ABA hazard in this design is explained below)
stack = LockFreeStack()
stack.push(1)
stack.push(2)
print(f"Popped: {stack.pop()}")  # 2
print(f"Popped: {stack.pop()}")  # 1
print(f"Popped: {stack.pop()}")  # None
Expected Output:
Popped: 2
Popped: 1
Popped: None
The ABA Problem: Thread 1 reads head as A, gets preempted. Thread 2 pops A, pops B, pushes A back. Thread 1 resumes and CAS succeeds (head is still A), but the stack state changed! Solution: Use versioned references (AtomicStampedReference in Java).
Try it yourself: Add a version counter to each CAS operation to detect the ABA problem.
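The version-counter idea can be sketched in Python as a stamped reference, analogous to Java's AtomicStampedReference (StampedRef is a made-up name; the lock again stands in for an atomic double-word CAS). The CAS must match both value and stamp, so an A-to-B-to-A cycle is detected:

```python
import threading

class StampedRef:
    """Value plus a version stamp; CAS must match both."""
    def __init__(self, value):
        self._value = value
        self._stamp = 0
        self._lock = threading.Lock()  # Emulates an atomic double-word CAS

    def get(self):
        with self._lock:
            return self._value, self._stamp

    def compare_and_set(self, exp_value, new_value, exp_stamp, new_stamp):
        with self._lock:
            if self._value == exp_value and self._stamp == exp_stamp:
                self._value, self._stamp = new_value, new_stamp
                return True
            return False

ref = StampedRef("A")
value, stamp = ref.get()             # Thread 1 reads ("A", 0), then is "preempted"

# Thread 2 runs: A -> B -> A, bumping the stamp each time
ref.compare_and_set("A", "B", 0, 1)
ref.compare_and_set("B", "A", 1, 2)

# Thread 1 resumes: the value alone still matches, but the stamp does not
ok = ref.compare_and_set(value, "C", stamp, stamp + 1)
print(ok)  # False -- the ABA cycle was detected
```

A plain value-only CAS would have succeeded here and silently missed the intermediate changes; the stamp mismatch forces Thread 1 to re-read and retry.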
Common Mistakes
1. Using Volatile for Compound Operations
Mistake:
private volatile int counter = 0;
// NOT thread-safe!
counter++; // This is read-modify-write, not atomic
Why it’s wrong: counter++ is three operations: read, increment, write. Volatile only guarantees visibility of individual reads/writes, not atomicity of the compound operation. Two threads can read the same value, increment it, and both write back the same result.
Fix: Use AtomicInteger in Java or atomic operations in C++.
2. Forgetting Memory Ordering Guarantees
Mistake:
class DataPublisher:
    def __init__(self):
        self.data = None
        self.ready = False  # Should be volatile in Java/C++

    def publish(self, value):
        self.data = value
        self.ready = True  # Other threads might see ready=True but data=None!
Why it’s wrong: Without proper memory barriers, the compiler or CPU might reorder these writes. Another thread might see ready=True but still see the old data value.
Fix: Use volatile for ready (Java/C++) or proper synchronization primitives. The volatile write creates a memory barrier ensuring all previous writes are visible.
3. Infinite CAS Loops Under High Contention
Mistake:
AtomicInteger counter = new AtomicInteger(0);
// Can spin forever under extreme contention
int current, next;
do {
    current = counter.get();
    next = current + 1;
} while (!counter.compareAndSet(current, next));
Why it’s wrong: Under very high contention, CAS can fail repeatedly, wasting CPU cycles. Although lock-free algorithms guarantee that some thread always makes progress, any individual thread can be starved by endless retries.
Fix: Add exponential backoff or fall back to locks after N failed attempts:
int current = counter.get();
int next = current + 1;
int attempts = 0;
while (!counter.compareAndSet(current, next)) {
    if (++attempts > 100) {
        // Fall back to a lock or yield
        Thread.yield();
    }
    current = counter.get();
    next = current + 1;
}
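The same backoff idea can be sketched in Python, reusing the lock-emulated compare-and-swap pattern from the earlier examples (CASCell and increment_with_backoff are illustrative names, not a real library API). After each failed CAS the thread sleeps for an exponentially growing, jittered interval instead of spinning flat out:

```python
import threading
import time
import random

class CASCell:
    """Lock-emulated CAS cell, as in the earlier sketch."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment_with_backoff(cell, max_sleep=0.001):
    attempts = 0
    while True:
        current = cell.get()
        if cell.compare_and_swap(current, current + 1):
            return attempts
        attempts += 1
        # Exponential backoff with jitter: wait longer after each failure,
        # capped at max_sleep, so contended threads spread out their retries.
        time.sleep(random.uniform(0, min(max_sleep, 0.00001 * (2 ** attempts))))

cell = CASCell()
threads = [threading.Thread(target=lambda: [increment_with_backoff(cell) for _ in range(500)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.get())  # 2000
```

The jitter matters: if all losers back off by the same amount, they wake up and collide again in lockstep.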
4. Assuming Atomic Operations Are Always Faster
Mistake: Replacing all locks with atomic operations expecting better performance.
Why it’s wrong: Atomic operations excel in low-to-medium contention scenarios. Under high contention, the constant CAS retries can be slower than a lock that puts threads to sleep. Locks also provide better fairness.
Fix: Profile your code. Use atomics for simple operations with low contention. Use locks for complex critical sections or high contention.
5. Ignoring the ABA Problem
Mistake:
// Lock-free stack
Node current = head.get();
Node next = current.next;
if (head.compareAndSet(current, next)) {
// Success... or is it?
}
Why it’s wrong: Between reading current and the CAS, another thread might have popped several nodes and pushed current back. The CAS succeeds but the stack state is inconsistent.
Fix: Use AtomicStampedReference or AtomicMarkableReference in Java to include a version/stamp:
AtomicStampedReference<Node> head = new AtomicStampedReference<>(null, 0);
int[] stampHolder = new int[1];
Node current = head.get(stampHolder);
int stamp = stampHolder[0];
// ... later
head.compareAndSet(current, next, stamp, stamp + 1);
Interview Tips
Be Ready to Explain Memory Visibility
Interviewers often ask: “Why do we need volatile?” Don’t just say “for thread safety.” Explain CPU caching and how threads might see stale values. Draw a diagram showing Thread 1’s cache, main memory, and Thread 2’s cache. Mention that volatile creates a happens-before relationship — writes before a volatile write are visible to reads after a volatile read.
Know When to Use Atomic vs Lock
A common question: “When would you use AtomicInteger instead of synchronized?” Answer:
- Atomic: Single variable, simple operations (increment, compare-and-set), low-to-medium contention
- Lock: Multiple variables, complex operations, need to maintain invariants across multiple fields
Example: “For a counter, I’d use AtomicInteger. For a bank transfer updating two account balances, I’d use a lock to ensure both updates happen atomically.”
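The bank-transfer answer can be sketched as follows (class and method names are illustrative): one lock guards both balances, so the invariant that total money is constant holds at every point another thread can observe.

```python
import threading

class Bank:
    def __init__(self, a_balance, b_balance):
        self.a = a_balance
        self.b = b_balance
        self._lock = threading.Lock()  # One lock covers BOTH fields

    def transfer_a_to_b(self, amount):
        # Both updates happen inside one critical section, so no thread
        # can observe money created or destroyed mid-transfer.
        with self._lock:
            if self.a >= amount:
                self.a -= amount
                self.b += amount

    def total(self):
        with self._lock:
            return self.a + self.b

bank = Bank(1000, 0)
threads = [threading.Thread(target=lambda: [bank.transfer_a_to_b(1) for _ in range(100)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(bank.total())  # 1000 -- the invariant held throughout
```

Two separate atomics for a and b could not express this: each field would be individually consistent, but a reader could still see the debit without the credit.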
Demonstrate CAS Understanding with Code
If asked to implement a lock-free data structure, start with the CAS loop pattern:
do {
    current = atomicRef.get();
    next = computeNext(current);
} while (!atomicRef.compareAndSet(current, next));

Explain: “We read the current value, compute the new value, then attempt CAS. If another thread changed it, we retry with the updated value.”
Mention the ABA Problem
When discussing CAS, proactively mention the ABA problem. This shows depth: “One limitation of CAS is the ABA problem, where a value changes from A to B and back to A. The CAS succeeds but we missed intermediate state changes. Solutions include versioned references or hazard pointers.”
Compare Language Implementations
Interviewers appreciate breadth. Mention:
- Java: the volatile keyword and the java.util.concurrent.atomic package
- C++: std::atomic<T> with memory ordering parameters (relaxed, acquire, release, seq_cst)
- Python: no built-in volatile; use threading.Lock or libraries like atomics
- Go: channels and the sync/atomic package
Discuss Performance Trade-offs
If asked about performance, explain: “Atomic operations avoid context switches and kernel calls that locks require, making them faster for low contention. However, under high contention, the spinning can waste CPU. I’d profile to decide. For read-heavy workloads, I might use AtomicReference with immutable objects to avoid writes entirely.”
Practice Common Interview Questions
- “Implement a thread-safe counter without locks.” (Use AtomicInteger)
- “Why might volatile boolean be sufficient for a stop flag?” (Single writer, simple read/write, no compound operations)
- “What’s the difference between volatile and synchronized?” (Visibility vs atomicity + visibility)
- “Explain compare-and-swap at the hardware level.” (CPU instruction that atomically checks and updates)
- “When would lock-free algorithms perform worse than locks?” (High contention, complex operations)
Key Takeaways
- Volatile guarantees memory visibility across threads but does NOT make compound operations atomic. Use for simple flags or status variables.
- Atomic variables provide both visibility and thread-safe operations using compare-and-swap (CAS), enabling lock-free programming for simple scenarios.
- CAS operations atomically check if a value matches an expected value and update it if so, retrying on failure. This avoids locks but can spin under high contention.
- Memory visibility is critical: without volatile or atomics, threads may see stale cached values. Volatile writes create happens-before relationships ensuring visibility.
- Choose wisely: Use atomics for single-variable operations with low contention, locks for complex critical sections or high contention. Always profile to verify performance assumptions.