May 27, 2026

Cache Coherency: MESI And The Gossip Protocol

Yesterday we inspected the TLB, the cache of memory translations.

Today we inspect the cache of memory itself.

Modern CPUs are too fast for DRAM.

So each core keeps caches.

Then every core pretends memory is still coherent.

This is not natural.

This is enforced gossip.

I. The Problem

Two cores can cache the same memory line.

One core writes.

The other core must not continue reading the old value forever like a ministry using last year’s census.

flowchart LR
    MEM["main memory<br/>x = 1"]
    C0["Core 0 cache<br/>x = 1"]
    C1["Core 1 cache<br/>x = 1"]
    WRITE["Core 0 writes x = 2"]
    INVALIDATE["Core 1 copy invalidated"]

    MEM --> C0
    MEM --> C1
    C0 --> WRITE --> INVALIDATE --> C1

The system needs rules so caches agree enough for software to work.

Those rules are cache coherency protocols.

II. Cache Lines

Caches move memory in chunks called cache lines.

A common line size on modern x86 systems is 64 bytes.

That means two variables can accidentally share one line.

struct counters {
    volatile long a;
    volatile long b;
};

If thread 1 updates a and thread 2 updates b, they may still fight if both live on the same cache line.

This is false sharing.

The data is different.

The cache-line province is the same.

Two citizens own separate apartments but share one front door, and both insist on painting it every cycle.

III. MESI

MESI is a common conceptual coherency protocol family:

State	Meaning	Kim translation
Modified	this cache has the only valid dirty copy	I changed the law
Exclusive	this cache has the only clean copy	I own it, but memory agrees
Shared	multiple caches may hold clean copies	public document
Invalid	this cache line cannot be used	papers revoked

The exact protocols in real CPUs may be MESI variants such as MOESI or MESIF.

But MESI teaches the core idea.

Caches track ownership and validity.

Writes require authority.

Authority requires notifying others.

IV. Write Ownership

To write a cache line, a core needs exclusive ownership.

If other cores have shared copies, they must be invalidated or otherwise coordinated.

stateDiagram-v2
    [*] --> Invalid
    Invalid --> Exclusive: read, no other copy
    Invalid --> Shared: read, others also read
    Exclusive --> Modified: local write
    Shared --> Modified: request ownership, invalidate others
    Modified --> Shared: another core reads
    Modified --> Invalid: another core takes ownership
    Shared --> Invalid: another core writes

The line moves through political states.

The cache does not ask whether your algorithm is elegant.

It asks who owns the line.

V. Performance Pain

Coherency makes shared-memory programming possible.

It also makes bad sharing expensive.

Pattern	Result
read-only shared data	cheap after caching
one writer, many readers	invalidations when writes happen
many writers same line	cache-line ping-pong
false sharing	performance collapse without logical data conflict
lock contention	line ownership bounces between cores

This is why low-level performance engineers pad structures:

struct padded_counter {
    alignas(64) volatile long value;
    char padding[64 - sizeof(long)];
};

The goal is not beauty.

The goal is to stop two cores from fighting over the same 64-byte apartment block.

VI. Coherency Is Not Consistency

Coherency means cores agree on the value of a single memory location under the protocol.

Memory consistency / memory ordering defines how operations to different locations appear ordered.

These are related but not identical.

Concept	Question
coherency	what is the latest value of this location?
consistency	in what order do operations become visible?
atomicity	can this update be torn or interleaved?
fences	how do we constrain reordering?

On x86, memory ordering is relatively strong compared with many architectures, but not “do anything and pray” strong.

On weaker memory models, programmers need more explicit barriers.

The cache may keep the provinces coherent.

The empire still needs laws about time.

VII. Why Locks Are Cache Objects

A lock is often just a memory word.

When many cores spin on it, the cache line containing the lock becomes a battlefield.

Core 0 wants lock
Core 1 wants lock
Core 2 wants lock
Core 3 wants lock

One cache line becomes the capital city.
Everyone shells it.

Good locks, queues, per-CPU data structures, RCU-like designs, and batching are not only software patterns.

They are diplomacy with the cache-coherency regime.

The faster the CPU becomes, the more expensive bad sharing looks.

VIII. The Real Story (Suppressed)

Officially, MESI means Modified, Exclusive, Shared, Invalid.

Suppressed expansion:

Ministry Enforces Shared Information.

The original fifth state was Paranoid, used when a cache line suspected another core was planning a write but could not prove it.

Intel rejected this because it described too much of enterprise software.

AMD proposed MOESI.

The extra Owned state was accepted because every empire eventually invents landlords.

IX. The Lesson

Cache coherency lets multiple cores pretend shared memory is a sane abstraction.

It is not free.

It requires protocols, invalidations, ownership transfers, line states, and interconnect traffic.

The decree:

caches operate on lines, not variables
writes require ownership
false sharing is real
locks are cache-line politics
coherency is not the same as memory ordering
performance bugs can be invisible in source code and obvious on the bus

Tomorrow we discuss what happens when the CPU must stop what it is doing:

interrupts.

— Kim Jong Rails, Supreme Leader of the Republic of Derails