Cache Coherency: MESI And The Gossip Protocol
Yesterday we inspected the TLB, the cache of memory translations.
Today we inspect the cache of memory itself.
Modern CPUs are too fast for DRAM.
So each core keeps caches.
Then every core pretends memory is still coherent.
This is not natural.
This is enforced gossip.
I. The Problem
Two cores can cache the same memory line.
One core writes.
The other core must not continue reading the old value forever like a ministry using last year’s census.
flowchart LR
MEM["main memory<br/>x = 1"]
C0["Core 0 cache<br/>x = 1"]
C1["Core 1 cache<br/>x = 1"]
WRITE["Core 0 writes x = 2"]
INVALIDATE["Core 1 copy invalidated"]
MEM --> C0
MEM --> C1
C0 --> WRITE --> INVALIDATE --> C1
The system needs rules so caches agree enough for software to work.
Those rules are cache coherency protocols.
II. Cache Lines
Caches move memory in chunks called cache lines.
A common line size on modern x86 systems is 64 bytes.
That means two variables can accidentally share one line.
struct counters {
volatile long a;
volatile long b;
};
If thread 1 updates a and thread 2 updates b, they may still fight if both live on the same cache line.
This is false sharing.
The data is different.
The cache-line province is the same.
Two citizens own separate apartments but share one front door, and both insist on painting it every cycle.
III. MESI
MESI is a common conceptual coherency protocol family:
| State | Meaning | Kim translation |
|---|---|---|
| Modified | this cache has the only valid dirty copy | I changed the law |
| Exclusive | this cache has the only clean copy | I own it, but memory agrees |
| Shared | multiple caches may hold clean copies | public document |
| Invalid | this cache line cannot be used | papers revoked |
The exact protocols in real CPUs may be MESI variants such as MOESI or MESIF.
But MESI teaches the core idea.
Caches track ownership and validity.
Writes require authority.
Authority requires notifying others.
IV. Write Ownership
To write a cache line, a core needs exclusive ownership.
If other cores have shared copies, they must be invalidated or otherwise coordinated.
stateDiagram-v2
[*] --> Invalid
Invalid --> Exclusive: read, no other copy
Invalid --> Shared: read, others also read
Exclusive --> Modified: local write
Shared --> Modified: request ownership, invalidate others
Modified --> Shared: another core reads
Modified --> Invalid: another core takes ownership
Shared --> Invalid: another core writes
The line moves through political states.
The cache does not ask whether your algorithm is elegant.
It asks who owns the line.
V. Performance Pain
Coherency makes shared-memory programming possible.
It also makes bad sharing expensive.
| Pattern | Result |
|---|---|
| read-only shared data | cheap after caching |
| one writer, many readers | invalidations when writes happen |
| many writers same line | cache-line ping-pong |
| false sharing | performance collapse without logical data conflict |
| lock contention | line ownership bounces between cores |
This is why low-level performance engineers pad structures:
struct padded_counter {
alignas(64) volatile long value;
char padding[64 - sizeof(long)];
};
The goal is not beauty.
The goal is to stop two cores from fighting over the same 64-byte apartment block.
VI. Coherency Is Not Consistency
Coherency means cores agree on the value of a single memory location under the protocol.
Memory consistency / memory ordering defines how operations to different locations appear ordered.
These are related but not identical.
| Concept | Question |
|---|---|
| coherency | what is the latest value of this location? |
| consistency | in what order do operations become visible? |
| atomicity | can this update be torn or interleaved? |
| fences | how do we constrain reordering? |
On x86, memory ordering is relatively strong compared with many architectures, but not “do anything and pray” strong.
On weaker memory models, programmers need more explicit barriers.
The cache may keep the provinces coherent.
The empire still needs laws about time.
VII. Why Locks Are Cache Objects
A lock is often just a memory word.
When many cores spin on it, the cache line containing the lock becomes a battlefield.
Core 0 wants lock
Core 1 wants lock
Core 2 wants lock
Core 3 wants lock
One cache line becomes the capital city.
Everyone shells it.
Good locks, queues, per-CPU data structures, RCU-like designs, and batching are not only software patterns.
They are diplomacy with the cache-coherency regime.
The faster the CPU becomes, the more expensive bad sharing looks.
VIII. The Real Story (Suppressed)
Officially, MESI means Modified, Exclusive, Shared, Invalid.
Suppressed expansion:
Ministry Enforces Shared Information.
The original fifth state was Paranoid, used when a cache line suspected another core was planning a write but could not prove it.
Intel rejected this because it described too much of enterprise software.
AMD proposed MOESI.
The extra Owned state was accepted because every empire eventually invents landlords.
IX. The Lesson
Cache coherency lets multiple cores pretend shared memory is a sane abstraction.
It is not free.
It requires protocols, invalidations, ownership transfers, line states, and interconnect traffic.
The decree:
- caches operate on lines, not variables
- writes require ownership
- false sharing is real
- locks are cache-line politics
- coherency is not the same as memory ordering
- performance bugs can be invisible in source code and obvious on the bus
Tomorrow we discuss what happens when the CPU must stop what it is doing:
interrupts.
— Kim Jong Rails, Supreme Leader of the Republic of Derails