Cache Coherency: MESI And The Gossip Protocol


Yesterday we inspected the TLB, the cache of memory translations.

Today we inspect the cache of memory itself.

Modern CPUs are too fast for DRAM.

So each core keeps caches.

Then every core pretends memory is still coherent.

This is not natural.

This is enforced gossip.

I. The Problem

Two cores can cache the same memory line.

One core writes.

The other core must not continue reading the old value forever like a ministry using last year’s census.

flowchart LR
    MEM["main memory<br/>x = 1"]
    C0["Core 0 cache<br/>x = 1"]
    C1["Core 1 cache<br/>x = 1"]
    WRITE["Core 0 writes x = 2"]
    INVALIDATE["Core 1 copy invalidated"]

    MEM --> C0
    MEM --> C1
    C0 --> WRITE --> INVALIDATE --> C1

The system needs rules so caches agree enough for software to work.

Those rules are cache coherency protocols.

II. Cache Lines

Caches move memory in chunks called cache lines.

A common line size on modern x86 systems is 64 bytes.

That means two variables can accidentally share one line.

struct counters {
    volatile long a;
    volatile long b;
};

If thread 1 updates a and thread 2 updates b, they may still fight if both live on the same cache line.

This is false sharing.

The data is different.

The cache-line province is the same.

Two citizens own separate apartments but share one front door, and both insist on painting it every cycle.

III. MESI

MESI is a common conceptual coherency protocol family:

StateMeaningKim translation
Modifiedthis cache has the only valid dirty copyI changed the law
Exclusivethis cache has the only clean copyI own it, but memory agrees
Sharedmultiple caches may hold clean copiespublic document
Invalidthis cache line cannot be usedpapers revoked

The exact protocols in real CPUs may be MESI variants such as MOESI or MESIF.

But MESI teaches the core idea.

Caches track ownership and validity.

Writes require authority.

Authority requires notifying others.

IV. Write Ownership

To write a cache line, a core needs exclusive ownership.

If other cores have shared copies, they must be invalidated or otherwise coordinated.

stateDiagram-v2
    [*] --> Invalid
    Invalid --> Exclusive: read, no other copy
    Invalid --> Shared: read, others also read
    Exclusive --> Modified: local write
    Shared --> Modified: request ownership, invalidate others
    Modified --> Shared: another core reads
    Modified --> Invalid: another core takes ownership
    Shared --> Invalid: another core writes

The line moves through political states.

The cache does not ask whether your algorithm is elegant.

It asks who owns the line.

V. Performance Pain

Coherency makes shared-memory programming possible.

It also makes bad sharing expensive.

PatternResult
read-only shared datacheap after caching
one writer, many readersinvalidations when writes happen
many writers same linecache-line ping-pong
false sharingperformance collapse without logical data conflict
lock contentionline ownership bounces between cores

This is why low-level performance engineers pad structures:

struct padded_counter {
    alignas(64) volatile long value;
    char padding[64 - sizeof(long)];
};

The goal is not beauty.

The goal is to stop two cores from fighting over the same 64-byte apartment block.

VI. Coherency Is Not Consistency

Coherency means cores agree on the value of a single memory location under the protocol.

Memory consistency / memory ordering defines how operations to different locations appear ordered.

These are related but not identical.

ConceptQuestion
coherencywhat is the latest value of this location?
consistencyin what order do operations become visible?
atomicitycan this update be torn or interleaved?
fenceshow do we constrain reordering?

On x86, memory ordering is relatively strong compared with many architectures, but not “do anything and pray” strong.

On weaker memory models, programmers need more explicit barriers.

The cache may keep the provinces coherent.

The empire still needs laws about time.

VII. Why Locks Are Cache Objects

A lock is often just a memory word.

When many cores spin on it, the cache line containing the lock becomes a battlefield.

Core 0 wants lock
Core 1 wants lock
Core 2 wants lock
Core 3 wants lock

One cache line becomes the capital city.
Everyone shells it.

Good locks, queues, per-CPU data structures, RCU-like designs, and batching are not only software patterns.

They are diplomacy with the cache-coherency regime.

The faster the CPU becomes, the more expensive bad sharing looks.

VIII. The Real Story (Suppressed)

Officially, MESI means Modified, Exclusive, Shared, Invalid.

Suppressed expansion:

Ministry Enforces Shared Information.

The original fifth state was Paranoid, used when a cache line suspected another core was planning a write but could not prove it.

Intel rejected this because it described too much of enterprise software.

AMD proposed MOESI.

The extra Owned state was accepted because every empire eventually invents landlords.

IX. The Lesson

Cache coherency lets multiple cores pretend shared memory is a sane abstraction.

It is not free.

It requires protocols, invalidations, ownership transfers, line states, and interconnect traffic.

The decree:

  • caches operate on lines, not variables
  • writes require ownership
  • false sharing is real
  • locks are cache-line politics
  • coherency is not the same as memory ordering
  • performance bugs can be invisible in source code and obvious on the bus

Tomorrow we discuss what happens when the CPU must stop what it is doing:

interrupts.

— Kim Jong Rails, Supreme Leader of the Republic of Derails