TLB: The Cache That Lies For Speed


Yesterday we inspected page tables, the registry office of virtual memory.

Today we inspect the shortcut:

the TLB, Translation Lookaside Buffer.

The TLB is a cache of address translations.

Without it, virtual memory would still work.

It would also feel like every memory access required a passport interview.

I. Why It Exists

A page-table walk can require multiple memory reads.

With four-level x86-64 paging, a TLB miss can force a walk through four tables before the final page is reached:

PML4 -> PDPT -> Page Directory -> Page Table -> final page
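To make the walk concrete, here is a minimal sketch of how a 48-bit virtual address splits into the four table indices and the page offset. It assumes 4 KiB pages and 9-bit indices per level; the function name `split_virtual_address` and the short level labels are illustrative, not taken from any real kernel.

```python
PAGE_SHIFT = 12   # 4 KiB pages: the low 12 bits are the byte offset
INDEX_BITS = 9    # each table holds 512 entries
LEVELS = ("PML4", "PDPT", "PD", "PT")  # top level first

def split_virtual_address(va):
    """Return the index used at each level of the walk, plus the page offset."""
    parts = {"offset": va & ((1 << PAGE_SHIFT) - 1)}
    for depth, name in enumerate(LEVELS):
        shift = PAGE_SHIFT + INDEX_BITS * (len(LEVELS) - 1 - depth)
        parts[name] = (va >> shift) & ((1 << INDEX_BITS) - 1)
    return parts

# Build an address from known indices so the split is easy to check by eye.
va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x5
print(split_virtual_address(va))
# {'offset': 5, 'PML4': 1, 'PDPT': 2, 'PD': 3, 'PT': 4}
```

Each index selects one entry in one table, so a full miss costs one memory read per level before the data itself can be touched.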

Doing that for every load, store, and instruction fetch would be intolerable.

So the CPU caches translations:

| Structure | What it caches |
| --- | --- |
| instruction TLB | translations for instruction fetch |
| data TLB | translations for loads and stores |
| second-level TLB | larger shared translation cache on many CPUs |
| page-walk caches | pieces of upper-level page-table walks |

The TLB remembers:

virtual page -> physical frame + permissions

It is the border guard’s notebook.

II. The Fast Path

```mermaid
flowchart TB
    VA["virtual address"]
    TLB["TLB lookup"]
    HIT["hit<br/>physical address immediately"]
    MISS["miss<br/>walk page tables"]
    FILL["fill TLB"]
    PA["physical address"]
    FAULT["page fault"]

    VA --> TLB
    TLB -->|hit| HIT --> PA
    TLB -->|miss| MISS
    MISS -->|valid| FILL --> PA
    MISS -->|invalid| FAULT
```

The best translation is the one the CPU does not have to rediscover.
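The same flow can be sketched as a toy model. This is a simplification under stated assumptions: a real TLB is a fixed-size hardware structure, while here a dictionary stands in for it and a flat, hypothetical `page_table` replaces the multi-level walk. The names `translate`, `PageFault`, and `stats` are invented for illustration.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    """Raised when the page table has no valid mapping."""

# Hypothetical flat page table: virtual page number -> (frame, permissions)
page_table = {0x10: (0x80, "rw"), 0x11: (0x81, "r")}
tlb = {}                        # cached copies of page_table entries
stats = {"hit": 0, "miss": 0}

def translate(va):
    vpn, offset = divmod(va, PAGE_SIZE)
    if vpn in tlb:
        stats["hit"] += 1                 # fast path: no walk needed
    else:
        stats["miss"] += 1
        if vpn not in page_table:
            raise PageFault(hex(va))      # invalid mapping
        tlb[vpn] = page_table[vpn]        # valid mapping: fill the TLB
    frame, _perms = tlb[vpn]
    return frame * PAGE_SIZE + offset

first = translate(0x10004)    # miss, then fill
second = translate(0x10008)   # hit on the same page
print(hex(first), hex(second), stats)
```

The second access lands on the same page as the first, so it is answered from the cached entry without consulting the page table.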

This is the central TLB lesson:

page tables define truth,

the TLB remembers a useful copy of truth,

and stale truth is treason.

III. TLB Flushes

If the operating system changes a page table, the CPU may still have the old translation cached.

That cached translation must be invalidated.

On x86, the INVLPG instruction invalidates the cached translation for a single page. Writing CR3 traditionally invalidates all non-global translations for the current context, though features such as PCID modify the story.

Conceptually:

```asm
; invalidate cached translation for address in rax
invlpg [rax]
```

This is not a userspace command.

This is kernel ritual.

The OS changes the law, then tells the checkpoint to forget the old stamp.
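A small simulation shows why the ritual matters. This is a sketch only: `invalidate_page` is a hypothetical stand-in for what INVLPG does in hardware, and the page and frame numbers are made up.

```python
page_table = {0x10: 0x80}   # virtual page -> physical frame (made-up values)
tlb = {}

def lookup(vpn):
    if vpn not in tlb:
        tlb[vpn] = page_table[vpn]   # miss: walk and fill
    return tlb[vpn]

def invalidate_page(vpn):
    """Stand-in for INVLPG: drop one cached translation."""
    tlb.pop(vpn, None)

before = lookup(0x10)       # caches 0x10 -> 0x80
page_table[0x10] = 0x90     # the OS changes the mapping
stale = lookup(0x10)        # the TLB still answers with the old frame
invalidate_page(0x10)
fresh = lookup(0x10)        # the next lookup re-reads the page table

print(hex(before), hex(stale), hex(fresh))
# 0x80 0x80 0x90
```

Until the invalidation runs, the cached entry keeps winning over the page table, which is exactly the stale translation the kernel has to prevent.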

IV. Shootdowns

On multiprocessor systems, every CPU core may have cached translations.

If one core changes page tables, other cores may need invalidation too.

This is called a TLB shootdown.

The name is accurate.

The kernel sends inter-processor interrupts so other CPUs invalidate affected entries.

```mermaid
sequenceDiagram
    participant K0 as Kernel on CPU 0
    participant PT as Page table
    participant C1 as CPU 1 TLB
    participant C2 as CPU 2 TLB

    K0->>PT: change mapping
    K0->>C1: shootdown IPI
    K0->>C2: shootdown IPI
    C1-->>K0: invalidated
    C2-->>K0: invalidated
```

Changing memory maps is no longer a private decision.

It is a diplomatic incident involving every core that may have seen the old border.
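The protocol can be sketched as a toy model. Everything here is an assumption made for illustration: the `Cpu` class, the synchronous method call standing in for an inter-processor interrupt, and the returned list standing in for acknowledgements. Real kernels batch, defer, and filter shootdowns in ways this sketch ignores.

```python
class Cpu:
    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.tlb = {}   # private per-core cache: virtual page -> frame

    def handle_shootdown(self, vpn):
        """Stand-in for the IPI handler: drop the entry and acknowledge."""
        self.tlb.pop(vpn, None)
        return self.cpu_id

def change_mapping(page_table, cpus, initiator, vpn, new_frame):
    page_table[vpn] = new_frame
    initiator.tlb.pop(vpn, None)          # local invalidation first
    # "Send" the shootdown to every other core and collect acknowledgements.
    return [cpu.handle_shootdown(vpn) for cpu in cpus if cpu is not initiator]

page_table = {0x10: 0x80, 0x11: 0x81}
cpus = [Cpu(0), Cpu(1), Cpu(2)]
for cpu in cpus:
    cpu.tlb.update(page_table)            # every core has seen the old border

acks = change_mapping(page_table, cpus, cpus[0], 0x10, 0x90)
print(acks)
# [1, 2]
```

Only the affected page is dropped from each core; the unrelated entry for page 0x11 stays cached everywhere.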

V. PCID And ASID

If every context switch flushed all translations, performance would suffer.

Modern processors use tags to distinguish address spaces.

On x86, PCID means Process-Context Identifier. Other architectures often use ASID, Address Space Identifier.

The idea:

without tag:
  virtual page -> physical frame

with tag:
  (address-space tag, virtual page) -> physical frame

This lets the TLB keep translations for multiple address spaces without confusing one process for another.
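As a minimal sketch, the difference comes down to the lookup key. The tag values 1 and 2 and the frame numbers below are arbitrary, and `tagged_lookup` is a hypothetical helper, not a hardware interface.

```python
# Untagged: two processes using the same virtual page collide on one key,
# so the cache must be flushed whenever the address space changes.
untagged = {}
untagged[0x10] = 0x80   # process A's mapping
untagged[0x10] = 0x90   # process B's mapping overwrites it

# Tagged: the address-space tag is part of the key, so both entries coexist.
tagged = {}
tagged[(1, 0x10)] = 0x80   # tag 1: process A
tagged[(2, 0x10)] = 0x90   # tag 2: process B

def tagged_lookup(tlb, tag, vpn):
    """Return the cached frame for this address space, or None on a miss."""
    return tlb.get((tag, vpn))

print(hex(tagged_lookup(tagged, 1, 0x10)), hex(tagged_lookup(tagged, 2, 0x10)))
# 0x80 0x90
```

A lookup under a tag that has no entry simply misses, so one process never receives another's translation.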

| Term | Meaning |
| --- | --- |
| ASID | architecture-generic idea of tagging address spaces |
| PCID | x86 process-context identifier feature |
| global page | mapping that can survive some context switches |
| shootdown | forced invalidation across CPUs |

The TLB becomes a clerk with multiple stamp books.

This is faster.

It also gives kernel engineers more ways to be wrong.

VI. Security Consequences

The TLB is part of what makes isolation fast.

It is also part of security hardening cost.

Kernel Page Table Isolation, deployed after Meltdown, made every transition between user mode and kernel mode more expensive: the two now run on separate page tables, and switching between them would otherwise discard cached translations. PCID reduced some of that pain on supporting hardware.

The deeper lesson:

security changes the shape of translation.

Translation changes the shape of performance.

Performance changes what people are willing to secure.

The Ministry does not like this cycle.

Reality does not care.

VII. Huge Pages And TLB Reach

The TLB has finite entries.

If each entry maps 4 KiB, a small number of entries covers a modest memory range.

If each entry maps 2 MiB or 1 GiB, the same number of entries covers much more memory.

This is TLB reach.

| Page size | TLB reach effect |
| --- | --- |
| 4 KiB | fine granularity, more entries needed |
| 2 MiB | covers more memory per entry |
| 1 GiB | huge reach, coarse mapping |
This is why huge pages can help performance:

less translation bookkeeping,

fewer misses,

fewer page walks.

The cost is reduced flexibility.

The dictator likes large provinces.

The city planner prefers districts.

VIII. The Real Story (Suppressed)

Officially, TLB means Translation Lookaside Buffer.

Suppressed expansion:

Tiny Liar Bureau.

Because the TLB lies for speed.

It says:

“I already checked the paperwork.”

Usually this is true.

Sometimes the kernel changed the paperwork and forgot to notify every clerk.

Then the system learns why stale authority is dangerous.

IX. The Lesson

The TLB is why virtual memory is fast enough to be invisible.

It caches translations, carries permission knowledge, complicates context switches, forces shootdowns, benefits from PCID/ASID tagging, and makes huge pages attractive.

Page tables are truth.

The TLB is remembered truth.

Remembered truth must be invalidated when the regime changes.

Tomorrow we leave address translation and inspect a different kind of gossip:

cache coherency.

— Kim Jong Rails, Supreme Leader of the Republic of Derails