TLB: The Cache That Lies For Speed
Yesterday we inspected page tables, the registry office of virtual memory.
Today we inspect the shortcut:
the TLB, Translation Lookaside Buffer.
The TLB is a cache of address translations.
Without it, virtual memory would still work.
It would also feel like every memory access required a passport interview.
I. Why It Exists
A page-table walk can require several dependent memory reads, one per level of the table.
On a four-level x86-64 page table, a miss can walk:
PML4 -> PDPT -> Page Directory -> Page Table -> final page
Doing that for every load, store, and instruction fetch would be intolerable.
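To see what the walk is indexing with, here is a minimal sketch that splits a virtual address into the four table indices and the page offset. It assumes 48-bit addresses, 4 KiB pages, and 9 index bits per level; the function name `split_virtual_address` and the short level labels are hypothetical, chosen for illustration only.

```python
# Sketch: split a 48-bit x86-64 virtual address into four 9-bit table
# indices plus a 12-bit page offset (assumes 4 KiB pages, four levels).

LEVELS = ("PML4", "PDPT", "PD", "PT")  # top level first
INDEX_BITS = 9      # 512 entries per table
OFFSET_BITS = 12    # 4 KiB page


def split_virtual_address(va: int) -> dict:
    """Return the per-level table indices and the page offset for va."""
    parts = {"offset": va & ((1 << OFFSET_BITS) - 1)}
    for depth, name in enumerate(LEVELS):
        # PML4 index sits in bits 47..39, PT index in bits 20..12.
        shift = OFFSET_BITS + INDEX_BITS * (len(LEVELS) - 1 - depth)
        parts[name] = (va >> shift) & ((1 << INDEX_BITS) - 1)
    return parts


if __name__ == "__main__":
    # Build an address whose indices are 1, 2, 3, 4 and whose offset is 5.
    va = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
    print(split_virtual_address(va))
```

Each index selects one entry in one table, and each table lives in memory. Four indices means four reads before the data itself is touched.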
So the CPU caches translations:
| Structure | What it caches |
|---|---|
| instruction TLB | translations for instruction fetch |
| data TLB | translations for loads and stores |
| second-level TLB | larger, slower translation cache, often unified for instructions and data |
| page-walk caches | pieces of upper-level page-table walks |
The TLB remembers:
virtual page -> physical frame + permissions
It is the border guard’s notebook.
II. The Fast Path
```mermaid
flowchart TB
    VA["virtual address"]
    TLB["TLB lookup"]
    HIT["hit<br/>physical address immediately"]
    MISS["miss<br/>walk page tables"]
    FILL["fill TLB"]
    PA["physical address"]
    FAULT["page fault"]
    VA --> TLB
    TLB -->|hit| HIT --> PA
    TLB -->|miss| MISS
    MISS -->|valid| FILL --> PA
    MISS -->|invalid| FAULT
```
The best translation is the one the CPU does not have to rediscover.
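The hit, miss, fill, and fault paths can be sketched as a toy model. This is not how hardware is built: a dict stands in for the TLB, another dict stands in for the page tables, and the names `ToyMMU`, `PageFault`, and `walks` are hypothetical.

```python
# Toy model of the fast path. A dict plays the TLB and another dict
# plays the page tables. All names here are hypothetical.

PAGE_SHIFT = 12  # assumes 4 KiB pages


class PageFault(Exception):
    """Raised when the page tables hold no valid mapping."""


class ToyMMU:
    def __init__(self, page_table: dict):
        self.page_table = page_table  # vpn -> (pfn, perms): the truth
        self.tlb = {}                 # cached copy of some of that truth
        self.walks = 0                # how many slow walks were needed

    def translate(self, va: int) -> int:
        vpn = va >> PAGE_SHIFT
        offset = va & ((1 << PAGE_SHIFT) - 1)
        entry = self.tlb.get(vpn)
        if entry is None:                      # miss: walk the tables
            self.walks += 1
            entry = self.page_table.get(vpn)
            if entry is None:                  # invalid: page fault
                raise PageFault(hex(va))
            self.tlb[vpn] = entry              # fill the TLB
        pfn, _perms = entry
        return (pfn << PAGE_SHIFT) | offset


if __name__ == "__main__":
    mmu = ToyMMU({0x10: (0x80, "rw")})
    print(hex(mmu.translate(0x10123)), mmu.walks)  # first access walks
    print(hex(mmu.translate(0x10456)), mmu.walks)  # second access hits
```

The second access to the same page does not increase `walks`. That is the whole business model.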
This is the central TLB lesson:
page tables define truth,
the TLB remembers a useful copy of truth,
and stale truth is treason.
III. TLB Flushes
If the operating system changes a page table, the CPU may still have the old translation cached.
That cached translation must be invalidated.
On x86, the INVLPG instruction invalidates the cached translation for a specific page. Writing CR3 traditionally flushes all non-global translations, though with PCID enabled the flush can be limited to one tagged context or skipped.
Conceptually:
```asm
; invalidate cached translation for address in rax
invlpg [rax]
```
This is not a userspace command: INVLPG is privileged and faults outside ring 0.
This is kernel ritual.
The OS changes the law, then tells the checkpoint to forget the old stamp.
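What happens when the ritual is skipped can be shown with a small simulation. This is a sketch under loud assumptions: dicts stand in for hardware, `ToyTLB` and its methods are hypothetical names, and `invalidate_page` only imitates the role INVLPG plays.

```python
# Toy demonstration of a stale translation and its cure.


class ToyTLB:
    """One core's translation cache in front of a page-table dict."""

    def __init__(self, page_table: dict):
        self.page_table = page_table  # vpn -> pfn, the authoritative map
        self.cache = {}               # vpn -> pfn, possibly stale

    def lookup(self, vpn: int) -> int:
        if vpn not in self.cache:             # miss: consult the truth
            self.cache[vpn] = self.page_table[vpn]
        return self.cache[vpn]                # hit: trust the notebook

    def invalidate_page(self, vpn: int) -> None:
        """Forget one translation, the role INVLPG plays for one page."""
        self.cache.pop(vpn, None)

    def flush(self) -> None:
        """Forget everything, roughly a plain CR3 reload minus global pages."""
        self.cache.clear()


if __name__ == "__main__":
    table = {0x10: 0x80}
    tlb = ToyTLB(table)
    tlb.lookup(0x10)              # caches 0x10 -> 0x80
    table[0x10] = 0x90            # the OS changes the law...
    print(hex(tlb.lookup(0x10)))  # ...but the TLB still answers with 0x80
    tlb.invalidate_page(0x10)     # tell the checkpoint to forget
    print(hex(tlb.lookup(0x10)))  # the next lookup re-reads the table
```

Between the page-table write and the invalidation, the cache and the table disagree. Real kernels keep that window as small as they can.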
IV. Shootdowns
On multiprocessor systems, each CPU core has its own TLB and may have cached the same translation.
If one core changes page tables, other cores may need invalidation too.
This is called a TLB shootdown.
The name is accurate.
The kernel sends inter-processor interrupts so other CPUs invalidate affected entries.
```mermaid
sequenceDiagram
    participant K0 as Kernel on CPU 0
    participant PT as Page table
    participant C1 as CPU 1 TLB
    participant C2 as CPU 2 TLB
    K0->>PT: change mapping
    K0->>C1: shootdown IPI
    K0->>C2: shootdown IPI
    C1-->>K0: invalidated
    C2-->>K0: invalidated
```
Changing memory maps is no longer a private decision.
It is a diplomatic incident involving every core that may have seen the old border.
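The sequence can be sketched as a toy simulation. The assumptions are generous: a method call stands in for the inter-processor interrupt, every other CPU is targeted rather than only those that may hold the mapping, and `ToyCPU`, `handle_shootdown`, and `change_mapping` are hypothetical names.

```python
# Toy shootdown: each CPU object owns a private TLB dict. The initiator
# updates the shared page table, invalidates locally, then "sends an IPI"
# (a plain method call here) to every other CPU and collects the acks.


class ToyCPU:
    def __init__(self, cpu_id: int):
        self.cpu_id = cpu_id
        self.tlb = {}  # vpn -> pfn

    def handle_shootdown(self, vpn: int) -> int:
        """IPI handler: drop the entry and acknowledge with our id."""
        self.tlb.pop(vpn, None)
        return self.cpu_id


def change_mapping(page_table, cpus, initiator, vpn, new_pfn):
    """Update the mapping and invalidate vpn on every CPU; return acks."""
    page_table[vpn] = new_pfn
    initiator.tlb.pop(vpn, None)
    return [c.handle_shootdown(vpn) for c in cpus if c is not initiator]


if __name__ == "__main__":
    table = {0x10: 0x80}
    cpus = [ToyCPU(i) for i in range(3)]
    for cpu in cpus:
        cpu.tlb[0x10] = 0x80          # everyone has seen the old border
    acks = change_mapping(table, cpus, cpus[0], 0x10, 0x90)
    print("acks from CPUs:", acks)
    print("stale entries left:", sum(0x10 in c.tlb for c in cpus))
```

The initiator cannot safely reuse the old frame until every ack has arrived. That wait is the real cost of a shootdown.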
V. PCID And ASID
If every context switch flushed all translations, every process would resume with a cold TLB and pay for page walks to refill it.
Modern processors use tags to distinguish address spaces.
On x86, PCID means Process-Context Identifier. Other architectures often use ASID, Address Space Identifier.
The idea:
```
without tag:
  virtual page -> physical frame
with tag:
  (address-space tag, virtual page) -> physical frame
```
This lets the TLB keep translations for multiple address spaces without confusing one process for another.
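The tagging idea fits in a few lines. This is a sketch, not a hardware model: `TaggedTLB` and its methods are hypothetical names, and the tag is an arbitrary small integer standing in for a PCID or ASID.

```python
# Toy tagged TLB: the lookup key is (address-space tag, virtual page).


class TaggedTLB:
    def __init__(self):
        self.entries = {}  # (asid, vpn) -> pfn

    def fill(self, asid: int, vpn: int, pfn: int) -> None:
        self.entries[(asid, vpn)] = pfn

    def lookup(self, asid: int, vpn: int):
        """Return the cached frame, or None on a miss for this tag."""
        return self.entries.get((asid, vpn))

    def flush_asid(self, asid: int) -> None:
        """Drop only the entries that belong to one address space."""
        self.entries = {
            key: pfn for key, pfn in self.entries.items() if key[0] != asid
        }


if __name__ == "__main__":
    tlb = TaggedTLB()
    tlb.fill(asid=1, vpn=0x10, pfn=0x80)   # process A
    tlb.fill(asid=2, vpn=0x10, pfn=0x90)   # process B, same virtual page
    print(hex(tlb.lookup(1, 0x10)), hex(tlb.lookup(2, 0x10)))
    tlb.flush_asid(1)                      # A's entries go, B's stay
    print(tlb.lookup(1, 0x10), hex(tlb.lookup(2, 0x10)))
```

The same virtual page resolves to different frames depending on the tag, and a switch between the two processes requires no flush at all.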
| Term | Meaning |
|---|---|
| ASID | architecture-generic idea of tagging address spaces |
| PCID | x86 process-context identifier feature |
| global page | mapping marked global so it survives flushes of non-global entries |
| shootdown | forced invalidation across CPUs |
The TLB becomes a clerk with multiple stamp books.
This is faster.
It also gives kernel engineers more ways to be wrong.
VI. Security Consequences
The TLB is part of what makes isolation affordable.
It is also where security hardening sends the bill.
Kernel Page Table Isolation, deployed against Meltdown-class attacks, made every user/kernel transition more expensive because user mode and kernel mode now run on separate page tables, and switching between them means reloading CR3. PCID reduced some of that pain on supporting hardware by letting the switch avoid a full TLB flush.
The deeper lesson:
security changes the shape of translation.
Translation changes the shape of performance.
Performance changes what people are willing to secure.
The Ministry does not like this cycle.
Reality does not care.
VII. Huge Pages And TLB Reach
The TLB has finite entries.
If each entry maps 4 KiB, a small number of entries covers a modest memory range.
If each entry maps 2 MiB or 1 GiB, the same number of entries covers much more memory.
This is TLB reach: the number of entries multiplied by the page size each entry maps.
| Page size | TLB reach effect |
|---|---|
| 4 KiB | fine granularity, more entries needed |
| 2 MiB | covers more memory per entry |
| 1 GiB | huge reach, coarse mapping |
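The arithmetic behind the table is short enough to run. The entry count of 64 is an assumption for illustration, not a figure for any specific CPU, and real TLBs often hold different numbers of entries for different page sizes. `tlb_reach` is a hypothetical helper.

```python
# Sketch: TLB reach = entries * page size, for one assumed entry count.

KIB = 1 << 10
MIB = 1 << 20
GIB = 1 << 30

ASSUMED_ENTRIES = 64  # hypothetical TLB size, not a specific CPU


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of address space covered when every entry is in use."""
    return entries * page_size


if __name__ == "__main__":
    for label, size in (("4 KiB", 4 * KIB), ("2 MiB", 2 * MIB), ("1 GiB", GIB)):
        reach_kib = tlb_reach(ASSUMED_ENTRIES, size) // KIB
        print(f"{label} pages: {reach_kib} KiB of reach")
```

With 64 entries, 4 KiB pages reach 256 KiB, 2 MiB pages reach 128 MiB, and 1 GiB pages reach 64 GiB. Same clerk, same notebook, larger provinces.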
This is why huge pages can help performance:
less translation bookkeeping,
fewer misses,
fewer page walks.
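The cost is reduced flexibility: a huge page needs a large, contiguous, aligned block of physical memory, and permissions apply to the whole block at once.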
The dictator likes large provinces.
The city planner prefers districts.
VIII. The Real Story (Suppressed)
Officially, TLB means Translation Lookaside Buffer.
Suppressed expansion:
Tiny Liar Bureau.
Because the TLB lies for speed.
It says:
“I already checked the paperwork.”
Usually this is true.
Sometimes the kernel changed the paperwork and forgot to notify every clerk.
Then the system learns why stale authority is dangerous.
IX. The Lesson
The TLB is why virtual memory is fast enough to be invisible.
It caches translations, carries permission knowledge, complicates context switches, forces shootdowns, benefits from PCID/ASID tagging, and makes huge pages attractive.
Page tables are truth.
The TLB is remembered truth.
Remembered truth must be invalidated when the regime changes.
Tomorrow we leave address translation and inspect a different kind of gossip:
cache coherency.
— Kim Jong Rails, Supreme Leader of the Republic of Derails