Page Tables: The Bureaucracy Of Virtual Memory
Yesterday we met the MMU, the border guard of memory.
Today we inspect the paperwork the guard reads:
page tables.
If the MMU is the checkpoint,
page tables are the registry office.
They say which virtual pages exist, where they go, and what crimes they are allowed to commit.
I. The Simple Lie
A virtual page maps to a physical frame.
That is the simple story.
virtual page 0x12345 -> physical frame 0xabcde
offset stays the same
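The simple story can be sketched in a few lines. This is a toy model, not how any kernel stores its mappings: the `translate` function and the flat `mapping` dict are hypothetical names chosen for illustration, and a Python `KeyError` stands in for a page fault.

```python
PAGE_SHIFT = 12                      # 4 KiB pages: 2**12 bytes per page
PAGE_MASK = (1 << PAGE_SHIFT) - 1    # low 12 bits are the offset inside the page

def translate(vaddr, mapping):
    """Translate a virtual address with a flat {virtual page: physical frame} dict."""
    vpn = vaddr >> PAGE_SHIFT        # virtual page number
    offset = vaddr & PAGE_MASK       # the offset stays the same
    frame = mapping[vpn]             # KeyError here plays the role of a page fault
    return (frame << PAGE_SHIFT) | offset

mapping = {0x12345: 0xABCDE}
paddr = translate(0x12345678, mapping)
print(hex(paddr))                    # 0xabcde678
```

Only the page number is looked up. The low twelve bits pass through untouched.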
But a real address space is large and sparse. It may contain many separate regions:
- executable code
- shared libraries
- heap
- stack
- memory-mapped files
- kernel mappings
- guard pages
- device mappings
- empty holes waiting to catch fools
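A flat table for all possible virtual pages would be enormous: a 48-bit virtual address space holds 2^36 pages of 4 KiB, and at eight bytes per entry that is 512 GiB of table per address space.
So modern architectures use multi-level page tables, which allocate lower-level tables only for the regions that are actually mapped.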
Translation becomes a bureaucratic walk through offices.
II. Four-Level x86-64 Paging
Common x86-64 paging uses four levels:
flowchart LR
VA["virtual address"]
PML4["PML4"]
PDPT["PDPT"]
PD["Page Directory"]
PT["Page Table"]
PAGE["4 KiB physical page"]
VA --> PML4 --> PDPT --> PD --> PT --> PAGE
For 4 KiB pages, the virtual address is split into index fields plus an offset.
Conceptually:
virtual address bits (field widths in bits):
[ PML4: 9 ][ PDPT: 9 ][ PD: 9 ][ PT: 9 ][ page offset: 12 ]
Nine bits select one of 512 entries at each level.
Twelve bits select a byte inside the 4 KiB page.
The numbers are not decoration.
They are why page-table pages are 4 KiB and hold 512 eight-byte entries.
The Ministry loves symmetry when it can tax it.
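The field split can be written as shifts and masks. This is a minimal sketch assuming 4-level paging with 4 KiB pages; `split_va` is a hypothetical helper name, not an API from any library.

```python
def split_va(vaddr):
    """Split a virtual address into (PML4, PDPT, PD, PT, offset) fields."""
    offset = vaddr & 0xFFF           # 12 bits: byte inside the 4 KiB page
    pt = (vaddr >> 12) & 0x1FF       # 9 bits: index into the page table
    pd = (vaddr >> 21) & 0x1FF       # 9 bits: index into the page directory
    pdpt = (vaddr >> 30) & 0x1FF     # 9 bits: index into the PDPT
    pml4 = (vaddr >> 39) & 0x1FF     # 9 bits: index into the PML4
    return pml4, pdpt, pd, pt, offset

# Build an address from known indices and take it apart again.
vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 0x56
print(split_va(vaddr))               # (1, 2, 3, 4, 86)

# The arithmetic behind the table size: 512 entries of 8 bytes fill one 4 KiB page.
assert (1 << 9) * 8 == 4096
```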
III. CR3: The Address-Space Passport
On x86, the CR3 register holds the physical address of the root of the active page-table hierarchy (the PML4 in 4-level paging).
When the operating system switches from one process to another, it can switch address spaces by changing the page-table root.
Conceptually:
; simplified: load new page-table root
mov cr3, rax
Do not paste this into your shell.
This is the kind of instruction that belongs to kernels, hypervisors, boot code, and people who already know which machine they are about to ruin.
| Register / structure | Job |
|---|---|
| CR3 | root pointer for current address-space translations |
| PML4 | top-level page map in 4-level paging |
| PTE | final entry for a 4 KiB page |
| physical frame | actual memory backing the page |
Every process receives a different map.
The physical RAM is shared.
The lies are personalized.
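A toy walk shows how the same virtual address lands in different frames depending on the root. This sketch assumes nested Python dicts in place of real page-table pages; `map_page`, `walk`, and the `cr3` variable are hypothetical stand-ins, and returning `None` plays the role of a page fault.

```python
LEVEL_SHIFTS = (39, 30, 21, 12)      # PML4, PDPT, PD, PT

def map_page(root, vaddr, frame):
    """Install vaddr -> frame, creating intermediate tables on demand."""
    table = root
    for shift in LEVEL_SHIFTS[:-1]:
        table = table.setdefault((vaddr >> shift) & 0x1FF, {})
    table[(vaddr >> 12) & 0x1FF] = frame

def walk(root, vaddr):
    """Follow the four levels from the root; None means 'not present'."""
    entry = root
    for shift in LEVEL_SHIFTS:
        index = (vaddr >> shift) & 0x1FF
        if index not in entry:
            return None              # real hardware would raise a page fault
        entry = entry[index]
    return (entry << 12) | (vaddr & 0xFFF)

root_a, root_b = {}, {}              # one hierarchy per process
map_page(root_a, 0x400000, 0x1111)
map_page(root_b, 0x400000, 0x2222)

cr3 = root_a                         # "running" process A
print(hex(walk(cr3, 0x400123)))      # 0x1111123
cr3 = root_b                         # context switch: only the root changes
print(hex(walk(cr3, 0x400123)))      # 0x2222123
```

Nothing about the address changed between the two lookups. Only the root did.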
IV. Page-Table Entries
A page-table entry does not merely point to memory.
It carries policy.
Typical x86 page-entry concepts include:
| Bit / field | Meaning |
|---|---|
| present | translation exists |
| read/write | writes allowed if set |
| user/supervisor | userland access allowed if set |
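| accessed | set by hardware when the page is read or written through this entry |
| dirty | set by hardware when the page is written through this entry |
| page size | entry maps a large page directly instead of pointing to a lower-level table |
| global | TLB entry survives a CR3 reload (when global pages are enabled), useful for mappings shared by every address space |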
| NX | no-execute, if supported/enabled |
A page entry is a tiny dictatorship:
physical frame address + permission bits = law
The OS writes the law.
The MMU enforces it.
The process complains on social media.
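The law can be read bit by bit. The sketch below uses the x86-64 bit positions for these flags (present 0, read/write 1, user/supervisor 2, accessed 5, dirty 6, page size 7, global 8, NX 63) and the frame address in bits 51:12. The `decode_pte` helper and its dict layout are hypothetical, and the sketch ignores the other bits an entry carries.

```python
PTE_FLAGS = {
    "present": 0,
    "writable": 1,
    "user": 2,
    "accessed": 5,
    "dirty": 6,
    "page_size": 7,   # meaningful in PD/PDPT entries; in a final 4 KiB PTE this bit is PAT
    "global": 8,
    "nx": 63,
}
ADDR_MASK = 0x000F_FFFF_FFFF_F000    # bits 51:12 hold the physical frame address

def decode_pte(pte):
    """Return (physical frame number, {flag name: bool}) for a 64-bit entry."""
    flags = {name: bool((pte >> bit) & 1) for name, bit in PTE_FLAGS.items()}
    frame = (pte & ADDR_MASK) >> 12
    return frame, flags

# A present, writable, user-accessible, non-executable data page.
pte = (0xABCDE << 12) | 0b111 | (1 << 63)
frame, flags = decode_pte(pte)
print(hex(frame), [name for name, on in flags.items() if on])
```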
V. Huge Pages
Not every mapping has to end at a 4 KiB page.
x86-64 can also use larger pages, depending on mode and support: a 2 MiB page ends the walk at the page directory, and a 1 GiB page ends it at the PDPT.
| Page size | Why use it | Cost |
|---|---|---|
| 4 KiB | fine-grained protection and allocation | many entries |
| 2 MiB | fewer translations, better TLB reach | internal fragmentation |
| 1 GiB | huge mapping efficiency | coarse and expensive to reserve |
Huge pages are useful for databases, hypervisors, large memory workloads, and anything that wants fewer translation entries.
But huge pages are not free magic.
They trade precision for fewer bureaucrats.
The dictator approves in principle.
The memory allocator files objections.
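The trade can be put in numbers. This is a rough sketch: `entries_needed` and `tlb_reach` are hypothetical helpers, and the 64-entry TLB is an assumed figure for illustration, not a claim about any particular CPU.

```python
KiB, MiB, GiB = 1 << 10, 1 << 20, 1 << 30

def entries_needed(region_bytes, page_bytes):
    """Final-level translations needed to cover a region, rounded up."""
    return -(-region_bytes // page_bytes)

def tlb_reach(tlb_entries, page_bytes):
    """Memory covered when every TLB entry maps one page of this size."""
    return tlb_entries * page_bytes

TLB_ENTRIES = 64                     # assumed size, for illustration only

for name, size in (("4 KiB", 4 * KiB), ("2 MiB", 2 * MiB), ("1 GiB", GiB)):
    print(name,
          "entries for 1 GiB:", entries_needed(GiB, size),
          "reach in KiB:", tlb_reach(TLB_ENTRIES, size) // KiB)
```

Mapping 1 GiB takes 262144 entries with 4 KiB pages, 512 with 2 MiB pages, and one with a 1 GiB page.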
VI. Copy-On-Write
Page tables make fork() elegant.
When a Unix process forks, the kernel does not need to copy all memory immediately.
It can map the same physical pages into both processes as read-only and mark them copy-on-write.
When one process tries to write, a page fault occurs. The kernel then copies that page and updates the writer’s page table.
sequenceDiagram
participant P as Parent
participant K as Kernel
participant C as Child
participant M as MMU
P->>K: fork()
K->>P: map page read-only COW
K->>C: map same page read-only COW
C->>M: write attempt
M-->>K: page fault
K->>C: allocate private copy and resume
This is the kind of trick that makes operating systems feel civilized.
Nobody copies what may never be modified.
The state delays work until a citizen commits a write crime.
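The bookkeeping can be modeled in miniature. This is a toy simulation, not kernel code: the `Frame` and `AddressSpace` classes, the reference count, and the dict-shaped entries are all hypothetical, and the write-fault path is an ordinary method call instead of a hardware exception.

```python
class Frame:
    """A physical page with a count of how many mappings point at it."""
    def __init__(self, data):
        self.data = bytearray(data)
        self.refs = 1

class AddressSpace:
    def __init__(self):
        self.pages = {}              # vpn -> {"frame", "writable", "cow"}

    def fork(self):
        """Share every frame with the child; writable pages become read-only COW."""
        child = AddressSpace()
        for vpn, pte in self.pages.items():
            if pte["writable"]:
                pte["writable"], pte["cow"] = False, True
            pte["frame"].refs += 1
            child.pages[vpn] = dict(pte)   # same Frame object, separate entry
        return child

    def write(self, vpn, offset, value):
        pte = self.pages[vpn]
        if not pte["writable"]:            # this branch stands in for the page fault
            if not pte["cow"]:
                raise PermissionError("write to a read-only page")
            old = pte["frame"]
            if old.refs > 1:               # still shared: take a private copy
                old.refs -= 1
                pte["frame"] = Frame(old.data)
            pte["writable"], pte["cow"] = True, False
        pte["frame"].data[offset] = value

parent = AddressSpace()
parent.pages[0] = {"frame": Frame(bytes(4096)), "writable": True, "cow": False}
child = parent.fork()
shared = child.pages[0]["frame"] is parent.pages[0]["frame"]
child.write(0, 0, 0x41)                    # the write crime: child gets its own copy
print(shared, parent.pages[0]["frame"].data[0], child.pages[0]["frame"].data[0])
```

If the other process later writes and finds itself the last holder of the frame, this sketch skips the copy and simply restores write permission.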
VII. Kernel Mappings
Many operating systems map kernel memory into every process address space, protected by supervisor-only permissions.
This makes system calls and interrupts efficient because the kernel is already mapped when control changes privilege.
But speculative execution attacks changed the comfort level.
After Meltdown-class issues, many systems adopted kernel page-table isolation (KPTI on Linux): while user code runs, the active page tables keep only the minimal kernel mappings needed to enter the kernel, and the full kernel map is switched in on entry.
This is the hardware lesson:
permissions are not only logical.
Microarchitecture can make forbidden knowledge observable through side channels.
The page table said no.
The cache whispered yes.
VIII. The Real Story (Suppressed)
Officially, PTE means Page Table Entry.
Suppressed expansion:
Permission To Exist.
If the present bit is clear, the page does not exist as far as the MMU is concerned, and any access faults.
If writable is clear, the page may not be modified.
If NX is set, the page may not execute.
The page asks:
“May I live?”
The PTE replies:
“Only as data.”
This is why the kernel is the Supreme Bureaucrat.
It does not allocate memory.
It issues visas.
IX. The Lesson
Page tables are not a side detail.
They are the operating system’s map of reality.
They enable:
- process isolation
- memory permissions
- demand paging
- copy-on-write
- memory-mapped files
- kernel/user separation
- huge pages
- virtualization foundations
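But walking them is expensive: on a four-level hierarchy, a translation with no cached help can cost four extra memory reads before the real access happens.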
Tomorrow we inspect the cache that prevents every memory access from becoming a committee meeting:
the TLB.
— Kim Jong Rails, Supreme Leader of the Republic of Derails