15-213

"The course that gives CMU its Zip!"

### Virtual Memory October 15, 2009

### **Topics**

- Address spaces
- Motivations for virtual memory
- Address translation
- Accelerating translation with TLBs

lecture-14.ppt



### Simple Addressing Modes

- Normal (R) Mem[Reg[R]]
  - Register R specifies memory address

movl (%ecx),%eax

- Displacement D(R) Mem[Reg[R]+D]
  - Register R specifies start of memory region
  - Constant displacement D specifies offset

movl 8(%ebp),%edx
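
The two modes above can be sketched in C as a toy model. This is only an illustration under stated assumptions: memory is a small byte array, registers are plain integers, and the names `mem`, `load32`, `store32`, `load_normal`, and `load_displacement` are hypothetical, not part of the course material.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model (an assumption for illustration): memory is a small byte
 * array and a "register" is an integer holding an index into it. */
enum { MEM_SIZE = 64 };
uint8_t mem[MEM_SIZE];

/* Read the 32-bit value stored at mem[addr]. */
uint32_t load32(uint32_t addr)
{
    uint32_t value;
    assert(addr <= MEM_SIZE - sizeof value);
    memcpy(&value, &mem[addr], sizeof value);
    return value;
}

/* Write a 32-bit value at mem[addr]. */
void store32(uint32_t addr, uint32_t value)
{
    assert(addr <= MEM_SIZE - sizeof value);
    memcpy(&mem[addr], &value, sizeof value);
}

/* Normal mode (R): the effective address is Reg[R]. */
uint32_t load_normal(uint32_t reg)
{
    return load32(reg);
}

/* Displacement mode D(R): the effective address is Reg[R] + D. */
uint32_t load_displacement(int32_t d, uint32_t reg)
{
    return load32(reg + (uint32_t)d);
}
```

In this model, `movl (%ecx),%eax` corresponds to `eax = load_normal(ecx)` and `movl 8(%ebp),%edx` to `edx = load_displacement(8, ebp)`.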

From class04.ppt


### Let's think about this: physical memory?

### How does everything fit?

- 32-bit addresses: ~4,000,000,000 (4 billion) bytes
- 64-bit addresses: ~18,000,000,000,000,000,000 (18 quintillion) bytes
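
As a quick check on these magnitudes, here is a minimal sketch that computes the highest byte address for a given address width. The helper name `max_address` is hypothetical. The exact values are 2^32 - 1 = 4,294,967,295 and 2^64 - 1 = 18,446,744,073,709,551,615.

```c
#include <assert.h>
#include <stdint.h>

/* Highest byte address reachable with `bits` address bits (1..64).
 * The number of addresses is one more than this, i.e. 2^bits. */
uint64_t max_address(unsigned bits)
{
    assert(bits >= 1 && bits <= 64);
    /* Shifting a 64-bit value by 64 is undefined in C, so treat the
     * full-width case separately. */
    return bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}
```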

How to decide which memory to use in your program?

How about after a fork()?

What if another process stores data into your memory?

How could you debug your program?


### So, we add a level of indirection

### One simple trick solves all three problems

- Each process gets its own private image of memory
  - Appears to be a full-sized private memory range
  - This fixes "how to choose" and "others shouldn't mess with yours"
  - Surprisingly, it also fixes "making everything fit"
- Implementation: translate addresses transparently
  - Add a mapping function to map private addresses to physical addresses
  - Do the mapping on every load or store

This mapping trick is the heart of virtual memory
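
A minimal sketch of such a mapping function, assuming the mapping is kept as a per-process table indexed by fixed-size chunks of the address. The names `page_table_t` and `translate` and the sizes `PAGE_SIZE` and `NUM_VPAGES` are hypothetical choices for illustration, not the mechanism of any particular system.

```c
#include <assert.h>
#include <stdint.h>

enum { PAGE_SIZE = 4096, NUM_VPAGES = 8 };  /* assumed toy sizes */

/* One table per process: ppn[v] is the physical chunk that backs
 * chunk v of that process's private address range. */
typedef struct {
    uint32_t ppn[NUM_VPAGES];
} page_table_t;

/* Map a private (virtual) address to a physical address.
 * Conceptually this runs on every load and store. */
uint32_t translate(const page_table_t *pt, uint32_t vaddr)
{
    uint32_t vpn    = vaddr / PAGE_SIZE;  /* which chunk */
    uint32_t offset = vaddr % PAGE_SIZE;  /* position inside the chunk */
    assert(vpn < NUM_VPAGES);
    return pt->ppn[vpn] * PAGE_SIZE + offset;
}
```

With one table per process, the same private address can land at different physical addresses in two processes, so neither can store into the other's memory by accident.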


### **Address Spaces**

A *linear address space* is an ordered set of contiguous nonnegative integer addresses:

{0, 1, 2, 3, ... }

A virtual address space is a set of $N = 2^n$ virtual addresses:

{0, 1, 2, ..., N-1}

A physical address space is a set of $M = 2^m$ (for convenience) physical addresses:

{0, 1, 2, ..., M-1}

In a system based on virtual addressing, each byte of main memory has a physical address *and* one or more virtual addresses.










### DRAM Cache Organization

- DRAM cache organization is driven by the enormous miss penalty
  - DRAM is about 10x slower than SRAM
  - Disk is about 10,000x slower than DRAM to get the first byte, though fast for the next byte
- DRAM cache properties
  - Large page (block) size (typically 4-8 KB)
  - Fully associative
    - Any virtual page can be placed in any physical page
    - Requires a "large" mapping function, different from CPU caches
  - Highly sophisticated replacement algorithms
    - Too complicated and open-ended to be implemented in hardware
  - Write-back rather than write-through
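
To see why the miss penalty dominates the design, here is a rough sketch using the standard average-access-cost model (hit cost plus miss rate times miss penalty). The model and the function name `average_access_cost` are assumptions added for illustration. Only the roughly 10,000x disk-to-DRAM ratio comes from the slide.

```c
#include <assert.h>

/* Average cost per access in arbitrary units: every access pays
 * hit_cost, and a fraction miss_rate additionally pays miss_penalty. */
double average_access_cost(double hit_cost, double miss_rate,
                           double miss_penalty)
{
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_cost + miss_rate * miss_penalty;
}
```

Taking a DRAM access as 1 unit and a disk access as 10,000 units, a miss rate of only 0.1% gives `average_access_cost(1.0, 0.001, 10000.0)`, about 11 units, so memory looks roughly 11x slower than DRAM alone. That is the pressure behind large pages, full associativity, and careful replacement.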













### Why does it work? Locality

- Virtual memory works because of locality
- At any point in time, programs tend to access a set of active virtual pages called the working set
  - Programs with better temporal locality will have smaller working sets
- If (working set size < main memory size)
  - Good performance for one process after compulsory misses
- If (SUM(working set sizes) > main memory size)
  - Thrashing: performance meltdown where pages are swapped (copied) in and out continuously
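
The thrashing condition above can be written as a one-line check. This is a sketch only: the function name `would_thrash` is hypothetical, and sizes are measured in pages as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when the combined working sets (in pages) exceed main memory
 * (in pages), the condition the slide labels thrashing. */
bool would_thrash(const size_t *working_set_pages, size_t nprocs,
                  size_t memory_pages)
{
    size_t total = 0;
    assert(working_set_pages != NULL || nprocs == 0);
    for (size_t i = 0; i < nprocs; i++)
        total += working_set_pages[i];
    return total > memory_pages;
}
```

For example, working sets of 300, 500, and 400 pages total 1,200 pages, which exceeds a 1,024-page memory, while the first two alone (800 pages) fit.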

### (2) VM as a Tool for Memory Mgmt

- Key idea: each process has its own virtual address space
  - It can view memory as a simple linear array
  - Mapping function scatters addresses through physical memory
    - Well chosen mappings simplify memory allocation and management

Figure: address translation maps the virtual address space of each process (e.g. Process 1) onto physical pages (e.g. PP 10) of the physical address space (DRAM).
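
One way to see how scattering simplifies allocation: consecutive virtual pages can be backed by whichever physical pages happen to be free. The sketch below assumes a tiny physical memory of 8 pages. The names `ppage_in_use` and `map_pages` are hypothetical, and the first-free policy is just one simple choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NUM_PPAGES = 8 };  /* assumed toy physical memory size */

bool ppage_in_use[NUM_PPAGES];  /* which physical pages are taken */

/* Back virtual pages 0..count-1 of one process with any free physical
 * pages, recording the choice in table[vpn]. Returns false, changing
 * nothing, if too few physical pages are free. */
bool map_pages(int *table, size_t count)
{
    size_t free_pages = 0;
    size_t mapped = 0;

    assert(table != NULL || count == 0);
    for (int ppn = 0; ppn < NUM_PPAGES; ppn++)
        if (!ppage_in_use[ppn])
            free_pages++;
    if (free_pages < count)
        return false;

    /* The physical pages need not be adjacent: take them as found. */
    for (int ppn = 0; ppn < NUM_PPAGES && mapped < count; ppn++) {
        if (!ppage_in_use[ppn]) {
            ppage_in_use[ppn] = true;
            table[mapped++] = ppn;
        }
    }
    return true;
}
```

If physical pages 0 and 2 are already taken, a request for three pages receives physical pages 1, 3, and 4, yet the process still sees three contiguous virtual pages.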















### Speeding up Translation with a TLB

- Page table entries (PTEs) are cached in L1 like any other memory word
  - PTEs may be evicted by other data references
  - PTE hit still requires a 1-cycle delay
- Solution: Translation Lookaside Buffer (TLB)
  - Small hardware cache in MMU
  - Maps virtual page numbers to physical page numbers
  - Contains complete page table entries for small number of pages
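
A software sketch of the TLB idea, under loud assumptions: real TLBs are hardware, usually larger, and often set-associative, whereas this model is a 4-entry direct-mapped table. The names `tlb_t`, `tlb_lookup`, and `tlb_fill` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TLB_ENTRIES = 4 };  /* assumed; chosen small for illustration */

typedef struct {
    bool     valid;
    uint32_t vpn;  /* virtual page number, used as the tag */
    uint32_t ppn;  /* cached physical page number */
} tlb_entry_t;

typedef struct {
    tlb_entry_t entry[TLB_ENTRIES];
} tlb_t;

/* On a hit, store the translation in *ppn and return true.
 * On a miss, return false; the caller would consult the page table. */
bool tlb_lookup(const tlb_t *tlb, uint32_t vpn, uint32_t *ppn)
{
    const tlb_entry_t *e = &tlb->entry[vpn % TLB_ENTRIES];
    assert(ppn != NULL);
    if (e->valid && e->vpn == vpn) {
        *ppn = e->ppn;
        return true;
    }
    return false;
}

/* After a miss, install the translation found in the page table,
 * evicting whatever entry shared the slot. */
void tlb_fill(tlb_t *tlb, uint32_t vpn, uint32_t ppn)
{
    tlb_entry_t *e = &tlb->entry[vpn % TLB_ENTRIES];
    e->valid = true;
    e->vpn   = vpn;
    e->ppn   = ppn;
}
```

A first lookup of a page misses, the fill installs its entry, and later lookups hit without touching the page table until another page that maps to the same slot evicts it.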





































