# Memory Consistency Models for Shared-Memory Multiprocessors (Slide content courtesy of Kourosh Gharachorloe.) Todd C. Mowry 15-418: Memory Consistency Models 1



15-418: Memory Consistency Models

Carnegie Mellon

Todd C. Mowry





## Uniprocessor Memory Model • Memory model specifies ordering constraints among accesses • Uniprocessor model: memory accesses atomic and in program order write A write B read A read B • Not necessary to maintain sequential order for correctness - hardware: buffering, pipelining - compiler: register allocation, code motion • Simple for programmers • Allows for high performance Carnegie Mellon Todd C. Mowry







### Need for a Multiprocessor Memory Model Provide reasonable ordering constraints on memory accesses affects programmers affects system designers

15-418: Memory Consistency Models

### Carnegie Mellon Todd C. Mowry

### Sequential Consistency

- Formalized by Lamport
  - accesses of each processor in program order
  - all accesses appear in sequential order



Any order implicitly assumed by programmer is maintained

15-418: Memory Consistency Models

Carnegie Mellon
Todd C. Mowry

### Memory Behavior

What should the semantics be for memory operations to the shared memory?

- ease-of-use: keep it similar to serial semantics for uniprocessor
- operating system community used concurrent programming:
  - multiple processes interleaved on a single processor
- Lamport (1979) formalized Sequential Consistency (SC):
  - "... the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program."

15-418: Memory Consistency Models

Todd C. Mowry

### Example with SC

### Simple Synchronization:

$$\frac{P1}{A} = 1 \qquad (a)$$

$$Flag = 1 \quad (b)$$

$$x = Flag \quad (c)$$

$$y = A \quad (d)$$

- all locations are initialized to 0
- possible outcomes for (x,y):
  - (0,0), (0,1), (1,1)
- (x,y) = (1,0) is not a possible outcome:
  - we know a->b and c->d by program order
  - b->c implies that a->d
  - y==0 implies d->a which leads to a contradiction

15-418: Memory Consistency Models

Carnegie Mellon Todd C. Mowry

### Another Example with SC

### From Dekker's Algorithm:

<u>P1</u> <u>P2</u> B = 1 (c)

- all locations are initialized to 0
- possible outcomes for (x,y):
  - (0,1), (1,0), (1,1)
- (x,y) = (0,0) is not a possible outcome:
  - a->b and c->d implied by program order
  - -x = 0 implies b->c which implies a->d
  - a->d says y = 1 which leads to a contradiction
  - similarly, y = 0 implies x = 1 which is also a contradiction

15-418: Memory Consistency Models

Carnegie Mellon

Todd C. Mowry

### How to Guarantee SC

- Sufficient Conditions for SC (Dubois et al., 1987):
  - assumes general cache coherence (if we have caches):
    - writes to the same location are observed in same order by all P's
  - for each processor, delay issue of access until previous completes
    - a read completes when the return value is back
    - a write completes when new value is visible to all processors
      - for simplicity, we assume writes are atomic
- Important to note that these are not necessary conditions for maintaining SC

15-418: Memory Consistency Models

Todd C. Mowry

### Simple Bus-Based Multiprocessor



- · assume write-back caches
- general cache coherence maintained by serialization at bus
  - writes to same location serialized and observed in the same order by all
- writes are atomic because all processors observe the write at the same time
- accesses from a single process complete in program order:
  - cache is busy while servicing a miss, effectively delaying later access
- SC is guaranteed without any extra mechanism above coherence

15-418: Memory Consistency Models

Carnegie Mellon Todd C. Mowry

Example of Complication in Bus-Based Machines P1 Pn L1 Cache L1 Cache write-buffer write-buffer L2 Cache L2 Cache write-buffer with no forwarding

- 1st level cache write-thru, 2nd level write-back (e.g., SGI cluster in DASH)
  - (reads to 2nd level delayed until buffer empty)
- never hit in the 1st level cache: SC is maintained (same as previous slide)
- read hits in the first level cache cause complication
  - (e.g., Dekker's algorithm)
- to maintain SC, we need to delay access to 1st level until there are no writes pending in write buffer (full write latency observed by processor)
- multiprocessors may not maintain SC to achieve higher performance

15-418: Memory Consistency Models

Carnegie Mellon Todd C. Mowry









### Relaxed Models

- Processor consistency (PC) Goodman 89
- Total store ordering (TSO) Sindhu 90
- · Causal memory Hutto 90
- PRAM Lipton 90
- Partial store ordering (PSO) Sindhu 90
- Weak ordering (WO) Dubois 86
- Problems:
  - programming complexity
  - portability

15-418: Memory Consistency Models

Carnegie Mellon

### Todd C. Mowry 15-418: Memory Consistency Models

Framework for Programming Simplicity

• Develop a unified framework that provides

- all previous performance optimizations, and more

- programming simplicity

Todd C. Mowry

### Intuition

- "Correctness": same results as sequential consistency
- Most programs don't require strict ordering for "correctness"

Program Order

Sufficient Order

- Difficult to automatically determine orders that are not necessary
- Specify methodology for writing "safe" programs

15-418: Memory Consistency Models

Carnegie Mellon Todd C. Mowry

### Overview of Framework

- Programmer:
  - methodology for writing programs
- System Designer:
  - safe optimizations for such programs

15-418: Memory Consistency Models

Carnegie Mellon Todd C. Mowry





### Identifying Data Races and Synchronization Two accesses conflict if: - access same location - at least one is a write • Order accesses by: program order (po) - dependence order (do): op1 --> op2 if op2 reads op1 <u>P2</u> Write A Write Flag Read Flag **↓** po Read A · Data Race: - two conflicting accesses on different processors - not ordered by intervening accesses Carnegie Mellon Todd C. Mowry 15-418: Memory Consistency Models



### Outline Memory Consistency Models • Framework for Programmer Simplicity Performance Evaluation Carnegie Mellon Todd C. Mowry 15-418: Memory Consistency Models



### Performance Evaluation • Goal: characterize gains from relaxed models - relaxed models effective in hiding memory latency - enhance gains from other latency hiding techniques Todd C. Mowry 15-418: Memory Consistency Models 30































# Other Gains from Relaxed Models Common compiler optimizations require reordering of accesses e.g., register allocation, code motion, loop transformation Sequential consistency disallows reordering of shared accesses What model is best for compiler optimizations? intermediate models (e.g. PC) not flexible enough weak ordering and release consistency only models that work Carnegie Mellon Todd C. Mowry

### Summary of Interaction with Other Techniques Release consistency complements prefetching and multiple contexts gains over prefetching: 1.1 - 1.4x gains over multiple contexts: 1.2 - 1.3x lockup-free caches common requirement

