### Instruction Scheduling: Introduction

Todd C. Mowry CS745: Optimizing Compilers

# Optimization: What's the Point? (A Quick Review)

### Machine-Independent Optimizations:

- e.g., constant propagation & folding, redundancy elimination, dead-code elimination, etc.
- · Goal: eliminate work

### Machine-Dependent Optimizations:

- register allocation
  - · Goal: reduce cost of accessing data
- instruction scheduling
  - · Goal: 222
- ..

CS745: Instruction Scheduling

Todd C. Mowry

# The Goal of Instruction Scheduling

- Assume that the remaining instructions are all essential
  - (otherwise, earlier passes would have eliminated them)
- How can we perform this fixed amount of work in less time?
  - · Answer: execute the instructions in parallel

### Time





Todd C. Mowry

CS745: Instruction Scheduling

-3-

### Hardware Support for Parallel Execution

- Three forms of parallelism are found in modern machines:
  - Pipelining
  - Superscalar Processing
  - Multiprocessing

Instruction Scheduling

 Automatic Parallelization (covered later in class)

CS745: Instruction Scheduling

-4-

 $\mathsf{Todd}\; \mathit{C}.\; \mathsf{Mowry}$ 

















### Limitation #1: Hardware Resources

 Processors have finite resources, and there are often constraints on how these resources can be used.

### Examples:

- · Finite issue width
- · Limited functional units (FUs) per given instruction type
- · Limited pipelining within a given functional unit (FU)

CS745: Instruction Scheduling

-13-

Todd C. Mowry

# Finite Issue Width Prior to superscalar processing: processors only "issued" one instruction per cycle Even with superscalar processing: limit on total # of instructions issued per cycle Issue Width = infinite Issue Width = 4 Issue Width = 4 Todd C. Mowry





# Limitations Upon Scheduling

- 1. Hardware Resources
- → 2. Data Dependences
  - 3. Control Dependences

CS745: Instruction Scheduling

Todd C Mowry

### Limitation #2: Data Dependences

If we read or write a data location "too early", the program may behave incorrectly.

(Assume that initially, x = 0.)

(no simple fix)

Can potentially fix through renaming.

CS745: Instruction Scheduling

Todd C. Mowry

# Why Data Dependences are Challenging

- which of these instructions can be reordered?
- ambiguous data dependences are very common in practice
  - · difficult to resolve, despite fancy pointer analysis

CS745: Instruction Scheduling

-19-

Todd C. Mowry

# Given Ambiguous Data Dependences, What Can the Compiler Do?

- Conservative approach: don't reorder instructions
  - ensures correct execution
  - but may suffer poor performance
- Aggressive approach?
  - · is there a way to safely reorder instructions?

CS745: Instruction Scheduling

-20-

Todd C. Mowry

# Hardware Limitations Revisited: Multi-cycle Execution Latencies

- Simple instructions often "execute" in one cycle
  - (as observed by other instructions in the pipeline)
  - e.g., integer addition
- More complex instructions may require multiple cycles
  - · e.g., integer division, square-root
  - cache misses!
- These latencies, when combined with data dependencies, can result in non-trivial critical path lengths through code

CS745: Instruction Scheduling

-21-

Todd C. Mowry

### Limitations Upon Scheduling

- 1 Hardware Resources
- 2. Data Dependences
- → 3. Control Dependences

CS745: Instruction Scheduling

### Limitation #3: Control Dependences



- What do we do when we reach a conditional branch?
  - · choose a "frequently-executed" path?
  - choose multiple paths?

CS745: Instruction Scheduling

-23-

Todd C. Mowry

# Scheduling Constraints: Summary

- Hardware Resources
  - finite set of FUs with instruction type, bandwidth, and latency constraints
  - · cache hierarchy also has many constraints
- Data Dependences
  - · can't consume a result before it is produced
  - · ambiguous dependences create many challenges
- Control Dependences
  - · impractical to schedule for all possible paths
  - · choosing an "expected" path may be difficult
    - · recovery costs can be non-trivial if you are wrong

CS745: Instruction Scheduling

24-

Todd C. Mowry

Todd C Mowry

### Hardware- vs. Compiler-Based Scheduling

- The hardware can also attempt to reschedule instructions (on-the-fly) to improve performance
- What advantages/disadvantages would hardware have (vs. the compiler) when trying to reason about:
  - · Hardware Resources
  - · Data Dependences
  - Control Dependences
- Which is better:
  - · doing more of the scheduling work in the compiler?
  - · doing more of the scheduling work in the hardware?

Todd C. Mowry

CS745: Instruction Scheduling -25-

### Spectrum of Hardware Support for Scheduling Compiler-Centric Hardware-Centric VLIW In-Order Out-of-Order (Very Long Superscalar Superscalar Instruction Word) e.g.: Itanium e.g.: Original Pentium e.g.: Pentium 4 Todd C. Mowry CS745: Instruction Scheduling

# VLIW Processors Motivation: • if the hardware spends zero (or almost zero) time thinking about scheduling, it can run faster Philosophy: • give full control over scheduling to the compiler Implementation: • expose control over all FUs directly to software via a "very long instruction word" Int Mem FP Time CS745: Instruction Scheduling

```
Compiling for VLIW
Predicting Execution Latencies:

    easy for most functional units (latency is fixed)

    but what about memory references?
Data Dependences:
    · in "pure" VLIW, the hardware does not check for them
       • the compiler takes them into account to produce safe code
                                while (p != NULL) {
                                  if (test(p->val))
                                     q->next = p->left;
                                  p = p->next;
            Example #1
                                    Example #2
CS745: Instruction Scheduling
                              -28-
                                                     Todd C. Mowry
```









### Out-of-Order Superscalar Processors

### Motivation:

 when an instruction is stuck, perhaps there are subsequent instructions that can be executed

Sounds great! But how does this complicate the hardware?

CS745: Instruction Scheduling

-33-

Todd C. Mowry





