# **Machine-Level Programming I: Basics**

15-213/18-213/15-513: Introduction to Computer Systems 5<sup>th</sup> Lecture, May 27, 2020

## Logistics

#### Course ombudsmen

Ishita Sinha



 If you're having any issues with a TA, and are uncomfortable discussing this with the instructor, go to her

### **TA office hours on course website**

# **Today: Machine Programming I: Basics**

- History of Intel processors and architectures
- Assembly Basics: Registers, operands, move
- Arithmetic & logical operations
- C, assembly, machine code

### **Intel x86 Processors**

Dominate laptop/desktop/server market

### Evolutionary design

- Backwards compatible all the way back to the 8086, introduced in 1978
- Added more features as time goes on

#### x86 is a Complex Instruction Set Computer (CISC)

- Many different instructions with many different formats
  - But, only small subset encountered with Linux programs

#### Compare: Reduced Instruction Set Computer (RISC)

- RISC: *very few* instructions, with *very few* modes for each
- RISC can be quite fast (but Intel still wins on speed!)
- Current RISC renaissance (e.g., ARM, RISC-V), especially for low-power

# **Intel x86 Evolution: Milestones**

| Name       | Date | Transistors | MHz       | Notes |
|------------|------|-------------|-----------|-------|
| **8086**   | 1978 | 29K         | 5-10      | First 16-bit Intel processor. Basis for IBM PC & DOS. 1MB address space |
| **386**    | 1985 | 275K        | 16-33     | First 32-bit Intel processor, referred to as IA32. Added "flat addressing", capable of running Unix |
| Pentium 4E | 2004 | 125M        | 2800-3800 | First 64-bit Intel x86 processor, referred to as x86-64 |
| Core 2     | 2006 | 291M        | 1060-3333 | First multi-core Intel processor |
| Core i7    | 2008 | 731M        | 1600-4400 | (our *shark* machines) |

### Intel x86 Processors, cont.

### Machine Evolution

| Name            | Date | Transistors |
|-----------------|------|-------------|
| **386**         | 1985 | 0.3M |
| Pentium         | 1993 | 3.1M |
| Pentium/MMX     | 1997 | 4.5M |
| PentiumPro      | 1995 | 6.5M |
| Pentium III     | 1999 | 8.2M |
| Pentium 4       | 2000 | 42M  |
| Core 2 Duo      | 2006 | 291M |
| Core i7         | 2008 | 731M |
| Core i7 Skylake | 2015 | 1.9B |



### Added Features

- Instructions to support multimedia operations
- Instructions to enable more efficient conditional operations
- Transition from 32 bits to 64 bits
- More cores

### Intel x86 Processors, cont.

#### Past Generations

| Generation                 | Year | Process technology |
|----------------------------|------|--------------------|
| 1<sup>st</sup> Pentium Pro | 1995 | 600 nm             |
| 1<sup>st</sup> Pentium III | 1999 | 250 nm             |
| 1<sup>st</sup> Pentium 4   | 2000 | 180 nm             |
| 1<sup>st</sup> Core 2 Duo  | 2006 | 65 nm              |

#### Recent & Upcoming Generations

| #   | Generation   | Year  | Process technology |
|-----|--------------|-------|--------------------|
| 1.  | Nehalem      | 2008  | 45 nm              |
| 2.  | Sandy Bridge | 2011  | 32 nm              |
| 3.  | Ivy Bridge   | 2012  | 22 nm              |
| 4.  | Haswell      | 2013  | 22 nm              |
| 5.  | Broadwell    | 2014  | 14 nm              |
| 6.  | Skylake      | 2015  | 14 nm              |
| 7.  | Kaby Lake    | 2016  | 14 nm              |
| 8.  | Coffee Lake  | 2017  | 14 nm              |
| 9.  | Cannon Lake  | 2018  | 10 nm              |
| 10. | Ice Lake     | 2019  | 10 nm              |
| 11. | Tiger Lake   | 2020? | 10 nm              |

Process technology dimension = width of narrowest wires (10 nm ≈ 100 atoms wide)

### **2018 State of the Art: Coffee Lake**



#### Mobile Model: Core i7

- 2.2-3.2 GHz
- 45 W

### Desktop Model: Core i7

- Integrated graphics
- 2.4-4.0 GHz
- **35-95 W**

#### Server Model: Xeon E

- Integrated graphics
- Multi-socket enabled
- 3.3-3.8 GHz
- **80-95 W**

# x86 Clones: Advanced Micro Devices (AMD)

### Historically

- AMD has followed just behind Intel
- A little bit slower, a lot cheaper

### Then

- Recruited top circuit designers from Digital Equipment Corp. and other downward trending companies
- Built Opteron: tough competitor to Pentium 4
- Developed x86-64, their own extension to 64 bits

### Recent Years

- Intel got its act together
  - 1995-2011: Leading semiconductor "fab" in world
  - 2018: #2 largest by \$\$ (#1 is Samsung)
  - 2019: reclaimed #1
- AMD fell behind
  - Relies on external semiconductor manufacturer GlobalFoundries
  - ca. 2019 CPUs (e.g., Ryzen) are competitive again

### Intel's 64-Bit History

### 2001: Intel Attempts Radical Shift from IA32 to IA64

- Totally different architecture (Itanium, AKA "Itanic")
- Executes IA32 code only as legacy
- Performance disappointing

### 2003: AMD Steps in with Evolutionary Solution

x86-64 (now called "AMD64")

### Intel Felt Obligated to Focus on IA64

Hard to admit mistake or that AMD is better

### **2004:** Intel Announces EM64T extension to IA32

- Extended Memory 64-bit Technology
- Almost identical to x86-64!

### Virtually all modern x86 processors support x86-64

But, lots of code still runs in 32-bit mode

## **Our Coverage**

### IA32

- The traditional x86
- For 15/18-213: RIP, Summer 2015

#### **x86-64**

- The standard
- shark> gcc hello.c
- shark> gcc -m64 hello.c

#### Presentation

- Book covers x86-64
- Web aside on IA32
- We will only cover x86-64


### **Levels of Abstraction**

#### C programmer

```
#include <stdio.h>

/* Print the first n Fibonacci numbers. */
int main(void) {
    int i, n = 10, t1 = 0, t2 = 1, nxt;
    for (i = 1; i <= n; ++i) {
        printf("%d, ", t1);
        nxt = t1 + t2;
        t1 = t2;
        t2 = nxt;
    }
    return 0;
}
```

### Nice clean layers, but beware...

#### Assembly programmer



#### **Computer designer**



Gates, clocks, circuit layout, ...





# Definitions

- Architecture: (also ISA: instruction set architecture) The parts of a processor design that one needs to understand for writing correct machine/assembly code
  - Examples: instruction set specification, registers
- Machine Code: The byte-level programs that a processor executes
- Assembly Code: A text representation of machine code
- Microarchitecture: Implementation of the architecture
  - Examples: cache sizes and core frequency

### Example ISAs:

- Intel: x86, IA32, Itanium, x86-64
- ARM: Used in almost all mobile phones
- RISC-V: New open-source ISA

# **Assembly/Machine Code View**



### **Programmer-Visible State**

- PC: Program counter
  - Address of next instruction
  - Called "RIP" (x86-64)
- Register file
  - Heavily used program data

#### Condition codes

- Store status information about most recent arithmetic or logical operation
- Used for conditional branching

#### Memory

- Byte addressable array
- Code and user data
- Stack to support procedures

## **Assembly Characteristics: Data Types**

- "Integer" data of 1, 2, 4, or 8 bytes
  - Data values
  - Addresses (untyped pointers)
- Floating point data of 4, 8, or 10 bytes
- (SIMD vector data types of 8, 16, 32 or 64 bytes)
- Code: Byte sequences encoding series of instructions
- No aggregate types such as arrays or structures
  - Just contiguously allocated bytes in memory

### x86-64 Integer Registers

| 64-bit | Low 32 bits | 64-bit | Low 32 bits |
|--------|-------------|--------|-------------|
| %rax   | %eax        | %r8    | %r8d        |
| %rbx   | %ebx        | %r9    | %r9d        |
| %rcx   | %ecx        | %r10   | %r10d       |
| %rdx   | %edx        | %r11   | %r11d       |
| %rsi   | %esi        | %r12   | %r12d       |
| %rdi   | %edi        | %r13   | %r13d       |
| %rsp   | %esp        | %r14   | %r14d       |
| %rbp   | %ebp        | %r15   | %r15d       |

- Can reference low-order 4 bytes (also low-order 1 & 2 bytes)
- Not part of memory (or cache)
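In C terms, referencing a narrower view of a register behaves like truncating a 64-bit value to its low-order bytes. The sketch below illustrates that analogy with fixed-width casts. The helper names are hypothetical, and the register names in the comments are an illustration of the correspondence, not compiler output.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the low-order bytes of a 64-bit value,
 * mirroring how %eax, %ax, and %al name the low 4, 2, and 1 bytes of %rax. */
uint32_t low4(uint64_t r) { return (uint32_t)r; }  /* like %eax */
uint16_t low2(uint64_t r) { return (uint16_t)r; }  /* like %ax  */
uint8_t  low1(uint64_t r) { return (uint8_t)r;  }  /* like %al  */
```

For the value 0x1122334455667788, these return 0x55667788, 0x7788, and 0x88 respectively.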

# **Some History: IA32 Registers**

#### Origin (mostly obsolete)



general purpose

# **Assembly Characteristics: Operations**

### Transfer data between memory and register

- Load data from memory into register
- Store register data into memory

Perform arithmetic function on register or memory data

### Transfer control

- Unconditional jumps to/from procedures
- Conditional branches
- Indirect branches

# **Moving Data**

- Moving Data
  - movq Source, Dest
- Operand Types
  - *Immediate:* Constant integer data
    - Example: \$0x400, \$-533
    - Like C constant, but prefixed with '\$'
    - Encoded with 1, 2, or 4 bytes
  - *Register:* One of 16 integer registers
    - Example: %rax, %r13
    - But %rsp reserved for special use
    - Others have special uses for particular instructions
  - *Memory:* 8 consecutive bytes of memory at address given by register
    - Simplest example: (%rax)
    - Various other "addressing modes"

Integer registers: %rax, %rcx, %rdx, %rbx, %rsi, %rdi, %rsp, %rbp, and %rN (%r8 through %r15)

### movq Operand Combinations



#### Cannot do memory-memory transfer with a single instruction
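Each legal movq source/destination combination has a natural C analog, sketched below. The function name, variable names, and register choices in the comments are illustrative assumptions, not actual compiler output.

```c
/* Sketch: C analogs of the movq source/destination combinations.
 * Assembly in comments is illustrative only. */
long movq_forms(long *p, long *q)
{
    long temp1, temp2;

    temp1 = 0x4;     /* Imm -> Reg:  movq $0x4, %rax     */
    *p = -147;       /* Imm -> Mem:  movq $-147, (%rdi)  */
    temp2 = temp1;   /* Reg -> Reg:  movq %rax, %rdx     */
    *q = temp2;      /* Reg -> Mem:  movq %rdx, (%rsi)   */
    temp1 = *p;      /* Mem -> Reg:  movq (%rdi), %rax   */

    /* Mem -> Mem needs two instructions: load into a register, then store. */
    *q = *p;         /* movq (%rdi), %rcx ; movq %rcx, (%rsi) */

    return temp1 + temp2;
}
```

The last statement is the one case with no single-instruction form: the value must pass through a register on its way from one memory location to another.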

# **Simple Memory Addressing Modes**

- Normal (R) Mem[Reg[R]]
  - Register R specifies memory address
  - Aha! Pointer dereferencing in C

```
movq (%rcx),%rax
```

- Displacement D(R) Mem[Reg[R]+D]
  - Register R specifies start of memory region
  - Constant displacement D specifies offset
  - Example: `movq 8(%rbp),%rdx`
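Both modes correspond directly to pointer operations in C. A minimal sketch, assuming an 8-byte `long` (as on x86-64 Linux) and hypothetical function names; the instructions in the comments are what a compiler could plausibly emit, not verified output.

```c
/* Normal mode (R): dereference the address held in a register. */
long load_normal(const long *p)
{
    return *p;       /* e.g., movq (%rdi), %rax */
}

/* Displacement mode D(R): the constant offset 8 skips one 8-byte long. */
long load_displaced(const long *p)
{
    return p[1];     /* e.g., movq 8(%rdi), %rax */
}
```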


# **Complete Memory Addressing Modes**

#### Most General Form

### D(Rb,Ri,S) Mem[Reg[Rb]+S\*Reg[Ri]+D]

- D: Constant "displacement" 1, 2, or 4 bytes
- Rb: Base register: Any of 16 integer registers
- Ri: Index register: Any, except for %rsp
- S: Scale: 1, 2, 4, or 8 (why these numbers?)

#### Special Cases

| Form      | Address computed        |
|-----------|-------------------------|
| (Rb,Ri)   | Mem[Reg[Rb]+Reg[Ri]]    |
| D(Rb,Ri)  | Mem[Reg[Rb]+Reg[Ri]+D]  |
| (Rb,Ri,S) | Mem[Reg[Rb]+S\*Reg[Ri]] |
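The general form is just arithmetic on register values. The sketch below models that computation with a hypothetical helper, `effective_address`. It computes the address only and does not model the memory access itself.

```c
#include <stdint.h>

/* Hypothetical model of D(Rb,Ri,S): returns Reg[Rb] + S*Reg[Ri] + D.
 * rb and ri are the register *contents*; pass 0 for an omitted register
 * and 1 for an omitted scale. The displacement is sign-extended. */
uint64_t effective_address(int32_t d, uint64_t rb, uint64_t ri, unsigned s)
{
    return rb + (uint64_t)s * ri + (uint64_t)(int64_t)d;
}
```

As a worked example, assume (purely for illustration) that %rdx holds 0xf000 and %rcx holds 0x0100. Then 0x8(%rdx) gives 0xf008, (%rdx,%rcx) gives 0xf100, (%rdx,%rcx,4) gives 0xf400, and 0x80(,%rdx,2) gives 0x1e080. The scale factors 1, 2, 4, and 8 match the sizes of the integer data types, so indexing into an array of any of them is a single address computation.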


# **Address Computation Instruction**

### leaq Src, Dst

- Src is address mode expression
- Set Dst to address denoted by expression

### Uses

- Computing addresses without a memory reference
  - E.g., translation of p = &x[i];
- Computing arithmetic expressions of the form x + k\*y
  - k = 1, 2, 4, or 8

### Example

```
long m12(long x)
{
   return x*12;
}
```

#### **Converted to ASM by compiler:**
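One plausible translation, assuming the argument arrives in %rdi and the result is returned in %rax, uses leaq to form x + 2\*x and then a shift to multiply by 4. The C sketch below mirrors that decomposition. The function name `m12_lea` is hypothetical, the assembly in the comments is illustrative rather than verified compiler output, and unsigned arithmetic is used so the shift is well defined for every input.

```c
/* Sketch: x*12 computed as (x + 2*x) << 2, the way leaq plus a shift could. */
long m12_lea(long x)
{
    unsigned long ux = (unsigned long)x;
    unsigned long t = ux + 2 * ux;   /* leaq (%rdi,%rdi,2), %rax   t = 3*x    */
    return (long)(t << 2);           /* salq $2, %rax              t*4 = 12*x */
}
```

The leaq step never touches memory: it reuses the address-computation hardware to evaluate x + 2\*x in one instruction.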

# **Some Arithmetic Operations**

### Two Operand Instructions:

| Format         | Computation         | Notes         |
|----------------|---------------------|---------------|
| addq Src,Dest  | Dest = Dest + Src   |               |
| subq Src,Dest  | Dest = Dest - Src   |               |
| imulq Src,Dest | Dest = Dest * Src   |               |
| shlq Src,Dest  | Dest = Dest << Src  | Synonym: salq |
| sarq Src,Dest  | Dest = Dest >> Src  | Arithmetic    |
| shrq Src,Dest  | Dest = Dest >> Src  | Logical       |
| xorq Src,Dest  | Dest = Dest ^ Src   |               |
| andq Src,Dest  | Dest = Dest & Src   |               |
| orq Src,Dest   | Dest = Dest \| Src  |               |

- Watch out for argument order! Src,Dest (Warning: Intel docs use "op Dest,Src")
- No distinction between signed and unsigned int (why?)
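The difference between sarq and shrq, and the reason addq needs no separate signed and unsigned variants, can be checked in C. A minimal sketch with hypothetical helper names, assuming two's-complement 64-bit integers (which x86-64 uses) and shift counts from 0 to 63; `sar64` is written so that it does not depend on how a compiler treats `>>` on negative values.

```c
#include <stdint.h>

/* Logical right shift (like shrq): vacated high bits are filled with 0. */
uint64_t shr64(uint64_t x, unsigned n) { return x >> n; }

/* Arithmetic right shift (like sarq): vacated high bits copy the sign bit.
 * For negative x, shift the (non-negative) complement and complement back. */
int64_t sar64(int64_t x, unsigned n)
{
    return x < 0 ? ~(~x >> n) : x >> n;
}

/* Addition produces the same bit pattern whether the operands are viewed
 * as signed or unsigned, so a single addq serves both. */
uint64_t add_bits(uint64_t a, uint64_t b) { return a + b; }
```

For example, shifting -16 right by 2 yields -4 arithmetically but 0x3FFFFFFFFFFFFFFC logically, while adding the bit pattern of -1 to 2 yields 1 under either interpretation.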

## **Some Arithmetic Operations**

#### One Operand Instructions

| Format    | Computation     |
|-----------|-----------------|
| incq Dest | Dest = Dest + 1 |
| decq Dest | Dest = Dest - 1 |
| negq Dest | Dest = -Dest    |
| notq Dest | Dest = ~Dest    |

### See book for more instructions

- Depending how you count, there are 2,034 total x86 instructions
- (If you count all addr modes, op widths, flags, it's actually 3,683)

### Activity


# **Turning C into Object Code**

- Code in files p1.c p2.c
- Compile with command: gcc -Og p1.c p2.c -o p
  - Use basic optimizations (-Og) [New to recent versions of GCC]
  - Put resulting binary in file p



# **Compiling Into Assembly**

| C Code (sum.c)                  | Generated x86-64 Assembly |
|---------------------------------|---------------------------|
| `long plus(long x, long y);`    | `sumstore:`               |
|                                 | `pushq %rbx`              |
| `void sumstore(long x, long y,` | `movq %rdx, %rbx`         |
| `long *dest)`                   | `call plus`               |
| `{`                             | `movq %rax, (%rbx)`       |
| `long t = plus(x, y);`          | `popq %rbx`               |
| `*dest = t;`                    | `ret`                     |
| `}`                             |                           |

Obtain (on shark machine) with command

gcc -Og -S sum.c

Produces file sum.s

*Warning*: Will get very different results on non-Shark machines (Andrew Linux, Mac OS-X, ...) due to different versions of gcc and different compiler settings.
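To try the command, a complete sum.c is needed. The sketch below fills in a definition of `plus`; the slide only declares it, so its body (simple addition) is an assumption made for illustration.

```c
/* sum.c: sketch of a complete source file for the example. */
long plus(long x, long y);

/* Assumed definition: the slide gives only the prototype. */
long plus(long x, long y)
{
    return x + y;
}

void sumstore(long x, long y, long *dest)
{
    long t = plus(x, y);   /* result comes back in %rax */
    *dest = t;             /* stored through the saved dest pointer */
}
```

With `plus` defined in the same file, a compiler may choose to inline it, so the generated assembly can differ from the slide's listing even on the same machine.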


# What it really looks like

        .globl  sumstore
        .type   sumstore, @function
    sumstore:
    .LFB35:
        .cfi_startproc
        pushq   %rbx
        .cfi_def_cfa_offset 16
        .cfi_offset 3, -16
        movq    %rdx, %rbx
        call    plus
        movq    %rax, (%rbx)
        popq    %rbx
        .cfi_def_cfa_offset 8
        ret
        .cfi_endproc
    .LFE35:
        .size   sumstore, .-sumstore

Things that look weird and are preceded by a '.' are generally directives.

    sumstore:
        pushq   %rbx
        movq    %rdx, %rbx
        call    plus
        movq    %rax, (%rbx)
        popq    %rbx
        ret


# **Object Code**

### Code for sumstore

- Starts at address 0x0400595
- Total of 14 bytes
- Each instruction 1, 3, or 5 bytes

0x0400595: 0x53 0x48 0x89 0xd3 0xe8 0xf2 0xff 0xff 0xff 0x48 0x89 0x03 0x5b 0xc3

### Assembler

- Translates .s into .o
- Binary encoding of each instruction
- Nearly-complete image of executable code
- Missing linkages between code in different files

### Linker

- Resolves references between files
- Combines with static run-time libraries
  - e.g., code for malloc, printf
- Some libraries are *dynamically linked* 
  - Linking occurs when program begins execution

## **Machine Instruction Example**

### C Code

\*dest = t;

- Store value t where designated by dest

### Assembly

movq %rax, (%rbx)

- Move 8-byte value to memory
  - Quad words in x86-64 parlance
- Operands:
  - t: Register %rax
  - dest: Register %rbx
  - \*dest: Memory M[%rbx]

### Object Code

0x40059e: 48 89 03

- 3-byte instruction
- Stored at address 0x40059e

# **Disassembling Object Code**

### Disassembled

`0000000000400595 <sumstore>:`

| Address | Bytes          | Instruction           |
|---------|----------------|-----------------------|
| 400595: | 53             | `push %rbx`           |
| 400596: | 48 89 d3       | `mov %rdx,%rbx`       |
| 400599: | e8 f2 ff ff ff | `callq 400590 <plus>` |
| 40059e: | 48 89 03       | `mov %rax,(%rbx)`     |
| 4005a1: | 5b             | `pop %rbx`            |
| 4005a2: | c3             | `retq`                |

### Disassembler

#### objdump -d sum

- Useful tool for examining object code
- Analyzes bit pattern of series of instructions
- Produces approximate rendition of assembly code
- Can be run on either an a.out (complete executable) or a .o file
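The disassembly can be cross-checked by hand. The sketch below stores the 14 bytes of sumstore from the listing and decodes the operand of the call instruction: opcode 0xe8 followed by a 4-byte little-endian displacement, taken relative to the address of the next instruction. The array and function names are hypothetical; only the bytes and addresses come from the listing.

```c
#include <stdint.h>

/* The 14 bytes of sumstore, starting at address 0x400595. */
static const uint8_t sumstore_bytes[14] = {
    0x53,                         /* push %rbx          (1 byte)  */
    0x48, 0x89, 0xd3,             /* mov  %rdx,%rbx     (3 bytes) */
    0xe8, 0xf2, 0xff, 0xff, 0xff, /* callq plus         (5 bytes) */
    0x48, 0x89, 0x03,             /* mov  %rax,(%rbx)   (3 bytes) */
    0x5b,                         /* pop  %rbx          (1 byte)  */
    0xc3                          /* retq               (1 byte)  */
};

/* Decode the call at offset 4: target = address of the next instruction
 * plus the signed 32-bit displacement stored after the 0xe8 opcode. */
uint64_t call_target(void)
{
    const uint64_t base = 0x400595;
    const unsigned call_off = 4;                   /* offset of 0xe8 */
    const uint8_t *d = &sumstore_bytes[call_off + 1];
    uint32_t raw = (uint32_t)d[0] | (uint32_t)d[1] << 8 |
                   (uint32_t)d[2] << 16 | (uint32_t)d[3] << 24;
    /* Sign-extend portably: 0xfffffff2 becomes -14. */
    int64_t rel = (int64_t)raw - ((raw & 0x80000000u) ? ((int64_t)1 << 32) : 0);
    uint64_t next = base + call_off + 5;           /* 0x40059e */
    return next + (uint64_t)rel;
}
```

The displacement bytes f2 ff ff ff decode to -14, and 0x40059e - 14 = 0x400590, which matches the address the disassembler prints for `plus`.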

# **Alternate Disassembly**

#### Disassembled

Dump of assembler code for function sumstore:

| Address            | Offset   | Instruction             |
|--------------------|----------|-------------------------|
| 0x0000000000400595 | `<+0>:`  | `push %rbx`             |
| 0x0000000000400596 | `<+1>:`  | `mov %rdx,%rbx`         |
| 0x0000000000400599 | `<+4>:`  | `callq 0x400590 <plus>` |
| 0x000000000040059e | `<+9>:`  | `mov %rax,(%rbx)`       |
| 0x00000000004005a1 | `<+12>:` | `pop %rbx`              |
| 0x00000000004005a2 | `<+13>:` | `retq`                  |

#### Within gdb Debugger

- Disassemble procedure
  - gdb sum
  - disassemble sumstore

- Examine the 14 bytes starting at sumstore
  - x/14xb sumstore

### What Can be Disassembled?

```
% objdump -d WINWORD.EXE

WINWORD.EXE:     file format pei-i386

No symbols in "WINWORD.EXE".
Disassembly of section .text:

30001000 <.text>:
30001000:
30001001:
30001003:
30001005:
3000100a:
```

Reverse engineering forbidden by Microsoft End User License Agreement

- Anything that can be interpreted as executable code
- Disassembler examines bytes and reconstructs assembly source

# **Machine Programming I: Summary**

- History of Intel processors and architectures
  - Evolutionary design leads to many quirks and artifacts
- C, assembly, machine code
  - New forms of visible state: program counter, registers, ...
  - Compiler must transform statements, expressions, procedures into low-level instruction sequences
- Assembly Basics: Registers, operands, move
  - The x86-64 move instructions cover wide range of data movement forms
- Arithmetic
  - C compiler will figure out different instruction combinations to carry out computation