Lecture 11 - Object models

- So far, talked about code representations and executing code
- Object models are about designing an efficient data representation for programs
  - critically impact the space used by programs
- Space/time:
  - make all object operations as cheap (few instructions) as possible, including operations
    by the VM itself (like garbage collection)
  - make each field, object, class, etc the smallest possible representation

- Common things among object models
  - objects are one or more contiguous regions of memory
  - sometimes classified as "scalars" and "arrays"
  - each object (usually) needs some kind of dynamic identification: "tag", "type", "header"
    - exact nature of that depends on language and GC
  - some languages allow *reflection* on an object's type (and thus members)
    - requires existence of metadata objects and access to them
  - source-level object might correspond to multiple implementation-level objects
- Object model is influenced by the choice of value representation and whether language
  is statically- or dynamically- typed
  - fully statically typed: fields in objects have known types
  - dynamically-typed: fields in objects have no type, can hold any value
  - gradually-typed: fields in object may have known representation (e.g. not an int)
  - specializing polymorphic languages influences object model
- Object model is heavily dependent on language's semantics
  - inheritance model (single-, multiple-, prototype)
  - static or dynamic fields or properties
  - method resolution (aka dispatch) rules

- Records
  - some languages don't have objects, but have records
  - records can be immutable or have some mutable fields
  - typically stored on the heap
  - can allow subtyping, both width (adding more fields) and depth (immutable field is a subtype)
  - records may or may not support dynamic type queries
  - object representation requires header for GC (at the very least)
  - can we observe uninitialized fields? no
  - can we race? only on mutable fields
  - Object models are usually built out of records underneath

  - The Java Object model
    - all objects inherit from java.lang.Object, have methods
    - operations common to all objects
      - monitorenter, exit
      - wait, notify
      - System.identityHashcode
      - finalization bit
    - classes inherit from only one superclass
      - can override (non-final) methods from a superclass and introduce new ones
      - signature of params/return must match exactly
    - classes can implement multiple interfaces
    - inner classes: source-level construct
      - java source compiler introduces hidden fields to point to outer class
    - allocating objects (new, newarray, anewarray)
      - acquiring and initializing memory
      - invoking constructors (invokespecial)
    - object layouts:
      - header field contains meta information, pointer to meta object
      - instance fields stored sequentially thereafter in memory (getfield, putfield)
      - static fields stored in globals (or as part of class metadata) (getstatic, putstatic)
      - metaobject stores: class, virtual table, i-table
    - arrays:
      - like objects, have the same header files
      - additionally have a length
      - are co-variantly typed in Java
        - requires a dynamic check on stores
      - accesses always bounds-checked against the length (aaload, baload, ... aastore, bastore ...)
    - implementing casts efficiently (checkcast, instanceof)
      - Linear search
      - Cohen display
      - Intervals
      - search for interface implementation
    - invoking methods
      - invokespecial, invokestatic
      - invokevirtual: virtual table lookup
      - invokeinterface: i-table search, typically with caching
      - invokedynamic: additional programmable binding

  - Invokeinterface
    - classes can implement multiple interfaces
    - Cohen display and interval approaches don't work
    - Fall back to linear search
    - Inline-cache: all-important concept
      - save a mapping of last-seen type to result, e.g. class B -> method m for a dispatch
      - can be used for many things
      - invented in 1984 for Smalltalk
      - engine can use statistics collected from ICs to inline later

  - Exceptions
    - Java programs can throw exceptions under various circumstances
      - safety check failures: NullPointerException, ClassCastException
      - arithmetic exceptions: DivideByZeroException
    - exceptions are just objects that implement Throwable
    - throwing an exception initiates a search for an exception handler (athrow)
      - JVM uses exception handler tables to find which method catches an exception
      - zero-cost: requires VM to do stack walking
      - stacktraces: JVM internally attaches a native-level trace, can materialize as objects
      
- SELF object model
  - dynamically typed, prototype-based language
  - objects have slots, each slot has a name
  - can add and remove slots dynamically
  - essentially one primitive operation: send a message to an object
  - if a slot is not found in an object, parent object(s) are consulted
  - objects are cloned and modified rather than instantiating classes

- JavaScript object model
  - prototype-based (https://mathiasbynens.be/notes/prototypes)
    - JavaScript programs often use prototypes to approximate classes
    - "state" are properties on objects, methods are functions on the prototype chain
    - failing a lookup in one object causes another lookup to be started in prototype
    - prototype chains are long for DOM objects
    - instead: use hidden classes to store prototype
    - gotcha: handling mutation of the prototype chain
  - uses the term "properties" which is more general than fields
  - properties are dynamically typed: slots don't have types
  - slots usually not smaller than a machine word
  - every object is an instance of a Function, an anonymous object literal, a function, a closure or Array
  - additional operations
    - typeof
    - get prototype
    - delete property
    - list properties
    - seal object
  - any object can have array-like behavior (.length, foreach, indexed by integer)
  - property names can be any JS value (implicitly converted to string)
  - properties can be added dynamically or deleted
    - even from the middle of an array
    - access of missing array element goes to prototype
  - slack-tracking: objects are over-allocated and then trimmed down
  - functions (and closures) are objects, have properties
  - arguments object is like an array
  - the global object
    - contains top-level variables in a script
  - JavaScript allows intercepting property accesses in several ways
    - defining setters
    - defining proxies
  - can we observe uninitialized fields? yes, as undefined
  - can we race? no threads, but async functions

- Implementing JavaScript with hidden classes and inline caches
  - V8 article on ICs for JavaScript (https://mathiasbynens.be/notes/shapes-ics)
  - using a linear search or a hashtable for every property access is too slow
  - many objects have the same number and names of properties
  - idea: track object shapes behind the scenes with "hidden classes" or maps
  - object does not need to have its hashtable, but shared map tells code where to find properties
  - use an inline cache to check against known shapes
    - example:
      var f = o.x;
    - inline cache stores K entries, where an entry can be of the form:
      entry = {shape, offset}
    - code for the access inline cache looks like
      lookup(o: Object, ic: InlineCache, propertyname: string) {
        for (i = 0; i < K; i++) {
          if (o.shape == ic.entries[i].shape) return o.properties[ic.entries[i].offset];
	}
	return o.hashtable.lookup(propertyname, ic); // ic might be updated
      }
    - essentially a search through the entries, looking for a matching shape, with a backup
    
  - hidden classes form a tree with transitions
  - example:
    function Foo(x, y) {
      this.x = x;
      this.y = y;
    }
    var x = new Foo(33, 44);
  - this creates hidden classes:
      HC{Foo_1}, HC{Foo_2}, HC{Foo_3}
    with transitions between them:
      HC{Foo_1} -(x)-> HC{Foo_2}
      HC{Foo_2} -(y)-> HC{Foo_3}
  - allocation of a new Foo() starts with HC{Foo_1}, then adding an "x" property transitions to HX{Foo_2}
  - deleting a property can backward transition (if last property deleted) or cause transition to "dictionary mode"


- WebAssembly GC object model
  - split into two proposals "function-references" and "gc"--both are done (Phase 4)
  - will be merged into the Wasm 3.0 standard
  - adds a three new families of reference types in addition to funcref and externref
    - functions, structs, and arrays
  - reference types have nullity built into the type
  - "funcref" and "externref" become shorthands for (ref null func) and (ref null extern)
  - allows first-class, statically-typed functions
    - (ref func #signature) where #signature is an index into the types section
    - in text format #signature can sometimes be inline (param ...) (result ...) or an alias
    - new instructions: call.ref and return_call.ref
  - structs are declared, fixed-size records
    - declared in the type section
    - can have a sequence of fields, each with a static type, mutability
    - in the binary format, fields are referred to by field index
    - in the text format, fields can have aliases
    - new instructions: struct.new, struct.get, struct.set
      - variations: struct.new_default, struct.get_s, struct.get_u
      - instructions have immediates that refer to declaration in type section
  - arrays are declared with a single, static element type
    - declared in the type section
    - has a single element type, mutability
    - length fixed at allocation time
    - dynamically-bounds checked accesses
    - new instructions: array.new, array.get, array.set, array.length
      - instructions have an immediate that refers to declaration in type section
      - variations: array.new_default, array.new_data, array.new_elem, array.get_s, array.get_u
  - all three kinds of declarations can have a declared supertype
    - struct can be a subtype of another struct if:
      - has at least as many fields (width subtyping)
      - each field has exact same type as supertype, if mutable
      - each field has a type which is a subtype of the corresponding supertype's field's type (depth subtyping)
    - array can be a subtype of another array if:
      - element is exact same type as supertype, if mutable
      - element is a subtype of supertype's element type, if immutable
    - function can be a subtype of another function type if:
      - result types are correspondingly subtypes of super-functions results (co-variant)
      - parameter types are correspondingly supertypes of super-functions results (contra-variant)
      - easy way to remember: results are co-, params are contra-
  - i31 is a new heaptype that allows storing 31-bit integers under the ref type hierarchy
    - implies that it is implemented as a tagged pointer by the engine
  - overall, type hierarchy now looks like this

valuetype ::= i32 | i64 | f32 | f64 | v128 | (ref null? <heaptype>) | shortref

heaptype ::= extern | exn | func | struct | array | i31 | <usertype>

usertype ::= #index | $alias

# previous types "<x>ref" are now shorthand for "(ref null <x>)"
shortref ::= externref | exnref | funcref | structref | arrayref | i31ref

- WebAssembly GC type canonicalization
  - modules must agree on types for data exchange
  - how do two modules refer to the same func, struct, or array type?
  - before Wasm GC, only term types and function signatures existing
  - glossed over a detail: all structurally-equivalent signatures are equivalent
  - Wasm GC extends the same to declaring structs and arrays
  - Two structurally-equivalent struct decls are considered equivalent if:
    - equivalent supertype (if any)
    - equivalent field types
  - Two structurally-equivalent array decls are considered equivalent if:
    - equivalent supertype (if any)
    - equivalent element types
  - definition is obviously recursive, what about recursive data structures?
    - two possible interpretations: equi-recursive and iso-recursive
    - equi-recursive: all unfoldings of the type are equivalent
    - iso-recursive: only isomorphic graphs are equivalent
  - partition types into *recursion groups*
    - entire recursion group is treated as a unit
    - A recursion group can be checked for isomorphism with another in linear time
      - simply compare external type indices and (relative) internal type indices
  - Final result: two different modules can repeat the same type definition,
      and the result is considered equivalent

- Algebraic data types
  - some languages allow defining discriminated unions, aka variants or sum types
  - list a fixed set of cases (sometimes confusingly called constructors)
  - can we observe unitialized fields? no
  - can we race? no

type colour =
  | Red
  | Green
  | Blue
  | Yellow
  | RGB of float * float * float;;

- Closures and lexical scope
  - languages like JavaScript, Python, Lisp, Scheme, ML, Haskell have closures
  - inner functions (lexically) can refer to variables in outer scopes
  - inner function may escape outer function's lifetime
  - normally, variables in a function are allocated in a stack frame
  - with escape, stack frames effectively need to be "reified" or allocated on heap
  - with lexical scoping, we can calculate a location (nesting depth, offset) for
    variables
  - dynamic scoping requires a runtime lookup
  - illustration of scopes in JavaScript
    function foo(x) {
      function bar(y) {
        return x + y;
      }
      return bar;
    }
  - at the bytecode level (such as internal V8 bytecode), the scoping level is usually
    explicit => source to bytecode translation does "closure conversion".
  - closure conversion may do some optimizations to avoid allocating closures that don't escape

- Dynamic scoping
  - dynamic scoping (as opposed to lexical) scoping is an object-model concern
  - so far, we haven't seen languages with dynamic scoping
  - JavaScript does have one type of dynamic scoping mechanism, "with" blocks
  - other languages (like Lisp) have dynamic scoping as a primitive
  - ultimately, new to reify part of the environment in a closure

- Optimizations for "static" object models
  - field packing
  - field reordering
  - field deletion / constant promotion
  - splitting objects
  - inlining objects
  - escape analysis

- Optimizations for "dynamic" object models
  - inline caches
  - slack tracking - presizing objects properly
  - field representation tracking (double unboxing)
  - holey array representations (JavaScript)
  - sparse array representations