Lecture 11 - Object models

- So far, mostly talked about code and metadata representation
- Object models are about designing an efficient data representation for programs
  - critically impact the space used by programs
- Space/time:
  - make all object operations as cheap (few instructions) as possible, including operations
    by the VM itself (like garbage collection)
  - give each field, object, class, etc. the smallest possible representation
    - saw this tradeoff with approach to implementing casts

- Common things among object models
  - objects are contiguous regions of memory
  - sometimes classified as "scalars" and "arrays"
  - each object (usually) needs some kind of dynamic identification: "tag", "type", "header"
    - exact nature of that depends on language, GC
  - some languages allow *reflection* on an object's type (and thus members)
    - requires existence of metadata objects and access to them
  - source-level object might correspond to multiple implementation-level objects
- Object model is influenced by the choice of value representation and whether language
  is statically- or dynamically- typed
  - fully statically typed: fields in objects have known types
  - dynamically-typed: fields in objects have no type, can hold any value
  - gradually-typed: fields in object may have known representation (e.g. not an int)
  - specializing polymorphic languages influences object model

- Java object model
  - class-based
  - every object is an instance of a class or array, remembers its type
  - additional operations
    - monitorenter, monitorexit
    - wait, notify
    - System.identityHashCode
    - finalization bit
  - class object involved in method calls
    - v-table and i-table
  - class object contains reflection metadata
    - arrays of declared fields, methods, interfaces
  - instance fields are statically-typed, statically allocated locations in the object
  - storage space for all instance fields is always there (can't be added or deleted)
  - methods are statically-typed, statically-allocated into v-table and i-tables
  - consideration: can we observe uninitialized (read-only, i.e. final) fields?
    - yes, unfortunately
    - requirements on GC: allocate objects with zero-initialized fields
  - consideration: data races on fields of objects
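
A minimal sketch of this kind of layout, written in JavaScript over a flat array of words. The names here (heap, defineClass, allocate, invokeVirtual) are made up for illustration and do not correspond to any real JVM; i-tables, monitors and hash codes are left out. It shows a header word identifying the class, fields at fixed offsets, v-table slots that keep their position in subclasses, and zero-initialized fields that can be observed before the constructor writes them.

```javascript
// Hypothetical mini-heap: every object is a contiguous run of 32-bit words.
// Word 0 is the header (an index into `classes`); fields follow at fixed offsets.
const heap = new Int32Array(1024); // typed arrays start zero-filled
let top = 1;                       // bump pointer; address 0 is reserved for null

const classes = [];                // class metadata, indexed by class id

function defineClass(name, fieldNames, methods, parent = null) {
  // Fields and v-table slots of the parent keep their positions in the subclass.
  const fields = parent ? [...parent.fields, ...fieldNames] : [...fieldNames];
  const vtable = parent ? [...parent.vtable] : [];
  const slots = parent ? new Map(parent.slots) : new Map();
  for (const [m, fn] of Object.entries(methods)) {
    if (!slots.has(m)) slots.set(m, vtable.length); // new method: append a slot
    vtable[slots.get(m)] = fn;                      // override: reuse the slot
  }
  const cls = { id: classes.length, name, fields, vtable, slots, parent };
  classes.push(cls);
  return cls;
}

function allocate(cls) {
  const addr = top;
  top += 1 + cls.fields.length;    // header word + one word per field
  heap[addr] = cls.id;             // the object remembers its type
  return addr;                     // field words are still zero
}

// A compiler would resolve these offsets and slot numbers statically.
const fieldOffset = (cls, f) => 1 + cls.fields.indexOf(f);
const getField = (addr, off) => heap[addr + off];
const setField = (addr, off, v) => { heap[addr + off] = v; };
const invokeVirtual = (addr, slot, ...args) =>
  classes[heap[addr]].vtable[slot](addr, ...args);

const Point = defineClass("Point", ["x", "y"], {
  sum: (self) => getField(self, 1) + getField(self, 2),
});
const Point3 = defineClass("Point3", ["z"], {
  sum: (self) => getField(self, 1) + getField(self, 2) + getField(self, 3),
}, Point);

const p = allocate(Point3);
const beforeInit = getField(p, fieldOffset(Point3, "z")); // default value, not garbage
setField(p, 1, 3); setField(p, 2, 4); setField(p, 3, 5);
const total = invokeVirtual(p, Point.slots.get("sum"));   // runs Point3's sum
```

The call site only knows the slot number assigned in Point, yet the override in Point3 runs, because the slot index is looked up through the header of the receiver.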

- Records
  - some languages don't have objects, but have records
  - records can be immutable or have some mutable fields
  - typically stored on the heap
  - can allow subtyping, both width (adding more fields) and depth (the type of an immutable field may be a subtype)
  - records may or may not support dynamic type queries
  - object representation requires header for GC (at the very least)
  - can we observe uninitialized fields? no
  - can we race? only on mutable fields

- Algebraic data types
  - some languages allow defining discriminated unions, aka variants or sum types
  - list a fixed set of cases (sometimes confusingly called constructors)
  - can we observe uninitialized fields? no
  - can we race? no

type colour =
  | Red
  | Green
  | Blue
  | Yellow
  | RGB of float * float * float;;
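
One common way to represent such a type at runtime, sketched here in JavaScript as an assumption rather than a description of any particular compiler: cases without a payload become small integers and need no heap object, while cases with a payload become an immutable heap block carrying a tag. The names RGB_TAG and describe are illustrative only.

```javascript
// Constant cases need no heap object: a small integer identifies them.
const Red = 0, Green = 1, Blue = 2, Yellow = 3;

// The only case with a payload is a heap block: a tag plus immutable fields.
const RGB_TAG = 0; // payload cases are numbered separately from constant ones
function RGB(r, g, b) {
  return Object.freeze({ tag: RGB_TAG, fields: Object.freeze([r, g, b]) });
}

// A pattern match becomes "immediate or block?" followed by a switch.
function describe(c) {
  if (typeof c === "number") {
    switch (c) {
      case Red: return "red";
      case Green: return "green";
      case Blue: return "blue";
      case Yellow: return "yellow";
    }
    throw new Error("not a colour");
  }
  const [r, g, b] = c.fields;
  return `rgb(${r}, ${g}, ${b})`;
}
```

Because a block is fully populated and frozen at construction, no partially initialized value is visible and there is nothing mutable to race on.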

- SELF object model
  - dynamically typed, prototype-based language
  - objects have slots, each slot has a name
  - can add and remove slots dynamically
  - essentially one primitive operation: send a message to an object
  - if a slot is not found in an object, parent object(s) are consulted
  - objects are cloned and modified
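
A rough JavaScript sketch of slot lookup with parent delegation. It is a simplification: the helper names (makeObject, lookup, send, clone) are invented here, and the depth-first search over parents takes the first match rather than handling ambiguity between several parents.

```javascript
// Hypothetical SELF-like objects: a Map of named slots plus a list of parents.
function makeObject(slots = {}, parents = []) {
  return { slots: new Map(Object.entries(slots)), parents: [...parents] };
}

// Find the slot locally, else search the parents depth-first.
function lookup(obj, name) {
  if (obj.slots.has(name)) return obj.slots.get(name);
  for (const parent of obj.parents) {
    const found = lookup(parent, name);
    if (found !== undefined) return found;
  }
  return undefined;
}

// The one primitive: send a message. Method slots run with the original
// receiver as `self`; data slots just return their value.
function send(receiver, name, ...args) {
  const slot = lookup(receiver, name);
  if (slot === undefined) throw new Error(`message not understood: ${name}`);
  return typeof slot === "function" ? slot(receiver, ...args) : slot;
}

// New objects come from cloning an existing one and then modifying the copy.
function clone(obj) {
  return { slots: new Map(obj.slots), parents: [...obj.parents] };
}

const traits = makeObject({
  area: (self) => send(self, "width") * send(self, "height"),
});
const rect = makeObject({ width: 2, height: 3 }, [traits]);
const big = clone(rect);
big.slots.set("width", 10);   // modify the clone
big.slots.set("depth", 4);    // add a slot dynamically
big.slots.delete("depth");    // and remove it again
```

Both rect and big find "area" in the shared parent, but the method reads width and height from whichever object received the message.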

- JavaScript object model
  - prototype-based (https://mathiasbynens.be/notes/prototypes)
  - uses the term "properties" which is more general than fields
  - properties are dynamically typed: slots don't have types
  - slots usually not smaller than a machine word
  - every object is created by a constructor Function, as an object literal, as a function or closure, or as an Array
  - additional operations
    - typeof
    - get prototype
    - delete property
    - list properties
    - seal object
  - any object can have array-like behavior (.length, forEach, indexed by integer)
  - property names can be any JS value (implicitly converted to string, except Symbols)
  - properties can be added dynamically or deleted
  - slack-tracking: objects are over-allocated and then trimmed down
  - functions (and closures) are objects, have properties
  - arguments object is like an array
  - the global object
    - contains top-level variables in a script
  - JavaScript allows intercepting property accesses in several ways
    - defining getters and setters
    - defining proxies
  - can we observe uninitialized fields? yes, as undefined
  - can we race? no threads
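
Several of the points above can be seen directly with standard JavaScript built-ins. The object and variable names below are just for illustration.

```javascript
const o = {};
o[1] = "a";                   // the number 1 is converted to the string "1"
o[true] = "b";                // the boolean becomes the string "true"
const keys = Object.keys(o);

o.x = 1;                      // properties can be added at any time...
delete o.x;                   // ...and deleted again
const missing = o.x;          // an absent property reads as undefined

Object.seal(o);               // no more additions or deletions
const added = Reflect.set(o, "y", 2); // reports failure instead of adding y

// Intercepting accesses, first with a getter/setter pair...
const log = [];
const temp = {
  _c: 20,
  get f() { return this._c * 9 / 5 + 32; },
  set f(v) { log.push("set f"); this._c = (v - 32) * 5 / 9; },
};
temp.f = 212;                 // runs the setter, which stores into _c

// ...then with a proxy that sees every property read.
const traced = new Proxy({ a: 1 }, {
  get(target, key, receiver) {
    log.push(`get ${String(key)}`);
    return Reflect.get(target, key, receiver);
  },
});
const a = traced.a;

// Any object with a length and integer-like keys can act as an array.
const arrayLike = { 0: "x", 1: "y", length: 2 };
const upper = Array.prototype.map.call(arrayLike, (s) => s.toUpperCase());
```

Every one of these features is something a fast property-access path has to either rule out or handle, which is what makes the object model hard to implement efficiently.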

- Implementing JavaScript with hidden classes
  - using a linear search or a hashtable for every property access is too slow
  - many objects have the same number and names of properties
  - idea: track object shapes behind the scenes with "hidden classes" or maps
  - an object does not need its own hashtable; the shared map tells code where to find properties
  - use an inline cache to check against known shapes
    - example:
      var f = o.x;
    - inline cache stores K entries, where an entry can be of the form:
      entry = {shape, offset}
    - code for the access inline cache looks like
      function lookup(o, ic, propertyname) {
        // fast path: compare the object's shape against each cached entry
        for (let i = 0; i < K; i++) {
          if (o.shape === ic.entries[i].shape) return o.properties[ic.entries[i].offset];
        }
        // slow path: full lookup, which may also update ic
        return o.hashtable.lookup(propertyname, ic);
      }
    - essentially a linear search through the entries for a matching shape, with the
      hashtable lookup as a fallback
    
  - hidden classes form a tree with transitions
  - example:
    function Foo(x, y) {
      this.x = x;
      this.y = y;
    }
    var x = new Foo(33, 44);
  - this creates hidden classes:
      HC{Foo_1}, HC{Foo_2}, HC{Foo_3}
    with transitions between them:
      HC{Foo_1} -(x)-> HC{Foo_2}
      HC{Foo_2} -(y)-> HC{Foo_3}
  - allocation of a new Foo() starts with HC{Foo_1}, then adding an "x" property transitions to HC{Foo_2}
  - deleting a property can transition backward (if the last property added is deleted) or cause a transition to "dictionary mode"
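
The transition tree and the inline cache can be sketched together as follows. This is a toy model, not how any real engine is written: HiddenClass, newObject, setProperty and getProperty are hypothetical names, deletion and dictionary mode are omitted, and the slow path consults the hidden class's offset table instead of a per-object hashtable.

```javascript
// Hypothetical hidden class: maps property names to offsets and remembers
// which hidden class each added property leads to.
class HiddenClass {
  constructor(offsets = new Map()) {
    this.offsets = offsets;        // property name -> index into obj.properties
    this.transitions = new Map();  // property name -> next HiddenClass
  }
  // Follow (or create) the transition taken when `name` is added.
  addProperty(name) {
    let next = this.transitions.get(name);
    if (!next) {
      next = new HiddenClass(new Map(this.offsets).set(name, this.offsets.size));
      this.transitions.set(name, next);
    }
    return next;
  }
}

const ROOT = new HiddenClass();    // plays the role of HC{Foo_1}

function newObject() { return { shape: ROOT, properties: [] }; }

function setProperty(o, name, value) {
  if (!o.shape.offsets.has(name)) o.shape = o.shape.addProperty(name);
  o.properties[o.shape.offsets.get(name)] = value;
}

// One inline cache per access site, holding up to K {shape, offset} entries.
const K = 4;
function makeInlineCache() { return { entries: [], hits: 0, misses: 0 }; }

function getProperty(o, ic, name) {
  for (const e of ic.entries) {
    if (e.shape === o.shape) { ic.hits++; return o.properties[e.offset]; }
  }
  ic.misses++;                     // slow path: consult the hidden class
  const offset = o.shape.offsets.get(name);
  if (offset === undefined) return undefined;
  if (ic.entries.length < K) ic.entries.push({ shape: o.shape, offset });
  return o.properties[offset];
}

function Foo(x, y) {
  const o = newObject();
  setProperty(o, "x", x);          // ROOT -(x)-> HC{Foo_2}
  setProperty(o, "y", y);          // HC{Foo_2} -(y)-> HC{Foo_3}
  return o;
}

const a = Foo(33, 44), b = Foo(1, 2);
const ic = makeInlineCache();      // cache for the access site `o.x`
const results = [getProperty(a, ic, "x"), getProperty(b, ic, "x")];
```

Both objects end at the same hidden class because they added the same properties in the same order, so the second access at the site hits the cache entry installed by the first.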

- Prototypes and hidden classes (https://mathiasbynens.be/notes/prototypes)
  - JavaScript programs often use prototypes to approximate classes
  - "state" is stored as properties on objects; methods are functions on the prototype chain
  - failing a lookup in one object causes another lookup to be started in prototype
  - prototype chains are long for DOM objects
  - instead: store the prototype in the hidden class, so a shape check also identifies the prototype
  - gotcha: handling mutation of the prototype chain

- Closures and lexical scope
  - languages like JavaScript, Python, Lisp, Scheme, ML, Haskell have closures
  - inner functions (lexically) can refer to variables in outer scopes
  - inner function may escape outer function's lifetime
  - normally, variables in a function are allocated in a stack frame
  - with escape, stack frames effectively need to be "reified" or allocated on heap
  - with lexical scoping, we can calculate a location (nesting depth, offset) for
    variables
  - dynamic scoping requires a runtime lookup
  - illustration of scopes in JavaScript
    function foo(x) {
      function bar(y) {
        return x + y;
      }
      return bar;
    }
  - at the bytecode level (such as internal V8 bytecode), the scoping level is usually
    explicit, so the source-to-bytecode translation performs "closure conversion".
  - closure conversion may do some optimizations to avoid allocating closures that don't escape
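
As a hand-written illustration of what closure conversion could produce for foo and bar above: captured variables move into a heap-allocated context, and a closure becomes a pair of code and context. The names foo_code, bar_code, call and loadVar, and the context layout, are assumptions of this sketch, not V8's actual representation.

```javascript
// Walk `depth` parent links, then index: the (nesting depth, offset) address.
function loadVar(ctx, depth, offset) {
  while (depth-- > 0) ctx = ctx.parent;
  return ctx.slots[offset];
}

// After conversion, no function refers to an outer variable directly.
function bar_code(ctx, y) {
  return loadVar(ctx, 0, 0) + y;              // x lives at (depth 0, offset 0)
}

function foo_code(ctx, x) {
  const fooCtx = { parent: ctx, slots: [x] }; // reified frame for foo
  return { code: bar_code, ctx: fooCtx };     // the closure for bar
}

// Calling a closure passes its context as a hidden first argument.
function call(closure, ...args) {
  return closure.code(closure.ctx, ...args);
}

const foo = { code: foo_code, ctx: null };
const add5 = call(foo, 5);     // foo has returned, but its context survives
const result = call(add5, 10);
```

The context object outlives the call to foo because the returned closure still points to it, which is the "reified" stack frame from the notes.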

- Optimizations for object models
  - field packing
  - field reordering
  - field deletion / constant promotion
  - splitting objects
  - inlining objects
  - escape analysis

- Dynamic scoping
  - dynamic scoping (as opposed to lexical scoping) is an object-model concern
  - so far, we haven't seen languages with dynamic scoping
  - JavaScript does have one kind of dynamic scoping mechanism: the "with" statement
  - other languages (like Lisp) have dynamic scoping as a primitive
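
To make the contrast with lexical scoping concrete, here is a small JavaScript sketch of a dynamic environment kept as a stack of bindings. The helpers dynLookup and withBindings are invented for this example; the point is that the name is searched at runtime, so the result depends on who is calling rather than on where the function was written.

```javascript
// Hypothetical dynamic environment: a stack of name -> value frames.
const dynamicEnv = [new Map()];

function dynLookup(name) {
  // Search from the most recent binding outward: a runtime lookup by name.
  for (let i = dynamicEnv.length - 1; i >= 0; i--) {
    if (dynamicEnv[i].has(name)) return dynamicEnv[i].get(name);
  }
  throw new ReferenceError(`${name} is unbound`);
}

// Bind names for the duration of `body`, then restore the old environment.
function withBindings(bindings, body) {
  dynamicEnv.push(new Map(Object.entries(bindings)));
  try { return body(); } finally { dynamicEnv.pop(); }
}

const x = "lexical";                       // resolved statically by position
dynamicEnv[0].set("x", "global");          // outermost dynamic binding

function showLexical() { return x; }
function showDynamic() { return dynLookup("x"); }

const inside = withBindings({ x: "caller" }, () => [showLexical(), showDynamic()]);
const outside = showDynamic();
```

showLexical always sees the x it was defined next to, while showDynamic sees whichever binding is innermost on the stack when it runs.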
  
- Hashtable design
  - A hashtable is an associative data structure; a key -> value mapping
  - A key can be almost anything: object, string, record, integer, double
  - Basic idea: reduce the key value to a smaller integer and use
    properties of the integer value to search internal storage
  - Perfect hashing: for a finite, fixed set of keys, a function that maps
    them uniquely onto a (dense) numerical range
    - can use an array after that
    - usually infeasible for the kinds of hashtables used in VMs
  - Most common kind of hashtable in a VM: string -> value
    - used in implementing dynamic environments
    - strings are often "interned" (also needs hashmap)
  - Hashtable can be specialized depending on which of the get, set, and delete operations it must support
  - Dealing with collisions: chaining and probing
    - Chaining: main array contains pointers to buckets
    - Probing: only main array, need strategy to check backup slots
  - Design considerations:
    - cache the hashcode in buckets (elements)?
    - use a power-of-two or a prime size for the main array?
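
A sketch of one point in this design space: a string -> value table using linear probing, a power-of-two main array (so the index is computed with a mask), and cached hash codes (so resizing and most mismatches avoid touching the string). The class name ProbingTable, the FNV-1a-style hash over UTF-16 code units, the tombstone scheme and the 3/4 load limit are all choices made for this example.

```javascript
// FNV-1a-style hash: reduce a string key to a 32-bit unsigned integer.
function hashString(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

const TOMBSTONE = Symbol("deleted");   // marks a deleted slot

class ProbingTable {
  constructor(capacity = 8) {          // capacity must be a power of two
    this.keys = new Array(capacity).fill(undefined);
    this.values = new Array(capacity).fill(undefined);
    this.hashes = new Uint32Array(capacity); // cached hash codes
    this.used = 0;                     // live entries plus tombstones
    this.size = 0;                     // live entries only
  }

  // Index of `key` if present; otherwise -1 - (slot where it could be inserted).
  find(key, hash) {
    const mask = this.keys.length - 1;
    let firstFree = -1;
    for (let i = hash & mask; ; i = (i + 1) & mask) {   // backup slot: the next one
      const k = this.keys[i];
      if (k === undefined) return -1 - (firstFree >= 0 ? firstFree : i);
      if (k === TOMBSTONE) { if (firstFree < 0) firstFree = i; }
      else if (this.hashes[i] === hash && k === key) return i; // cheap check first
    }
  }

  get(key) {
    const i = this.find(key, hashString(key));
    return i >= 0 ? this.values[i] : undefined;
  }

  set(key, value) {
    // Keep at least a quarter of the slots empty so probing always terminates.
    if ((this.used + 1) * 4 > this.keys.length * 3) this.resize();
    const hash = hashString(key);
    let i = this.find(key, hash);
    if (i < 0) {
      i = -1 - i;
      if (this.keys[i] === undefined) this.used++; // reusing a tombstone is free
      this.keys[i] = key;
      this.hashes[i] = hash;
      this.size++;
    }
    this.values[i] = value;
  }

  delete(key) {
    const i = this.find(key, hashString(key));
    if (i < 0) return false;
    this.keys[i] = TOMBSTONE;          // keep later entries' probe chains intact
    this.values[i] = undefined;
    this.size--;
    return true;
  }

  resize() {
    const { keys, values, hashes } = this;
    const capacity = keys.length * 2;
    this.keys = new Array(capacity).fill(undefined);
    this.values = new Array(capacity).fill(undefined);
    this.hashes = new Uint32Array(capacity);
    this.used = this.size;             // tombstones are dropped
    const mask = capacity - 1;
    for (let j = 0; j < keys.length; j++) {
      if (keys[j] === undefined || keys[j] === TOMBSTONE) continue;
      let i = hashes[j] & mask;        // cached hash: no need to rehash the string
      while (this.keys[i] !== undefined) i = (i + 1) & mask;
      this.keys[i] = keys[j]; this.values[i] = values[j]; this.hashes[i] = hashes[j];
    }
  }
}
```

A table that never needs delete could drop the tombstones entirely, which is one example of customizing the design to the supported operations. A prime-sized array would replace the mask with a modulo, trading a slower index computation for less sensitivity to weak hash functions.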