Lecture 11 - Object models - So far, talked about code representations and executing code - Object models are about designing an efficient data representation for programs - critically impact the space used by programs - Space/time: - make all object operations as cheap (few instructions) as possible, including operations by the VM itself (like garbage collection) - make each field, object, class, etc the smallest possible representation - Common things among object models - objects are one or more contiguous regions of memory - sometimes classified as "scalars" and "arrays" - each object (usually) needs some kind of dynamic identification: "tag", "type", "header" - exact nature of that depends on language and GC - some languages allow *reflection* on an object's type (and thus members) - requires existence of metadata objects and access to them - source-level object might correspond to multiple implementation-level objects - Object model is influenced by the choice of value representation and whether language is statically- or dynamically- typed - fully statically typed: fields in objects have known types - dynamically-typed: fields in objects have no type, can hold any value - gradually-typed: fields in object may have known representation (e.g. not an int) - specializing polymorphic languages influences object model - Object model is heavily dependent on language's semantics - inheritance model (single-, multiple-, prototype) - static or dynamic fields or properties - method resolution (aka dispatch) rules - Records - some languages don't have objects, but have records - records can be immutable or have some mutable fields - typically stored on the heap - can allow subtyping, both width (adding more fields) and depth (immutable field is a subtype) - records may or may not support dynamic type queries - object representation requires header for GC (at the very least) - can we observe uninitialized fields? no - can we race? only on mutable fields - Object models are usually built out of records underneath - The Java Object model - all objects inherit from java.lang.Object, have methods - operations common to all objects - monitorenter, exit - wait, notify - System.identityHashcode - finalization bit - classes inherit from only one superclass - can override (non-final) methods from a superclass and introduce new ones - signature of params/return must match exactly - classes can implement multiple interfaces - inner classes: source-level construct - java source compiler introduces hidden fields to point to outer class - allocating objects (new, newarray, anewarray) - acquiring and initializing memory - invoking constructors (invokespecial) - object layouts: - header field contains meta information, pointer to meta object - instance fields stored sequentially thereafter in memory (getfield, putfield) - static fields stored in globals (or as part of class metadata) (getstatic, putstatic) - metaobject stores: class, virtual table, i-table - arrays: - like objects, have the same header files - additionally have a length - are co-variantly typed in Java - requires a dynamic check on stores - accesses always bounds-checked against the length (aaload, baload, ... aastore, bastore ...) - implementing casts efficiently (checkcast, instanceof) - Linear search - Cohen display - Intervals - search for interface implementation - invoking methods - invokespecial, invokestatic - invokevirtual: virtual table lookup - invokeinterface: i-table search, typically with caching - invokedynamic: additional programmable binding - Invokeinterface - classes can implement multiple interfaces - Cohen display and interval approaches don't work - Fall back to linear search - Inline-cache: all-important concept - save a mapping of last-seen type to result, e.g. class B -> method m for a dispatch - can be used for many things - invented in 1984 for Smalltalk - engine can use statistics collected from ICs to inline later - Exceptions - Java programs can throw exceptions under various circumstances - safety check failures: NullPointerException, ClassCastException - arithmetic exceptions: DivideByZeroException - exceptions are just objects that implement Throwable - throwing an exception initiates a search for an exception handler (athrow) - JVM uses exception handler tables to find which method catches an exception - zero-cost: requires VM to do stack walking - stacktraces: JVM internally attaches a native-level trace, can materialize as objects - SELF object model - dynamically typed, prototype-based language - objects have slots, each slot has a name - can add and remove slots dynamically - essentially one primitive operation: send a message to an object - if a slot is not found in an object, parent object(s) are consulted - objects are cloned and modified rather than instantiating classes - JavaScript object model - prototype-based (https://mathiasbynens.be/notes/prototypes) - JavaScript programs often use prototypes to approximate classes - "state" are properties on objects, methods are functions on the prototype chain - failing a lookup in one object causes another lookup to be started in prototype - prototype chains are long for DOM objects - instead: use hidden classes to store prototype - gotcha: handling mutation of the prototype chain - uses the term "properties" which is more general than fields - properties are dynamically typed: slots don't have types - slots usually not smaller than a machine word - every object is an instance of a Function, an anonymous object literal, a function, a closure or Array - additional operations - typeof - get prototype - delete property - list properties - seal object - any object can have array-like behavior (.length, foreach, indexed by integer) - property names can be any JS value (implicitly converted to string) - properties can be added dynamically or deleted - even from the middle of an array - access of missing array element goes to prototype - slack-tracking: objects are over-allocated and then trimmed down - functions (and closures) are objects, have properties - arguments object is like an array - the global object - contains top-level variables in a script - JavaScript allows intercepting property accesses in several ways - defining setters - defining proxies - can we observe uninitialized fields? yes, as undefined - can we race? no threads, but async functions - Implementing JavaScript with hidden classes and inline caches - V8 article on ICs for JavaScript (https://mathiasbynens.be/notes/shapes-ics) - using a linear search or a hashtable for every property access is too slow - many objects have the same number and names of properties - idea: track object shapes behind the scenes with "hidden classes" or maps - object does not need to have its hashtable, but shared map tells code where to find properties - use an inline cache to check against known shapes - example: var f = o.x; - inline cache stores K entries, where an entry can be of the form: entry = {shape, offset} - code for the access inline cache looks like lookup(o: Object, ic: InlineCache, propertyname: string) { for (i = 0; i < K; i++) { if (o.shape == ic.entries[i].shape) return o.properties[ic.entries[i].offset]; } return o.hashtable.lookup(propertyname, ic); // ic might be updated } - essentially a search through the entries, looking for a matching shape, with a backup - hidden classes form a tree with transitions - example: function Foo(x, y) { this.x = x; this.y = y; } var x = new Foo(33, 44); - this creates hidden classes: HC{Foo_1}, HC{Foo_2}, HC{Foo_3} with transitions between them: HC{Foo_1} -(x)-> HC{Foo_2} HC{Foo_2} -(y)-> HC{Foo_3} - allocation of a new Foo() starts with HC{Foo_1}, then adding an "x" property transitions to HX{Foo_2} - deleting a property can backward transition (if last property deleted) or cause transition to "dictionary mode" - WebAssembly GC object model - split into two proposals "function-references" and "gc"--both are done (Phase 4) - will be merged into the Wasm 3.0 standard - adds a three new families of reference types in addition to funcref and externref - functions, structs, and arrays - reference types have nullity built into the type - "funcref" and "externref" become shorthands for (ref null func) and (ref null extern) - allows first-class, statically-typed functions - (ref func #signature) where #signature is an index into the types section - in text format #signature can sometimes be inline (param ...) (result ...) or an alias - new instructions: call.ref and return_call.ref - structs are declared, fixed-size records - declared in the type section - can have a sequence of fields, each with a static type, mutability - in the binary format, fields are referred to by field index - in the text format, fields can have aliases - new instructions: struct.new, struct.get, struct.set - variations: struct.new_default, struct.get_s, struct.get_u - instructions have immediates that refer to declaration in type section - arrays are declared with a single, static element type - declared in the type section - has a single element type, mutability - length fixed at allocation time - dynamically-bounds checked accesses - new instructions: array.new, array.get, array.set, array.length - instructions have an immediate that refers to declaration in type section - variations: array.new_default, array.new_data, array.new_elem, array.get_s, array.get_u - all three kinds of declarations can have a declared supertype - struct can be a subtype of another struct if: - has at least as many fields (width subtyping) - each field has exact same type as supertype, if mutable - each field has a type which is a subtype of the corresponding supertype's field's type (depth subtyping) - array can be a subtype of another array if: - element is exact same type as supertype, if mutable - element is a subtype of supertype's element type, if immutable - function can be a subtype of another function type if: - result types are correspondingly subtypes of super-functions results (co-variant) - parameter types are correspondingly supertypes of super-functions results (contra-variant) - easy way to remember: results are co-, params are contra- - i31 is a new heaptype that allows storing 31-bit integers under the ref type hierarchy - implies that it is implemented as a tagged pointer by the engine - overall, type hierarchy now looks like this valuetype ::= i32 | i64 | f32 | f64 | v128 | (ref null? ) | shortref heaptype ::= extern | exn | func | struct | array | i31 | usertype ::= #index | $alias # previous types "ref" are now shorthand for "(ref null )" shortref ::= externref | exnref | funcref | structref | arrayref | i31ref - WebAssembly GC type canonicalization - modules must agree on types for data exchange - how do two modules refer to the same func, struct, or array type? - before Wasm GC, only term types and function signatures existing - glossed over a detail: all structurally-equivalent signatures are equivalent - Wasm GC extends the same to declaring structs and arrays - Two structurally-equivalent struct decls are considered equivalent if: - equivalent supertype (if any) - equivalent field types - Two structurally-equivalent array decls are considered equivalent if: - equivalent supertype (if any) - equivalent element types - definition is obviously recursive, what about recursive data structures? - two possible interpretations: equi-recursive and iso-recursive - equi-recursive: all unfoldings of the type are equivalent - iso-recursive: only isomorphic graphs are equivalent - partition types into *recursion groups* - entire recursion group is treated as a unit - A recursion group can be checked for isomorphism with another in linear time - simply compare external type indices and (relative) internal type indices - Final result: two different modules can repeat the same type definition, and the result is considered equivalent - Algebraic data types - some languages allow defining discriminated unions, aka variants or sum types - list a fixed set of cases (sometimes confusingly called constructors) - can we observe unitialized fields? no - can we race? no type colour = | Red | Green | Blue | Yellow | RGB of float * float * float;; - Closures and lexical scope - languages like JavaScript, Python, Lisp, Scheme, ML, Haskell have closures - inner functions (lexically) can refer to variables in outer scopes - inner function may escape outer function's lifetime - normally, variables in a function are allocated in a stack frame - with escape, stack frames effectively need to be "reified" or allocated on heap - with lexical scoping, we can calculate a location (nesting depth, offset) for variables - dynamic scoping requires a runtime lookup - illustration of scopes in JavaScript function foo(x) { function bar(y) { return x + y; } return bar; } - at the bytecode level (such as internal V8 bytecode), the scoping level is usually explicit => source to bytecode translation does "closure conversion". - closure conversion may do some optimizations to avoid allocating closures that don't escape - Dynamic scoping - dynamic scoping (as opposed to lexical) scoping is an object-model concern - so far, we haven't seen languages with dynamic scoping - JavaScript does have one type of dynamic scoping mechanism, "with" blocks - other languages (like Lisp) have dynamic scoping as a primitive - ultimately, new to reify part of the environment in a closure - Optimizations for "static" object models - field packing - field reordering - field deletion / constant promotion - splitting objects - inlining objects - escape analysis - Optimizations for "dynamic" object models - inline caches - slack tracking - presizing objects properly - field representation tracking (double unboxing) - holey array representations (JavaScript) - sparse array representations