Lecture 11 - Object models

- So far, mostly talked about code and metadata representation
- Object models are about designing an efficient data representation for programs
  - critically impact the space used by programs
- Space/time:
  - make all object operations as cheap (few instructions) as possible, including operations by the VM itself (like garbage collection)
  - make each field, object, class, etc. the smallest possible representation
  - saw this tradeoff with approach to implementing casts
- Common things among object models
  - objects are contiguous regions of memory
    - sometimes classified as "scalars" and "arrays"
  - each object (usually) needs some kind of dynamic identification: "tag", "type", "header"
    - exact nature of that depends on language, GC
  - some languages allow *reflection* on an object's type (and thus members)
    - requires existence of metadata objects and access to them
  - source-level object might correspond to multiple implementation-level objects
- Object model is influenced by the choice of value representation and whether language is statically or dynamically typed
  - fully statically typed: fields in objects have known types
  - dynamically-typed: fields in objects have no type, can hold any value
  - gradually-typed: fields in object may have known representation (e.g.
    not an int)
  - specializing polymorphic languages influences object model
- Java object model
  - class-based
  - every object is an instance of a class or array, remembers its type
  - additional operations
    - monitorenter, monitorexit
    - wait, notify
    - System.identityHashCode
    - finalization bit
  - class object involved in method calls
    - v-table and i-table
  - class object contains reflection metadata
    - arrays of declared fields, methods, interfaces
  - instance fields are statically-typed, statically-allocated locations in the object
    - storage space for all instance fields is always there (can't be added or deleted)
  - methods are statically-typed, statically-allocated into v-table and i-tables
  - consideration: can we observe uninitialized (read-only) fields?
    - yes, unfortunately
    - requirements on GC: allocate objects with zero-initialized fields
  - consideration: data races on fields of objects
- Records
  - some languages don't have objects, but have records
  - records can be immutable or have some mutable fields
  - typically stored on the heap
  - can allow subtyping, both width (adding more fields) and depth (the type of an immutable field is a subtype)
  - records may or may not support dynamic type queries
  - object representation requires header for GC (at the very least)
  - can we observe uninitialized fields? no
  - can we race? only on mutable fields
- Algebraic data types
  - some languages allow defining discriminated unions, aka variants or sum types
  - list a fixed set of cases (sometimes confusingly called constructors)
  - can we observe uninitialized fields? no
  - can we race?
    no
  - example:
      type colour =
        | Red
        | Green
        | Blue
        | Yellow
        | RGB of float * float * float;;
- SELF object model
  - dynamically typed, prototype-based language
  - objects have slots, each slot has a name
  - can add and remove slots dynamically
  - essentially one primitive operation: send a message to an object
  - if a slot is not found in an object, parent object(s) are consulted
  - objects are cloned and modified
- JavaScript object model
  - prototype-based (https://mathiasbynens.be/notes/prototypes)
  - uses the term "properties" which is more general than fields
  - properties are dynamically typed: slots don't have types
    - slots usually not smaller than a machine word
  - every object is an instance of a Function, an anonymous object literal, a function, a closure or Array
  - additional operations
    - typeof
    - get prototype
    - delete property
    - list properties
    - seal object
  - any object can have array-like behavior (.length, forEach, indexed by integer)
  - property names can be given as any JS value (implicitly converted to string, except symbols)
  - properties can be added dynamically or deleted
    - slack-tracking: objects are over-allocated and then trimmed down
  - functions (and closures) are objects, have properties
  - arguments object is like an array
  - the global object
    - contains top-level variables in a script
  - JavaScript allows intercepting property accesses in several ways
    - defining getters and setters
    - defining proxies
  - can we observe uninitialized fields? yes, as undefined
  - can we race?
    no threads
- Implementing JavaScript with hidden classes
  - using a linear search or a hashtable for every property access is too slow
  - many objects have the same number and names of properties
  - idea: track object shapes behind the scenes with "hidden classes" or maps
  - an object does not need its own hashtable; a shared map tells the code where to find properties
  - use an inline cache to check against known shapes
  - example: var f = o.x;
  - inline cache stores K entries, where an entry can be of the form:
      entry = {shape, offset}
  - code for the inline-cache access looks like
      lookup(o: Object, ic: InlineCache, propertyname: string) {
        for (i = 0; i < K; i++) {
          if (o.shape == ic.entries[i].shape)
            return o.properties[ic.entries[i].offset];
        }
        return o.hashtable.lookup(propertyname, ic); // ic might be updated
      }
  - essentially a search through the entries, looking for a matching shape, with a slow-path backup
  - hidden classes form a tree with transitions
  - example:
      function Foo(x, y) {
        this.x = x;
        this.y = y;
      }
      var x = new Foo(33, 44);
  - this creates hidden classes: HC{Foo_1}, HC{Foo_2}, HC{Foo_3} with transitions between them:
      HC{Foo_1} -(x)-> HC{Foo_2}
      HC{Foo_2} -(y)-> HC{Foo_3}
  - allocation of a new Foo() starts with HC{Foo_1}, then adding an "x" property transitions to HC{Foo_2}, and adding a "y" property transitions to HC{Foo_3}
  - deleting a property can transition backward (if the last-added property is deleted) or cause a transition to "dictionary mode"
- Prototypes and hidden classes (https://mathiasbynens.be/notes/prototypes)
  - JavaScript programs often use prototypes to approximate classes
  - "state" is stored as properties on objects, methods are functions on the prototype chain
  - failing a lookup in one object causes another lookup to be started in its prototype
  - prototype chains are long for DOM objects
  - instead: use hidden classes to store the prototype
  - gotcha: handling mutation of the prototype chain
- Closures and lexical scope
  - languages like JavaScript, Python, Lisp, Scheme, ML, Haskell have closures
  - inner functions (lexically) can refer to
    variables in outer scopes
  - inner function may escape outer function's lifetime
  - normally, variables in a function are allocated in a stack frame
  - with escape, stack frames effectively need to be "reified" or allocated on heap
  - with lexical scoping, we can calculate a location (nesting depth, offset) for variables
  - dynamic scoping requires a runtime lookup
  - illustration of scopes in JavaScript
      function foo(x) {
        function bar(y) {
          return x + y;
        }
        return bar;
      }
  - at the bytecode level (such as internal V8 bytecode), the scoping level is usually explicit => source to bytecode translation does "closure conversion"
  - closure conversion may do some optimizations to avoid allocating closures that don't escape
- Optimizations for object models
  - field packing
  - field reordering
  - field deletion / constant promotion
  - splitting objects
  - inlining objects
  - escape analysis
- Dynamic scoping
  - dynamic (as opposed to lexical) scoping is an object-model concern
  - so far, we haven't seen languages with dynamic scoping
  - JavaScript does have one type of dynamic scoping mechanism: the with statement
  - other languages (like Lisp) have dynamic scoping as a primitive
- Hashtable design
  - A hashtable is an associative data structure; a key -> value mapping
  - A key can be almost anything: object, string, record, integer, double
  - Basic idea: reduce the key value to a smaller integer and use properties of the integer value to search internal storage
  - Perfect hashing: for a finite, fixed set of keys, a function that maps them uniquely onto a (dense) numerical range
    - can use an array after that
    - usually infeasible for the kinds of hashtables used in VMs
  - Most common kind of hashtable in a VM: string -> value
    - used in implementing dynamic environments
    - strings are often "interned" (also needs hashmap)
  - Hashtable can be customized depending on whether it supports all get, set, delete operations
  - Dealing with collisions: chaining and probing
    - Chaining: main array contains pointers to
      buckets
    - Probing: only main array, need strategy to check backup slots
  - Design considerations:
    - cache the hashcode in buckets (elements)?
    - Use a power-of-two main array or a prime?
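
The probing strategy and the two design considerations above can be made concrete with a small sketch. This is one possible design, not any particular VM's table: it assumes linear probing, a power-of-two main array (so the index is a bit mask instead of a modulo), and a cached hashcode stored next to each key. The names ProbingTable, hashString and findSlot are hypothetical, and the hash is an FNV-1a-style function chosen only for brevity. The table supports get and set but not delete, which keeps probing simple because no "deleted" markers are needed.

```javascript
// Hypothetical sketch: string -> value table, linear probing, power-of-two size.

// FNV-1a-style hash over the string's UTF-16 code units, as an unsigned 32-bit int.
function hashString(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

class ProbingTable {
  // capacity is assumed to be a power of two, so (hash & mask) picks a slot.
  constructor(capacity = 8) {
    this.mask = capacity - 1;
    this.keys = new Array(capacity).fill(null); // null marks an empty slot
    this.hashes = new Uint32Array(capacity);    // cached hashcode per slot
    this.values = new Array(capacity).fill(undefined);
    this.count = 0;
  }

  // Returns the slot holding key, or the first empty slot on its probe path.
  findSlot(key, hash) {
    let i = hash & this.mask;
    while (this.keys[i] !== null) {
      // Check the cached hashcode first; it is cheaper than a string compare.
      if (this.hashes[i] === hash && this.keys[i] === key) return i;
      i = (i + 1) & this.mask; // backup slot: the next one, wrapping around
    }
    return i;
  }

  get(key) {
    const i = this.findSlot(key, hashString(key));
    return this.keys[i] === null ? undefined : this.values[i];
  }

  set(key, value) {
    // Keep the table at most half full so a probe always reaches an empty slot.
    if ((this.count + 1) * 2 > this.mask + 1) this.grow();
    const hash = hashString(key);
    const i = this.findSlot(key, hash);
    if (this.keys[i] === null) {
      this.keys[i] = key;
      this.hashes[i] = hash;
      this.count++;
    }
    this.values[i] = value;
  }

  grow() {
    const bigger = new ProbingTable((this.mask + 1) * 2);
    for (let i = 0; i <= this.mask; i++) {
      if (this.keys[i] === null) continue;
      // Reinsert using the cached hashcode; no string is hashed again.
      const j = bigger.findSlot(this.keys[i], this.hashes[i]);
      bigger.keys[j] = this.keys[i];
      bigger.hashes[j] = this.hashes[i];
      bigger.values[j] = this.values[i];
    }
    this.mask = bigger.mask;
    this.keys = bigger.keys;
    this.hashes = bigger.hashes;
    this.values = bigger.values;
  }
}
```

A table is created with new ProbingTable(), filled with set(name, value) and queried with get(name), which yields undefined for a missing key. Caching the hashcode pays off twice in this sketch: probes skip most string comparisons, and growing the table never rehashes a key. A prime-sized array would need a modulo in place of the mask but is more forgiving of hash functions with weak low bits.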