design.txt -- design document Compiler and Symbol Table Info The compiler will build symbol tables into the run-time representation to avoid redundant representations. Also, dependencies must be represented so that when a class is redefined, all subclasses are invalidated. Globals are stored on atoms. Classes are simply global variables bound to class objects. Symbols must be searched to find a symbol name corresponding to a class. (We could store the name in the class, but there is the danger that the user might reassign the symbol's value.) STRUCTURES Symbol is a Node with the following fields: name: a pointer to an FString, the name of the symbol value: the global variable binding function: the global function finding Array is a Node. Arrays can be extended using an append operation. Since Nodes are not naturally relocatable, Arrays are implemented as an Array node with a pointer to the actual data. To extend the array, the data is copied to a larger block of memory. The Array node has the following fields: length: the length of the array data: pointer to a Node Since all nodes have a built-in length count, we can tell how much space is allocated in the data. The array length is different so that we can allocate extra space to allow for growth. It is not necessary to reallocate the array every time its length changes. When memory is allocated to accomodate a growing array, the new allocated size will be some factor, making the average cost of extending an array linear rather than quadratic. FClass is an Object with these fields: class: the class class super: the superclass symboltable: symbols object symbols: vector of atoms slots: size of instances (this is the number of member variables, including inherited members, which in turn include the class pointer) Symbols is a Dict mapping symbols to class members (vars and methods) Every FClass points to an instance of Symbols lookup: symbol -> ismethod, index, params insert_method: symbol, params -> void insert_var: symbol -> void methods: vector (a Node) of method pointers Frame is a Node (not an Object) with these fields: tos: top of stack, tells how many stack elements in frame (during procedure calls) pc: remembers next instruction to execute (set by a call instruction) prev: previous frame (a Frame) self: the object for this frame (an Object) method: the method for this frame (a Method) parameters: any number of slots for function parameters (the number of parameters is given by the method's signature) optional parameters keyword parameters rest array pointer (if any) dict dictionary pointer (if any) locals: any number of slots for local variables (the number of locals is given by the method's stack_offset) stack: during calls, the expression stack is stored in the frame (the size is given by tos) Method is a Node (not an Object) with these fields: literals: Array of literals used in the code code: instructions (a Code) name: name (a symbol) of method parameters: 16-bit integer: how many positional parameters optionals: 16-bit integer: how many optional parameters keywords: 16-bit integer: how many keyword parameters (these parameters are in the symbol table; this number is used to tell which of the symbols are keyword parameters.) rest_flag: 8-bit boolean telling whether additional parameters are passed by array dict_flag: 8-bit boolean telling whether additional keyword parameters are passed in a dictionary default: default values (in an Array) for optional and keyword parameters line_nos: line numbers corresponding to instructions (a Code) stack_offset: location in stack frame to store copy of stack frame_size: how big should the frame be (including parameters, locals, and expression stack area) symbols: symbol table (a Locals) for this method: class: class to which this method belongs (a Class) Locals is a Dict containing parameters and local variables: lookup: symbol -> index (or -1 for not found) insert_var: symbol, index -> void ATOMS/SYMBOLS A Symbol is a lisp-like atom. There is a hash table in the Machine class that maps strings to symbols. Symbols contain the string, a function pointer, and a value pointer. Because atoms are common, we expect code to be full of them, but this creates a problem for the loader, which must somehow make sure that an instruction that references an atom resolves to the right address for the atom. Our solution is to make a table of atoms per compilation unit. We don't know at this point exactly what a compilation unit is (a file, a set of files, a class, ...), but we will assume that a compilation unit is at least a file. Originally, we used class as the compilation unit and used a per-class table of symbols. This created a problem for global functions, which have no class. Therefore, we will make an atom table for each file and share the table among all methods in the table. Every method will have a pointer to the atom table. Code that references an atom will actually use an integer offset into the atom table. INSTRUCTION SET Instructions are 16-bit "byte codes" for a stack machine. The machine uses a small expression stack (rather than registers). To avoid the problem of stack overflow and/or preallocating a large object for the stack (which might also create problems for garbage collection), there is no large linearly addressed call stack. Instead, every stack frame is an independent object. Any values on the expression stack are copied to the current stack frame before a call (to avoid the problem of expression stack overflow). The compiler is responsible for allocating enough space in each stack frame to hold the largest possible expression stack. The expression stack size can be determined statically, and the size is zero if a function does not call any other functions. Instructions have a high-order, 4-bit "op-code" and a 12-bit "argument" field. The op-codes are as follows: The instructions types are the following: operation: the argument selects an operation, e.g. add or call load/store global: the argument selects an atom load/store instvar: the argument is offset into instance load/store local: the argument is offset into frame load small signed integer: the argument is a 12-bit signed int load literal: the argument is a 12-bit offset into a literals array load atom: the argument is a 12-bit offset into the symbols array local frame: the argument is a 12-bit offset into the symbols array and names the method. This invokes a method in the current object. frame: the argument is a 12-bit offset into the symbols array. This is like local frame except the method is invoked in the object on the expression stack rather than the current object. non-local load/store: the argument is a 12-bit offset into the symbols array, naming the field, which must be looked up in the object's class. Note that there is no "load character literal", although a char would fit into 12 bits with room to spare. I decided the "load literal", which loads a value from a table of literals would be more general. Storing characters in the literal table is perhaps wasteful, but chars are not a frequently-used data type, so the overall cost should be low. Note also that there are no character types per se, but there are 1-character strings. METHOD INVOCATION This has become complicated. Here are some issues leading to the design. 1) expression stack: there is an expression stack, which provides a cheap place to store intermediate results of expressions, but we don't want the expression stack to overflow, and we don't want to allocate a huge expression stack. The expression stack could get huge if we evaluated argument lists on the stack -- a procedure could have very many arguments, e.g. LIST could take hundreds of arguments to build a big list of constant values. Also, a call could have nested procedure calls and be recursive. Therefore, the expression stack cannot hold all arguments; it should compute one at a time, and the argument should then be stored somewhere. Secondly, the expression stack should be emptied before invoking a method and restored upon return of the method to support recursion. I think of the stack as a set of registers, so saving/restoring the stack is like saving and restoring registers in a register-based architecture. 2) methods support several kinds of parameter passing. All parameters are passed by value, but mutable "values" and the ability to return structures allows various ways to work around the absence of call-by-reference. The most common parameter passing is required positional parameters. There are also keyword parameters, optional parameters, and the ability to pass in a variable number of parameters. The variable case is handled by passing in a vector of actual parameters as a single argument. Any optional or keyword parameter can have a default value. 3) because method lookup can be dynamic, the caller does not know what style of parameter passing will be used. Solutions: There has to be a place to place actual arguments while the argument list is being evaluated. Therefore, a stack frame is allocated before the first argument is evaluated. As each argument is evaluated, the argument is stored into the stack frame. Note that if there are nested function calls, you could be in the middle of computing a stack frame when you need to begin calling another function, requiring a different stack frame. In this case, you will end up with multiple stack frames (!) allocated. The stack frame pointers are simply pushed on the expression stack, and we assume that expressions are not nested so deep that this will be a problem. Also, we allocate frames from the heap, so we can allocate multiple frames and make them the top-of-stack in sequence without copying. (The static nesting level, not the dynamic level, determines the stack requirements.) After the arguments are evaluated, the frame actually becomes the top of the stack (frames are allocated from a heap, not from a linear memory region). The expression stack is stored into the caller's stack frame. Thus, the caller's frame does need room for some temporaries, but the compiler can figure out the specific number of temporaries (zero if the method does not call another method) and allocate just enough space for that particular method. Upon return, the expression stack will have one value -- the return value -- the expression stack is restored from the caller's frame and the return value is moved to the top of the stack. Processing continues in the caller. Since the caller does not know how many arguments are expected by the callee, or whether the callee wants a vector of arguments or individual arguments, this must be handled at runtime. (An alternative would be pass all parameters in vectors, but this would require an extra allocation and free operation per call and a special set of op-codes to load and store parameters as opposed to local variables. It's not a bad idea though. Instead, we will use relatively clever instructions that "hide" the callee's protocol from the caller. This will now be described. The basic structure for a call is: push the object (unless it is implicitly "this") create a frame -- this single instruction, either instr_local_frame which pushes "this" or instr_frame. Both types of instructions include the index of the atom that names the method. The method is then looked up in the object's dictionary (the object is on the stack). The method is an object with code, debugging info, etc., and includes the number of required parameters, the number of keyword parameters, the number of positional parameters, and whether the rest of the parameters should be passed as a vector. An exception is also raised if there is no receiving method found by the method lookup. Based on the method's parameter count value, the caller sets up the frame and possibly an array for extra parameters, and an offset into the object. These two things are stored on the expression stack, so the stack contains (from top down): index into object for the next parameter (or -1) the new frame pointer Class instantiation is a special case -- the compiler doesn't know whether a symbol denotes a class or function, so this is determined at frame creation time. What we want is to create an instance and then pass parameters to the init() method. If the class were actually a global function, here's what it would look like: def MyClass(p1, p2): var instance = new(MyClass) instance.init(p1, p2) return instance I think we can fake this as follows: Modify the compiler to generate extra instructions at the head of init() methods to allocate an instance and store the instance pointer into "this" Modify the return instructions in init() to return "this" Modify compilation of class to store a pointer to the init() method as the function pointer on the class's symbol. If there is no init() method, then generate a little method that just allocates and returns an instance of the class. initialize parameters -- if there are default values, these are copied from the method to the frame. Other values are initialized to NULL. for each parameter (left to right) code to evaluate the parameter param -- this instruction stores the top of stack into the frame or appends it to the array in the first local slot of the frame, OR... keyparam -- this alternate form of param expects a keyword (a symbol) and a value on the stack. It looks up the symbol in the method's symbol table (mapping symbols to locations in the stack frame). If the target is found and if it is actually a keyword parameter (determined by comparing the location to the range of locations allocated for keyword parameters), then the value is written to the frame, overwriting the default value for parameter. call the method -- this instruction expects an index and new frame pointer on the expression stack. It pops the index. It remembers the new frame. The rest of the expression stack is copied into the current frame, using an offset stored in the current method. (The size of the stack is also stored so we know how much to restore.) Once the expression stack is saved, the new frame is made the current frame, the PC is stored in one of the frames (the caller's frame would make sense) and the PC is set to the entry point in the called method. Return is much simpler: the method leaves one value on the expression stack (the return value). The previous frame is located by following a pointer in the current frame. From the previous frame, the expression stack is restored and the return value is placed at the top of the expression stack (it isn't left on the bottom). The previous PC and previous frame become current, and execution resumes in the previous method's code. The virtual machine is written in C++ and is run by calling the Machine::execute() method. This method is not recursive, i.e. the C++ stack is not used when a new Serpent method is called, and execute() is not exited when a Serpent method returns. Therefore, we need some other mechanism to cause execute to return. The mechanism is a frame with a NULL program counter. After returning from a procedure, if the machine PC is NULL, the machine returns. Thus, to call a Serpent method from C++, you allocate a frame with a NULL PC. Then simulate a procedure call. The methods defined in call.cpp provide a convenient interface for creating a frame, passing in parameters, and then running a function. FUNCALL To call a function named by an atom: compiler treats funcall() as special. After evaluating the first argument (which should evaluate to an atom), execute OP_FUNCALL, which pushes NULL as the "object" and calls op_frame to create a frame for the remaining arguments. These are evaluated and stored with OP_PARAM into the frame. Finally, OP_CALL calls the function. SEND Send works by taking an object and a method selector (an atom), followed by arguments. Like funcall, this is handled specially by the compiler. After evaluating the first two arguments, execute OP_SEND, which removes the atom from the stack and calls op_frame to create a frame for the remaining arguments. As with funcall, parameters are stored with OP_PARAM and finally the method is called with OP_CALL. Garbage Collection Use the write-barrier technique. Initially, objects are marked black. We want to find all reachable objects and mark them white. Objects that are in the process of being marked white are marked gray. The invariant is that no white object points directly to a black object. (But a white object can point to a gray object and a gray object can point to a black object.) When there are no gray objects left, then we know that no white objects point to black objects, so all black objects can be freed. During the marking phase, new allocations are white. Any store must be checked: if storing a ptr into a white or gray object, if the pointer is black, mark it gray to make sure its pointers are marked later. Each gray object in this phase is marked white after marking the objects it references gray (or leaving them white). During the scanning phase, where white objects are converted back to black and black objects are freed, the write checking should be disabled because storing a black object (that has just been converted by the scan from a white object) into a white object (that will be converted into a black object) is OK. During the scanning phase, new allocations should go on the gray list. If new allocations were black, they could be freed during the scan. If new allocations were white, we might store black objects into them after GC. Then in the next GC cycle, the white object would appear to be already marked, so the black objects it references would not be marked and they would then be freed. Also, marking new things white during scan violates the "initially, objects are marked black" rule. After scanning, there should be another phase to change gray (recently allocated objects) into black. (See Freeing Stack Frames for more information.) Should we scan objects to locate and free black ones? We could keep all objects on a linked list, but we really need objects sorted by size. There could be multiple lists, one per size. Actually, there would be 4 lists per size: the free list, the black, white, and gray lists. During GC, objects would move from black to gray to white. (We need the gray list so that we can scan gray objects.) When gray is empty, we can move all black to free. In this scheme, we need to scan all different sizes, but we don't have to scan the final white list changing back to black because we can just renumber the colors. The alternative is to not move objects between lists. Suppose we keep the size of each object at the beginning of the object and we allocate objects sequentially from memory blocks. Then we can scan all objects by going through each memory block from beginning to end. We need a separate list or table (stack?) of gray objects. Storage analysis: S different sizes N objects Assuming we can squeeze the color into a spare byte somewhere: For lists: 2N + 4S (doubly linked lists + table for sizes) For scanning: N + S (size field + free lists) Timing analysis: L time to move object between doubly-linked lists I time to insert and remove from singly-linked list/stack S time to scan an object M objects (including free ones) N links F freed objects For lists: allocate object: L GC = 2L(M-F) -- each reachable object is moved from white to gray to black For scanning: allocate object: 0 GC = I(M-F) + MS + IF -- each reachable object is put on/off gray list. Every object is scanned. Freed objects are put on free list. What are values of L, I, S? L: x->prev->next = x->next 3 x->next->prev = x->prev 1 x->next = y->next 2 y->next = x 1 x->next->prev = x 1 x->prev = y 1 = 9 load/stores I: x->next = stack stack = x stack = stack->next = 4 load/stores S: ptr += ptr->size if (ptr->color) ... ptr->color = black = 3 loads/stores + branch = 7 lists: 2L(M-F) = 18(M-F) scanning: I(M-F) + MS + IF = 4(M-F) + 7M + 4F = 11M 11M = 18(M-F) 18F = 7M F = (7/18)M i.e. when free list is around half, linked lists work better by avoiding scanning for GC of 100,000 objects, we're talking about a difference of 18(M-F) - 11M = 7M-18F: max is 7M, min is -11M 11M = 1,100,000 loads/stores, which at 100M/s = 10ms Conclusion: linked lists impose a space overhead of at least one pointer and may not be faster. The scanning method seems simpler so we should use it, but perhaps we can make the marking interface abstract enough to allow a change in representation someday. GC GLOBALS We have to mark everything starting with the "globals" that can be reached directly. The globals include: symbols -- the global symbol table expr_stack -- the expression stack frame -- the current stack frame GC ALGORITHM (Remember that initially, all is black) gc_start: set initial_color to white node_make_gray on symbols and frame node_make_gray on each object on expr_stack gc_mark: take object off gray_list, set color white, select on type, and if any referenced object is black, make it gray (repeat gc_mark until gray_list is empty) gc_mark2: mark the stack again. If gray_list is not empty, repeat gc_mark, otherwise proceed to gc_mark3 gc_mark3: mark_white() to take care of integers, characters, etc. gc_sweep: scan through all pools, freeing anything that is black during this phase, new allocations go on gray list gc_sweep2: initial_color = black change gray objects to black objects gc_final: FREEING STACK FRAMES After implementing and measuring garbage collection, I discovered that some undesirable behavior is possible: if there are many short subroutine calls, then many stack frames are allocated. Perhaps the worst case is a sort routine where two stack frames are allocated to run a simple comparison function. Stack frames are relatively large, so it is possible to allocate a huge amount of space before the GC frees any stack frames. This led, in practice, to over 10MB of storage for stack frames waiting to be freed, and dominated the memory requirements for the program. Note that more than half of this space would have been saved if the compiler could keep track of the maximum number of slots needed to save the stack. For these measurements, 16 slots were allocated and rarely used, wasting as much as 64 bytes. A typical frame size is 96 bytes. There are two obvious solutions: (1) have return instructions free stack frames. This has the disadvantage that explicitly freeing storage is not part of the current GC model, and it rules out the possibility that someone (a debugger or closure) might be holding a reference to the stack frame even after the function returns. (2) implement generational scavanging to quickly return stack frames to free lists. This has the problem of even greater complexity and perhaps more storage requirements for additional data structures to keep track of generations. For now, I will implement (1), explicit freeing of stacks. It would be nice to keep these nodes completely separate from GC, but stack frames point to other data and must be traversed in the marking phase. So let's assume that frames are part of the normal heap. After implementing (1) (and it works!), I discovered that a significant number of stack frames are still allocated because frames are sometimes gray when a function returns. These do get collected eventually, but you can still end up with kilobytes of useless stack frame-sized memory chunks sitting on free lists. Given that you are going to have wasted stack frames on freelists, an even better solution (I think) would be to have explicit lists of stack frames from which to allocate. These would always be reachable and would always be marked, and would never appear on the free list. A disadvantage in terms of speed would be that frames should be cleared when they go on the stack or else things they reference would be marked. Alternatively, the mark algorithm could be modified to treat these frames specially. Perhaps frames could have an "in use" bit to tell the marker when to mark a frame. Assume that the OP_RETURN instruction frees the current frame. The frame is one of: Gray -- GC is running and the frame is on the Gray list. If we free it, then the Gray list will be corrupted. We cannot remove it from the Gray list, so we should not free it now. The GC will not free it either on this cycle, but the next GC will claim it. Black -- either GC is not running or frame has not been marked yet. It is safe to put node on free list, except that there may be one or more Gray stack frames that point to this one. If we free it, then we will be in trouble when later the GC tries to mark it. A solution is to make a special case: Do not mark (make gray) nodes that are already free. As a sanity check, the mark phase should never reach a free node except through the previous frame pointer of a stack frame. Note also that if we free this node, it could be reallocated. Then we will have a reference from a Gray stack frame to this newly allocated memory. Of course, the newly allocated memory will be White, so it doesn't change things whether we mark it or not. White -- GC is running and the frame will not be collected. It is not on the Gray list. It is safe to free it. Now, consider what happens with the debugger: The debugger gets a pointer to the stack frame of the offending routine. These frames are reachable via the debugger stack frame so they will not be collected. No OP_RETURN instruction will be executed within these frames, so they will not be explicitly freed. This is the right behavior, so we do not need any special handling to keep a handle on stack frames. Any code that keeps a handle on a stack frame and returns from it, for example code that creates a closure, will have to make sure the return instruction does not free the frame or the frames it references. This could be a big problem, but right now there is no way for this problem to arise. Another potential problem is that stores into the frame pointer of the virtual machine are not checked. Here's the scenario: GC starts on a tall stack and the top frame is marked at the beginning (because the frame pointer points to it). The machine rapidly returns a few levels. The original top frame is freed, so it will not be marked. The inner frames of the stack have nothing pointing to them, so they'll be black. Local variables will not be marked and even the stack frames might be freed. To solve this, we need to make sure that, before GC completes, the stack is completely marked, but we still have to do this incrementally because the stack is too big to mark atomicly. One solution is when we return, if the top frame is BLACK, mark it GRAY. This will force the GC to mark it and all stack frames lower in the stack (frames are linked via the FRAME_PREV field). Alternatively (and this is the implementation), we can have the GC check the frame field of the Machine at the end of the mark phase. If the top frame is black, mark it GRAY and continue marking. This is similar to how we handle the expression stack -- it gets rescanned every time the GRAY list becomes empty until there is nothing left that is GRAY. STORAGE The storage layout is like this: TAG -- 1 byte COLOR -- 1 byte SIZE -- 2 bytes NEXT -- 4 byte pointer to next gray or free object Most objects are basically vectors of pointers, so GC can scan them all the same way. Exceptions include longs, doubles, strings, and code. DATA STRUCTURES For real-time, a limiting factor is dealing with large arrays. A 133MHz Pentium system copies 1GB in 33s, so the copy rate is 30MB/s, or 30KB/ms, or 8K pointers/ms. It would be nicer to use incremental structures such as lists, but lists have the problem of linear time to access them. This time would be greater than the copy time measured above. Therefore, the most general structure seems to be a variable length array, implemented by a block of contiguous memory, and expanded by copying to a larger chunk of memory. There should be an expand operation that gives an explicit size. Otherwise, arrays should grow by some multiplicative factor when data is appended and an overflow occurs. In order to expand without changing addresses, arrays need a header with a pointer to the actual data. In addition to the array type, we need dictionaries. A dictionary will be some sort of dynamic structure with log insert, lookup, and delete time. Hash tables are another possibility, but my concern is the time to rehash if the table grows too large. For now, the dictionary is just an array of alternating keys and values, and linear search is used for lookup. Strings: zero terminated. Nothing fancy. Longs: keep a set of preallocated longs in a table. Otherwise, allocate them dynamically. All "longs" are 64-bit integers. The implementation uses 32-bit ints for things like array size, stack size, frame offset, etc. for efficiency. Doubles: similar to longs, but nothing preallocated. Boolean: use symbol instead. The symbols are 't' and NIL. Symbol: implemented as a dictionary. The key in the dictionary is also stored in the symbol object, which also stores a global variable value pointer and a global function pointer. File: a standard FILE* pointer stored in a node. An exception is that standard input and standard output might be implemented using a text control in a graphical user interface, for example, so there must be a way to circumvent the restriction to FILE*. In serpent, all output goes through the Machine::put_string() method which checks to see if the FILE* is equal to (FILE *)(1). This is a special pointer value used to denote standard output. If so, the string is printed by calling (*Machine::put_stdout)(). The put_stdout() method is a pointer, so you can modify Machine to provide any type of standard output. Standard input works the same way. The initial value of put_stdout is a function that prints to machine->mach_stdout, and the default value of this file is C's stdout. To send stdout to another file, replace mach_stdout with a new file pointer. To send stdout to an object that's not a FILE *, replace put_stdout with a new function pointer. In general, put_stdout should call fputs unless the file pointer is a special value, e.g. 1, and mach_stdout should be (FILE *) 1. (That way, you can still redirect to another file by storing the file pointer into mach_stdout.) Most output from programs (e.g. the "print" statement) goes to the file stored in the symbol 'stdout'. If this value is not of type FILE, e.g. if the user sets it to nil, it is reset to the value of mach_stdout and an error message is written. Example: print "hello world" calls op_print calls stream_print calls put_string_stdout calls put_string on get_stdout() put_string calls fp->write(string, len) Thus fp depends on get_stdout() which returns file from s_stdout s_stdout is initialized to Stdio, a subclass of Vfile and Stdio's flush (the actual output) calls: *(machine->console_puts) if it is non-nil stdlog->puts(), stdlog->flush() if stdlog is non-nil machine->console_puts is initialized to wxs_print_string, which calls wxs_print_string_1 (if WXS_STDOUT_WINDOW) and fputs(s, stdout) stdout_print_string, which calls fputs(s, stdout) machine->mach_stdlog is initialized using log_enable(filename) Note: srplog should be eliminated and log_enable() should be called by including debug.srp unless dbg_nolog is set. CODE GENERATION: Consider the following: class stack: var stack var top def init(): top = 0 stack = []; def push(item): append(stack, item); When stack: is read, create a new class named stack with no super: Add stack to classes and create a new class object When var stack is read, create a new variable in stack: Add stack to the class's symbol table with an offset of 0 Similar for top When init is declared: Create a new method object. Make the signature 0. When top = is read: remember the variable. When 0 is read: write code to push 0 (small int) Write code to save value into top. When stack = is read: write code to push new array. Write code to save value into stack. Write return statement. (note that "id=" is treated differently than "=") When push is declared: Add push to methods dictionary Add item to locals at offset 0 Make number of parameters 1 When append is called: See if append is declared as a local or inherited method Since it isn't, assume that it is global: make a global call after pushing stack and item RETURN VALUES Every statement yields a value. An expression can be a statement. The value of the last expression is returned from a function if no RETURN statement is encountered. To implement this, NULL is pushed on the expression stack when a method is called. Every statement begins with a POP and then pushes its value on the stack. Procedures must end with a RETURN instruction so we don't "fall off the end". This gets more complicated with compound statements. Consider the following: def fn(): if expr: a return b and def fn(): if expr: a else: b and def fn(): if expr: a In the first case, b is returned, so there is no need to worry about the value of the if statement. In the second case, the complete statement lists inside the "then" and "else" parts generate the function return value regardless of the value of the boolean expression. In the third case, if expr is true, a is returned, but if false, then NULL must be pushed and returned. Let's figure out what the code should look like for the third case: [evaluate expr] jmpifnot L1 [evaluate a] L1:jump L2 null L2:return Alternatively: REPLNULL -- new opcode to replace top of stack with NULL [evaluate expr] jmpifnot L1 [evaluate a] L2:return To simplify code generation, we'll use the second approach. METHOD/PROCEDURE CALLS A problem is knowing what we're calling at compile time: A method could be local, but we just haven't gotten to the declaration yet -- this could be solved with forward declarations but do we really want these? Another case is virtual methods: they aren't declared except in subclasses that we haven't seen yet. Again, declarations are required to resolve this at compile time. Also, do we need/want to declare all globals routines before their use? What if there are cross-references between different compilation units? Probably a better approach is dynamic lookup at runtime. So then there are only two kinds of call: a method call where an object is evaluated, then parameters, then a method is named by indexing into the class's atom table. The object is searched for the named method. If none if found, an error is raised. In the other type of call, there is syntactically no object specified, so we either use this or we execute a global routine. Global routines are like methods, but their "this" pointer is NULL. After other params are evaluated, we search current object for method matching the name. If nothing is found, we search the global routines for a match. If nothing is found there either, we raise an error. BUILT-IN FUNCTIONS Consider "open()", a built-in global function returning an opened file object. There could be a method named "open", so we must call it as if it is a non-built-in function. A special value is stored as the method: a Node with a pointer to a function. BUILT-IN CLASSES AND METHODS Built-in classes have certain methods. It will be illegal to declare these methods without a matching number of parameters. This will allow the compiler to check the parameter count and pass the parameters on the stack to a function just by emiting a particular op-code. The op-code checks the object (first paramter) type. If it matches, the built-in method is performed. If not, the parameters on the stack are moved to a frame and the method is invoked in the normal fashion. FOR LOOPS Compiling for i = expr1 to expr2 by expr3: stmts declare i as local variable if not already declared expr1 store to local i expr2 ; leave target on stack expr3 ; leave increment on stack OP_NIL ; default value for loop jmp looptest looptop: pop ; get rid of value from previous iteration stmts ; leave last value on stack loopincr i ; this instruction increments loop var and puts in stack looptest: compare loopcount and branch to looptop ; stack is returnval, increment, finalcount, loopcount ; this instruction compares stack[-3] to stack[-2] ; if less or equal, branch to looptop breaktohere: fordone ; moves stacktop to stack[-3], then pop3 Compiling for i in a: stmts declare i as local variable if not already declared push a push 0 REPLNIL -- default value in case of no iterations jmp looptest looptop: store local i pop stmts increment looptest: compare tos to length of next on stack, push a[tos] and branch to looptop breaktohere: pop Compiling while expr: stmts loopstart: expr jmpifnot endloop smts jmp loopstart STREAMS A common data type is an array of strings. Since arrays can be extended, this is an easy way to build long strings, for example to implement repr(). The flatten() function converts this to a single string by concatenating all members. A print function exists to print one of these arrays of strings (for internal use only for now). "SHORT-CIRCUIT" CONDITIONALS The operators "and" and "or" evaluate only until a false or true operand is found. An example nested expression is (A and B) or C: [evaluate A] jmpifnot orbranch [evaluate B] jmpif truebranch orbranch: [evaluate C] jmpifnot falsebranch No elegant, simple algorithm is popping out at me, and I thought of a fairly simple way to deal with these expressions, having the property that expressions always leave a value on the stack, which means there is no special case for, say, a = b or c and d. The trick is to make special conditional branch instructions that support the conditional operators. For example, (A and B) or C: [evaluate A] cndand lab_1 [evaluate B] lab_1: cndor lab_2 [evaluate C] lab_2: In this scheme, "cndand" jumps to the label if the top of stack is false, without popping the stack. "cndor" jumps to the label if the top of stack is true, without popping the stack. This executes a few more instructions than necessary, but simplifies code generation. DEBUGGING Every method has an (optional) array of shorts giving the line number corresponding to each instruction. This doubles the code size, but is simple to implement. Imagine how great it will be when this becomes a real problem! Method names can also be retrieved at run-time, so it is easy to print where any run-time error occurs. Beyond that, we need a stack trace and it would be even better to examine the contents of stack frames. The initial step in this direction will be a simple debugger, implemented in Serpent itself. The mechanisms include the following: When a run-time error is raised, the error routine will look for a global procedure named runtime_exception. This procedure should take two arguments: a string describing the error and a pointer to the current stack frame. Machine::execute() returns when this procedure returns. Within the runtime_exeception procedure, the following built-in functions will be useful: frame_previous(frame) -- returns the previous frame frame_variables(frame) -- returns a dictionary containing all local variable bindings, including parameters frame_pc(frame) -- returns the pc for a frame frame_method(frame) -- returns symbol naming the frame's method frame_class(frame) -- returns symbol naming the frame's method's class runtime_exception_nesting() -- returns integer count of the recursive nesting level of this instance of runtime_exception (initially 1) A simple debugger has been defined with the following commands: -- print value of identifier - -- go down a stack frame + -- go back up a stack frame ? [] -- print a stack trace, optionally limited to number n levels = -- compile and evaluate expression . For now, note that expressions cannot be evaluated in the context of this frame, so we cannot compute expressions of variables in the frame. * -- print the values of variables in the current frame NESTED DEBUGGING The machine will keep a count of how many debugging levels have been entered. It will simply return after level MAX_DEBUG_LEVEL, e.g. 10. The nesting level can be retrieved with the builtin function runtime_exception_nesting(). Since there is no exception-handling facility (yet), there is no way to "catch" errors or resume at any particular point in the program. Instead, errors cause the stack to unwind all the way back to some place where errors are explicitly reset. This happens now in the compiler when the compiler is compiling an "interactive" file, i.e. it is taking input from a command line interface. The "interactive" flag is a parameter the caller can set. The error_flag is also reset by the OP_EVAL instruction, which returns true if execution completed without errors and false otherwise. Alternatively, in a windowing system like wxserpent, all code is run from WX Windows callbacks, and at this level, errors should be reset. Errors are also reset by the error routine if the debugger is invoked. After the debugger completes, a special instruction, OP_DBGEND, is executed to set the error flag, reset the running flag, and zero the runtime_exception_nesting variable. This will cause the C++ stack to unwind. Question: what happens to stack frames when there is an error. They are not explicitly freed. Are they GC'd? COMPILATION ERRORS The compiler compiles declarations and executes expressions from a file or from some "interactive" input, e.g. stdin (with stdout). If a compilation error occurs, the error is reported by report_error, and this sets compiler->error_flag. The compilation stops at this point and recursively called parsing functions return out to the top-level, compile_file(). If this is an interactive file, the error is cleared and compilation continues. Otherwise, the result (true for success, false for error) is passed back to op_load(). We want compilation to stop at all levels when a nested load fails. If a file loads 10 other files and the first one isn't found, the compilation process should stop and no more files should be loaded. To accomplish this, the compiler needs to look at machine->error_flag when the call returns and set compiler->error_flag accordingly. Similarly, op_load() must inspect the result of compile_file() (returned via compile_compile()) and set machine->error_flag accordingly. When an error is returned, perhaps recursively through nested compile_file() and op_load() calls, the returns stop at the instance of compile_file() where the interactive flag is set. At this point, the error flags are cleared and compilation resumes at the first expression that begins in column one. RUNTIME ERRORS AND FLAGS There are many flags: running -- tells when to stop executing instructions and return running is initially true. It is set to false when there is an error or when a call from C++ completes and it is time to return to the calling C++ method or function. error_flag -- tells that an error occurred. Since the error is a member variable and not a local variable, errors "stick" as the stack is unwound. Error propagation stops when the stack unwinds all the way to a compiler with an "interactive" level. compiler->error_flag -- similar to error_flag, but this keeps track of compiler errors only. It is generally a bug for the compiler to report two errors before exiting, and this flag keeps track of when compilation errors are reported. exit_flag -- causes machine to exit, even if you are in an interactive loop which clears error_flag and set running Under wxserpent, we want to turn off callbacks from WXWindows when the debugger is invoked. Otherwise, screen refresh code might get invoked over and over again, recursively calling the debugger. This is done by looking at the exception_nesting variable. TRACING Methods will have a new field: a flag indicating whether tracing is enabled. The field will be a 32-bit boolean (not an FVal), where true means trace. When called and the trace flag is set, the frame is passed as a parameter to method_trace(frame), a global function. BINARIES Goal should be to make loading a binary do the same thing as loading the source. That means there can be immediate statements in the binary, and we need to keep track of what's in the file. Things that can be in the file are: Methods and their code Classes Immediate statements To save a method, write: method name in ascii, followed by zero, then [to be continued -- but I think we won't worry about this until compiling programs is a limiting factor] CALLING C++ OR C CODE Functions must take fixed number of parameters (for now). Say you have a function "foo" with int and float args. Initialization: mach->create_builtin(FSymbol::create("foo", mach), 2, stub_foo); Glue function: void stub_foo(Machine_ptr m) { FFrame f = m->frame; FLong_ptr p1 = f->local(0).flong->type_check(m); FDouble_ptr p2 = f->local(1).fdouble->type_check(m); if (p1 && p2) { int64 p1c = p1->int64_data; double p2c = p2->double_data; int res = foo(p1c, p2c); m->push(FLong::create(res, m)); } } To register init functions, create object, pass it a parameterless function: Builtin_init(init_fn); THE GLOBAL METHOD Normally, calls are made from within a method, and the call instruction expects there to be a defined method to which the called function will return. This raises the question, "who calls the first method?" To answer this, we provide a function named "". The interpreter is initialized as if global is executing. ADDING EXTERNAL FUNCTIONS The interface.ser program writes C code to extend the interpreter. C comments are added as follows: int my_c_fn(int a, float b); /*SER long my_boa_fn[my_c_fn](long, float) PENT*/ Explanation: The is given inside /*SER ... PENT*/ comments. This is conveniently located near the C declaration in case there are changes to be made, but the BOA interface specification is completely independent of the C declaration, and it is up to the programmer to keep them consistent (some minimal checking will be made when the interface is compiled). The function name as it will appear in Serpent is given first. If this differs from the function name in C, the C name is given in square brackets. The types in the Serpent specification are a subset of C types, as follows: void -- for return values only, the Serpent function will return 'nil'. Type Formal Actual long long int64 short short int64 char char string; string must contain one character string char* string double double double float float double bool bool any; nil is false, else true anylong long int64 or double; value is coerced to long anydouble double int64 or double; value is coerced to double anyfloat float int64 or double; value is coerced to float anyshort short int64 or double; value is coerced to short SCANNING AND PARSING This is much more difficult than I first imagined. The basic idea I picked up from Python is to turn indentation into specific INDENT and DEDENT tokens. What causes the complexity are the following rules: (1) Lines with only comments are ignored. This means you cannot base determine indentationI on white space -- you have to check if the white space is followed by a comment. (2) In interactive mode, you cannot really ignore empty lines because then the user could never execute a command except by starting to enter a new one! In interactive mode, typing a completely blank line generates DEDENTS back to the left margin, completing any multi-line entry. To simplify things, I decided to build a two-stage scanner. The low-level stage mostly parses tokens but doesn't deal with INDENT/DEDENT and other details. The high-level stage, called get_token() deals with indentation, ungetting tokens, comments, etc. Here's a sketch of the algorithm: initially, indent_dedent_mode = false get_token(): if tokens were ungotten, return them without further scanning. while not indent_dedent_mode: // skip blank lines scan_token() // scan the next token candidate if token->typ != token_spaces: break indent = position scan_token() // lookahead if token->typ == token_comment: scan_token() // skip comment if token->typ != token_newline: unscan() // save token for later indent_dedent_mode = true else: // ignore blank or comments-only line if indent_dedent_mode: if indent < stack[top]: top-- return token_dedent elif indent > stack[top]: stack[++top] = position return token_indent indent_dedent_mode = false scan_token() if token->typ == token_comment: scan_token() // skip comment return token->typ This does not work for the following reasons: (1) in interactive mode, if there are no spaces, we should generate dedents. This is fixed in ALL CAPS in the next version: get_token(): if tokens were ungotten, return them without further scanning. while not indent_dedent_mode: // skip blank lines scan_token() // scan the next token candidate if token->typ != token_spaces: break indent = position scan_token() // lookahead if token->typ == token_comment: scan_token() // skip comment if token->typ != token_newline: unscan() // save token for later indent_dedent_mode = true ELSE IF INTERACTIVE_MODE AND POSITION == 1: UNSCAN() // SAVE NEWLINE FOR LATER INDENT_DEDENT_MODE = TRUE else: // ignore blank or comments-only line if indent_dedent_mode: if indent < stack[top]: top-- return token_dedent elif indent > stack[top]: stack[++top] = position return token_indent indent_dedent_mode = false scan_token() if token->typ == token_comment: scan_token() // skip comment return token->typ LIST COMPREHENSION COMPILATION List compilations have this syntax: [ expr for x at i in list if pred ] or [ expr for i = start to end by step ] where "at i", "if pred" and "by step" are optional. This is equivalent to: temp = array(0) for x at i in list: if pred: temp.append(expr) To avoid creating temp, we can keep the array on the stack. Then, to accomplish temp.append(), we'll make a new special opcode. For the 2nd form, the equivalent code is: temp = array(0) for i = start to end by step: if pred: temp.append(expr) We might as well relax the syntax a little and allow the following: [a, b, f(x) for x in list, g(y) for y = 0 to 10 if p(y), c, d] The problem with list comprehension compilation is shown by the following examples: [x] [x for x in a] Let's assume x is not declared at the point where these expressions are encountered. In the first expression, x is compiled as a global variable. In the second expression, x must be a local variable. x is implicitly declared just as it would be in a for loop. The problem is that we have to compile the expression "x" (which might in fact be a long, complex nested expression) without knowing whether this is a regular list (and x is global) or this is a list comprehension (and x is local). A general solution is to make 2 passes: form a parse tree, then come back and generate code. Serpent does not form parse trees, so this would be a major rewrite. The alternative is to save tokens starting with the beginning of "expr" but compile "expr" as if it is just a normal expression. If we find "for", we set the code location to the beginning of "expr" so that code generation will overwrite the code we just generated, we continue parsing the "for" and "if" parts. Then we compile "expr" from the saved token stream. Finally, we generate an instruction to append the computed "expr" value to the array, which is on the stack. All this has to be recursive. Consider: [x for x in [y for y in [z for z in ...]]] Or, perhaps worse, "expr" can be a list comprehension: [[i for i = 0 to n] for n = 5 to 10] which generates a list of lists of length 5 to 10. Here the compiler will recompile "i" in the first comprehension, then recompile the entire compreshension when the second "for" is reached. The "replay" data structure is a stack of structures with instruction addresses for where the code generation started and a pointer into a list of tokens (marked "T"): [ i for i = 0 to n ] for n = 5 to 10 ] ^ ^ +------+ | | | T--+--+ | +------+ | | T--+----+ +------+ | . | | . | | . | Let's rewrite the tokens before replay. The above would become [ i for ^ ^ +------+ | | | T--+--+ | +------+ | | T--+----+ +------+ | . | | . | | . | When the "for" is encountered, we back up to place at the top of the stack and save the expression tokens (in this case just "i"), marking the stack element with "E", and change the "for" to "lc-for" ("list-comprehension for") which we parse as a list comprehension, continuing to save tokens because the stack is not empty. [ lc-for i = 0 to n ] ^ +------+ | | T--+--+ +------+ | E--+--> i +------+ | . | | . | | . | When we reach the end of the lc-for-if expressions, insert the expression on the top of the stack into the tokens: [ lc-for i = 0 to n lc-expr i ] ^ +------+ | | T--+--+ +------+ | . | | . | | . | The next token is "for" so again we back up and replay, this time from the next top of stack: lc-for n = 5 to 10 ] ^ +------+ | | T--+--+ +------+ | E--+--> [ lc-for i = 0 to n lc-expr i ] +------+ | . | | . | | . | lc-for n = 5 to 10 lc-expr [ lc-for i = 0 to n lc-expr i ] ] ^ +------+ | | T--+--+ +------+ | . | | . | | . | Finally, this sequence will parse and the stack can be popped. NETWORKING If NETWORK is defined, then Serpent is compiled with functions that allow network communication. OUTPUT, STDERR, STDOUT, ETC. Some general goals: error messages from Serpent should go to stderr. Other general output should go to stdout. Both stdout and stderr should be initialized to correspond to Unix stdout and stderr, but stdout and stderr should be global variables that the application (in Serpent) can reassign to redirect output. Since wxserpent on Windows does not have console output, the default stdout and stderr should be duplicated to stdlog (if defined) and also with _CrtDbgReport() on Windows to get output into the debugger window. The "real" stdout and stderr will be represented by instances of Stdio which will flush to machine->console_out and machine->console_err, respectively, and also to _CrtDbgReport() on windows and to stdlog if the file exists. machine->console_out and machine->console_err output to stdout and stderr, but since they are function pointers, they can be replaced, e.g. to direct output to a window in wxserpent. ----- The interpreter can print debugging info depending on flags. There are two sources of console-like output: Serpent's stdout file, which is directed to the console and debug trace information. Both are handled in about the same way; differences are: 2) If dbg_trace_output_disable is set (in Serpent), then under serpent, stdout is not copied to CrtDbg, and under wxserpent, stdout is not copied to stdout Implementation: The initial stdout in Serpent is special and implemented by a Stdio. All other files in Serpent are implemented by Vfile and go directly to standard C FILEs using fputs, etc. So writes to Stdio mean "console output" and the actual write operation is performed by Stdio::flush as follows: 1) output goes to console. A) in serpent, this is a simple function that calls fputs B) in wxserpent, this is wxs_print_string, which echoes to stdout 2) if Windows and !dbg_trace_output_disable, output goes to CrtDbg console 3) if srplog, output is written there 4) if log_enable created a log file, output is written there Other output is sent from internal functions to Machine::trace, which does: 1) output goes to console: A) in serpent, this is a simple function that calls fputs B) in wxserpent, this is wxs_print_string, which echoes to stdout 2) if Windows, output goes to CrtDbg console 3) if srplog, output is written there Output from Machine::trace goes to: CrtDbg console (Visual C++ debugger output window) stdout if a GUI text window is available, copy output there too Summary of calls: Serpent print (op_print) calls Machine::stream_print calls put_string_stdout gets Vfile for stdout and calls put_string (which could be eliminated) calls Vfile::write calls either Cfile::putc which uses fputc OR Vfile::putc calls Vfile::flush calls (*Vfile::put_stdout)(buff, machine) calls wxs_print_string calls wxs_print_string_1 which handles output to wxWindow Machine::trace (used like printf) calls CrtDbgReport AND fputs(msg, stdout) AND fputs(msg, stdlog) THREADS (see sthread.h for many details.) Currently running thread is machine->thread. Threads have links forward and backward (next_runnable, prev_runnable). Threads also have (next_sibling, prev_sibling) links to maintain a circular list (set) of children, a children link to one of the children, and parent link from each child thread to its parent. machine->thread ----------+ | v +-- thread <===> thread <===> thread <===> thread --+ | ^ ^ | | | | | +------+-------------------------------------+ | | | +--------------------------------------------+ UNICODE In 2021, Serpent began to adopt Unicode. Internal representation for strings is similar: strings are either short strings with a length count (byte) a zero terminator, and up to 4 bytes, stored in a single 64-bit SVal, *or*, regular strings are in the heap stored with: 64-bit header with pointer, tag, color, and slots fields 32-bit slot count (allowing this to be a long node) 32-bit byte count (exact string length) 32-bit char_len (number of UTF-8 characters in string) 32-bit char_index (cached character index) 32-bit byte_index (cached byte offset for char_index) (Overhead per string is 28 bytes. If we did UTF-32, we'd eliminate the need for 12 bytes of overhead and indexing would be linear, but the space requirement is greater for all ASCII strings since our UTF-8 scheme can put 4-char ASCII strings into a single 64-bit SVal.) Strings are always stored as valid Unicode UTF-8 codes, normalized using NFC. This means that: - strings can no longer contain arbitrary binary. E.g. we need a new way to read binary file data: Probably read them into arrays of ints, with 6-bytes per integer - all incoming strings must be checked and normalized - concatenated strings must be re-normalized - string comparisons of "logically" equivalent strings may not support human-oriented searching, e.g. the ligature "fi" will not match the 2-character pattern "fi". This requires "compatibility" conversions such as normalization form NFKC. Serpent has a nfkc() function to normalize NFC to NFKC. SERPENT SOURCE CODE AND UNICODE We can allow unicode characters (UTF-8) in Serpent literal strings: - existing code probably works except escape character "\" applied to a unicode character should yield the entire unicode character. - need to apply NFC normalization to content of literal strings Serpent identifiers should be restricted to prevent visual aliases where two symbols are different but visually indistinguishable. - identifiers are formed by NFKC normalization - symbols are also formed by NFKC normalization - PI can be represented by the greek symbol! UNICODE SUPPORT FUNCTIONS We need functions for: - find next character - find previous character - check range of bytes for valid Unicode string - apply compatibility conversion - normalize Unicode string to NFC, options are: o libunistring -- good, but uses malloc and free not clear what to do with allocated results source code is large, but probably could substitute Serpent memory allocation for malloc/free o ICU -- seems to use UTF-16 o utf8proc -- see utf8proc_NFC() + smaller, simpler, cleaner, but still uses malloc/free + see libmojibake, might be newer/better