No doubt many of the happy plans of the previous sections will remain `to do' even after the thesis is complete. The runtime/backend system contains many opportunities for improvement and extension. Alternatively, techniques developed with nitrous could be grafted onto an existing system. This section lays out directions for future work.
First, the system currently produces only intermediate code. I have a jury-rigged root-->scheme compiler working in the scheme48 system, which compiles scheme to interpreted bytecode. To get any kind of real performance measurements, a custom root-->machine-code backend is required. This primarily involves register allocation, instruction scheduling, jump optimization, and dead-code elimination.
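As a concrete illustration of one of these passes, here is a minimal sketch of dead-code elimination over a straight-line block. The instruction format ((dest op arg ...) ...) and the procedure name are invented for illustration; the real root representation differs.

    ;; Walk the block backwards, keeping an instruction only if its
    ;; destination is in the live set, and adding its arguments to the
    ;; live set when it is kept.
    (define (dead-code-eliminate block live)
      (let loop ((insns (reverse block)) (live live) (kept '()))
        (if (null? insns)
            kept
            (let* ((insn (car insns))
                   (dest (car insn))
                   (args (cddr insn)))
              (if (memq dest live)
                  (loop (cdr insns) (append args live) (cons insn kept))
                  (loop (cdr insns) live kept))))))

    ;; Example: t1 is never used, so its definition is removed.
    ;; (dead-code-eliminate '((t0 load x) (t1 add t0 1) (t2 mul t0 t0)) '(t2))
    ;;   => ((t0 load x) (t2 mul t0 t0))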
After a simple backend is working, more powerful global optimizations can be incorporated, using feedback from execution; ideas could be taken from Self [self] and code coagulation [ref]. Side effects can be isolated by using three levels of purity: pure (inline freely), infrequently changed (propagate, but keep backpointers for use when the value is mutated), and volatile (do not inline/propagate). This ties into the code development system, and into how a programmer changes the definition of a value/procedure on the fly.
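To make these three levels concrete, here is a minimal sketch in Scheme; the cell representation, field layout, and procedure names are all invented for illustration and are not part of nitrous.

    ;; A cell records its value, its purity level, and -- for the
    ;; infrequently-changed level -- the specialized code objects that
    ;; have inlined its value, so they can be invalidated on mutation.
    (define (make-cell value purity) (vector value purity '()))
    (define (cell-value c)  (vector-ref c 0))
    (define (cell-purity c) (vector-ref c 1))
    (define (cell-users c)  (vector-ref c 2))

    ;; Asks whether a cell's value may be propagated into residual code
    ;; on behalf of the specialized code object `user'.  Returns the
    ;; value to inline, or #f if the caller must residualize a load.
    (define (cell-propagate c user)
      (case (cell-purity c)
        ((pure)       (cell-value c))
        ((infrequent) (vector-set! c 2 (cons user (cell-users c)))
                      (cell-value c))
        ((volatile)   #f)))

    ;; Mutating an infrequently-changed cell invalidates every code
    ;; object that inlined its old value.
    (define (cell-set! c new-value invalidate!)
      (vector-set! c 0 new-value)
      (for-each invalidate! (cell-users c))
      (vector-set! c 2 '()))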
Current CPU architectures have a fixed maximum instruction-level parallelism because they are implemented with fixed CMOS gates; PE sometimes produces extremely wide code, wider than any imaginable general-purpose CPU. Field Programmable Gate Arrays (FPGAs) are another new chip technology: their gates can be reconfigured `at run time', which makes them an interesting target for a future backend [IseSa93].
Cogen itself will retain many deficiencies even after this thesis is complete. The following are possible courses of action:
The proposed instruction-count measurements could be improved by profiling the abstract code with progressively more accurate models of the memory hierarchy and CPU resources. One way to implement such a model is with a root-language self-interpreter that records execution statistics (a toy sketch of such an instrumenting interpreter appears after this list). For example, by modeling the instruction cache one could answer the question: does creating lots of specialized code blow the instruction cache, or is that offset by eliminating conditionals? Complex, statistics-collecting memory-model simulators are themselves well served by RTCG: it may be possible to generate the simulators by writing root self-interpreters that collect statistics, then using them to compile the programs you want to benchmark. Similar techniques may provide safe languages for kernel environments. Hardware resource sharing of the clock (non-preemptive threads) and memory (GCed heaps) also makes an interesting target for code transformation.
A fecund list of experiments: OOP (prototypes, first-class environments), an interpreter kit (monads, monadoids), fnord, a diffey-q package (converting to finite differences), interactive DSP programming, a network-distributed paint program, a visual-musical instrument, artificial life, genetic art, a structure browser/editor/outliner, janus, a lisp OS.
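Returning to the instrumentation idea in the first item above, here is a minimal Scheme sketch of an interpreter that charges one abstract instruction per operation and models a direct-mapped instruction cache. The toy expression language, the cache geometry, and all the names are invented for illustration; the real experiment would instrument a root self-interpreter instead.

    (define cache-lines 64)                   ; assumed i-cache geometry
    (define icache (make-vector cache-lines #f))
    (define stats (list (cons 'insns 0) (cons 'hits 0) (cons 'misses 0)))

    (define (bump! key)
      (let ((entry (assq key stats)))
        (set-cdr! entry (+ 1 (cdr entry)))))

    ;; Record a fetch of the (made-up) static code address `addr' in a
    ;; direct-mapped cache of one-instruction lines.
    (define (touch-icache! addr)
      (let ((line (modulo addr cache-lines)))
        (if (eqv? (vector-ref icache line) addr)
            (bump! 'hits)
            (begin (bump! 'misses)
                   (vector-set! icache line addr)))))

    ;; Expressions are numbers, variables, or (op addr e1 ...), where
    ;; `addr' is the operation's static code address, used only by the
    ;; cache model.
    (define (interp exp env)
      (cond ((number? exp) exp)
            ((symbol? exp) (cdr (assq exp env)))
            (else
             (let ((op   (car exp))
                   (addr (cadr exp))
                   (args (map (lambda (e) (interp e env)) (cddr exp))))
               (bump! 'insns)
               (touch-icache! addr)
               (apply (case op ((add) +) ((sub) -) ((mul) *)) args)))))

    ;; Example: evaluate x*x + 1 with x = 3, then inspect `stats'.
    ;; (interp '(add 0 (mul 1 x x) 1) '((x . 3)))  =>  10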