No doubt many of the happy plans of the previous sections will remain `to do' even after the thesis is complete. The runtime/backend system contains many opportunities for improvement and extension. Alternatively, techniques developed with nitrous could be grafted onto an existing system. This section lays out directions for future work.
First, the system currently produces only intermediate code. I have a jury-rigged root-->scheme compiler working in the scheme48 system, which compiles scheme to interpreted bytecode. To get any kind of real performance measurements, a custom root-->machine-code backend is required. This primarily involves register allocation, instruction scheduling, jump optimization, and dead-code elimination.
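As a concrete illustration of one of these passes, here is a minimal sketch of dead-code elimination over a straight-line block. The instruction format ((dest op arg ...) ...) and the procedure name are invented for illustration; the real root representation differs.

    ;; Walk the block backwards, keeping an instruction only if its
    ;; destination is in the live set, and adding its arguments to the
    ;; live set when it is kept.
    (define (dead-code-eliminate block live)
      (let loop ((insns (reverse block)) (live live) (kept '()))
        (if (null? insns)
            kept
            (let* ((insn (car insns))
                   (dest (car insn))
                   (args (cddr insn)))
              (if (memq dest live)
                  (loop (cdr insns) (append args live) (cons insn kept))
                  (loop (cdr insns) live kept))))))

    ;; Example: t1 is never used, so its definition is removed.
    ;; (dead-code-eliminate '((t0 load x) (t1 add t0 1) (t2 mul t0 t0)) '(t2))
    ;;   => ((t0 load x) (t2 mul t0 t0))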
After a simple backend is working, more powerful global optimizations can be incorporated, using feedback from execution; ideas could be taken from Self [self] and code coagulation [ref]. Side effects can be isolated by using three levels of purity: pure (inline freely), infrequently changed (propagate, but keep backpointers for use when the value is mutated), and volatile (do not inline/propagate). This ties into the code development system, and into how a programmer changes the definition of a value/procedure on the fly.
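To make these three levels concrete, here is a minimal sketch in Scheme; the cell representation, field layout, and procedure names are all invented for illustration and are not part of nitrous.

    ;; A cell records its value, its purity level, and -- for the
    ;; infrequently-changed level -- the specialized code objects that
    ;; have inlined its value, so they can be invalidated on mutation.
    (define (make-cell value purity) (vector value purity '()))
    (define (cell-value c)  (vector-ref c 0))
    (define (cell-purity c) (vector-ref c 1))
    (define (cell-users c)  (vector-ref c 2))

    ;; Asks whether a cell's value may be propagated into residual code
    ;; on behalf of the specialized code object `user'.  Returns the
    ;; value to inline, or #f if the caller must residualize a load.
    (define (cell-propagate c user)
      (case (cell-purity c)
        ((pure)       (cell-value c))
        ((infrequent) (vector-set! c 2 (cons user (cell-users c)))
                      (cell-value c))
        ((volatile)   #f)))

    ;; Mutating an infrequently-changed cell invalidates every code
    ;; object that inlined its old value.
    (define (cell-set! c new-value invalidate!)
      (vector-set! c 0 new-value)
      (for-each invalidate! (cell-users c))
      (vector-set! c 2 '()))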
Current CPU architectures have a fixed maximum instruction-level parallelism because they are implemented with fixed CMOS gates; PE sometimes produces extremely wide code, wider than any imaginable general-purpose CPU. Field Programmable Gate Arrays (FPGAs) are another new chip technology: their gates can be reconfigured `at run time', which makes them an interesting target for a future backend [IseSa93].
Cogen itself will retain many deficiencies even after this thesis is complete. The following are possible courses of action:
The proposed instruction-count measurements could be improved by profiling the abstract code with progressively more accurate models of the memory hierarchy and CPU resources. One way to implement such a model is with a root-language self-interpreter that records execution statistics (a toy sketch of such an instrumenting interpreter appears after this list). For example, by modeling the instruction cache one could answer the question: does creating lots of specialized code blow the instruction cache, or is that offset by eliminating conditionals? Complex, statistics-collecting memory-model simulators are themselves well served by RTCG: it may be possible to generate the simulators by writing root self-interpreters that collect statistics, then using them to compile the programs you want to benchmark. Similar techniques may provide safe languages for kernel environments. Hardware resource sharing of the clock (non-preemptive threads) and memory (GCed heaps) also makes an interesting target for code transformation.
A fecund list of experiments: OOP (prototypes, first-class environments), an interpreter kit (monads, monadoids), fnord, a diffey-q package (converting to finite differences), interactive DSP programming, a network-distributed paint program, a visual-musical instrument, artificial life, genetic art, a structure browser/editor/outliner, janus, a lisp OS.
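Returning to the instrumentation idea in the first item above, here is a minimal Scheme sketch of an interpreter that charges one abstract instruction per operation and models a direct-mapped instruction cache. The toy expression language, the cache geometry, and all the names are invented for illustration; the real experiment would instrument a root self-interpreter instead.

    (define cache-lines 64)                   ; assumed i-cache geometry
    (define icache (make-vector cache-lines #f))
    (define stats (list (cons 'insns 0) (cons 'hits 0) (cons 'misses 0)))

    (define (bump! key)
      (let ((entry (assq key stats)))
        (set-cdr! entry (+ 1 (cdr entry)))))

    ;; Record a fetch of the (made-up) static code address `addr' in a
    ;; direct-mapped cache of one-instruction lines.
    (define (touch-icache! addr)
      (let ((line (modulo addr cache-lines)))
        (if (eqv? (vector-ref icache line) addr)
            (bump! 'hits)
            (begin (bump! 'misses)
                   (vector-set! icache line addr)))))

    ;; Expressions are numbers, variables, or (op addr e1 ...), where
    ;; `addr' is the operation's static code address, used only by the
    ;; cache model.
    (define (interp exp env)
      (cond ((number? exp) exp)
            ((symbol? exp) (cdr (assq exp env)))
            (else
             (let ((op   (car exp))
                   (addr (cadr exp))
                   (args (map (lambda (e) (interp e env)) (cddr exp))))
               (bump! 'insns)
               (touch-icache! addr)
               (apply (case op ((add) +) ((sub) -) ((mul) *)) args)))))

    ;; Example: evaluate x*x + 1 with x = 3, then inspect `stats'.
    ;; (interp '(add 0 (mul 1 x x) 1) '((x . 3)))  =>  10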