THE SPECIALIZER UNMIX Sergei A.Romanenko Keldysh Institute for Applied Mathematics Russian Academy of Sciences Miusskaya Sq.4, SU-125047, Moscow, Russia July, 1990 Revised: December 1992 COPYRIGHT NOTICE ================ Permission to use, copy, modify, and distribute this software and its documentation for any personal or educational use without fee is hereby granted, provided that: * This copyright notice is retained in both source code and supporting documentation. * Modified versions of this software cannot be redistributed unless accompanied by a complete history (date, author, description) of modifications made; the intension here is to give appropriate credit to those involved, whilst simultaneously ensuring that any recipient can determine the origin of the software. * These same conditions will also be applied to any software system derived either in full or in part from this system. The name `Unmix` is not a trademark, registered or otherwise, and you are free to mention this name in published material, public and private correspondence, and other documents without restriction or obligation. `Unmix` is provided `as is` without express or implied warranty. WHAT IS UNMIX? ============== If a program has several input parameters, it can be "specialized" with respect to the values of some of the parameters, in which case we get a "residual" program with less input parameters than the original program. The parameters whose values are known at the time the program is being specialized are referred to as "static" parameters, all other parameters are referred to as "dynamic" ones. Unmix is a system that specializes programs written in a subset of Scheme, and consists of two phases: "preprocessor" and "generator of residual programs". The specialization is done in two steps. At the first step, preprocessor takes a Scheme program and a descriptor of the program's input parameters. The program is a non-empty sequence of function definitions. The first function definition in the program is assumed to be the "main" function of the program. The descriptor is a string of letters "s" and "d", the length of the descriptor being equal to the number of the main function's parameters. The descriptor of the program's parameters classifies each parameter of the main function as either static or dynamic. If a parameter is static, the corresponding letter in the descriptor is "s", otherwise the letter is "d". Preprocessor takes as input a program and a descriptor and produces an "annotated" version of the original program, which contains some directions for the generator of residual programs. At the second step, the generator of residual programs takes as input an annotated program and a sequence of file names (separated by one or more spaces). The number of file names must be equal to the number of the program's static input parameters. Each of the files must contain zero, one, or more Scheme S-expressions to be used as values of the corresponding static parameters. THE STRUCTURE OF THE PREPROCESSOR ================================= The preprocessing consists of several stages. First, the original program is "desugared", i.e. compiled from Scheme to Mixwell, which is the internal language of the specializer. Second, all variables and operations in the program are classified as either static or dynamic and this information is inserted into the program. The result is an annotated version of the original program. Annotated Mixwell programs will be regarded as programs written in Mixwell-Ann, a version of Mixwell supplemented with additional constructs to express annotations. Second, some of the defined function calls in the annotated program are annotated as "residual" in order to avoid infinite unfolding of function calls and duplication of function calls that may result from some calls being unfolded. THE STRUCTURE OF THE GENERATOR ============================== The generator consists of "partial evaluator", which generates residual program by symbolically evaluating Mixwell expressions, and "postprocessor", which performs some additional transformations of the residual program and then translates the residual Mixwell program to Scheme. THE STRUCTURE OF THE POSTPROCESSOR ================================== The postprocessing comprises the following stages. 1) the first Call Graph Reduction, 2) Arity Raising, 3) the second Call Graph Reduction, 4) Compilation from Mixwell into Scheme. More information may be found in the source programs of Unmix. HOW TO RUN UNMIX? ================= The directory containing programs to be specialized must also contain the file SCHEME.INI having the following contents: (define **unmix-path** SSSS) (define **external-editor-name** EEEE) (load (string-append **unmix-path** "xunmix.fsl")) (unmix) where SSSS is the full path of the directory containing Unmix, and EEEE the full name (including the path) of a text editor to be called from the main menu of Unmix. For example, if Unmix resides in the directory E:\UNMIX, and the external editor NE.EXE resides in the directory C:\NC, the file SCHEME.INI must contain the following lines: (define **unmix-path** "E:\\UNMIX\\") (define **external-editor-name** "C:\\NC\\NE.EXE") (load (string-append **unmix-path** "xunmix.fsl")) (unmix) If an external editor is not supposed to be used, the external editor name needn't be specified. To call Unmix, make sure that the directory containing programs to be specialized is the current one. Then enter one of the command: PCS PCSEXP PCSEXT The command PCS runs the version of the TI Scheme that makes use of neither expanded nor extended memory. The commands PCSEXP and PCSEXT run versions of the TI Scheme capable of making use of the expanded and extended memory, respectively. As a result, Unmix starts and displays a menu on the screen, which provides further information. THE INPUT LANGUAGE OF UNMIX =========================== Here is the syntax of the subset of Scheme accepted by the specializer Unmix: ::= * ;; Program ::= (define ( *) ) ;; Procedure ;; definition ::= ;; Variable | (quote ) ;; Constant | ;; Literal constant | (if ) ;; Conditional | (let (*) ) ;; Let-expression | (rcall ( *)) ;; Residual call | (generalize ) ;; Generalizer | ( *) ;; Procedure call | ( *) ;; Macro ::= ( ) ;; Local binding ::= ;; Procedure name ::= ;; Variable name ::= ;; Macro name ::= | | | | All procedures called in the program must be without side-effects. For this reason, the terms "procedure" and "function" will be used in the description of Unmix interchangeably. Constructs (generalize ) and (rcall ( *)) are used to insert into the program hand-made annotations, which permit the user to control the specializer. They are useless for ordinary programming. Construct (generalize ) tells the specializer that the result of specializing must be dynamic, even if is static. When the program is compiled in the usual way, this construct is equivalent to . Construct (rcall ( *)) tells the specializer that the result of specializing the procedure call ( *) must be a residual call. When the program is compiled in the usual way, this construct is equivalent to ( *). Some useful macro definitions may be found in the file "x-match.s". File "x-synt.s" contains a definition of "extend-syntax", a powerful tool for defining macro extensions. THE LANGUAGE MIXWELL ==================== Mixwell is the internal language of the specializer Unmix. Here is its syntax: ::= * ::= ( (*) = ) ::= ;; Variable | (quote ) ;; Constant | (if ) ;; Conditional | (call *) ;; Defined function call | (rcall *) ;; Defined function call | (xcall *) ;; External function call | ( *) ;; External function call ::= ;; Procedure name ::= ;; Variable name The construct (call *) is a call on the procedure defined in the program, which will be unfolded during partial evaluation. The construct (rcall *) is a call on the procedure defined in the program, which will give rise to a residual call during partial evaluation. The construct (xcall *) is a call on the procedure defined somewhere outside the program. If is different from the symbols STATIC, IFS, IFD, RCALL, CALL, and XCALL, the keyword XCALL is omitted and the construct takes the form ( *). THE LANGUAGE MIXWELL-ANN ======================== ::= ;; Program ::= (*) ;; Residual procedure ;; names ::= ;; Dynamic program ( *) ::= (*) ;; Static program ::= ( = ) ;; Annotated procedure ;; definition ::= (*) ;; Parameter list ::= ;; Variable | (static ) ;; Static subexpression | (ifs ) ;; Static conditional | (ifd ) ;; Dynamic conditional | (call (*) (*)) ;; Unfoldable defined ;; function call | (rcall (*) (*)) ;; Residual defined ;; function call | (xcall *) ;; External function call | ( *) ;; External function call EXAMPLES ======== In addition to the files that contain the specializer, there are several files containing a number of example programs to be specialized. Here we list the programs with some suggestions about the way in which they can be specialized. ------------------------------------------------------------------ | Program | Source | Parameter | Static data | | | file | description | files | ------------------------------------------------------------------ | | | | | | Zipper | ZZIP.S | SD | Z123.DAT | | Maximum substring | ZMAXCS.S | SD | Z123.DAT | | MP Interpreter | ZMP.S | SD | ZMPREV.DAT | | TM Interpreter | ZTM.S | SDDD | ZTMTST.DAT | | Parser | ZPRS.S | SD | ZPRSEXP.DAT | | | | | | ------------------------------------------------------------------ REFERENCES ========== [Barzdin 88] G.Barzdin. Mixed Computation and Compiler Basis. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 15-26, North-Holland, 1988. [Beckman 76] L.Beckman, A.Haraldson, O.Oskarsson, E.Sandewall. A Partial Evaluator, and Its Use as a Programming Tool. Artificial Intelligence, 7(4):319-357, 1976. [Bulyonkov 84] M.A.Bulyonkov. Polyvariant Mixed Computation for Analyzer Programs. Acta Informatica, 21:473-484, 1984. [Burstall 77] R.M.Burstall and J.Darlington. A Transformation System for Developing Recursive Programs. Journal of the ACM, 24(1):44-67, 1977. [Dixon 71] J.Dixon. The Specializer, a Method of Automatically Writing Computer Programs. Division of Computer Research and Technology, National Institute of Health, Bethenda, Maryland, 1971. [Ershov 78] On the Essence of Compilation. In E.J.Neuhold, editor, Formal Description of Programming Concepts, pages 391-420, North-Holland, 1978. [Ershov 81] A.P.Ershov. The Transformational Machine: Theme and Variations. In J.Grushka and M.Chytil, editors, Mathematical Foundations of Computer Science, Strbske Pleso, Czechoslovakia, pages 16-32, Lecture Notes in Computer Science, Vol.118, Springer-Verlag, 1981. [Futamura 71] Partial Evaluation of Computation Process - An Approach to a Compiler-Compiler. Systems, Computers, Controls, 2(5):45-50, 1971. [Hughes 88] J.Hughes. Backward Analysis of Functional Programs. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 187-208, North-Holland, 1988. [Jones 85] N.D.Jones, P.Sestoft and H.Sondergaard. An Experiment in Partial Evaluation: The Generation of a Compiler Generator. In J.-P.Jouannaud, editor, Rewriting Techniques and Applications, Dijon, France, pages 124-140, Lecture Notes in Computer Science, Vol.202, Springer-Verlag, 1985. [Jones 86] N.D.Jones and A.Mycroft. Data Flow Analysis of Applicative Programs Using Minimal Function Graphs. In Thirteens ACM Symposium on Principles of Programming Languages, St.Petersburg, Florida, pages 296-306, ACM, 1986. [Jones 88] Automatic Program Specialization: A Re-Examination from Basic Principles. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 225-282, North-Holland, 1988. [Mogensen 88] T.Mogensen. Partially Static Structures in a Self-Applicable Partial Evaluator. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 325-347, North-Holland, 1988. [Ostrovski 88] B.N.Ostrowski. Implementation of Controlled Mixed Computation in System for Automatic Development of Language-Oriented Parsers. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 385-403, North-Holland, 1988. [Romanenko 88] S.A.Romanenko. A Compiler Generator Produced by a Self-Applicable Specializer Can Have a Surprisingly Natural and Understandable Structure. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 445-463, North-Holland, 1988. [Romanenko 90] S.A.Romanenko. Arity Raiser and Its Use in Program Specialization. In N.Jones, editor, ESOP '90, 3rd European Symposium on Programming, Copenhagen, Denmark, May 15-18, 1990, pages 341-360, Lecture Notes in Computer Science, Vol. 432, Springer-Verlag, 1990. [Sestoft 86] The Structure of a Self-Applicable Partial Evaluator. In H.Ganzinger and N.D.Jones, editors, Programs as Data Objects, Copenhagen, Denmark, 1985, pages 236-256, Lecture Notes in Computer Science, Vol. 217, Springer-Verlag, 1986. [Schmidt 86] D.A.Schmidt. Denotational Semantics. Allyn and Bacon, Boston, 1986. [Sestoft 88] P.Sestoft. Automatic Call Unfolding in a Partial Evaluator. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 485-506, North-Holland, 1988. [Turchin 72] V.F.Turchin. Equivalent Transformation of Recursive Functions Defined in Refal. In Teoriya Yazykov i Metody Programmirovaniya. Trudy Simposiuma, pages 31-42, Alushta-Kiev, 1972 (in Russian). [Turchin 79] V.F.Turchin. A Supercompiler System Based on the Language Refal. SIGPLAN Notices, 14(2):46-54, February 1979. [Turchin 82] V.F.Turchin, R.M.Nirenberg and D.V.Turchin. Experiments with a Supercompiler. In 1982 ACM Symposium on Lisp and Functional Programming, Pittsburgh, Pennsylvania, pages 47-55, ACM, 1982. [Turchin 86] V.F.Turchin. The Concept of a Supercompiler. ACM Transactions on Programming Languages and Systems, 8(3):292-325, July 1986. [Turchin 88] V.F.Turchin. The Algorithm of Generalization in the Supercompiler. In D.Bjorner, A.P.Ershov and N.D.Jones, editors, Partial Evaluation and Mixed Computation, pages 531-549, North-Holland, 1988. [Wadler 88] P.Wadler. Deforestation: Transforming Programs to Eliminate Trees. In European Symposium on Programming, Lecture Notes in Computer Science, Springer-Verlag, 1988. APPENDIX. SOME MACROS USED IN UNMIX =================================== Unmix, as well as the example programs, has been written in Scheme extended with the following macros. GENERALIZED CASE-EXPRESSION (MATCH (arg ...) (pat ... & guard => exp ...) ... ) The expressions "arg ..." are evaluated to produce S-expressions "S-exp ...". "S-exp ..." are then matched against the corresponding patterns "pat ...". If the matching succeeds for some clause (pat ... & guard => exp ...) the variables in "pat ..." get bound to the corresponding subexpressions in "S-exp ...", and then the expression "guard" is evaluated in the extended environment. If the result of "guard" is not #!FALSE, the expressions "exp ..." are evaluated in the extended environment, otherwise the next clause is tried. If the guard is #!TRUE, "& guard" may be omitted. The patterns have the following syntax: ::= ' matches . | matches . | matches anything, is bound. | _ matches anything. | ( as ) matches , is bound. | ( . ) matches a pair with 's as elements. ::= ::= | () | | | | | GENERALIZED LET-EXPRESSION (WITH ((pat arg) ...) exp ... ) The expressions "arg ..." are evaluated to produce S-expressions "S-exp ...". "S-exp ..." are supposed to match the patterns "pat ...", in which case the variables in "pat ..." get bound to the corresponding subexpressions in "S-exp ...", and then the expressions "exp ..." are evaluated in the extended environment. If some of "S-exp ..." do not match against patterns "pat ...", the result of the form WITH is unspecified, because there is no actual analysis of the structure of "S-exp ...". The syntax of patterns is exactly the same as in the case of the form MATCH. The form (WITH* ((pat1 arg1) . (pat arg) ...) exp ...) is equivalent to (WITH ((pat1 arg1)) (WITH* ((pat arg) ...) exp ...) RESTRICTED GENERALIZED CASE-EXPRESSION (SELECT (arg ...) (rpat ... & guard => exp ...) ... ) The expressions "arg ..." are evaluated to produce S-expressions "S-exp ...". "S-exp ..." are then matched against the corresponding restricted patterns "rpat ...". If the matching succeeds for some clause (rpat ... & guard => exp ...) the variables in "pat ..." get bound to the corresponding subexpressions in "S-exp ...", and then the expression "guard" is evaluated in the extended environment. If the result of "guard" is not #!FALSE, the expressions "exp ..." are evaluated in the extended environment, otherwise the next clause is tried. If the guard is #!TRUE, "& guard" may be omitted. Restricted patterns is a subclass of patterns. ::= ' | | | _ | (' . ) | ( . ) | (_ . ) | ( . ) Restricted patterns have the following meaning. Restricted patterns ', , , and _ have the same meaning as the ordinary patterns. The result of matching an S-expression against other restricted patterns may be unspecified. If an S-expression is not a pair, the result of matching against the patterns (' . ), ( . ), (_ . ), and ( . ) is unspecified. If an S-expression is a pair ( . ), where the pattern does not match , the result of matching ( . ) against the patterns (' . ), ( . ), (_ . ), and ( . ) is unspecified. Now suppose that an S-expression is a pair ( . ), where the pattern matches . Then we have the following cases. The patterns (_ . ), and ( . ) always match ( . ). The pattern ( . ) matches ( . ) if is equal to , otherwise the matching fails. The pattern ( . ) matches ( . ) if is equal to , otherwise the matching fails. The above definition enables the restricted patterns to be compiled into efficient code. RCALL (RCALL (fname arg ...)) This construct is used for telling the specializer that the function call (fname arg ...) is a residual one. In all other respects this construct is equivalent to (fname arg ...). GENERALIZE (GENERALIZE exp) This construct is used for telling the specializer that the result of specializing (GENERALIZE exp) must be dynamic even if exp is static. In all other respects this construct is equivalent to exp.