This is a guide to editing and executing Standard ML (SML) programs at Carnegie Mellon University, using Harlequin Incorporated's MLWorks system. This document was written by Peter Lee (petel@cs.cmu.edu), with extensive contributions by Robert Harper (rwh@cs.cmu.edu), Iliano Cervesato (iliano@cs.cmu.edu), Carsten Shurmann (carsten@cs.cmu.edu), Frank Pfenning (fp@cs.cmu.edu), and Herb Derby (derby@cs.cmu.edu).
This is not a reference manual for the Standard ML language. If you need a reference manual or a tutorial, several sources of information, both on-line and in hard copy, are listed in the Introduction.
When you start the MLWorks system, an initial window called the console window will appear (assuming you are running X-windows). The displayed message should be
MLWorks 1.0 Copyright (C) 1996 The Harlequin Group Limited. All rights reserved. MLWorks is a trademark of The Harlequin Group Limited.
The window also offers a number of pull-down menus which are described in the User's Guide, available on-line at
http://www.cs.cmu.edu/afs/andrew/scs/cs/mlworks/doc/guide/html/index.htm
The ML system is launched from the menu Tools>Listener. A listener window will appear, prompting you for an SML expression. Whatever you type at the prompt will then be compiled and executed, and the result displayed.
When prompted, you can type in a top-level declaration.
There are several kinds of top-level declarations in SML. For example, the following is a declaration of a function called inc that increments its integer argument. (In these examples, "MLWorks>" is the MLWorks prompt, and the text in teletype font is the user input. In some browsers, user input will also appear in blue text. The italic font is used for the output from the MLWorks system. The symbol represents a carriage return.)
MLWorks>fun inc x = x + 1;
val inc = fn : int -> int
The text "fun inc x = x + 1" is the declaration for the inc function. The semicolon (";") is a marker that indicates to the MLWorks system that it should perform the following actions: elaborate (that is, perform typechecking and other static analyses), compile (to obtain executable machine code), execute, and finally print the result of this declaration. After all of this, the system then prompts for new input and the whole process starts again. This is the so-called "top-level loop". To exit from the MLWorks listener, select File>Close or simply type an end-of-file character (Control-d) at the prompt.
In the example above, the printed result shows that inc is a function that takes an integer argument and yields an integer result. Actually, it is important for you to know that, in SML, functions are "first-class" values, fundamentally no different from other values such as integers. So, to be more precise, it is better to say that the identifier inc has been bound to a value (which happens to be a function, as denoted by the fn keyword above) of type int -> int.
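As a small illustration of functions being ordinary values, the following sketch binds the same function once with fun and once with val and fn, and then passes functions to other functions. The names inc', twice, four, and results are made up for this example; map is the standard list function.

```sml
(* The declaration from the text. *)
fun inc x = x + 1

(* The same function written as an anonymous function value bound with val.
   The name inc' is hypothetical, chosen for this sketch. *)
val inc' = fn x => x + 1

(* A function that takes another function as an argument. *)
fun twice g x = g (g x)

(* Functions can be passed around like any other value. *)
val four = twice inc 2             (* inc (inc 2) *)
val results = map inc' [1, 2, 3]   (* applies inc' to each element *)
```

Both inc and inc' have type int -> int; the only difference is the syntax used to bind them.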
If we had left out the semicolon, then the elaboration, compilation, execution, and printing would have been deferred and a prompt (this time, an equal sign, "=") would be given, for either a continuation of the declaration of inc or else another top-level declaration. When a semicolon is finally entered (perhaps after several more top-level declarations), all of the declarations since the last semicolon would be processed in sequence. For example:
MLWorks>fun inc x = x + 1
fun f n = (inc n) * 5;
val inc = fn : int -> int
val f = fn : int -> int
In this example, we have defined the inc function as well as a function f that uses inc. Notice that no prompt was given for the second function.
In the interactive top-level loop, the simplest form of input is an expression. For example, after typing in the declarations for inc and f above, we can now call f by typing in:
MLWorks>f (2+4);
val it = 35 : int
Notice that since no identifier is given to bind to the value, the interactive system has chosen the identifier it and bound it to the result of compiling and executing the expression f (2+4).
You might have experience with other languages whose implementations support a similar kind of interactive top-level loop. For example, most implementations of the Lisp, Scheme, and Basic languages support top-level loops. If you have experience with any of these languages, then you might expect that re-defining a function will change the binding of the function name, as well as the behavior of any other functions that call that function. However, in the MLWorks system, this is not the case. For example, suppose we wish to change the definition of the inc function, so that it increments by two instead of one:
MLWorks>fun inc x = x + 2;
val inc = fn : int -> int
In typical Lisp and Scheme systems, such a re-definition would cause the function f to change as well, since f calls inc. But in the MLWorks system, f's binding does not change, so in fact referring to f now still yields the original function:
MLWorks>f (2+4);
val it = 35 : int
To understand why the MLWorks system behaves in this way, consider what would happen if we re-defined inc so that it had a type different from int -> int, for example:
MLWorks>fun inc x = (x mod 2 = 0);
val inc = fn : int -> bool
Here, inc has been changed to a function that returns true if and only if its integer argument is even. Now, if f should also be changed to reflect this re-definition (as it would be in Lisp and Scheme systems), it would fail to typecheck. This is not necessarily a bad thing, but at any rate the MLWorks system does not bother to go back to earlier top-level declarations and re-elaborate them; hence, f's binding is left unchanged.
If you are already familiar with the SML language, then you can think of the sequence of top-level declarations typed into an MLWorks interactive top-level loop as being in nested let-bindings:
let fun inc x = x + 1 in
  let fun f n = (inc n) * 5 in
    let fun inc x = x + 2 in
...
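A complete, runnable version of this nesting is sketched below. The body of the innermost let, and the names fromF and fromInc, are additions made for this illustration; they show that f still refers to the first inc while a direct call sees the second one.

```sml
val (fromF, fromInc) =
  let fun inc x = x + 1 in
    let fun f n = (inc n) * 5 in
      let fun inc x = x + 2 in
        (* f was bound while the first inc was in scope, so it still
           increments by one; the direct call uses the newer inc. *)
        (f 6, inc 6)
      end
    end
  end
```

Here fromF is 35, matching the listener session above, and fromInc is 8.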
Instead of typing your program into the interactive top-level, it is more productive to put your program into a file (or set of files) and then load it (them) into the MLWorks system. The simplest way to do this is to use the built-in function use. For example:
MLWorks>use "myprog.sml";
val it = () : unit Use: myprog.sml ...
The use function takes the name of the file (of type string) to load. If the file exists, it is opened and read, with each top-level declaration in the file processed in turn (and the results printed on the standard output). The "result" of the use function is the unit value ("()").
For those who prefer clicking to typing, the use function can also be invoked from the menu File>Use file...; a file dialog will appear and allow you to choose the file to use.
I recommend using Emacs to edit your SML programs and also to manage interaction with the MLWorks system. To do this, select Emacs Server from the menu Preferences>Editor..., and include the following lines in your .emacs file:
(setq load-path (cons "/afs/andrew/scs/cs/mlworks/ultra/lib/emacs/lisp/" load-path))
(autoload 'mlworks-server "mlworks-server" "The MLWorks server" t)
(autoload 'sml-mode "sml-mode" "Major mode for editing Standard ML programs." t)
(setq auto-mode-alist (cons '("\\.sml$" . sml-mode) auto-mode-alist))
Then, start emacs and type Meta-x mlworks-server. The error manager will then communicate with your emacs session and locate errors directly in the source file. The above emacs lisp code is available on the Andrew file system at
/afs/andrew/scs/cs/mlworks/ultra/lib/emacs/sample.emacs.el
The above commands also load the "sml mode", a special editing mode that will be invoked any time you edit a file with an appropriate extension (such as ".sml"; other extensions can be specified in the init.el file). As in other special editing modes, using the Tab key or Control-j will cause emacs to attempt to indent your code in a pleasing way. Control-c followed by Tab will indent the current region. Since SML's syntax is rather complex, the sml mode indentation can be rather haphazard at times. Still, many people find it to be quite useful. A particularly useful key combination is "Meta" along with a vertical bar ("|"); this creates a template for an arm of a case expression or clause of a function. There are several other useful emacs commands for interacting with the inferior sml shell. You can find documentation for them by hitting Control-h m. Some of the most basic commands are
C-c C-l    save the current buffer and then "use" the file
C-c C-r    send the current region to the sml shell
C-c `      find the next error message and position the cursor on the corresponding line in the source file
C-c C-s    split the screen and show the sml shell
Other editors can be used in conjunction with MLWorks. Consult the MLWorks User Guide for details.
As with most compilers, the MLWorks system often produces error messages that can be hard to decipher. The problem is compounded by the fact that SML supports polymorphic type inference, which makes it very difficult for the compiler to pinpoint the real source of a type error. On the other hand, once all of the compile-time type errors are removed, it is often the case that the bulk of the bugs have already been stamped out. In practice, SML programs often work the first time, once all of the type errors reported by the compiler have been removed!
MLWorks displays the error messages in a dedicated window (the error browser), and the messages are often intelligible. If the error was present in a file and the interaction with the editor has been set up properly, clicking on the Action>edit menu item will highlight the (approximate) location of the error in the source file. More about errors and error handling can be found in the User's Guide.
The most common kind of error is the simple type mismatch. For example, suppose we have the following code in a file called myprog.sml:
fun inc x = x + 1
fun f n = inc true
Notice that a semi-colon is not needed here, since the end-of-file marker will serve the same purpose. Now, if we load this file, we get the following error message:
use "myprog.sml";
myprog.sml:2,11-2,18 error: function applied to argument of wrong type
  Near: inc true
  Required argument type: int
  Actual argument type: bool
  Type clash between int and bool
The error message indicates that the expression inc true, on line 2, between columns 11 and 18, is guilty of a type mismatch. The function inc is being applied to an argument of type bool in this expression, but its domain (argument type) is int. Selecting Action>edit, or double-clicking on the error in the upper part of the error browser window, will place the cursor at the right position in your file and highlight the faulty term.
To see a simple example of how error messages aren't always so illuminating, consider the following code:
fun fact 0 = 1
  | fact n = n * fact true
Here, we have attempted to define the factorial function, but in the recursive call we have (stupidly) applied the fact function to the boolean value true instead of to the integer argument n-1. The error message given by the MLWorks system is as follows:
myprog.sml:1,5 to 2,26: error: Type mismatch in recursive value binding for fact
  Near: fn 0 => ...
  Pattern type: bool -> int
  Expression type: int -> int
  Type clash between int and bool
Despite the fact that the error is "clearly" in the recursive call to fact, the message indicates that the error is somewhere between line one and line two, which is the entire program! Another confusing aspect of this error message is that the function declaration is printed out in a form that does not closely resemble our original program. This is because many of SML's constructs are "derived forms," in other words, essentially macros that expand into a more basic "core" syntax. The MLWorks system always prints out code in terms of the core language, never the derived forms.
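To make the "fn 0 => ..." in the message less mysterious, here is a sketch of a correct factorial written twice: once in clausal form and once as roughly the core-language binding that the clausal form stands for. The name fact' is hypothetical, and the exact expansion defined by the language (and the form MLWorks prints) may differ in detail, for instance by introducing a case expression.

```sml
(* Clausal form, as one would normally write it. *)
fun fact 0 = 1
  | fact n = n * fact (n - 1)

(* Approximately the core form: a recursive value binding whose
   right-hand side is a function expression with two match rules. *)
val rec fact' = fn 0 => 1
                 | n => n * fact' (n - 1)
```

The two definitions behave identically; the second simply shows why the compiler talks about a "recursive value binding" and prints fn.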
Some of the arithmetic operators, such as +, *, -, =, and so on, are "overloaded", in the sense that they can be used with either integer arguments or real arguments. This overloading feature is a possible source of confusion for the novice SML programmer. Consider, for example, the following declaration of a function for squaring numbers:
fun square x = x * x
The response from MLWorks is:
val square : int -> int = fn
MLWorks assumes that the * is for integers. In other SML compilers, such as the Standard ML of New Jersey, the resulting error message would be:
myprog.sml:1.18 Error: overloaded variable not defined at type
  symbol: *
  type: 'Z
Because there is not enough information in this program to determine whether the * is for integers or for reals, an error message is generated to complain about the inability to "resolve" the overloading.
The fix for this kind of error is simply to declare the type of one of the arguments to (or the result of) the arithmetic operation. For example, here are three versions that work:
fun square' x = x * x : int
fun square'' (x : int) = x * x
fun square''' x : int = x * x
The first version explicitly declares the type of the result of the * operation, since a type constraint binds less tightly than an infix operator and so applies to the whole expression x * x. The second version declares the type of the argument. Finally, the third version declares the type of the result of the square''' function. All three versions allow the SML type inference mechanism to infer the types of the identifiers in the declarations.
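The same technique selects the real-number version of the operator. In this sketch the names squareInt, squareReal, nine, and quarter are made up for illustration:

```sml
(* Annotating the parameter picks which instance of * is used. *)
fun squareInt (x : int) = x * x      (* int -> int *)
fun squareReal (x : real) = x * x    (* real -> real *)

val nine = squareInt 3
val quarter = squareReal 0.5
```

Applying squareInt to 0.5, or squareReal to 3, is rejected by the typechecker, since SML does not convert between int and real implicitly.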
It is not uncommon to spend quite a long time tracking down the source of a type error. (Actually, the time spent doing this is almost always much less than the time it takes to track down the same error without the benefit of static typechecking!) A common way to narrow down the possibilities, and also to improve the precision of the error messages produced by the compiler, is to annotate the program with explicit types, in the way that we have done above. It is particularly helpful to annotate the types of function parameters, as we have done in square'' above. This is similar to the declaration of parameter types in languages such as C and Pascal. Of course, in those languages the declarations are required; in SML they are optional.
One of the most fundamental changes in the 1997 revision of the SML language is that it now enforces something called the value restriction. Essentially, this restricts polymorphism to expressions that clearly are values, specifically single identifiers and functions. When this restriction is violated, the error message, "nongeneric type variable," is given. For example, the following program results in this error:
fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

val f = map id
The message given is
myprog.sml:6,5: error: Free type variable 'a in 'a list -> 'a list at top level
which indicates that the expression map id is polymorphic, but not syntactically a value (that is, not an identifier or lambda expression), and hence the attempt to use it as a polymorphic value (by binding f to it) violates the value restriction. The reasons for this restriction are beyond the scope of this document, but are explained in several papers as well as the textbook by Paulson.
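One common workaround, which the original text does not spell out, is to make the right-hand side syntactically a function by naming the argument explicitly (often called eta-expansion). A minimal sketch, with ints and strs as hypothetical test values:

```sml
fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

(* Writing the list argument explicitly makes f a function declaration,
   which is a value, so f may be given a polymorphic type. *)
fun f l = map id l

(* f can now be used at two different element types. *)
val ints = f [1, 2, 3]
val strs = f ["a", "b"]
```

The cost is that map id is re-evaluated on every call to f, which is harmless here because that partial application does no work of its own.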
In some cases the compiler can determine from context that an expression like map id that appears polymorphic can be given a non-polymorphic type. In this case the compiler does not report an error. For example,
let val x = ref [] in x := [3] end
is accepted as a correct program.
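A slightly extended sketch of the same idea returns the contents of the reference so that the inferred type is visible. The name contents is made up for this example:

```sml
val contents =
  let
    val x = ref []   (* the element type of x is left undetermined here *)
  in
    x := [3];        (* this assignment fixes the element type to int *)
    !x               (* read the list back out of the reference *)
  end
```

The result has type int list, because x is used at only one type within the let.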
Because the syntax of SML is rather complex, there are several common errors that novices tend to make. One of the most common has to do with the syntax of patterns in clausal-form function declarations and case expressions. Consider the following code:
datatype 'a btree = Leaf of 'a
                  | Node of 'a btree * 'a btree

fun preorder Leaf(v) = [v]
  | preorder Node(l,r) = preorder l @ preorder r
The MLWorks system complains vigorously over this:
myprog.sml:4,14 to 4,17: error: Value constructor Leaf used without argument in pattern
myprog.sml:5,14 to 5,17: error: Value constructor Node used without argument in pattern
myprog.sml:4,5 to 5,48: error: Type mismatch in recursive value binding for preorder
  Near: fn _id216 => ...
  Pattern type: 'a -> ('a * 'a) list
  Expression type: 'a -> 'a * 'a -> ('a * 'a) list
  Type clash between ('a * 'a) list and 'a * 'a -> ('a * 'a) list
The problem here is that Leaf and Node are patterns that are syntactically separate from, respectively, the (v) and (l,r) patterns. The (admittedly strange) syntax of SML requires extra parenthesization:
fun preorder (Leaf v) = [v]
  | preorder (Node(l,r)) = preorder l @ preorder r
This is true in all contexts where patterns are used, including clausal-form function declarations, case expressions, and exception handlers.
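The following sketch shows a parenthesized constructor pattern in each of those three contexts. The exception NoTrees and the functions size, firstLeaf, and firstLeafOr are hypothetical names introduced only for this illustration.

```sml
datatype 'a btree = Leaf of 'a | Node of 'a btree * 'a btree

(* Hypothetical exception carrying a message. *)
exception NoTrees of string

(* Clausal-form function declaration. *)
fun preorder (Leaf v) = [v]
  | preorder (Node (l, r)) = preorder l @ preorder r

(* Case expression. *)
fun size t =
  (case t of
       (Leaf _) => 1
     | (Node (l, r)) => size l + size r)

(* Raises NoTrees when given an empty list of trees. *)
fun firstLeaf nil = raise NoTrees "no trees"
  | firstLeaf (t :: _) = hd (preorder t)

(* Exception handler. *)
fun firstLeafOr default ts =
  firstLeaf ts handle (NoTrees _) => default
```

Parenthesizing the constructor together with its argument is always accepted in these positions, so doing it uniformly avoids having to remember where it is strictly required.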
Another rather confusing part of the syntax has to do with the interaction between case expressions, exception handlers, and clausal-form function declarations. Consider the following function, taken in slightly modified form from the MLWorks library (which is described later):
datatype 'a option = NONE | SOME of 'a

fun filter pred l =
  let
    fun filterP (x::r, l) =
          case (pred x) of SOME y => filterP(r, y::l)
                         | NONE => filterP(r, l)
        | filterP ([], l) = rev l
  in
    filterP (l, [])
  end
In this example, the local function filterP is defined in two clauses, the first handling the case of a non-empty list argument, and the second handling the empty list. In the first clause, a case expression is used. The syntactic ambiguity arises from the fact that it takes too much "lookahead" to figure out whether or not the second clause of filterP is actually the third arm of the case expression. This leads to the following rather cryptic error message:
myprog.sml:8,11 to 8,25: error: Non-constructor filterP used in pattern
myprog.sml:8,27: error: Unexpected `=', inserting `=>'
myprog.sml:8,11 to 8,25: error: Non-constructor filterP used in pattern
myprog.sml:8,27: error: Reserved word `op' required before infix identifier `='
As before, parenthesization fixes the problem:
fun filter pred l =
  let
    fun filterP (x::r, l) =
          (case (pred x) of SOME y => filterP(r, y::l)
                          | NONE => filterP(r, l))
      | filterP ([], l) = rev l
  in
    filterP (l, [])
  end
Alternatively, in this example we can also exchange the two clauses of filterP:
fun filter pred l =
  let
    fun filterP ([], l) = rev l
      | filterP (x::r, l) =
          case (pred x) of SOME y => filterP(r, y::l)
                         | NONE => filterP(r, l)
  in
    filterP (l, [])
  end
As with many programming languages, the basic advice to follow is: When in doubt, parenthesize.
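As a usage sketch, the parenthesized version of filter can be exercised as follows. This assumes the option type with constructors NONE and SOME is already available (from the standard library or from the datatype declaration given earlier); evenSquare and squares are hypothetical names.

```sml
fun filter pred l =
  let
    fun filterP (x::r, l) =
          (case (pred x) of
               SOME y => filterP (r, y::l)
             | NONE => filterP (r, l))
      | filterP ([], l) = rev l
  in
    filterP (l, [])
  end

(* Hypothetical predicate: keep the even numbers, squared. *)
fun evenSquare x = if x mod 2 = 0 then SOME (x * x) else NONE

val squares = filter evenSquare [1, 2, 3, 4]
```

Here squares is [4, 16]: results are accumulated in reverse and put back in order by rev in the final clause.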
MLWorks contains numerous commands and options beyond the scope of this document. We suggest a careful reading of the User's Guide for getting acquainted with issues such as debugging, tracing, etc.
There is a reference manual for MLWorks available at URL:
http://www.cs.cmu.edu/afs/andrew/scs/cs/mlworks/doc/reference/html/index.htm
The reference manual includes a detailed discussion of the MLWorks libraries. A PostScript version of the reference manual is available as
/afs/andrew/scs/cs/mlworks/doc/reference/ps/reference-1-0.ps