This is a guide to editing and executing Standard ML (SML) programs at Carnegie Mellon University, using the Standard ML of New Jersey system. This document was written by Peter Lee (petel@cs.cmu.edu), with extensive contributions by Robert Harper (rwh@cs.cmu.edu), Iliano Cervesato (iliano@cs.cmu.edu), Carsten Shurmann (carsten@cs.cmu.edu), Frank Pfenning (fp@cs.cmu.edu), and Herb Derby (derby@cs.cmu.edu).
This is not a reference manual for the Standard ML language. If you need a reference manual or a tutorial, the Introduction lists several sources of information, both on-line and in hard copy.
When you start the SML/NJ system, it loads and responds with a message giving the current version number and then a prompt for user input. The prompt is a single dash ("-").
When prompted, you can type in a top-level declaration.
There are several kinds of top-level declarations in SML. For example, the following is a declaration of a function called inc that increments its integer argument. (In these examples, the dash ("-") is the SML/NJ prompt, and the text that follows it on the same line is the user input. The lines after the input are the output from the SML/NJ system. Each line of input is ended with a carriage return on Unix-based systems or the Enter key on the PC and Macintosh systems.)
-fun inc x = x + 1;
val inc = fn : int -> int
The text "fun inc x = x + 1" is the declaration for the inc function. The semicolon (";") is a marker that indicates to the SML/NJ system that it should perform the following actions: elaborate (that is, perform typechecking and other static analyses), compile (to obtain executable machine code), execute, and finally print the result of this declaration. After all of this, the system then prompts for new input and the whole process starts again. This is the so-called "top-level loop". To exit from the SML/NJ system, simply type an end-of-file character (Control-d) to the prompt.
In the example above, the printed result shows that inc is a function that takes an integer argument and yields an integer result. Actually, it is important for you to know that, in SML, functions are "first-class" values, fundamentally no different than other values such as integers. So, to be more precise, it is better to say that the identifier inc has been bound to a value (which happens to be a function, as denoted by the fn keyword above) of type int -> int.
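As a small sketch of what "first-class" means in practice, the following passes inc to another function and stores it in a list. The names twice, incs, four, and results are hypothetical and are not part of the original examples.

```sml
(* inc is the function from the text. *)
fun inc x = x + 1

(* twice (a hypothetical helper) takes a function g and applies it two times. *)
fun twice g x = g (g x)

(* Functions can be stored in data structures like any other value. *)
val incs = [inc, twice inc]

(* inc (inc 2) *)
val four = twice inc 2

(* Apply each stored function to 10. *)
val results = map (fn g => g 10) incs
```

Here four is bound to 4 and results to [11, 12], since the second element of incs adds one twice.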
If we had left out the semicolon, then the elaboration, compilation, execution, and printing would have been deferred and a prompt (this time, an equal sign, "=") would be given, for either a continuation of the declaration of inc or else another top-level declaration. When a semicolon is finally entered (perhaps after several more top-level declarations), all of the declarations since the last semicolon would be processed in sequence. For example:
-fun inc x = x + 1
=fun f n = (inc n) * 5;
val inc = fn : int -> int
val f = fn : int -> int
In this example, we have defined the inc function as well as a function f that uses inc.
In the interactive top-level loop, the simplest form of input is an expression. For example, after typing in the declarations for inc and f above, we can now call f by typing in:
-f (2+4);
val it = 35 : int
Notice that since no identifier is given to bind to the value, the interactive system has chosen the identifier it and bound it to the result of compiling and executing the expression f (2+4).
You might have experience with other languages whose implementations support a similar kind of interactive top-level loop. For example, most implementations of the Lisp, Scheme, and Basic languages support top-level loops. If you have experience with any of these languages, then you might expect that re-defining a function will change the binding of the function name, as well as any other functions that call that function. However, in the SML/NJ system, this is not the case. For example, suppose we wish to change the definition of the inc function, so that it increments by two instead of one:
-fun inc x = x + 2;
val inc = fn : int -> int
In typical Lisp and Scheme systems, such a re-definition would cause the function f to change as well, since f calls inc. But in the SML/NJ system, f's binding does not change, so in fact referring to f now still yields the original function:
-f (2+4);
val it = 35 : int
To understand why the SML/NJ system behaves in this way, consider what would happen if we re-defined inc so that it had a type different than int -> int, for example:
-fun inc x = (x mod 2 = 0);
val inc = fn : int -> bool
Here, inc has been changed to a function that returns true if and only if its integer argument is even. Now, if f should also be changed to reflect this re-definition (as it would be in Lisp and Scheme systems), it would fail to typecheck. This is not necessarily a bad thing, but at any rate the SML/NJ system does not bother to go back to earlier top-level declarations and re-elaborate them; hence, f's binding is left unchanged.
If you are already familiar with the SML language, then you can think of the sequence of top-level declarations typed into an SML/NJ interactive top-level loop as being in nested let-bindings:
let fun inc x = x + 1 in
  let fun f n = (inc n) * 5 in
    let fun inc x = x + 2 in
      ...
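The nested let-bindings can be completed into a runnable sketch. The body (f 6, inc 6) and the name result are additions for illustration; they make it visible that f still refers to the first inc while the innermost scope sees the second one.

```sml
val result =
  let fun inc x = x + 1 in
    let fun f n = (inc n) * 5 in
      let fun inc x = x + 2 in
        (* f was elaborated against the first inc; only new code sees the second. *)
        (f 6, inc 6)
      end
    end
  end
```

With these definitions result is (35, 8): f 6 computes (6 + 1) * 5 using the original inc, while inc 6 in the innermost scope uses the re-definition.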
Instead of typing your program into the interactive top-level, it is more productive to put your program into a file (or set of files) and then load it (them) into the SML/NJ system. The simplest way to do this is to use the built-in function use. For example:
-use "myprog.sml";
[opening myprog.sml]
...
val it = () : unit
The use function takes the name of the file (of type string) to load. If the file exists, it is opened and read, with each top-level declaration in the file processed in turn (and the results printed on the standard output). The "result" of the use function is the unit value ("()").
As your programs get larger and the code becomes spread over many modules, you can find it extremely difficult to remember exactly the right order in which to "use" the files. In order to alleviate this problem, the SML/NJ system has a built-in feature called the Compilation Manager, or simply CM, which I highly recommend that you use. (Actually, you might have to start the SML/NJ system by invoking the "sml-cm" binary, instead of simply "sml".) CM is a complex system with documentation available on-line at http://www.cs.princeton.edu/~blume/cm-manual.ps. For most uses the simplest interface is sufficient: simply create a file in the current directory called sources.cm which contains the names of all of your SML source files, listed one per line in any order. Once this file is created, then you can use the function CM.make to load, compile, and execute your system. For example, suppose you have three source files, a.sig, b.sml, and c.sml. Then you can create a file called sources.cm with the following contents:
Group is
  a.sig
  b.sml
  c.sml
Note that it does not matter in what order the file names occur. Once this file has been created, typing the following to the SML/NJ system will do whatever is necessary in order to load your program:
-CM.make();
The CM.make function will scan all of your source files and calculate the dependencies among them so as to compile and load them in the right order. If CM.make has already been used before to compile and load your program, then it looks to see what files have been changed since the last "make", and then compiles and loads the minimal number of files necessary in order to bring the system up-to-date. After running CM.make, you might notice a new directory in your source file directory. This new directory is used by CM to "remember" the results of the dependency calculation, as well as to store the results of compiling your files so that they don't have to be compiled again (unless, of course, they have been changed).
There is an extensive set of pre-defined values and functions in the SML/NJ system. This is referred to as the standard basis, or sometimes the pervasive environment. As with CM, there is also extensive documentation available on-line for the standard basis at http://cm.bell-labs.com/cm/cs/what/smlnj/basis/index.html. (A book on the standard basis will be published soon.) For dealing with files, the following function is often useful:
OS.FileSys.chDir : string -> unit
This function implements the standard "cd" Unix command, which changes the current working directory to the directory specified in the string argument. This is useful if you have started the SML/NJ system in a directory different from the one containing your source files.
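As a brief sketch of how chDir might be used, the following changes to the parent directory and then returns. It assumes OS.FileSys.getDir, the Basis function that returns the current working directory, which the text above does not mention; the names startDir, parentDir, and backAgain are hypothetical.

```sml
(* Remember where we started. *)
val startDir = OS.FileSys.getDir ()

(* Equivalent to "cd .." in the shell. *)
val () = OS.FileSys.chDir ".."
val parentDir = OS.FileSys.getDir ()

(* Return to the original directory. *)
val () = OS.FileSys.chDir startDir
val backAgain = OS.FileSys.getDir ()
```

In a session you would more typically call OS.FileSys.chDir once with the path of your source directory before calling use or CM.make.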
Another set of values is useful for controlling the output produced by the SML/NJ system:
Compiler.Control.Print.printDepth : int ref
Compiler.Control.Print.printLength : int ref
These variables control the maximum depth and length to which lists, tuples, and other data structures are to be printed. When a data structure is deeper than printDepth or longer than printLength, the remaining portion of the structure is printed as an ellipsis ("...").
To change the value of one of these variables, an assignment can be used. For example:
-Compiler.Control.Print.printDepth := 10;
changes the maximum print depth to ten.
The standard basis contains many modules and functions for manipulating values of all of the basic types, including booleans, integers, reals, characters, strings, arrays, and lists. Unfortunately, the SML/NJ system does not provide any kind of browser, so either you need to refer to the written documentation for the standard basis, or use a little bit of a hack in order to see the complete set of basis functions currently supplied in the SML/NJ system for these types. For example, type the following to the interactive top-level:
-signature S = INTEGER;
Each set of standard basis functions is encapsulated in an SML module, and each such module has a signature, or "interface", whose name is written entirely in uppercase and refers to the type of values for which the module provides functionality. (Note that SML is case sensitive.) For the integer functions, the signature is called INTEGER. So, the above declaration simply binds the identifier S to the signature INTEGER, which causes the SML/NJ system to respond with a listing of the entire INTEGER interface. (We could have used any name besides S.)
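Once the listing shows what is available, the functions are called through the structure that implements the signature. The sketch below assumes the Basis structure Int, which implements INTEGER for the default int type (the text above names only the signature); the bound names are hypothetical.

```sml
(* Convert an integer to its string form. *)
val s = Int.toString 42

(* Parse a string; the result is an option because parsing can fail. *)
val n = Int.fromString "17"

(* The larger of two integers. *)
val m = Int.max (3, 8)

(* Negative numbers are written and printed with a tilde in SML. *)
val neg = Int.toString (~5)
```

Here s is "42", n is SOME 17, m is 8, and neg is "~5".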
Other useful signatures include BOOL, REAL, CHAR, STRING, ARRAY, and LIST. For functions that interface to the operating system (such as OS.FileSys.chDir above), see the signature OS (and POSIX, if provided). There are many other useful modules in the standard basis as well.
I recommend using Emacs to edit your SML programs and also to manage interaction with the SML/NJ system. To do this, you should incorporate the "sml mode" into your emacs startup file. The relevant emacs lisp files can be found in the same directory tree as the SML/NJ system itself. For example, from Unix machines in the Computer Science Department, you can simply add the line
(load "/usr/local/lib/sml/sml-mode/sml-site")
to your .emacs file so that the next time you start Emacs, the sml mode will be present. From the Andrew network, you can find the emacs lisp files in the 15-411 course directory.
With the sml mode, a special editing mode will be invoked any time you edit a file with an appropriate extension (such as ".sml"; other extensions can be specified in the init.el file). As in other special editing modes, using the Tab key or Control-j will cause emacs to attempt to indent your code in a pleasing way. Control-c followed by Tab will indent the current region. Since SML's syntax is rather complex, the sml mode indentation can be rather haphazard at times. Still, many people find it to be quite useful. A particularly useful key combination is "Meta" along with a vertical bar ("|"); this creates a template for an arm of a case expression or clause of a function.
To run SML/NJ from Emacs, make sure that the emacs variable sml-program-name is set to "sml" (which is the default), and then type M-x sml (that is, "Meta" along with "x", followed by "sml"). This will start up the SML/NJ system as an inferior shell process. There are several useful emacs commands for interacting with the inferior sml shell. You can find documentation for them by hitting Control-h m. Some of the most basic commands are:
C-c C-l | save the current buffer and then "use" the file
C-c C-r | send the current region to the sml shell
C-c `   | find the next error message and position the cursor on the corresponding line in the source file
C-c C-s | split the screen and show the sml shell
As with most compilers, the SML/NJ system often produces error messages that can be hard to decipher. The problem is compounded by the fact that SML supports polymorphic type inference, which makes it very difficult for the compiler to figure out precisely the real source of a type error. On the other hand, once all of the compile-time type errors are removed, it is often the case that the bulk of the bugs have already been stamped out. In practice, SML programs often work the first time, once all of the type errors reported by the compiler have been removed!
The most common kind of error is the simple type mismatch. For example, suppose we have the following code in a file called myprog.sml:
fun inc x = x + 1
fun f n = inc true
Notice that a semi-colon is not needed here, since the end-of-file marker will serve the same purpose. Now, if we load this file, we get the following error message:
-use "myprog.sml";
myprog.sml:2.11-2.18 Error: operator and operand don't agree (tycon mismatch)
  operator domain: int
  operand:         bool
  in expression:
    inc true
The error message indicates that the expression inc true, on line 2, between columns 11 and 18, is guilty of a type mismatch. The function inc is being applied to an argument of type bool in this expression, but its domain (argument type) is int.
If we are using the sml mode in Emacs, then typing C-c C-l in an edit buffer containing the program would cause the SML/NJ system to load the file, and then typing C-c ` would move the edit cursor to the exact point in the program corresponding to this error message.
Some of the arithmetic operators, such as +, *, -, =, and so on, are "overloaded", in the sense that they can be used with either integer arguments or real arguments. This overloading feature is a possible source of confusion for the novice SML programmer. Consider, for example, the following declaration of a function for squaring numbers:
fun square x = x * x
The following error message is given for this program:
myprog.sml:1.18 Error: overloaded variable not defined at type
  symbol: *
  type: 'Z
Because there is not enough information in this program to determine whether the * is for integers or for reals, an error message is generated to complain about the inability to "resolve" the overloading.
The simple fix for this kind of error is simply to declare the type of one of the arguments to (or the result of) the arithmetic operation. For example, here are three versions that work:
fun square' x = x * x : int
fun square'' (x : int) = x * x
fun square''' x : int = x * x
The first version explicitly declares the type of the second argument to the * operator. The second version declares the type of the argument. Finally, the third version declares the type of the result of the square''' function. All three versions allow the SML type inference mechanism to infer the types of the identifiers in the declarations.
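The same kind of annotation selects the real version of the operator. In this sketch the names squareInt and squareReal are hypothetical; only the parameter annotation differs between them.

```sml
(* Annotating the parameter resolves * to integer multiplication. *)
fun squareInt (x : int) = x * x

(* The same body with a real annotation uses real multiplication. *)
fun squareReal (x : real) = x * x

val nine = squareInt 3
val r = squareReal 1.5
```

The two functions have types int -> int and real -> real respectively, so squareInt 1.5 would be rejected with a type mismatch like the one shown earlier.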
It is not uncommon to spend quite a long time tracking down
the source of a type error. (Actually, the time spent doing this
is almost always much less than the time it takes to track down
the same error without the benefit of static typechecking!) A
common way to narrow down the possibilities, and also to improve
the precision of the error messages produced by the compiler, is
to annotate the program with explicit types, in the way that we
have done above. It is particularly helpful to annotate the types
of function parameters, as we have done in square''
above. This is similar to the declaration of parameter types in
languages such as C and Pascal. Of course, in those languages the
declarations are required; in SML they are optional.
One of the most fundamental changes in the 1997 revision of the SML language is that it now enforces something called the value restriction. Essentially, this restricts polymorphism to expressions that clearly are values, specifically single identifiers and functions. When this restriction is violated, the error message "nongeneralizable type variable" is given. For example, the following program results in this error:
fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

val f = map id
The message given is
myprog.sml:6.1-6.14 Error: nongeneralizable type variable
  f : 'Y list -> 'Y list
which indicates that the expression map id is polymorphic, but not syntactically a value (that is, not an identifier or lambda expression), and hence the attempt to use it as a polymorphic value (by binding f to it) violates the value restriction. The reasons for this restriction are beyond the scope of this document, but are explained in several papers as well as the textbook by Paulson.
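A common workaround, not described in the original text, is to give f an explicit argument so that its definition becomes a function declaration, which is syntactically a value. The sketch below reuses id and map from the example; the names ints and strs are hypothetical.

```sml
fun id x = x

fun map f nil = nil
  | map f (h::t) = (f h) :: (map f t)

(* With the list argument written out, f is a function declaration
   and may therefore be given a polymorphic type. *)
fun f l = map id l

(* f can now be used at two different element types. *)
val ints = f [1, 2, 3]
val strs = f ["a", "b"]
```

Here f has type 'a list -> 'a list, and both uses typecheck.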
Because the syntax of SML is rather complex, there are several common errors that novices tend to make. One of the most common has to do with the syntax of patterns in clausal-form function declarations and case expressions. Consider the following code:
datatype 'a btree = Leaf of 'a
                  | Node of 'a btree * 'a btree

fun preorder Leaf(v) = [v]
  | preorder Node(l,r) = preorder l @ preorder r
The SML/NJ system complains vigorously over this:
myprog.sml:4.5-5.48 Error: data constructor Leaf used without argument in pattern
myprog.sml:4.5-5.48 Error: data constructor Node used without argument in pattern
myprog.sml:4.1-5.48 Error: pattern and expression in val rec dec don't agree (tycon mismatch)
  pattern:    'Z -> ('Z * 'Z) list
  expression: 'Z -> 'Z * 'Z -> ('Z * 'Z) list
  in declaration:
    preorder = (fn arg => (fn <pat> => <exp>))
The problem here is that Leaf and Node are patterns that are syntactically separate from, respectively, the (v) and (l,r) patterns. The (admittedly strange) syntax of SML requires extra parenthesization:
fun preorder (Leaf v) = [v]
  | preorder (Node(l,r)) = preorder l @ preorder r
This is true in all contexts where patterns are used, including clausal-form function declarations, case expressions, and exception handlers.
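A function with more than one curried argument shows why the parentheses matter: each parenthesized pattern is one argument, so without them the constructor and its argument would be read as two separate parameters. The function member and the tree t below are hypothetical examples, not part of the original text.

```sml
datatype 'a btree = Leaf of 'a | Node of 'a btree * 'a btree

(* member takes two curried arguments: the value sought and the tree.
   The parentheses make (Leaf v) and (Node (l, r)) single patterns. *)
fun member x (Leaf v) = (x = v)
  | member x (Node (l, r)) = member x l orelse member x r

val t = Node (Node (Leaf 1, Leaf 2), Leaf 3)
val found = member 2 t
val missing = member 7 t
```

With this tree, found is true and missing is false.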
Another rather confusing part of the syntax has to do with the interaction between case expressions, exception handlers, and clausal-form function declarations. Consider the following function, taken in slightly modified form from the SML/NJ library (which is described later):
datatype 'a option = NONE | SOME of 'a

fun filter pred l =
  let fun filterP (x::r, l) =
        case (pred x)
          of SOME y => filterP(r, y::l)
           | NONE => filterP(r, l)
    | filterP ([], l) = rev l
  in filterP (l, [])
  end
In this example, the local function filterP is defined in two clauses, the first handling the case of a non-empty list argument, and the second handling the empty list. In the first clause, a case expression is used. The syntactic ambiguity arises from the fact that it takes too much "lookahead" to figure out whether or not the second clause of filterP is actually the third arm of the case expression. This leads to the following rather cryptic error message:
myprog.sml:8.23-8.28 Error: syntax error: deleting EQUALOP ID
myprog.sml:9.3-9.13 Error: syntax error: deleting IN ID
As before, parenthesization fixes the problem:
fun filter pred l =
  let fun filterP (x::r, l) =
        (case (pred x)
           of SOME y => filterP(r, y::l)
            | NONE => filterP(r, l))
    | filterP ([], l) = rev l
  in filterP (l, [])
  end
Alternatively, in this example we can also exchange the two clauses of filterP:
fun filter pred l =
  let fun filterP ([], l) = rev l
    | filterP (x::r, l) =
        case (pred x)
          of SOME y => filterP(r, y::l)
           | NONE => filterP(r, l)
  in filterP (l, [])
  end
As with many programming languages, the basic advice to follow is: When in doubt, parenthesize.
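To see the parenthesized version of filter at work, here is a short usage sketch. It relies on the built-in option type rather than re-declaring it, and the predicate halveIfEven and the name halves are hypothetical.

```sml
(* The parenthesized version of filter from the text. *)
fun filter pred l =
  let fun filterP (x::r, l) =
        (case (pred x)
           of SOME y => filterP(r, y::l)
            | NONE => filterP(r, l))
    | filterP ([], l) = rev l
  in filterP (l, [])
  end

(* Keep even numbers only, replacing each one by its half. *)
fun halveIfEven n = if n mod 2 = 0 then SOME (n div 2) else NONE

val halves = filter halveIfEven [1, 2, 3, 4]
```

The odd elements are dropped and the even ones are halved, so halves is [1, 2].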
The SML language encourages modularity, and in practice separate modules tend to be placed into separate files. While this is useful during development, it becomes highly inconvenient when you finally "ship" your finished program to your users. The standard way to ship a program, then, is to save an image of the system heap after all of your files have been loaded. This is referred to as "exporting" the heap, and results in a single file that contains the state of your SML world at the time you performed the export operation.
You can export a heap with the function exportML. For example, to save the heap image in a file called mysml, the following should be typed to the SML/NJ prompt:
-SMLofNJ.exportML "mysml";
This will save the current state of the SML/NJ system into the file mysml. This can then be executed later by running the sml system with the command-line option "@SMLload=mysml". This will restart the SML/NJ system at the same point at which the exportML took place. (Note that exportML is not supported for the Macintosh System 7 version.)
There is also a function called exportFn, which saves an SML state as a function that takes in the shell command-line arguments when restarted. The type of exportFn is
SMLofNJ.exportFn : string * (string * string list -> OS_Process.status) -> unit
The first argument is the name of the file to contain the exported heap image. The second argument is a function that takes the command name and the command-line arguments (as strings) and returns a process-status value (usually OS_Process.success or OS_Process.failure).
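A function suitable for passing to exportFn might look like the sketch below. The name main and the file name "myprog" are hypothetical, and the status values are written with their Basis names OS.Process.success and OS.Process.failure rather than the OS_Process spelling used above. The call to exportFn is left in a comment because running it ends the current SML/NJ session.

```sml
(* main receives the command name and the list of command-line arguments
   and reports how many arguments were given. *)
fun main (progName : string, args : string list) : OS.Process.status =
  (print (progName ^ ": " ^ Int.toString (length args) ^ " argument(s)\n");
   OS.Process.success)

(* To export, one would type at the prompt (this terminates the session):
   SMLofNJ.exportFn ("myprog", main);
*)
```

The exported image would then be started with the @SMLload option in the same way as for exportML.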
In addition to the standard basis, the SML/NJ system comes with several tools and libraries. The ml-lex and ml-yacc programs perform automatic generation of lexical analyzers and LALR(1) parsers, respectively. Documentation for these and other useful tools can be found at the SML/NJ documentation page.
An extensive library of useful data structures and functions is also available, at http://cm.bell-labs.com/cm/cs/what/smlnj/doc/smlnj-lib/index.html.
Finally, extensions to SML for concurrency and interaction with the X window system are supported by the Concurrent ML and eXene extensions to SML, available at http://cm.bell-labs.com/cm/cs/who/jhr/sml/eXene/index.html.