SYNOPSIS
#include <arg.h>
arg_parse(int argc, char **argv, [char *formatstr, char *paramptrs, char *docstr, char *docparams]*, 0);
double expr_eval(char *str);
DESCRIPTION
arg_parse is a subroutine for parsing and conversion of
command-line arguments. This parser is an alternative to
the common method of argument parsing, which is an ad-hoc
parser in each program, typically written with a large,
cumbersome switch statement. arg_parse allows a command-
line parser to be described very concisely while retaining
the flexibility to handle a variety of syntaxes.
The parser has a number of features:
o arbitrary order of flag arguments
o automatic argument conversion and type checking
o multiple-character flag names
o required, optional, and flag arguments
o automatic usage message
o subroutine call for exotic options (variable number of parameters)
o modularized parsers encourage standardized options
o expression evaluation
o works either from argv or in interactive mode, as a primitive language parser and interpreter
o concise specification
o easy to use
It is hoped that use of arg_parse will help standardize
argument conventions and reduce the tedium of adding options
to programs.
arg_parse(argc, argv,
"", "Usage: prog [options]",
"%S", &file, "set output file",
"[%d]", &level, "set recursion level [default=%d]", level,
"-size %F %F", &xsize, &ysize, "set x and y sizes",
"-debug", ARG_FLAG(&debug), "turn on debugging",
0);
The arg_parse call defines the program's arguments, in this
case: one required argument (a filename), an optional
argument (an integer level number), an optional flag with
two parameters (floating point size), and a simple flag
(boolean debug flag). If the above program (call it prog)
were run with
prog joe.c
it would set file to joe.c, and set debug to 0, and if run
with
prog -size 100 400/3 joe.c -debug 5
it would set file="joe.c", level=5, xsize=100, ysize=133.33,
and debug=1. In all programs using arg_parse, a hyphen
argument elicits a usage message, so the command
prog -
results in the printout
Usage: prog [options]
%S set output file
[%d] set recursion level [default=3]
-size %F %F set x and y sizes
-debug turn on debugging
TERMINOLOGY
parameter pointer  &xsize                   Pointer to a parameter variable through
                                            which converted values are stored.
doc string         "set output file"        Documentation string describing the
                                            option's effect.
form               "-res%d", &r, "set res"  Format string, parameter pointers, and
                                            documentation describing an option.
"[%d]", &level, "set level"
We will describe the syntax of formlists first, then the
method for matching arguments to forms.
FORMLIST SYNTAX
The syntax and conversion rules for parsing are specified in
the formlist following argc and argv in the arg_parse call.
arg_parse reads its subroutine parameters using the
varargs(3) convention for run-time procedure calls, so it is
crucial that the formlist be terminated with a 0. Each form
consists of a scanf-style format string, a list of parameter
pointers, a documentation string, and a list of
documentation parameters. In some cases the paramptr and
docparam lists will be empty, but the format string and doc
string arguments are mandatory.
Format String
The format string consists of a flag string followed by
parameter conversion codes (if any). A flag is a hyphen
followed by a string. None of the characters in the string
may be a '%' and the string must not begin with a numeral.
Acceptable conversion codes in the format string are a '%'
followed by any single-character code accepted by scanf,
plus the new conversion 'S':
CODE TYPE
%c char
%d int
%f float
%F double
%s char array
%S char *
... (see scanf(3) for a complete list)
"-pt [%F%F%F[%F]]" a flag with 0, 3, or 4 parameters
Since assignments of args to parameter pointers are done
left to right within the form, no conversion codes can follow
the first ']'. In fact, the ]'s are optional since they can
be inferred to be at the end of the format string. Spaces
between conversion codes are optional and ignored.
Following the format string is the list of parameter
pointers, whose number must match the number of conversion
codes in the format string, like the arguments to scanf or
printf.
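As a rough illustration of the counting rule above, the following C
sketch counts the conversion codes in a format string and how many of
them fall before the first '['. The names count_codes and fmt_counts
are hypothetical and are not part of arg.h; the sketch also assumes
that every code is a '%' followed by exactly one character.

```c
/* Hypothetical helper, not part of arg.h. */
struct fmt_counts {
    int total;     /* all conversion codes in the format string */
    int required;  /* codes that appear before the first '[' */
};

/* Count '%' conversion codes, assuming one code character per '%'. */
static struct fmt_counts count_codes(const char *fmt)
{
    struct fmt_counts c = {0, 0};
    int seen_bracket = 0;
    const char *p;

    for (p = fmt; *p != '\0'; p++) {
        if (*p == '[') {
            seen_bracket = 1;       /* everything after this is optional */
        } else if (*p == '%' && p[1] != '\0') {
            c.total++;
            if (!seen_bracket)
                c.required++;
            p++;                    /* skip the code character itself */
        }
    }
    return c;
}
```

Under these assumptions "-size %F %F" yields two codes, both required,
so two parameter pointers must follow it, while "-pt [%F%F%F[%F]]"
yields four codes, none of them required.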
Form Types
There are six form types. In addition to the ones we've
seen, regular arguments and flags with parameters, there are
several others for more exotic circumstances: simple flags,
nop forms, subroutine flags, and sublists.
A simple flag is a flag option with no parameters that sets
a boolean variable to 1 if that flag appears in argv, else
0. A pointer to the boolean (int) variable is passed after
the format string using the ARG_FLAG macro. For example,
ARG_FLAG(&debug) will set the boolean variable debug.
A nop form is a documentation string with no associated
flags or arguments that appears in the usage message but
does not affect parsing. Nop forms have a format string and
a doc string, the former containing neither a flag nor a
conversion code. Example:
"", "This program converts an AIS picture file to PF
format",
When the usage message is printed, the doc string is
indented if the format string is non-null.
A subroutine flag is an option that calls a user-supplied
action subroutine every time it is used rather than using
arg_parse's format conversion and parameter assignment.
Subroutine flags provide a trapdoor whereby the programmer
can do custom conversion or processing of parameters with
arbitrary type and number. To parse our list of people with
a subroutine flag instead, we use the form:
"-people", ARG_SUBR(arg_people), "people names"
where arg_people is a subroutine to gobble the parameters,
just like in the example near the end of this document.
The macro ARG_SUBR takes the name of a subroutine to call
when the flag is encountered. The parameter arguments
following the flag in argv are packaged into a new argument
vector av along with ac, and the subroutine is called with
these two arguments. In our list-of-people example, the
command prog foo -people ned alvy bruce -debug would call
arg_people with ac=3 and av={"ned","alvy","bruce"}.
Whereas flags with arguments had the simple side effect of
setting a variable, subroutine flags can have arbitrarily
complex side effects, and can be used multiple times.
Subroutine flags can also be flagless; that is, they can
have null format strings. In this case, any ``leftover''
regular arguments are passed to the supplied action
subroutine. Flagless subroutines are useful for reading
lists of filenames.
The final form type is a sublist. A sublist is a
subordinate parser defined as another formlist. Sublists
can be used to build a tree of parsers. For example, a 3-D
graphics program might have a standard set of commands for
controlling the display (setting the output device, screen
window, and colors) and also a standard set of commands for
transforming 3-D objects (rotation, scaling, etc.). Within
the display command parser there could well be a standard
set of commands for each output device (one for Suns,
another for Versatec plotters, etc.). Using sublists we can
prepare a standard parser for display commands and keep it
in the source for the display library, a parser for the
transformation commands in the transformation library, and
so on, so that the parser for each graphics application can
be very simple, merely listing its own options and then
invoking the standard parsers for the major libraries it
programmers and reducing option confusion among users.
To invoke a sublist we use the form:
"-display", ARG_SUBLIST(form), "display commands"
The ARG_SUBLIST macro expects a structure pointer of type
Arg_form * as returned from the arg_to_form routine. Its
use is illustrated in an example later.
MATCHING ARGUMENTS TO FORMS
arg_parse steps through the arguments in argv from left to
right, matching arguments against the format strings in the
formlist. Flag arguments (simple flags or flags with
parameters) can occur in arbitrary order but regular
arguments are matched by stepping through the formlist in
left to right order. For this reason regular arguments are
also known as positional arguments. Matching of parameters
within an option is also done in a left-to-right, greedy
fashion within the form without regard for the parameter
types. No permutation of the matching is done to avoid
conversion errors. To illustrate, in our prog above, if we
changed the size option to make the second parameter
optional:
"-size %F[%F]", &xsize, &ysize, "set sizes",
then the command:
prog -size 100 -debug joe.c
succeeds because it is clear that only one parameter is
being supplied to size, but if we try:
prog -size 100 joe.c -debug
then arg_parse will attempt to convert "joe.c" via %F into
ysize and fail, returning an error code.
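The greedy behaviour described above can be sketched in C as follows.
This is not arg_parse's implementation: match_size and is_flag are
hypothetical names, the form "-size %F[%F]" is hard-wired, and plain
strtod stands in for arg_parse's own %F conversion.

```c
#include <ctype.h>
#include <stdlib.h>

/* A flag argument: a hyphen followed by anything but a digit or '.'. */
static int is_flag(const char *s)
{
    return s[0] == '-' && s[1] != '\0' && s[1] != '.' &&
           !isdigit((unsigned char)s[1]);
}

/*
 * Greedy matching for the form "-size %F[%F]".
 * argv[flag_index] is the -size flag.  Returns the number of
 * parameters stored (1 or 2), or -1 on a missing or bad parameter.
 */
static int match_size(int argc, char **argv, int flag_index,
                      double *xsize, double *ysize)
{
    double *dest[2];
    int n = 0, i;

    dest[0] = xsize;
    dest[1] = ysize;
    for (i = flag_index + 1; i < argc && n < 2 && !is_flag(argv[i]); i++) {
        char *end;
        double v = strtod(argv[i], &end);

        /* The argument was claimed by position, so a failed
         * conversion is an error; no other matching is tried. */
        if (end == argv[i] || *end != '\0')
            return -1;
        *dest[n++] = v;
    }
    return n >= 1 ? n : -1;     /* the first parameter is required */
}
```

For the argument list of "prog -size 100 -debug joe.c" the sketch
stores one parameter and stops at -debug; for "prog -size 100 joe.c
-debug" it claims "joe.c" as the second parameter and returns -1.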
The matching algorithm for subroutine flags and sublists
varies somewhat from that for the other form types. For
most types, arg_parse grabs as many arguments out of argv as
the form can take up to the next flag argument (or the end
of argv), but for subroutine flags and sublists, all
arguments up to the next flag argument are grabbed and
bundled into a smaller argument vector (call it av). (For
matching purposes, a flag argument is an argument that
begins with a hyphen followed by any character except digits
and '.'.) The new argument vector is passed to the action
routine in the case of subroutine flags or recursively to a
sub-parser in the case of sublist flags.
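A minimal C sketch of the flag test and of the bundling step for
subroutine flags and sublists is given below. The names is_flag_arg
and bundle_params are hypothetical and do not come from arg.h, and the
treatment of a lone "-" as a non-flag is an assumption of the sketch.

```c
#include <ctype.h>

/*
 * For matching purposes, a flag argument begins with a hyphen
 * followed by any character except a digit or '.', so that
 * negative numbers such as -5 and -.5 are still parameters.
 */
static int is_flag_arg(const char *s)
{
    return s[0] == '-' && s[1] != '\0' && s[1] != '.' &&
           !isdigit((unsigned char)s[1]);
}

/*
 * argv[flag_index] is a subroutine flag or sublist flag.  Point *av
 * at the first argument after it and return ac, the number of
 * arguments up to the next flag argument or the end of argv.
 */
static int bundle_params(int argc, char **argv, int flag_index, char ***av)
{
    int first = flag_index + 1;
    int i = first;

    *av = &argv[first];
    while (i < argc && !is_flag_arg(argv[i]))
        i++;
    return i - first;
}
```

Applied to the arguments of "prog foo -people ned alvy bruce -debug"
with flag_index pointing at -people, the sketch returns ac=3 with av
starting at "ned".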
The sub-parser invoked by a sublist flag does matching
identically. Normally the entire formlist tree is traversed
depth-first whenever a search for a flag is being made. If
there are no flag duplicates between different levels of the
form tree then the structure of the tree is irrelevant; the
user needn't be conscious of the command grouping or of the
sublist names. But if there are name duplicates, for
example if there were a -window option in both the display
and transformation parsers, then explicit control of search
order within the tree is needed. This disambiguation
problem is analogous to pathname specification of files
within a UNIX directory tree. When explicit sublist
selection is needed it is done using the sublist flag
followed by the arguments for the sub-parser, bracketed with
-{ and -} flags. For example, if there were more than one
window option, to explicitly select the one in the display
parser, we type:
-display -{ -window 0 0 639 479 -}
The brace flags group and quote the arguments so that all of
the enclosed arguments will be passed to the sub-parser.
Without them the argument matcher would think that display
has no parameters, since it is immediately followed by a
flag (-window). Note that in csh, the braces must be
escaped as -\{ and -\}.
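One way the brace flags could be paired up is sketched below in C.
The function find_close_brace is a hypothetical name, not part of
arg.h, and its handling of nested -{ ... -} groups is an assumption;
this document does not say whether arg_parse allows nesting.

```c
#include <string.h>

/*
 * argv[open] is expected to be "-{".  Return the index of the
 * matching "-}", or -1 if the group is never closed.  Nested
 * groups are counted so an inner "-}" does not end the outer one.
 */
static int find_close_brace(int argc, char **argv, int open)
{
    int depth = 0;
    int i;

    for (i = open; i < argc; i++) {
        if (strcmp(argv[i], "-{") == 0)
            depth++;
        else if (strcmp(argv[i], "-}") == 0 && --depth == 0)
            return i;
    }
    return -1;
}
```

Everything strictly between the two indices would then be handed to
the sub-parser, including arguments such as -window that would
otherwise be taken as the next flag.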
[If you can think of a better way to do matching please tell
me! -Paul].
The matching is checked in both directions: in the
formlist, all required arguments must be assigned to and
most flags can be called at most once, and in argv, each
argument must be recognized. Regular arguments are required
if they are unbracketed, and optional if they are bracketed.
Unmatched forms for required arguments cause an error but
unmatched forms for optional or flag arguments do not; they
are skipped. A warning message is printed if a simple flag
or flag with parameters appears more than once in argv.
Note that it is not an error for subroutine flags to appear
more than once, so they should be used when repeats of a
flag are allowed. Unmatched arguments in argv cause an
``extra argument'' error.
A hyphen argument in argv causes arg_parse to print a usage
message constructed from the format and documentation
that begin or end in the letter 'd' work in degrees. Thus,
"exp(-.5*2^2)/sqrt(2*pi)" is a legal expression. All
expressions are computed in double-precision floating point.
Note that it is often necessary to quote expressions so the
shell won't get excited about asterisks and parentheses.
The expression evaluator expr_eval can be used independently
of arg_parse.
INTERACTIVE MODE
If the lone argument -stdin is passed in argv then arg_parse
goes into interactive mode. Interactive mode reads its
arguments from standard input rather than getting them from
the argument vector. This allows programs to be run semi-
interactively. To encourage interactive use of a program,
one or more of the options should be a subroutine flag. One
could have a -go flag, say, that causes computation to
commence. In interactive mode the hyphens on flags are
optional at the beginning of each line, so the input syntax
resembles a programming language. In fact, scripts of such
commands are often saved in files.
EXAMPLE
The following example illustrates most of the features of
arg_parse.
/* tb.c - arg_parse test program */
#include <stdio.h>
double atof();
#include <arg.h>
static double dxs = 1., dys = .75;
static int x1 = 0, y1 = 0, x2 = 99, y2 = 99;
static char *chanlist = "rgba";
int arg_people(), arg_dsize();
Arg_form *fb_init();
main(ac, av)
int ac;
char **av;
{
int fast, xs = 512, ys = 486;
double scale = 1.;
"-ch %S", &child, "set child name",
"-srcsize %d[%d]", &xs, &ys, "set source size [default=%d,%d]", xs, ys,
"-dstsize", ARG_SUBR(arg_dsize), "set dest size",
"-fb", ARG_SUBLIST(arg_fb), "FB COMMANDS",
0) < 0)
exit(1);
printf("from=%s to=%s scale=%g fast=%d child=%s src=%dx%d dst=%gx%g\n",
fromfile, tofile, scale, fast, child, xs, ys, dxs, dys);
printf("window={%d,%d,%d,%d} chan=%s\n", x1, y1, x2, y2, chanlist);
}
static arg_people(ac, av)
int ac;
char **av;
{
int i;
for (i=0; i<ac; i++)
printf("person[%d]=%s\n", i, av[i]);
}
static arg_dsize(ac, av)
int ac;
char **av;
{
if (ac<1 || ac>2) {
fprintf(stderr, "-dstsize wants 1 or 2 args\n");
exit(1);
}
/* illustrate two methods for argument conversion */
dxs = atof(av[0]); /* constant conversion */
if (ac>1) dys = expr_eval(av[1]); /* expression conversion */
else dys = .75*dxs;
}
Arg_form *fb_init()
{
return arg_to_form(
"-w%d%d%d%d", &x1, &y1, &x2, &y2, "set screen window",
"-ch%S", &chanlist, "set channels [default=%s]", chanlist,
0);
}
In this example we have two required arguments, one optional
argument, and a flagless subroutine (arg_people) to gobble
the remaining regular arguments. The two required arguments
illustrate the differences between %S and %s, and the
advantages of the former. The -srcsize and -dstsize forms
illustrate two different ways to get a flag with either one
or two parameters. Note in the arg_dsize routine that the
expression evaluator expr_eval is just as easy to use as
atof. A small sublist shows an example of command name
ambiguity in the flag -ch.
Below are the results of several sample runs.
o tb one two
from=one to=two scale=1 fast=0 child=jim src=512x486 dst=1x0.75
window={0,0,99,99} chan=rgba
Only the two required args are specified here and
everything else defaults.
o tb -fast -srcsize 100 1+2 one two -dstsize 2 -ch amy
-w 1 2 3 4 "sqrt(2)"
from=one to=two scale=1.41421 fast=1 child=amy src=100x3 dst=2x1.5
window={1,2,3,4} chan=rgba
This illustrates expression evaluation, the precedence
of the first -ch flag over the one in the sublist, and
easy access to a non-ambiguous sublist option, -w.
o tb -fb -\{ -ch abc -w 9 8 7 6 -\} -ch -\{ -jo -\} A B
44 larry curly moe
person[0]=larry
person[1]=curly
person[2]=moe
from=A to=B scale=44 fast=0 child=-jo src=512x486 dst=1x0.75
window={9,8,7,6} chan=abc
This shows access to a ``shadowed'' sublist option, -ch,
and escaping a parameter string that happens to begin
with a hyphen, -jo, with braces, plus the use of a
flagless subroutine to pick up extra regular arguments.
RETURN VALUE
arg_parse returns a negative code on error, otherwise 0.
The file arg.h contains definitions for the error codes:
ARG_BADCALL   programmer error, bad formlist
ARG_BADARG    bad argument in argv
ARG_MISSING   required argument or parameter to flag missing
ARG_EXTRA     argv contains an extra, unrecognizable argument
NOTE
arg_parse modifies argv as a side-effect to eliminate the -{
and -} arguments.
COMPILING
If arg_parse is installed in libarg.a, compile with cc ...
-larg -lm.
AUTHOR
Paul Heckbert, ph@cs.cmu.edu, April 1988