Issue: SHARP-STAR-DELIMITER
Forum: Editorial
References: *READ-SUPPRESS* (p345), #* (p355)
Category: CLARIFICATION
Edit history: 05-Mar-91, Version 1 by Pitman
              15-Mar-91, Version 2 by Pitman
Status: For X3J13 consideration
Problem Description:
What constitutes a delimiter at the end of #* ?
In the description of #* on p355, CLtL says:
``A series of binary digits (0 and 1) preceded by #* is read
as a simple bit-vector. ... If an unsigned decimal integer
appears between the # and *, it specifies explicitly the
length of the vector. In that case, it is an error if too
many bits are specified, and if too few are specified the
last one (it is an error if there are none in this case) is
used to fill all remaining elements of the bit-vector.
... The notation #* denotes an empty bit-vector, as does
#0* (which is legitimate because it is not the case that
too few elements are specified.)''
This seems to imply that the bit vector ends when the sequence
of 0's and 1's ends.
In the discussion of *READ-SUPPRESS* on p345, it says:
``The #* construction always scans over a following token and
produces the value NIL. It will not signal an error even if
the token does not consist solely of the characters 0 and 1.''
This seems to imply that the bit vector ends at the first normal
delimiter.
Proposal (SHARP-STAR-DELIMITER:NORMAL-DELIMITER):
Specify that a bit vector is delimited like any normal numeric
token, and that an error of type READER-ERROR is signalled unless
every character in the token is a 0 or 1. Clarify
that | and \ are not permitted as part of the token.
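A rough model of how NORMAL-DELIMITER finds the end of a #* token is sketched below. It operates on a string rather than a stream and is not a reader macro; SCAN-BITS-NORMAL-DELIMITER and *TOKEN-DELIMITERS* are hypothetical names. The sketch assumes standard syntax, in which whitespace[2] characters and terminating macro characters end a token, while # (a non-terminating macro character) does not.

```lisp
;; Hypothetical model (not the reader itself) of NORMAL-DELIMITER.
;; Assumes standard syntax: whitespace[2] and terminating macro characters
;; end a token; # is non-terminating, so it does not.
(defparameter *token-delimiters*
  (list #\Space #\Tab #\Newline #\Linefeed #\Return #\Page
        #\" #\' #\( #\) #\, #\; #\`))

(defun scan-bits-normal-delimiter (string start)
  "Scan the token that begins at START in STRING (just after #*).
Return the bit-vector and the index where the token ends.  Signal an
error unless every character in the token is 0 or 1; because | and \\
are not delimiters, they land in the token and are rejected too."
  (let* ((end (or (position-if (lambda (c) (member c *token-delimiters*))
                               string :start start)
                  (length string)))
         (token (subseq string start end)))
    (unless (every (lambda (c) (find c "01")) token)
      (error "Token after #* is not all 0s and 1s: ~S" token))
    (values (map 'simple-bit-vector #'digit-char-p token) end)))
```

Under this model, the text "#*012 3" from Test Case #1 signals an error because the whole token 012 is examined, and "#*01#*01" does too, since # does not end the token.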
Rationale:
This will seem most natural to people already familiar with the
parsing of other tokens in the language. It is also most consistent
with the wording in the description of *READ-SUPPRESS*, which is
slightly more explicit than the wording in the description of #* itself.
This is also the safest choice for interchange with other dialects.
It prevents users from relying on non-standard delimiters (like the
"2" in Test Case #1 below), and so makes it more likely that, in a
read-suppress context in another dialect, the tokenization a CL
program assumes will match the tokenization that the `other dialect'
expects.
Proposal (SHARP-STAR-DELIMITER:NOT-ZERO-OR-ONE):
Specify that a bit vector is delimited by any character that
is not a 0 or 1. Correct the description of *READ-SUPPRESS*
to indicate that #* stops reading and returns NIL as soon as
it encounters any character other than 0 or 1.
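For comparison, the NOT-ZERO-OR-ONE rule can be modeled in the same hypothetical style. SCAN-BITS-NOT-ZERO-OR-ONE is an illustrative name that works on a string, not a stream, and is not the reader itself; it simply stops at the first character that is not a 0 or 1 and never signals an error.

```lisp
;; Hypothetical model (not the reader itself) of NOT-ZERO-OR-ONE.
(defun scan-bits-not-zero-or-one (string start)
  "Scan bits beginning at START in STRING (just after #*).
Return the bit-vector and the index of the first character that is not
0 or 1.  Never signals an error: whatever follows is left for the next
token, which is why the 2 in #*012 becomes a separate object."
  (let ((end (or (position-if-not (lambda (c) (find c "01"))
                                  string :start start)
                 (length string))))
    (values (map 'simple-bit-vector #'digit-char-p (subseq string start end))
            end)))
```

Applied to the text "#*012 3" from Test Case #1, this model returns #*01 and index 4, leaving 2 to be read as its own token; that is why (LENGTH '(#*012 3)) is 3 under this proposal.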
Rationale:
This prefers a very literal reading of the wording in CLtL's description
of #*, and adjusts the described behavior of *READ-SUPPRESS* to be
consistent with it.
Test Cases:
These should signal an error under NORMAL-DELIMITER, and
should return 3 under NOT-ZERO-OR-ONE:
1. (LENGTH '(#*012 3))
2. (LENGTH '(#*0123 4))
These should return 1 under NORMAL-DELIMITER, and
should return 2 under NOT-ZERO-OR-ONE:
3. (LENGTH '(#+NO-SUCH-FEATURE #*012 3))
4. (LENGTH '(#+NO-SUCH-FEATURE #*0123 4))
These should signal an error under NORMAL-DELIMITER since
# is not a terminating readmacro, and should return 2 under
proposal NOT-ZERO-OR-ONE. (Note that in case 5 the two
tokens are both bit-vectors under NOT-ZERO-OR-ONE, but in
cases 6 and 7 the second token is a symbol.)
5. (LENGTH '(#*01#*01))
6. (LENGTH '(#*012#*012))
7. (LENGTH '(#*0123#*0123))
These should return 0 under NORMAL-DELIMITER, and
should return 1 under proposal NOT-ZERO-OR-ONE. (Note that
in case 8 the token that is seen is a bit-vector under
NOT-ZERO-OR-ONE, but in cases 9 and 10 it is a symbol.)
8. (LENGTH '(#+NO-SUCH-FEATURE #*01#*01))
9. (LENGTH '(#+NO-SUCH-FEATURE #*012#*012))
10. (LENGTH '(#+NO-SUCH-FEATURE #*0123#*0123))
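One way to check which behavior a given implementation exhibits is to run the ten test cases through its own reader. The harness below is a hypothetical sketch (the names *SHARP-STAR-TEST-CASES*, RUN-SHARP-STAR-TEST and RUN-SHARP-STAR-TESTS are illustrative). It assumes :NO-SUCH-FEATURE is not present in *FEATURES* and reports :ERROR for any case whose reading or evaluation signals an error. No particular results are claimed here, since they depend on the implementation.

```lisp
;; Hypothetical harness: runs the ten test cases above through the host
;; Lisp's own reader.  Assumes :NO-SUCH-FEATURE is not in *FEATURES*.
(defparameter *sharp-star-test-cases*
  '("(LENGTH '(#*012 3))"
    "(LENGTH '(#*0123 4))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*012 3))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*0123 4))"
    "(LENGTH '(#*01#*01))"
    "(LENGTH '(#*012#*012))"
    "(LENGTH '(#*0123#*0123))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*01#*01))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*012#*012))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*0123#*0123))"))

(defun run-sharp-star-test (text)
  "Read TEXT with standard syntax and evaluate it.
Return the value, or :ERROR if reading or evaluation signals an error."
  (handler-case (eval (with-standard-io-syntax (read-from-string text)))
    (error () :error)))

(defun run-sharp-star-tests ()
  "Return a list of (case-number result) pairs for all ten test cases."
  (loop for text in *sharp-star-test-cases*
        for i from 1
        collect (list i (run-sharp-star-test text))))
```

Calling (run-sharp-star-tests) in one implementation produces the data for one column of the kind of results table given under Current Practice, with :ERROR marking cases that signal an error.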
Current Practice:
Symbolics Genera 8.1 implements NOT-ZERO-OR-ONE.
Symbolics Cloe implements neither (being closer to the confusing
thing that CLtL actually demands).
Specific results:
        Cloe  Genera
  #1     3      3
  #2     3      3
  #3     1      2
  #4     1      2
  #5     2      2
  #6     2      2
  #7     2      2
  #8     0      1
  #9     0      1
  #10    0      1
Moon, commenting on the test cases for the previous version of this
writeup, says MCL 2.0 is similar to Genera, but differs on one
or two examples. [New sample data for this version not available.]
JonL says Lucid has always supported NORMAL-DELIMITER.
Cost to Implementors:
For implementations that are not already compatible, the cost is
probably relatively small.
Cost to Users:
Problem situations could be mechanically detected, and semi-automatically
corrected in a straightforward way.
Cost of Non-Adoption:
Implementations would continue to differ on how #* expressions are
parsed, causing portability problems.
Benefits:
Cost of non-adoption is avoided.
Aesthetics:
The effect of NOT-ZERO-OR-ONE, while seemingly what CLtL intends,
is often surprising (i.e., unintuitive) to new users.
NORMAL-DELIMITER is probably more aesthetic since it uses
conventional rules for delimiters.
Discussion:
Moon, Barmar, and Pitman support NORMAL-DELIMITER.
Moon says ``if we vote for NOT-ZERO-OR-ONE then I think we're
inconsistent if we don't say that (length '(#o12399)) is 2.''
Barmar disagrees that this particular consistency is at issue.
Moon cited #o2+2 as another test case.
JonL noted that Lucid barfs not only on #o2+2 but even on #o12399.