CLHS: Issue SHARP-STAR-DELIMITER Writeup

Issue:        SHARP-STAR-DELIMITER
Forum:        Editorial

References:   *READ-SUPPRESS* (p345), #* (p355)

Category:     CLARIFICATION

Edit history: 05-Mar-91, Version 1 by Pitman

              15-Mar-91, Version 2 by Pitman

Status:       For X3J13 consideration


Problem Description:


  What constitutes a delimiter at the end of #* ?


  In the description of #* on p355, CLtL says:


    ``A series of binary digits (0 and 1) preceded by #* is read 

      as a simple bit-vector. ... If an unsigned decimal integer 

      appears between the # and *, it specifies explicitly the 

      length of the vector.   In that case, it is an error if too

      many bits are specified, and if too few are specified the 

      last one (it is an error if there are none in this case) is

      used to fill all remaining elements of the bit-vector.

      ... The notation #* denotes an empty bit-vector, as does

      #0* (which is legitimate because it is not the case that

      too few elements are specified.)''


  This seems to imply that the bit vector ends when the sequence

  of 0's and 1's ends.


  In the discussion of *READ-SUPPRESS* on p345, it says:


   ``The #* construction always scans over a following token and

     produces the value NIL.  It will not signal an error even if

     the token does not consist solely of the characters 0 and 1.''


  This seems to imply that the bit vector ends at the first normal

  delimiter.


Proposal (SHARP-STAR-DELIMITER:NORMAL-DELIMITER):


  Specify that a bit vector is delimited like any normal numeric

  token, and that an error of type READER-ERROR is signalled if any

  character in the token is not a 0 or 1.  Clarify

  that | and \ are not permitted as part of the token.


  Rationale:


  This will seem most natural to people already familiar with the 

  parsing of other tokens in the language.  This is most consistent

  with the wording in *READ-SUPPRESS*, which is slightly more explicit

  than the wording in #* itself.


  Also, this is safest for interchange with other dialects since it 

  forces users not to rely on non-standard delimiters (like "2" in

  Test Case #1 below), and therefore it makes it more likely that

  when in a read-suppress context in another dialect, the tokenization

  a CL program has used will be the same as the tokenization such an

  `other dialect' expects. 
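
  As an illustration only, the NORMAL-DELIMITER rule might be sketched
  as follows.  The names DELIMITER-CHAR-P and READ-BITS-NORMAL-DELIMITER
  are hypothetical and are not part of any implementation.  The sketch
  assumes the characters after #* are still waiting on the stream,
  ignores the optional length prefix, and signals a plain ERROR where a
  real reader would signal READER-ERROR.

```lisp
;; Hypothetical helper: true if CHAR would end an ordinary token, i.e. it
;; is whitespace or a terminating macro character in the current readtable.
(defun delimiter-char-p (char)
  (or (member char '(#\Space #\Newline #\Tab #\Page #\Return #\Linefeed))
      (multiple-value-bind (function non-terminating-p)
          (get-macro-character char)
        (and function (not non-terminating-p)))))

;; Hypothetical sketch of NORMAL-DELIMITER: collect the whole token first,
;; then insist that every character in it is 0 or 1.
(defun read-bits-normal-delimiter (stream)
  (let ((token (with-output-to-string (out)
                 (loop for char = (peek-char nil stream nil nil)
                       until (or (null char) (delimiter-char-p char))
                       do (write-char (read-char stream) out)))))
    ;; Because | and \ are not delimiters here, they end up in the token
    ;; and are rejected by this check along with any other stray character.
    (unless (every (lambda (c) (member c '(#\0 #\1))) token)
      (error "Token ~S contains characters other than 0 and 1." token))
    (map 'simple-bit-vector #'digit-char-p token)))
```

  Under this sketch the text "0110 rest" yields a four-element
  bit-vector, while "012 3" and "01#*01" are each rejected as a
  single malformed token.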


Proposal (SHARP-STAR-DELIMITER:NOT-ZERO-OR-ONE):


  Specify that a bit vector is delimited by any character that 

  is not a 0 or 1.  Correct the description of *READ-SUPPRESS*

  to indicate that #* stops reading and returns NIL as soon as

  it encounters any character other than 0 or 1.


  Rationale:


  This prefers a very literal reading of the wording in CLtL's description

  of #*, and reverses the behavior of *read-suppress* to be consistent.
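
  As an illustration only, the NOT-ZERO-OR-ONE rule might be sketched
  as follows.  The name READ-BITS-NOT-ZERO-OR-ONE is hypothetical and
  is not part of any implementation.  The sketch assumes the characters
  after #* are still waiting on the stream and ignores the optional
  length prefix.

```lisp
;; Hypothetical sketch of NOT-ZERO-OR-ONE: consume characters only while
;; they are 0 or 1, and leave the first other character on the stream.
(defun read-bits-not-zero-or-one (stream)
  (let ((bits (loop for char = (peek-char nil stream nil nil)
                    while (member char '(#\0 #\1))
                    collect (digit-char-p (read-char stream)))))
    (coerce bits 'simple-bit-vector)))
```

  Under this sketch the text "012 3" yields a two-element bit-vector
  and leaves "2 3" unread, so whatever follows is tokenized separately.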


Test Cases:


  These should signal an error under NORMAL-DELIMITER, and

  should return 3 under NOT-ZERO-OR-ONE:


    1. (LENGTH '(#*012 3))

    2. (LENGTH '(#*0123 4))


  These should return 1 under NORMAL-DELIMITER, and

  should return 2 under NOT-ZERO-OR-ONE:


    3. (LENGTH '(#+NO-SUCH-FEATURE #*012 3))

    4. (LENGTH '(#+NO-SUCH-FEATURE #*0123 4))


  These should signal an error under NORMAL-DELIMITER since

  # is not a terminating readmacro, and should return 2 under

  proposal NOT-ZERO-OR-ONE.  (Note that in case 5 the two

  tokens are both bit-vectors under NOT-ZERO-OR-ONE, but in 

  cases 6 and 7 the second token is a symbol.)


    5. (LENGTH '(#*01#*01))

    6. (LENGTH '(#*012#*012))

    7. (LENGTH '(#*0123#*0123))


  These should return 0 under NORMAL-DELIMITER, and

  should return 1 under proposal NOT-ZERO-OR-ONE.  (Note that

  in case 8 the token that is seen is a bit-vector under

  NOT-ZERO-OR-ONE, but in cases 9 and 10 it is a symbol.)


    8. (LENGTH '(#+NO-SUCH-FEATURE #*01#*01))

    9. (LENGTH '(#+NO-SUCH-FEATURE #*012#*012))

   10. (LENGTH '(#+NO-SUCH-FEATURE #*0123#*0123))
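
  One way to collect results for these cases in a given implementation
  is a small harness along the following lines.  The names
  *SHARP-STAR-TEST-CASES* and RUN-SHARP-STAR-TEST-CASES are hypothetical.
  The sketch assumes that NO-SUCH-FEATURE is absent from *FEATURES* and
  reports :ERROR for any error signalled while reading or evaluating
  a case.

```lisp
;; The ten test cases from this writeup, kept as text so that each can be
;; read separately and a failure in one does not prevent the others.
(defparameter *sharp-star-test-cases*
  '("(LENGTH '(#*012 3))"
    "(LENGTH '(#*0123 4))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*012 3))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*0123 4))"
    "(LENGTH '(#*01#*01))"
    "(LENGTH '(#*012#*012))"
    "(LENGTH '(#*0123#*0123))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*01#*01))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*012#*012))"
    "(LENGTH '(#+NO-SUCH-FEATURE #*0123#*0123))"))

;; Returns a list of (case-number result) pairs, where result is either
;; the value of the form or :ERROR.
(defun run-sharp-star-test-cases ()
  (with-standard-io-syntax
    (loop for text in *sharp-star-test-cases*
          for number from 1
          collect (list number
                        (handler-case (eval (read-from-string text))
                          (error () :error))))))
```

  Calling (RUN-SHARP-STAR-TEST-CASES) produces one entry per case, which
  can then be compared against the values predicted for each proposal.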


Current Practice:


  Symbolics Genera 8.1 implements NOT-ZERO-OR-ONE. 


  Symbolics Cloe implements neither (being closer to the confusing

  thing that CLtL actually demands).


  Specific results:


           Cloe   Genera


   #1        3       3

   #2        3       3

   #3        1       2

   #4        1       2

   #5        2       2

   #6        2       2

   #7        2       2

   #8        0       1

   #9        0       1

   #10       0       1


  Moon, commenting on the test cases for the previous version of this

  writeup, says MCL 2.0 is similar to Genera, but differs on one 

  or two examples.  [New sample data for this version not available.]


  JonL says Lucid has always supported NORMAL-DELIMITER.


Cost to Implementors:


  For implementations that are not already compatible, the cost is 

  probably relatively small.


Cost to Users:


  Problem situations could be mechanically detected, and semi-automatically

  corrected in a straightforward way.


Cost of Non-Adoption:


  Implementations could differ on how #* expressions were parsed,

  causing portability problems.


Benefits:


  Cost of non-adoption is avoided.


Aesthetics:


  The effect of NOT-ZERO-OR-ONE, while seemingly what CLtL intends,

  is often surprising (i.e., unintuitive) to new users.

  NORMAL-DELIMITER is probably more aesthetic since it uses 

  conventional rules for delimiters.


Discussion:


  Moon, Barmar, and Pitman support NORMAL-DELIMITER.


  Moon says ``if we vote for NOT-ZERO-OR-ONE then I think we're

  inconsistent if we don't say that (length '(#o12399)) is 2.''

  Barmar disagrees that this particular consistency is at issue.

  Moon cited another test case of #o2+2.

  JonL noted that Lucid barfs not only on #o2+2 but even on #o12399.