Common Lisp the Language, 2nd Edition
Common Lisp provides a character data type; objects of this type represent printed symbols such as letters.
In general, characters in Common Lisp are not true objects; eq cannot be counted upon to operate on them reliably. In particular, it is possible that the expression
(let ((x z) (y z)) (eq x y))
may be false rather than true, if the value of z is a character.
If two objects are to be compared for "identity," but either might be a character, then the predicate eql is probably appropriate.
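A minimal sketch of the point above: both variables are bound to the same character, yet only eql is guaranteed to report them as identical. The helper name same-char-results is hypothetical, and #\a is an arbitrary choice.

```lisp
;; Bind X and Y to the same character Z, as in the expression above,
;; and report what EQ and EQL say about them.
(defun same-char-results (z)
  (let ((x z) (y z))
    ;; EQ may return NIL here, because an implementation is free to
    ;; copy characters; EQL is guaranteed to be true for the same character.
    (list :eq (eq x y) :eql (eql x y))))

(same-char-results #\a)
```

The value under :eq depends on the implementation, so no particular result should be relied on; the value under :eql is always true.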
X3J13 voted in March 1989 (CHARACTER-PROPOSAL)
to approve the following definitions and terminology for use in
discussing character facilities in Common Lisp.
A character repertoire defines a collection of characters independent of their specific rendered image or font. (This corresponds to the mathematical notion of a set, but the term character set is avoided here because it has been used in the past to mean both what is here called a repertoire and what is here called a coded character set.) Character repertoires are specified independently of coding, and their characters are identified only by a unique character label, a graphic symbol, and a character description. As an example, table 13-1 shows the character labels, graphic symbols, and character descriptions for all of the characters in the repertoire standard-char except for #\Space and #\Newline.
Every Common Lisp implementation must support the standard character repertoire as well as repertoires named base-character, extended-character, and character. Other repertoires may be supported as well. X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) to specify that names of repertoires may be used as type specifiers. Such types must be subtypes of character; that is, in a given implementation the repertoire named character must encompass all the character objects supported by that implementation.
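As a sketch of repertoire names used as type specifiers, the following uses the type names of ANSI Common Lisp, which adopted base-char and extended-char in place of the base-character and extended-character names used in the vote above; standard-char and character are common to both.

```lisp
;; Repertoire names act as type specifiers.
(typep #\a 'standard-char)            ; true: #\a is in the standard repertoire
(typep #\a 'character)                ; true: every character object is of type CHARACTER
(subtypep 'standard-char 'character)  ; T, T: the repertoire is a subtype of CHARACTER
(subtypep 'base-char 'character)      ; T, T
```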
A coded character set is a character repertoire plus an encoding that provides a bijective mapping between each character in the set and a number (typically a non-negative integer) that serves as the character representation. There are numerous internationally standardized coded character sets.
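Within a single implementation, char-code and code-char play the role of such an encoding: one maps a character to a non-negative integer and the other maps it back. The sketch below checks that round trip for some standard characters; the helper name code-round-trip-p is hypothetical, and the integer codes themselves are implementation-dependent, so they need not match any particular standardized coded character set.

```lisp
;; CHAR-CODE maps a character to an implementation-dependent integer code;
;; CODE-CHAR maps that code back to a character.
(defun code-round-trip-p (char)
  "True when CHAR survives the trip character -> code -> character."
  (eql char (code-char (char-code char))))

;; EVERY works on any sequence, so a string can be checked directly.
(every #'code-round-trip-p "Common Lisp")
```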
A character may be included in one or more character repertoires. Similarly, a character may be included in one or more coded character sets.
To ensure that each character is uniquely defined, we may use a universal registry of characters that incorporates a collection of distinguished repertoires called character scripts that form an exhaustive partition of all characters. That is, each character is included in exactly one character script. (Draft ISO 10646 Coded Character Set Standard, if eventually approved as a standard, may become the practical realization of this universal registry.)
(X3J13 voted in June 1989 (MORE-CHARACTER-PROPOSAL) to specify that an implementation must document the character scripts it supports. For each script the documentation should discuss character labels, glyphs, and descriptions; any canonicalization processes performed by the reader that result in treating distinct characters as equivalent; any canonicalization performed by format in processing directives; the behavior of char-upcase, char-downcase, and the predicates alpha-char-p, upper-case-p, lower-case-p, both-case-p, graphic-char-p, alphanumericp, char-equal, char-not-equal, char-lessp, char-greaterp, char-not-greaterp, and char-not-lessp for characters in the script; and behavior with respect to input and output, including coded character sets and external coding schemes.)
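For orientation, here is how several of the functions listed above behave on characters of the standard repertoire; their behavior on characters of other scripts is exactly what an implementation is asked to document.

```lisp
;; Case conversion and case-related predicates on standard characters.
(char-upcase #\a)     ; #\A
(both-case-p #\a)     ; true: #\a has an uppercase counterpart
(alpha-char-p #\7)    ; false
(alphanumericp #\7)   ; true
;; CHAR-EQUAL ignores case differences; CHAR= does not.
(char-equal #\a #\A)  ; true
(char= #\a #\A)       ; false
```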
In Common Lisp a character data object is identified by its character code, a unique numerical code. Each character code is composed from a character script and a character label. The convention by which a character script and character label compose a character code is implementation dependent. [X3J13 did not approve all parts of the proposal from its Subcommittee on Characters. As a result, some features that were approved appear to have no purpose. X3J13 wished to support the standardization by ISO of character scripts and coded character sets but declined to design facilities for use in Common Lisp until there has been more progress by ISO in this area. The approval of the terminology for scripts and labels gives a hint to implementors of likely directions for Common Lisp in the future.]
A character object that is classified as graphic, or displayable, has an associated glyph. The glyph is the visual representation of the character. All other character data objects are classified as non-graphic.
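The predicate graphic-char-p tests this classification. A short sketch on standard characters follows; note that #\Space counts as graphic, while #\Newline is the one non-graphic standard character.

```lisp
;; GRAPHIC-CHAR-P separates graphic characters from non-graphic ones.
(graphic-char-p #\a)        ; true
(graphic-char-p #\Space)    ; true: Space is classified as graphic
(graphic-char-p #\Newline)  ; false
```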
This terminology assigns names to Common Lisp concepts
in a manner consistent with
related concepts discussed in various ISO standards for coded
character sets and provides a demarcation between standardization
activities. For example, facilities for manipulating characters,
character scripts, and coded character sets are properly defined
by a Common Lisp standard, but Common Lisp should not define
standard character sets or standard character scripts.