String Extensions
The String extensions Library
Designed by the Gwydion Project
1. Introduction
String-extensions is a library of routines for working with characters and strings. String-extensions exports these modules:
- String-Conversions: various useful conversions involving strings.
- Character-type: a Dylanized version of the C library header ctype.h.
- String-hacking: miscellaneous functions and data structures that are useful when working with strings and characters.
- Substring-search: methods for searching for fixed substrings rather than general regular expressions.
2. The String-Conversions Module
The String-Conversions module consists of various useful conversions involving strings. They are:
string-to-integer(string, #key base) => integer [Function]
integer-to-string(integer, #key base) => string [Function]
digit-to-integer(character) => integer [Function]
integer-to-digit(integer) => character [Function]
The base: keyword defaults to 10 and gives the radix of the number system to convert from or to. Bases below 2 or above 36 are errors. When converting from a string, the string must describe a number exactly, with no excess characters. Digit-to-integer signals an error if the digit is not alphanumeric. Errors are signalled for all invalid input.
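To make these rules concrete, here is a small Python sketch of the four conversions, with Python names standing in for the hyphenated Dylan ones. It is not the library's implementation: the lowercase output digits and the handling of a leading "-" are assumptions, since the text above does not specify them.

```python
# Python sketch of the documented conversion rules (not the Dylan API).
# Assumptions: digits above 9 are the letters a-z (either case accepted on
# input, lowercase produced on output), and a single leading "-" marks a
# negative number.
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def check_base(base: int) -> None:
    # Bases below 2 or above 36 are errors.
    if not 2 <= base <= 36:
        raise ValueError(f"base {base} is outside 2..36")


def digit_to_integer(ch: str) -> int:
    # Non-alphanumeric characters signal an error.
    if len(ch) != 1 or not (ch.isascii() and ch.isalnum()):
        raise ValueError(f"{ch!r} is not an alphanumeric digit")
    return DIGITS.index(ch.lower())


def integer_to_digit(n: int) -> str:
    if not 0 <= n < len(DIGITS):
        raise ValueError(f"{n} has no single-digit representation")
    return DIGITS[n]


def string_to_integer(s: str, base: int = 10) -> int:
    check_base(base)
    negative = s.startswith("-")
    body = s[1:] if negative else s
    if not body:
        raise ValueError("no digits to convert")
    value = 0
    # Every remaining character must be a valid digit: no excess characters.
    for ch in body:
        d = digit_to_integer(ch)
        if d >= base:
            raise ValueError(f"{ch!r} is not a digit in base {base}")
        value = value * base + d
    return -value if negative else value


def integer_to_string(n: int, base: int = 10) -> str:
    check_base(base)
    if n == 0:
        return "0"
    digits = []
    m = abs(n)
    while m:
        m, r = divmod(m, base)  # peel off the least significant digit
        digits.append(integer_to_digit(r))
    return ("-" if n < 0 else "") + "".join(reversed(digits))
```

For example, string_to_integer("ff", base=16) gives 255, while string_to_integer("12 ") raises an error because of the trailing space.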
as(<string>, character) [G.F. Method]
Turns a character into the appropriate string of length one.
3. The Character-type Module
Character-type is a Dylanized version of the C library header ctype.h. It contains the following functions:
Function and argument type   Returns #t for these characters
alpha?(character)            a-zA-Z
alphabetic?(character)       a-zA-Z (same as alpha?)
digit?(character)            0-9
alphanumeric?(character)     a-zA-Z0-9
whitespace?(character)       Space, tab, newline, formfeed, carriage return
uppercase?(character)        A-Z
lowercase?(character)        a-z
hex-digit?(character)        0-9a-f
punctuation?(character)      ,./<>?;\:"|[]{}!@#$%^&*()-=_+`~
graphic?(character)          alphanumeric or punctuation
printable?(character)        graphic or whitespace
control?(character)          not printable
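A Python sketch of the table for ASCII characters follows. The is_ names stand in for the Dylan predicates, since Python identifiers cannot end in "?". It encodes the table literally (including hex digits 0-9a-f only) and shows how graphic?, printable?, and control? are composed from the simpler predicates.

```python
import string

# Character classes taken directly from the table above.
LETTERS = frozenset(string.ascii_letters)
DIGIT_CHARS = frozenset(string.digits)
UPPER = frozenset(string.ascii_uppercase)
LOWER = frozenset(string.ascii_lowercase)
HEX = frozenset("0123456789abcdef")  # exactly as listed in the table
WHITESPACE = frozenset(" \t\n\f\r")  # space, tab, newline, formfeed, return
PUNCTUATION = frozenset(',./<>?;\\:"|[]{}!@#$%^&*()-=_+`~')


def is_alpha(c: str) -> bool:
    return c in LETTERS


is_alphabetic = is_alpha  # same as alpha?


def is_digit(c: str) -> bool:
    return c in DIGIT_CHARS


def is_alphanumeric(c: str) -> bool:
    return is_alpha(c) or is_digit(c)


def is_whitespace(c: str) -> bool:
    return c in WHITESPACE


def is_uppercase(c: str) -> bool:
    return c in UPPER


def is_lowercase(c: str) -> bool:
    return c in LOWER


def is_hex_digit(c: str) -> bool:
    return c in HEX


def is_punctuation(c: str) -> bool:
    return c in PUNCTUATION


def is_graphic(c: str) -> bool:
    return is_alphanumeric(c) or is_punctuation(c)


def is_printable(c: str) -> bool:
    return is_graphic(c) or is_whitespace(c)


def is_control(c: str) -> bool:
    return not is_printable(c)
```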
4. The String-hacking Module
The String-hacking module exports miscellaneous functions and data structures that are useful when working with strings and characters.
add-last(stretchy-sequence, object) => stretchy-sequence [Generic Function]
add-last(string, character) => string [G.F. Method]
Like add, except that it is guaranteed to add the character to the end of the string.
predecessor(character) => character [Function]
Get the character before this character. Equivalent to
as(<character>, -1 + as(<integer>, character))
successor(character) => character [Function]
Get the character after this character. Equivalent to
as(<character>, 1 + as(<integer>, character))
case-insensitive-equal(object1, object2) [Generic Function]
case-insensitive-equal(string1, string2) [G.F. Method]
case-insensitive-equal(character1, character2) [G.F. Method]
Performs a case-insensitive equality test. Methods are provided only for strings and characters, not for general collections.
<character-set> [Sealed Abstract Class]
<case-sensitive-character-set> [Class]
<case-insensitive-character-set> [Class]
A <character-set> is a non-mutable subclass of <collection>, and is conceptually an unordered set of characters. Dylan collection elements always have keys, so to fit sets into Dylan, the key of an element of a character set is the element itself. There are two instantiable subclasses of <character-set>, <case-sensitive-character-set> and <case-insensitive-character-set>. <character-set> is not instantiable; one must always specify one of the instantiable subclasses when creating a character set.
There are two ways of making a character set. The first is a method for make using the description: keyword. The value that follows the description: keyword is a string that describes the set using a notation like a regular expression character set, except without the [ and ] delimiters. For example,
make(<case-sensitive-character-set>, description: "a-z")
would be the set of all lowercase alphabetic characters.
A second way to create character sets is to use an as method. The as method takes a collection of characters and discards their keys, keeping only the characters. Example:
as(<case-insensitive-character-set>,
"abcdefghijklmnopqrstuvwxyz")
contains the same letters; because this set is case-insensitive, member? also answers #t for A through Z. It is important to realize that the as method does not take a description:
as(<case-sensitive-character-set>, "a-z")
returns the set of a, -, and z, not the set of all lowercase alphabetic characters.
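The difference between the two construction paths can be sketched in Python. The parser below is a hypothetical, minimal reading of the description notation that handles only literal characters and x-y ranges (no negation or escapes), while the as-style constructor takes every character literally.

```python
def character_set_from_description(description: str) -> frozenset:
    """Hypothetical parser for the description: notation.

    Handles only literal characters and x-y ranges; negation and escapes
    are not modeled.
    """
    chars = set()
    i = 0
    while i < len(description):
        if i + 2 < len(description) and description[i + 1] == "-":
            low, high = description[i], description[i + 2]
            chars.update(chr(code) for code in range(ord(low), ord(high) + 1))
            i += 3
        else:
            # A literal character, including a leading or trailing "-".
            chars.add(description[i])
            i += 1
    return frozenset(chars)


def character_set_from_collection(chars) -> frozenset:
    # Models as(): every element is taken literally and keys are discarded.
    return frozenset(chars)
```

With these definitions, the description "a-z" yields all 26 lowercase letters, while converting the string "a-z" yields only a, -, and z.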
The most useful operation on character sets is member?, which tests whether a character belongs to the set. Another useful operation is the forward-iteration-protocol, which steps through every possible character, calling member? on each, and visits those that are members of the set. This means that iterating over a <case-insensitive-character-set> visits both a and A.
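A Python sketch of membership and iteration follows. The class name CharacterSet is hypothetical, and the sketch assumes that "every possible character" means the 256 byte characters; case-insensitive sets are modeled by lowercasing both stored and queried characters.

```python
class CharacterSet:
    """Sketch of an immutable character set with optional case folding.

    Assumption: iteration tries the 256 byte characters (codes 0-255);
    the real library may enumerate characters differently.
    """

    def __init__(self, chars, case_sensitive: bool = True):
        self.case_sensitive = case_sensitive
        self._members = frozenset(self._fold(c) for c in chars)

    def _fold(self, ch: str) -> str:
        return ch if self.case_sensitive else ch.lower()

    def member(self, ch: str) -> bool:
        return self._fold(ch) in self._members

    __contains__ = member

    def __iter__(self):
        # Like the described iteration protocol: ask member? about every
        # possible character and yield the ones that belong to the set.
        for code in range(256):
            ch = chr(code)
            if self.member(ch):
                yield ch
```

Because membership is tested rather than stored, a case-insensitive set built from "ab" iterates as A, B, a, b.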
<byte-character-table> [Class]
A byte-character-table is a vector that uses byte characters as indices instead of integers. The following are equivalent:
regular-vector[as(<integer>, character)]
byte-character-table[character]
<byte-character-table> has absolutely no relation to <table>. It is simply a <mutable-explicit-key-collection>.
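The indexing equivalence can be sketched in Python as a 256-slot list addressed through ord(), which plays the role of as(<integer>, character). The ByteCharacterTable name and its fill argument are hypothetical.

```python
class ByteCharacterTable:
    """Sketch: a 256-slot vector indexed by byte characters, not integers."""

    SIZE = 256

    def __init__(self, fill=None):
        # Hypothetical fill argument: initial value for every slot.
        self._slots = [fill] * self.SIZE

    def _index(self, ch: str) -> int:
        code = ord(ch)  # corresponds to as(<integer>, character)
        if code >= self.SIZE:
            raise KeyError(f"{ch!r} is not a byte character")
        return code

    def __getitem__(self, ch: str):
        return self._slots[self._index(ch)]

    def __setitem__(self, ch: str, value) -> None:
        self._slots[self._index(ch)] = value
```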
5. The Substring-search Module
Substring-search contains methods for searching for fixed substrings. It is as similar to the regular-expression module as we could make it. (See the document The Regular Expressions Library for details about regular expressions, and about the "make-fooer" convention. However, note that while the "make-fooer" convention is obsolete for regular expression functions, it is not obsolete for substring searching.) Substring functions work only on byte strings, and are always case sensitive.
substring-position [Generic Function]
(big-string, search-for-string, #key start, end)
=> position-or-false
Returns the position of the search-for-string in the big-string (or that portion of the big-string specified by start: and end:). This search is always case sensitive.
This function uses the Boyer-Moore algorithm for long strings and a simple brute-force search for short strings, so it should perform well in all cases.
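The following Python sketch shows the two-strategy search. It substitutes the Boyer-Moore-Horspool variant (bad-character shifts only) for full Boyer-Moore, and the 4-character cutoff between the brute-force and Horspool paths is a hypothetical value; None plays the role of #f.

```python
SHORT_PATTERN_LENGTH = 4  # hypothetical cutoff; the library's real threshold is not documented


def substring_position(big, pattern, start=0, end=None):
    """Return the index of the first occurrence of pattern in big[start:end], or None.

    Case-sensitive. Short patterns use a brute-force scan; longer ones use
    Boyer-Moore-Horspool, a simplification of full Boyer-Moore.
    """
    if end is None:
        end = len(big)
    m = len(pattern)
    if m == 0:
        return start  # assumption: an empty pattern matches at start
    if m < SHORT_PATTERN_LENGTH:
        for i in range(start, end - m + 1):
            if big[i:i + m] == pattern:
                return i
        return None
    # Shift distance for each character of the pattern except the last.
    shift = {ch: m - 1 - k for k, ch in enumerate(pattern[:-1])}
    i = start
    while i <= end - m:
        j = m - 1
        while j >= 0 and big[i + j] == pattern[j]:
            j -= 1  # compare right to left
        if j < 0:
            return i
        # Skip ahead based on the text character under the pattern's last slot.
        i += shift.get(big[i + m - 1], m)
    return None
```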
make-substring-positioner [Generic Function]
(search-for-string) => an anonymous positioner
method (big-string, #key start, end) => position-or-false
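A positioner presumably lets the search string be preprocessed once and then reused across many searches. The Python closure below sketches that idea, again using the Horspool variant as a stand-in for the library's own search; None plays the role of #f.

```python
def make_substring_positioner(pattern):
    """Preprocess pattern once and return a reusable search function."""
    m = len(pattern)
    # Horspool shift table, built once per pattern.
    shift = {ch: m - 1 - k for k, ch in enumerate(pattern[:-1])}

    def positioner(big, start=0, end=None):
        if end is None:
            end = len(big)
        if m == 0:
            return start  # assumption: an empty pattern matches at start
        i = start
        while i <= end - m:
            j = m - 1
            while j >= 0 and big[i + j] == pattern[j]:
                j -= 1
            if j < 0:
                return i
            i += shift.get(big[i + m - 1], m)
        return None

    return positioner
```

Usage: find_needle = make_substring_positioner("needle") can then be applied to any number of big strings without rebuilding the shift table.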
substring-replace [Generic Function]
(big-string, search-for-string, replace-with-string, #key count, start, end)
=> replaced-string
Replaces every occurrence of search-for-string with replace-with-string, or only the first count occurrences if count: is specified. Note that although start: and end: appear in the signature, this function does not support them.
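A Python sketch of the replacement loop follows. It assumes matches are found left to right and do not overlap, and it rejects an empty search string; neither behavior is specified above. As the text notes, start and end are not modeled.

```python
def substring_replace(big, search_for, replace_with, count=None):
    """Replace occurrences of search_for, or only the first count of them."""
    if not search_for:
        raise ValueError("search string must be non-empty")  # assumption
    pieces = []
    pos = 0
    replaced = 0
    while count is None or replaced < count:
        hit = big.find(search_for, pos)
        if hit < 0:
            break
        pieces.append(big[pos:hit])  # text before the match
        pieces.append(replace_with)
        pos = hit + len(search_for)  # resume after the match: no overlaps
        replaced += 1
    pieces.append(big[pos:])
    return "".join(pieces)
```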
make-substring-replacer [Generic Function]
(search-for :: <byte-string>, #key replace-with)
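=> an anonymous replacer function, which is either
method (big-string, #key count, start, end) => new-string
or
method (big-string, replace-with-string, #key count, start, end) => new-string
The first form is returned when replace-with: is supplied; otherwise the replacement string must be passed on each call.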
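A Python sketch of the two replacer shapes follows. It delegates to Python's str.replace, which replaces every occurrence or only the first count of them, rather than reproducing the library's own search; start and end are omitted.

```python
def make_substring_replacer(search_for, replace_with=None):
    """Return a replacer whose shape depends on whether replace_with is given."""

    def replace(big, replacement, count):
        # str.replace treats -1 as "replace every occurrence".
        return big.replace(search_for, replacement, -1 if count is None else count)

    if replace_with is not None:
        # Replacement fixed in advance: the caller supplies only the big string.
        def replacer(big, count=None):
            return replace(big, replace_with, count)
    else:
        # No replacement given: the caller passes it on every call.
        def replacer(big, replace_with_string, count=None):
            return replace(big, replace_with_string, count)
    return replacer
```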
Copyright 1994, 1995, 1996, 1997 Carnegie Mellon University. All rights reserved.
Send comments and bug reports to gwydion-bugs@cs.cmu.edu