The Distributed Type Lattice

My system assumes a set of object types stored on various oracles. They can be arranged in a directed acyclic graph, where the edges represent subtype relations. (These graphs are often known as "type lattices", even when mathematical lattice properties need not hold.) Other types of graphs and edges can be imagined as well, but these notes will mainly focus on the subtype relation.

Conceptually, there is one global graph, though a particular oracle may only know parts of it (due to propagation delays, types that are not widely publicized, or selective interest).

Since the type graph is much more spread out (and less centrally controlled) than typical type graphs, we need the following:

A model of type relations that's as simple as feasible (so people know how to fit new types into the graph correctly without requiring specialized and complex knowledge)
Ways to avoid conflicts in type names (so two types don't get the same name)
Ways to allow the graph to be updated flexibly (minimize bottlenecks)
Ways to minimize the effects of bad changes to the graph (e.g. someone adds a "subtype" that isn't really a subtype)

The model is driven by pragmatic concerns. Formal elegance is desirable, to help the model scale well, but usefulness is the final arbiter.

It should be noted that the mutability of objects can play a large role in how a type model is designed. (For example, if objects are mutable, then history properties may need to be considered, as in [Wing-and-Liskov].) I'm going into this with the important assumption that objects are not mutable as far as the remote object invocation model is concerned. (You can, however, construct "new" objects based on old ones, and a particular reference may point to a different object, conceptually, over time. Reference shifting is the 'back door' to allowing things like files, directories, and search indexes to change. We'll see how far I can get with this model; I may have to revise it later.)

With that in mind, here's my conception of the type model:

Type definitions

All objects are instances of particular object types. The type of an object is explicitly noted; an object does not suddenly become another type unless it is explicitly converted. An object type is defined by its operations, attributes, and optional semantics. I assume you know what operations and attributes are; we can go into more detail later on as needed. Semantics can constrain the results of operations and the value of attributes. (For instance, in a "date" type, the integer value of the "month" attribute may be constrained to be in the range 1..12; and the value of this attribute may affect the legal range of the "day-of-month" attribute.)
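As a rough illustration of such a semantic constraint, the date example might be checked as follows. This is only a sketch: the function name check_date is hypothetical, and it assumes the Gregorian calendar as implemented by Python's standard calendar module.

```python
import calendar


def check_date(year: int, month: int, day_of_month: int) -> bool:
    """Return True if the attribute values satisfy the 'date' constraints."""
    # The month attribute is constrained to the range 1..12.
    if not 1 <= month <= 12:
        return False
    # The legal range of day-of-month depends on the month (and the year).
    # monthrange returns (weekday of first day, number of days in month).
    days_in_month = calendar.monthrange(year, month)[1]
    return 1 <= day_of_month <= days_in_month
```

A formally specified type could carry a check like this in its semantics; an informally specified one would just describe it in prose.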

Objects may also have encodings, but these are not defining characteristics of the type. (Reason: type definitions tell you the minimal information about an object, whereas encodings give you maximal information.)

Semantics are important to know for a type and its operations/attributes. In practice, sometimes they are expressed formally, sometimes informally. I plan to allow both, since in practice requiring everyone to adopt a single formal semantics language to express all their constraints will either make the cost of adding types prohibitive, or (more likely) make people simply omit semantic constraints that they intuitively know should be there, but can't express.

My system will allow semantics to be associated with a type, an operation, or an attribute. This information slot may contain a formal specification in some language like Larch or Z, or it may just be a piece of prose as short as "The attributes of this mail-header type conform to RFC 822."
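One possible shape for these information slots is sketched below. All class and field names (Semantics, Member, TypeDef, notation) and the type name "e:mail-header" are assumptions made for illustration; the notes do not fix a representation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Semantics:
    """Contents of a semantics slot: a formal spec or plain prose."""
    notation: str  # e.g. "Larch", "Z", or "prose"
    text: str


@dataclass
class Member:
    """An operation or attribute, with its own optional semantics slot."""
    name: str
    semantics: Optional[Semantics] = None


@dataclass
class TypeDef:
    """A type: operations, attributes, and optional type-level semantics."""
    name: str
    operations: Dict[str, Member] = field(default_factory=dict)
    attributes: Dict[str, Member] = field(default_factory=dict)
    semantics: Optional[Semantics] = None


# Hypothetical type name; the prose is the example quoted above.
mail_header = TypeDef(
    name="e:mail-header",
    semantics=Semantics(
        "prose",
        "The attributes of this mail-header type conform to RFC 822.",
    ),
)
```

Because the slot is just a notation tag plus text, an oracle can store it without having to interpret it.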

Subtypes

Types may be related to other types via a subtype relationship. An object satisfying the definition of a type also satisfies the definition of the type's supertypes. Specifically, all operations and attributes of the supertype are supported by the subtype, and any explicit semantics given to the supertype also hold for the subtype. (Since semantics can be informally expressed, the subtyping relationship will not be automatically inferred for any two types; rather, someone must explicitly express such a relationship.) A particular operation's implementation may be overridden in subtyping, but the semantic guarantees (such as they are) should be preserved. They may, however, give additional semantic properties. (And if some behavior in the supertype is noted as "undefined", it may be defined in subtypes.)
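Since subtype relationships are declared rather than inferred, an oracle mainly needs to record the declared edges and follow them transitively. The sketch below assumes a hypothetical TypeGraph class and hypothetical "net:" type names; it also rejects declarations that would create a cycle, so the graph stays acyclic.

```python
class TypeGraph:
    """Explicitly declared subtype edges, kept acyclic."""

    def __init__(self):
        # type name -> set of directly declared supertypes
        self._supertypes = {}

    def is_subtype(self, candidate, ancestor):
        """True if candidate equals ancestor or reaches it via declared edges."""
        if candidate == ancestor:
            return True
        seen = set()
        stack = [candidate]
        while stack:
            current = stack.pop()
            for parent in self._supertypes.get(current, ()):
                if parent == ancestor:
                    return True
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return False

    def declare_subtype(self, subtype, supertype):
        """Record that someone has explicitly declared subtype <: supertype."""
        if self.is_subtype(supertype, subtype):
            raise ValueError("declaration would create a cycle")
        self._supertypes.setdefault(subtype, set()).add(supertype)
        self._supertypes.setdefault(supertype, set())


graph = TypeGraph()
graph.declare_subtype("net:novel", "net:book")  # hypothetical types
graph.declare_subtype("net:book", "e:obj")
```

Note that nothing here checks that the subtype really honors the supertype's semantics; that remains the declarer's responsibility.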

A type may have multiple supertypes. This raises the question of what happens if supertype A has operation foo, and supertype B also has a (completely different) operation foo. In this case, if operation foo is invoked, the caller must explicitly specify which supertype's foo operation is meant. (Abstractly, you can think of this information as always being required, but with single inheritance, or where no name conflicts occur, the supertype in question can be inferred automatically, and so does not need to be stated explicitly.)

The effect of attempting to invoke an operation defined in two supertypes without specifying which operation is meant is undefined. It may execute one of them, or raise an error/exception.
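The lookup rule can be sketched as below. The function resolve_operation and the exception AmbiguousOperation are hypothetical names. For the unqualified, conflicting case the sketch raises an error, which is only one of the behaviors the model permits, since the effect is undefined.

```python
class AmbiguousOperation(Exception):
    """Raised when an unqualified operation name matches several supertypes."""


def resolve_operation(op_name, supertypes, qualifier=None):
    """Pick the implementation of op_name.

    supertypes maps a supertype name to a dict of operation name -> callable.
    qualifier, if given, names the supertype whose operation is meant.
    """
    providers = [t for t, ops in supertypes.items() if op_name in ops]
    if qualifier is not None:
        if qualifier not in providers:
            raise KeyError(f"{qualifier} does not define {op_name}")
        return supertypes[qualifier][op_name]
    if not providers:
        raise KeyError(f"no supertype defines {op_name}")
    if len(providers) == 1:
        # No conflict: the supertype can be inferred automatically.
        return supertypes[providers[0]][op_name]
    raise AmbiguousOperation(f"{op_name} is defined by {sorted(providers)}")


supertypes = {
    "A": {"foo": lambda: "A.foo"},
    "B": {"foo": lambda: "B.foo", "bar": lambda: "B.bar"},
}
```

Here resolve_operation("bar", supertypes) succeeds without a qualifier, while "foo" needs qualifier="A" or qualifier="B".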

Other issues:

Strict substitutability implies contravariance of arguments (if we allow any variance at all), rather than the covariance of languages like Eiffel. This is probably okay, even though covariant behavior is sometimes expected, as long as we can handle "exceptional" conditions okay. Must think more about this.
How do exceptions fit into all this?

Naming of types

In order to deal with type names reasonably, type names include a schema designation, and then an identifier within that schema. (For example, the type "mime:text/plain" identifies the type called "text/plain" in the MIME schema.) Different schemas specify different controls over the namespace.
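Splitting a name into its two parts is straightforward. The sketch below assumes that the schema is everything before the first colon and that the identifier may itself contain further colons; the function name parse_type_name is hypothetical.

```python
def parse_type_name(type_name: str):
    """Split 'schema:identifier' into a (schema, identifier) tuple."""
    # partition splits at the first colon only, so the identifier may
    # contain colons or slashes of its own.
    schema, separator, identifier = type_name.partition(":")
    if not separator or not schema or not identifier:
        raise ValueError(f"not a schema-qualified type name: {type_name!r}")
    return schema, identifier
```

For example, parse_type_name("mime:text/plain") yields the schema "mime" and the identifier "text/plain".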

Some schemas are managed by a central authority or standards body. For example, I might manage the "e:" schema to cover essential types. Other standards bodies might have control over their own schemas (e.g. "mime:", "ietf:" (if ietf had standard types), etc.) Schemas are assigned by a centralized body (me, to start with).

Random types (those not blessed by a particular schema standards body) are identified by the "net:" schema. A type name in the "net:" schema includes the name or location of the type oracle where the type was first registered. It also includes a further string generated by that oracle. It is up to the oracle to ascertain that it does not generate the same string twice in its lifetime. (This requirement is similar to the Message-ID generation done by news-posting programs.)

For example, a type defined on the oracle at gs1.sp.cs.cmu.edu might be named something like this:

net:bookrecord-052694@gs1.sp.cs.cmu.edu

This identifies a type initially registered on the type oracle at GS1. The generated name in this case includes both a date code (to help ensure uniqueness) and a mnemonic string provided by the user. Since there were no other "bookrecord" types defined on that site on that date, the date code is sufficient to ensure uniqueness of the type, as long as the oracle remembers the names of types generated that day, and as long as no one else pretends to be gs1.
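A name generator along these lines might look like the sketch below. It assumes the date code is month, day, and two-digit year (which matches the example above), and it invents a numeric suffix for the case where the same mnemonic is requested twice on one day; neither detail is fixed by these notes. The class name NameGenerator is hypothetical.

```python
import datetime


class NameGenerator:
    """Generates 'net:' type names that are unique for one oracle."""

    def __init__(self, host: str):
        self.host = host
        self._issued = set()  # every local part this oracle has handed out

    def new_name(self, mnemonic: str, today: datetime.date) -> str:
        date_code = today.strftime("%m%d%y")  # assumed format: MMDDYY
        base = f"{mnemonic}-{date_code}"
        local = base
        serial = 1
        # Assumption: disambiguate repeats with a numeric suffix.
        while local in self._issued:
            serial += 1
            local = f"{base}.{serial}"
        self._issued.add(local)
        return f"net:{local}@{self.host}"


generator = NameGenerator("gs1.sp.cs.cmu.edu")
first = generator.new_name("bookrecord", datetime.date(1994, 5, 26))
second = generator.new_name("bookrecord", datetime.date(1994, 5, 26))
```

The uniqueness guarantee rests entirely on the oracle persisting its set of issued names; the sketch keeps it in memory only.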

A type name may identify at most one type. A type may, however, be known by more than one name. The type above, for instance, may be "blessed" by some schema maintainer (say LOC), and then also be known by the shorter name:

LOC:bookrecord

The appropriate aliasing can be stored by interested type oracles.
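An oracle could keep such aliases in a simple table that maps every known name to one canonical name, which enforces the rule that a name identifies at most one type. The class AliasTable and its method names are hypothetical.

```python
class AliasTable:
    """Maps each known type name to the canonical name of its type."""

    def __init__(self):
        self._canonical = {}

    def register(self, name: str) -> None:
        """Record a newly registered type under its original name."""
        if name in self._canonical:
            raise ValueError(f"{name} already identifies a type")
        self._canonical[name] = name

    def add_alias(self, alias: str, existing: str) -> None:
        """Let alias refer to the same type as an already known name."""
        if alias in self._canonical:
            raise ValueError(f"{alias} already identifies a type")
        self._canonical[alias] = self._canonical[existing]

    def resolve(self, name: str) -> str:
        return self._canonical[name]


aliases = AliasTable()
aliases.register("net:bookrecord-052694@gs1.sp.cs.cmu.edu")
aliases.add_alias("LOC:bookrecord", "net:bookrecord-052694@gs1.sp.cs.cmu.edu")
```

Both names now resolve to the same type, and any attempt to reuse "LOC:bookrecord" for a different type is rejected.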

Updating the type graph

The type graph is updated by registering a change or addition with an oracle. While many registrations will propagate globally, there will be cases where a definition is meant to be kept locally, either because it's a private type, or because it's still under development. Conceptually, these types can be considered as disconnected or hidden parts of the type graph.

How do changes in the type graph affect agents? Here are some possible changes:

Creating a new type
Creating a new type encoding
Creating a new type operation or attribute
Adding a subtype of an existing type
Adding a supertype of an existing type
Registering an agent for a conversion or operation
Deleting or downgrading registrations in the graph

Creating a new type in itself is harmless enough, as long as unique-naming conventions are followed. At worst, the type is internally inconsistent, but that only affects those programs dependent on the malformed type.

Creating a new type encoding is also harmless, provided that the encoding is correct. New encodings do not invalidate any existing encodings, nor do they create any obligations for supertypes and subtypes. Encoding name discipline can probably be more relaxed than type name discipline, since creating two encodings of the same type with the same name is highly unlikely, and can be fixed locally. Alternatively, the creator of a type can get authority over the namespace of its encodings.

Creating a new operation or attribute is a bit tricky if the type has already been defined. It is possible to change the definition of a type by doing so. This may invalidate existing objects or subtypes of that type (so that they no longer satisfy the definition). The latter invalidation, in particular, may have large repercussions on the type graph.

It is useful here to make a distinction between basic operations (and attributes) and derived operations. The values of derived operations and attributes can (in theory, at least) be calculated on the basis of the basic attributes and operations on an object, and no other object information. For example, a date object may have the basic attributes day-of-month, month, and year. A weekday attribute may safely be added as a derived attribute, since this can be calculated from the existing basic attributes (assuming the Gregorian calendar is used). So this does not change the basic definition of the type. Adding an "hour" attribute, however, would change the definition. So derived operations may be freely added to a type, but basic ones cannot. The type information will note which operations are basic and which are derived.
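The date example can be made concrete as follows. This is a sketch with hypothetical names; it assumes the proleptic Gregorian calendar used by Python's datetime module. The weekday is computed from the three basic attributes alone, so adding it does not change what a date object has to store.

```python
import datetime
from dataclasses import dataclass

WEEKDAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]


@dataclass(frozen=True)
class Date:
    # Basic attributes: these define the type.
    year: int
    month: int
    day_of_month: int

    @property
    def weekday(self) -> str:
        """Derived attribute: computed only from the basic attributes."""
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        index = datetime.date(self.year, self.month, self.day_of_month).weekday()
        return WEEKDAY_NAMES[index]
```

An "hour" attribute could not be written this way, since no function of day, month, and year yields it; that is what makes it basic rather than derived.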

Question: since the oracle won't be able to make the distinction on its own, do we need to worry about people trying to register basic operations as "derived"? If so, how do we handle this?

When defining a new type, someone may realize that another basic operation was really wanted for the type. The only solution in the schema so far is to register a new type. This may leave lots of mostly unused types littering the landscape. We may want to look into possibly marking type definitions as to their status. An "obsolete" type could, for instance, not be propagated further, or eventually expired, and additions of subtypes could be prohibited. And perhaps an "experimental" type could be allowed to have basic operations added, but subtypes could not be added except by the type owner; and the type would be understood to be "use-at-your-own-risk".

Adding a subtype will often be done at the same time as a type is created, since a type may be designed to be a subtype of an existing type. While it's possible for a subtype to be added that really doesn't follow the definition of a supertype, that's mainly the problem of the subtype owner. We might, however, want to restrict the addition of this relationship to the owner of the subtype.

Adding a supertype will similarly be done sometimes when one wants to exploit the commonalities between two existing types. (For example, one may find that there are two types that have been created for handling book records. A common supertype can then be created to help make use of the information provided by both types.) It is sufficient to ensure that the supertype does not make any guarantees which are not also made by the subtype. The subtype owner, as in the case above, seems to be the logical person to ensure this guarantee.
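The structural half of this check can be sketched with plain sets of member names. The function names and the book-record attribute names below are made up for illustration, and the sketch compares names only; whether the semantics also agree still has to be judged by a person.

```python
def common_members(*member_sets):
    """Members shared by every given type: candidates for a common supertype."""
    return set.intersection(*(set(members) for members in member_sets))


def supertype_is_safe(supertype_members, subtype_members):
    """True if the supertype promises nothing the subtype lacks (names only)."""
    return set(supertype_members) <= set(subtype_members)


# Two hypothetical, independently created book-record types.
record_a = {"title", "author", "isbn"}
record_b = {"title", "author", "call-number"}

shared = common_members(record_a, record_b)
```

Here shared contains "title" and "author", and a supertype with exactly those members is safe to place above both record types.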

Registering an agent is generally safe, provided that the agent does what's advertised. (If it doesn't, see below). An agent may wish to only advertise its services within a particular domain or area. Registration may also be a good time to record certain meta-data about the operation (e.g. who's permitted to use the agent, how much the operation costs, etc.)

Deleting or downgrading registrations. A stupid or uncooperative person may register an agent promising a service that it doesn't provide, or may add an encoding, operation, or subtype linkage that's bogus. The owner of a type has the right to delete inappropriate derived operations, encodings, agent registrations, or subtype-supertype links. In addition, the owner of an oracle may always decide to "forget" about a bad piece of information, or possibly remember it, but label it as "bad", so that people querying the oracle about it know to ignore it. Perhaps "bad" advisories may be propagated to other oracles, and incorporated (or not) at the discretion of the oracle maintainer.
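A minimal sketch of how an oracle might support "forget", "label as bad", and discretionary advisories is given below. The Oracle class, its method names, the status strings, and the accept_advisory policy hook are all assumptions; the notes leave the mechanism open.

```python
class Oracle:
    """Keeps registrations and lets the maintainer forget or flag them."""

    def __init__(self, accept_advisory=lambda source: False):
        self._status = {}  # registration key -> "ok" or "bad"
        # Policy hook: the maintainer decides which advisory sources to trust.
        self._accept_advisory = accept_advisory

    def register(self, key):
        self._status[key] = "ok"

    def forget(self, key):
        self._status.pop(key, None)

    def mark_bad(self, key):
        # Remember the registration, but warn anyone who asks about it.
        if key in self._status:
            self._status[key] = "bad"

    def receive_advisory(self, key, source):
        """Apply a 'bad' advisory from another oracle, if policy allows."""
        if self._accept_advisory(source):
            self.mark_bad(key)
            return True
        return False

    def query(self, key):
        """Return "ok", "bad", or None if the oracle knows nothing."""
        return self._status.get(key)


oracle = Oracle(accept_advisory=lambda source: source == "trusted-oracle")
oracle.register("agent-1")
oracle.register("agent-2")
oracle.receive_advisory("agent-1", "unknown-oracle")   # ignored by policy
oracle.receive_advisory("agent-2", "trusted-oracle")   # applied
```

Keeping a flagged record, rather than deleting it, means a bogus registration that propagates in again later can still be recognized.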

Meta-question: how careful should we be about allowing possible bad changes and insisting on authentication? I'd like to avoid unnecessary bottlenecks, and not have too high an overhead on "authenticating" oneself, as long as mistakes or vandalism can be easily corrected. As I mentioned at the beginning, simplicity is good.

Compare with netnews, where control messages are unauthenticated, but are either trusted, or manually forwarded to a news site maintainer for review. On the other hand, digital signatures are coming into use, but may be overkill if a type creator disappears or loses interest.

Of course, type information can be more important than newsgroups to many people. What we may want to do is (as with X) provide methods to ensure security, in ways different type or oracle maintainers might want, but not dictate a policy. In such a case, we would simply have to make sure we supported the basic authentication concerns above, and others that people might want.

Special types

A few kinds of type constructions are ubiquitous, such as sequences. For these, we introduce some special types that are not defined explicitly by any person, but that can be confected on the fly when needed.

Top type. Some attributes or type requests may be satisfied with any object. The top-level type e:obj is assumed to be a supertype of all types in the system. It defines no methods, and has no attributes or encodings. (Any "operations" on this type are, for now at least, presumed to be done directly through the inter-agent protocol.)

Sequences. Type names of the form c:seq:sometype are sequences of sometype. I will define standard methods for accessing elements in the sequence, and for encoding the sequence of elements. These types need not be registered, as the appropriate type definitions can be automatically generated by any type oracle on demand. I have also defined "c:oneof" types to deal with unions.
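Generating a sequence type on demand could be as simple as recognizing the name prefix. In the sketch below the function names and the dictionary layout are hypothetical, and the standard access and encoding methods are left out because they are not specified here.

```python
SEQ_PREFIX = "c:seq:"


def sequence_element_type(type_name: str):
    """Return the element type of a 'c:seq:' name, or None if it is not one."""
    if type_name.startswith(SEQ_PREFIX) and len(type_name) > len(SEQ_PREFIX):
        return type_name[len(SEQ_PREFIX):]
    return None


def generate_definition(type_name: str) -> dict:
    """Build a definition for an unregistered sequence type on the fly."""
    element = sequence_element_type(type_name)
    if element is None:
        raise LookupError(f"{type_name} is not a sequence type")
    return {
        "name": type_name,
        "element_type": element,
        # e:obj is a supertype of every type in the system.
        "supertypes": ["e:obj"],
    }


definition = generate_definition("c:seq:mime:text/plain")
```

Because the definition is derived entirely from the name, any oracle can produce the same result without prior registration.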

Question: Do we need any other special types? Other possibilities are arbitrary records, and "composite types" that have the characteristics of two or more types. I intend to keep the "type of reference" completely separate from the "type of thing being referenced".