TOP: Typed Object Protocol

Version 0.2

August 30, 1996

Status of this document

This document specifies a preliminary version of Typed Object Protocol, used by information agents in the computation model of my thesis. This version is currently implemented by the server at tom.cs.cmu.edu as of April 1996.

The definition of TOP prior to version 1.0 is subject to change without notice. Pre-1.0 versions need not be forward- or backward-compatible with any other versions.

Summary of the protocol

TOP is used by agents in information systems to retrieve, convert, and perform various operations on typed objects. In TOP, a client contacts a server using a stream connection (such as TCP), and issues a series of requests. After receiving a request, the server sends a reply. A given information agent may act as both a client and a server, and may talk to multiple clients and servers simultaneously.

TOP servers, unlike HTTP servers, do not normally terminate a session after sending a reply, unless the client requests to end a session. However, TOP is not designed to be used directly by end users, nor are clients expected to keep connections open indefinitely. Therefore, a server may terminate a TOP session if it does not receive a request within 60 seconds of the last reply. The server may also terminate a connection if a client fails to complete a request within a certain time period. This time period should not be less than 60 seconds, and should probably be significantly longer. Of course, agents should assume that network and machine failures may cut off a connection at any time.

Requests consist of one or more lines, depending on the kind of request given. The first line of any request starts with the name of the request followed by a space or a line termination. The server is always expected to give some sort of response to the first line of the request; this may either be a final reply or a reply indicating that the client should continue. All lines are terminated with the carriage return character (CR) followed by the line-feed character (LF). Delimited VALUE blocks (q.v.) may, however, contain arbitrary data, including carriage returns, line-feeds, and nulls, between the delimiters. Outside of delimited VALUE blocks, all characters transmitted should use the ASCII character set.

In the descriptions below, a "word" consists of non-whitespace printable ASCII characters. A word is delimited by whitespace, or by the beginning or end of a line. Material in <angle brackets> indicates required parameters. Material in [square brackets] indicates optional parameters. If no space exists between the brackets, the parameter is required to be a single word.

Responses to requests begin with a 3-digit code, followed by a space. In general, codes starting with 2 indicate an ordinary response. Codes starting with 3 indicate that the client is expected to provide more information to complete the request. Codes starting with 4 or 5 indicate an error or exception; 4 implies that the client is responsible for the exception, 5 that the server is responsible. (The actual assignment of responsibility is sometimes a judgement call.)

Unless otherwise noted, responses are a single line.
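
As an illustration of the reply-code classes described above, the following Python sketch splits a reply line into its code, a coarse category, and the trailing text. The function name parse_reply and the category labels are assumptions made for this example; they are not defined by TOP.

```python
def parse_reply(line: str) -> tuple[int, str, str]:
    """Split a TOP reply line into (code, category, text).

    The category labels are illustrative names, not protocol terms.
    """
    line = line.rstrip("\r\n")
    code_str, _, text = line.partition(" ")
    if len(code_str) != 3 or not (code_str.isascii() and code_str.isdigit()):
        raise ValueError(f"malformed reply line: {line!r}")
    categories = {
        "2": "ok",            # ordinary response
        "3": "continue",      # client must supply more information
        "4": "client-error",  # client is held responsible
        "5": "server-error",  # server is held responsible
    }
    return int(code_str), categories.get(code_str[0], "unknown"), text
```

A client might call parse_reply on the first line of every response and only read a following data description block when the category is "ok" and the request type calls for one.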

Version 0.2 of TOP defines the following requests:

PROTO: Used to negotiate protocol
NOOP: Used for synchronization
QUIT: Used to terminate a session
OPER: Used to carry out a remote operation
ATTR: Used to fetch an object attribute
CNVT: Used to convert an object to an alternate type or encoding
TYPQ: Used to get information about a type
TYPL: Used to list types that have been registered or changed recently
REGI: Used to register new type information
AUTH: Used to request an authorization

Request descriptions

PROTO: Protocol negotiation

The PROTO request is used to ensure that the client and the server agree on a protocol variation to use. It is expected that all versions of TOP will support this request.

The client sends a request specifying the protocol it wishes to use. The server then decides what protocol it will use to talk to the client and sends back a reply specifying that protocol. This may be the protocol that the client requested, or it may be another protocol.

Request format:

PROTO <protocol specification>

Reply format:

201 <protocol specification>

If the PROTO request results in a change of protocol, that change takes place for client messages following the PROTO line, and for server messages following the reply line.

The protocol specification consists of one or more words on a line. The first word indicates a particular protocol version. That version may define an interpretation of any remaining words. TOP version 0.2 uses the following protocol specification, with no additional words defined:

TOP/0.2

PROTO may be used to change protocol family altogether if the client and server are willing. (For instance, a particular TOP server might also function as an HTTP server.) A TOP server, however, should not switch to a non-TOP protocol unless the client specifically requests it.
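
The negotiation described above could be handled on the server side roughly as follows. This is a minimal Python sketch under stated assumptions: the name proto_reply and the fallback policy (answer with the server's first supported protocol when the requested one is unknown) are choices made for this example, since the specification leaves the server's decision open.

```python
SUPPORTED_PROTOCOLS = ("TOP/0.2",)  # assumption: this server speaks only TOP/0.2


def proto_reply(request_line: str, supported=SUPPORTED_PROTOCOLS) -> str:
    """Build the 201 reply for a PROTO request line."""
    words = request_line.split()
    if len(words) < 2 or words[0] != "PROTO":
        raise ValueError("not a PROTO request")
    requested = words[1]  # first word of the protocol specification
    # Honor the client's choice when possible; otherwise fall back
    # to a protocol this server does support (hypothetical policy).
    chosen = requested if requested in supported else supported[0]
    return "201 " + chosen + "\r\n"
```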

NOOP: Synchronization

The NOOP command may be used to ensure that the client and server remain in sync with each other. It may also be used to keep a session open, in cases where the server might otherwise time out between commands.

Request format:

NOOP [optional extra text]

Reply format:

200 [optional extra text]

If the client supplies additional text after the NOOP, the server must echo back exactly that text after the reply code. Otherwise, the server may supply whatever text it likes, or none at all.
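
The echo rule above can be sketched in Python as follows. The function name noop_reply and its default_text parameter are hypothetical; the sketch treats a bare "NOOP" (or "NOOP" followed only by a space) as carrying no text.

```python
def noop_reply(request_line: str, default_text: str = "") -> str:
    """Build the 200 reply for a NOOP request line."""
    line = request_line
    if line.endswith("\r\n"):
        line = line[:-2]
    if line == "NOOP":
        text = ""
    elif line.startswith("NOOP "):
        text = line[len("NOOP "):]  # everything after the first space
    else:
        raise ValueError("not a NOOP request")
    # Client text must be echoed exactly; otherwise the server may
    # send any text it likes (here, default_text).
    return "200 " + (text if text else default_text) + "\r\n"
```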

QUIT: Terminates a session

The QUIT command is used to terminate a session. Clients should, if possible, issue this command before dropping a connection.

Request format:

QUIT

Reply format:

205 [optional extra text]

After the server sends the reply line, it drops the connection with the client.

OPER: Carries out a remote object operation

The OPER command is a request for the result of a method (object operation) on an object. This is a multi-line command, and the format is more complex than the earlier commands.

The format of the request begins with the initial line:

OPER <opname> [optional-typename]

where opname is the name of the operation desired, and typename is the name of the type that defines the operation. If the typename is omitted on this line, it will be inferred from the object supplied later. Clients may wish to supply the typename at this point to perform an operation defined in the supertype, or to give the server a chance to see if the named operation is known before sending any more data.

After the server receives the initial line, it sends back:

300 [optional text]

to indicate that it is ready to receive the rest of the request, or a reply code between 400 and 599 to indicate that it will not carry out the request, and that no more of the request should be sent.

If the server indicates that it will accept the request, the remainder of the request is then sent. This consists of the following (in any order), followed by a single line saying "END".

An OBJ block (unless the operation is a class operation)
An optional FMT line.
Zero or more EXPECT lines.
Zero or more ARG blocks.

The OBJ block consists of a line with the single word "OBJ", followed by a data description block specifying the object of this operation. Data description blocks are described below.

An ARG block consists of a line with the single word "ARG", followed by a data description block specifying an argument. The first ARG block specifies the initial argument of the operation, the second the next argument, and so on.

An EXPECT line's first word is "EXPECT". Its second word is a type name, and the remaining words are encoding names. This line indicates that the client will accept a result value with the indicated type, encoded by the indicated encodings. Encoding names are optional, but will be applied strictly left to right if present, with no extra intervening encodings. If the type of the last encoding (or the base type, if no encodings are given) is not self-encoding, the server may apply whatever additional encodings it wishes, to represent the value in a self-encoding type (q.v.).

If multiple EXPECT lines exist, earlier EXPECT lines are considered higher priority than later EXPECT lines. Thus, a client can indicate an order of preference for result types and encodings.

EXPECT lines are strictly advisory. The server may return its result based on one of them if it wishes, or it can ignore them.
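
One way a server might honor the priority order of EXPECT lines is sketched below in Python. The names parse_expect and choose_expectation, and the idea of describing the server's capabilities as a set of (type, encodings) pairs, are assumptions for illustration only. Because EXPECT lines are advisory, a real server is free to ignore the outcome.

```python
def parse_expect(line: str) -> tuple[str, tuple[str, ...]]:
    """Split an EXPECT line into (type name, encoding names)."""
    words = line.split()
    if len(words) < 2 or words[0] != "EXPECT":
        raise ValueError("not an EXPECT line")
    return words[1], tuple(words[2:])


def choose_expectation(expect_lines, supported):
    """Return the first (highest-priority) expectation the server supports.

    `supported` is a hypothetical set of (type, encodings) pairs.
    Returns None when no EXPECT line can be satisfied.
    """
    for line in expect_lines:
        wanted = parse_expect(line)
        if wanted in supported:
            return wanted
    return None
```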

A FMT line consists of the word "FMT" followed by one of "VALUE", "REF", or "NOVAL". This is an advisory line, indicating whether the client wants to receive the actual result value of the operation, just a reference to the result, or no value or reference at all. This can be useful when the result of an operation may be large. [Note: This line may become more complex in later versions of the protocol.]
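
The two stages of an OPER request (initial line, then the remainder once the server has answered 300) could be assembled as in the following Python sketch. The function names and parameters are hypothetical, and each data description block is assumed to be supplied as a ready-made list of lines. The sketch emits the parts in one fixed order, which is permitted because the specification allows any order before "END".

```python
CRLF = "\r\n"


def oper_first_line(opname: str, typename: str = "") -> str:
    """Initial line of an OPER request; typename is optional."""
    words = ["OPER", opname] + ([typename] if typename else [])
    return " ".join(words) + CRLF


def oper_body(obj=None, args=(), expects=(), fmt=None) -> str:
    """Remainder of an OPER request, sent after a 300 reply.

    obj and each item of args are lists of data description lines;
    each item of expects is a tuple (type name, encoding names...).
    """
    lines = []
    if obj is not None:  # omitted for class operations
        lines.append("OBJ")
        lines.extend(obj)
    if fmt is not None:
        if fmt not in ("VALUE", "REF", "NOVAL"):
            raise ValueError("FMT must be VALUE, REF, or NOVAL")
        lines.append("FMT " + fmt)
    for type_name, *encodings in expects:  # earlier lines have priority
        lines.append(" ".join(["EXPECT", type_name, *encodings]))
    for arg in args:  # ARG blocks are positional
        lines.append("ARG")
        lines.extend(arg)
    lines.append("END")
    return CRLF.join(lines) + CRLF
```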

Data description blocks. A data description block consists of the following:

A TYPE line.
Zero or more ENC lines (which must appear after the TYPE line)
An optional META block.
A REF block, a VALUE block, or a line saying "NOVAL". Only one of these may appear, and it appears at the end of the data description block.

A TYPE line consists of the word "TYPE" followed by the type name.

An ENC line consists of the word "ENC" followed by a type name, followed by an encoding name. The ordering of ENC lines is significant; the data in the data description block is of the type given in the TYPE line, encoded as the type given in the first ENC line by the method given in the first ENC line, further encoded in successive ENC lines. If a literal value is being given in this data description block, the last encoding must be in a self-encoding type. The byte-sequence type ("e:byteseq") and its subtypes are self-encoding.

A META block consists of a line saying "META", followed by a data description block describing the meta-data for the object.

A REF block consists of a line saying "REF", followed by a data description block describing the reference. The reference can later be resolved, according to the semantics of the reference type, to yield the value of the specified object.

A VALUE block may take one of two forms. If the encoded value of the object being passed in this data description block consists entirely of non-whitespace printable ASCII characters, and is no more than 200 bytes long, a VALUE block can consist of a single line with the word "VALUE" followed by the encoding of the value of this data description block.

Alternatively, a VALUE block consists of a line with the single word "VALUE", followed on subsequent lines by a delimited block of data. The first character following the end of the VALUE line is the delimiter. The delimiter is usually the " character, but can be another character. The data block ends with an unquoted occurrence of the delimiter character. Within the delimited data block, the following quotation rules apply:

\d or \" substitutes for the delimiter character within the data block.
\q or \\ substitutes for the \ character within the data block.

Note that using \d and \q instead of \" and \\ will cause recursive applications of the substitution to grow linearly rather than exponentially (as long as the delimiter character is not 'd' or 'q').

The delimited data block is followed by a CR-LF.
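
The quotation rules above can be illustrated with a short Python sketch. The names quote_block, unquote_block, and value_block are hypothetical. The sketch always writes \d and \q when quoting, accepts all four escape forms when unquoting, and, as a simplifying assumption, sends an empty value in delimited form rather than as a bare "VALUE " line.

```python
def quote_block(data: bytes, delim: bytes = b'"') -> bytes:
    """Return delimiter + quoted data + delimiter + CR-LF."""
    if len(delim) != 1 or delim in (b"\\", b"d", b"q"):
        raise ValueError("unsuitable delimiter")
    # Escape backslashes first so the backslashes added for the
    # delimiter are not escaped a second time.
    body = data.replace(b"\\", b"\\q").replace(delim, b"\\d")
    return delim + body + delim + b"\r\n"


def unquote_block(block: bytes) -> bytes:
    """Recover the data from a block that starts with its delimiter."""
    delim = block[:1]
    if not delim or delim == b"\\":
        raise ValueError("missing or unsuitable delimiter")
    out = bytearray()
    i = 1
    while i < len(block):
        ch = block[i:i + 1]
        if ch == b"\\":
            nxt = block[i + 1:i + 2]
            if nxt in (b"d", b'"'):
                out += delim
            elif nxt in (b"q", b"\\"):
                out += b"\\"
            else:
                raise ValueError("bad escape sequence")
            i += 2
        elif ch == delim:  # unquoted delimiter ends the block
            return bytes(out)
        else:
            out += ch
            i += 1
    raise ValueError("unterminated block")


def value_block(data: bytes) -> bytes:
    """Pick the single-line form when the value qualifies for it."""
    printable = all(0x21 <= b <= 0x7E for b in data)
    if 0 < len(data) <= 200 and printable:
        return b"VALUE " + data + b"\r\n"
    return b"VALUE\r\n" + quote_block(data)
```

Because quoting with \d and \q never introduces a new delimiter or a doubled backslash, a value that is quoted repeatedly (for example, a VALUE block nested inside another value) grows by a fixed amount per escaped character at each level.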

Final reply to OPER request. If the operation is carried out successfully, the server replies in this format:

200 [optional extra text]

and follows that line with a data description block for the result. If the operation is unsuccessful, the server will return a single line with a reply code from 301 to 599, and an optional string further explaining the reason for failure. The following codes are currently defined:

301: Use a different server. Following on the same line will be one or more parenthesized tuples, each indicating the location of a server. The tuples consist of the hostname of the server followed by a space, followed by the portname of the server. Material outside the parentheses is ignorable, and parentheses cannot be nested. Multiple parenthesized tuples indicate that the client can choose among multiple servers to carry out the request (with preferred servers coming earlier).
400: Bad request, or syntax error
401: Unknown type
402: Invalid inputs to operation (precondition failure)
404: Object not found.
410: You are not permitted to do this request.
500: Generic server-side error.
501: Request unsupported here.
502: Object operation not implemented here.
503: System resources exhausted.

Other codes may be defined later.

There is currently no exception model defined, other than the use of these return codes. The '402' message can be used for precondition exceptions.
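
A client that receives a 301 reply needs to extract the parenthesized server tuples from the rest of the line. The Python sketch below shows one way to do so; the function name parse_redirect is hypothetical, and the sketch silently skips any parenthesized group that does not contain exactly two words, which is a choice made here rather than something the specification requires.

```python
import re


def parse_redirect(text: str) -> list[tuple[str, str]]:
    """Extract (hostname, portname) tuples from the text of a 301 reply.

    Material outside the parentheses is ignored. The result keeps the
    order of the reply, so preferred servers come first.
    """
    servers = []
    for inner in re.findall(r"\(([^()]*)\)", text):
        parts = inner.split()
        if len(parts) == 2:
            servers.append((parts[0], parts[1]))
    return servers
```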

ATTR: Fetches an object attribute

The ATTR request returns a particular attribute of an object. The formats of the request and reply are the same as for OPER, except that no ARG blocks are used.

CNVT: Gets an equivalent of an object in a new type or encoding

CNVT requests an object corresponding to the supplied object, but in a format compatible with one of the EXPECT lines supplied. The format of the first line is:

CNVT [optional-conversion-methodname] [optional-original-typename]

The format of the rest of the request, and of the server replies, is the same as for OPER, except for the following:

No ARG blocks are used.
At least one EXPECT line is required. A successful conversion must conform to one of these EXPECT lines.
An optional TOP line may be included (before the end of the request). This consists of the word "TOP" followed by a type name. This means that the original object and the converted object must be indistinguishable from the perspective of the type given in the TOP line.

If a conversion method name is given, that method must be used for the conversion. Otherwise, the server can choose any method, or sequence of methods, satisfying the request. If the server cannot satisfy the request, it returns the error code 502 (object operation not found).

A server need not support CNVT at all (unless it is a type broker). If CNVT is not supported, the error code 501 (request not supported here) is returned after the first line of the request. (If the request is okay, the code 300 is returned after the first line, as with OPER.)

TYPQ: Gets information about a type

The TYPQ request returns an object giving information about a type. The format of the request is:

TYPQ <typename>

The format of the reply to a successful request is

200 [optional text here]

followed by a data description block for an object describing the type. Type descriptions are described in more detail elsewhere.

A server does not need to support TYPQ at all, unless it's a type broker. It should return a response code of 501 if TYPQ is not supported, and 404 (Object not found) if it cannot find an object describing the requested type.

TYPL: Lists types that have been registered or changed recently

The TYPL request returns an object giving a list of all types, or of recently changed types. The format of the request is either:

TYPL

TYPL <date> <time> [GMT]

TYPL by itself returns a list of all the types registered at the server. With a date and time code, it returns a list of all the types whose descriptions have changed since the date and time given.

The date code is sent as 8 digits in the format YYYYMMDD, where YYYY is the year, MM is the two digits of the month (01 = January .. 12 = December) and DD is the day of the month (with leading zero, if appropriate).

The time code is sent as 6 digits in the format HHMMSS with HH being hours on the 24-hour clock, MM minutes 00-59, and SS seconds 00-59. The time is assumed to be in the server's timezone unless the token "GMT" appears, in which case both time and date are evaluated at the 0 meridian.
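
The date and time codes above map directly onto standard formatting routines. The following Python sketch builds a TYPL request line; the name typl_request is hypothetical, and the sketch always converts to GMT and appends the "GMT" token so that the client does not need to know the server's timezone.

```python
from datetime import datetime, timezone


def typl_request(since=None) -> str:
    """Build a TYPL request line.

    With no argument, asks for all registered types. Otherwise
    `since` must be a timezone-aware datetime.
    """
    if since is None:
        return "TYPL\r\n"
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    utc = since.astimezone(timezone.utc)
    # YYYYMMDD for the date code, HHMMSS for the time code.
    return "TYPL " + utc.strftime("%Y%m%d %H%M%S") + " GMT\r\n"
```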

If the operation is carried out successfully, the server replies in this format:

200 [optional extra text]

and follows that line with a data description block for the result. If the operation is unsuccessful, the server will return a single line with a reply code from 400 to 599. If the operation is successful, but no types meet the criterion, the operation will return a data description block with an empty-string value.

The encoding used in the data description block should consist of a sequence of type names separated by CR-LF. There may also be a CR-LF at the end of the value, but this is not necessary. Types known under multiple names may be listed under one or more of those names.
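
Once the VALUE block of a TYPL result has been unquoted, the list encoding described above can be decoded with a few lines of Python. The function name parse_type_list is hypothetical.

```python
def parse_type_list(value: bytes) -> list[str]:
    """Split a TYPL result value into type names.

    Names are separated by CR-LF; a trailing CR-LF is optional, and
    an empty value means that no types met the criterion.
    """
    text = value.decode("ascii")
    return [name for name in text.split("\r\n") if name]
```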

REGI: Registers information about a type

The REGI request registers information about a type. Only type brokers support this command. REGI can be used to register new types, aliases, type attributes or operations, encodings, or agents that can work with a type.

The REGI command gives no guarantees that the type broker will update its type database in any way, nor does it make any guarantees about when the database will be updated. Some kinds of registry information may be automatically processed by brokers; others may be handled in batches, or with human intervention.

The format of the request begins with the initial line:

REGI <kind> <typename> [additional arguments]

where kind indicates the kind of registration: one of type, alias, supertype, attribute, operation, encoding, or agent. The typename argument indicates the name of the type for which this information is being registered. The number and nature of the additional arguments depends on the kind of request.

The alias and supertype registration requests consist of a single line. For the other requests described here, the server will send back a reply code between 400 and 599 if it rejects the request and does not wish to receive the rest. Otherwise, it will send back

300 [optional text]

after receiving the initial line. The client should then complete the request with an appropriate data description block.

The formats of the different registration requests are as follows:

REGI alias <typename> <alias>

This is a single-line request for alias to be an alternate name for an existing type with the name typename.

REGI supertype <typename> <supertype>

This is a single-line request to register the type named supertype as a supertype for the type named typename. Both types must already have been registered. The two types must satisfy the supertype relation as defined in the TOP type model (described elsewhere).

REGI type <typename>

This is the beginning of a request to register a new type named typename. If the server indicates that the request should continue, the client sends a data description block for the type description object. (This object conforms to the type description for the "s:typedefn-0.2" type.)

REGI attribute <typename> <attributename>

This is the beginning of a request to register a new type attribute named attributename on the type named typename. If the server indicates that the request should continue, the client sends a data description block for the type description object.

Because the basic definition of types cannot change once a type is registered, the attribute must be derivable from already existing attributes and operations. A later document will explain this in more detail.

REGI operation <typename> <opname>

This is the beginning of a request to register a new type operation named opname on the type named typename. If the server indicates that the request should continue, the client sends a data description block for the type description object.

Because the basic definition of types cannot change once a type is registered, the operation must be derivable from already existing attributes and operations. A later document will explain this in more detail.

REGI encoding <typename> <encname>

This is the beginning of a request to register a new encoding named encname on the type named typename. If the server indicates that the request should continue, the client sends a data description block for the type description object. This block includes both the name for the encoding and the name for the type in which the encoding is represented.
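
The initial lines of the registration requests described above could be built as in this Python sketch. The function name regi_first_line and its return convention (the line, plus a flag saying whether a data description block is expected to follow a 300 reply) are assumptions for illustration. The "agent" kind is accepted because it is listed among the kinds, although its arguments are not yet specified.

```python
CRLF = "\r\n"
KINDS = ("type", "alias", "supertype", "attribute",
         "operation", "encoding", "agent")
SINGLE_LINE_KINDS = ("alias", "supertype")


def regi_first_line(kind: str, typename: str, *extra: str):
    """Return (initial REGI line, whether more of the request follows)."""
    if kind not in KINDS:
        raise ValueError("unknown registration kind: " + kind)
    words = ["REGI", kind, typename, *extra]
    for word in words:
        if not word or any(ch.isspace() for ch in word):
            raise ValueError("each argument must be a single word")
    # alias and supertype requests are complete after one line;
    # the others continue with a data description block.
    expects_more = kind not in SINGLE_LINE_KINDS
    return " ".join(words) + CRLF, expects_more
```

For example, regi_first_line("alias", "e:int", "e:integer") yields a complete single-line request, where "e:integer" is a made-up alias used only for this illustration.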

Further details on other kinds of registration requests (agent, conversion) are forthcoming.

AUTH: Gives authorization

The description of this command is forthcoming.

An example session

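The following transcript shows the protocol being put through its paces. In the original formatting, text in italics is issued by the client, text in courier comes from the server, and boldface indicates commentary. In this plain-text rendering, lines that begin with a three-digit reply code, together with the data description blocks that follow them, come from the server; the remaining lines are issued by the client; commentary appears in [square brackets].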

PROTO TOP/0.2
201 TOP/0.2
NOOP Hello there!
200 Hello there!

TYPQ e:int
200 Type description object follows.
TYPE net:typename-060394@gs1.sp.cs.cmu.edu
ENC e:text oracle-protocol
VALUE
"NAME e:int
SUPER e:obj
[semantics and some operations omitted]
OPER e:int plus {
ARG e:int arg
SEM
\dReturns the sum of the supplied object and the argument.\d
}

ENC e:text ascii-rep [One encoding defined for ints]
" [End of the VALUE block for the type description object]

OPER plus
300 Send object and parameters.
OBJ
TYPE e:int
ENC e:text ascii-rep
VALUE 9
ARG
TYPE e:int
ENC e:text ascii-rep
VALUE
"87"
END
200 Value follows.
TYPE e:int
ENC e:byteseq ascii-rep
VALUE 96

[The next section involves an attempt to uncompress a file in the server's local filesystem. The file is compressed, and (when uncompressed) ends with a carriage return-newline. In practice, most servers would not allow general access to their filesystems like this.]
CNVT
300 Send object and parameters.
OBJ
TYPE e:text
ENC e:byteseq unix-compress
REF
TYPE s:url
ENC e:text standard
VALUE http://www.cs.cmu.edu/~spok/foo.txt.Z
EXPECT e:text
END
200 Value follows.
TYPE e:byteseq
VALUE
"These are the times that try to trick the transcripts.
"

QUIT
205 Nice talking to you.
[The server terminates the connection.]

Summary of reply codes

200: Okay.
201: Remainder of this line is the protocol I'm using.
202: Authorization request recorded.
205: Goodbye.
300: Please continue.
301: Use a different server.
310: Authorization request challenged.
400: Bad request.
401: Unknown type.
402: Invalid inputs to operation (precondition failure)
404: Object not found.
410: Request forbidden.
500: Unspecified server error.
501: Request not supported here.
502: Object operation not found.
503: System resources exhausted.


spok@cs.cmu.edu (John Ockerbloom)