Model reconstruction determines the truth values of the force-dynamic relations
on a frame-by-frame basis in the input movie.
Intervals of constant truth value for a given force-dynamic relation are taken
to be primitive event occurrences.
LEONARD uses event logic to infer compound event occurrences from
primitive event occurrences.
For example, for the image sequence in Figure 1(a), model
reconstruction determines that the green block supports the red block from
Frame 0 to Frame 13 and that the hand is attached to the red block from
Frame 13 to Frame 20.
This will be denoted as and
, i.e. that the primitive event types
and
occurred
during the intervals [0,13) and [13,20) respectively.
The compound event type
might be defined as
i.e. followed by
.
(In the above, I use `;' informally as a sequence operator.
The precise definition of `;' will be given momentarily.)
The task of the event-logic inference procedure is to infer
, i.e. that the compound event
type
occurred during the interval
[0,20).
Event logic provides a calculus for forming compound event types as
expressions over primitive event types.
The syntax and semantics of event logic will be described momentarily.
Event-logic expressions denote event types, not event occurrences.
As such, they do not have truth values.
Rather, they are predicates that describe the truth conditions that must hold
of an interval for an event to occur.
In contrast, an event-occurrence formula does have a truth value.
If is an event-logic expression that denotes a primitive or compound
event type, and
is an interval, then
is an atomic
event-occurrence formula that is true if and only if the truth conditions for
the event type
hold of the interval
.
denotes coincidental occurrence, the fact that an occurrence
of
started at the beginning of
and finished at the end of
.
would not hold if an occurrence of
did not precisely coincide
with
, but instead overlapped,
partially or totally, with
.
Event types have internal temporal structure that render this distinction
important.
In the case of primitive event types, that structure is simple.
Each primitive event type is derived from a predicate.
If
is a predicate, then
denotes the primitive event
type derived from
.
A primitive event type
holds of an interval if the
corresponding predicate
holds of every instant in that
interval.
This means that
and
might
have different truth values.
For example, if
is true of every instant in [0,2) and false of every
other instant, then
is true while
is false.
Event logic takes coincidental occurrence to be a primitive notion.
As will be demonstrated below, overlapping occurrence is a derived notion that
can be expressed in terms of coincidental occurrence using compound
event-logic expressions.
Two auxiliary notions are needed to define the syntax and semantics of event
logic.
First, there are thirteen possible relations between two intervals.
Following [4], I denote these relations as =, <, >,
,
,
,
,
,
,
,
,
, and
and refer to them
collectively as Allen relations throughout the paper.
The names
,
,
,
, and
are
mnemonics for meet, overlap, start, finish, and
during respectively.
The inverse relations, such as
, whose names end in
are the same as the corresponding relations, such as
, whose names do
not end in
except that the arguments are reversed.
Figure 8 depicts all thirteen Allen relations graphically.
Second, I define the span of two intervals
and
, denoted
, as the smallest super-interval of both
and
.
Figure 8: The Allen relations, the thirteen possible relations between two
intervals.
The syntax of event logic is defined as follows.
We are given finite disjoint sets of constant symbols
along with a finite set of primitive event-type symbols, each of a specified
arity.
Constant symbols, such as and
, denote objects in the input
movie while primitive event-type symbols, such as
,
denote parameterized primitive event types.
An atomic event-logic expression is a primitive event-type symbol of arity n
applied to a sequence of n constants.
For example,
.
An event-logic expression is either an atomic event-logic expression or
one of the compound event-logic expressions
,
,
, or
,
where
and
are event-logic expressions
and
.
Informally, the semantics of compound event-logic expressions is defined as follows:
Formally, the truth of an atomic event-occurrence formula is
defined relative to a model.
Let I be the set of all intervals and O be the set of all objects in
the movie.
A model M is a map from primitive event-type symbols of arity n to subsets
of .
(M can be viewed as either a model or as a movie.)
M thus maps primitive event-type symbols to relations that take an interval
as their first argument, in addition to the remaining object parameters.
The semantics of event logic is formally defined by specifying an entailment
relation
as follows:
Figure 9 shows the primitive event types currently
used by LEONARD.
The definitions in Figure 9 are formulated in terms of
the predicates ,
,
, and
that
are produced by model reconstruction as described in
Section 2.
An object is supported if it is not grounded.
Two objects contact if their polygons touch and they are on the same layer.
Two polygons p and q touch, denoted
, if they intersect
and their intersection has zero area.
Two objects are attached if there is a rigid or revolute joint that joins
them.
Determining whether an object x supports an object y is a little more
complex and requires counterfactual reasoning.
Let
be the set of polygons in a scene,
be the
most-preferred model of the scene as produced by model reconstruction,
and
be true if the scene
is stable under
an interpretation I.
Stability analysis, i.e. the
predicate, is a subroutine used by the
model-reconstruction component.
An object x supports an object y if y is not grounded in the
most-preferred model I and a variant of
with x removed is
not stable under a variant of I where all objects except for y and those
rigidly attached, directly or indirectly, to y are grounded.
In Figure 9,
denotes
the variant of
with x removed,
denotes
the fact that x is rigidly attached to y,
denotes
the reflexive transitive closure of the
relation,
denotes the set of objects that
are rigidly attached, directly or indirectly, to y, and
denotes a variant of I where all objects except for y and those
rigidly attached, directly or indirectly, to y are grounded.
Figure 9: Definition of the primitive event types used by LEONARD.
Figure 10 shows the compound event-type definitions
currently used by LEONARD.
denotes an event type where x picks y up off of z.
It is specified as a sequence of three intervals, where x is not attached
to and does not support y in the first interval but is attached to and does
support y in the third interval.
Additionally, z supports y in the first interval but does not support y
in the third interval.
Furthermore, several conditions must hold in both the first and third
intervals: x must be unsupported, y must not support either x or z,
x and z must not support each other, and y must not be attached to z.
During the second interval, intermediate between the first and third
intervals, either x is attached to y or y is attached
to z.
Additionally, several conditions must hold throughout the entire event: x,
y, and z must be distinct and y must be supported.
denotes an event type where x puts y down on z.
It is specified in a fashion that is similar to
but where
the three subevents occur in reverse order.
denotes an event type where w puts x down on y
which is resting on z.
It is specified as
, where z supports but is not
attached to y and z is distinct from w, x, and y.
denotes an event type where w picks x up off
of y which is resting on z.
It is specified as
, where z supports but is not
attached to y and z is distinct from w, x, and y.
denotes an event type where w picks x up off of y and
puts it down on z which is distinct from y.
denotes an event type where w first puts y down
on z then sometime later stacks x on top of y.
Finally,
denotes an event type where w first
unstacks x from on top of y (which is resting on z) and then sometime
later picks y up off of z.
Figure 1 shows sample movies depicting occurrences of the event
types
and
.
Figures 11 through 15 show sample movies
depicting occurrences of the event types
,
,
,
, and
respectively.
Figure 10: The lexicon of compound event types used by LEONARD.
Figure 11: An image sequence depicting a stack event.
Figure 12: An image sequence depicting an unstack event.
Figure 13: An image sequence depicting a move event.
Figure 14: An image sequence depicting an assemble event.
Figure 15: An image sequence depicting a disassemble event.
Nominally, all atomic event-logic expressions are primitive event types.
However, we allow giving a name to a compound event-logic expression and using
this name in another event-logic expression as short hand for the named
expression with appropriate parameter substitution.
This is simply a macro-expansion process and, as such, no recursion is allowed.
This feature is used in Figure 10 to define
,
, and
in terms of
;
,
, and
in terms of
;
in terms of
, which is itself defined in terms of
; and
in terms of
, which is itself defined in terms of
.
The overall goal of the event-classification component is to infer all
occurrences of a given set of compound event types from a given set of
primitive event occurrences.
The model-reconstruction component combined with the primitive event-type
definitions given in Figure 9 produces a set of
primitive event occurrences for a given scene sequence.
Figure 10 lists parameterized compound event types.
These are instantiated for all tuples of objects in the scene sequence to yield
ground compound event-logic expressions.
The event-classification component infers all occurrences of these compound
event types that follow from the set of primitive event occurrences.
Let us define to be
.
The model-reconstruction component combined with the primitive event-type
definitions given in Figure 9 produces M.
Instantiating the parameterized compound event types from
Figure 10 for all object tuples yields a set of
event-logic expressions.
The event-classification component computes
for every
in
this set.
In principle, could by implemented as a straightforward
application of the formal semantics for event logic as specified above.
There is a difficulty in doing so, however.
The primitive event types have the property that they are liquid.
Liquid events have the following two properties.
First, if they are true during an interval
, then they are also true during
any subinterval of
.
Second, if they are true during two overlapping intervals
and
, then
they are also true during
and any subinterval of
.
For example, if an object is supported during [1,10], then it also is
supported during [2,5], [3,8], and all other subintervals of [1,10].
Similarly, if an object is supported during [1,5] and [4,10], then it also
is supported during [1,10] and all of its subintervals.
[48] introduced the notion of liquidity and [68],
[22], [69], and [30] have observed that
many event types have this property.
Because the primitive event types are liquid, they will hold over an infinite
number of subintervals.
This renders the formal semantics inappropriate for a computational
implementation.
Even if one limits oneself to intervals with integral endpoints, the primitive
event types will hold over quadratically many subintervals of the scene
sequence.
Furthermore, a straightforward computational implementation of the formal
semantics would be inefficient, because it requires quantifying over
subintervals to implement
and quantifying over pairs of
subintervals to implement
.
The central result of this paper is a novel representation, called
spanning intervals, that allows an efficient representation of the
infinite sets of subintervals over which liquid event types hold along with an
efficient inference procedure that operates on that representation.
This representation, and the inference procedure that implements
,
are presented in the next section.