VSAM Testbed System

We have built a VSAM testbed system to demonstrate how automated video understanding technology enables a single human operator to monitor a wide area. The testbed consists of multiple sensors distributed across the campus of CMU, tied to a control room located in the Planetary Robotics Building (PRB). At its core is a central operator control unit (OCU) that receives video and Ethernet data from multiple remote sensor processing units (SPUs). The OCU integrates the symbolic object trajectory information accumulated by each of the SPUs with a 3D geometric site model, and presents the results to the user on a map-based graphical user interface (GUI). Each logical component of the testbed architecture is described briefly below.

Sensor Processing Units (SPUs)

The SPU acts as an intelligent filter between a camera and the VSAM network. Its function is to analyze incoming video streams for the presence of significant entities or events, and to transmit the results symbolically to the OCU. This arrangement allows many different sensor modalities to be seamlessly integrated into the system. Furthermore, performing as much video processing as possible on the SPU reduces the bandwidth requirements of the VSAM network: full video signals do not need to be transmitted, only the symbolic data extracted from them.
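
To make the bandwidth savings concrete, the sketch below shows the kind of compact symbolic track report an SPU might send in place of video. The field names and byte layout are hypothetical illustrations, not the actual VSAM packet format.

    # Hypothetical symbolic track report: a few dozen bytes per observation,
    # versus megabits per second for the video it summarizes.
    from dataclasses import dataclass
    import struct, time

    @dataclass
    class TrackReport:
        sensor_id: int      # which SPU produced this report
        object_id: int      # track label, stable across frames
        object_class: int   # e.g. 0 = unknown, 1 = human, 2 = vehicle
        lat: float          # geolocated object position
        lon: float
        timestamp: float

        def pack(self) -> bytes:
            return struct.pack("!IIIddd", self.sensor_id, self.object_id,
                               self.object_class, self.lat, self.lon,
                               self.timestamp)

    packet = TrackReport(3, 117, 2, 40.4433, -79.9436, time.time()).pack()
    # len(packet) == 36 bytes on the wire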
 

Logically, each SPU combines a camera with a local computer that processes the incoming video. Many types of sensors and SPUs have been incorporated into the VSAM IFD testbed system: a) a color camera with active pan, tilt and zoom control; b) a thermal sensor; c) a relocatable van; and d) an airborne sensor. In addition, two sensors from other groups have been successfully integrated: e) a Columbia-Lehigh omnicamera; and f) a Texas Instruments indoor activity monitoring system. By using a pre-specified communication protocol, these systems were able to interface directly with the VSAM network.
 
 
A variety of SPUs have been incorporated into the VSAM IFD testbed system.

The relocatable van and airborne SPUs warrant further discussion. The relocatable van SPU consists of a sensor and pan-tilt head mounted on a small tripod that can be placed on the vehicle roof when the van is stationary. All video processing is performed on board the vehicle, and results from object detection and tracking are assembled into symbolic data packets and transmitted back to the operator control workstation over a radio Ethernet connection. The major research issue in demonstrating the relocatable van is rapid calibration of sensor pose after each redeployment, so that object detection and tracking results can be integrated into the VSAM network (via computation of geolocation) for display at the operator control console.
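
As a minimal sketch of one piece of that calibration problem, suppose the van's position is known (e.g., from GPS) and a single surveyed landmark is visible: the pan-tilt head's heading offset can then be recovered from one sighting. The coordinate conventions and names below are our own illustration, not the testbed's actual procedure.

    import math

    def heading_offset(sensor_xy, landmark_xy, measured_pan_deg):
        """True bearing to a surveyed landmark, minus the pan angle the
        head reports, gives a correction for all later pan readings."""
        dx = landmark_xy[0] - sensor_xy[0]   # east offset, metres
        dy = landmark_xy[1] - sensor_xy[1]   # north offset, metres
        true_bearing = math.degrees(math.atan2(dx, dy))  # 0 deg = north
        return (true_bearing - measured_pan_deg) % 360.0

    # Van parked at a known spot, landmark surveyed beforehand:
    offset = heading_offset((100.0, 250.0), (180.0, 400.0), measured_pan_deg=12.5)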

The airborne sensor and computation packages are mounted on  a Britten-Norman Islander twin-engine aircraft operated by the U.S. Army Night Vision and Electronic Sensors Directorate.  The Islander is equipped with a FLIR Systems Ultra-3000 turret that has two degrees of freedom (pan/tilt), a Global Positioning System (GPS) for measuring position, and an Attitude Heading Reference System (AHRS) for measuring orientation.   The continual self-motion of the aircraft introduces challenging video understanding issues.  For this reason, video processing is performed using the Pyramid Vision Technologies PVT-200, a specially designed video processing engine.
 
NVESD's Islander aircraft provides an airborne SPU platform.
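
The PVT-200's internals are not described here; purely as a generic illustration of ego-motion compensation, frame-to-frame camera motion can be estimated from tracked features and removed before change detection, for example with OpenCV (our sketch, not the testbed's implementation):

    import cv2

    def stabilized_difference(prev_gray, curr_gray):
        # Track sparse features from the previous frame into the current one.
        pts_prev = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                           qualityLevel=0.01, minDistance=10)
        pts_curr, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                       pts_prev, None)
        good = status.ravel() == 1
        # Fit a global affine model to the displacements (camera motion).
        M, _ = cv2.estimateAffinePartial2D(pts_prev[good], pts_curr[good])
        # Warp the previous frame into the current frame's coordinates so
        # residual differences reflect scene motion, not aircraft motion.
        h, w = curr_gray.shape
        warped = cv2.warpAffine(prev_gray, M, (w, h))
        return cv2.absdiff(curr_gray, warped)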

Operator Control Unit (OCU)

The VSAM OCU accepts video processing results from each of the SPUs and integrates this information with a site model and a database of known objects to infer activities of interest to the user. The results are sent to the GUI and other visualization tools as system output. One key piece of system functionality provided by the OCU is sensor arbitration. Care must be taken to ensure that an outdoor surveillance system does not underutilize its limited sensor assets: sensors must be allocated to surveillance tasks so that all user-specified tasks get performed and, if enough sensors are present, multiple sensors are assigned to track important objects. The system performs a greedy optimization over a tasking cost function to determine the combination of SPU taskings that best meets overall system performance requirements.
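
The tasking cost function itself is not detailed here; as a hedged sketch of the greedy strategy, the loop below repeatedly commits the cheapest remaining (sensor, task) pairing. The function names and the shape of cost() are our own illustrative assumptions.

    # Illustrative greedy arbitration: cost(sensor, task) is assumed to
    # fold in task priority, predicted visibility, and slew time.
    def greedy_assign(sensors, tasks, cost):
        """Repeatedly commit the cheapest remaining (sensor, task) pair."""
        assignments = {}
        free, pending = set(sensors), set(tasks)
        while free and pending:
            s, t = min(((s, t) for s in free for t in pending),
                       key=lambda pair: cost(*pair))
            assignments[s] = t
            free.remove(s)
            pending.remove(t)
        # A second pass could hand any surplus sensors to high-priority
        # tasks, putting multiple sensors on important objects.
        return assignments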

The OCU also contains a site model representing VSAM-relevant information about the area being monitored. This includes both geometric and photometric information about the scene, represented using a combination of image and symbolic data.   The OCU uses the site model to support a) object geolocation via intersection of viewing rays with the terrain, b) visibility analysis (predicting what portions of the scene are visible from what sensors) so that sensors can be efficiently tasked, and c) specification of the geometric location and extent of relevant scene features. For example, we might directly task a sensor to monitor the door of a building, or to look for vehicles passing through a particular intersection.
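
As an illustrative sketch of capability (a), a viewing ray can be marched across the terrain model until it first passes below the ground surface; the step size and the elevation_at() query below are hypothetical stand-ins for the site model's actual interface.

    # Sketch of geolocation by ray/terrain intersection.
    def geolocate(sensor_pos, ray_dir, elevation_at, step=1.0, max_range=2000.0):
        """March along the viewing ray until it drops below the terrain."""
        x, y, z = sensor_pos          # sensor location in site coordinates
        dx, dy, dz = ray_dir          # unit vector from calibration + pan/tilt
        t = step
        while t < max_range:
            px, py, pz = x + t * dx, y + t * dy, z + t * dz
            if pz <= elevation_at(px, py):
                return (px, py, elevation_at(px, py))   # first ground hit
            t += step
        return None                   # no intersection within range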
 

Graphical User Interface (GUI)

One of the technical goals of the VSAM IFD effort is to demonstrate that a single human operator can effectively monitor a significant area of interest. Keeping track of multiple people, vehicles, and their interactions within a complex urban environment is a difficult task. The operator clearly should not be watching dozens of screens of raw video: that amount of sensory overload virtually guarantees that information will be missed, and it requires a prohibitive amount of transmission bandwidth. Our approach is to provide an interactive GUI that uses VSAM technology to automatically place dynamic agents representing people and vehicles into a synthetic view of the environment. This has the benefit that visualization of scene events is no longer tied to the original resolution and viewpoint of a single video sensor. The GUI currently consists of a map of the area overlaid with all object locations, sensor platform locations, and sensor fields of view. In addition, a low-bandwidth, compressed video stream from one of the sensors can be selected for real-time display. The GUI is also used for sensor suite tasking: through this interface, the operator can task individual sensor units, as well as the entire testbed sensor suite, to perform surveillance operations such as generating a quick summary of all object activities in the area.
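
As a minimal illustration of the map overlay, placing a geolocated object icon reduces to a fixed transform from site coordinates to map pixels; the calibration parameters below are hypothetical, not the testbed's actual map calibration.

    # Hypothetical north-up map calibration: origin of the map in site
    # coordinates (metres) plus a uniform ground resolution.
    def world_to_map(x, y, map_origin, metres_per_pixel, map_height_px):
        """Convert site coordinates (metres) to map pixel coordinates."""
        col = (x - map_origin[0]) / metres_per_pixel
        row = map_height_px - (y - map_origin[1]) / metres_per_pixel
        return int(col), int(row)   # y grows north, rows grow downward

    icon_px = world_to_map(312.4, 1180.9, (0.0, 0.0), 0.5, 2048)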
 

 

Communication

The nominal architecture for the VSAM network allows multiple OCUs to be linked together, each controlling multiple SPUs. Each OCU supports exactly one GUI, through which all user-related command and control information is passed. Data dissemination is not limited to a single user interface, however: system output is also accessible through a series of visualization (VIS) nodes.

There are two independent communication protocols and packet structures supported in this architecture: the Carnegie Mellon University Packet Architecture (CMUPA) and the Distributed Interactive Simulation (DIS) protocols. The CMUPA is designed to be a low-bandwidth, highly flexible architecture in which relevant VSAM information can be compactly packaged without redundant overhead. All communication between SPUs, OCUs and GUIs is CMUPA compatible. The CMUPA protocol specification document and reference code are available for download.
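
The actual CMUPA layout is defined in that specification; purely as an illustration of packaging "without redundant overhead", the sketch below uses a one-byte header bitmask so that absent sections cost nothing on the wire. The section names and byte layout are our own assumptions, not CMUPA's.

    # Hypothetical low-overhead packet: a version byte, a bitmask naming
    # which optional sections follow, then length-prefixed section bodies.
    import struct

    SECTION_COMMAND, SECTION_TARGET, SECTION_EVENT = 0x1, 0x2, 0x4

    def build_packet(sections):
        """sections: dict mapping section flag -> already-encoded bytes."""
        mask, body = 0, b""
        for flag, payload in sorted(sections.items()):
            mask |= flag
            body += struct.pack("!H", len(payload)) + payload
        return struct.pack("!BB", 0x01, mask) + body

    pkt = build_packet({SECTION_TARGET: b"track 117", SECTION_EVENT: b"loiter"})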

VIS nodes are designed to distribute the output of the VSAM network to where it is needed. They provide symbolic representations of detected activities overlaid on maps or imagery. Information flow to VIS nodes is unidirectional, originating from an OCU. All of this communication uses the DIS protocol, developed and used by the Distributed Simulation community. An important benefit of keeping VIS nodes DIS compatible is that it allows us to easily interface with synthetic environment visualization tools such as ModSAF and ModStealth. See the section on VSAM visualization.
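
DIS is a published standard (IEEE 1278) whose Entity State PDU carries an entity identifier along with world position, velocity and orientation. The snippet below is a deliberately simplified stand-in for that idea, not the wire-exact PDU encoding; the multicast address is hypothetical.

    # Simplified entity-state broadcast in the spirit of DIS (not the
    # exact IEEE 1278 PDU layout): entity id triplet, position, velocity.
    import socket, struct

    def send_entity_state(sock, addr, site, app, entity, position, velocity):
        pdu = struct.pack("!HHHdddddd", site, app, entity,
                          *position, *velocity)
        sock.sendto(pdu, addr)   # VIS nodes listen on a known UDP port

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    send_entity_state(sock, ("239.1.2.3", 3000), 1, 1, 42,
                      (594980.0, -4856123.0, 4078260.0), (1.5, 0.0, 0.0))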
 

Current Testbed Infrastructure

As of Fall 1999, the VSAM IFD testbed system on the campus of Carnegie Mellon University consists of 14 cameras distributed throughout campus. All cameras are connected to the VSAM Operator Control Room in the Planetary Robotics Building (PRB): ten via fiber optic lines, three on PRB wired directly to the SPU computers, and one portable Small Unit Operations (SUO) unit connected via wireless Ethernet to the VSAM OCU. The work done for VSAM 99 concentrated on increasing the density of sensors in the Wean/PRB area. The overlapping fields of view in this area of campus enable us to conduct experiments in wide-baseline stereo, object fusion, sensor cuing and sensor handoff.
 
The backbone of the CMU campus VSAM system consists of six Sony EVI-370 color zoom cameras installed on PRB, Smith Hall, Newell-Simon Hall, Wean Hall, Roberts Hall, and Porter Hall, all with active pan, tilt and zoom control. Five of these units are mounted on Directed Perception pan/tilt heads; the sixth is on a Sagebrush Technologies pan/tilt head. Two stationary fixed-FOV color cameras mounted on the peak of PRB facilitate work on activity analysis, classification, and sensor cuing. Three stationary fixed-FOV monochrome cameras mounted on the roof of Wean Hall are connected to the Operator Control Room over a single multimode fiber using a video multiplexor. A Raytheon NightSight PalmIR thermal (FLIR) sensor can also be mounted on Wean Hall. A portable sensor unit was built to allow further software development and research at CMU in support of the DARPA Small Unit Operations (SUO) program; it consists of the same hardware as the SPUs delivered to Fort Benning, Georgia in 1999.

The Operator Control Room in PRB houses the SPU, OCU, GUI and development workstations -- nineteen computers in total. The four most recent SPUs are Pentium III 550 MHz computers. Dagwood, a single "compound SPU", is a quad Xeon 550 MHz computer purchased to conduct research on classification, activity analysis, and digitization of three simultaneous video streams. Also included in this list of machines is a Silicon Graphics Origin 200, used to develop video database storage and retrieval algorithms and to design user interfaces for handling VSAM video data.
 
Operator Control Room located in PRB on the CMU campus

Two auto-tracking Leica theodolites (TPS1100) are installed on the corner of PRB and are hardwired to a data processing computer linked to the VSAM OCU. This system performs real-time automatic tracking of objects to obtain ground truth for evaluating the VSAM geolocation and sensor fusion algorithms, and the resulting data can be displayed in real time on the VSAM GUI. An Office of Naval Research DURIP grant provided funds for the two Raytheon NightSight thermal sensors, the quad Xeon computer, the Origin 200, an SGI Infinite Reality Engine, and the Leica theodolite surveying systems.