Once the sensor is pointing in the
right direction at the right zoom factor, all moving objects extracted
are compared to the specific object of interest to see if they match.
The ability to re-acquire a specific object is a key requirement for
multi-camera cooperative surveillance. Viewpoint-specific appearance
criteria are of little use here, since the new view of the object may
differ significantly from the previous one. Recognition features are
therefore needed that are independent of viewpoint. In our work
we use two such criteria: the object's 3D scene trajectory as determined
from geolocation, and a normalized color histogram of the object's image
region.
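To illustrate the second criterion, here is a minimal sketch in Python of a size-invariant normalized color histogram and a histogram-intersection match test. The bin count, the RGB color space, and the match threshold are illustrative assumptions; the source states only that the histogram is normalized over the object's image region.

```python
import numpy as np

def normalized_color_histogram(region, bins=8):
    """Normalized RGB histogram of an object's image region.

    region: H x W x 3 uint8 array of pixels belonging to the object.
    Normalizing to sum to 1 makes the feature independent of the
    object's size in the image, and hence of zoom and distance.
    """
    pixels = region.reshape(-1, 3).astype(float)
    hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                             range=((0, 256),) * 3)
    return hist.ravel() / max(hist.sum(), 1.0)

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return float(np.minimum(h1, h2).sum())

def matches_target(candidate_region, target_hist, threshold=0.7):
    """Re-acquisition test: does a newly extracted moving object
    match the stored histogram of the object of interest?"""
    h = normalized_color_histogram(candidate_region)
    return histogram_intersection(h, target_hist) >= threshold
```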
Slaving is a relatively simple exercise if both cameras are calibrated with respect to a local 3D terrain model. We have shown that a tracked object's 3D geolocation can be determined to reasonable accuracy (roughly 1 meter of error for a person 50 meters away) by intersecting backprojected viewing rays with the terrain. After estimating the 3D location of a person from the first camera's viewpoint, it is an easy matter to transform the location into a pan-tilt command to control the second camera. The figure below shows an example of camera slaving. A person has been detected automatically in the wide-angle view shown in the left image, and a second camera has been tasked to move slightly ahead of the person's estimated 3D trajectory, as shown in the right image.
Example of multi-camera slaving -- tracking a person.
Example of multi-camera slaving -- tracking a vehicle.
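As an illustration of the two computations involved, the sketch below marches a backprojected viewing ray across a terrain height function until it meets the surface, then converts the resulting 3D point into pan/tilt angles for the second camera. The ray-marching scheme, the step size, the terrain_height interface, and all names are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def geolocate(camera_pos, ray_dir, terrain_height, step=0.5, max_range=500.0):
    """Estimate an object's 3D geolocation by intersecting the
    backprojected viewing ray with the terrain model.

    camera_pos:     3D camera position (x, y, z) in scene coordinates.
    ray_dir:        unit vector along the viewing ray through the
                    object's image location.
    terrain_height: function (x, y) -> z sampling the 3D site model.
    Marches along the ray and returns the first point at or below
    the terrain surface, or None if nothing is hit within range.
    """
    for t in np.arange(0.0, max_range, step):
        p = camera_pos + t * ray_dir
        if p[2] <= terrain_height(p[0], p[1]):
            return p
    return None

def pan_tilt_command(slave_pos, target):
    """Pan/tilt angles (radians) pointing the slave camera at a 3D
    target point. Assumes pan is measured in the ground (x, y) plane
    and tilt from horizontal; a real pan-tilt head would also fold
    in its own mounting orientation."""
    d = np.asarray(target) - np.asarray(slave_pos)
    pan = np.arctan2(d[1], d[0])
    tilt = np.arctan2(d[2], np.hypot(d[0], d[1]))
    return pan, tilt
```

To move slightly ahead of the person, as in the figure, the command would be computed from a point extrapolated along the estimated 3D trajectory rather than from the current location.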
For cameras located far apart geographically, this approach clearly demands very good camera calibration and an accurate 3D site model. We have therefore also developed a sensor slaving method for closely located cameras that requires only image-based computations: no geolocation computation and no extrinsic camera calibration. Intrinsic parameters are needed only by the slave camera, which must determine the pan/tilt angles that point it towards each pixel in its image. The basic idea is to form a mosaic by warping the master camera view into the pixel coordinate system of the slave camera view. Image trajectories of objects detected in the master view can then be transformed into trajectories overlaid on the slave view, and the slave camera computes the pan/tilt angles necessary to keep the object within its zoomed field of view.
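A minimal sketch of this image-based scheme, assuming the master-to-slave warp used to build the mosaic is a single 3x3 homography H (a reasonable model when the cameras are close together) and that the slave's intrinsics are a pinhole focal length and principal point. H, the intrinsic values, and all names are placeholders that would come from a hypothetical offline registration and calibration step.

```python
import numpy as np

def master_to_slave(pt, H):
    """Warp a pixel trajectory point from the master view into the
    slave view's pixel coordinate system via the mosaic homography."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

def pixel_to_pan_tilt(pixel, fx, fy, cx, cy):
    """Pan/tilt angles (radians) that center the slave camera on a
    given pixel, using only the slave's intrinsic parameters
    (focal lengths fx, fy and principal point cx, cy)."""
    pan = np.arctan2(pixel[0] - cx, fx)
    tilt = np.arctan2(cy - pixel[1], fy)  # image y axis points down
    return pan, tilt

def slave_on_trajectory(trajectory, H, fx, fy, cx, cy):
    """Convert a master-view image trajectory into the sequence of
    pan/tilt commands that keep the object in the slave's view."""
    return [pixel_to_pan_tilt(master_to_slave(p, H), fx, fy, cx, cy)
            for p in trajectory]
```

Note that computing pan and tilt independently about the optical center is a small-angle approximation; for pixels far from the image center, the full pan-tilt rotation geometry would be needed.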