Temporal task segmentation

To analyze the sensory data recorded over an entire task, we first preprocess the sequence by dividing it into constituent parts that can then be analyzed individually. The basic phases that compose a grasping task are the pregrasp, grasp, and manipulation phases.

The sensory data available from our observation module are the human finger joint angles and the 3D pose (position and orientation) of the hand relative to a global coordinate system. The questions then are: What features can be used as discriminants in segmenting the task? How can these features be used to segment the task?

To answer these questions, we refer to the literature on human hand motion, primarily from psychology. Many of these studies concentrate on the reaching action (i.e., the pregrasp phase) of the human hand and on the effect of differing object sizes and visual conditions (occluded or unoccluded view). The two most frequently used features are the hand speed and the grip aperture (the distance between the tips of the thumb and the index finger). Typical profiles of these features are shown below.

As can be seen, both the speed and grip aperture profiles have the characteristic inverted bell shapes. This is not too surprising, since the hand undergoes an acceleration phase and a deceleration phase in reaching for an object; in addition, the fingers open in anticipation of the grasp, and the aperture at its widest should be greater than the width of the object at the intended grasp positions. It is interesting to note that the peak of the grip aperture profile normally occurs after that of the speed profile.
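As a minimal sketch of how these two features could be computed from sampled data, assume that the fingertip and hand positions are available as 3D tuples taken at a fixed sampling interval. The function names and the `dt` parameter are hypothetical and are not taken from the observation module.

```python
import math

def grip_aperture(thumb_tip, index_tip):
    """Distance between the thumb tip and the index fingertip (3D points)."""
    return math.dist(thumb_tip, index_tip)

def hand_speed(positions, dt):
    """Finite-difference speed between consecutive hand positions.

    positions: list of (x, y, z) samples; dt: assumed constant sampling interval.
    Returns one speed value per pair of consecutive samples.
    """
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]
```

Plotting either quantity against time for a single reach would be expected to give a profile of the kind described above.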

Features for task segmentation

In light of the evidence offered by studies on human hand motion, we chose the following features:

Fingertip polygon area
The fingertip polygon is the polygon whose vertices are the fingertips (shown below).

Its area is an indication of the width of the grip aperture.
Speed of hand motion
Volume sweep rate
The volume sweep rate is the product of the first two features, i.e., the rate at which the fingertip polygon sweeps out volume as the hand moves in 3D space. It therefore responds to changes in both the fingertip polygon area and the hand speed. It turns out to be more effective in segmenting the task than either of the first two features. The physical interpretation of the volume sweep rate is shown below.
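The three features can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the fingertips are assumed to be given as 3D points ordered around the polygon, and the area is approximated by a triangle fan from the first fingertip, which is exact only for a planar convex polygon. All function names are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def fingertip_polygon_area(tips):
    """Area of the polygon whose vertices are the fingertips.

    Assumption: tips are ordered around the polygon; the area is the sum
    of the triangle areas in a fan anchored at the first fingertip.
    """
    origin = tips[0]
    area = 0.0
    for a, b in zip(tips[1:], tips[2:]):
        c = _cross(_sub(a, origin), _sub(b, origin))
        area += 0.5 * math.sqrt(c[0] ** 2 + c[1] ** 2 + c[2] ** 2)
    return area

def volume_sweep_rate(area, speed):
    """Product of the fingertip polygon area and the hand speed."""
    return area * speed
```

For example, `volume_sweep_rate(fingertip_polygon_area(tips), speed)` evaluated at each time sample gives the profile used for segmentation.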

The segmentation algorithm

The assumptions made are:

The pregrasp and manipulation phases interleave.
This basically means that the action of reaching for an object is a purposeful one, i.e., the hand does not suddenly change course midway to reach for another object before the initially targeted object is grasped.
There are no rapid or jerky motions.
There are several reasons for this. First, the acquisition or observation module would not be able to sample the motion fast enough. Second, the characteristic inverted bell-shaped profiles may be violated. Third, the type of grasp (which is a dynamic one) may not be recognized by the taxonomy; such a grasp is beyond the scope of our present study.
Profiles within pregrasp phases resemble parabolas.
This assumption is a reasonable one, in light of the empirical results reported in various human hand motion studies.

The segmentation algorithm is relatively simple. It comprises the following steps:

Hypothesize the task breakpoints separating the phases.
The task breakpoints are initially hypothesized from the local minima of the hand speed profile.
Calculate the RMS error of fitting parabolas to hypothesized pregrasp curves (using the volume sweep rate profile).
For each set of breakpoints, the mean of this RMS fit error is calculated.
Find the combination of the breakpoints which yields the minimum mean RMS fit error.
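The steps above can be sketched as a brute-force search. This is a simplified illustration, not the authors' implementation. It assumes that the profiles are equally sampled lists, that the sequence starts with a pregrasp phase so that pregrasp and manipulation segments alternate, and that the number of breakpoints is fixed by the caller. The names `segment_task`, `n_breakpoints`, and `min_len` are hypothetical; the minimum segment length is added here only to keep very short segments from fitting trivially.

```python
import math
from itertools import combinations

def local_minima(values):
    """Indices of interior local minima of a sampled profile."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] > values[i] <= values[i + 1]]

def _det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def parabola_rms_error(ys):
    """RMS residual of the least-squares parabola through equally spaced samples."""
    n = len(ys)
    if n < 4:
        return 0.0  # too few points to leave any residual
    ts = [k - (n - 1) / 2 for k in range(n)]  # centered sample index
    s = [sum(t ** p for t in ts) for p in range(5)]
    r = [sum(y * t ** p for t, y in zip(ts, ys)) for p in range(3)]
    m = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]
    d = _det3(m)
    coeffs = []
    for col in range(3):  # Cramer's rule on the 3x3 normal equations
        mc = [row[:] for row in m]
        for row in range(3):
            mc[row][col] = rhs[row]
        coeffs.append(_det3(mc) / d)
    a, b, c = coeffs
    sq = sum((a * t * t + b * t + c - y) ** 2 for t, y in zip(ts, ys))
    return math.sqrt(sq / n)

def segment_task(speed, sweep_rate, n_breakpoints, min_len=5):
    """Return (mean RMS error, breakpoints) for the best combination, or None."""
    candidates = local_minima(speed)  # step 1: hypothesized breakpoints
    best = None
    for combo in combinations(candidates, n_breakpoints):
        bounds = [0, *combo, len(sweep_rate) - 1]
        segments = list(zip(bounds, bounds[1:]))
        if any(hi - lo + 1 < min_len for lo, hi in segments):
            continue
        pregrasp = segments[0::2]  # assumed: phases alternate, pregrasp first
        errors = [parabola_rms_error(sweep_rate[lo:hi + 1]) for lo, hi in pregrasp]
        mean_err = sum(errors) / len(errors)  # step 2: mean RMS fit error
        if best is None or mean_err < best[0]:
            best = (mean_err, list(combo))  # step 3: keep the minimum
    return best
```

The exhaustive search over combinations grows quickly with the number of candidate minima, so this sketch is practical only for short sequences. How combinations with different numbers of breakpoints should be compared is not specified above, which is why the count is left to the caller.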

To illustrate, consider the two sets of hypothesized breakpoints below. The first yields a good fit, as can be seen (the e's are the RMS errors of the parabolic fit to the volume sweep rate profile during each hypothesized pregrasp phase). The second shows the result of a bad choice of breakpoints.

References

S.B. Kang and K. Ikeuchi, "Determination of motion breakpoints in a task sequence from human hand motion," to appear in Proc. IEEE Int'l Conf. on Robotics and Automation, San Diego, CA, May 1994.
S.B. Kang and K. Ikeuchi, "Temporal segmentation of tasks from human hand motion," Tech. Rep. CMU-CS-93-150, Carnegie Mellon University, April 1993.
