Next: Results Up: Reducing switching costs in Previous: Analysis Contents

State of the art in cycle stealing

There has been no previous analytical work on the performance of cycle stealing with switching times and thresholds. The analysis of cycle stealing without switching times and thresholds under exponential job size distributions has been studied by Green [61] and Stanford and Grassmann [185,186]. Since the analysis of cycle stealing involves 2D Markov chains even without switching times and thresholds, in [61,185,186], the 2D Markov chains are analyzed by truncating the state space (and the resulting 1D Markov chains are analyzed by matrix analytic methods (Section 3.2)).

Despite the lack of analytical study, cycle stealing has been implemented and used in various systems, including networks of workstations (NOWs). Below, we review cycle stealing mechanisms in NOWs, classifying them into three types, depending on the behavior when an idle workstation is reclaimed by its owner. Other computer systems with cycle stealing are also discussed briefly.

In the first type of cycle stealing mechanism, the jobs that have been running on an idle server are killed when the idle server is reclaimed by its owner. Butler [139] is a representative implementation of this type of cycle stealing. This cycle stealing model is also studied by Bhatt et. al. [20] and Rosenberg [163,164,165], who consider a theoretical problem of optimal cycle stealing strategy. This type of cycle stealing is the most conservative in that the cost (time) required for the owner to reclaim his idle machine is smallest. However, in this model, all the work that has been done on the idle workstation is lost on reclaiming.

In the second type of cycle stealing, the job that have been running on an idle workstation is thus migrated to other idle workstations, while keeping the state of the job. The cycle stealing model that we study in this chapter is close to this type. Condor [115,116], Sprite [48], and Benevolent Bandit [54] are representative implementation of this type of cycle stealing^6.1. A disadvantage of the first two types of cycle stealing mechanisms is that they cannot efficiently make use of short idle periods, and this is a motivation for the next mechanism.

In the third mechanism, the jobs that have been running on an idle workstation are preempted but keep staying on the workstation when the idle workstation is reclaimed by its owner. This mechanism for networks of workstations is implemented as Stealth Distributed Scheduler [109] and Linger-Longer [170]. Awerbuch et. al. [13] consider a problem of optimal policies in this mechanism.

Cycle stealing is also popular in Internet computing, which allows one to steal idle cycles of personal computers at home. Internet computing is used for the purpose of searching extraterrestrial intelligence, protein folding research, cancer research, AIDS research, finding large prime numbers, etc. Systems that support Internet computing include Javelin [41]. Other concepts related to load balancing or cycle stealing include dynamic resource allocation, which dynamically assign resources among multiple applications at data centers [9,39,38], and content distribution networks [107] such as Akamai. Adaptive parallelism is also related to cycle stealing; it dynamically reallocates processors to parallel applications, depending on the availability of the processors. Representative implementations of adaptive parallelism includes Piranha [36] and ATLAS [15]. More generally, the concept of using multiple systems as a single resource is studied in the context of grid computing [14,40,56]. System implementations towards facilitating grid computing includes AppLeS [18], Legion [62], and MOL [59].

Next: Results Up: Reducing switching costs in Previous: Analysis Contents

Takayuki Osogami 2005-07-19