In this chapter, we consider the particular model of a distributed server system in which hosts are homogeneous and the execution of jobs is nonpreemptive (run-to-completion), i.e. the execution of a job cannot be interrupted and subsequently resumed. We consider both the case where jobs are immediately dispatched upon arrival to one of the host machines for processing (the immediate dispatching model), and the case where jobs are held in a central queue until requested by a host machine (the central queue model). The immediate dispatching of jobs is sometimes crucial for scalability and efficiency, as it is important that the router not become a bottleneck. On the other hand, the central queue model allows us to delay the decision of where to assign jobs, and this can lead to better utilization of resources and lower response time of jobs.
The immediate dispatching and central queue models with the nonpreemptive assumption are motivated by servers at supercomputing centers, where jobs are typically run to completion. The schedulers at supercomputing centers typically only support run-to-completion within a server machine for the following reasons. First, the memory requirements of jobs tend to be huge, making it expensive to swap out a job's memory for timesharing [52]. Also, many operating systems that enable timesharing for single-processor jobs do not facilitate preemption among several processors in a coordinated fashion [157]. Our models are also consistent with validated stochastic models used to study a high volume web sites [82,184], studies of scalable systems for computers within an organization [166], and telecommunication systems with heterogeneous servers [24].