An important problem in designing multiserver systems is how to configure the system, given limited resources and user requirements. For example, a system architect may want to decide whether to buy a few fast (but expensive) servers or many slow (but cheap) servers to minimize mean response time, given a limited budget. In this chapter, we study this problem in a particular setting: Is it preferable to use a single fast server of speed $s$, or $k$ slow servers each of speed $s/k$, to minimize mean response time? What is the optimal $k$? Although our analysis via DR extends to more general settings that take into account the actual cost of a server as a function of its speed, this would complicate the analysis and obscure the insights into the fundamental nature of multiserver systems. Our goal in addressing the above problem is to illuminate principles on how the number of servers affects the performance of multiserver systems, principles that are useful in configuring such systems.
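To make the trade-off concrete, the following minimal sketch (ours, not part of the original analysis) compares the two configurations in the simplest case, an M/M/$k$ queue with exponential job sizes, using the standard Erlang-C formula; the function and parameter names are our own. With exponential sizes, the single fast server ($k=1$) comes out best, the classical baseline that the PH analysis in this chapter refines.

\begin{verbatim}
from math import factorial

def mm_k_mean_response(lam, k, total_speed):
    """Mean response time E[T] of an M/M/k queue in which the total
    service capacity `total_speed` is split evenly over k servers.
    Jobs have exponential size with mean 1 and arrive at rate lam."""
    mu = total_speed / k                 # per-server service rate
    a = lam / mu                         # offered load in Erlangs
    rho = a / k                          # per-server utilization
    assert rho < 1, "unstable: need lam < total_speed"
    # Erlang-C: probability that an arriving job must queue
    head = sum(a ** n / factorial(n) for n in range(k))
    tail = a ** k / (factorial(k) * (1 - rho))
    p_queue = tail / (head + tail)
    # E[T] = E[wait] + E[service], with E[wait] = P(queue)/(k*mu - lam)
    return p_queue / (k * mu - lam) + 1 / mu

for k in (1, 2, 4, 8):                   # system load rho = 0.7 throughout
    print(k, round(mm_k_mean_response(0.7, k, 1.0), 3))
\end{verbatim}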
The question of ``how many servers are best'' in exactly our setting has been asked by a stream of research [120,130,131,172,175,188], but all of this work is limited to the case where jobs are served in the order of their arrival (first come first served, FCFS). However, real-world systems are often not simply FCFS systems in which all jobs have equal importance. Rather, there is often inherent scheduling in the system that allows high priority jobs to move ahead of low priority jobs. For example, the high priority jobs may carry more importance, e.g., representing users who pay more. Alternatively, high priority jobs may simply have small service demands, since giving priority to short jobs reduces overall mean response time. Either way, priorities are fundamental to many modern systems.
This motivates the question of what the optimal system configuration is (a few fast servers or many slow servers) in a setting with different priority classes. Specifically, we assume that the high priority jobs and the low priority jobs arrive according to independent Poisson processes, that their service demands have phase type (PH) distributions (as defined in Section 2.2), and that jobs within each class are served in FCFS order. Under these assumptions, the system can be modeled as a foreground-background (FB) process, as discussed in Section 3.4, which allows us, for the first time, to analyze this system via DR.
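As a concrete illustration of the PH service distributions assumed here (Section 2.2 gives the formal definition), the sketch below, which is our own and not part of the original text, represents a two-phase hyperexponential distribution as a PH distribution $(\alpha, T)$ with arbitrary example parameters, and computes its mean and squared coefficient of variation from the standard moment formula $E[S^n] = (-1)^n\, n!\, \alpha T^{-n} \mathbf{1}$.

\begin{verbatim}
import numpy as np

# Two-phase hyperexponential (H2) written as a PH distribution (alpha, T):
# with probability p the job is exponential(mu1), otherwise exponential(mu2).
# The parameters are arbitrary, chosen only to give high variability.
p, mu1, mu2 = 0.4, 2.0, 0.5
alpha = np.array([p, 1 - p])        # initial phase probabilities
T = np.array([[-mu1, 0.0],          # PH subgenerator: no transitions between
              [0.0, -mu2]])         # phases, absorption at rate mu_i in phase i

one = np.ones(2)
Tinv = np.linalg.inv(T)
m1 = -alpha @ Tinv @ one            # E[S]   = -alpha T^{-1} 1
m2 = 2 * alpha @ Tinv @ Tinv @ one  # E[S^2] = 2 alpha T^{-2} 1
c2 = m2 / m1 ** 2 - 1               # squared coefficient of variation
print(f"mean = {m1:.3f}, C^2 = {c2:.3f}")
\end{verbatim}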
In particular, DR allows us to address the following questions:
We also compare the analysis via DR with existing approximations in the literature, and find that the existing approximations can lead to qualitatively misleading conclusions regarding the optimal number of servers. Since the existing approximations have poor accuracy and DR becomes computationally expensive as the number of servers and priority classes grows, we propose a new approximation based on DR, which we refer to as DR-A (DR with class aggregation). DR-A allows us to efficiently analyze the performance of multiserver systems with many priority classes. We study the error in the existing approximations and in DR-A extensively, and find that the error in DR-A is within 5% for a range of parameters, far smaller than that of the existing approximations.
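The details of DR-A appear in Section 4.5; one basic ingredient of any such class aggregation, however, is computing the service distribution of the merged class. The sketch below is our own illustration of that single step, not the thesis's algorithm: merging independent Poisson classes yields a Poisson process whose jobs draw their service demand from the arrival-rate-weighted mixture of the per-class distributions, so each moment of the aggregate is the rate-weighted average of the per-class moments.

\begin{verbatim}
def aggregate_class_moments(rates, moments):
    """Moments of the service distribution obtained by merging several
    Poisson job classes into one: the merged arrivals are Poisson with
    the summed rate, and an arrival belongs to class i with probability
    rates[i] / sum(rates), so each moment of the aggregate is the
    rate-weighted mixture of the per-class moments.

    rates:   per-class Poisson arrival rates [lam_1, ..., lam_m]
    moments: per-class moments, moments[i] = (E[S_i], E[S_i^2], E[S_i^3])
    """
    total = sum(rates)
    weights = [lam / total for lam in rates]
    agg = tuple(sum(w * m[j] for w, m in zip(weights, moments))
                for j in range(3))
    return total, agg

# Example: two classes with exponential sizes of mean 1 and mean 4
# (moments of exp(mu) are n!/mu^n).
lam_total, (m1, m2, m3) = aggregate_class_moments(
    [0.3, 0.2],
    [(1.0, 2.0, 6.0), (4.0, 32.0, 384.0)])
\end{verbatim}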
The rest of this chapter is organized as follows. In Section 4.2, we discuss prior work dealing with our question of ``how many servers are best?'' In Section 4.3, we answer question 1 above in the case of an M/PH/$k$/FCFS queue with a single priority class. In Section 4.4, we address all four questions above for the case of an M/PH/$k$ dual-priority queue. In Section 4.5, we propose DR-A and study the accuracy of DR-A and the existing approximations.