Designing robust resource allocation policies (Chapter 7)

A common problem in multiserver systems is deciding how to allocate resources (e.g. operators, CPU time, and bandwidth) among jobs. A good resource allocation policy that maximizes system performance often has parameters (e.g., thresholds) that need to be tuned. Since the optimal settings of the parameters typically depend on environmental conditions such as system loads, a resource allocation policy whose parameters are chosen to achieve the best performance under a certain load can provide poor performance when the load fluctuates or when the estimation/prediction of the load is wrong.

Designing resource allocation policies that are robust against load fluctuations and misestimation is important, since irregularity of arrival processes has been observed in various systems in various ways, including long range dependence, self-similarity, time of day effect, and flash crowd (also known as slashdot effect). Recent studies show long-range dependence or self-similarity in arrival processes in various computer and communication systems, including the Internet [46,114,158,206,207], supercomputing centers [183], web servers [80,81], and MPEG streams [110]. Squillante et al. show that correlation in arrival process can have a big impact on the system performance [183]. Time of day effect [35] has been observed in web servers [10,81,80] and supercomputing centers [53,78,208]. Flash crowd [10] and Slashdot effects [199] refer to the phenomenon that it is not uncommon to experience a more than 100-fold increase in demand when a site becomes popular, and these effects are often observed in web servers.

In Chapter 7, we will design and study characteristics of various resource allocation policies for the Beneficiary-Donor model (see Figure 1.11), which frequently comes up in the literature [4,17,58,61,71,105,125,176,181,182,185,186,205]. Whereas the standard metric for evaluating resource allocation policies for the Beneficiary-Donor model is the (weighted) mean response time, we choose to consider an additional metric, which we refer to as robustness. We introduce two types of robustness: static robustness and dynamic robustness. Static robustness measures robustness against misestimation of load. To evaluate static robustness, we analyze the mean response time of resource allocation polices for a range of loads to see how a policy tuned for one load behaves under different loads. Dynamic robustness measures robustness against fluctuations in load. To evaluate dynamic robustness, we analyze the mean response time of resource allocation policies under fluctuating load.

**Figure 1.11:** The Beneficiary-Donor model. Jobs arrive at queue 1 and queue 2 with average arrival rates $\lambda_1$ and $\lambda_2$ , respectively. Jobs have exponentially distributed service demands; however, the running time of a job may also depend on the affinity between the particular server and the particular queue. Hence, we assume that server 1 processes jobs in queue 1 with rate $\mu_1$ (jobs/sec), while server 2 can process jobs in queue 1 with rate $\mu_{12}$ (jobs/sec) and can process jobs in queue 2 with rate $\mu_2$ (jobs/sec).
	$\includegraphics[width=.24\linewidth]{fig/model/affinity.eps}$

Although various resource allocation policies in the Beneficiary-Donor model have been studied in the literature, the majority of the prior work has investigated only the optimality of these resource allocation policies in limiting or special cases. With respect to the analysis of mean response time, only coarse approximations exist for most of the resource allocation policies in the Beneficiary-Donor model, since an exact analysis would involve multidimensional Markov chains. DR allows us to analyze the mean response time, as well as the static and dynamic robustness, of these and other resource allocation policies.

We will see that single-threshold (resource) allocation policies have either poor mean response time or poor static robustness. This motivates us to introduce a multi-threshold allocation policy, whose threshold value is self-adapted to the load, and show this has significant advantage over single-threshold allocation policies with respect to achieving both low mean response time and good static robustness. Surprisingly, we will find that the use of multiple thresholds offer only small advantage over a single-threshold with respect to dynamic robustness. That is, a (single-threshold) policy that has poor static robustness can excel in dynamic robustness.