next up previous
: Discussion : Object Placement : Clustering

Load Shedding

The preceding heuristic attempts to cluster mutually interested objects together in order to reduce the number of replicas in the system. However, it does not attempt to balance the load on servers in the system. The object placement component uses a separate load shedding heuristic to offload objects from a node when it is too heavily loaded (i.e., above $ highwater$). However, this heuristic must also try to avoid increasing the number of replicas in the system to reduce the danger of oscillations resulting with the clustering heuristic. It proceeds as follows:

When $ load_n > highwater$, node $ sid_n$ attempts to offload primaries from itself to more lightly loaded nodes in the system. To decide which primary objects to offload, $ sid_n$ first traverses the (undirected) interest graph formed by the objects (both primary and replica) in its object store and finds the connected components formed by interest links. The motivation here is that a component can be migrated to another node without increasing aggregate load in the system since no object in the component maintains a link to $ sid_n$. Hence, at worst, migrating the entire component only maintains existing inter-node interest links and does not create new ones. We can actually ignore the replicas in the interest graph when forming components; however, by including replicas in the component, we are guaranteed to eliminate them away from $ sid_n$ when migrating the entire component, since no other object on $ sid_n$ is interested in them. Hence, the savings that $ sid_n$ obtains by migrating a component $ C$, the set of primary objects in the component, is:

$\displaystyle savings(C) ~~=~~ \sum_{x \in C} f_x + \sum_{k \neq n} weight(C,k)~,$    

where:

$\displaystyle weight(C)$ $\displaystyle =$ $\displaystyle ~~\sum_{x \in C} u_x \cdot interested(x,n) + \sum_{y \in I_n} u_y~,$    
$\displaystyle I_n$ $\displaystyle =$ $\displaystyle ~~\{ y ~\vert~ \forall k \neq n, y \in interests(C,k) \}~,$    

(i.e., the remote interests of objects in $ C$). In other words, migrating $ C$ saves $ sid_n$ the fixed cost of each object in $ C$, the cost of replicas it is maintaining for each object in $ C$, and the cost of each remote replica objects in $ C$ are interested in.

The load discovery primitive described above is used to discover nodes that are below $ lowwater$. A random subset of these nodes is picked as migration candidates (to avoid everyone offloading to the same nodes). $ sid_n$ can decide to migrate a component of objects $ C$ to node $ sid_m$ if:

$\displaystyle highwater ~~<~~ load_m + \sum_{x \in C} f_x + \sum_{k \neq m} weight(C,k)~,$    

in other words, if migrating the component will not bring $ sid_m$ over the high water mark. Note again that this is a conservative estimate, since $ sid_m$ may already be maintaining some of the replicas in $ I_m$.

We consider migrating smaller clusters first, because they are less likely to oscillate between nodes than larger ones and they impact fewer objects. The candidates are examined in ascending order by their load and the heuristic offloads clusters to the candidate with the lightest load (updating the estimated load as we decide to move a cluster to a node) until the savings brings $ sid_n$'s load below $ highwater$.


next up previous
: Discussion : Object Placement : Clustering
Ashwin Bharambe 平成17年3月2日