
Università degli Studi di Padova - Facoltà di Ingegneria
Corso di Laurea Specialistica in Ingegneria Informatica

Load Balancing in Distributed Hash Table

Candidate: Michele Schimd

Advisor: Prof. Enoch Peserico

Co-advisor: Dott. Paolo Bertasi

Master's Thesis in Computer Engineering

Academic Year 2008 - 2009


Contents

1 Introduction to P2P networks
  1.1 P2P Characteristics
  1.2 Unstructured and structured P2P networks
  1.3 The routing problem
    1.3.1 Routing in unstructured P2P

2 Distributed Hash Table (DHT)
  2.1 Introduction to DHT
    2.1.1 Resource sharing
  2.2 DHT implementations
    2.2.1 Chord
    2.2.2 Kademlia
  2.3 Caching and load balancing
    2.3.1 Caching strategies
    2.3.2 Related works on caching

3 Caching algorithm: a case study
  3.1 The model
    3.1.1 Routing model
    3.1.2 Request model
  3.2 Analysis
    3.2.1 Two keys analysis
    3.2.2 LRU-algorithm analysis
  3.3 Extensions to the model
  3.4 Conclusions

4 Lower Bound Analysis
  4.1 Introduction to competitive analysis
  4.2 Oblivious adversary
    4.2.1 The model
    4.2.2 The analysis
    4.2.3 Number of subworlds
  4.3 Adversary pattern
    4.3.1 Number of requests for the popular world
  4.4 Load analysis
  4.5 Generalized model
  4.6 Proof of the bound
    4.6.1 Adaptive offline adversary
    4.6.2 Oblivious randomized adversary
  4.7 Conclusions

A Traces of ED2K server
  A.1 Experimental results
    A.1.1 Experimental data


Abstract

Peer-to-peer (P2P) networks have received increasing interest in recent years thanks to their flexibility and scalability. P2P networks implement a completely serverless architecture in which each participant (peer) acts both as client and as server. Roughly speaking, there are two ways of implementing a peer-to-peer network: structured and unstructured. Unstructured networks are simple to implement and easy to maintain, but they waste a considerable amount of bandwidth during lookups. Structured networks are based on distributed algorithms called DHTs (Distributed Hash Tables), which use a distributed data structure to decrease the bandwidth needed by lookup procedures. To achieve load balancing among nodes, DHTs simply rely on the even distribution of hash values produced by a consistent hash function. This is not sufficient in real-world applications, where items are requested following a Zipf-like distribution that induces strong skewness in item popularity. To deal with this issue several caching algorithms have been proposed; the idea is to replicate a popular item on nodes other than its original owner and to spread the load among all copies (original and cached) as fairly as possible. We show in this work that a simple caching algorithm is ineffective unless the routing algorithm is also modified to account for balance problems. The analysis is performed using competitive analysis techniques, which allow us to obtain different bounds depending on the knowledge supplied to the adversary.

This thesis is organized as follows: chapter 1 gives a brief introduction to peer-to-peer networks; chapter 2 describes DHTs and some issues related to load balancing; chapter 3 studies the performance of a particular caching algorithm on a particular DHT implementation; chapter 4 presents our analysis of the effectiveness of caching strategies over most DHT implementations; appendix A reports real traces from an ed2k server showing the skewness of item popularity; finally, conclusions and future directions are given at the end of the thesis.


Chapter 1

Introduction to P2P networks

Peer-to-peer (P2P) networks have received increasing interest in recent years thanks to their flexibility and scalability. Many applications have been proposed and implemented on P2P networks, such as distributed storage, VoIP (Voice over IP) and file sharing [1, 2]. The last of these is sometimes wrongly identified with P2P itself, but file sharing is only one of hundreds of applications that could gain features and improve performance on a completely distributed and self-organizing P2P network.

This chapter introduces some of the main characteristics of P2P networks; at the end, structured and unstructured P2P networks are presented and briefly compared. A more detailed discussion of structured networks (which are the main topic of this thesis) is given in chapter 2.

1.1 P2P Characteristics

This section introduces and describes some of the most important features that P2P networks can offer: serverless operation, self-organization, scalability and flexibility.

Serverless Most traditional network services are based on the client-server paradigm; well-known examples include the World Wide Web, e-mail, the Domain Name System and many others. Even though such services are very simple to realize in practice, they have an intrinsic weak point: the server is usually a single, very powerful machine that is supposed to serve many clients, and if the server is unavailable for a period of time the service cannot be supplied during that period. To overcome this weakness, P2P networks implement a completely serverless architecture where every machine acts both as client and as server; this duality originates the name peer, which will be used in this work to refer to a node participating in the P2P network.

Self-organizing In P2P networks there is no central control of network traffic and sessions: nodes¹ can join and leave the network at any moment without necessarily advertising their behaviour. This introduces high churn, and the network continuously changes its topology. Algorithms controlling the network have to deal with this issue and provide effective, efficient methods to preserve connectivity even after the sudden disconnection of some peers.

Scalability One of the most important features of P2P networks is scalability, which ensures that the network can grow up to several million nodes without significant degradation of performance. Scalability has a direct impact on the design of network-maintenance algorithms: the hop count (the number of intermediate nodes that have to be traversed during a lookup) and the storage requirements should grow very slowly even when the number of peers N grows rapidly. It is quite common to find implementations of peer-to-peer networks that guarantee O(log N) bounds for both hop count and storage.

Flexibility The key to the success of P2P systems is the possibility of using them as a general-purpose distributed platform over which to deploy custom distributed applications. This is usually hard to achieve because of the absence of central control: all information must be spread across the peers and, at the same time, must be efficiently available at any node when needed. Moreover, different applications can have different requirements; for example, a storage application could ask for a service that preserves integrity even at the cost of higher access time, while VoIP probably needs real-time access to all transmitted packets, with the possibility of occasionally losing some of the sent data as long as the quality of the conversation is not compromised. How all these different services can be developed over a common P2P platform strongly depends on the implementation of the network itself and is still a hot topic for many research communities worldwide.

¹ The terms node and peer are used interchangeably throughout this work.


1.2 Unstructured and structured P2P networks

Roughly speaking, there are two ways of implementing a peer-to-peer network:

unstructured networks: based on flooding discovery algorithms;

structured networks: implementing a more complex infrastructure in order to maintain connectivity between the nodes.

While unstructured networks are simple to implement and easy to maintain, their participant nodes generate heavy traffic, wasting bandwidth, which is one of the scarcest resources in P2P systems. A widely deployed unstructured P2P application is Gnutella [2]; Gnutella and applications based on its protocol are rapidly becoming obsolete due to the introduction of structured peer-to-peer networks. Structured networks are based on distributed algorithms called DHTs (Distributed Hash Tables); these techniques are the main topic of this thesis and are presented extensively in chapter 2.

In both cases the network can be thought of as a graph G = (V, E): each vertex v ∈ V corresponds to a peer², while an edge e = (v_i, v_j) ∈ E represents a link between the peers on those vertices; nodes connected through an edge are often called neighbours. By link we mean the possibility for the nodes it connects to directly exchange messages using the underlying "physical" network (usually the Internet); in other words, a link is a pair of IP address and port number that allows a node to send a message to another one. Since no relationship is usually established between physical position (i.e. IP address) and graph position, it is customary to call the P2P network an overlay network, meaning that the edges of the graph generate a new network that is topologically superimposed on the physical one.

It is important to note that even though the network graph is a good mathematical tool to describe the overlay, no participant node could maintain complete information about it: this would require O(N²) space, which is far from being a scalable solution. In this scenario the routing problem (presented below) is not obvious, and the algorithms solving it determine the effectiveness of the P2P implementation.

² Note that a physical machine could host more than one peer and hence could be mapped to more than one vertex of the graph.


1.3 The routing problem

In P2P overlay networks the routing problem can be described as follows: given the graph G = (V, E) representing the network, and given a routing request from a source node s ∈ V to a destination node d ∈ V, we want to find a sequence of nodes s, n_1, n_2, …, n_k, d with n_i ∈ V for i = 1, …, k, such that (s, n_1) ∈ E, (n_k, d) ∈ E and (n_i, n_{i+1}) ∈ E for all i = 1, …, k − 1; this sequence is called a path from s to d. The number of intermediate nodes n_i determines the efficiency of the routing algorithm. The most common situation for structured P2P networks is an algorithm that maintains k = O(log N), which guarantees a high degree of scalability of the routing algorithm.

1.3.1 Routing in unstructured P2P

Unstructured systems use flooding of the network to route messages: when node s needs a route to node d, it sends an RREQ (Routing REQuest) message to all its neighbours, and every peer receiving an RREQ for the first time forwards it to all its neighbours except, possibly, the one from which the message was received.

It is clear that flooding, though simple, is a bandwidth-consuming algorithm: a single message can potentially reach every node of the network and, in the worst case, the number of hops needed to reach the destination is O(N). When RREQs are frequent, the amount of generated traffic can be large enough to induce a collapse of the entire network; when this happens, all nodes spend almost all of their bandwidth forwarding requests, leaving few possibilities for transferring data.
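To make the cost concrete, the following is a minimal Python sketch of RREQ flooding over an adjacency list (the 5-node overlay and the function name are ours, purely illustrative). Note how a single request generates traffic proportional to the size of the whole network, not to the length of the path actually needed.

    from collections import deque

    def flood_rreq(adj, source):
        """Flood an RREQ from `source`; return (nodes reached, messages sent)."""
        seen = {source}
        queue = deque([(source, None)])   # (node, node it heard the RREQ from)
        messages = 0
        while queue:
            node, heard_from = queue.popleft()
            for neighbour in adj[node]:
                if neighbour == heard_from:
                    continue              # do not echo back to the sender
                messages += 1             # every forward consumes bandwidth
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, node))
        return seen, messages

    # A tiny 5-node overlay: one request touches every node in the network.
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1], 4: [2]}
    reached, messages = flood_rreq(adj, 0)
    print(len(reached), messages)         # 5 nodes reached, 4 messages sent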

Finally, it is important to observe that the time and bandwidth consumed during a lookup are pure overhead from the user's point of view: the user is only interested in the resource itself.


Chapter 2

Distributed Hash Table (DHT)

In this chapter we present the Distributed Hash Table (DHT) algorithms used to implement structured P2P networks. First the basic characteristics of DHTs are presented, then two examples (Chord and Kademlia) are discussed. The chapter ends with the caching problem around which this thesis is developed; subsequent chapters expand on the caching and load balancing problems.

2.1 Introduction to DHT

DHT algorithms are used to avoid flooding as a routing procedure. The key idea is to maintain information on the network topology in a data structure distributed among all participants, so that a routing procedure is not completely "blind"; rather, it can use information stored at intermediate nodes to approach the target, avoiding wasted bandwidth and useless forwarded messages.

First of all, an identifier (ID) is assigned to every node connected to the network; more precisely, it is common to deal with implementations where joining nodes self-assign an address chosen from a very large space. Commonly implemented DHTs use 160-bit identifiers created via a one-way hash function (e.g. SHA-1). As will be explained later in this chapter, it is important to choose a hash function that evenly distributes nodes over the address space.

The second step is to define a metric on the address space. The metric is used to establish a distance relationship between all nodes in the overlay, so that given two nodes n_1 and n_2 (we assume here that nodes are represented by their identifiers), the function d(n_1, n_2) represents the distance, within the address space, between the two nodes. Routing procedures are then implemented using an iterative greedy algorithm that decreases the distance between the destination node d and the i-th intermediate node n_i.
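The two steps above translate almost directly into code. Below is a minimal Python sketch (our own illustration, not an implementation from the literature): identifiers are obtained by hashing a node's network address with SHA-1, and lookup is the generic greedy loop, parametrized by the overlay metric d and by a neighbours_of function that a concrete DHT would provide.

    import hashlib

    def node_id(address, m=160):
        """Self-assigned identifier: SHA-1 of the node's network address."""
        digest = hashlib.sha1(address.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** m)

    def greedy_lookup(start, target, neighbours_of, d):
        """Iteratively move to the neighbour closest to `target` under metric d.

        The loop stops when no known neighbour improves on the current node,
        which is then (an approximation of) the responsible for `target`.
        """
        path, current = [start], start
        while True:
            candidates = list(neighbours_of(current))
            if not candidates:
                return path
            best = min(candidates, key=lambda n: d(n, target))
            if d(best, target) >= d(current, target):
                return path
            current = best
            path.append(current)

    print(hex(node_id("198.51.100.7:4000")))   # a 160-bit overlay identifier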

2.1.1 Resource sharing

P2P networks are constructed to let all nodes share their own resources (files, disk space, …). In unstructured P2P networks resources are discovered using flooding; in structured networks a more efficient method can be implemented.

Suppose that node s (which we often refer to as the owner of the resource) wants to share a given resource. First it must use the "physical" resource (e.g. the file) to create a hash value for it. This hash function must:

1. produce values that are evenly spread over the same address space as the identifiers, and

2. be known to every node in the network, so that every peer owning the same resource obtains the same value.

After having created the hash value h of the resource, the owner starts a publish procedure, in which it identifies the node r (which we call the responsible for resource h²) such that d(r, h) is minimal over all connected nodes. The key observation is that the publish procedure uses the same algorithm as routing; this algorithm is often referred to as lookup. Once the lookup terminates, returning r as the responsible node for resource h, s sends a STORE message to r for the resource h; finally, r inserts into a local table a record containing all the information needed to contact s (i.e. ID, IP address and port number). At this point the resource h has been published by peer s.

When a resource is needed, a find procedure is initiated: the node willing to obtain the resource creates the hash value h and starts a lookup, obtaining a link to the node r responsible for h. When r receives a request for resource h it sends back the entire table containing the information about the nodes that have published that resource.
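As a sketch of the bookkeeping involved (names and data layout are our own assumptions, not a prescribed format), the responsible node r only needs a small table mapping each hash value to the contact records of its publishers:

    import hashlib

    def resource_key(data):
        """Hash the physical resource into the 160-bit identifier space."""
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    class ResponsibleNode:
        """Table kept by the node r closest to a resource's hash value."""
        def __init__(self):
            self.publishers = {}              # key -> list of contact records

        def store(self, key, contact):
            """Handle a STORE message sent by the owner after its lookup."""
            self.publishers.setdefault(key, []).append(contact)

        def find(self, key):
            """Answer a find request with the whole table entry for the key."""
            return self.publishers.get(key, [])

    r = ResponsibleNode()
    h = resource_key(b"contents of the shared file")
    r.store(h, {"id": 0x2A, "ip": "203.0.113.7", "port": 4662})
    print(r.find(h))                          # contacts of every publisher of h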

To increase the availability of a resource, it is common practice to publish a physical resource with many associated descriptors. For each descriptor (which could be, for example, a keyword describing it) a hash value is created and a publish procedure is initiated. Since the focus of this thesis is on DHT algorithms, we will always identify a resource with a hash value; from this point of view a resource has no concrete counterpart in the application (e.g. a file could be stored with several descriptors and would therefore translate into several resources).

² Note that, as for nodes, resources are uniquely identified by their hash value.


When first introduced, DHTs were considered effective algorithms for structured P2P overlays; indeed, in an ideal world they present some desirable features:

• the lookup procedure can be implemented so that O(log N) intermediate nodes are involved;

• the storage required to maintain the overlay is O(log N);

• if a good hash function is used, resources are evenly spread among all participants; in other words, if there are M total resources and N total nodes (M >> N is the common case), every node is responsible for M/N resources, which is the optimal load distribution;

• the network is self-organizing, in the sense that leaving and joining are simple procedures and do not affect the functionality of the network (if some redundancy is inserted).

We will show in the next chapters, however, that DHTs as described above do not perform well in the real world, even when caching techniques are used; this is due to the skewness of the requests generated by the participants. This is quite intuitive if we think of file-sharing applications, in which there are few files requested many times and many files requested few times. This situation leads to the problem known as load balancing, which is the core of this thesis.

2.2 DHT implementations

In this section we present two of the most successful DHT implementations: the first is Chord [19], which is widely discussed and studied in the literature, and the second is Kademlia [13], which is widely deployed in popular file-sharing applications (e.g. eMule [1]). Other DHT implementations have been proposed [15, 10, 21, 16], all of which share the principles described in the previous section.

2.2.1 Chord

Chord [19] maps nodes onto a logical ring where addresses wrap around. Given an address n it is possible to identify one successor succ(n) and one predecessor pred(n) on the logical ring; both are computed modulo 2^m, where m is the size in bits of the address. A node n maintains a finger list, a table containing a link to succ(n + 2^{i−1}) for i = 1, …, log N.


From a logical point of view, the i-th finger link covers the range of addresses from n + 2^{i−1} to n + 2^i − 1 (as usual, all operations are modulo 2^m). To approach a resource, the lookup algorithm finds the proper range and forwards the request to its contact in that range; since the distance between the source address and the target address is (at least) halved at each iteration, the total number of steps (which is the number of intermediate nodes) is O(log N).

Figure 2.1: Example of the Chord ring with m = 4

Figure 2.1 shows an example of a Chord ring with m = 4, where node 4 is the source of a lookup for identifier 2; the 4 outgoing arrows represent the finger links, while the thick arrows represent the links used by the lookup procedure. At the beginning of the lookup, 4 identifies 12 as the next hop; the request is forwarded to 12, which iterates the procedure identifying 0 as its next step; finally, 0 has a direct link to 2 and forwards the message to the final destination.
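A short Python sketch of the finger-table geometry may help (a simplification under our model: the ring is a sorted list of node addresses and succ is computed globally, whereas a real Chord node learns its fingers through the join and stabilization protocols):

    import bisect

    def successor(ring, x, m):
        """First node met going clockwise from address x on the 2**m circle."""
        x %= 2 ** m
        i = bisect.bisect_left(ring, x)
        return ring[i % len(ring)]            # wrap past the largest address

    def finger_table(ring, n, m):
        """Fingers of node n: successor(n + 2**(i-1)) for i = 1, ..., m."""
        return [successor(ring, n + 2 ** (i - 1), m) for i in range(1, m + 1)]

    m, ring = 4, sorted([0, 2, 4, 7, 12])     # a sparse ring on 16 addresses
    print(finger_table(ring, 4, m))           # [7, 7, 12, 12]

At each lookup step the request jumps to the finger whose range covers the target, halving the remaining distance; this is exactly where the O(log N) hop bound comes from.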

In [19] load balancing was achieved only by using a standard consistent hash function for both identifiers and resources; however, many improvements have been proposed in order to ensure a fair load distribution, and most of these solutions resort to the concept of Virtual Servers, which are briefly explained later in this chapter.


2.2.2 Kademlia

Kademlia [13] uses the XOR function to implement the metric on the overlay: the distance in Kademlia is defined as d(s, d) = s ⊕ d. This metric defines a binary tree topology for the address space, where nodes are placed on the leaves of the tree: if we label left arcs with 0 and right arcs with 1, the path from the root to a leaf spells out the binary representation of the identifier of that leaf (i.e. node). A node n = b_{m−1} b_{m−2} … b_1 b_0 maintains a set of m = O(log N) buckets; for i = 1, …, m the i-th bucket stores all known links to nodes that:

1. share the address prefix b_{m−1} b_{m−2} … b_i with n, and

2. differ from n in bit b_{i−1}.

Usually the bucket size is limited to k = 20 links in order to keep a reasonable space requirement of O(log N).

The routing procedure starts by locating the first differing bit (starting from the most significant) between the source and destination addresses; this bit identifies a bucket j from which a link is chosen. The selected node is then queried in order to obtain the α nearest nodes it knows; these nodes are inserted into the proper buckets, and the lookup is iterated for the remaining bits, stopping when no nearer nodes are discovered. Since at each step (at least) one bit is adjusted, the lookup procedure takes O(m) = O(log N) iterations.
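The metric and the bucket indexing are compact enough to show directly. The sketch below (ours, using 0-based indices, i.e. bucket i − 1 in the 1-based numbering above) derives the bucket of a contact from the position of the highest bit in which the two identifiers differ:

    def xor_distance(a, b):
        """Kademlia metric: bitwise XOR of the two identifiers."""
        return a ^ b

    def bucket_of(n, other):
        """0-based bucket index: position of the highest differing bit."""
        return xor_distance(n, other).bit_length() - 1

    print(bucket_of(0b1101, 0b0011))   # 3: bit 3 is the first to differ
    print(bucket_of(0b1101, 0b1100))   # 0: only the last bit differs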

Figure 2.2: Example of a logical Kademlia tree with m = 4

Figure 2.2 shows an example of a Kademlia tree with m = 4 and one possible path from node 13 to node 3. Depending on the content of the buckets and on the policy used to select nodes from them, several paths could be constructed from 13 to 3. In [13] the authors proposed to organize buckets as queues, with the head of the queue containing the most active node (i.e. the one supposed to stay online longer than the others).


2.3 Caching and load balancing

Neither Chord nor Kademlia provides a specific method for load balancing: they simply rely on the even distribution of hash values (also called keys) among all participants. As already noted, this is not sufficient, because resource request patterns are strongly skewed by item popularity; it has been shown that items are requested following a Zipf-like distribution [7].
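To get a feeling for this skewness, here is a small Python sketch (all parameters are illustrative, not measured) that samples requests from a Zipf-like popularity law and reports the share taken by the most popular items:

    import random

    def zipf_weights(n_items, alpha=0.8):
        """Zipf-like popularity: the item of rank r gets weight 1 / r**alpha."""
        return [1.0 / (rank ** alpha) for rank in range(1, n_items + 1)]

    random.seed(1)
    n_items = 1000
    requests = random.choices(range(n_items), weights=zipf_weights(n_items),
                              k=100_000)
    top = sum(1 for key in requests if key < 10)
    print(f"the 10 most popular of {n_items} items draw "
          f"{top / len(requests):.0%} of all requests")

Whatever the exact exponent, a handful of keys attracts a disproportionate share of the lookups, and so do the nodes responsible for them.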

When highly skewed requests are generated, not all nodes receive the same quantity of messages: some nodes (the ones responsible for popular items) must serve more requests than others, and they become bottlenecks of the entire system. The problem of spreading the entire load L evenly among all N participants is known as the load balancing problem. Several approaches have been proposed [6, 3, 18, 20, 17, 8, 12, 9]; they can be roughly categorized into virtual server and caching approaches.

Virtual-server-based approaches attempt to balance the load distribution by moving peers onto zones with high traffic. Usually this is done by maintaining a set of virtual peers at different positions of the address space; at any one time only a subset (usually a single one) of the virtual peers is active, and each of these peers acts independently, exactly as if they were running on different machines. Even if widely applied, this category of load balancing algorithms is not considered in this thesis, because it does not offer a good shield against skewed requests: simply moving nodes does not alleviate the load on the owners of popular items, so to decrease the effect of skewness it is necessary to implement some replication mechanism.

2.3.1 Caching strategies

A more effective way of achieving load balancing is to replicate the same resource on several nodes, so that many nodes are responsible for the same resource; the operation of replicating items is known as caching. To implement a caching strategy effectively, several parameters must be finely tuned:

• the placement strategy indicates on which nodes (with respect to the request) to place copies of the original item;

• the number of replicas indicates, for each item, how many cached copies the system may maintain; and

• the eviction policy indicates how a node behaves when the cache requests exceed the available cache size; in this case the algorithm should decide intelligently whether an item has to be removed and, if so, which one.

Placement strategy In order to maximize the effectiveness of caching, cache algorithms should guarantee that cached copies are hit with high probability. To ensure that a copy is hit, replicas are usually placed along the paths leading to the original node (i.e. the one determined by the lookup procedure). Several methods have been proposed in the literature, some claiming that the optimal placement is near the requester, others that copies should be placed near the receiver.

Number of replicas The more an item is retrieved, the more load is induced on the original responsible and on all cached copies. Having many replicas of the same item is useful as long as the cached copies are hit; proposed strategies do not limit the number of replicas a single item can have, and it has been shown that the optimal number of copies is proportional to the item's popularity [14].

Eviction policy All nodes participating in an overlay with a caching strategy are supposed to reserve a fixed amount of space for caching purposes; to simplify, we can assume that the cache size is proportional to the number of items that can be cached, c, ignoring that some items may require more space than others. When all c slots are used by cached items and a new cache request is received, a node must decide:

1. whether an old item must be removed to make room for the new one, and

2. which item must, in that case, be removed.

The set of rules governing the above decisions is known as the eviction policy. We will show in the next chapters that no matter how these parameters are set, there always exists a sequence of requests (a pattern) that "breaks" the caching strategy, in the sense that caching turns out to be useless.

2.3.2 Related works on caching

Several load balancing strategies for structured networks have been proposed. Kaashoek and Stoica [11] introduced the concept of Virtual Server (VS): on joining the network, every peer self-assigns several addresses, each corresponding to a virtual position in the identifier space; only one address is used at a time, and the selection of that address depends on the observed load. Several enhancements have been proposed recently: a finer estimation of the number of virtual positions is proposed by Ledlie and Seltzer [12], while heterogeneity is treated by Chen et al. [5]. Even if traffic is balanced using VS, the problem of item popularity is not solved, as already observed.

To deal with item popularity, several caching algorithms have been proposed. The idea is to replicate an item on nodes other than the original responsible and to spread the entire load among all copies (original and cached) as fairly as possible; the problem is to find the optimal number of copies, the best places for them and the eviction policy that together make the replication scheme effective. Gopalakrishnan et al. [6] proposed to create a replica on each node of the path from the requester to the original owner, while Bianchi et al. [3] observed that replicas become less effective as they move far from the original position, and therefore proposed to place a replica only at the last hop. Rao et al. [14] replicate on the nodes of a k-ary tree rooted at the original resource responsible, where the children of a node are its k neighbours; the replication scheme is similar to a breadth-first filling, and popular items are expected to be replicated on more tree levels than non-popular ones. Shen [17] proposed to place replicas at nodes that have forwarded many queries for the item. Harvesf and Blough [8] proposed to place replicas on the Chord [19] ring so that they belong to disjoint paths, and Tewari and Kleinrock [20] attempted to balance the load by creating a number of replicas proportional to the skewness of the queries. It is also possible to reorganize the routing tables so that the load is balanced by properly routing the requests; this approach was proposed by Shen and Xu [18] and also implemented by Bianchi et al. [3].

Despite all these techniques, we found a way to generate requests so that caching strategies become ineffective. Our studies and results are presented in the next chapters.


Chapter 3

Caching algorithm: a case study

Our analysis of caching effectiveness on DHT-based P2P networks starts with a simple, yet useful, case study. The aim of this chapter is to show that caching strategies are not always effective if their behaviour is known in advance. We present a caching algorithm developed by Bianchi et al. [3] and a slightly modified version of it (using an LRU, Least Recently Used, eviction policy); in both cases we show that an ad-hoc sequence of requests can be generated to "cheat" the caching policy. Although the whole chapter is based on a Kademlia network, the arguments can easily be adapted to different DHTs.

3.1 The model

To describe the behavior of load balancing algorithms in Kademlia-based DHT networks, we must first provide a (simplified) model of Kademlia. Let m be the number of bits used for identifiers; the total number of nodes supported by the overlay is then 2^m. Suppose (without loss of generality) that the network is full, in the sense that every node with address n_i ∈ {0, 1, …, 2^m − 1} is connected (we will use either n_i or i to refer to a node, since the identifier uniquely determines the node in the overlay); in other words, the total number of nodes is exactly N = 2^m.

Let M be the total number of keys stored in the overlay; note that typically M >> N. This is clearly a problem, since keys and nodes share the same address space (i.e. every key j is identified by a unique hash value h_j ∈ {0, 1, …, 2^m − 1}), and it makes our "full network" hypothesis inconsistent with the real world. To further simplify the analysis, we will suppose that each node is responsible for exactly one key (i.e. M = N), and we will use the node address as the hash value of that key.

Figure 3.1: A simple Kademlia tree with m = 4

3.1.1 Routing model

The iterative lookup algorithm implemented in Kademlia [13] is also simplified: once a node generates a request, the request is routed towards its destination by choosing, at each step, a random node in the nearest "half world" in which the target node (the responsible for the requested resource) is placed.

Consider Figure 3.1 and suppose that node 8 wants to perform a lookup for the address 3. The first query message is sent to a randomly chosen node in the "half-tree" where 3 (the target) is placed (i.e. the tree containing nodes 0 through 7); suppose that the node chosen by 8 is 5. The request is forwarded to 5, which iteratively performs the lookup algorithm, choosing a random node in the "quarter-tree" where 3 is placed; suppose that 5 chooses node 2. At this point node 2 has no possible choice, since it is the (only) sibling of the target, so it will surely forward the request to 3, the target node. Strictly speaking, the lookup proposed in [13] performs all iterations on the source node¹; for our purposes this is equivalent, because we are modelling the load on the nodes of the overlay, which remains unchanged within a constant factor.

This routing model allows us to bound the hop number per request withthe following:

    H_{max} = \lceil \log |j - i| \rceil \qquad (3.1)

where i is the node that generates the request and j is the target value.

We suppose, moreover, that a route from n_i to n_j, once randomly created, remains the same across all requests; this is a good approximation, since the bucket entries in Kademlia tend to remain stable during an entire session (except for the bootstrapping period).

¹ Recall the Kademlia lookup procedure explained in section 2.2.2.
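The routing model is easy to state in code. The sketch below (our illustration of the rule of section 3.1.1, with hypothetical parameter values) repeatedly restricts itself to the smallest aligned subtree of the identifier space that contains the target but not the current node, and forwards to a random node inside it:

    import random

    def lookup_path(source, target, m, rng=random):
        """Simplified Kademlia routing: hop to a random node of the smallest
        half/quarter/... subtree holding the target, until the target is hit."""
        path, current, size = [source], source, 2 ** m
        while current != target:
            # shrink until current and target lie in different subtrees
            while size > 1 and current // size == target // size:
                size //= 2
            base = (target // size) * size   # aligned subtree of the target
            current = rng.randrange(base, base + size)
            path.append(current)
        return path

    random.seed(0)
    print(lookup_path(8, 3, m=4))            # a short path from node 8 to node 3

Each hop at least halves the size of the subtree still to be searched, matching the bound of (3.1).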

3.1.2 Request model

Requests are generated in the overlay in such a way that the load balancing protocol is heavily stressed. Time is divided into T intervals; at each interval t ∈ [0, T) every node decides whether or not to generate a request with probability p_r, and each generated request "hits" the j-th key with probability p_j. Once the request is generated, a route path is constructed starting from node i and ending at node j (note that we use as target the same hash as the key, since we are assuming M = N, so that the address spaces for identifiers and resources coincide). Some simplifications are used for tractability: first of all, nodes generate requests independently of each other; we assume that none of the probability distribution functions changes during the period [0, T); we suppose that the events "generating a request" and "generating a request for target j" are independent; finally, all the probabilities are assumed to be the same for every node of the overlay, so that properties studied at a single node extend to all the others.

From the above assumptions we can estimate the average number of requests generated at every node during the T cycles. Indicating with R_i the request generation process at the i-th node, the expected number of generated requests is:

    \mu_r = E[R_i] = p_r T \qquad (3.2)

which is the same for every node.

We can calculate the probability of generating a request for key j at node i, given that a request is generated; thanks to the independence of the two events, this simplifies to a product of probabilities:

    P(r_i = j) = p_r p_j \qquad (3.3)

At this point we are able to calculate the average load (without load balancing and caching) on the node n_j responsible for key j:

    L_j = T \, E\left[ \sum_{i=0}^{N-1} p_r p_j \right] = T N p_r p_j \qquad (3.4)
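The model is simple enough to check empirically. The following sketch (all parameter values are arbitrary choices for the demonstration) simulates T cycles of the request process and compares the measured load on a key with the prediction of (3.4):

    import random

    def simulate_load(N, T, p_r, p, rng):
        """Count the requests addressed to each key over T cycles, with each
        of the N nodes issuing a request with probability p_r per cycle and
        a request hitting key j with probability p[j]."""
        load = [0] * len(p)
        keys = range(len(p))
        for _ in range(T):
            for _node in range(N):
                if rng.random() < p_r:
                    j = rng.choices(keys, weights=p, k=1)[0]
                    load[j] += 1
        return load

    rng = random.Random(42)
    N, T, p_r = 64, 500, 0.1
    p = [0.5, 0.25, 0.25] + [0.0] * 61       # one very popular key
    load = simulate_load(N, T, p_r, p, rng)
    print(load[0], T * N * p_r * p[0])       # empirical vs. L_0 = T N p_r p_0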

An important parameter is also the average number of hops per request, which tells us how requests approach the target. Recall that (3.1) gives an upper bound on the number of hops; since our assumption is that every node generates requests independently of all the others, we can conclude that the average number of hops is m/2, that is, a node n_i that starts a request must traverse (on average) "half the world" before landing on the target. This parameter is important because it also indicates the average number of nodes involved in a request.

3.2 Analysis

Using our simplified model, we now analyse how requests are balanced across the nodes of the overlay. We start from a bare Kademlia algorithm in which no replication mechanism is implemented, so that a request for key j must reach the target n_j before it can be satisfied. In this analysis we denote by 0-cousins(n_j) the node whose identifier differs from n_j only in the least significant bit (i.e. the sibling of n_j); 1-cousins(n_j) denotes the nodes that share the m − 2 most significant bits with n_j and differ in the following bit (the least significant bit being irrelevant); k-cousins(n_j) is defined with the same scheme for all 0 ≤ k ≤ m − 1.

In our model, the requests arriving at different nodes for key j follow the law:

    L^{(j)}(n_i) = \begin{cases} L_j & \text{if } i = j \\ L_j / 2^{c+1} & \text{if } n_i \in c\text{-cousins}(n_j) \end{cases} \qquad (3.5)

where L_j is calculated in (3.4). The sibling of n_j (i.e. its 0-cousin) receives L_j/2 requests, the 1-cousins(n_j) receive L_j/4 requests each, and so on.

As a consequence, we can observe that item replicas should be placed on the closest possible "relatives" of j. For example, if we replicate the key on a node x ∈ 3-cousins(n_j), only 1/16 of the total requests will be intercepted by x, leaving n_j with 15/16 of the entire load L_j; hence it is not convenient to replicate across a routing path from the request generator to the target, except for the very last hops of the path.

Finally, let us calculate the distance:

    2^i \leq d(x, n_j) < 2^{i+1} \quad \text{if } x \in i\text{-cousins}(n_j) \qquad (3.6)

In the case of one single key it makes no sense to discuss a replication policy, since there is no eviction problem: every node x that sees more than C_th (cache threshold) requests for key j simply stores a replica of the key. Note that C_th could be calculated as a function of d(x, n_j) in order to maximize effectiveness and minimize the total number of cached copies.


3.2.1 Two keys analysis

We now turn our attention to the (more interesting) case where two (or more) keys are stored in the overlay; say the keys are j and k, managed by n_j and n_k respectively. Consider a node x: there are two possibilities, d(x, n_j) = d(x, n_k) and d(x, n_j) ≠ d(x, n_k). These two cases are equally interesting, since in neither one is the optimal replication policy uniquely determined: it can depend (heavily) on the probabilities p_j and p_k. To see this, consider the case p_j = 0.2 (p_k = 0.8) and let x be a 4-cousin of n_k and a 2-cousin of n_j; calculating L^{(j)}(x) and L^{(k)}(x) according to (3.5), the two loads are equal. More generally, the following equality holds:

    2^{c_j} p_k = 2^{c_k} p_j \qquad (3.7)

if the node is in c_k-cousins(n_k) and in c_j-cousins(n_j); (3.7) becomes:

    \log_2 \frac{p_k}{p_j} = c_k - c_j \qquad (p_j \neq p_k)

In our example c_k − c_j = 2 and p_k/p_j = 4 = 2². Note that while a node is always able to compute c_j and c_k (using only locally stored information), p_j and p_k are assumed to be unknown and can only be estimated by tracing the traffic "sniffed" by the node itself.

Algorithm 1 Caching algorithm proposed by Bianchi et al.

  /* after receiving C_th total requests */
  n_i ← local node
  n_j ← last request source node
  refresh all values w[o] for all objects stored at n_i
  if n_i is loaded then
    pop_obj ← most popular object stored by n_i
    n_c ← node that sent most requests for pop_obj
    send a cache request to n_c for pop_obj
  end if
  reset all counters

Consider now the replication algorithm proposed in [3], listed as Algorithm 1. The idea behind the algorithm is to replicate the item when the current node (which already has the item in its cache) is overloaded, placing the replica on the last hop that requested the item more than the others. The algorithm uses weighted counters to decide the replication threshold (this feature is not made explicit in Algorithm 1, but can be found in [3]).
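For concreteness, the control flow of Algorithm 1 can be sketched as follows (a Python paraphrase under our simplified model: the load test, the callback and all names are our own choices, and the weighting of the counters from [3] is omitted):

    from collections import Counter

    class CachingNode:
        """Replication trigger in the spirit of Algorithm 1."""

        def __init__(self, c_th, load_threshold, send_cache_request):
            self.c_th = c_th                      # requests between reorganizations
            self.load_threshold = load_threshold  # crude "am I loaded?" test
            self.send = send_cache_request        # callback: (neighbour, object)
            self.per_object = Counter()           # requests seen per object
            self.per_source = {}                  # object -> Counter of last hops

        def on_request(self, obj, last_hop):
            self.per_object[obj] += 1
            self.per_source.setdefault(obj, Counter())[last_hop] += 1
            if sum(self.per_object.values()) < self.c_th:
                return
            if sum(self.per_object.values()) >= self.load_threshold:
                pop_obj, _ = self.per_object.most_common(1)[0]
                n_c, _ = self.per_source[pop_obj].most_common(1)[0]
                self.send(n_c, pop_obj)           # push a replica down the path
            self.per_object.clear()               # reset all counters
            self.per_source.clear()

    node = CachingNode(c_th=4, load_threshold=4,
                       send_cache_request=lambda n, o: print("replicate", o, "to", n))
    for _ in range(2):
        node.on_request(obj=6, last_hop=7)
        node.on_request(obj=1, last_hop=7)
    # the fourth request trips the threshold and prints "replicate 6 to 7"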


Figure 3.2: A worst-case example for the algorithm of Bianchi et al.

We now show that Algorithm 1 behaves worse than expected. Consider the situation of Figure 3.2 and assume that the cache size at each node is c = 1 (every node can store one replica), that the total number of requests before cache reorganization starts is C_th = 2, and ignore the weights (β), which can only make the sequence longer, composed of several copies of the subsequences described below. Suppose that our request and routing models hold, and take the two routing examples shown in the figure: if n_13 looks for key 6, it first sends a message to n_7, which routes the message to the target n_6; similarly, if n_11 looks for key 1, it first contacts n_7, which routes the request to n_1. Now take the sequences of requests generated by n_11 and n_13 respectively:

    R_11 = ⟨1, 1, 1, …⟩
    R_13 = ⟨6, 6, 6, …⟩

The algorithm works as follows: when n_1 receives two consecutive requests for 1 from node n_7, it sends n_7 a replication request for its item, and n_6 acts in the same way. At this point n_7 should decide which replica to keep, 1 or 6; what happens (with high probability) is that n_7 keeps switching its cache content between the two values (this is because in [3] there is no policy for accepting or rejecting replications). In this case both n_1 and n_6 receive much more than their fair load: it is easy to prove that the load each receives is O(R), where R is the total number of requests generated for the owned item. In other words, the load experienced by n_1 and n_6 is the same (within a constant factor) as if no caching strategy were implemented.

Even if n_7 (somehow) decides to keep one of the keys 1 and 6 (probably 6, since it is the nearest key), then, while n_6 will receive fewer requests, n_1 must still reply to all the requests for item 1 generated by n_11, because the replica will never reach n_7 and, moreover, will never propagate along the route from n_11 to n_1. In this case n_1 receives O(R) requests. What we have proved is that the replication algorithm proposed in [3] is not robust without further improvements (for example, using a randomly generated route from source to destination for every request).
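The thrashing of n_7's single cache slot needs no simulation machinery at all; under the assumption (consistent with [3]) that every incoming replication is accepted, it is just:

    # n_7 has room for one replica (c = 1) and accepts every cache request;
    # alternating replications for keys 1 and 6 flip the slot on every arrival,
    # so neither key stays cached long enough to absorb any load.
    cache, switches = None, 0
    for key in [1, 6] * 10:               # replication requests arriving in turn
        if cache != key:
            cache, switches = key, switches + 1
    print(switches, "cache switches in 20 replication requests")   # 20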

We can generalize the above example in a very straightforward way. Suppose that a node n_c stands on two (or more) different paths and suppose (for simplicity) that it is the last-hop node for one of them; we say that this node is a crossroads. Then all nodes on any of the paths reaching n_c will not participate in the replication mechanism, except for the c keys cached at n_c (which will probably be the c keys nearest to n_c). Since we cannot (and do not want to) avoid crossroads in our network, we are now interested in the overlay performance as a function of the number and placement of crossroads. First observe that the problem arises when two keys are in the same "half-tree" and the cache size is c = 1; in the general case, M/2 keys are placed on the N/2 nodes of the first (respectively, last) half-tree. The problem could be completely solved by giving each node a cache of size M/2, but this requires O(M) storage at every node, which is neither scalable nor practical. Our analysis will first concentrate on calculating how many crossroads there can be in i-cousins(n_j), i = 0, 1, …, m − 1; after that we will calculate the worst-case behavior of any crossroads with respect to routing towards n_j; finally, we will be able to estimate the total payout of a node n_j with crossroads, giving a lower bound on the performance of the entire algorithm. Generalizing this reasoning to all nodes holding popular keys gives a better understanding of load balancing across the nodes of the overlay.

Number of crossroads calculation Consider now (for simplicity) a network with only two keys, j and k, managed respectively by nodes n_j and n_k; suppose moreover that the distance between these two nodes is less than 2^{m−1}, meaning that the two keys are mapped to nodes on the same half-tree (if not, no node is a crossroads for the two keys). Let R^{(j)} and R^{(k)} be the total number of requests for items j and k respectively.

In order to obtain the average number of crossroads we must first calculate the probability p_i^{(j)} that a node n_i stands on (at least) one route towards the node responsible for a generic key j. The average number of requests per node is:

    R_i^{(j)} = \frac{R^{(j)}}{2^{l+1}} + \delta \quad \text{if } n_i \in l\text{-cousins}(n_j),\ l = 0, 1, \ldots, m-2 \qquad (3.8)

where

    \delta = \frac{M}{2} \cdot \frac{1}{2^{m-1}} = \frac{M}{2^m} \simeq 0

is the load induced by the nodes standing on the other half-tree with respect to n_i.

At this point we can conclude that, for all n_i ∈ l-cousins(n_j):

    p_i^{(j)} = \begin{cases} 1/2^{l+1} & \text{if } 0 \leq l \leq m-2 \\ 0 & \text{if } l = m-1 \end{cases} \qquad (3.9)

If δ ≃ 0 then:

    \sum_{h=1}^{m-2} \frac{R^{(j)}}{2^h} \simeq R^{(j)} \;\Rightarrow\; \sum_{h=1}^{m-2} \frac{1}{2^h} \simeq 1

so what is defined in (3.9) is (approximately) a probability distribution function. We can assume that the probabilities p_i^{(j)} and p_i^{(k)} are independent, because the routes are randomly selected. Hence, indicating with p_i^{(j,k)} the probability that node n_i is a crossroads for keys j and k, we have:

    p_i^{(j,k)} = p_i^{(j)} \, p_i^{(k)} = \frac{1}{4} \cdot \frac{1}{2^{l_1+l_2}} \qquad (3.10)

if n_i ∈ l_1-cousins(n_j) and n_i ∈ l_2-cousins(n_k). The expected number of crossroads is:

    N_c = \frac{2^{m-2}}{2^{l_1+l_2}} \qquad (3.11)

For typical values of m, and for n_i close enough to n_j and n_k, this value can be high. For example, take m = 16 (which is very low, yet it represents a network with 64K nodes) and suppose that n_i is a 2-cousin of both n_j and n_k; then:

    N_c = \frac{2^{14}}{2^4} = 1024

Degradation of performance in face of crossroads We now calculate how many requests are forwarded to the original owner node when a node n_i is a crossroads for the keys j and k. As discussed above, the numbers of requests that n_i receives for items j and k are, respectively:

    \frac{R^{(j)}}{2^{l_{ij}+1}} \qquad \frac{R^{(k)}}{2^{l_{ik}+1}}

where l_{xy} indicates the cousin degree between n_x and n_y. An important relation between l_{xy} and d_{xy} (the distance between the two nodes in the underlying metric) is the following:

    l_{xy} + 1 = \lfloor \log_2 d_{xy} \rfloor \;\Rightarrow\; l_{xy} + 1 \leq \log_2 d_{xy} \;\Rightarrow\; \frac{1}{2^{l_{xy}+1}} \geq \frac{1}{2^{\log_2 d_{xy}}} = \frac{1}{d_{xy}} \qquad (3.12)

Calculating \bar{d}, the average distance between two keys, will give us a lower bound on E[1/2^{l_{xy}+1}], the average fraction of requests passing through a node x placed at distance d_{xy} from the target y. Suppose that N nodes are connected to the overlay and that M keys are mapped onto it. We focus our attention on the case M < N because, even if this is not the default situation, the number of popular keys is assumed to be smaller than the total number of nodes. In this case it is easy to see that:

    \bar{d} = \frac{N}{M}

Suppose now, for simplicity, that n_i stores a replica of the key for which it has received the greater number of requests, and define:

    R = \min\{R^{(j)}, R^{(k)}\}

We can now calculate the average number of requests that n_i does not satisfy (i.e. that are forwarded to nodes closer to the destination):

    L_i = \frac{R}{2^{l_{xy}+1}} \geq \frac{R}{\bar{d}} = \frac{R \cdot M}{N} \qquad (3.13)

where we have used (3.10) and (3.12). Equation (3.13) says that as R → N the number of forwarded requests grows; note however that R cannot exceed N/2 − 1, because of its definition as the minimum between R^{(j)} and R^{(k)}. Starting from equation (3.13) we developed the more general analysis presented in chapter 4.

3.2.2 LRU-algorithm analysis

Suppose now that Algorithm 1 still holds, but that the node receiving a replication request performs the (simple) pre-processing illustrated in Algorithm 2: the node decides whether or not to accept the replication based on a simple LRU eviction policy, that is, if the cache is full and j is not present, the object evicted from the cache is the one that was accessed farthest in the past.

Our analysis is now focused on proving that, even with this modification, Algorithm 1 still offers poor performance in the face of ad-hoc constructed request sequences. In the general situation, suppose that the number of exchanged keys is X + 1 and the cache size is X. Let 𝒳 = {k_0, k_1, …, k_X} be the set of all keys exchanged in the network; we say that a node n_i is an 𝒳-crossroads if for every k ∈ 𝒳 there exists a path to n_k traversing n_i.

Algorithm 2 Simple LRU pre-processing of a replication request

  /* on receiving a replication request for item j */
  if j ∉ C then
    if C is full then
      o ← index of the least recently accessed item in C
      C[o] ← j
    else
      add j to C
    end if
  end if
  refresh access information about item j

It should be clear that when the cache is smaller than the number of keys, eviction problems arise: not all keys can be cached at an intermediate node, and at least one of them must be out of the cache. The choice made by Algorithm 2 is what we call a least recently used eviction policy. We will restrict our analysis to the simple case of three keys, without loss of generality for our results.

Figure 3.3: A worst case for LRU eviction policy

Figure 3.3 shows a simple network with the 3 keys 1, 4 and 6. Note that node n_7 is a crossroads for all of those keys; even though in the figure n_7 is the last-hop node for all of them, the analysis can also be performed when this condition does not hold (i.e. when n_7 is an intermediate but not last-hop node). Suppose now that the cache size at n_7 is 2 (the number of keys minus 1) and consider the following request sequence received by node n_7:

    R_7 = ⟨4, 4, 6, 6, 1, 1, 4, 4, 6, 6, 1, 1, …⟩

It is easy to show that the cache is completely useless in this case: every request causes a cache miss and must therefore be routed to the owner node (or to a node close to it). All nodes thus receive all the incoming requests for the keys they own, cancelling all the load balancing benefits of the caching algorithm. Figure 3.4 represents the general situation for a cache of size 2 with 3 exchanged keys and a sequence like the one above; for our example k = 4, j = 6 and h = 1. Note that, at every iteration, the requested key is precisely the one missing from the cache, and therefore forwarding to the next hop is required.

Figure 3.4: Cache behavior of LRU eviction in the face of an ad-hoc request sequence
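The miss pattern can be reproduced with a few lines of Python (a sketch of our model: replicas arrive only after C_th = 2 misses for a key and are inserted with the LRU rule of Algorithm 2; all names are ours):

    from collections import OrderedDict

    def replay(requests, cache_size, c_th=2):
        """Replay the sequence seen by the crossroads n_7; count cache misses."""
        cache, misses, seen = OrderedDict(), 0, {}
        for key in requests:
            if key in cache:
                cache.move_to_end(key)       # a hit refreshes the LRU order
                continue
            misses += 1                      # miss: forwarded to the owner
            seen[key] = seen.get(key, 0) + 1
            if seen[key] >= c_th:            # the owner reacts with a replica
                seen[key] = 0
                if len(cache) >= cache_size:
                    cache.popitem(last=False)   # Algorithm 2: evict the LRU item
                cache[key] = True
        return misses

    seq = [4, 4, 6, 6, 1, 1] * 4             # the ad-hoc pattern R_7
    print(replay(seq, cache_size=2), "misses out of", len(seq))   # 24 out of 24

Every replica is installed exactly one step before the requests move on to the key it just evicted, so the cache never serves a single hit.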

3.3 Extensions to the model

Even though our analysis is based on a simplified version of the Kademlia DHT, the extension to real networks can be done in a very straightforward manner: instead of talking about k-cousins(n_i), we can extend the definition to k-neighbours(n_i). The important difference is that the Kademlia metric ensures that, given a node n_i and d > 0, there exists exactly one address x (in the entire address space) such that distance(n_i, x) = d; with the obvious definition, every k-neighbours(n_i) set would therefore contain at most one node. To avoid this problem we must adopt a different definition of the k-neighbours(n_i) set, one that takes into account the number of hops remaining to reach n_i instead of the real distance induced by the underlying metric. Note that this choice is perfectly consistent with the definition of k-cousins(n_i), in which the (maximum) number of hops needed to reach the target is k + 1 (since a 0-cousin of n_i still needs one hop to reach n_i).


3.4 Conclusions

The algorithm proposed in [3] seems to offer good performance in experimental evaluations; our goal, however, was to find a particular sequence (or pattern) of requests that forces the replication policy to fail. Having found a way to construct ad-hoc sequences that nullify all the efforts made by the algorithm suggests that the right way of implementing caching algorithms is more complex than the one proposed in [3], and it raises the suspicion that caching algorithms alone are not sufficient to guarantee load balancing among participants: we think the routing algorithm should also be designed so that caching can be effectively implemented. Most caching algorithms fail because their replication policy is weak on two main points.

1. Replications are triggered by a node sensing that it is loaded, while the node receiving the replication does not have enough information to decide, in an active way, whether or not to accept it. The only way it can enforce the replication policy is to manage its own cache in a "smart" way; however, we have shown above that a simple LRU policy is not sufficient.

2. When a node is “forced” by its successor on a path to cache an item, it must maintain, and eventually propagate, the stored items in order to force other nodes (coming before it on the path) to share the popular keys. This may seem a good policy, but it can turn out to be harmful: once a node has filled its cache, it has no way of propagating replicas for keys other than those it stores, which would be useful when popular keys are near the node.


Chapter 4

Lower Bound Analysis

The case study presented in chapter 3 suggested that caching algorithms can be useless when ad-hoc request sequences are generated; however, we still need to show that this conclusion does not depend on the specific caching algorithm and/or DHT implementation, by proving that a more general analysis leads to the same results. In this chapter we present a very general analysis of caching algorithm effectiveness. We will obtain a lower bound on the load experienced by some nodes in the overlay, proving that a simple caching algorithm is ineffective unless the routing algorithm is also modified to account for balance between nodes. The chapter is organized in two parts. The first presents a competitive analysis with an oblivious adversary on a kademlia-like DHT: when the adversary has no information about cached items (i.e. it does not know the content of the caches), we prove that there must exist a node that experiences a load Ω(√N). We then generalize the model so that our approach applies to several DHT algorithms sharing some properties. On this generalized model we again perform a competitive analysis, using an adaptive offline adversary and an oblivious randomized adversary; in both cases we prove that, even if caching is implemented, there must exist a node experiencing a load Ω(N / log N).

4.1 Introduction to competitive analysis

The analyses performed in this chapter are based on competitive analysis techniques. The idea of competitive analysis is to bound the performance of an online algorithm (i.e. the algorithm we want to study) when it is involved in a two-player game. The second player is called the adversary (or offline algorithm); the adversary tries its best to make the online algorithm perform as badly as possible.


During “the game” the online algorithm runs on input constructed by the adversary. It is common to give the adversary full information about the steps performed by the online algorithm; with this power the adversary should be able to generate arbitrarily bad inputs, testing how the online algorithm behaves in the worst case. When this is not enough to stress the online algorithm, the adversary can be further strengthened by allowing it to look at the internal state of the algorithm, so that the input sequence can be made even worse.

Depending on the capabilities granted, several classes of adversaries can be defined. We briefly describe here those that will be used in this work; a more detailed description, as well as a richer list of classes, can be found in [4].

Oblivious adversary The weakest adversary that can be used is called the oblivious adversary. The oblivious adversary knows nothing about the online algorithm except its input and the task it is going to perform. An oblivious adversary yields a rather loose bound on the performance of the online algorithm, but it is useful for bounding the behaviour of a wide class of algorithms regardless of the specific implementation.

Adaptive offline adversary The strongest possible adversary is called the adaptive offline adversary. This adversary knows everything about the online algorithm, including (if any) its random number generator. It can be used to bound the performance of the worst-case execution of the online algorithm.

Oblivious random adversary An oblivious adversary can also decide its “moves” during the game by randomization; we will use an oblivious adversary (i.e. one knowing nothing about the online algorithm) whose decisions are made in a random fashion.

4.2 Oblivious adversary

4.2.1 The model

For the first analysis of caching algorithms in DHTs we again start from a kademlia-like DHT; the model is quite similar to the one used in chapter 3. We assume that the overlay is “full”, with N = 2^m nodes and M = N stored keys. The routing tables of the single nodes are constructed in a very regular way.


For each bucket a single node is stored: the node whose address differs in exactly one bit. For example, for m = 4, node 0 contains links to nodes 1, 2, 4 and 8. We also suppose that links are bidirectional, that is, if n_j has a link to node n_i then n_i has (in the proper bucket) the same link to n_j.

Figure 4.1: Example of the routing table organization for the oblivious adversary analysis

Figure 4.1 shows an example of the routing tables for nodes 0, 1 and 2, where only the links between these nodes are explicitly depicted. A further hypothesis is that the routing tables do not change during the analysis, which is a typical behaviour of kademlia-like networks.
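As a minimal sketch (ours, under the model's assumptions), the routing table of a node is obtained by flipping each address bit in turn, which also makes the bidirectionality of the links immediate:

    def routing_table(node: int, m: int) -> list[int]:
        """Bucket b holds the unique node differing from `node` in bit b only."""
        return [node ^ (1 << b) for b in range(m)]

    print(routing_table(0, 4))  # -> [1, 2, 4, 8], as in the example above
    # Bidirectionality: if y appears in x's table, then x appears in y's table.
    assert all(0 in routing_table(y, 4) for y in routing_table(0, 4))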

4.2.2 The analysis

Let N be the number of nodes in the overlay and consider m neighbouring nodes holding the popular keys; we call this set the popular world. Suppose, without loss of generality, that the nodes of the popular world are numbered starting from 0, so that they are n_0, n_1, . . . , n_{m−1}; for calculation purposes we also suppose that m is a power of two. We first partition the entire set of nodes into subworlds in the following way: M_0 is the popular world, containing |M_0| = m nodes; M_1 is the set of the m nodes sharing the same prefix as M_0 (so |M_1| = m as well); in practice, M_1 contains the nodes n_m, n_{m+1}, . . . , n_{2m−1}. We now define the generic set M_i recursively:

• The size of M_i is twice the size of M_{i−1}, i.e. |M_i| = 2|M_{i−1}|.

• The nodes in M_i are the |M_i| = 2^{i−1} m nodes immediately following (with respect to the enumeration introduced in the model) the nodes in M_{i−1}.

The idea is to divide the entire network into subworlds sharing the same prefix (remember that m is a power of two) so that, with respect to the popular world M_0, they behave like routers: they receive requests from other worlds and forward them closer to the target.


Figure 4.2: Partition of the overlay into sub-worlds

Clearly, every node in every world can also send its own requests. Figure 4.2 shows an example of the partition explained above. Before computing the exact number of subworlds, recall that

|M_i| = 2|M_{i−1}|   ∀ i = 2, . . . , K    (4.1)

where K denotes the number of non-popular subworlds (kept distinct from the popular-world size |M_0| = m), so that there are K + 1 subworlds in total: M_0, M_1, . . . , M_K. Equation (4.1) will be useful when defining the adversary's request generation pattern.

4.2.3 Number of subworlds

We have defined each world M_i, but we have not yet calculated how many worlds our partition contains. This is, however, a simple task:

N = m + Σ_{i=1}^{K} 2^{i−1} m = m (1 + Σ_{j=0}^{K−1} 2^j) = m (1 + 2^K − 1) = m 2^K    (4.2)

In equation (4.2) we have simply imposed that the m nodes composing M_0, plus the nodes composing each subworld M_i, must total N. Setting m = Θ(√N), precisely m = 2^{n/2} = √N with N = 2^n, we can now calculate K:

K = log2 (N/m) = log2 N − log2 m = n − n/2 = n/2    (4.3)

Notice that K = Θ(log N).
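A quick numerical check of (4.2) and (4.3), for a hypothetical n = 10:

    import math

    n = 10
    N = 2 ** n                   # number of nodes in the overlay
    m = 2 ** (n // 2)            # |M_0| = sqrt(N), size of the popular world
    K = int(math.log2(N // m))   # number of non-popular subworlds, eq. (4.3)

    sizes = [m] + [2 ** (i - 1) * m for i in range(1, K + 1)]
    assert sum(sizes) == N       # the subworlds partition the overlay, eq. (4.2)
    print(K, sizes)              # -> 5 [32, 32, 64, 128, 256, 512]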

4.3 Adversary pattern

We now explain how the adversary chooses the order (which is, however, unimportant) and the targets of the requests generated by every single node in the overlay.


We give this pattern for every subworld M_i with i ≠ 0. Consider the subworld M_i: since m is a power of two, |M_i| is a power of two as well, so we can partition M_i into M_i^a and M_i^b with:

• M_i^a ∪ M_i^b = M_i;

• M_i^a ∩ M_i^b = ∅;

• |M_i^a| = |M_i^b|;

• M_i^a contains the |M_i|/2 nodes of M_i with lowest ID and M_i^b contains the remaining |M_i|/2.

At this point we can explain how every single node behaves.

1. ∀x ∈ M_i^b: x generates a request for a popular key k ∈ M_0. All these requests are forwarded to the next hop on the path to M_0, and hence they land in M_{i−1}. Note that, by construction of the links, each node y ∈ M_{i−1} receives exactly one request from M_i^b (i ≥ 2).

2. ∀x ∈ M_i^a: x generates a request for a non-popular key k ∉ M_0. All these requests land in the union of the M_j with j = 0, . . . , i − 2. Even though we are not interested in these requests, it is important to note that they are additive terms in the final computation of the load experienced by the nodes.
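The following sketch (ours; popular_key and non_popular_key are hypothetical stand-ins for the adversary's concrete key choices) generates one round of this pattern:

    def adversary_round(subworlds, popular_key, non_popular_key):
        """One round: every node outside M_0 issues exactly one request.
        The half of each subworld with the highest IDs (M_i^b) targets a
        popular key; the other half (M_i^a) targets a non-popular one."""
        requests = []
        for M_i in subworlds[1:]:               # skip M_0: it generates nothing
            half = len(M_i) // 2
            M_a, M_b = M_i[:half], M_i[half:]   # split by node ID
            requests += [(x, popular_key()) for x in M_b]
            requests += [(x, non_popular_key()) for x in M_a]
        return requests

With the subworld sizes of (4.2), exactly half of the N − m generating nodes issue popular requests; this is the quantity computed next.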

4.3.1 Number of requests for the popular world

Having defined the request generation pattern, we can calculate how many requests are generated for the popular world M_0. The value is easily obtained by observing that every subworld M_i except M_0 generates |M_i^b| = |M_i|/2 requests for the popular world; since the size of M_0 is m = √N, the total number of requests for popular keys is:

L_pop = (N − m)/2 = (N − √N)/2 = Θ(N)    (4.4)

4.4 Load analysis

To analyze the load experienced by the nodes of the popular world, we first find out how many of the L_pop total requests land on M_0 in a single round (i.e. one application of the pattern described above).


We define L_{M_i}, i = 0, . . . , K, as the total number of requests that a node in M_i receives; this number depends only on the nodes in M_j for j > i. We now calculate the load L_{M_i} of a single node in the world M_i; note that, thanks to the symmetry of our model, we can study a single node of M_i as representative of the entire set of nodes belonging to that world. The load of x ∈ M_i is the sum of all the loads forwarded to x. Since the only nodes that can forward requests to x are those in M_j for j > i, we have:

L_{M_i} ≤ 1 + L_{M_{i+1}} + L_{M_{i+2}} + · · · + L_{M_{K−1}}    (4.5)

where L_{M_K} = 0 and the first term on the right-hand side is the single fresh request that a node of M_i receives from M_{i+1}^b. Considering the worst case of equation (4.5), the load at world M_i at most doubles with respect to the load at world M_{i+1}, that is:

L_{M_i} ≤ 2 L_{M_{i+1}}    (4.6)

The base case of equation (4.6) is i = K − 1 (note that L_{M_K} = 0, hence L_{M_{K−1}} ≤ 1); unrolling the recursion, the load experienced by a node in M_i is:

L_{M_i} ≤ 2^{K−1−i} = √N / 2^{i+1}    (4.7)

From equation (4.7), L_{M_1} ≤ √N/4 < √N. This means that every node x ∈ M_1 receives a number of requests strictly smaller than the number of keys held by the popular world M_0. Suppose now that we are “sitting” on a node of M_1: we see a total of L_{M_1} requests, and the adversary can choose these requests to be pairwise distinct, forcing each of them to miss the cache and to be forwarded directly towards M_0. In other words, we have proven that caching algorithms are ineffective on the nodes of M_1.

What we proved above is that all the L_pop generated requests reaching M_1 must be forwarded to M_0; we have also seen in (4.4) that their number is Θ(N). Since each of the √N nodes of M_1 receives the same number of requests (by the symmetry of the connections in our model), Θ(N) requests must be forwarded to the √N nodes of M_0; hence there must exist at least one node x ∈ M_0 whose experienced load is Ω(√N).
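A numerical check of the chain (4.4)–(4.7), again for a hypothetical n = 10:

    import math

    n = 10
    N, m, K = 2 ** n, 2 ** (n // 2), n // 2   # nodes, |M_0| = sqrt(N), subworlds

    L_M1 = 2 ** (K - 2)                       # bound (4.7) for i = 1
    assert L_M1 == m // 4 and L_M1 < m        # a node of M_1 sees < sqrt(N) requests

    L_pop = (N - m) // 2                      # eq. (4.4)
    print(L_pop / m)                          # -> 15.5, about sqrt(N)/2 per popular node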

4.5 Generalized model

Our analysis proves that the effectiveness of caching algorithms can be lower bounded, in the sense that load balancing among the nodes is not achieved. Let


L be the maximum load; we will prove that L = Ω(N / log2 N) when only the load generated by serving resources is considered and the routing load is ignored. We first define some topological properties of the overlay, useful both to construct the request generation pattern and to perform the analysis.

Our approach resorts to competitive analysis techniques, where the online algorithm is represented by the caching strategy and the adversary generates requests from one node to another¹. In order to apply our analysis, the network must enjoy certain topological properties. It is easy to verify that almost all DHTs proposed in the literature [19, 13, 16, 21, 10] satisfy the properties below and are, therefore, subject to our analysis.

Definition 1 We say that a DHT owns the neighbourhood property if there exists a subset M_0 of neighbouring nodes (which we call the popular world) such that, for every route starting at x ∉ M_0 and leading to y ∈ M_0, after passing for the first time through a node z ∈ M_0 all subsequent nodes are still in M_0.

Definition 2 We say that a DHT owns the diameter property if, for every node x, each route from x to y ∈ M_0 passes through (strictly) fewer than |M_0| nodes not belonging to M_0.

The basic idea is to hit as few nodes as possible (those in M_0) with as many requests as possible. The number of hit nodes cannot be too small, otherwise the caching strategy would build a “shield” of cached copies in front of them; nor can the number of generated requests grow arbitrarily, since it is bounded by the number of nodes N. This tradeoff guided the definition of the adversary pattern. We denote by M_i, i = 1, . . . , m, the set of nodes that are i hops away from M_0, and we suppose (without loss of generality) that the size of M_i (denoted |M_i|) is even for all i. Similarly to what we did in section 4.2, the pattern is created by first partitioning each set M_i into M_i^a and M_i^b with |M_i^a| = |M_i^b|; request generation is defined as follows:

• ∀x ∈ M_i^b: x generates a request for a popular key k ∈ M_0;

• ∀x ∈ M_i^a: x generates a request for a non-popular key k ∉ M_0.

Nodes in M_0 are excluded from the above pattern. An important parameter is the number of requests directed to the popular world:

L_pop = (N − |M_0|)/2    (4.8)

which is proportional to N. If no caching strategy were implemented, all the L_pop requests would have to be served by nodes of M_0, leading to a load Ω(N/|M_0|). We will show that this load is attained even when caching is in place.

¹Recall our assumption that each node owns exactly one key.


4.6 Proof of the bound

Our proofs resort to the competitive analysis approach: the online algorithm is represented by the caching strategy and the adversary decides how requests are generated. We perform two different analyses depending on the type of adversary: adaptive offline and oblivious randomized. To prove that L = Ω(N/|M_0|) we have to properly set the size of the popular world M_0 and the way in which the adversary decides the target nodes.

The key idea is to prove that, regardless of the adopted strategy, the online algorithm cannot take advantage of cached items. This guarantees that all requests reach M_0, so that the bound is obtained by dividing the entire load calculated with (4.8) by |M_0|.

4.6.1 Adaptive offline adversary

In this first case we give the adversary the ability to look at all caches before deciding the target resource. For this analysis we set |M_0| = log2 N and cache size c = 1, that is, every node can store a single replica. The adversary behaves as follows. First, the generating node is chosen among those not yet considered during the current round²; then a path leading into M_0 is selected. Thanks to the diameter property, we know that the number of intermediate nodes is at most |M_0| − 1. By looking at the intermediate nodes, the adversary learns all of the at most |M_0| − 1 cached items. Since there are fewer cached items than popular keys, there must exist a key k, owned by some x ∈ M_0, that is not cached at any of the intermediate nodes. The adversary chooses k as the target of the next request.

With the above pattern we are sure that a request for the popular world really reaches it. By the neighbourhood property, such a request must be served by a node y ∈ M_0, which proves that the load on M_0 is exactly L_pop. Using (4.8) with |M_0| = log2 N, and assuming that the load is evenly spread over the nodes of the popular world³, we conclude that:

L ≥ L_pop / |M_0| = (N − log2 N) / (2 log2 N)    (4.9)

hence L = Ω(N / log2 N).

²We limit our analysis to a single round, in which N − |M_0| different nodes each perform a single query. The generalization to more rounds is straightforward.

³Note that this is the best case for balancing among the nodes of M_0.
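Returning to the adversary just described, its key-selection step can be sketched as follows (ours; path_to_popular and cache_of are hypothetical oracles exposing, respectively, the intermediate nodes of a route into M_0 and the single key cached at a node, since c = 1):

    def pick_uncached_key(x, popular_keys, path_to_popular, cache_of):
        """Choose a popular key cached at no intermediate node of x's route.
        Such a key exists: by the diameter property the path holds at most
        |M_0| - 1 cached items, while there are |M_0| popular keys."""
        cached = {cache_of(z) for z in path_to_popular(x)}
        for k in popular_keys:
            if k not in cached:
                return k          # every hop misses: the request reaches M_0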


4.6.2 Oblivious randomized adversary

The above proof relies heavily on the (strong) assumption that the adversary knows everything about the cache contents. We now perform the analysis with a weaker adversary, known as the oblivious randomized adversary, which is completely blind and cannot look at cached items. We set |M_0| = qc log2 N, where c is the cache size and q is a constant such that qc > 1, and we allow the adversary only to choose a key of M_0 uniformly at random. We use a slightly modified diameter property, requiring that every path to M_0 contains at most log2 N − 1 intermediate nodes⁴.

We first need to calculate the expected number of requests reaching M_0 when the adversary chooses the target uniformly at random. A request can be stopped only by a cached copy of its target key at one of the intermediate nodes of its path, which together hold at most c(log2 N − 1) cached items out of the |M_0| = qc log2 N popular keys. The hit probability p therefore satisfies:

p ≤ c (log2 N − 1) / (qc log2 N) < 1/q    (4.10)

so that 1 − p > (q − 1)/q is the probability that a request reaches the popular world. The whole process is equivalent to performing L_pop Bernoulli experiments, for which the expectation is:

E[L_{M_0}] = L_pop (1 − p) > L_pop (q − 1)/q    (4.11)

Combining (4.8), (4.10) and (4.11), and spreading the expected load E[L_{M_0}] = Θ(N) over the |M_0| = qc log2 N popular nodes, gives:

L = Ω(N / (qc log2 N))    (4.12)

where L is now to be interpreted as the expected maximum load.
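A small Monte Carlo sketch (ours, with hypothetical parameters) of the bound in (4.10): even when a path caches the worst-case c(log2 N − 1) popular keys, a blindly chosen popular key evades all of them with probability above (q − 1)/q:

    import math, random

    N, q, c = 2 ** 16, 2, 1
    keys = list(range(q * c * int(math.log2(N))))   # |M_0| = qc log2 N popular keys
    cached_per_path = c * (int(math.log2(N)) - 1)   # items cached along one path

    trials, reached = 100_000, 0
    for _ in range(trials):
        cached = set(random.sample(keys, cached_per_path))
        if random.choice(keys) not in cached:
            reached += 1                            # the request gets through to M_0
    print(reached / trials, ">", (q - 1) / q)       # ~0.53 > 0.5 for q = 2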

4.7 Conclusions

In this chapter we presented our analyses of caching effectiveness; the results obtained are summarized in Table 4.1. It is important to note that each bound calculated above is (within a constant factor) the same we would have obtained if no caching algorithm were implemented. This is a strong indication of the failure of caching algorithms and of the necessity of a more general approach to load balancing.

⁴This property is very similar to the one given in Definition 2, but it is easy to verify that it is a stronger assumption, since |M_0| − 1 = qc log N − 1 ≥ log N − 1 for qc > 1.


Table 4.1: Summary of obtained bounds

Adversary type          Subworld dimension    Maximum load
--------------------    ------------------    -------------
Oblivious               √N                    Ω(√N)
Adaptive offline        Θ(log N)              Ω(N / log N)
Oblivious randomized    Θ(log N)              Ω(N / log N)

Another key observation is that all the results discussed are independent of the model we provided at the beginning. In other words, our assumptions are useful only for calculation purposes and do not affect the final results. To support this claim, let us discuss all the hypotheses made throughout this chapter.

1. Full network: this is useful only to define the request generation pattern; if some nodes are missing from the overlay, the links to those nodes can be replaced with links to other nodes from the same bucket.

2. M < N allows us to cope with the hypothesis of a full network (as discussed in chapter 3); however, it would be sufficient that there exist Θ(N) keys with Θ(|M_0|) of them being popular to validate our analyses.

Another important assumption is that the popular items are placed on nodes that are neighbours. This could seem an overly restrictive hypothesis; however, with a consistent hash function we have no control over the positions of the stored items and, theoretically, a situation with popular items very close to each other is possible. Moreover, the pattern defined above is not only useful for bounding caching effectiveness: it can also be used to mount Denial of Service (DoS) attacks on the nodes of the popular world. Such attacks could partially threaten the usability and functionality of the entire network.


Conclusions and future directions

DHTs are widely used algorithms for constructing overlay P2P networks; however, they rely heavily on consistent hashing to balance load among the participants. The net result is that, when used in the real world, DHTs can offer poor performance due to the unbalanced popularity of the stored items. Many load balancing algorithms have been proposed to cope with this issue; yet even when experimental results show that the proposed methods improve overall system performance, we have proved that none of these techniques is completely effective: each is subject to bad request patterns that can completely nullify it.

The failure of caching methods for load balancing is mainly due to their lack of integration with the routing procedure. The routing procedure determines zones of the network that contain neighbouring nodes; such nodes can be heavily queried together in order to increase the overall load experienced. The key observation is that our analysis holds as long as the neighbourhood and diameter properties are satisfied by the DHT.

Future implementations of DHTs and caching, hence, cannot be treated as disjoint problems; the main contribution of this work is to make explicit the strong correlation between these two aspects. If this relationship is ignored, it will probably be impossible to create a completely efficient DHT implementation. The ideal solution would be DHT overlay networks in which no set of nodes can be a neighbourhood for all the nodes of the overlay; to the best of our knowledge there is no work on this topic, and we believe this should be the future direction of research in the DHT field.

DHT algorithms able to self-balance load among all connected nodes would further increase the intrinsic scalability of the network, as well as the efficiency of all applications using the underlying infrastructure. Internet traffic could also decrease if the overlay is balanced, leading to a better utilization of bandwidth, which is a scarce yet wasted resource.


On the other hand, P2P applications are growing at a fast pace, and the possible uses of a reliable, efficient and secure P2P platform are countless. Many research communities still work on DHT-related topics such as range queries and multi-attribute queries; all of these aspects could gain importance and applicability on top of completely self-balancing DHT networks. These features could lead to a new generation of DHTs supporting all kinds of search primitives, becoming a real base for developing distributed applications such as storage and databases and, of course, all traditional client-server applications: web servers, domain name servers, mail servers, and so on.


Appendix A

Traces of ED2K server

A.1 Experimental results

To validate our analyses we need to show that the results presented in this thesis can also be observed in practical scenarios.

A.1.1 Experimental data

The collected data refers to an ed2k server running 24 hours per day from December 22nd, 2008 to July 14th, 2009. Since we were interested only in the popularity of the top-ranked search keys, we first purged all non-ASCII queries and all queries with more than a single token. After this, we further limited the analyzed data to the first 1024 and 64k search keys. Table A.1 contains the most interesting measures extracted from the traced data. We have assumed that each item is owned by a single node, hence the number of items considered is also the number of nodes in the overlay, N. The “Number of queries” row counts how many queries were generated in total; this is the entire load that the nodes should ideally share equally.

Table A.1: Summary of traced data

                             1k        64k
-------------------------    ------    -------
Number of items              1024      65536
Number of queries            862789    1966554
Items with 50% of load       142       1613
Nodes with 50% of load       13.87%    2.46%
Load at first log N nodes    13.622%   7.963%


We also calculated how many items are involved in 50% of the total queries; this value gives a rough indication of the skewness. As reported in Table A.1, most of the load is managed by a very small fraction of the total number of nodes: in the 64k case, only 2.46% of the nodes would serve 50% of the load. Finally, another interesting measure is the percentage of load globally experienced by the log N most loaded nodes; this measure is particularly relevant in our case, since the previous analyses showed that Θ(log N) nodes can receive a great amount of load.

Figure A.1: Frequency of the top 1024 ranked queries for the ed2k trace

Figure A.1 shows the distribution of popularity for the top 1024 keys queried during our traces. The figure shows a clear unfairness in the requested keys, validating once again our assumption of a Zipf-like distribution for item popularity. The analysis presented in chapter 4 relied strongly on the hypothesis that the first c log N ranked keys receive most of the load; Table A.1 and Figure A.2 validate this hypothesis. Once we eliminate the first ranked search key (which is considerably more popular than the others), Figure A.2 shows a bunch of keys (all around a relative frequency of 0.01) that receive almost the same load. Since Figure A.2 is a close-up of the top 50 positions of Figure A.1, we conclude that our hypothesis of approximate fairness among the most requested keys is also experimentally verified, and hence that our model gives a good mathematical description of real scenarios.


Figure A.2: Frequency of the top 50 ranked queries for the ed2k trace

We finally estimated the exponent of the Zipf-like distribution for our traced data. Starting from the simplest version of the Zipf distribution:

r(i) = K / i^α    (A.1)

we searched for the value of α that best fits our experimental data. K was set so that r(1) = 1, and a minimum squared distance criterion was then used to minimize the distance between the theoretical values given by (A.1) and the interpolated real data. The result is that the distance is minimized for α = 0.66, which is quite coherent with other works in the literature [7].
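A minimal sketch of this fitting procedure (ours; the freq array below is synthetic placeholder data, not the thesis trace):

    import numpy as np

    ranks = np.arange(1, 51)
    # Synthetic stand-in for the observed relative frequencies, with r(1) = 1.
    freq = ranks ** -0.66 + np.random.normal(0.0, 0.005, ranks.size)

    # Grid search for the alpha minimizing the squared distance to (A.1).
    alphas = np.linspace(0.1, 2.0, 191)
    sq_dist = [np.sum((freq - ranks ** -a) ** 2) for a in alphas]
    print(f"best-fit alpha = {alphas[np.argmin(sq_dist)]:.2f}")  # ~0.66 here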


Bibliography

[1] The emule project: http://www.emule-project.net.

[2] The gnutella protocol specification: http://wiki.limewire.org/index.php.

[3] S. Bianchi, S. Serbu, P. Felber, and P. Kropf. Adaptive load balancing for DHT lookups. In Computer Communications and Networks, 2006. ICCCN 2006. Proceedings. 15th International Conference on, pages 411–418, Oct. 2006.

[4] Allan Borodin and Ran El-Yaniv. Online Computation and Competitive Analysis. Cambridge University Press, New York, NY, USA, 1998.

[5] Zhi Chen, Guowei Huang, Jing Dong Xu, and Yang Yang. Adaptive load balancing for lookups in heterogeneous DHT. In Embedded and Ubiquitous Computing, 2008. EUC '08. IEEE/IFIP International Conference on, volume 2, pages 513–518, Dec. 2008.

[6] V. Gopalakrishnan, B. Silaghi, B. Bhattacharjee, and P. Keleher. Adaptive replication in peer-to-peer systems. In Distributed Computing Systems, 2004. Proceedings. 24th International Conference on, pages 360–369, 2004.

[7] Ashish Gupta, Peter Dinda, and Fabian Bustamante. Distributed popularity indices. In Proceedings of ACM SIGCOMM, 2005.

[8] C. Harvesf and D.M. Blough. The effect of replica placement on routing robustness in distributed hash tables. In Peer-to-Peer Computing, 2006. P2P 2006. Sixth IEEE International Conference on, pages 57–6, Sept. 2006.

[9] M. Hefeeda and O. Saleh. Traffic modeling and proportional partial caching for peer-to-peer systems. Networking, IEEE/ACM Transactions on, 16(6):1447–1460, Dec. 2008.


[10] M. Frans Kaashoek and David R. Karger. Koorde: A simple degree-optimal distributed hash table. In Kaashoek and Stoica [11], pages 98–107.

[11] M. Frans Kaashoek and Ion Stoica, editors. Peer-to-Peer Systems II, Second International Workshop, IPTPS 2003, Berkeley, CA, USA, February 21–22, 2003, Revised Papers, volume 2735 of Lecture Notes in Computer Science. Springer, 2003.

[12] J. Ledlie and M. Seltzer. Distributed, secure load balancing with skew, heterogeneity and churn. In INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. Proceedings IEEE, volume 2, pages 1419–1430, March 2005.

[13] Petar Maymounkov and David Mazieres. Kademlia: A peer-to-peer information system based on the XOR metric. In IPTPS '01: Revised Papers from the First International Workshop on Peer-to-Peer Systems, pages 53–65, London, UK, 2002. Springer-Verlag.

[14] Weixiong Rao, Lei Chen, Ada Wai-Chee Fu, and YingYi Bu. Optimal proactive caching in peer-to-peer network: analysis and application. In CIKM '07: Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, pages 663–672, New York, NY, USA, 2007. ACM.

[15] Sylvia Ratnasamy, Paul Francis, Scott Shenker, and Mark Handley. A scalable content-addressable network. In Proceedings of ACM SIGCOMM, pages 161–172, 2001.

[16] Antony I. T. Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Middleware '01: Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg, pages 329–350, London, UK, 2001. Springer-Verlag.

[17] Haiying Shen. EAD: An efficient and adaptive decentralized file replication algorithm in P2P file sharing systems. In Peer-to-Peer Computing, 2008. P2P '08. Eighth International Conference on, pages 99–108, Sept. 2008.

[18] Haiying Shen and Cheng-Zhong Xu. Elastic routing table with provable performance for congestion control in DHT networks. In ICDCS '06: Proceedings of the 26th IEEE International Conference on Distributed


Computing Systems, page 15, Washington, DC, USA, 2006. IEEE Computer Society.

[19] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. In SIGCOMM '01: Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 149–160, New York, NY, USA, 2001. ACM.

[20] S. Tewari and L. Kleinrock. Proportional replication in peer-to-peer networks. In INFOCOM 2006. 25th IEEE International Conference on Computer Communications. Proceedings, pages 1–12, April 2006.

[21] B.Y. Zhao, Ling Huang, J. Stribling, S.C. Rhea, A.D. Joseph, and J.D. Kubiatowicz. Tapestry: a resilient global-scale overlay for service deployment. Selected Areas in Communications, IEEE Journal on, 22(1):41–53, Jan. 2004.


List of Figures

2.1 Example of the Chord ring with m = 4
2.2 Example of a logical kademlia tree with m = 4
3.1 A simple kademlia tree with m = 4
3.2 A worst case example for the algorithm of Bianchi et al.
3.3 A worst case for the LRU eviction policy
3.4 Cache behavior on LRU eviction in face of an ad-hoc request sequence
4.1 Example of the routing table organization for the oblivious adversary analysis
4.2 Partition of the overlay into sub-worlds
A.1 Frequency of the top 1024 ranked queries for the ed2k trace
A.2 Frequency of the top 50 ranked queries for the ed2k trace


List of Tables

4.1 Summary of obtained bounds
A.1 Summary of traced data
