Performance of Random Walks in One-Hop Replication Networks

Random walks are attracting considerable attention from the networking research community. They are the basis of many proposals that address a variety of network-related problems, such as resource location, network construction and node sampling. This interest in random walks is justified by their inherent properties. They are very simple to implement, as nodes only require local information to take routing decisions. Random walks also demand little processing power and bandwidth, and they are very resilient to changes in the network topology. Here, we quantify the effectiveness of random walks as a search mechanism in one-hop replication networks: networks where each node knows its neighbors' identities/resources, and so can reply to queries on their behalf. Our model focuses on estimating the expected average search time of the random walk by applying network queuing theory. To do this, we must first provide the expected average search length, which is computed from estimations of the expected average coverage at each step of the random walk. The model takes into account the revisiting effect: as the random walk progresses, the probability of arriving at nodes already visited increases, which affects how the network coverage evolves. That is, we do not model the coverage as a memoryless process. Furthermore, we conduct a series of simulations to evaluate, in practice, the above-mentioned metrics. Our results show a very close correlation between the analytical and the experimental results.


Introduction
Random walks are a mechanism to route messages through a network. At each hop of the random walk, the node holding the message forwards it to some neighbor chosen uniformly at random. Random walks have interesting properties: they produce little overhead and network nodes require only local information to route messages. In turn, this makes random walks resilient to changes in the network structure. Thanks to these features, random walks are useful for different applications, like routing, searching, sampling and self-stabilization in diverse distributed systems such as Peer-to-Peer (P2P) and wireless networks [1][2][3][4][5][6][7][8][9][10].
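The forwarding rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `random_walk`, the adjacency-list representation `adj` and the small example graph are assumptions made here, not part of the original work.

```python
import random

def random_walk(adj, source, hops, rng=None):
    """Return the sequence of nodes traversed by a pure random walk.

    adj maps each node to the list of its neighbors (undirected graph).
    """
    rng = rng or random.Random()
    path = [source]
    for _ in range(hops):
        # Only local information is used: the neighbor list of the current node.
        path.append(rng.choice(adj[path[-1]]))
    return path

# Hypothetical example: a triangle 0-1-2 with node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
path = random_walk(adj, source=0, hops=10, rng=random.Random(1))
```

Each hop costs a single uniform choice among the neighbors of the current node, which is why no global knowledge of the topology is needed.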
Past works have addressed the study of random walks. Some of this research has focused on the coverage problem, trying to find bounds for the expected number of hops taken by a random walk to visit all vertices (nodes) in a graph¹ $G$ ($C_G$) [11][12][13][14]. Results vary from the optimal $C_G$ of complete graphs, $\Theta(n \log n)$ [11] (where $n$ is the number of vertices), to the worst case found in the lollipop graph, $\Theta(n^3)$ [15]. Barnes and Feige [16] generalize this bound to the expected number of hops needed to cover only $f < n$ of the vertices of the network, which they find to be $\Theta(f^3)$. Other works are devoted to finding bounds on the expected number of steps before a given node $j$ is visited starting from node $i$ ($H_{i,j}$). For example, it is known that the upper bound for $H_{i,j}$ is $\Theta(n^3)$ [17]. Many of these results are based on the study of the properties of the transition matrix $P$ and the adjacency matrix $A$ in spectral form [18].
The previous results are used in several works to discuss the properties of random walks in communication networks. Gkantsidis et al. [19] apply them to argue that random walks can simulate random sampling on P2P networks, a property that in their opinion justifies the 'success of the random walk method' when proposed as a search tool [3] or as a network construction method [9]. Adamic et al. [20] study the search process by random walks in power-law networks applying the generating function formalism. This work seems deeply inspired by a previous contribution of Newman et al. [21], who study the properties (mean component size, giant component size, etc.) of random graphs with arbitrary degree distribution. This paper introduces a study of random walks from a different perspective. It does not study the formal bounds on the number of hops needed to cover the network. Instead, it tries to estimate the efficiency of the random walk as a search mechanism in communication networks, applying network queuing theory. It takes into account the bounded processing capacities of the nodes of the network and the load introduced by the search messages, which are routed using random walks. To obtain this load, we need to estimate first the average search length, which in turn is computed from the expected average coverage: the average number of different nodes covered at each hop of the random walk. A distinguishing feature of our work is that, as in the case of Adamic et al. [20], it deals with a scenario that has not been very exhaustively explored although, in our opinion, it is quite interesting in the communications field: one-hop replication networks.
¹ The term time is commonly used in previous works to refer to the number of hops of the random walk (that is, its length). Thus, for example, $C_G$ is often called the cover time. However, in this work we use the term time to refer to the duration of the random walk. To avoid confusion, from now on the term time will only denote the physical magnitude.

One-hop Replication
One-hop replication networks (also called lookahead networks [22]) are networks where each node knows the identity of its neighbors and so can reply on their behalf. Hence, to find a certain node by a random walk it suffices to visit any of its neighbors. This feature is present for example in social networks, where to find some person it is usually enough to locate any of his or her friends [20]. Also, certain proposals to improve the resource location process in P2P systems [2,23] (some based on random walks) assume that each node knows the resources held by its neighbors, so to discover some resource (such as a file or a service) it suffices to visit any of the neighbors of the node(s) holding it.
In one-hop replication networks, when the random walk visits some node i we say it also discovers the neighbors of i. Hence, we will use two different terms to refer to the coverage of the random walk. We denote by visited nodes those that have been traversed by the random walk, and by covered nodes the visited nodes and their neighbors. See Figure 1 for an illustrative example.
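The distinction between visited and covered nodes can be made concrete with a short Python sketch. The function name `walk_coverage` and the example path graph are hypothetical and chosen only for illustration.

```python
import random

def walk_coverage(adj, source, hops, rng=None):
    """Track the visited and covered node sets of a random walk.

    visited: nodes traversed by the walk.
    covered: visited nodes plus all of their neighbors (one-hop replication).
    """
    rng = rng or random.Random()
    current = source
    visited = {source}
    covered = {source, *adj[source]}
    for _ in range(hops):
        current = rng.choice(adj[current])
        visited.add(current)
        covered.add(current)
        covered.update(adj[current])  # neighbors are discovered, not visited
    return visited, covered

# Hypothetical example: the path graph 0 - 1 - 2 - 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
visited, covered = walk_coverage(adj, source=0, hops=1, rng=random.Random(0))
```

In this example the single hop necessarily leads from node 0 to node 1, so the walk has visited two nodes but covered three, since node 2 is discovered through node 1.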

Previous Work and the Revisiting Effect
There is some research work related to the characterization of random walks in one-hop replication networks. In [24] the authors prove that in the power-law random graph the number of hops needed by a random walk to discover the graph is sublinear (faster than coupon collection, with which the random walk is compared in [19]). Also, Manku et al. [22] study the impact of lookahead on P2P systems where searches are routed through greedy mechanisms. In another work, Adamic et al. [20] try to find analytical expressions for $C_G$, the cover time of a random walk, in power-law networks with two-hop replication. They detected divergences between the analytical predictions and the experimental results. The reason for such discrepancy, as the authors point out, is the revisiting effect, which occurs when a node is visited more than once. In small-world networks, where a small number of nodes are connected to other nodes far more often than the rest, it is quite common for random walks to visit these highly connected nodes often.

Our Contributions
Although there is a plethora of interesting results about random walks, we have noticed that there are situations where current findings are not straightforward to apply, especially on communication networks with one-hop replication. For example, in such networks, we may be interested in studying beforehand the expected behavior of the random walk to evaluate whether it suits the system requirements. We characterize the random walk performance by four values:
• The expected coverage. Given by the expected number of visited and covered nodes of each degree $k$ at each hop $l$ of the random walk.
• The expected average search length. Expected length of searches in number of hops, assuming that the source and destination nodes of each search are chosen uniformly at random. Obtained from the coverage estimations.
• The expected average search duration. Expected time to solve searches. Obtained from the average search length, given the processing capacity of each node and the load on the network due to queries.
• The maximum load that can be injected into the network without overloading it.
In this work we provide a set of expressions that model the behavior of the random walk and give estimations for the four previous parameters. Our claim is that these expressions can be used as a mathematical tool to predict how random walks will perform on networks of arbitrary degree distribution. Thus, we not only address the coverage problem (i.e., estimating the number of nodes covered after each hop of the random walk), but also apply queuing theory to model the response time of the system depending on the load. As we show, this approach allows us to compute in advance important magnitudes, such as the expected search duration or the maximum load that can be managed by the network before it becomes overloaded. Additionally, we find our model useful to study how certain features of the network affect the performance of searches. For example, we find that the best average search time is achieved only if the nodes with higher degrees also have greater processing capacities.
The expressions related to the estimation of covered nodes at each hop are the most complex part of the model. They must deal both with the one-hop replication feature and the revisiting effect. However, we should remark that the model can be trivially adapted to networks where the one-hop replication property does not hold, and the search finishes only when the node we are searching for is found (see the last paragraph in Section 2.4).
Likewise, it is easy to adapt the model to a variation of the random walk where each node avoids sending the message back to the node it received it from at the previous hop. We call this routing mechanism the avoiding random walk, and we deem it interesting for two reasons. First, intuitively, it should improve the random walk coverage (we have confirmed this experimentally). Second, it can be implemented in real systems using only local information, just as the pure random walk (the sending node only needs to know which neighbor the message came from). A feature of our proposal is that it does not require the complete adjacency matrix $A$, which in some situations could be unknown. Instead, thanks to the randomness assumption we apply, it only needs the degree distribution of the network to compute the metrics we are interested in. On the other hand, this work is focused on networks with good connectivity and where the node degrees are independent (see Section 2.1).
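A minimal Python sketch of the avoiding random walk follows. The name `avoiding_walk` is hypothetical, and the handling of degree-1 nodes (where the only possible move is to go back) is an assumption made here, since the text does not discuss that case.

```python
import random

def avoiding_walk(adj, source, hops, rng=None):
    """Random walk that avoids returning to the node it just came from."""
    rng = rng or random.Random()
    path = [source]
    previous = None
    for _ in range(hops):
        current = path[-1]
        # Local information only: the neighbor list and the previous hop.
        candidates = [v for v in adj[current] if v != previous]
        if not candidates:
            # Assumption: a node of degree 1 has no alternative, so go back.
            candidates = adj[current]
        previous = current
        path.append(rng.choice(candidates))
    return path

# Hypothetical example: a 4-node cycle, where the walk can never turn around.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = avoiding_walk(cycle, source=0, hops=20, rng=random.Random(3))
```

The only extra state carried with the message is the identifier of the previous node, so the mechanism remains as local as the pure random walk.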
Another property of this model is that it takes into account the revisiting effect by modeling the coverage of the random walk at each hop l depending on the coverage at the previous hop l − 1. That is, the evolution of the coverage is not assumed to be a memoryless process, a simplification that can lead to errors as seen in [20].
The rest of the paper is organized as follows. Section 2 introduces our analysis of the coverage and average search length of random walks, along with some experimental evaluation. Section 3 is centered on obtaining the average search time of random walks. Finally, in Section 4, we state our conclusions and propose some potential future work.

Analysis of Random Walks
In this section, we analyze the behavior of random walks in arbitrary networks.

Model and Assumptions
We represent networks by means of undirected graphs $G = (V, E)$, where the vertices $V$ represent the nodes and the edges $E \subseteq V \times V$ are the links between nodes. There are no links connecting a vertex to itself, nor multiple edges between the same two vertices. This does not simplify our model, but makes it closer to real scenarios like typical P2P networks. We denote by $|V| = n$ the number of nodes in the graph and by $n_k$ the number of nodes that have degree $k$ (i.e., the number of nodes that have $k$ neighbors, so that $\sum_k k\,n_k = 2|E|$). The degree $k$ of every vertex is lower than the size of the network $n$, as in typical real world networks (such as social and pure P2P networks) each node is connected to only a subset of the other vertices in the system². We also denote by $p_k$ the probability that some node in the network, chosen uniformly at random, has degree $k$ (i.e., $p_k = n_k/n$). The average degree of a network is given by $\bar{k} = \sum_k k\,p_k$. For a given network, the distribution formed by the probabilities $p_k$ (for all $k$) is known as the degree distribution of that network.
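As an illustration of these definitions, the quantities $n_k$, $p_k$ and the average degree can be extracted from an adjacency list as follows. The function name `degree_distribution` and the example graph are assumptions of this sketch.

```python
from collections import Counter

def degree_distribution(adj):
    """Return (n_k, p_k, mean_degree) for an undirected graph.

    adj maps each node to the list of its neighbors.
    """
    n = len(adj)
    n_k = Counter(len(neighbors) for neighbors in adj.values())
    p_k = {k: count / n for k, count in n_k.items()}
    mean_degree = sum(k * p for k, p in p_k.items())
    return n_k, p_k, mean_degree

# Hypothetical example: a triangle 0-1-2 with node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
n_k, p_k, mean_degree = degree_distribution(adj)
# The graph has 4 edges, so the degrees must add up to 2|E| = 8.
total_degree = sum(k * count for k, count in n_k.items())
```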
A random walk over $G$ can be defined as a Markov chain [15] $M_G$ whose transition matrix $P = [P_{ij}]$ is defined as:
$$P_{ij} = \begin{cases} 1/d(i) & \text{if } (i, j) \in E, \\ 0 & \text{otherwise,} \end{cases} \tag{1}$$
where $P_{ij}$ is the probability of moving from node $i$ to node $j$, and $d(i)$ is the degree of node $i$. $P$ allows us to study the probability of visiting each node at each hop $l$. This probability is expressed in the state probability vector, $q^l = (q_1^l, q_2^l, \dots, q_n^l)$, where $q_i^l$ represents the probability that the random walk visits node $i$ at hop $l$. This probability evolves as $q^l = q^{l-1} P$.
Assuming that $G$ is connected and finite, $M_G$ is irreducible: any node can be reached from any other node, and the average path length between any two nodes is finite. Assuming also that $G$ is non-bipartite, $M_G$ is aperiodic, and so we are able to apply the Fundamental Theorem of Markov Chains [15]. This theorem states that for such a graph $M_G$ is ergodic and there exists a unique state probability distribution $\pi$, called the stationary distribution, such that $\pi P = \pi$, $\pi = (\pi_1, \pi_2, \dots, \pi_n)$, where $\pi_i$ is:
$$\pi_i = \frac{d(i)}{2|E|}. \tag{2}$$
Intuitively, $\pi$ represents the steady state of $M_G$. That is, $\pi_i$ represents the probability that node $i$ is visited at any hop of the random walk once the stationary distribution has been reached. This probability is proportional to the degree of $i$, $d(i)$.
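The evolution $q^l = q^{l-1} P$ and its convergence to a distribution proportional to the node degrees can be checked numerically. The sketch below assumes the standard transition probabilities $P_{ij} = 1/d(i)$ for neighbors and the standard stationary distribution $d(i)/(2|E|)$; the helper name `step` and the small connected, non-bipartite example graph are hypothetical.

```python
def step(adj, q):
    """Apply q_l = q_{l-1} P once, with P_ij = 1/d(i) for each neighbor j of i."""
    nxt = {node: 0.0 for node in adj}
    for i, neighbors in adj.items():
        share = q[i] / len(neighbors)  # probability mass sent along each link
        for j in neighbors:
            nxt[j] += share
    return nxt

# Hypothetical example: a triangle 0-1-2 with node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Start the walk at node 0 with probability 1 and iterate.
q = {node: 1.0 if node == 0 else 0.0 for node in adj}
for _ in range(200):
    q = step(adj, q)

# Degree-proportional stationary distribution d(i) / (2|E|).
two_E = sum(len(neighbors) for neighbors in adj.values())
pi = {i: len(adj[i]) / two_E for i in adj}
max_error = max(abs(q[i] - pi[i]) for i in adj)
```

After a few hundred iterations on this small graph, `q` and `pi` agree to within floating-point precision.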

Mixing Rate and Conductance
We are interested in how fast the random walk converges to $\pi$, a magnitude that is called the mixing rate [18]. We require a fast convergence in order to be able to apply Equation 6.
The convergence rate is related to the eigenvalues of the transition matrix $P$. A vector $x$ is an eigenvector of $P$ with eigenvalue $\lambda$ iff $xP = \lambda x$ (so, for example, $\pi$ is an eigenvector of $P$ with eigenvalue 1). It is well known [18] that $P$ has $n$ real eigenvalues $\lambda_0 = 1 > \lambda_1 \ge \dots \ge \lambda_{n-1} \ge -1$ (and in fact, if $G$ is non-bipartite then $\lambda_{n-1} > -1$). It is also known [25] that the convergence rate to $\pi$ is governed by the second largest eigenvalue modulus of $P$, $\max\{\lambda_1, |\lambda_{n-1}|\}$. In most real world networks we can safely assume that $\lambda_1 > |\lambda_{n-1}|$ [18,19,25]. The following holds for a random walk starting at node $i$ [18]:
$$\left| P_i^{(l)}(j) - \pi_j \right| \le \sqrt{\frac{d(j)}{d(i)}}\, \lambda_1^{\,l}, \tag{3}$$
where $P_i^{(l)}$ is the distribution of the state of the random walk at hop $l$ when $i$ is the initial state, and $P_i^{(l)}(j)$ is the probability it assigns to node $j$. Thus, we can expect fast mixing for high values of the spectral gap $1 - \lambda_1$.
Now, the value of $\lambda_1$ is strongly related to the conductance of the network, $\Phi_G$. Informally, the conductance measures how well 'connected' the graph is. It is defined as follows. For $S \subseteq V$, the cutset of $S$, $C(S)$, is the set of edges with one endpoint in $S$ and the other endpoint in $\bar{S}$. The volume of $S$, $\mathrm{vol}(S)$, is defined as the sum of the degrees of the nodes in $S$, i.e., $\mathrm{vol}(S) = \sum_{i \in S} d(i)$. Then the conductance of $G$ is computed as:
$$\Phi_G = \min_{S \subset V,\ S \neq \emptyset} \frac{|C(S)|}{\min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\}}. \tag{4}$$
The relationship between the conductance and the convergence is given by the following expression (Cheeger's inequality) [18]:
$$\frac{\Phi_G^2}{2} \le 1 - \lambda_1 \le 2\,\Phi_G. \tag{5}$$
So a good conductance leads to high mixing rates, that is, the random walk state will converge quickly to the stationary distribution $\pi$. The intuition behind this fact is that in graphs with good conductance the random walk will be able to move to any region of the graph easily, whichever the origin node, and so it will evolve quickly to the equilibrium. We reason that high connectivity is to be expected in many real world networks (especially communication networks) and network models [26][27][28].
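For very small graphs the conductance can be computed by brute force, which helps to build intuition about the definition. The sketch below assumes the usual form of the definition, namely the minimum over all proper non-empty subsets $S$ of $|C(S)| / \min\{\mathrm{vol}(S), \mathrm{vol}(\bar{S})\}$. The function name `conductance` and both example graphs are hypothetical, and the exhaustive enumeration is only feasible for a handful of nodes.

```python
from itertools import combinations

def conductance(adj):
    """Brute-force conductance of a small undirected graph."""
    nodes = list(adj)
    total_vol = sum(len(adj[v]) for v in nodes)
    best = float("inf")
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            s = set(subset)
            vol = sum(len(adj[v]) for v in s)              # vol(S)
            cut = sum(1 for u in s for v in adj[u] if v not in s)  # |C(S)|
            best = min(best, cut / min(vol, total_vol - vol))
    return best

# Hypothetical examples: a triangle with a pendant node, and an 8-node cycle.
triangle_tail = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
cycle8 = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
phi_triangle_tail = conductance(triangle_tail)
phi_cycle8 = conductance(cycle8)
```

The cycle yields the lower value of the two, and for a cycle the value keeps shrinking as the number of nodes grows, which is consistent with cycles being poor candidates for fast mixing.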
Therefore, we can assume that the probability that the node visited by the random walk at each hop has degree $k$, $P(k)$, is also proportional to $k$ and can be computed as:
$$P(k) = \frac{k\,n_k}{\sum_j j\,n_j} = \frac{k\,p_k}{\bar{k}}. \tag{6}$$
We will apply Equation 6 intensively in our analysis of the coverage. Of course, its correctness depends on the distance of the random walk to the stationary distribution, or how fast it converges to it. Another issue to be taken into account is the possible dependencies between successive steps of the random walk. Our analysis estimates the average number of nodes visited and covered by the random walk at a certain hop from the values estimated at the previous hop. The new estimation is done assuming that the random walk has statistical properties similar to the random sampling of nodes where the probability of choosing a certain node is proportional to its degree, despite the apparent dependencies between consecutive hops.
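A small Python sketch of this degree-biased probability is given below. It assumes the form $P(k) = k\,n_k / \sum_j j\,n_j$, which follows from a stationary distribution proportional to node degree; the function name and the example degree counts are hypothetical.

```python
def visit_probability_by_degree(n_k):
    """P(k): probability that the node visited at a hop has degree k.

    n_k maps each degree k to the number of nodes with that degree.
    """
    total_endpoints = sum(k * count for k, count in n_k.items())
    return {k: k * count / total_endpoints for k, count in n_k.items()}

# Hypothetical example: one node of degree 1, two of degree 2, one of degree 3.
n_k = {1: 1, 2: 2, 3: 1}
P = visit_probability_by_degree(n_k)
```

Note how the single node of degree 3 is three times more likely to be visited than the single node of degree 1, even though both represent the same fraction of the network.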
Also, the work by Gkantsidis et al. [19] shows the similarities between independent sampling and random walks, which we assume for our mean-based analysis. As the authors state, in networks with good connectivity and expansion properties (which are strongly related to $\lambda_1$) the random walk has a behavior close to independent sampling, with the probability of choosing some node being proportional to its degree.
Besides, we have performed some experiments to verify the correctness of this hypothesis. The results, shown in Figure 2, confirm that it is a valid assumption. Also, we would like to remark that the property expressed by Equation 6 is in fact assumed in previous works about random walks (e.g., [20,21]) and backed by [19].
Another important issue we have tested is how 'fast' the random walk evolves to a state where the assumption of Equation 6 holds. Figure 3 shows how the random walk behaves. It can be seen that, almost immediately after hop 0 (start node), the probability of reaching a node of degree $k$ is $P(k)$.
We should note that the good conductance property, which implies that the random walk can move from any node to any other node in few steps, rules out some topologies such as cycles.

Independence of Nodes Degrees
Finally, we assume that the degrees of neighbors are independent. That is, given any two connected nodes $i$ and $j$ ($(i, j) \in E$) and any two degree values $k_1$ and $k_2$, then
$$P(d(i) = k_1 \mid d(j) = k_2) = P(d(i) = k_1).$$
This property holds in networks built by random mechanisms, like the ones used to build the ER and small-world networks we target in our experiments. To confirm that the degree independence assumption is valid we have run some experiments, whose results are shown in Figure 4. These experiments aim to measure whether the probability of reaching a node of degree $k$ when following a random walk is affected by the degree $k'$ of the node the random walk visited at the previous hop, $P(k/k')$. Our results lead to the conclusion that $\forall k, k'$: $P(k/k') = P(k)$, that is, $k'$ does not have an impact on $k$.

Fig. 2. In these figures, we show the probability of a search message arriving at a particular node as a function of its degree. We have used both Erdos-Renyi and small-world (power-law) networks formed by 50,000 nodes, with different average node degrees (10, 20 and 30). The same experiments have been performed with networks formed by 25,000 and 100,000 nodes, and we found similar results. As can be readily seen, the probability of a search message arriving at a particular node is proportional to the degree of the node.

Fig. 4. These figures compare the probability $P(k)$ of reaching a node of degree $k$ as defined by the model with the measured probability of reaching a node of degree $k$ given that the random walk comes from a node of degree $k'$, $P(k/k')$, for $k = 10, 20, 30, 40$ (horizontal axis: degree of the node the random walk comes from, $k'$; vertical axis: probability). Panel (b) corresponds to a small-world network with $\bar{k} = 10$. Both for ER and small-world networks the experimental results are averaged over three different networks with the same average degree and size ($n = 10^5$).
We should also note that this property is not fulfilled in certain graphs, like those built by preferential mechanisms, where it is well known that there is a correlation among neighbors' degrees [29]. This could lead to certain deviations in mean-based analyses of the random walk (such as ours).
In the following, we study how many different nodes are visited by a random walk as a function of its length (i.e., of the number of steps taken) and of the degree distribution of the chosen network. Subsequently, we extend this result to also consider the neighbors of the visited nodes. These metrics allow us to quantify how much of a network becomes "known" as a random walk progresses. Then, we turn our attention to providing an estimation of the average search length of a random walk. In the last subsection, we validate our analytical results by means of simulations. We assume that only the degree distribution $p_k$ and the size $n = |V|$ of the network are known.

Number of Visited Nodes
This metric represents the average number of different nodes that are visited by a random walk until hop $l$ (inclusive), denoted by $V^l$. Note that nodes may each be visited more than once, but revisits are not counted.
To obtain $V^l$, we first calculate the average number of different nodes of degree $k$ that are visited by a random walk until hop $l$ (inclusive), denoted by $V_k^l$. We make a case analysis:
• When $l = 0$ (i.e., in the source node): Since the source node of the random walk is chosen uniformly at random, the probability of starting a random walk at a node of degree $k$ is $p_k$. Therefore,
$$V_k^0 = p_k. \tag{7}$$
• When $l = 1$ (i.e., at the first hop): Here we apply that the probability of visiting some node of degree $k$ at any hop is given by $P(k)$ (Equation 6). This is based on the assumption that the random walk behaves similarly to independent sampling despite dependencies between consecutive hops (based on [19], see Section 2.1). We deem this premise to be reasonable even at the first stages of the random walk, due to the high mixing rates found in the type of networks on which we focus our work (again, see Section 2.1). Recall that the experimental evaluation both of this assumption (Fig. 2) and of our model (shown in Section 2.5) seems to support this. Thus, we have that
$$V_k^1 = p_k + P(k). \tag{8}$$
• When $l > 1$: we must take into account the probability of the random walk arriving at an already visited node. To compute such a probability, we define the following two values:
· $P_v(k, l)$: This represents the probability that, if the random walk arrives at a node of degree $k$ at hop $l$, that node has been visited before. It is computed from $V_k^{l-2}$ (Equation 9). Note that we use $V_k^{l-2}$ instead of $V_k^{l-1}$ because the node visited at hop $l - 1$ cannot be visited at hop $l$ (no vertex is connected to itself).
· $P_b$: This is the probability that at any given hop the random walk is moving back to the node it came from³. Since any visited node has degree $k$ with probability $P(k)$, the random walk will go back through the same link from which it came with probability $1/k$. Therefore, we have:
$$P_b = \sum_k \frac{P(k)}{k}. \tag{10}$$
Using these probabilities, $V_k^l$ can be written as
$$V_k^l = V_k^{l-1} + P(k)\,(1 - P_b)\,(1 - P_v(k, l)). \tag{11}$$
Finally, taking the results obtained in Equations 7, 8 and 11, we have that the total number of different nodes visited until hop $l$ is
$$V^l = \sum_k V_k^l. \tag{12}$$
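The case analysis above can be turned into a short iterative computation. The Python sketch below is one possible reading of the recursion, not necessarily the exact expressions of the model. It assumes $V_k^0 = p_k$ and $V_k^1 = p_k + P(k)$, takes $P_v(k, l) = V_k^{l-2} / n_k$ (an assumption, since the precise form of Equation 9 is not reproduced here), uses $P_b = \sum_k P(k)/k$, and updates $V_k^l = V_k^{l-1} + P(k)(1 - P_b)(1 - P_v(k, l))$. The function name `expected_visited` and the example network are hypothetical.

```python
def expected_visited(n_k, hops):
    """Estimate V_k^l for l = 0..hops from the degree counts n_k.

    Returns a list V where V[l][k] approximates the expected number of
    different nodes of degree k visited until hop l (inclusive).
    """
    n = sum(n_k.values())
    endpoints = sum(k * c for k, c in n_k.items())
    P = {k: k * c / endpoints for k, c in n_k.items()}   # degree-biased P(k)
    P_b = sum(P[k] / k for k in n_k if k > 0)            # moving-back probability
    V = [{k: n_k[k] / n for k in n_k}]                   # l = 0: V_k^0 = p_k
    if hops >= 1:
        V.append({k: V[0][k] + P[k] for k in n_k})       # l = 1
    for l in range(2, hops + 1):
        V.append({
            # Assumed form: P_v(k, l) = V_k^{l-2} / n_k.
            k: V[l - 1][k] + P[k] * (1 - P_b) * (1 - V[l - 2][k] / n_k[k])
            for k in n_k
        })
    return V

# Hypothetical example: 1000 nodes, all of degree 10.
n_k = {10: 1000}
V = expected_visited(n_k, hops=200)
totals = [sum(v.values()) for v in V]  # V^l = sum over k of V_k^l
```

Under these assumptions the avoiding random walk corresponds to replacing `P_b` with 0, which is how the text describes adapting the model.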

Number of Covered Nodes
This metric provides an estimation of the average number of different nodes covered by a random walk until hop $l$ (inclusive), denoted by $C^l$. A node is covered by a random walk if that node, or any of its neighbors, has been visited by the random walk.
To obtain $C^l$, we first calculate the number of different nodes of degree $k$ covered at hop $l$, denoted by $C_k^l$.
• When $l = 0$:
$$C_k^0 = p_k + \sum_j p_j\, j\, P(k). \tag{13}$$
The first term takes into account the possibility that the source node has degree $k$. The second term refers to the number of neighboring nodes (of the source node) of degree $k$. If the source node has degree $j$ (which happens with probability $p_j$) then, on average, $j\,P(k)$ nodes of degree $k$ will be covered, since each one of the $j$ neighboring nodes of the source node will have degree $k$ with probability $P(k)$.
• When $l > 0$: Given a link $(u, v) \in E$, we say that it has two endpoints, which are the two ends of the link. We denote the endpoint of the link at node $u$ by $(u \to v)$, and similarly the endpoint of the link at node $v$ by $(v \to u)$. We say that $(u \to v)$ hooks onto node $v$. We also say that $(u \to v)$ has been checked by a random walk if that random walk has visited node $u$. These concepts are graphically explained in Fig. 5. Now, let us denote by $E^l$ the number of endpoints checked for the first time at hop $l$, and by $P_u(k, l)$ the probability that these endpoints hook onto still uncovered nodes of degree $k$. Then, $C_k^l$ (where $l > 0$) can be written as follows:
$$C_k^l = C_k^{l-1} + E^l\, P_u(k, l). \tag{14}$$
· To obtain $E^l$, we consider the number of different endpoints checked after hop $l$ to be $\sum_j j\,V_j^l$. So, the number of endpoints checked for the first time at hop $l$ is
$$\sum_j j\,(V_j^l - V_j^{l-1}).$$
However, one of the endpoints hooks onto the node the random walk comes from (i.e., it cannot increase the number of nodes that are covered). Thus:
$$E^l = \sum_j (j - 1)\,(V_j^l - V_j^{l-1}). \tag{15}$$
· To obtain $P_u(k, l)$, on the one hand we consider that the overall number of endpoints hooking onto uncovered nodes of degree $k$ just before hop $l$ is $k\,(n_k - C_k^{l-1})$. On the other hand, the overall number of endpoints is $\sum_j j\,n_j$, and the overall number of checked endpoints until hop $l - 1$ (inclusive) is $\sum_j j\,V_j^{l-1}$. That is, the number of endpoints not checked just before hop $l$ is $\sum_j j\,n_j - \sum_j j\,V_j^{l-1}$. Therefore, we can write:
$$P_u(k, l) = \frac{k\,(n_k - C_k^{l-1})}{\sum_j j\,(n_j - V_j^{l-1})}. \tag{16}$$
Substituting Equations 15 and 16 into Equation 14, we have that
$$C_k^l = C_k^{l-1} + \frac{k\,(n_k - C_k^{l-1}) \sum_j (j - 1)\,(V_j^l - V_j^{l-1})}{\sum_j j\,(n_j - V_j^{l-1})}. \tag{17}$$
Finally, taking into account Equations 13 and 17, we have that the total number of nodes covered after hop $l$ is
$$C^l = \sum_k C_k^l. \tag{18}$$
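The coverage recursion can be sketched in Python as follows. The sketch takes any estimate of the visited nodes per degree, `V[l][k]`, as input, and assumes that one endpoint of each newly visited node is discounted, i.e. $E^l = \sum_j (j - 1)(V_j^l - V_j^{l-1})$, which is one reading of the description above. The function name `expected_covered` is hypothetical, and the example input deliberately uses an idealized walk with no revisits (exactly one new node per hop) just to exercise the formulas; it is not a prediction of the model.

```python
def expected_covered(n_k, V):
    """Estimate C_k^l from degree counts n_k and visited estimates V[l][k]."""
    n = sum(n_k.values())
    endpoints = sum(k * c for k, c in n_k.items())
    P = {k: k * c / endpoints for k, c in n_k.items()}
    mean_degree = endpoints / n
    # l = 0: the source itself plus its neighbors of degree k.
    C = [{k: n_k[k] / n + mean_degree * P[k] for k in n_k}]
    for l in range(1, len(V)):
        # Endpoints checked for the first time at hop l, minus the one
        # leading back to the previous node (assumed (j - 1) form).
        E_l = sum((j - 1) * (V[l][j] - V[l - 1][j]) for j in n_k)
        # Endpoints not yet checked just before hop l.
        unchecked = sum(j * (n_k[j] - V[l - 1][j]) for j in n_k)
        C.append({
            k: C[l - 1][k] + E_l * k * (n_k[k] - C[l - 1][k]) / unchecked
            for k in n_k
        })
    return C

# Hypothetical input: 1000 nodes of degree 10 and an idealized walk that
# visits one new node per hop (no revisits) during 50 hops.
n_k = {10: 1000}
V = [{10: float(l + 1)} for l in range(50)]
C = expected_covered(n_k, V)
covered_totals = [sum(c.values()) for c in C]  # C^l = sum over k of C_k^l
```

In this example the source covers itself and its 10 neighbors at hop 0, and each later hop adds slightly fewer than 9 new nodes as the pool of uncovered nodes shrinks.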

Average Search Length
Using the previous metric, we are now able to provide an estimation of the average search length of random walks, denoted by $\bar{l}$. Formally, $\bar{l}$ is given by the following expression:
$$\bar{l} = \sum_{l=0}^{\infty} l\, P_f(l), \tag{19}$$
where $P_f(l)$ is the probability that the search finishes at hop $l$ (i.e., the probability that the search is successful at hop $l$, having failed during the previous hops). Let us define the probability of success at hop $l$, denoted by $P_s(l)$, as the probability of finding, at that hop, the node we are searching for. $P_s(l)$ can be obtained as the ratio between the number of new nodes that will be covered at hop $l$ and the number of nodes that are still uncovered at hop $l$. That is,
$$P_s(l) = \frac{C^l - C^{l-1}}{n - C^{l-1}}. \tag{20}$$
Now, $P_f(l)$ can be obtained as follows:
$$P_f(l) = P_s(l) \prod_{i=0}^{l-1} \left(1 - P_s(i)\right). \tag{21}$$
Therefore, $\bar{l}$ can be written as
$$\bar{l} = \sum_{l=0}^{\infty} l\, P_s(l) \prod_{i=0}^{l-1} \left(1 - P_s(i)\right). \tag{22}$$
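Given a sequence of coverage estimates, the average search length can be accumulated hop by hop. The Python sketch below assumes $P_s(l) = (C^l - C^{l-1}) / (n - C^{l-1})$, with the convention $C^{-1} = 0$ for the first hop (an assumption of this sketch), and $P_f(l) = P_s(l) \prod_{i<l} (1 - P_s(i))$. The sum is truncated at the last available hop, so the function also returns the probability mass of searches that finished within that horizon. The function name and the linear coverage curve in the example are hypothetical.

```python
def average_search_length(C_total, n):
    """Return (mean search length, probability of finishing) from C^l values."""
    not_found = 1.0    # probability that every earlier hop failed
    mean_length = 0.0
    finished = 0.0
    prev = 0.0         # assumed convention: C^{-1} = 0
    for l, c in enumerate(C_total):
        # Success probability at hop l; nothing is left to find once prev == n.
        p_s = (c - prev) / (n - prev) if n > prev else 0.0
        p_f = p_s * not_found
        mean_length += l * p_f
        finished += p_f
        not_found *= 1 - p_s
        prev = c
    return mean_length, finished

# Hypothetical coverage curve: 10 new nodes covered per hop in a
# 1000-node network, so the whole network is covered at hop 99.
n = 1000
C_total = [10.0 * (l + 1) for l in range(100)]
mean_length, finished = average_search_length(C_total, n)
```

With this linear curve every hop is equally likely to end the search, so the mean is simply the average of the hop indices 0 to 99. A value of `finished` noticeably below 1 would indicate that the coverage sequence is too short for the truncated sum to be meaningful.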

Experimental Evaluation
We have run a set of experiments to evaluate the accuracy of the expressions presented in the previous subsections. The results obtained are presented in this section.
For our work, we consider two kinds of networks: small-world networks (constructed as in [21]) and Erdos-Renyi networks (constructed as in [30]).
• Small-world networks [21,31]. In [32] it is shown that many real world networks present an interesting feature: each node can be reached from any other node in few hops. These networks are typically called small-world networks. The Internet, the Web, the science collaboration graph, etc. are examples of real world networks that are consistent with this property. This kind of network is also especially interesting for our work because here the revisiting effect discussed in Section 1 is strongly present due to the uneven degree distribution. We build small-world networks using the mechanism described in [21], which leads to networks whose degree distribution follows a power law, $p_k \sim k^{-\alpha}$ (power-law networks).
• Erdos-Renyi (ER) random networks [30]. For any two nodes $i, j \in V$ there is a constant probability $c$ that they are connected ($(i, j) \in E$). The resulting degree distribution is a binomial distribution, $p_k \sim \binom{n}{k} c^k (1 - c)^{n-k}$.
See Figure 6 for an illustrative example of both kinds of networks.
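As an illustration of the ER construction, the following Python sketch connects every pair of nodes independently with probability $c$. This is the textbook procedure matching the description above and is not necessarily the generator used in the experiments; the function name `erdos_renyi` is hypothetical, and the quadratic loop is only practical for small networks.

```python
import random
from itertools import combinations

def erdos_renyi(n, c, rng=None):
    """Build an ER random graph as an adjacency list.

    Each of the n(n-1)/2 possible links exists independently with probability c.
    """
    rng = rng or random.Random()
    adj = {i: [] for i in range(n)}
    for i, j in combinations(range(n), 2):
        if rng.random() < c:
            adj[i].append(j)
            adj[j].append(i)
    return adj

# Hypothetical example: 200 nodes with expected average degree (n - 1) c = 9.95.
adj = erdos_renyi(200, 0.05, rng=random.Random(7))
mean_degree = sum(len(neighbors) for neighbors in adj.values()) / len(adj)
```

A graph generated this way may contain isolated nodes or several components, so a connectivity check would be needed before running random walks on it.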

Number of Visited and Covered Nodes
Our first goal is to study the evolution of the network coverage by random walks in real networks.
The experiments were run on networks of two sizes, $n = 5 \cdot 10^4$ and $n = 10^5$ nodes. Networks were built using three different average degrees: $\bar{k} = 10$, $\bar{k} = 20$ and $\bar{k} = 30$. In each network we ran $10^4$ random walks of length $n = |V|$. The source node of each random walk was chosen uniformly at random. From the experiments, we obtained the average number of visited and covered nodes for each degree $k$ at each hop $l$. Finally, for each network, we extracted its degree distribution $n_k$ and applied the expressions described in the previous section to get a prediction of those values, given by $V_k^l$ and $C_k^l$. Results are shown in Figures 7, 8, 9, and 10. For the sake of clarity, the experimental results are shown every 2000 hops in all figures. Model predictions, on the other hand, are drawn as lines.
Figure 7(a) shows the evolution of the number of visited nodes in ER and small-world networks of size $n = 5 \cdot 10^4$ nodes, with two different average degrees, $\bar{k} = 10$ and $\bar{k} = 30$. We see that, although the length of the random walks is enough to potentially include all the nodes, only a fraction of them are visited. This happens because of the revisiting effect, and it is more evident when the number of hops increases, since the probability of revisiting grows with the number of hops. The revisiting effect is stronger in small-world networks than in random networks. The reason is the uneven distribution of the node degrees: there are some nodes with a very high degree that will be visited again and again by the random walk. Thus, the chances of finding new nodes at each hop decrease faster in small-world networks than in ER networks. Also, we observe in Figure 7(a) that in networks of smaller $\bar{k}$ the revisiting effect is stronger. Finally, Figure 7(b) shows the impact of the network size $n$ on the number of visited nodes. As expected, a greater $n$ implies a smaller number of revisits for the same number of hops.
In all cases, the prediction $V^l$ of the total number of different nodes visited is very close to the experimental results.
In Figure 8 we study the accuracy of the predictions of the number of visited nodes of a particular degree $k$ at each hop $l$, $V_k^l$. We draw the results and predictions for degrees $k = \bar{k} + 5$ and $k = \bar{k} - 5$, for $\bar{k} = 10$, $\bar{k} = 20$ and $\bar{k} = 30$. Again, it can be seen that the model predictions fit the experimental results very well, despite the revisits and the different behavior observed for different degrees.
Figure 9 gives the results of the experiments run to study the coverage of the random walk. Figure 9(a) shows how the coverage grows faster in small-world networks than in ER networks for networks of the same average degree $\bar{k}$. This contrasts with the number of visited nodes, which behaves in the opposite way (see previous paragraphs). The reason is the presence of well-connected nodes, which are quickly visited during the first hops of the random walk and considerably increase the coverage because of the high number of neighbors they have. For example, after 4000 hops, the random walk has covered about half of the small-world network with $\bar{k} = 10$, while in the ER network of the same $\bar{k}$ the random walk has only covered close to 30% of the nodes. Moreover, we can see that the network average degree also has an important impact on the coverage. In both kinds of networks the coverage grows faster when the average degree is higher. Besides, we observe that the difference between the coverage of both networks decreases more quickly for a higher $\bar{k}$.
In addition, Figure 9(b) compares the results of the coverage for ER networks of different sizes and average degrees. As could be expected, the networks of smaller size require fewer hops to be covered. We also observe that the average degree has an important influence on the coverage difference. The greater the average degree, the faster the coverage of both networks converges. In all cases, the $C^l$ values given by the model predict very well how the coverage behaves and evolves.
Finally, we check the model accuracy for random walks that avoid the previous node, which we call avoiding random walks. As stated in Section 2.2, the avoiding random walk can be easily represented in our model just by setting $P_b = 0$ (see Equation 10). Results are shown in Figure 11. There we compare the coverage of pure and avoiding random walks in ER and small-world networks of size $n = 10^5$ nodes and average degree $\bar{k} = 10$. Figure 11(a) confirms that, as expected, the avoiding random walk is able to visit a greater number of different nodes, as the revisiting effect is, to a certain degree, lessened. However, Figure 11(b) shows that this has little impact on the network coverage. We find that there is only a small increase in the number of covered nodes when using avoiding random walks, for both kinds of networks. Nonetheless, in all cases the $V_l$ and $C_l$ values given by the model are very close to the real results.
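The difference between pure and avoiding random walks can be illustrated with a short simulation. The following Python code is only a minimal sketch, not the simulator used in our experiments. It assumes that the network is given as a dictionary `adj` mapping each node to the list of its neighbors and that every node has at least one neighbor; the function name `random_walk_stats` and its parameters are chosen here for illustration. It returns the number of distinct visited nodes and the number of covered nodes (visited nodes plus their neighbors) after a given number of hops.

```python
import random

def random_walk_stats(adj, start, hops, avoid_previous=False, rng=random):
    """Return (visited, covered) counts after `hops` hops from `start`.

    visited: distinct nodes the walk has been at.
    covered: visited nodes plus all their neighbors (one-hop replication).
    Assumption: `adj` maps each node to a non-empty list of neighbors.
    """
    visited = {start}
    covered = {start, *adj[start]}
    prev, cur = None, start
    for _ in range(hops):
        choices = adj[cur]
        # Avoiding walk: never go back to the previous node unless it is
        # the only neighbor available.
        if avoid_previous and prev is not None and len(choices) > 1:
            choices = [v for v in choices if v != prev]
        prev, cur = cur, rng.choice(choices)
        visited.add(cur)
        covered.add(cur)
        covered.update(adj[cur])
    return len(visited), len(covered)
```

On a ring, for instance, the avoiding walk never turns back, so it visits a new node at every hop until the whole ring has been traversed, whereas the pure walk may oscillate between a few nodes.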

Average Search Length
For the experiments regarding the average search length we used networks whose sizes ranged from $10^4$ to $2 \cdot 10^5$ nodes. In each experiment we ran $10^4$ searches, averaging the obtained results. At each search, two nodes (one corresponding to the source and the other to the destination) were chosen uniformly at random. Starting from the source, a random walk traversed the network until the destination node was found (i.e., a neighbor of the destination is visited).
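The search procedure just described can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the code used to obtain our results: the network is a dictionary `adj` from nodes to neighbor lists, the graph is connected, source and destination are assumed to be distinct, and a search also ends if the walk reaches the destination itself. The names `search_length` and `average_search_length` are hypothetical.

```python
import random

def search_length(adj, source, dest, rng=random, max_hops=10**7):
    """Number of hops until the walk reaches a node that knows `dest`."""
    # Under one-hop replication, dest and all its neighbors can answer.
    knows_dest = set(adj[dest]) | {dest}
    current, hops = source, 0
    while current not in knows_dest:
        if hops >= max_hops:
            raise RuntimeError("search did not finish; is the graph connected?")
        current = rng.choice(adj[current])  # uniform choice among neighbors
        hops += 1
    return hops

def average_search_length(adj, searches, rng=random):
    """Average length over `searches` walks with random source and destination."""
    nodes = list(adj)
    total = 0
    for _ in range(searches):
        source, dest = rng.sample(nodes, 2)  # two distinct nodes (assumption)
        total += search_length(adj, source, dest, rng)
    return total / searches
```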
The first thing to note is that the average search length grows linearly with the network size in both ER and small-world networks. Besides, the average degree $\bar{k}$ has an important effect on the results: the bigger $\bar{k}$ is, the shorter the searches are. The reason is that a higher $\bar{k}$ implies that more nodes of the network are discovered at each hop. Also, it can be observed in Figure 12 that the average search length is greater in ER networks than in small-world networks. This can be explained if we take into account that random walks, on average, cover more nodes in small-world networks than in ER networks (see Figure 9).
As in the previous experiments, Figure 12 also shows that our experimental results regarding the average search length correspond very closely to the analytical results.
At this point, we would like to note that, given the assumptions we made in our analytical model, it seems that the very good match achieved with the experimental results could only occur if these assumptions are correct. As a matter of fact, we have verified, in practice (see Figs. 2 and 4), that the type of networks we consider in this paper, indeed, fulfill our assumptions.
On the other hand, it is clear that if we consider networks that do not fulfill some of our assumptions, then a certain mismatch should be expected. For instance, networks built by preferential mechanisms are known not to preserve the independence of degrees of neighbors [29]. Therefore, for such networks we should not expect a very close correspondence between analytical and experimental results. We have repeated the experiments on the average search length that we ran for random and small-world networks, but this time with networks built using the preferential attachment mechanism proposed by Barabási [31]. We have observed that, as expected, in preferential networks our experimental results do not correspond very closely to the analytical results (see Fig. 13(a)). Instead, the model seems to be consistently pessimistic. Also, the error grows continuously with the network size.
Finally, we have tested the model against toroidal networks of two different degrees, $k = 10$ (5 dimensions) and $k = 16$ (8 dimensions). Our intention is to analyze networks that are not random at all. The results, shown in Fig. 13(b), reveal a very clear mismatch between the results predicted by the model and the actual performance of the random walk.

Duration of Searches by Random Walks
In this section, we present the second part of our model. Here we provide expressions that allow us to predict the performance of random walks as a search tool, which is the main goal of this work. These expressions rely on the estimation of the average search length described in the previous section, combined with Queuing Theory [33]. As a result, given the processing capacities and degrees of nodes, we are able to compute two key values:
• The load limit: the search rate limit that the network can handle before saturation.
• The average search time: the average time it takes to complete a search, given the global load.
Also, we show how these expressions can be used to analyze which features a network should have so random walks have a better performance (i.e., searches are solved in less time). In particular, we focus on studying the relationship between degree and capacity distributions, showing that the minimum search time is obtained when nodes of higher capacities are also those of higher degrees.
In our analysis, networks are assumed to be Jackson networks [33]: new searches arrive into the network following a Poisson process, and the service times at each node are exponentially distributed.

Searches Length and Load on Nodes
Our first step is to establish the relationship between the average search length and the system load. Each search is processed, on average, $1 + \bar{l}$ times (once at the source node, and once at each step of the random walk). Using this, we can express the total load on all the nodes of the system, $\lambda$, as
$$\lambda = (1 + \bar{l})\,\gamma, \tag{23}$$
where $\gamma$ is the load injected in the system by new searches, which we assume to be known. Note that $\lambda$ is composed of the newly generated searches ($\gamma$), plus the searches that move from one node to another, denoted by $\gamma'$. Hence,
$$\gamma' = \lambda - \gamma = \bar{l}\,\gamma. \tag{24}$$
To compute the load on each particular node $j$, $\lambda_j$, let us take into account that the probability that a random walk visits a node is proportional to the node's degree (see Section 2). This implies that, for each node $j \in V$, the load on node $j$ due to search messages, denoted $\gamma'_j$, is proportional to its degree $k_j$. As a result, there is a value $\tau$ such that $\gamma'_j = \tau k_j$ for all $j$. Hence, $\gamma' = \sum_j \gamma'_j = \tau d$, where $d$ is the sum of all degrees in the network (i.e., $d = \sum_k n_k k$). Therefore,
$$\tau = \frac{\gamma'}{d} = \frac{\bar{l}\,\gamma}{d}. \tag{25}$$
Assuming that all nodes generate approximately the same number of new searches ($\gamma/n$), we can compute the average load at node $j$ as
$$\lambda_j = \frac{\bar{l}\,\gamma}{d}\,k_j + \frac{\gamma}{n}, \tag{26}$$
where the first term represents the load due to search messages, and the second term the load due to the searches generated at node $j$. Note that any other search generation rate model can be implemented just by changing the term $\gamma/n$.
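The computation of the per-node load can be sketched in a few lines of Python. The sketch assumes that the forwarded load equals the average search length times $\gamma$ and is shared among nodes proportionally to their degrees, and that every node generates new searches at rate $\gamma/n$. The function name `node_loads` and its parameters are hypothetical names used only for illustration.

```python
def node_loads(degrees, gamma, avg_len):
    """Average load at each node.

    degrees: list with the degree k_j of every node.
    gamma:   global rate of newly generated searches.
    avg_len: average search length (hops per search).
    Assumption: every node generates new searches at rate gamma / n.
    """
    n = len(degrees)
    d = sum(degrees)                # sum of all degrees in the network
    tau = avg_len * gamma / d       # forwarded load per unit of degree
    return [tau * k + gamma / n for k in degrees]

# Three nodes, gamma = 3 searches per time unit, 4 hops per search on average.
loads = node_loads([1, 1, 2], gamma=3.0, avg_len=4.0)
```

As a sanity check, the loads of all nodes add up to the total load, that is, to one plus the average search length, times $\gamma$.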

Average Search Duration
In order to obtain the average search duration, $T_r$, we use Little's Law [33], which states that
$$T_r = \frac{r}{\gamma}, \tag{27}$$
where $r$ is the average number of resident searches in the network (i.e., searches that are waiting or being served), and $\gamma$ is the average number of searches generated per unit of time (i.e., the arrival rate of searches). Observe that $\gamma$ is assumed to be known. Hence, the challenge in computing $T_r$ is to obtain $r$. Let $r_j$ be the number of resident searches in node $j$. Then, $r = \sum_j r_j$.
To obtain $r_j$, we apply Little's Law again, this time individually to each node $j$:
$$r_j = \lambda_j T_r^j, \tag{28}$$
where $T_r^j$ is the average search time at node $j$ and $\lambda_j$ is the average load at node $j$, which includes both searches generated at node $j$ and searches due to messages from other nodes. Next we use that, by Jackson's Theorem [34] (recall we assume the network to be a Jackson network), each node $j$ can be analyzed as a single M/M/1 queue with Poisson arrival rate $\lambda_j$ and exponentially distributed service time with mean $T_s^j$ (which can be computed from the node capacity, which we assume to be known). Then:
$$T_r^j = \frac{T_s^j}{1 - \rho_j}, \tag{29}$$
where $\rho_j$ is the utilization rate and $T_s^j$ is the average service time at node $j$. As $\rho_j = \lambda_j T_s^j$, we can write
$$T_r^j = \frac{T_s^j}{1 - \lambda_j T_s^j}. \tag{30}$$
Once we have $\lambda_j$ and $T_r^j$, we can combine them to obtain
$$T_r = \frac{1}{\gamma} \sum_j \lambda_j T_r^j = \frac{1}{\gamma} \sum_j \frac{\lambda_j T_s^j}{1 - \lambda_j T_s^j}. \tag{31}$$
That is, we have provided an expression that computes the average search time using the topology, the average service times of nodes, and the search arrival rate.
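The steps above (Little's Law at each node, the M/M/1 sojourn time, and Little's Law for the whole network) can be sketched as the following Python function. It is a minimal illustration, assuming that the per-node loads and mean service times are already known; the name `average_search_time` is hypothetical.

```python
def average_search_time(loads, service_times, gamma):
    """Average search duration from per-node loads and mean service times.

    loads:         average load lambda_j at each node.
    service_times: mean service time T_s^j at each node.
    gamma:         global arrival rate of new searches.
    """
    resident = 0.0
    for lam, ts in zip(loads, service_times):
        rho = lam * ts                  # utilization of the node
        if rho >= 1.0:
            raise ValueError("a node is overloaded; no stable state exists")
        # Resident searches at an M/M/1 node: lambda_j * T_s^j / (1 - rho_j)
        resident += rho / (1.0 - rho)
    return resident / gamma             # Little's Law for the whole network

# Two nodes, each with load 1 and mean service time 0.5, and gamma = 1.
t_r = average_search_time([1.0, 1.0], [0.5, 0.5], gamma=1.0)
```

In this small example each node has utilization 0.5 and holds one resident search on average, so the average search time is 2 time units.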

Load Limit
Implicitly, in our previous results it has been assumed that no node is overloaded (i.e., $\lambda_j < 1/T_s^j$ for all $j$). Otherwise, the network would never reach a stable state. Thus, a key value for any network is its load limit: the minimum search arrival rate ($\gamma$) that would overload the network, denoted by $\gamma_o$. Clearly, $\gamma_o = \min_j \{\gamma_o^j\}$, where $\gamma_o^j$ is the minimum search arrival rate that would overload node $j$.
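From Equation 26, we have that
$$\lambda_j = \left( \frac{\bar{l}\,k_j}{d} + \frac{1}{n} \right) \gamma. \tag{32}$$
Also, since no node must be overloaded, it must be satisfied that
$$\lambda_j < \frac{1}{T_s^j}. \tag{33}$$
Combining Equation 32 with Equation 33 we have that, for each $j$, the following must hold:
$$\left( \frac{\bar{l}\,k_j}{d} + \frac{1}{n} \right) \gamma < \frac{1}{T_s^j}. \tag{34}$$
Therefore, the load limit for node $j$ is
$$\gamma_o^j = \frac{1}{T_s^j \left( \dfrac{\bar{l}\,k_j}{d} + \dfrac{1}{n} \right)}, \tag{35}$$
and
$$\gamma_o = \min_j \gamma_o^j. \tag{36}$$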
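As an illustration, the load limit can be computed with the short Python sketch below. It assumes that the load at node $j$ is the global rate $\gamma$ multiplied by the factor obtained from the node's degree share of the forwarded searches plus its share $1/n$ of the new searches, and that a node saturates when its load reaches the inverse of its mean service time. The name `load_limit` and its parameters are hypothetical.

```python
def load_limit(degrees, service_times, avg_len):
    """Return (network load limit, list of per-node load limits).

    degrees:       degree k_j of every node.
    service_times: mean service time T_s^j of every node.
    avg_len:       average search length (hops per search).
    """
    n = len(degrees)
    d = sum(degrees)
    per_node = []
    for k, ts in zip(degrees, service_times):
        load_per_unit_gamma = avg_len * k / d + 1.0 / n
        # Node j saturates when gamma * load_per_unit_gamma reaches 1 / T_s^j.
        per_node.append(1.0 / (ts * load_per_unit_gamma))
    return min(per_node), per_node

# Two nodes of degree 2, unit service times, 2 hops per search on average.
limit, per_node = load_limit([2, 2], [1.0, 1.0], avg_len=2.0)
```

The node with the smallest individual limit is the bottleneck that determines the limit of the whole network.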

Experimental Evaluation
Average Search Duration
In this subsection, we present the results of a set of experiments aimed at evaluating, in practice, the accuracy of our model for the average search time. As in the previous experiments (Section 2.5), we conducted extensive simulations over ER and small-world networks. All networks are made up of $10^4$ nodes.
In each experiment, nodes generate new searches following a Poisson process with rate $\gamma/n$, where $\gamma$ is the global load on the network. When a node starts a search for a resource, it first checks whether it already knows that resource (i.e., whether the node itself or any of its neighbors holds the resource). If so, the search ends successfully. Otherwise, a search message for the requested resource is created and sent to some neighbor node chosen uniformly at random. When a node receives a search message, it also verifies whether it knows the resource. If so, the search is finished. Otherwise, the search is again forwarded to another neighbor chosen uniformly at random. The reported experimental results are averages of the measured values.
Capacities are assigned so that nodes with a higher degree are given a higher capacity. All nodes are assumed to have the same number of resources, $w = 10{,}000$. Each resource is held by one node, and all resources have the same probability of being chosen for search. The processing time at each node $i$ follows an exponential distribution with an average service time computed as $T_s^i = w k_i / c_i$. This average is obtained by dividing the number of resources checked for each search (the total number of resources known, $w(k_i + 1)$, minus the resources of the node the search message came from, $w$) by the node's capacity $c_i$.
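The service time model of this paragraph can be sketched in Python as follows. The sketch assumes that the capacity is expressed in resources checked per time unit; the function names `mean_service_time` and `sample_processing_time` are hypothetical and do not correspond to our simulator.

```python
import random

def mean_service_time(w, degree, capacity):
    """Mean service time: resources checked per search divided by capacity.

    A node of degree k knows w * (k + 1) resources and skips the w resources
    of the node the message came from, so it checks w * k resources.
    """
    return w * degree / capacity

def sample_processing_time(w, degree, capacity, rng=random):
    """Draw one exponentially distributed processing time with that mean."""
    mean = mean_service_time(w, degree, capacity)
    return rng.expovariate(1.0 / mean)  # expovariate takes the rate 1 / mean

# A node of degree 10 with w = 10,000 resources per node and an assumed
# capacity of 10^6 resources checked per time unit.
t_s = mean_service_time(10_000, 10, 1e6)
```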
For each load, we measured the average search times experimentally for each network. Results are shown in Fig. 14. It can be seen that, as expected, the average search time always increases with the load, undergoing a higher growth when it approaches the maximum arrival rate. Furthermore, our experimental results show a very close correspondence with the analytical results that were obtained.

Load Limit
We have computed the $\gamma_o$ values for random and small-world networks with different average degrees. For each kind of network and average degree, five networks were built with the capacity distribution presented in Table 1. Our goal was to observe the variation of $\gamma_o$ for networks of the same type and $\bar{k}$, and also to study how the $\gamma_o$ values differ depending on the network kind and average degree.
Results, which are shown in Figure 15, differ for random and small-world networks.
The first thing to note is that small-world networks can handle a greater load than random networks.
Small-world networks present variations of the $\gamma_o$ values even for networks of the same average degree. Despite this variation, it is clear that the load limit tends to grow with $\bar{k}$. The reason is that a greater $\bar{k}$ implies a smaller global load for the same rate of queries injected into the system. Recall that the total load is given by $(1 + \bar{l})\gamma$ (Equation 23) and that higher average degrees lead to shorter average search lengths $\bar{l}$ (Figure 12(b)). Hence, it is possible to perform more queries before overloading the network.
Erdős-Rényi networks, however, behave in a very different manner. They present very little variation in the $\gamma_o$ values. More surprisingly, there is a small decrease of the load limit when $\bar{k}$ grows. This contrasts with the behavior of small-world networks. As shown in Figure 12(a), larger average degrees imply shorter average search lengths and so a smaller global load. However, the $\gamma_o$ that can be handled by the network does not change accordingly. The reason seems to be that in ER networks the load is more evenly distributed among nodes. This implies that low capacity nodes have to handle a significant number of searches. Besides, a greater average degree increases the average service times $T_s$ of these nodes, as they know, and so have to process, more resources per search. Hence, these nodes remain the bottleneck of the network despite the shorter average search length, preventing the system from handling a greater load.
However, it is important to recall that these results are also due to the capacity distribution used, and how it was distributed among the nodes. In small-world networks, if we assign low capacities to high degree nodes we can expect them to become bottlenecks of the network that force small γ o values. In ER networks, adding more high capacity nodes could change the γ o tendency so it would grow with the average degree. Exploring all these phenomena is beyond the scope of this paper.

Optimal Relationship between Degree and Capacity Distributions
In this section we show that, when there is a full correlation between the capacity of a node (i.e., the number of searches a node can process per time unit) and its degree, this leads to a minimal value of the average search time T r .
Let us first state the relation we assume between the capacity $c_j$ and the average service time $T_s^j$ of a node $j$. We assume that the capacity is a parameter that does not depend on the degree or the number of resources known by the node, and only depends on the processor and network connection speeds. We assume that the service time grows with the node's degree through a strictly increasing function $f(k_j)$, and that it is inversely proportional to the node's capacity, as follows:
$$T_s^j = \frac{f(k_j)}{c_j}. \tag{37}$$
Let us now consider a pair of nodes $i, j \in V$ such that $k_j > k_i$ (so $f(k_j) > f(k_i)$), and two possible positive capacities $c_1$ and $c_2$ such that $c_1 > c_2$. We show that, if no other degree or capacity assignment changes, having $c_j = c_1$ and $c_i = c_2$ gives a smaller average search time, $T_r$, than the average search time $T'_r$ obtained with the reverse assignment $c'_j = c_2$ and $c'_i = c_1$.
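Using Eq. 37, we obtain the following possible average service times:
$$T_{s,1}^i = \frac{f(k_i)}{c_2}, \qquad T_{s,1}^j = \frac{f(k_j)}{c_1}, \tag{38}$$
$$T_{s,2}^i = \frac{f(k_i)}{c_1}, \qquad T_{s,2}^j = \frac{f(k_j)}{c_2}, \tag{39}$$
in which $T_{s,1}$ are the service times obtained with the first capacity assignment and $T_{s,2}$ are the service times obtained with the second. From the above equations, and since $f(k_j) > f(k_i)$ and $c_1 > c_2$, we have
$$T_{s,1}^i - T_{s,2}^i < T_{s,2}^j - T_{s,1}^j. \tag{40}$$
Let $\lambda_i$ and $\lambda_j$ be the loads on $i$ and $j$. Since $k_i < k_j$, then $\lambda_i < \lambda_j$. Hence, from this and Eq. 40, we find that
$$\lambda_i T_{s,1}^i + \lambda_j T_{s,1}^j < \lambda_i T_{s,2}^i + \lambda_j T_{s,2}^j. \tag{41}$$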
To compute the values $T_r$ and $T'_r$, we use Eq. 31:
$$T_r = \frac{1}{\gamma} \Big( r_i + r_j + \sum_{h \neq i,j} r_h \Big), \qquad T'_r = \frac{1}{\gamma} \Big( r'_i + r'_j + \sum_{h \neq i,j} r_h \Big), \tag{42}$$
where $r_i$ and $r_j$ are obtained with the first capacity assignment and $r'_i$ and $r'_j$ with the second. Observe that $r_h$ remains the same for any node $h$ that is neither $i$ nor $j$, because its degree, load, and capacity are the same in both cases. Hence, if $r_i + r_j < r'_i + r'_j$ then $T_r < T'_r$.
Finally, applying Eqs. 39 and 41, we conclude that
$$r_i + r_j < r'_i + r'_j, \tag{43}$$
and hence $T_r < T'_r$.
This proves that, for a given degree distribution, the best performance is obtained by assigning the largest capacities to the nodes with the largest degrees. Note that we have found a condition that is necessary in order to attain the minimum possible $T_r$ once the degree distribution has been set. However, different degree distributions can yield very different $T_r$ values.
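The effect of the capacity assignment can also be checked numerically for a single pair of nodes. The following Python sketch is only an illustration of the argument, not part of the proof. It assumes that the mean service time of a node is $f(k)/c$ with $f(k) = k$ by default, and that each node behaves as an M/M/1 queue, so its number of resident searches is $\rho/(1-\rho)$ with $\rho$ the load times the mean service time. The function name `resident_pair` and the numeric values are hypothetical.

```python
def resident_pair(lam_i, lam_j, k_i, k_j, c_i, c_j, f=lambda k: k):
    """Resident searches r_i + r_j at nodes i and j for given capacities."""
    total = 0.0
    for lam, k, c in ((lam_i, k_i, c_i), (lam_j, k_j, c_j)):
        rho = lam * f(k) / c            # utilization with T_s = f(k) / c
        if rho >= 1.0:
            raise ValueError("node overloaded under this assignment")
        total += rho / (1.0 - rho)      # M/M/1 resident searches
    return total

# Node j has the higher degree and hence the higher load.
lam_i, lam_j, k_i, k_j = 1.0, 2.0, 2, 4
c1, c2 = 20.0, 10.0                     # c1 > c2

matched = resident_pair(lam_i, lam_j, k_i, k_j, c_i=c2, c_j=c1)
reversed_ = resident_pair(lam_i, lam_j, k_i, k_j, c_i=c1, c_j=c2)
```

With these values, giving the larger capacity to the higher degree node keeps fewer searches resident at the pair than the reverse assignment, in agreement with the result above.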

Conclusions
In this paper, we have presented an analytical model that allows us to predict the behavior of random walks. Furthermore, we have also performed some experiments that confirm the correctness of our expressions.
Some work can be carried out to complement our results. For instance, several random walks can be used at the same time, a situation that could be used to further improve the efficiency of the search mechanism. These random walks could run independently or, in order to cover separated regions on the graphs, coordinate among them in some way.