# Researchers develop method to predict source of network diffusion

(Phys.org) -- Using available data and probability calculations, researchers building network models have shown how information moves from a source node to many, and sometimes all, of the nodes in a network. Doing the reverse, i.e. finding the source after data has already diffused throughout a network, is not so easy. A model that could do so would have many applications, ranging from tracing rumors on Twitter back to the original poster to discovering where an epidemic got its start. Now new research by a team at the École Polytechnique Fédérale de Lausanne in Switzerland has shown that, using techniques similar to the triangulation methods that locate an individual phone from cell towers, it’s possible to predict the source in a network from limited data sets. The team, led by Pedro Pinto, has published its findings in the journal Physical Review Letters.

To find the physical location of a single cell phone to within a few city blocks, engineers look at data from just three cell towers surrounding the phone. By noting the time stamp on the incoming data at each tower, it’s possible to deduce, or triangulate, the likely position of the phone. Pinto et al. used a similar technique to narrow down the source of data that has diffused through a network.

The idea, they say, is to look at the arrival times of data at a node, be it a cell tower, a village in Africa experiencing a cholera epidemic or a member of a terrorist network whose leader investigators want to find. Nodes in any network can be associated by drawing lines between them. Tracing back in time then involves following the lines that are most likely to lead to the source. That sounds easy, but figuring out which lines to follow back most certainly is not, especially when information is limited or missing, or when a network is so large that looking at every node becomes impossible. That’s where the techniques the team developed come in handy. They used arrival times and probabilistic equations to derive maximum likelihood estimates that help them guess which path to take at each node.
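To make the arrival-time idea concrete, here is a minimal Python sketch. It is not the estimator from the paper: it swaps in a plain least-squares comparison and assumes that every hop takes the same average time (`mean_delay`), that the moment the diffusion started is unknown, and that only a few observer nodes report when the data reached them. The toy network, the node names and the function names `hop_distances` and `estimate_source` are all hypothetical, chosen only for illustration.

```python
from collections import deque


def hop_distances(graph, start):
    """Breadth-first search: number of hops from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def estimate_source(graph, arrival_times, mean_delay=1.0):
    """Return the node whose hop distances best explain the observed arrival times.

    Assumption (not from the paper): each hop takes `mean_delay` on average.
    The start time is unknown, so only differences relative to a reference
    observer are compared.
    """
    observers = sorted(arrival_times)
    reference = observers[0]
    best_node, best_score = None, float("inf")
    for candidate in graph:
        dist = hop_distances(graph, candidate)
        if any(obs not in dist for obs in observers):
            continue  # candidate cannot reach every observer
        score = 0.0
        for obs in observers[1:]:
            observed = arrival_times[obs] - arrival_times[reference]
            expected = mean_delay * (dist[obs] - dist[reference])
            score += (observed - expected) ** 2
        if score < best_score:
            best_node, best_score = candidate, score
    return best_node


# Hypothetical toy network (a small tree) stored as adjacency lists.
graph = {
    "a": ["b"],
    "b": ["a", "c", "d"],
    "c": ["b"],
    "d": ["b", "e", "f"],
    "e": ["d"],
    "f": ["d"],
}

# Three observers report when the data reached them; the true source is "d".
arrival_times = {"a": 7.0, "e": 6.0, "f": 6.0}
print(estimate_source(graph, arrival_times))  # prints d
```

Much like the three cell towers, three well-placed observers are enough to single out the source in this toy tree. With noisy delays or fewer observers, several candidates can score equally well, which is why a probabilistic treatment is needed.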

It seems to work. They applied their modeling technique to a cholera outbreak that occurred in Africa back in 2000 and achieved an error of fewer than four hops using data from just twenty percent of the communities involved, which is quite impressive. Unfortunately, their techniques can only be applied under certain idealized conditions, e.g. when there’s a single source or when there’s only one choice at each node. That doesn’t take away from what they’ve accomplished, which is likely the start of a whole new area of network research.

More information: Locating the Source of Diffusion in Large-Scale Networks, Phys. Rev. Lett. 109, 068702 (2012). DOI:10.1103/PhysRevLett.109.068702 (ArXiv preprint)

Abstract
How can we localize the source of diffusion in a complex network? Because of the tremendous size of many real networks, such as the internet or the human social graph, it is usually unfeasible to observe the state of all nodes in a network. We show that it is fundamentally possible to estimate the location of the source from measurements collected by sparsely placed observers. We present a strategy that is optimal for arbitrary trees, achieving maximum probability of correct localization. We describe efficient implementations with complexity O(N^α), where α=1 for arbitrary trees and α=3 for arbitrary graphs. In the context of several case studies, we determine how localization accuracy is affected by various system parameters, including the structure of the network, the density of observers, and the number of observed cascades.

Journal information: Physical Review Letters